Monday, January 9, 2017

Bucket Sort in Java with Example

In recent years, one of the question I have increasingly seen in programming job interviews is about constant time sorting algorithms e.g.  do you know any O(n) sorting algorithm? When I first encountered this question, I had no idea whether we can sort in constant time because even some of the fastest sorting algorithms e.g. QuickSort or MergeSort takes O(N log N) time for sorting. After some research, I come to know that there are some constant time algorithms e.g. bucket sort, counting sort and radix sort, which can sort an array in O(n) time.  The idea of bucket sort is quite simple, you distributes the elements of an array into a number of buckets and then sort those individual bucket by a differnet sorting algorithm or by recursively applying the bucket sorting algorithm.

You might have remembered, how your shopkeepers used to prepare bundles of notes. They usually a bucket load of cash with different denominations e.g. 5, 10, 20, 50, or 100. They just put all 5 Rs notes into one place, all Rs 10 notes into another place and so on. This way all notes are sorted in O(n) time because you don't need to sort individual buckets, they are all same notes there.

Anyway, for bucketsort to work at its super fast speed, there are multiple prerequisites. First, the hash function that is used to partition the elements must to be very good and must produce ordered hash: if i < j then hash(i) < hash(j). If this is not the cash then you cannot concatenate individual buckets in O(n) time. Second, the elements to be sorted must be uniformly distributed.


If you keep these prerequisites aside, bucket sort is actually very good considering that counting sort is reasonably equal to its upper bound and counting sort is also super fast. The particular distinction for bucket sort is that it uses a hash function to partition the keys of the input array, so that multiple keys may hash to the same bucket. Hence each bucket must effectively be a growable list; similar to radix sort.

Many programmers get confused between Counting Sort and Bucket sort as they work quite similar, but the most important difference between them is that Bucket sort uses a hash function to distribute keys while Counting sort creates a bucket for each key. I think there are perhaps greater similarities between radix sort and bucket sort than there are between counting sort and bucket sort. You can read Introduction to Algorithm by Thomas H. Cormen to learn more about the difference between these two constant time sorting algorithms.

In our example, I have used JDK's Collections.sort() method to sort each bucket. This is to indicate that the bucket sort algorithm does not specify which sorting algorithm should be used to sort individual buckets. A programmer may choose to recursively use bucket sort on each bucket until the input array is sorted, similar to radix sort. Anyway, whichever sorting method you use to sort individual buckets, the time complexity of bucket sort will still tend towards O(n).




Bucket Sort - FAQ

Now, let's see some of the frequently asked questions about Bucket sort in programming job interviews.

Is the bucket sort a stable algorithm?
Bucket sort is not a stable sorting algorithm. If you remember,  A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in sorted output as they appear in the input array to be sorted. Some sorting algorithms are stable by nature like Insertion sort, Merge Sort, or Bubble Sort, etc. In the case of Bucket sort, input sorts themselves by moving into a bucket and their order is not guaranteed to be retained.


Is the bucket sort in-place algorithm?
Bucket sort is also not an in-place sorting algorithm because you create buckets which can be array, linked list, or hashtable and they take additional spaces. In the worst of the good cases (sequential values, but no repetition) the additional space needed is as big as the original array.


What is time complexity of Bucket sort?
The time complexity of bucket sort in the O(n) in the best and average case and O(n^2) in the worst case. The creation of bucket will take O(1) time and assume moving elements into bucket will take O(1) time e.g. in the case of hashtable and linked list. The main step is to sort elements on individual elements. This step also takes O(n) time on average if all numbers are uniformly distributed (please refer CLRS book for more details). The last step to concatenate all bucket will also take O(n) time as there will be n items in all buckets.


What is the time complexity of Bucket sort in the worst case?
The time complexity of bucket sort is O(n^2) in worst case i.e. when all elements at allocated to the same bucket. Since individual buckets are sorted using another algorithm, if only a single bucket needs to be sorted, bucket sort will take on the complexity of the inner sorting algorithm. This is why bucket sort is only useful when the input is uniformly distributed in a range. This way they will not end up in the same bucket.


What is the space complexity of Bucket sort?
The space complexity of the bucket sort algorithm is O(n) because even in the worst of the good cases (sequential values, but no repetition) the additional space needed is as big as the original array. Please refer Data Structures and Algorithms Made Easy in Java for more details on complexity analysis of bucket sort in best, average, and worst case.

 Bucket Sort in Java with Example



Problem Statement:
Given an unordered list of integers, rearrange them in the natural order.

Sample Input: {8,5,3,1,9,6,0,7,4,2,5}
Sample Output: {0,1,2,3,4,5,6,7,8,9,5}



Java Program to implement Bucket Sort in Java

Here is our complete Java program to implement bucket sort in Java. This program sorts an integer array using bucket sort algorithm.

How to implement Bucket Sort in Java



Program to implement Bucket sort in Java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/*
 * Java Program sort an integer array using radix sort algorithm.
 * input: [80, 50, 30, 10, 90, 60, 0, 70, 40, 20, 50]
 * output: [0, 10, 20, 30, 40, 50, 50, 60, 70, 80, 90]
 * 
 * Time Complexity of Solution:
 *     Best Case O(n); Average Case O(n); Worst Case O(n^2).
 * 
 */

public class BuckeSort {

  public static void main(String[] args) {

    System.out.println("Bucket sort in Java");
    int[] input = { 80, 50, 30, 10, 90, 60, 0, 70, 40, 20, 50 };

    System.out.println("integer array before sorting");
    System.out.println(Arrays.toString(input));

    // sorting array using radix Sort Algorithm
    bucketSort(input);

    System.out.println("integer array after sorting using bucket sort algorithm");
    System.out.println(Arrays.toString(input));

  }

  /**
   * 
   * @param input
   */
  public static void bucketSort(int[] input) {
    // get hash codes
    final int[] code = hash(input);
    
    // create and initialize buckets to ArrayList: O(n)
    List[] buckets = new List[code[1]];
    for (int i = 0; i < code[1]; i++) {
      buckets[i] = new ArrayList();
    }
    
    // distribute data into buckets: O(n)
    for (int i : input) {
      buckets[hash(i, code)].add(i);
    }
    
    // sort each bucket O(n)
    for (List bucket : buckets) {
      Collections.sort(bucket);
    }
    
    int ndx = 0;
    // merge the buckets: O(n)
    for (int b = 0; b < buckets.length; b++) {
      for (int v : buckets[b]) {
        input[ndx++] = v;
      }
    }
  }

  /**
   * 
   * @param input
   * @return an array containing hash of input
   */
  private static int[] hash(int[] input) {
    int m = input[0];
    for (int i = 1; i < input.length; i++) {
      if (m < input[i]) {
        m = input[i];
      }
    }
    return new int[] { m, (int) Math.sqrt(input.length) };
  }

  /**
   * 
   * @param i
   * @param code
   * @return
   */
  private static int hash(int i, int[] code) {
    return (int) ((double) i / code[0] * (code[1] - 1));
  }

}


Output
Bucket sort in Java
integer array before sorting
[80, 50, 30, 10, 90, 60, 0, 70, 40, 20, 50]
integer array after sorting using bucket sort algorithm
[0, 10, 20, 30, 40, 50, 50, 60, 70, 80, 90]




Bucket Sort - Important points to remember

Here are some important points about bucket sort you should remember, this is useful from both interview and understanding point of view and Interviewer expects that you know about them when you say that you know bucket sort.

1) Bucket Sort is also known as bin sort because you create bins or buckets to sort inputs.

2) Bucket sort is only useful when the input is uniformly distributed over a range e.g. coins, numbers 1 to 100 etc.

3) You can use a linked list or array as a bucket. The choice of data structure will affect the insertion time e.g. if you use linked list then adding on the head could take O(1) time. You can also use hash tables as buckets.

4) The bucket sort is one of the rare O(n) sorting algorithm i.e. time complexity of Bucket sort is the liner in best and average case and not NLogN like Quicksort or Mergesort.

5) Bucket sort is not a stable sorting algorithm because in a stable algorithm if two input is same they retain their place in sorted order and in the bucket it depends upon how you sort the individual bucket. Though, bucket sort can be made stable, known as radix sort, which we'll see in future articles.

6) You can sort the elements inside individual buckets either by recursively calling the bucket sort or using a separate sorting algorithm e.g. insertion sort, bubble sort, or quick sort.

7) Is bucket sort an in-place sorting algorithm? No, it's not an in-place sorting algorithm. The whole idea is that input sorts themselves as they are moved to the buckets. In the worst of the good cases (sequential values, but no repetition) the additional space needed is as big as the original array.

8) The worst case complexity of bucket sort, when all elements will end up in the same bucket is O(n^2) because then it has to be sorted by a different sorting algorithm.

9) The space complexity of bucket sort is O(n) because even to sort sequential values, without repetition, you need an array as big as the original array.


That's all about bucket sort in Java. It's one of the interesting sorting algorithm which gives you O(n) performance in best and average case. Bucket sort should only be used when input is uniformly distributed over a range e.g. numbers up to 1 to 1000. You should also remember that Bucket sort is not a stable algorithm hence it's not guarnateed that equal keys on input array will retain their places. It is also not an inplace algorith, as it will require additional space as big as original array in worst case. You can also refer classic CLRS book for more details on bucket sort and other O(n) sorting algorithms e.g. Counting sort, Radix sort etc.

Further Reading
Algorithms and Data Structures - Part 1
Algorithms and Data Structures - Part 2
Introduction to Algorithms

Related Data Structure and Algorithm Interview Questions from Javarevisited Blog
  • Top 15 Data Structure and Algorithm Interview Questions (see here)
  • Top 20 String coding interview questions (see here)
  • 133 core Java interview questions of last 5 years (see here)
  • Top 30 Array Coding Interview Questions with Answers (see here)
  • Top 30 linked list coding interview questions (see here)
  • Top 50 Java Programs from Coding Interviews (see here)
  • Top 5 books on Programming/Coding Interviews (list)
Thanks for reading this article so far. If you like this article then please share with your friends and colleagues. If you have any question or doubt then please let us know and I'll try to find an answer for you.

4 comments :

financial engineer said...

nice.
for compilation,
change: List[] buckets
to: List<Integer>[] buckets

Javin Paul said...

@financial engineer, yes, you are correct, the Integer along with angle bracket is lost in formatting.

Shaffy Singla said...

or you can change this peace of code
for (int v : buckets[b]) {
input[ndx++] = v;

}
to
for(Object v : buckets[b])
input[ndx++] = (Integer) v;

as there is a type mismatch error

gurnoor singh said...

Thank you for the detailed explanation.
Can you please explain the hash functions? (or point me in the right direction to read the basics). More specifically:
- how do we know the function "hash(int i, int[] code)" satisfies the condition "hash(i) > hash(j) if i>j"
- In the first hash function, how do we know that the number of buckets we need is sqrt(input.length)?

Post a Comment