This article should, hopefully serve as an introduction to simple sorting algorithms. Sorting has always seemed to be a fascination to many people, it's pretty handy too. The scope of this article is sorting simple arrays of characters or numbers. In a future article I plan to look at the sorting of more complex stuff, such as strings and structures (gripping, huh?)
Now, C heads among you might be screaming, "bah, we got qsort() in our standard C library", which is true enough, I guess. But look at the function prototype for qsort(), it's written in such a way as to handle a wide variety of data-types, which makes it run more slowly than a dedicated algorithm. Besides this, it isn't always possible to apply a generalised function like qsort() to every given situation. Annnyhow, on with the article...
What is sorting?
By definition, Sorting is the process of arranging some set of similar data into an increasing or decreasing order. Most commonly, when sorting a bunch of information we need to decide upon a sort key. The sort key is used to decide where an element should be placed in the list (array), and thus is used in comparisons. In this article, our sort key is simply an element in the array, however, when sorting structures it could be any structure member. More on this later.
The three general methods used to sort information are:
Exchange : Say we have a ten coins, each of different monetry values. To exchange sort them we lay them down in a line on a table, and exchange any coins that are out of order until we have a line of coins in ascending (or descending) order.
Insertion : Again we have the same ten coins. Hold them all in your hand. Take a coin and place it on the table. Taking each coin in turn, insert it into the correct position in the line. When you have no coins left, they are in order.
Selection : Lay out those fantastic coins on the table, and select the coin of lowest worth. Place this at first position in a new line. Continue by selecting the lowest remaining coin and placing it in the next position in the line. Et Voila!
Now we know a little about what sorting is, and the different methods of sorting 'stuff', Let's look at one of the most well known, but most pathetic sorting algorithms, The Bubble Sort.
The Bubble Sort
The bubble sort is an exchange sort. It involves the comparison of adjacent elements, swapping them if they are out of order. Let's look at some (trippy, pink) source:
void Bubble( char *list, unsigned int num_elements )
{
In this example, num_elements is the number of elements in the array, list is a pointer to an array of characters to be sorted. The outer loop i ensures that in the worst case, by the time the loop terminates, the array is sorted. The inner loop j performs the actual sort.
This version of the bubble sort, sorts the array in ascending order, it would be trivial to modify the code to sort the list in decending order. Why not try it?
Let's look at the algorithm in action, assume that the list contains the elements n r o c :
| Pass Number | Array Contents |
| Start | n r o c |
| 1st Pass | c n r o |
| 2nd Pass | c n o r |
The array is now sorted in ascending alphabetical order.
Before we continue, its worth a note that the bubble sort can be improved slightly if we look at its behaviour. Notice, elements that are at the end of the array move to their true positions very quickly? Note the 'c' above. Elements at the 'beginning' tend to move quite slowly, (not so much in the above example, but they do, really)
To improve the sort, we could modify the algorithm to make subsequent passes from either end of the list.
i.e. beginning>>end, end>>beginning, etc.
Write a function to do it and send me, at CoRNSouP, I'll post the better ones =] (fame! fortune!)
Simple Selection Sort.
A selection sort selects the smallest (in value) array element and exchanges it with the first element. The remaining elements are selected in order of lowest value, and exchanged with the second element, third, and so on. More source:
void Selection( char *list, unsigned int num_elements )
{
As usual, these algorithms are a pain to follow easily.
The outer loop i runs through the array, this gives us our exchange point in the array, i.e. first, second, third etc. For each point the inner loop j looks at each of the remaining elements.
Each element is checked in turn to find the smallest element, once we've looked at all the values, the j loop terminates, and the position of the smallest element in the array is held in k.
list[k] and list[i] are then exchanged.
Let's look at an example, again, n r o c :
| Pass Number | Array Contents |
| Start | n r o c |
| 1st Pass | c r o n |
| 2nd Pass | c n o r |
Simple Insertion Sort.
The insertion begins by sorting the first two elements of the array. The algorithm then inserts the third element into its position relative to the first two elements. The fourth element is inserted into its place within the first three elements, and so forth. Code:
void Insertion( char *list, unsigned int num_elements )
{
*Deep Breath*, here we go.
The outer loop i keeps count of where we are in the list. list[i] is compared against every element before it (loop j). If the element is greater than list[i] then it is moved up one element. This repeats until we find an element smaller than list[i]. At this point, list[i] is then placed in its new position.
Let's look at another example:
| Pass Number | Array Contents |
| Start | n r o c |
| 1st Pass | n r o c |
| 2nd Pass | n o r c |
| 3rd Pass | c n o r |
Thats basically all there is to Sorting Simply in C :-)
I realise that I haven't mentioned analyzing sorting algorithms, that is, determining just how they compare to one another. Let's do that now.
A bit about efficiency.
Basically, in analyzing any sort, you should determine how many comparisons and exchanges will take place for the best, worst, and average case.
With both the Bubble Sort and the simple Insertion sort, the number of comparisons is always the same, since both for loops execute the same amount of times, independent on how well sorted the data is before processing. So, how do we analyse this?
The outer loop always executes (n-1) times ( i = 1 .. num_elements-1 ) where n is the number of elements to sort, or array size. The inner loop executes an average of about (n/2) times ( since, less comparisons are made as we move toward the end of the array ). The two loop execution counts are multiplied to give 1/2(n2-n) comparisons.
Ignoring the time taken to exchange elements, let's say that it takes 0.001s to perform a comparison. Sorting ten elements takes 0.045 seconds, since:
| Number of Comparisons | = | 1/2(n2-n) |
| = | 1/2(102-10) | |
| = | 45 | |
| Takes 0.001s per Comparison | => | 45*0.001 = 0.045 |
Thus, 100 elements take 4.95 seconds, 1000 elements, 499.5 seconds. To sort 1,000,000 elements would take approximately 499,999,500 seconds. Which is about, hmm, apparently over 16 years of sorting.
I'd like to point out that these figures are probably very innacurate, I'm just trying to show the speed of the algorithm in relation to array size.
The Insertion sort is a little different. It exhibits what we call natural behaviour, in that the algorithm works hardest when the list is completely out of order, and least when the list is already sorted. This means that an insertion sort is a good choice, if used to sort a list which is almost in order. In the worst case however, it performs as badly as the bubble and selection sorts. (n2-n)
Thanks for listening.
Well, I hope this wasn't too confusing. There is quite a bit to digest. The best way to understand the algorithms is to take out a pen and paper and step through the algorithm, scribbling down numbers as you go.
I have about 3 sides of paper with numbers and letters scrawled all over them ;-}
In the next article we'll be looking at improved sorting algorithms, such as the shell sort and quick sort (qsort() for the C heads). Maybe I'll throw something in about sorting structures and strings too.
-- NRoC