An introductory guide to optimizing your program, Part II


by: Robert J Ohannessian

Notes about the previous article:

Well, you should at least read Part I of this series (Pixelate #2). I should have cleared some things up in that past article though. When I said that you should plan your algorithms early, I meant only the algorithms, and not their actual implementation. So it doesn't matter if you decide to code Painter's Algorithm using highly optimized assembly with SSE2 or whatever, the fact remains that if the algorithm sucks, no amount of fine tuning will make it run fast. A good BSP tree sort written even in C++ with nice classes will crush Painter's Algorithm every time (given enough polygons :-). Another problem with optimizing the implementation first is that you may not have checked if the algorithm works (and thus you'd spend countless hours on some code which may need to be thrown out the window), or you may not be using the correct algorithm at all! As Michael Abrash said in the Black Book, (I paraphrase) "Early optimization is the root of all evil." I guess he knows what he's talking about. By the way, he was one of the graphics coders in iD's Quake. Now that doesn't mean you shouldn't write good algorithms, but rather that you shouldn't try to optimize poor ones and expect good results, because that won't get you anywhere.
Well then, on with Part II now!

Part II:

Introduction:

Welcome to the second installment of the code optimization guide. I will only discuss sorting algorithms in this article, since it's already long as it is :)
You'll have to excuse my absence from Pixelate #3 and #4 though: I've been rather busy during that time. I'll try and make it a regular column starting with this issue. Enough of the excuses now, let's get on with the article already!

Sorting:

Here I shall introduce several techniques for sorting. In the previous column, I gave you a small taste of how to optimize sorting algorithms, but now, we shall explore much more advanced techniques.

QuickSort

We'll begin with the Quick Sort. It's one of the more popular sorting algorithms, and is implemented in libc as the function qsort(). So for all you C coders out there, you have most of the work already done for you ;) The Quick Sort, however, is less than optimal as we shall see.

First, let's say we have an unsorted array of numbers:

45 6 76 12 8 34

Quick Sort works by selecting one of the numbers in the array, which we call the 'pivot'. The pivot works best (as we shall see) if it's near the median of the values in the array. The unsorted array is then split according to that pivot by moving all numbers smaller than it to the left of the array, and all numbers larger to the right. Finally, we apply the algorithm on each half, and the halves of each half, recursively, until everything is sorted. The question now is, how do we select a good pivot? The efficiency of our algorithm will depend on this. We don't want to pass through all values and find a median, because that's an O(n^2) algorithm, which will turn our sort into an O(n^2) at best. (See Part I in Pixelate #2 for what the Big-Oh Notation is...) This is unacceptable. Some more advanced implementations of the Quick Sort pick three or four values at random from the array, and select the one that's in the middle of the others as the pivot. This gives good results usually, but not always, since there's the question of whether or not we fall on good random values. A simple solution, which we'll adopt, is to take the first element of the array as the pivot. This works out nicely in ideal situations, where the data are thought of as being random. That way, the first number will be very close to the median most of the time. However, this is hardly ever the case, as we will see.

Let's take a short array of integer values. We have:

(45) 6 76 12 8 34

with 45 as our pivot.

Now we must put all numbers smaller than 45 in the left half of the array, and all larger numbers in the right half. The order within each set is unimportant for now.
This gives us:

34 6 12 8 (45) 76

Hardly sorted, but we're getting closer. Notice that 45 is now in the right place. At the pivot, we split the array in two again. So now we have:

34 6 12 8 and 76

Note that we don't include the pivot in either set (see later for why it would be bad for us to do so). In any case, the pivot is in its correct position, since every smaller value is to the left and every larger value is to the right - so we don't need to sort it.

The second part consists of just one number, so it is already 'sorted'. If this weren't the case, it would be trivial to sort it - we'd just do what we're about to do with the first part. So we have to sort the first part of the array now. We can just pick the first element as the pivot, and split the array again. From

(34) 6 12 8

we get

8 6 12 (34)

from which we obtain two arrays again:

8 6 12 and (null)

Again, there's no sense in sorting the second array - it's empty! Now we repeat the same algorithm on the first part, and get

6 8 12

which is sorted. Merging back the arrays, we now have the sequence:

6 8 12 34 45 76

And we're done!

So you're asking where's the code for all this? Here's the recursive (easy-but-inefficient) version:

void Sort(int *data, int first, int length) {

    int pivot_index;
	
    /* Array is sorted if it has one or no elements. */
    if (length <= 1)
        return;
		
    /* Move all values to the correct sides of the pivot. */
    pivot_index = Partition(data, first, length);	
	
    /* Sort subsets recursively. */
    Sort(data, first, pivot_index - first);
    Sort(data, pivot_index + 1, first + length - pivot_index - 1);
	
    return;
}
	

/* This function moves all values less than the pivot to the
 * left of it, and all values greater than the pivot to the right.
 * It returns the index of the pivot in the resulting array.
 */
int Partition(int *data, int first, int length) {

    int pivot = data[first];             /* The pivot is the first element. */
    int bigindex = first + 1;            /* Index first element after Pivot. */
    int smallindex = first + length - 1;

    /* Special case, if there is nothing to sort. */
    if (bigindex == smallindex && pivot < data[bigindex])
        return smallindex;

    while (bigindex < smallindex) {

        /* Scan for elements not in their proper positions. */
        while (bigindex < (first + length - 1) && data[bigindex] <= pivot)
            bigindex++;

        /* When smallindex reaches the pivot (first element), the loop
         *  will exit since data[first] == pivot
         */
        while (data[smallindex] > pivot)
            smallindex--;

        /* No more room to grow - swap the values. */
        if (bigindex < smallindex) {
            data[bigindex]   ^= data[smallindex];
            data[smallindex] ^= data[bigindex];
            data[bigindex]   ^= data[smallindex];
        }
    }

    /* Reposition the pivot. */
    data[first] = data[smallindex];
    data[smallindex] = pivot;

    return smallindex;
}

Making an iterative version of the sort is left as an exercise for the reader. (Hint: use a stack!)

Now, what happens if the array is already sorted? Then our pivot will end up being the smallest value. This will mean that the array won't get shifted, since all values must be larger than the pivot (the smallest value) and hence go to the right of it. Therefore, when we split our array into two parts, one of the parts will be of zero size, and the other of n-1 elements!

Let's do a timing analysis of this algorithm. At worst, we pick the wrong pivot, which will cause all data to be moved. If that happens, the array will not be split in half for the recursion, but in two sets, one of which is empty, the other containing n-1 elements (as we mentioned above). So we end up recursing n-1 times into the algorithm, moving m items every time, m being the number of items in the set to be sorted, which is linearly dependent on n. This gives us (at worst) O(n^2) (we strip coefficients for the Big-Oh Notation). How is this better than Bubble Sort, you may ask? After all, Bubble Sort is also O(n^2). The difference is, Quick Sort is O(n^2) at worst, but can be better. However, Bubble Sort is always O(n^2). In the best case of Quick Sort, the pivot will lie in the middle of the array every time, which means we'll cut our array in two near-equal parts every time. Divide by two every time and you fall in logarithmic complexity. Quick Sort has an average case and best case of n*ln(n).

This is pretty good. Let's look at some numbers:

n              n^2    n*ln(n)
----------------------------------
   10          100        23
  100        10000       460
 1000      1000000      6907
10000    100000000     92103

Note that these numbers don't include the coefficients or lower powers. If we coded a very sloppy Quick Sort and a highly optimized Bubble Sort, we might get something like 100*n*log2(n) and 2 * n^2. In that (unlikely) case, the Bubble Sort would outperform the Quick Sort when n < 500 (roughly). But as n becomes larger, Quick Sort will be able to cope much better with the load, and will be orders of magnitude faster than Bubble Sort.

HeapSort:

This is another popular algorithm. Unlike Quick Sort, its worst case is O(n*ln(n)), making it very, very useful. If you want a live example of Heap sorting, try out Team Zig's entry into The Allegro Team Competition, Set Up Us The Bomb !!! - it uses a heap to sort all the polygons and sprites in the game very efficiently (less than 2% of CPU with up to 15000 polygons). As you may have guessed, Heap Sort is based on a heap. A heap is a binary tree that is left-complete (it fills left to right, and you don't go to the next level until you've filled up the level you're on). Also, the parent node must contain a value larger than (or equal to) both its children's values.

Here's a sample Heap from the array we had in the previous example:

     76
   /    \
  45    34
 / \    /
6  12  8   

As you can see, each parent node contains a value larger then its children. The tree is also left-complete, since the last line fills up left to right. So the tree above is a heap.

To generate this initial heap, we'll need to use a technique called 'reheapification'. See later on in this article on how to do that. For now, let's assume our data actually build into a heap.

How can we easily organize these data in memory, without resorting to complicated structures and dependencies? We can just lay them out in an array, like so:

Array:    76 45 34  6 12  8
Position:  0  1  2  3  4  5

Notice that as we add elements to the array, the corresponding heap will be left-complete - the final row will grow from left to right. Also notice that if you know the position of the parent, then its direct children are located at i*2 + 1 and i*2 + 2. This makes it easy to know what is where. However, we left out one critical point: how do we make sure that the parent node will have a value larger than its children's values? Let's work with an example to make things easier. Let's assume we want to add the value 102 to the heap. We'd normally place it at the end of the array:

76 45 34  6 12  8 102

which will give us this tree:

     76
   /    \
  45    34
 / \    / \
6  12  8  102

This is not a heap. To turn it into a heap, we swap 102 with its parent (34). We now get:

     76
   /    \
  45    102
 / \    / \
6  12  8   34

We're getting closer, but we're not quite there yet. If we do this operation again, we obtain:

    102
   /    \
  45    76
 / \    / \ 
6  12  8  34

which is a heap.

The code for doing this operation is as follows:

/* Adds an element, 'value', to the heap represented by 'array'. You
 * must make sure that 'array' is large enough to hold the new value.
 */
void add_to_heap(int *array, int *size, int value) {

    int i = *size;

    /* Add the value at the end of the array, or the next free slot in
     * the heap. */
    array[*size] = value;
    (*size)++;

    /* While the array isn't a heap, keep moving the value up */
    while (i > 0 && array[(i-1) / 2] < array[i]) {
        int temp = array[i];
        array[i] = array[(i-1) / 2];
        array[(i-1) / 2] = temp;
        i = (i-1) / 2;
    }
    return;
}

As you saw, adding 102 involved only 2 iterations of our algorithm. Actually, it would need log2(n) iterations at worst, where n is the total number of elements in the tree. This can easily be demonstrated using the fact that each level has (at most) twice as many nodes as the level above it. We know that inserting one value into the heap is O(ln(n)). So, adding n values to the heap will take O(n*ln(n)). Bear in mind that this is a worst-case scenario - you may get better results in practice!

So how to we get a sorted array out of that, you may ask? I shall explain.

Let's take the heap we built in the last paragraphs. It's represented by this array:

102 45 76 6 12 8 34

Now if we take the first element of the array, and swap it with the last, we get:

34 45 76 6 12 8 102

But wait! If we do this, it's not a heap anymore! Well it could be, if we considered the array representing the heap as having only n - 1 elements (instead of n). So, we actually split our array into two parts. The first part will be our heap, and the second will be our sorted array. As we perform operations, the heap will shrink, and the sorted array will grow. So, we now have:

34 45 76 6 12 8 - 102

(I'm using the '-' sign to separate the two parts in the diagrams, but in reality, they're still one array.)

We still don't have a heap though, so we need to apply 'reheapification', an algorithm which will turn our array back into a heap. This is the same algorithm that you had to use to get your initial array into a heap (if you don't remember hearing about it, look towards the beginning of the part about heap sort :-).

The algorithm is again very simple. For every parent, check if either of its children are larger than it, and if so, swap them. Otherwise, continue to the next parent node. It resembles what we did when we inserted 102 in our data set.

Start with the root node (34 in our case). Compare it with its children, 45 and 76. 76 is the larger child, and it's larger than the parent (34), so we swap them. 76 is now the parent, and 45 and 34 are the children. You should now check if 76 is larger than its new parent, and if so, swap them again. Since 76 is the root node in our case, it doesn't have a parent, so it's in its correct position. Now repeat the algorithm for both 45 and 34. Continue for any children, until there are no more nodes. We now have a heap!

Okay okay, we still haven't sorted our array. Patience, my young apprentice. In time knowledge shall come. In the meantime however, feel free to finish reading this article :)

Well, now we've managed to split our array into a sorted part and an unsorted parts, and we've made a heap of the 'unsorted' part. If we do the same thing that we did to get the 102 out intially, that is, to take the root node out of the tree and place it in the sorted part of the array, we get:

8 45 34 6 12 - 76 102

Hey, those two numbers 76 and 102 are sorted! Lather, rinse, repeat, and finally we get:

- 6 8 12 34 45 76 102

Our array is sorted!

Let's see why this works now. When we have a heap, we know that the root of the heap (the first node) is the largest value in the heap, by definition. So we can extract this value. If we rebuild the heap without it, the new heap will have the new largest value. In other words, the second larger value in the array will be the new root. We can then take that second largest value and add it to the sorted part of our array. If we repeat for all values in the unsorted part, we'll end up with a fully sorted array!

But how good is this algorithm? A bit of timing analysis is required. The first step in our algorithm was to turn our array into a heap. For that, we went to every node, and swapped it upwards if it was larger than its parent. Since we need to do another swap if the node is larger than its 'grandparent' (as we saw when inserting 102), we end up with an O(n*ln(n)) algorithm. Re-forming the heap after extracting the root takes log2(n) steps at worst, and we need to do this n times, for a total of O(n*ln(n)). But the initial step is also O(n*ln(n)), and since it's done only once, the total is O(2*n*ln(n)) = O(n*ln(n)).

I know you're all itching to see the code, so here's the one I used for Set Up Us The Bomb !!!, with a minor tweak to get it to sort in ascending order:

/* Macro to swap two numbers */
#define SWAP(a, b) { \
    (a) ^= (b); \
    (b) ^= (a); \
    (a) ^= (b); \
}

/* Macro for reheapification */
#define SORT_DOWN(pos, len) for (k = pos;;) { \
        int right, left, max; \
        left = 2 * k + 1;     \
        right = left + 1;     \
        if (left >= len)      \
            break;            \
        else if ((right >= len) || (array[left] > array[right])) \
            max = left;       \
        else                  \
            max = right;      \
        if (array[k] > array[max]) \
            break;            \
        SWAP(array[k], array[max]); \
        k = max;              \
    }

/* Sorts a list of ints using Heap Sort */
void sort_ints(int *array, int length) {

    int i, k;

    /* Builds a heap out of the list */
    for (i = length / 2;  i >= 0; i--)
        SORT_DOWN(i, length);

    for (i = length - 1; i > 0; i--) {
        /* Move the root node to the sorted part */
        SWAP(array[i], array[0]);

        /* Rebuild a heap out of the unsorted part */
        SORT_DOWN(0, i);
    }

    return;
}

Concluding remarks:

Well, that was a long article. I didn't have time to do anything on searching, nor on minimizing work done on graphics, so I guess I'll have to leave those for Part III. See you in the next issue!


Thanks to Ben Davis for his execllent grammar skills and superhuman persistence
while battling the evil spelling mistakes that were hiding in this article :)

Robert J Ohannessian