3. Searching

Computer systems are often used to store large amounts of data from which individual records must be retrieved according to some search criterion. Thus the efficient storage of data to facilitate fast searching is an important issue. In this section, we shall investigate the performance of some searching algorithms and the data structures which they use.

3.1 Sequential Searches

Let's examine how long it will take to find an item matching a key in the collections we have discussed so far. We're interested in:

a) the average time

b) the worst case time and

c) the best possible time.

However, we will generally be most concerned with the worst-case time, since calculations based on worst-case times lead to guaranteed performance predictions. Conveniently, worst-case times are often easier to calculate than average times.

If there are n items in our collection - whether it is stored as an array or as a linked list - then in the worst case, when no item in the collection has the desired key, n comparisons must be made between the search key and the keys of the items in the collection.

To simplify analysis and comparison of algorithms, we look for a dominant operation and count the number of times that dominant operation has to be performed. In the case of searching, the dominant operation is the comparison. Since the search requires n comparisons in the worst case, we say this is an O(n) algorithm. The best case - in which the first comparison returns a match - requires a single comparison and is O(1). The average time depends on the probability that the key will be found in the collection - something we would not usually expect to know. Thus in this case, as in most others, estimation of the average time is of little utility. If the performance of the system is vital, e.g. if it is part of a life-critical system, then we must use the worst case in our design calculations, as it represents the best guaranteed performance.
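
A sequential search over such a collection might be sketched as follows; this is a sketch only, and it assumes the same collection representation (c->items, c->item_cnt, c->size) and the ItemKey function that the binary search code below relies on:

/* Sequential search: compare the key with each item in turn;
   n comparisons are needed in the worst case */
void *seq_search( collection c, void *key )
{
    int i;
    for( i = 0; i < c->item_cnt; i++ )
    {
        if ( memcmp( ItemKey( c->items[i] ), key, c->size ) == 0 )
            return c->items[i];    /* match found */
    }
    return NULL;                   /* no item matches the key */
}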

3.2 Binary Search

However, if we place our items in an array and sort them in either ascending or descending order on the key first, then we can obtain much better performance with an algorithm called binary search.

In binary search, we first compare the key with the item in the middle position of the array. If there's a match, we can return immediately. If the key is less than the middle key, then the item sought must lie in the lower half of the array; if it's greater then the item sought must lie in the upper half of the array. So we repeat the procedure on the lower (or upper) half of the array.

Our FindInCollection function can now be implemented:

static void *bin_search( collection c, int low, int high, void *key )
{
    int mid, cmp;
    /* Termination check: empty range, so no item matches */
    if (low > high) return NULL;
    mid = (low+high)/2;
    /* memcmp returns a negative, zero or positive value -
       not necessarily -1, 0 or +1 - so test its sign */
    cmp = memcmp(ItemKey(c->items[mid]),key,c->size);
    if (cmp == 0)
        /* Match, return item found */
        return c->items[mid];
    else if (cmp > 0)
        /* key is less than the mid item's key, search lower half */
        return bin_search( c, low, mid-1, key );
    else
        /* key is greater than the mid item's key, search upper half */
        return bin_search( c, mid+1, high, key );
}

void *FindInCollection( collection c, void *key )
/* Find an item in a collection
Pre-condition:
c is a collection created by ConsCollection
c is sorted in ascending order of the key
key != NULL
Post-condition: returns an item identified by key if one exists, otherwise returns NULL
*/
{
int low, high;
low = 0; high = c->item_cnt-1;
return bin_search( c, low, high, key );
}
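
Since each recursive call in bin_search is a tail call, binary search is just as naturally written as a loop. A minimal self-contained sketch over a sorted array of ints (a hypothetical standalone function, independent of the collection ADT above) illustrates this:

/* Iterative binary search over an array of ints sorted in
   ascending order: returns the index of key in a[0..n-1],
   or -1 if no element matches */
int bin_search_int( const int a[], int n, int key )
{
    int low = 0, high = n - 1;
    while ( low <= high )
    {
        int mid = (low + high) / 2;
        if ( a[mid] == key )
            return mid;        /* match */
        else if ( a[mid] < key )
            low = mid + 1;     /* key lies in the upper half */
        else
            high = mid - 1;    /* key lies in the lower half */
    }
    return -1;                 /* key is not present */
}

Either form halves the range still to be searched on every comparison, which is the source of the speed-up analysed in the next section.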

3.3 Complexity
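
Each comparison in binary search halves the range still to be searched: after k comparisons at most n/2^k items remain, so the search must terminate after at most floor(log2 n) + 1 comparisons. Binary search is therefore an O(log n) algorithm, in contrast with the O(n) sequential search.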
3.4 Trees
