CLRS Solutions | Exercise 7.2-4

Banks often record transactions on an account in order of the times of the transactions, but many people like to receive their bank statements with checks listed in order by check number. People usually write checks in order by check number, and merchants usually cash them with reasonable dispatch. The problem of converting time-of-transaction ordering to check-number ordering is therefore the problem of sorting almost-sorted input. Argue that the procedure \(\textsc{Insertion-Sort}\) would tend to beat the procedure \(\textsc{Quicksort}\) on this problem.

For nearly sorted data, insertion sort significantly outperforms quicksort because it can exploit the existing order, while quicksort cannot.

Think about how checks are processed in practice. You write check #101, then #102, then #103. Each check is cashed within a few days, but not always in the exact order written. Check #102 might be cashed before #101 if the merchant deposits it faster. The result is a sequence that is mostly sorted, with occasional inversions (pairs of elements that are out of order).

Why insertion sort excels here

Insertion sort scans through the array, and for each element, it shifts it backward only until it finds its correct position. When the array is nearly sorted, most elements are already very close to their final positions. An element might need to move back only 1 or 2 positions instead of traveling across the entire array.

If there are only \(k\) inversions in the data (where \(k\) is small), insertion sort runs in \(\Theta(n + k)\) time. For nearly sorted input where \(k = O(n)\), this gives \(O(n)\) performance, which is optimal for a comparison-based sort.

For example, if each check can be at most 5 positions away from its correct location, insertion sort does only about \(5n\) comparisons and shifts, far better than \(n \lg n\).

Why quicksort struggles here

Quicksort’s performance depends on achieving balanced partitions, which requires that the pivot falls near the median of the elements. However, quicksort cannot detect or exploit existing order in the data. Even if the array is 99% sorted, quicksort still performs the same partitioning operations as it would on random data.

Worse, if the nearly sorted data happens to be close to fully sorted, quicksort experiences its worst-case behavior. Using the last element as pivot on a nearly sorted array often creates unbalanced partitions, leading to \(O(n^2)\) running time.

Quicksort must always make \(\Theta(n \lg n)\) comparisons (in the best case), and cannot take advantage of the \(O(n)\) work that suffices for nearly sorted data.