We can improve the running time of quicksort in practice by taking advantage of the fast running time of insertion sort when its input is “nearly” sorted. Upon calling quicksort on a subarray with fewer than \(k\) elements, let it simply return without sorting the subarray. After the top-level call to quicksort returns, run insertion sort on the entire array to finish the sorting process. Argue that this sorting algorithm runs in \(O(nk + n \lg(n/k))\) expected time. How should we pick \(k\), both in theory and in practice?

This hybrid approach combines quicksort’s efficient partitioning with insertion sort’s speed on nearly sorted data. Let’s analyze its running time and determine the optimal value of \(k\).

Phase 1: Quicksort down to size \(k\)

When we stop quicksort at subarrays of size \(k\), the recursion tree has depth approximately \(\lg(n/k)\) instead of \(\lg n\). Here’s why:

At each level, subarrays are partitioned roughly in half. Starting with size \(n\), after \(d\) levels, subarray sizes are approximately \(n/2^d\). We stop when:

\[\frac{n}{2^d} = k \quad \Rightarrow \quad d = \lg(n/k)\]

At each of these \(\lg(n/k)\) levels, we do \(O(n)\) total work for partitioning. Therefore, the quicksort phase costs:

\[O(n \lg(n/k))\]

After this phase, the array consists of approximately \(n/k\) sorted “blocks” of size at most \(k\), with elements in each block unordered but all elements in one block smaller than all elements in later blocks.

Phase 2: Insertion sort on the entire array

After quicksort stops, we have roughly \(n/k\) unsorted groups of size \(k\). Within each group, elements can be arbitrarily ordered. However, elements from different groups are already in the correct relative order (all elements in one group are smaller than all elements in the next group).

When insertion sort runs on this array, each of the \(n\) elements needs to move at most \(k\) positions (within its group). An element from a group of size \(k\) makes at most \(k\) comparisons and swaps. With \(n\) elements:

\[O(nk)\]

Adding the two phases gives the total running time:

\[O(n \lg(n/k)) + O(nk) = O(nk + n \lg(n/k))\]

In theory, to minimize the total running time we want to minimize:

\[f(k) = nk + n \lg(n/k) = nk + n \lg n - n \lg k\]

Taking the derivative with respect to \(k\):

\[\frac{df}{dk} = n - \frac{n}{k \ln 2}\]

Setting to zero:

\[n - \frac{n}{k \ln 2} = 0 \quad \Rightarrow \quad k = \frac{1}{\ln 2} \approx 1.44\]

This suggests \(k = \Theta(1)\) (a small constant) is optimal. However, since \(k\) must be at least 1, and we’re dealing with discrete values, we should choose \(k\) to be a small constant like \(k = 10\) to \(k = 20\).

With \(k = \Theta(1)\):

\[O(n \cdot 1 + n \lg(n/1)) = O(n + n \lg n) = O(n \lg n)\]

In fact \(k\) does not have to be constant for the asymptotics to stay optimal. Any choice up to \(k = O(\lg n)\) keeps both terms within \(\Theta(n \lg n)\), since \(nk = O(n \lg n)\) and \(n \lg(n/k) = O(n \lg n)\). The hybrid algorithm therefore maintains quicksort’s asymptotic performance for any such \(k\) while improving the constant factors.

In practice, \(k\) should be chosen based on:

  1. The constant factors in quicksort vs. insertion sort on your specific hardware
  2. Cache effects (insertion sort has better cache locality for small arrays)
  3. Empirical testing with representative data

Typical values used in production implementations are \(k = 10\) to \(k = 50\). Too small and we don’t save enough quicksort overhead; too large and insertion sort becomes slow.