CLRS Solutions | Exercise 6.4-4

Show that the worst-case running time of \(\textsc{Heapsort}\) is \(\Omega(n \lg n)\).

We already know from the text that heapsort runs in \(O(n \lg n)\) time. To establish a tight bound of \(\Theta(n \lg n)\), we need to prove the matching lower bound showing that heapsort requires at least \(\Omega(n \lg n)\) time in the worst case.

The key insight is to focus on the sorting phase of the algorithm (the for loop in lines 2–5), since this phase dominates the running time. We’ll construct a specific input that forces heapsort to do a lot of work.

Consider what happens during the sorting phase. After building the initial max-heap, we perform \(n-1\) iterations. In each iteration \(i\) (where \(i\) goes from \(n\) down to 2), we:

Exchange \(A[1]\) with \(A[i]\)
Decrement the heap size
Call \(\textsc{Max-Heapify}(A, 1)\) to restore the heap property

The running time of \(\textsc{Max-Heapify}\) on a heap of size \(k\) is proportional to the height of the heap, which is \(\lfloor \lg k \rfloor\). More precisely, the time is \(\Omega(h)\) where \(h\) is the actual number of levels the element travels.

To establish a lower bound, we need to show that for some input, \(\textsc{Max-Heapify}\) must do \(\Omega(\lg k)\) work for many values of \(k\). This happens when the element placed at the root (the element that was previously at position \(A[i]\)) needs to travel all the way down to a leaf.

Consider an array where, after building the initial max-heap and during each iteration of the sorting loop, the element swapped to the root is smaller than all elements in the remaining heap. For example, if we carefully choose our input so that the elements at the end positions of the heap (which get swapped to the root) are always among the smallest elements, then each call to \(\textsc{Max-Heapify}\) must perform comparisons and swaps all the way down to a leaf.

Specifically, when we have a heap of size \(k\), if the root element must travel to a leaf, it makes \(\Theta(\lg k)\) comparisons and swaps. Summing over all iterations:

\[\begin{align*} T(n) &\geq \sum_{k=2}^{n} c \lg k \\ &= c \sum_{k=2}^{n} \lg k \\ &= c \lg(2 \cdot 3 \cdot 4 \cdots n) \\ &= c \lg(n!) \end{align*}\]

Using Stirling’s approximation, we know that:

\[\lg(n!) = \Theta(n \lg n)\]

More directly, we can observe that at least half of the terms in the sum are from \(k = n/2\) to \(k = n\), and each of these contributes at least \(\lg(n/2) = \lg n - 1\):

\[\begin{align*} \sum_{k=2}^{n} \lg k &\geq \sum_{k=n/2}^{n} \lg k \\ &\geq \sum_{k=n/2}^{n} (\lg n - 1) \\ &\geq \frac{n}{2}(\lg n - 1) \\ &= \Omega(n \lg n) \end{align*}\]

Therefore, the worst-case running time of heapsort is \(\Omega(n \lg n)\). Combined with the upper bound of \(O(n \lg n)\), we conclude that heapsort has worst-case running time \(\Theta(n \lg n)\).