Probabilistic lower bounds on comparison sorting

In this problem, we prove a probabilistic \(\Omega(n \lg n)\) lower bound on the running time of any deterministic or randomized comparison sort on \(n\) distinct input elements. We begin by examining a deterministic comparison sort \(A\) with decision tree \(T_A\). We assume that every permutation of \(A\)’s inputs is equally likely.

  1. Suppose that each leaf of \(T_A\) is labeled with the probability that it is reached given a random input. Prove that exactly \(n!\) leaves are labeled \(1/n!\) and that the rest are labeled 0.
  2. Let \(D(T)\) denote the external path length of a decision tree \(T\); that is, \(D(T)\) is the sum of the depths of all the leaves of \(T\). Let \(T\) be a decision tree with \(k > 1\) leaves, and let \(LT\) and \(RT\) be the left and right subtrees of \(T\). Show that \(D(T) = D(LT) + D(RT) + k\).
  3. Let \(d(k)\) be the minimum value of \(D(T)\) over all decision trees \(T\) with \(k > 1\) leaves. Show that \(d(k) = \min_{1 \leq i \leq k-1} \{d(i) + d(k-i) + k\}\).
  4. Prove that for a given value of \(k > 1\) and \(i\) in the range \(1 \leq i \leq k-1\), the function \(i \lg i + (k-i)\lg(k-i)\) is minimized at \(i = k/2\). Conclude that \(d(k) = \Omega(k \lg k)\).
  5. Prove that \(D(T_A) = \Omega(n! \lg(n!))\), and conclude that the average-case time to sort \(n\) elements is \(\Omega(n \lg n)\).
  6. Show that for any randomized comparison sort \(B\), there exists a deterministic comparison sort \(A\) whose expected number of comparisons is no more than those made by \(B\).

This problem builds the \(\Omega(n \lg n)\) average-case lower bound piece by piece. The strategy is to analyze the decision tree’s total path length, show that balanced trees minimize it, and then argue that even randomized algorithms cannot escape this bound.

A.

A correct sorting algorithm must produce a distinct output for each of the \(n!\) possible input permutations. Since each permutation is equally likely with probability \(1/n!\), and each permutation follows a unique root-to-leaf path in the decision tree, exactly \(n!\) leaves are reached with probability \(1/n!\). Any additional leaves in the tree (corresponding to impossible comparison outcomes) are never reached, so they are labeled 0.

B.

The external path length \(D(T)\) measures the total “work” across all leaves: it sums up how deep every leaf sits in the tree. Consider what happens when we go from the subtrees to the full tree. Every leaf in \(LT\) or \(RT\) gains one extra level of depth because of the root of \(T\), and there are \(k\) leaves total.

If a leaf sits at depth \(d\) in \(LT\), it sits at depth \(d + 1\) in \(T\). The same holds for \(RT\). Writing \(k_{LT}\) and \(k_{RT}\) for the number of leaves in each subtree:

\[\begin{align*} D(T) &= \sum_{\ell \in LT} (\text{depth}_{LT}(\ell) + 1) + \sum_{\ell \in RT} (\text{depth}_{RT}(\ell) + 1) \\ &= D(LT) + k_{LT} + D(RT) + k_{RT} \\ &= D(LT) + D(RT) + k \end{align*}\]

C.

This is the optimal substructure property. To build a decision tree with minimum external path length over \(k\) leaves, we choose how many leaves go in the left subtree (say \(i\)) and how many in the right (\(k - i\)). Each subtree should itself be optimal, giving path lengths \(d(i)\) and \(d(k-i)\). Applying part (b):

\[d(k) = \min_{1 \leq i \leq k-1} \{d(i) + d(k-i) + k\}\]

The minimum ranges over all possible splits because we want the best partition of leaves between left and right.

D.

The question is: what split of \(k\) leaves minimizes the total path length? Consider \(f(i) = i \lg i + (k-i)\lg(k-i)\). Taking the derivative:

\[\frac{df}{di} = \lg i + 1 - \lg(k-i) - 1 = \lg \frac{i}{k-i}\]

Setting this to zero gives \(i/(k-i) = 1\), so \(i = k/2\). The second derivative \(\frac{1}{i \ln 2} + \frac{1}{(k-i) \ln 2} > 0\) confirms this is a minimum.

This makes intuitive sense: just as a balanced binary search tree has minimum total path length, the decision tree that distributes leaves evenly between subtrees minimizes external path length.

We prove \(d(k) = \Omega(k \lg k)\) by induction. The base case \(d(1) = 0\) holds trivially. For \(k > 1\), using the optimal split at \(i = k/2\):

\[d(k) \geq 2d(k/2) + k\]

This recurrence is the same as merge sort’s, giving \(d(k) = \Omega(k \lg k)\).

E.

The decision tree \(T_A\) has at least \(n!\) reachable leaves (from part a), so \(D(T_A) \geq d(n!) = \Omega(n! \lg(n!))\).

The average number of comparisons equals the average leaf depth, weighted by probability. Since each of the \(n!\) reachable leaves has equal probability \(1/n!\), this average is simply \(D(T_A) / n!\):

\[\text{average comparisons} = \frac{D(T_A)}{n!} = \Omega\!\left(\frac{n! \lg(n!)}{n!}\right) = \Omega(\lg(n!)) = \Omega(n \lg n)\]

F.

A randomized sorting algorithm \(B\) can be viewed as a probability distribution over deterministic algorithms. Suppose \(B\) chooses deterministic algorithm \(A_i\) with probability \(p_i\). The expected number of comparisons for \(B\) on a random input is:

\[E_B = \sum_{i} p_i \cdot E[A_i]\]

This is a weighted average of the \(E[A_i]\) values. By the pigeonhole principle, at least one \(A_j\) must have \(E[A_j] \leq E_B\). This deterministic algorithm \(A_j\) performs at least as well as \(B\) in expectation.

Combined with part (e), no randomized comparison sort can beat the \(\Omega(n \lg n)\) average-case bound either.