Alternative quicksort analysis
An alternative analysis of the running time of randomized quicksort focuses on the expected running time of each individual recursive call to \(\textsc{Randomized-Quicksort}\), rather than on the number of comparisons performed.
- Argue that, given an array of size \(n\), the probability that any particular element is chosen as the pivot is \(1/n\). Use this to define indicator random variables \(X_i = I\{i\)th smallest element is chosen as the pivot\(\}\). What is \(\mathbb{E}[X_i]\)?
- Let \(T(n)\) be a random variable denoting the running time of quicksort on an array of size \(n\). Argue that \(\mathbb{E}[T(n)] = \mathbb{E}\left[\sum_{q=1}^{n} X_q(T(q-1) + T(n-q) + \Theta(n))\right]\).
- Show that we can rewrite equation (7.5) as \(\mathbb{E}[T(n)] = \frac{2}{n}\sum_{q=2}^{n-1} \mathbb{E}[T(q)] + \Theta(n)\).
- Show that \(\sum_{k=2}^{n-1} k \lg k \leq \frac{1}{2}n^2 \lg n - \frac{1}{8}n^2\). (Hint: Split the summation into two parts, one for \(k = 2, 3, \ldots, \lceil n/2 \rceil - 1\) and one for \(k = \lceil n/2 \rceil, \ldots, n-1\).)
- Using the bound from equation (7.7), show that the recurrence in equation (7.6) has the solution \(\mathbb{E}[T(n)] = \Theta(n \lg n)\). (Hint: Show, by substitution, that \(\mathbb{E}[T(n)] \leq an \lg n\) for sufficiently large \(n\) and for some positive constant \(a\).)
This alternative analysis provides another way to prove quicksort’s \(\Theta(n \lg n)\) expected running time by analyzing each recursive call independently.
A.
In \(\textsc{Randomized-Partition}\), the pivot is chosen uniformly at random from the \(n\) elements in the subarray. Each element has equal probability of being selected:
\[\Pr[\text{element } i \text{ is chosen as pivot}] = \frac{1}{n}\]Define the indicator random variable:
\[X_i = I\{i\text{th smallest element is chosen as pivot}\}\]By the definition of indicator random variables and linearity of expectation:
\[\mathbb{E}[X_i] = \Pr[X_i = 1] = \frac{1}{n}\]B.
The running time \(T(n)\) depends on which element is chosen as the pivot. If the \(q\)-th smallest element is chosen (indicated by \(X_q = 1\)), the partition creates subarrays of sizes \(q-1\) and \(n-q\), plus \(\Theta(n)\) time for the partition itself.
Since exactly one \(X_q\) equals 1 for any execution, the running time is:
\[T(n) = \sum_{q=1}^{n} X_q \left(T(q-1) + T(n-q) + \Theta(n)\right)\]Taking expectations:
\[\begin{align*} \mathbb{E}[T(n)] &= \mathbb{E}\left[\sum_{q=1}^{n} X_q(T(q-1) + T(n-q) + \Theta(n))\right] \end{align*}\]This expresses the expected running time as a sum over all possible pivot choices, weighted by their probabilities.
C.
The pivot choice at this level is made independently of the random choices inside the recursive calls, so \(X_q\) is independent of \(T(q-1)\) and \(T(n-q)\). Using linearity of expectation together with this independence:
\[\mathbb{E}[T(n)] = \sum_{q=1}^{n} \mathbb{E}[X_q] \cdot \mathbb{E}[T(q-1) + T(n-q) + \Theta(n)]\]Since \(\mathbb{E}[X_q] = 1/n\):
\[\mathbb{E}[T(n)] = \frac{1}{n} \sum_{q=1}^{n} (\mathbb{E}[T(q-1)] + \mathbb{E}[T(n-q)] + \Theta(n))\]Breaking this apart:
\[\begin{align*} \mathbb{E}[T(n)] &= \frac{1}{n} \sum_{q=1}^{n} \mathbb{E}[T(q-1)] + \frac{1}{n} \sum_{q=1}^{n} \mathbb{E}[T(n-q)] + \frac{1}{n} \sum_{q=1}^{n} \Theta(n) \\ &= \frac{1}{n} \sum_{q=0}^{n-1} \mathbb{E}[T(q)] + \frac{1}{n} \sum_{q=0}^{n-1} \mathbb{E}[T(q)] + \Theta(n) \end{align*}\](The second sum uses the substitution \(k = n-q\), giving the same range.)
Since \(\mathbb{E}[T(0)] = \Theta(1)\) and \(\mathbb{E}[T(1)] = \Theta(1)\), we can write:
\[\mathbb{E}[T(n)] = \frac{2}{n} \sum_{q=2}^{n-1} \mathbb{E}[T(q)] + \Theta(n)\]D.
We need to show: \(\sum_{k=2}^{n-1} k \lg k \leq \frac{1}{2}n^2 \lg n - \frac{1}{8}n^2\)
Split the sum at \(\lceil n/2 \rceil\):
\[\sum_{k=2}^{n-1} k \lg k = \sum_{k=2}^{\lceil n/2 \rceil - 1} k \lg k + \sum_{k=\lceil n/2 \rceil}^{n-1} k \lg k\]For the first sum, use \(\lg k \leq \lg(n/2)\) for \(k \leq n/2\):
\[\sum_{k=2}^{\lceil n/2 \rceil - 1} k \lg k \leq \sum_{k=2}^{n/2} k \cdot \lg(n/2) \leq \frac{(n/2)^2}{2} \cdot (\lg n - 1) = \frac{n^2}{8}(\lg n - 1)\]For the second sum, use \(\lg k \leq \lg n\) for all \(k \leq n\):
\[\sum_{k=n/2}^{n-1} k \lg k \leq \sum_{k=n/2}^{n} k \cdot \lg n \leq \frac{n^2}{2} \cdot \lg n - \frac{(n/2)^2}{2} \cdot \lg n = \frac{3n^2}{8} \lg n\]Adding these bounds:
\[\sum_{k=2}^{n-1} k \lg k \leq \frac{n^2}{8}(\lg n - 1) + \frac{3n^2}{8}\lg n = \frac{n^2 \lg n}{2} - \frac{n^2}{8}\]E.
We’ll prove \(\mathbb{E}[T(n)] \leq an \lg n\) for sufficiently large \(n\) and constant \(a > 0\).
Assume \(\mathbb{E}[T(k)] \leq ak \lg k\) for all \(k < n\). Then:
\[\begin{align*} \mathbb{E}[T(n)] &\leq \frac{2}{n} \sum_{q=2}^{n-1} aq \lg q + cn \\ &= \frac{2a}{n} \sum_{q=2}^{n-1} q \lg q + cn \\ &\leq \frac{2a}{n}\left(\frac{1}{2}n^2 \lg n - \frac{1}{8}n^2\right) + cn \\ &= an \lg n - \frac{an}{4} + cn \\ &\leq an \lg n \end{align*}\]The last inequality holds when \(c \leq a/4\), i.e., when \(a \geq 4c\). Therefore, \(\mathbb{E}[T(n)] = O(n \lg n)\).
Combined with the \(\Omega(n \lg n)\) lower bound (from comparison-based sorting), we have \(\mathbb{E}[T(n)] = \Theta(n \lg n)\).