CLRS Solutions | Exercise 8.4-5 ⋆

A probability distribution function $P(x)$ for a random variable $X$ is defined by $P(x) = \Pr\{X \leq x\}$. Suppose that we draw a list of $n$ random variables $X_1, X_2, \ldots, X_n$ from a continuous probability distribution function $P$ that is computable in $O(1)$ time. Give an algorithm that sorts these numbers in linear average-case time.

The cumulative distribution function (CDF) $P(x)$ tells us the probability that a random variable is at most $x$. The key insight is that if $X$ has CDF $P(x)$, then $P(X)$ is uniformly distributed on $[0, 1]$.

This is a fundamental result in probability theory: transforming a random variable by its own CDF produces a uniform distribution. We can exploit this to use bucket sort.

$$\textsc{CDF-Sort}(X, P, n)$$$$\begin{aligned} 1& \quad \text{let}\,B[0..n-1]\,\text{be}\,a\,\text{new}\,\text{array}\,\text{of}\,\text{empty}\,\text{lists} \\ 2& \quad \textbf{for }\,j\,=\,1\textbf{ to }\,n \\ 3& \quad \qquad p\,=\,\textsc{P}(X[j]) \\ 4& \quad \qquad \text{insert}\,X[j]\,\text{into}\,B[\textsc{floor}(n \cdot p)] \\ 5& \quad \textbf{for }\,i\,=\,0\textbf{ to }\,n\,-\,1 \\ 6& \quad \qquad \textsc{Insertion-Sort}(B[i]) \\ 7& \quad \text{concatenate}\,B[0]\,,\,\,\,B[1]\,,\,\,\,\dots,\,\,\,B[n-1]\,\text{in}\,\text{order} \\ \end{aligned}$$

This works because the transformation preserves order (if $x < y$ then $P(x) < P(y)$ since $P$ is monotonically increasing), and $P(X_i)$ is uniform on $[0, 1]$. This second fact is a standard probability result: intuitively, $P(x)$ represents the “percentile” of $x$ in the distribution, and percentiles are uniformly distributed. Since $P(X_i) \in [0, 1]$ uniformly, using bucket index $\lfloor n \cdot P(X_i) \rfloor$ distributes elements uniformly across $n$ buckets.

Computing $P(x_i)$ for all $n$ elements takes $\Theta(n)$ time. Distributing into buckets is $\Theta(n)$. With uniform distribution, each bucket has $\Theta(1)$ expected elements, so sorting all buckets takes $n \times O(1) = O(n)$. Concatenating is $\Theta(n)$. Total: $\Theta(n)$ average-case time.

As an example, if $P(x) = x^2$ for $x \in [0, 1]$ (a valid CDF), then values drawn from this distribution cluster near 1. But $P(X) = X^2$ is uniform, so bucket sort works well on the transformed values while maintaining the correct ordering.