A probability distribution function \(P(x)\) for a random variable \(X\) is defined by \(P(x) = \Pr\{X \leq x\}\). Suppose that we draw a list of \(n\) random variables \(X_1, X_2, \ldots, X_n\) from a continuous probability distribution function \(P\) that is computable in \(O(1)\) time. Give an algorithm that sorts these numbers in linear average-case time.

The cumulative distribution function (CDF) \(P(x)\) tells us the probability that a random variable is at most \(x\). The key insight is that if \(X\) has CDF \(P(x)\), then \(P(X)\) is uniformly distributed on \([0, 1]\).

This is a fundamental result in probability theory: transforming a random variable by its own CDF produces a uniform distribution. We can exploit this to use bucket sort.

$$\textsc{CDF-Sort}(X, P, n)$$$$\begin{aligned} 1& \quad \text{let}\,B[0..n-1]\,\text{be}\,a\,\text{new}\,\text{array}\,\text{of}\,\text{empty}\,\text{lists} \\ 2& \quad \textbf{for }\,j\,=\,1\textbf{ to }\,n \\ 3& \quad \qquad p\,=\,\textsc{P}(X[j]) \\ 4& \quad \qquad \text{insert}\,X[j]\,\text{into}\,B[\textsc{floor}(n \cdot p)] \\ 5& \quad \textbf{for }\,i\,=\,0\textbf{ to }\,n\,-\,1 \\ 6& \quad \qquad \textsc{Insertion-Sort}(B[i]) \\ 7& \quad \text{concatenate}\,B[0]\,,\,\,\,B[1]\,,\,\,\,\dots,\,\,\,B[n-1]\,\text{in}\,\text{order} \\ \end{aligned}$$

This works because the transformation preserves order (if \(x < y\) then \(P(x) < P(y)\) since \(P\) is monotonically increasing), and \(P(X_i)\) is uniform on \([0, 1]\). This second fact is a standard probability result: intuitively, \(P(x)\) represents the “percentile” of \(x\) in the distribution, and percentiles are uniformly distributed. Since \(P(X_i) \in [0, 1]\) uniformly, using bucket index \(\lfloor n \cdot P(X_i) \rfloor\) distributes elements uniformly across \(n\) buckets.

Computing \(P(x_i)\) for all \(n\) elements takes \(\Theta(n)\) time. Distributing into buckets is \(\Theta(n)\). With uniform distribution, each bucket has \(\Theta(1)\) expected elements, so sorting all buckets takes \(n \times O(1) = O(n)\). Concatenating is \(\Theta(n)\). Total: \(\Theta(n)\) average-case time.

As an example, if \(P(x) = x^2\) for \(x \in [0, 1]\) (a valid CDF), then values drawn from this distribution cluster near 1. But \(P(X) = X^2\) is uniform, so bucket sort works well on the transformed values while maintaining the correct ordering.