CLRS Solutions | Exercise 5.3-7 | Probabilistic Analysis and Randomized Algorithms

Suppose we want to create a random sample of the set $\{1, 2, 3, \ldots, n\}$, that is, an $m$-element subset $S$, where $0 \le m \le n$, such that each $m$-subset is equally likely to be created. One way would be to set $A$[$i$] = $i$ for $i = 1, 2, 3, \ldots, n$, call $\textsc{Randomize-In-Place}$($A$), and then take just the first $m$ array elements. This method would make $n$ calls to the $\textsc{Random}$ procedure. If $n$ is much larger than $m$, we can create a random sample with fewer calls to $\textsc{Random}$. Show that the following recursive procedure returns a random $m$-subset $S$ of $\{1, 2, 3, \ldots, n\}$, in which each $m$-subset is equally likely, while making only $m$ calls to $\textsc{Random}$:

$$\textsc{Random-Sample}(m, n)$$$$\begin{aligned} 1& \quad \textbf{if }\,m\,=\,0 \\ 2& \quad \qquad \textbf{return }\,∅ \\ 3& \quad \textbf{else }\,S\,=\,\textsc{Random-Sample}(m - 1, \, n - 1) \\ 4& \quad \qquad i\,=\,\textsc{Random}(1, \, n) \\ 5& \quad \qquad \textbf{if }\,i\,∈\,S \\ 6& \quad \qquad \qquad S\,=\,S\,∪\,{n} \\ 7& \quad \qquad \textbf{else }\,S\,=\,S\,∪\,{i} \\ 8& \quad \qquad \textbf{return }\,S \\ \end{aligned}$$

We’ll prove by induction that $\textsc{Random-Sample}$($m$, $n$) produces a uniform random $m$-subset of $\{1, 2, \ldots, n\}$, making exactly $m$ calls to $\textsc{Random}$.

We prove that each of the $\binom{n}{m}$ possible $m$-subsets of $\{1, 2, \ldots, n\}$ has probability exactly $1/\binom{n}{m}$ of being generated.

Base Case. When $m = 0$, there is exactly one $0$-subset: the empty set $\emptyset$. The algorithm returns $\emptyset$ with probability 1, which equals $1/\binom{n}{0} = 1/1 = 1$. The base case holds, and we make 0 calls to $\textsc{Random}$.

Inductive Hypothesis. Assume that $\textsc{Random-Sample}$($m - 1$, $n - 1$) correctly produces each $(m-1)$-subset of $\{1, 2, \ldots, n-1\}$ with probability $1/\binom{n-1}{m-1}$, making $m - 1$ calls to $\textsc{Random}$.

Inductive Step. We need to show that $\textsc{Random-Sample}$($m$, $n$) produces each $m$-subset of $\{1, 2, \ldots, n\}$ with probability $1/\binom{n}{m}$.

Consider an arbitrary $m$-subset $T \subseteq \{1, 2, \ldots, n\}$. We analyze two cases based on whether $n \in T$.

Case 1: $n \in T$. If $n$ is in the target subset $T$, then $T = T' \cup \{n\}$ where $T' = T \setminus \{n\}$ is an $(m-1)$-subset of $\{1, 2, \ldots, n-1\}$.

For the algorithm to produce $T$:

Line 3 must produce $S = T'$ (probability $1/\binom{n-1}{m-1}$ by the inductive hypothesis)
Line 4 must choose $i \in T'$ (probability $(m-1)/n$), causing line 6 to add $n$ to $S$

The probability is:

\[\Pr\{\text{generate } T\} = \frac{1}{\binom{n-1}{m-1}} \cdot \frac{m-1}{n}\]

We can verify this equals $1/\binom{n}{m}$:

\[\begin{align*} \frac{1}{\binom{n-1}{m-1}} \cdot \frac{m-1}{n} &= \frac{1}{\frac{(n-1)!}{(m-1)!(n-m)!}} \cdot \frac{m-1}{n} \\ &= \frac{(m-1)!(n-m)!}{(n-1)!} \cdot \frac{m-1}{n} \\ &= \frac{(m-1)! \cdot (m-1) \cdot (n-m)!}{n!} \\ &= \frac{m! \cdot (n-m)!}{n!} \\ &= \frac{1}{\binom{n}{m}} \end{align*}\]

Case 2: $n \notin T$. If $n$ is not in the target subset $T$, then $T$ is an $m$-subset of $\{1, 2, \ldots, n-1\}$.

For the algorithm to produce $T$:

Line 3 must produce $S = T'$ where $T'$ is some $(m-1)$-subset of $\{1, 2, \ldots, n-1\}$ with $T' \subset T$
Line 4 must choose $i \in T \setminus T'$ (the unique element in $T$ but not in $T'$), and $i \notin S$, causing line 7 to add $i$ to $S$

There are $m$ ways to choose which element of $T$ is added last (i.e., $m$ choices for $T' \subset T$). For each choice:

Probability that line 3 produces that specific $T'$: $1/\binom{n-1}{m-1}$
Probability that line 4 chooses the right $i$: $1/n$
Probability that $i \notin S$: 1 (since $i \in T$ and $T \subseteq \{1, \ldots, n-1\}$ means $i < n$, and $S = T' \subset \{1, \ldots, n-1\}$ with $i \notin T'$)

The total probability is:

\[\begin{align*} \Pr\{\text{generate } T\} &= m \cdot \frac{1}{\binom{n-1}{m-1}} \cdot \frac{1}{n} \\ &= \frac{m}{n} \cdot \frac{(m-1)!(n-m)!}{(n-1)!} \\ &= \frac{m!(n-m)!}{n!} \\ &= \frac{1}{\binom{n}{m}} \end{align*}\]

Both cases give probability $1/\binom{n}{m}$, completing the induction.

The algorithm makes exactly one call to $\textsc{Random}$ at line 4 each time it recurses, plus the $m - 1$ calls made by the recursive call. This gives $1 + (m - 1) = m$ total calls.

This is a significant improvement over the $n$ calls required by $\textsc{Randomize-In-Place}$ when $m \ll n$.

For example, consider $\textsc{Random-Sample}$(2, 4). We want a 2-subset of $\{1, 2, 3, 4\}$. There are $\binom{4}{2} = 6$ possibilities: $\{1,2\}, \{1,3\}, \{1,4\}, \{2,3\}, \{2,4\}, \{3,4\}$.

The recursion tree shows all paths lead to each subset with probability $1/6$ after exactly 2 random calls.