Suppose we want to create a random sample of the set \(\{1, 2, 3, \ldots, n\}\), that is, an \(m\)-element subset \(S\), where \(0 \le m \le n\), such that each \(m\)-subset is equally likely to be created. One way would be to set \(A\)[\(i\)] = \(i\) for \(i = 1, 2, 3, \ldots, n\), call \(\textsc{Randomize-In-Place}\)(\(A\)), and then take just the first \(m\) array elements. This method would make \(n\) calls to the \(\textsc{Random}\) procedure. If \(n\) is much larger than \(m\), we can create a random sample with fewer calls to \(\textsc{Random}\). Show that the following recursive procedure returns a random \(m\)-subset \(S\) of \(\{1, 2, 3, \ldots, n\}\), in which each \(m\)-subset is equally likely, while making only \(m\) calls to \(\textsc{Random}\):
$$\textsc{Random-Sample}(m, n)$$$$\begin{aligned} 1& \quad \textbf{if }\,m\,=\,0 \\ 2& \quad \qquad \textbf{return }\,∅ \\ 3& \quad \textbf{else }\,S\,=\,\textsc{Random-Sample}(m - 1, \, n - 1) \\ 4& \quad \qquad i\,=\,\textsc{Random}(1, \, n) \\ 5& \quad \qquad \textbf{if }\,i\,∈\,S \\ 6& \quad \qquad \qquad S\,=\,S\,∪\,{n} \\ 7& \quad \qquad \textbf{else }\,S\,=\,S\,∪\,{i} \\ 8& \quad \qquad \textbf{return }\,S \\ \end{aligned}$$
We’ll prove by induction that \(\textsc{Random-Sample}\)(\(m\), \(n\)) produces a uniform random \(m\)-subset of \(\{1, 2, \ldots, n\}\), making exactly \(m\) calls to \(\textsc{Random}\).
We prove that each of the \(\binom{n}{m}\) possible \(m\)-subsets of \(\{1, 2, \ldots, n\}\) has probability exactly \(1/\binom{n}{m}\) of being generated.
Base Case. When \(m = 0\), there is exactly one \(0\)-subset: the empty set \(\emptyset\). The algorithm returns \(\emptyset\) with probability 1, which equals \(1/\binom{n}{0} = 1/1 = 1\). The base case holds, and we make 0 calls to \(\textsc{Random}\).
Inductive Hypothesis. Assume that \(\textsc{Random-Sample}\)(\(m - 1\), \(n - 1\)) correctly produces each \((m-1)\)-subset of \(\{1, 2, \ldots, n-1\}\) with probability \(1/\binom{n-1}{m-1}\), making \(m - 1\) calls to \(\textsc{Random}\).
Inductive Step. We need to show that \(\textsc{Random-Sample}\)(\(m\), \(n\)) produces each \(m\)-subset of \(\{1, 2, \ldots, n\}\) with probability \(1/\binom{n}{m}\).
Consider an arbitrary \(m\)-subset \(T \subseteq \{1, 2, \ldots, n\}\). We analyze two cases based on whether \(n \in T\).
Case 1: \(n \in T\). If \(n\) is in the target subset \(T\), then \(T = T' \cup \{n\}\) where \(T' = T \setminus \{n\}\) is an \((m-1)\)-subset of \(\{1, 2, \ldots, n-1\}\).
For the algorithm to produce \(T\):
- Line 3 must produce \(S = T'\) (probability \(1/\binom{n-1}{m-1}\) by the inductive hypothesis)
- Line 4 must choose \(i \in T'\) (probability \((m-1)/n\)), causing line 6 to add \(n\) to \(S\)
The probability is:
\[\Pr\{\text{generate } T\} = \frac{1}{\binom{n-1}{m-1}} \cdot \frac{m-1}{n}\]We can verify this equals \(1/\binom{n}{m}\):
\[\begin{align*} \frac{1}{\binom{n-1}{m-1}} \cdot \frac{m-1}{n} &= \frac{1}{\frac{(n-1)!}{(m-1)!(n-m)!}} \cdot \frac{m-1}{n} \\ &= \frac{(m-1)!(n-m)!}{(n-1)!} \cdot \frac{m-1}{n} \\ &= \frac{(m-1)! \cdot (m-1) \cdot (n-m)!}{n!} \\ &= \frac{m! \cdot (n-m)!}{n!} \\ &= \frac{1}{\binom{n}{m}} \end{align*}\]
Case 2: \(n \notin T\). If \(n\) is not in the target subset \(T\), then \(T\) is an \(m\)-subset of \(\{1, 2, \ldots, n-1\}\).
For the algorithm to produce \(T\):
- Line 3 must produce \(S = T'\) where \(T'\) is some \((m-1)\)-subset of \(\{1, 2, \ldots, n-1\}\) with \(T' \subset T\)
- Line 4 must choose \(i \in T \setminus T'\) (the unique element in \(T\) but not in \(T'\)), and \(i \notin S\), causing line 7 to add \(i\) to \(S\)
There are \(m\) ways to choose which element of \(T\) is added last (i.e., \(m\) choices for \(T' \subset T\)). For each choice:
- Probability that line 3 produces that specific \(T'\): \(1/\binom{n-1}{m-1}\)
- Probability that line 4 chooses the right \(i\): \(1/n\)
- Probability that \(i \notin S\): 1 (since \(i \in T\) and \(T \subseteq \{1, \ldots, n-1\}\) means \(i < n\), and \(S = T' \subset \{1, \ldots, n-1\}\) with \(i \notin T'\))
The total probability is:
\[\begin{align*} \Pr\{\text{generate } T\} &= m \cdot \frac{1}{\binom{n-1}{m-1}} \cdot \frac{1}{n} \\ &= \frac{m}{n} \cdot \frac{(m-1)!(n-m)!}{(n-1)!} \\ &= \frac{m!(n-m)!}{n!} \\ &= \frac{1}{\binom{n}{m}} \end{align*}\]Both cases give probability \(1/\binom{n}{m}\), completing the induction.
The algorithm makes exactly one call to \(\textsc{Random}\) at line 4 each time it recurses, plus the \(m - 1\) calls made by the recursive call. This gives \(1 + (m - 1) = m\) total calls.
This is a significant improvement over the \(n\) calls required by \(\textsc{Randomize-In-Place}\) when \(m \ll n\).
For example, consider \(\textsc{Random-Sample}\)(2, 4). We want a 2-subset of \(\{1, 2, 3, 4\}\). There are \(\binom{4}{2} = 6\) possibilities: \(\{1,2\}, \{1,3\}, \{1,4\}, \{2,3\}, \{2,4\}, \{3,4\}\).
The recursion tree shows all paths lead to each subset with probability \(1/6\) after exactly 2 random calls.