What is the probability that a \(k\)-string over a set of size \(n\) forms a \(k\)-permutation? How does this question relate to the birthday paradox?
A \(k\)-string over a set of size \(n\) is a sequence of \(k\) elements, where each element is chosen from the set \(\{1, 2, \ldots, n\}\) with replacement. A \(k\)-permutation means all \(k\) elements in the string are distinct (no repetitions).
Computing the probability
If we randomly generate a \(k\)-string where each position is chosen uniformly and independently from \(\{1, 2, \ldots, n\}\), what’s the probability that all \(k\) elements are different?
The total number of possible \(k\)-strings is \(n^k\) (each of \(k\) positions has \(n\) choices).
The number of \(k\)-strings with all distinct elements (\(k\)-permutations) is:
\[\frac{n!}{(n-k)!} = n \cdot (n-1) \cdot (n-2) \cdots (n-k+1)\]This is because:
- First position: \(n\) choices
- Second position: \(n-1\) choices (must differ from first)
- Third position: \(n-2\) choices (must differ from first two)
- \(k\)-th position: \(n-k+1\) choices
Therefore, the probability is:
\[\begin{align*} \Pr\{\text{k-permutation}\} &= \frac{n!/(n-k)!}{n^k} \\ &= \frac{n \cdot (n-1) \cdot (n-2) \cdots (n-k+1)}{n^k} \\ &= 1 \cdot \frac{n-1}{n} \cdot \frac{n-2}{n} \cdots \frac{n-k+1}{n} \\ &= \prod_{i=0}^{k-1} \left(1 - \frac{i}{n}\right) \end{align*}\]
Connection to the birthday paradox
This is exactly the same formula as the birthday paradox! In the birthday paradox:
- \(n = 365\) (days in a year)
- \(k\) is the number of people
- We want the probability that all \(k\) birthdays are distinct
The birthday paradox asks: what is \(\Pr\{B_k\}\) where \(B_k\) is the event that all \(k\) birthdays are different? The answer is precisely the \(k\)-permutation probability above.
Think of the birthday problem as randomly generating a \(k\)-string over the set of 365 days. Each person contributes one element to this string (their birthday). The \(k\)-string forms a \(k\)-permutation exactly when no two people share a birthday.
Numerical examples
For \(n = 365\) and various values of \(k\):
\(k = 10\):
\[\begin{align*} \Pr\{\text{10-permutation}\} &= \prod_{i=0}^{9} \left(1 - \frac{i}{365}\right) \\ &= 1 \times 0.9973 \times 0.9945 \times \cdots \times 0.9753 \\ &\approx 0.883 \end{align*}\]\(k = 23\):
\[\begin{align*} \Pr\{\text{23-permutation}\} &\approx 0.493 \end{align*}\]This is the famous birthday paradox result: with 23 people, there’s about a 50.7% chance that at least two share a birthday (i.e., the string is not a 23-permutation).
\(k = 50\):
\[\begin{align*} \Pr\{\text{50-permutation}\} &\approx 0.030 \end{align*}\]With 50 people, it’s very unlikely (only 3%) that all have different birthdays.
\(k = 100\):
\[\begin{align*} \Pr\{\text{100-permutation}\} &\approx 3 \times 10^{-7} \end{align*}\]Essentially impossible for all 100 to have different birthdays.
Upper bound approximation
Using the inequality \(1 - x \leq e^{-x}\):
\[\begin{align*} \Pr\{\text{k-permutation}\} &= \prod_{i=0}^{k-1} \left(1 - \frac{i}{n}\right) \\ &\leq \prod_{i=0}^{k-1} e^{-i/n} \\ &= e^{-\sum_{i=0}^{k-1} i/n} \\ &= e^{-(k-1)k/(2n)} \end{align*}\]This probability drops below \(1/2\) when:
\[\begin{align*} e^{-k(k-1)/(2n)} &\leq 1/2 \\ k(k-1) &\geq 2n \ln 2 \\ k &\approx \sqrt{2n \ln 2} \\ k &\approx 1.177\sqrt{n} \end{align*}\]For \(n = 365\): \(k \approx 22.5\), which matches the birthday paradox!