This problem examines three algorithms for searching for a value \(x\) in an unsorted array \(A\) consisting of \(n\) elements.

Consider the following randomized strategy: pick a random index \(i\) into \(A\). If \(A[i] = x\), then we terminate; otherwise, we continue the search by picking a new random index into \(A\). We continue picking random indices into \(A\) until we find an index \(j\) such that \(A[j] = x\) or until we have checked every element of \(A\). Note that we pick from the whole set of indices each time, so that we may examine a given element more than once.

  1. Write pseudocode for a procedure \(\textsc{Random-Search}\) to implement the strategy above. Be sure that your algorithm terminates when all indices into \(A\) have been picked.

  2. Suppose that there is exactly one index \(i\) such that \(A[i] = x\). What is the expected number of indices into \(A\) that we must pick before we find \(x\) and \(\textsc{Random-Search}\) terminates?

  3. Generalizing your solution to part (b), suppose that there are \(k \geq 1\) indices \(i\) such that \(A[i] = x\). What is the expected number of indices into \(A\) that we must pick before we find \(x\) and \(\textsc{Random-Search}\) terminates? Your answer should be a function of \(n\) and \(k\).

  4. Suppose that there are no indices \(i\) such that \(A[i] = x\). What is the expected number of indices into \(A\) that we must pick before we have checked all elements of \(A\) and \(\textsc{Random-Search}\) terminates?

Now consider a deterministic linear search algorithm, which we refer to as \(\textsc{Deterministic-Search}\). Specifically, the algorithm searches \(A\) for \(x\) in order, considering \(A[1], A[2], A[3], \ldots, A[n]\) until either it finds \(A[i] = x\) or it reaches the end of the array. Assume that all possible permutations of the input array are equally likely.

  1. Suppose that there is exactly one index \(i\) such that \(A[i] = x\). What is the average-case running time of \(\textsc{Deterministic-Search}\)? What is the worst-case running time of \(\textsc{Deterministic-Search}\)?
  2. Generalizing your solution to part (e), suppose that there are \(k \geq 1\) indices \(i\) such that \(A[i] = x\). What is the average-case running time of \(\textsc{Deterministic-Search}\)? What is the worst-case running time of \(\textsc{Deterministic-Search}\)? Your answer should be a function of \(n\) and \(k\).
  3. Suppose that there are no indices \(i\) such that \(A[i] = x\). What is the average-case running time of \(\textsc{Deterministic-Search}\)? What is the worst-case running time of \(\textsc{Deterministic-Search}\)?

Finally, consider a randomized algorithm \(\textsc{Scramble-Search}\) that works by first randomly permuting the input array and then running the deterministic linear search given above on the resulting permuted array.

  1. Letting \(k\) be the number of indices \(i\) such that \(A[i] = x\), give the worst-case and expected running times of \(\textsc{Scramble-Search}\) for the cases in which \(k = 0\) and \(k = 1\). Generalize your solution to handle the case in which \(k \geq 1\).
  2. Which of the three searching algorithms would you use? Explain your answer.

A

The key challenge is ensuring termination when all indices have been checked. We need to track which indices we’ve already examined. Think of it like randomly picking lottery balls from a bag, marking each one we’ve seen to know when we’ve checked them all.

$$\textsc{Random-Search}(A, x)$$$$\begin{aligned} 1& \quad \mathit{checked} = \text{new boolean array of size } n, \text{ initialized to } \textsc{false} \\ 2& \quad \mathit{count} = 0 \\ 3& \quad \textbf{while } \mathit{count} < n \\ 4& \quad \qquad i = \textsc{Random}(1, n) \\ 5& \quad \qquad \textbf{if not } \mathit{checked}[i] \\ 6& \quad \qquad \qquad \mathit{checked}[i] = \textsc{true} \\ 7& \quad \qquad \qquad \mathit{count} = \mathit{count} + 1 \\ 8& \quad \qquad \qquad \textbf{if } A[i] = x \\ 9& \quad \qquad \qquad \qquad \textbf{return } i \\ 10& \quad \textbf{return } \textsc{nil} \\ \end{aligned}$$

The checked array prevents infinite loops by tracking visited indices. When count reaches \(n\), we’ve examined all elements.
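The pseudocode translates directly into Python. This is a minimal sketch with 0-based indices; the name `random_search` and the convention of returning `None` in place of NIL are ours, not the text's.

```python
import random

def random_search(A, x):
    """Pick uniformly random indices until A[i] == x or every
    index has been examined at least once."""
    n = len(A)
    checked = [False] * n  # which indices we have already examined
    count = 0              # number of distinct indices examined so far
    while count < n:
        i = random.randrange(n)        # uniform pick; repeats allowed
        if not checked[i]:
            checked[i] = True
            count += 1
            if A[i] == x:
                return i
    return None                        # all n indices checked; x absent
```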

B

This is a classic application of geometric distribution. Each random pick has probability \(1/n\) of finding \(x\) (since exactly one element equals \(x\)). The picks are independent, so we’re essentially asking: how many independent trials until the first success?

The number of indices we must pick follows a geometric distribution with success probability \(p = 1/n\). The expected value of a geometric distribution is \(1/p\).

Therefore, the expected number of indices to pick is:

\[\mathrm{E}[\text{number of picks}] = \frac{1}{1/n} = n\]

This might seem surprising at first: even though we’re picking randomly, we expect to make \(n\) picks on average. This is because we might pick the same (wrong) indices multiple times before finally hitting the right one.

C

Now we have \(k\) elements equal to \(x\), so each random pick has probability \(k/n\) of success (finding any one of the \(k\) matches). Using the same geometric distribution reasoning:

\[\mathrm{E}[\text{number of picks}] = \frac{1}{k/n} = \frac{n}{k}\]

This makes intuitive sense: if half the elements are \(x\) (\(k = n/2\)), we expect to find one in just 2 picks. If all elements are \(x\) (\(k = n\)), we find one on the first pick.
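The \(n/k\) formula is easy to check by simulation. The helper below is our own illustration, not part of the solution: it places the \(k\) matches at the first \(k\) indices, so a pick succeeds exactly when `randrange(n) < k`.

```python
import random

def expected_picks(n, k, trials=20000):
    """Monte Carlo estimate of the expected number of picks
    (counting repeats) until a match, when k of n elements match."""
    total = 0
    for _ in range(trials):
        picks = 0
        while True:
            picks += 1
            if random.randrange(n) < k:  # success with probability k/n
                break
        total += picks
    return total / trials
```

For example, `expected_picks(10, 1)` comes out near 10 and `expected_picks(10, 5)` near 2, matching \(n/k\).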

D

When \(x\) is not in the array, we must examine all \(n\) distinct elements before terminating. This is the coupon collector’s problem: how many random draws (with replacement) until we’ve seen all \(n\) distinct items?

Let \(T\) be the number of picks until all \(n\) elements are checked. We can partition the process into stages:

  • Stage 1: Pick until we see the first distinct element (always exactly 1 pick)
  • Stage 2: Pick until we see a second distinct element
  • In general, stage \(i\): pick until we see the \(i\)th distinct element

After seeing \(i-1\) distinct elements, the probability of seeing a new one is \((n-i+1)/n\). Let \(T_i\) be the number of picks in stage \(i\). Then \(T_i\) follows a geometric distribution with success probability \((n-i+1)/n\):

\[\mathrm{E}[T_i] = \frac{n}{n-i+1}\]

The total expected number of picks is:

\[\begin{align*} \mathrm{E}[T] &= \sum_{i=1}^{n} \mathrm{E}[T_i] \\ &= \sum_{i=1}^{n} \frac{n}{n-i+1} \\ &= n \sum_{i=1}^{n} \frac{1}{i} \\ &= n H_n \\ &= n(\ln n + O(1)) \end{align*}\]

where \(H_n\) is the \(n\)th harmonic number. Thus, we expect approximately \(n \ln n\) picks.
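The stage-by-stage sum can be computed exactly with a few lines of Python (a helper of our own, mirroring the derivation above):

```python
def expected_picks_no_match(n):
    """Exact expected number of picks when x is absent:
    sum of E[T_i] = n / (n - i + 1) over stages i = 1..n,
    which equals n * H_n."""
    return sum(n / (n - i + 1) for i in range(1, n + 1))
```

For instance, `expected_picks_no_match(2)` returns `3.0`, i.e. \(2 H_2 = 3\).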

E

For deterministic linear search with exactly one match, we scan from left to right until we find \(x\).

Average-case: Since all permutations are equally likely, \(x\) is equally likely to be at any position \(1, 2, \ldots, n\). The average position is:

\[\frac{1 + 2 + \cdots + n}{n} = \frac{n(n+1)}{2n} = \frac{n+1}{2}\]

Average-case running time: \(\Theta((n+1)/2) = \Theta(n)\)

Worst-case: \(x\) is at position \(n\), requiring \(n\) comparisons.

Worst-case running time: \(\Theta(n)\)
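The \((n+1)/2\) average can be verified by exhaustively placing the single match at each of the \(n\) positions. Both helper names below are ours; comparisons are counted as the 1-based position of the match.

```python
def deterministic_search(A, x):
    """Linear scan; returns the first index with A[i] == x (0-based),
    or None if x is absent."""
    for i, a in enumerate(A):
        if a == x:
            return i
    return None

def average_comparisons(n):
    """Average comparison count over the n equally likely positions
    of a single match."""
    total = 0
    for pos in range(n):
        A = [0] * n
        A[pos] = 1                                # the single match
        total += deterministic_search(A, 1) + 1   # comparisons = pos + 1
    return total / n
```

`average_comparisons(5)` returns `3.0`, matching \((n+1)/2\) for \(n = 5\).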

F

With \(k \geq 1\) matches, we find \(x\) when we encounter the first match position.

Average-case: Since all permutations are equally likely, the positions of the \(k\) matching elements form a uniformly random \(k\)-subset of \(\{1, 2, \ldots, n\}\), and the search stops at the first of these positions. We therefore need the expected position of the first match.

The expected position of the first match is:

\[\frac{n+1}{k+1}\]

This follows from a symmetry argument: the \(k\) matching elements divide the \(n-k\) non-matching elements into \(k+1\) “gaps,” each of expected size \((n-k)/(k+1)\), so the first match sits at expected position \((n-k)/(k+1) + 1 = (n+1)/(k+1)\).

Average-case running time: \(\Theta((n+1)/(k+1)) = \Theta(n/k)\)

Worst-case: All \(k\) matches are at the end of the array, so the first match is at position \(n-k+1\) and the search makes \(n-k+1\) comparisons.

Worst-case running time: \(\Theta(n-k+1)\), which is \(\Theta(n)\) whenever \(k\) is small relative to \(n\).
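The \((n+1)/(k+1)\) formula can be verified by brute force over all \(\binom{n}{k}\) equally likely placements of the matches (an illustrative helper of our own, feasible only for small \(n\)):

```python
from itertools import combinations

def expected_first_match_position(n, k):
    """Average 1-based position of the first match over all
    equally likely k-subsets of match positions."""
    firsts = [min(c) + 1 for c in combinations(range(n), k)]
    return sum(firsts) / len(firsts)
```

`expected_first_match_position(5, 2)` returns `2.0`, in agreement with \((5+1)/(2+1) = 2\).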

G

When \(x\) is not in the array, we must scan all \(n\) elements.

Average-case: Always \(n\) comparisons (independent of permutation).

Average-case running time: \(\Theta(n)\)

Worst-case: Also \(n\) comparisons.

Worst-case running time: \(\Theta(n)\)

H

\(\textsc{Scramble-Search}\) first randomly permutes the array (\(\Theta(n)\) time using algorithms from Section 5.3), then runs deterministic linear search on the permuted array.

Case \(k = 0\) (no matches):

  • Expected running time: \(\Theta(n)\) for permutation \(+ \Theta(n)\) for search \(= \Theta(n)\)
  • Worst-case running time: \(\Theta(n)\) (same as expected, since search always takes \(n\) comparisons)

Case \(k \geq 1\) (at least one match):

  • Expected running time: After permutation, the expected position of the first match is \((n+1)/(k+1)\) by the analysis in part (f). Total: \(\Theta(n + (n+1)/(k+1)) = \Theta(n)\)
  • Worst-case running time: The permutation might place all matches at the end. Total: \(\Theta(n)\) for permutation \(+ \Theta(n)\) for search \(= \Theta(n)\)

For all cases, both expected and worst-case running times are \(\Theta(n)\).
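A minimal Python sketch of \(\textsc{Scramble-Search}\) (our own rendering, not the textbook's exact routine): it shuffles a copy rather than the input, so the returned index refers to the shuffled copy and only presence or absence is meaningful to the caller.

```python
import random

def scramble_search(A, x):
    """Randomly permute a copy of A, then run linear search on it."""
    B = list(A)
    random.shuffle(B)         # Fisher-Yates shuffle, Theta(n) time
    for i, b in enumerate(B):
        if b == x:
            return i          # index into the shuffled copy
    return None
```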

I

Recommendation: \(\textsc{Deterministic-Search}\) (standard linear search)

Here’s the reasoning:

\(\textsc{Random-Search}\) has three major drawbacks:

  1. When \(k\) matches exist, it requires \(\Theta(n/k)\) expected picks, which seems comparable to \(\textsc{Deterministic-Search}\)’s \(\Theta(n/k)\) expected comparisons. However, \(\textsc{Random-Search}\) must maintain the checked array and might examine the same element multiple times, adding overhead.
  2. When no matches exist, it requires \(\Theta(n \ln n)\) picks, asymptotically worse than the \(\Theta(n)\) of deterministic search!
  3. The constant factors are worse: generating random numbers and maintaining tracking data structures adds computational overhead.

\(\textsc{Scramble-Search}\) inherits costs from both approaches:

  1. It pays \(\Theta(n)\) upfront to permute the array
  2. Then it runs linear search, which takes expected time \(\Theta(n/k)\) if \(k \geq 1\), or \(\Theta(n)\) if \(k = 0\)
  3. Total expected time is \(\Theta(n)\) in all cases, but with higher constant factors than plain linear search

\(\textsc{Deterministic-Search}\) is the clear winner:

  1. Simple and cache-friendly (sequential memory access)
  2. No overhead from random number generation or extra data structures
  3. Expected time \(\Theta(n/k)\) when \(k \geq 1\), \(\Theta(n)\) when \(k = 0\)
  4. Worst-case time \(\Theta(n)\)—optimal for unsorted arrays
  5. In practice, modern CPUs optimize sequential scanning with prefetching