This problem examines three algorithms for searching for a value \(x\) in an unsorted array \(A\) consisting of \(n\) elements.

Consider the following randomized strategy: pick a random index \(i\) into \(A\). If \(A[i] = x\), then we terminate; otherwise, we continue the search by picking a new random index into \(A\). We continue picking random indices into \(A\) until we find an index \(j\) such that \(A[j] = x\) or until we have checked every element of \(A\). Note that we pick from the whole set of indices each time, so that we may examine a given element more than once.

  1. Write pseudocode for a procedure \(\textsc{Random-Search}\) to implement the strategy above. Be sure that your algorithm terminates when all indices into \(A\) have been picked.

  2. Suppose that there is exactly one index \(i\) such that \(A[i] = x\). What is the expected number of indices into \(A\) that we must pick before we find \(x\) and \(\textsc{Random-Search}\) terminates?

  3. Generalizing your solution to part (b), suppose that there are \(k \geq 1\) indices \(i\) such that \(A[i] = x\). What is the expected number of indices into \(A\) that we must pick before we find \(x\) and \(\textsc{Random-Search}\) terminates? Your answer should be a function of \(n\) and \(k\).

  4. Suppose that there are no indices \(i\) such that \(A[i] = x\). What is the expected number of indices into \(A\) that we must pick before we have checked all elements of \(A\) and \(\textsc{Random-Search}\) terminates?

Now consider a deterministic linear search algorithm, which we refer to as \(\textsc{Deterministic-Search}\). Specifically, the algorithm searches \(A\) for \(x\) in order, considering \(A[1], A[2], A[3], \ldots, A[n]\) until either it finds \(A[i] = x\) or it reaches the end of the array. Assume that all possible permutations of the input array are equally likely.

  1. Suppose that there is exactly one index \(i\) such that \(A[i] = x\). What is the average-case running time of \(\textsc{Deterministic-Search}\)? What is the worst-case running time of \(\textsc{Deterministic-Search}\)?
  2. Generalizing your solution to part (e), suppose that there are \(k \geq 1\) indices \(i\) such that \(A[i] = x\). What is the average-case running time of \(\textsc{Deterministic-Search}\)? What is the worst-case running time of \(\textsc{Deterministic-Search}\)? Your answer should be a function of \(n\) and \(k\).
  3. Suppose that there are no indices \(i\) such that \(A[i] = x\). What is the average-case running time of \(\textsc{Deterministic-Search}\)? What is the worst-case running time of \(\textsc{Deterministic-Search}\)?

Finally, consider a randomized algorithm \(\textsc{Scramble-Search}\) that works by first randomly permuting the input array and then running the deterministic linear search given above on the resulting permuted array.

  1. Letting \(k\) be the number of indices \(i\) such that \(A[i] = x\), give the worst-case and expected running times of \(\textsc{Scramble-Search}\) for the cases in which \(k = 0\) and \(k = 1\). Generalize your solution to handle the case in which \(k \geq 1\).
  2. Which of the three searching algorithms would you use? Explain your answer.

A

The key challenge is ensuring termination when all indices have been checked. We need to track which indices we’ve already examined. Think of it like randomly picking lottery balls from a bag, marking each one we’ve seen to know when we’ve checked them all.

$$\textsc{Random-Search}(A, x)$$$$\begin{aligned} 1& \quad \mathit{checked} = \text{new boolean array of size } n, \text{ initialized to } \textsc{false} \\ 2& \quad \mathit{count} = 0 \\ 3& \quad \textbf{while } \mathit{count} < n \\ 4& \quad \qquad i = \textsc{Random}(1, n) \\ 5& \quad \qquad \textbf{if not } \mathit{checked}[i] \\ 6& \quad \qquad \qquad \mathit{checked}[i] = \textsc{true} \\ 7& \quad \qquad \qquad \mathit{count} = \mathit{count} + 1 \\ 8& \quad \qquad \qquad \textbf{if } A[i] = x \\ 9& \quad \qquad \qquad \qquad \textbf{return } i \\ 10& \quad \textbf{return } \textsc{nil} \\ \end{aligned}$$

The checked array prevents infinite loops by tracking visited indices. When count reaches \(n\), we’ve examined all elements.
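The pseudocode translates directly into Python. This is a minimal sketch with 0-based indices; the name `random_search` and the convention of returning `None` in place of NIL are ours, not the text's.

```python
import random

def random_search(A, x):
    """Pick uniformly random indices until A[i] == x or every
    index has been examined at least once."""
    n = len(A)
    checked = [False] * n  # which indices we have already examined
    count = 0              # number of distinct indices examined so far
    while count < n:
        i = random.randrange(n)        # uniform pick; repeats allowed
        if not checked[i]:
            checked[i] = True
            count += 1
            if A[i] == x:
                return i
    return None                        # all n indices checked; x absent
```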

B

This is a classic application of geometric distribution. Each random pick has probability \(1/n\) of finding \(x\) (since exactly one element equals \(x\)). The picks are independent, so we’re essentially asking: how many independent trials until the first success?

The number of indices we must pick follows a geometric distribution with success probability \(p = 1/n\). The expected value of a geometric distribution is \(1/p\).

Therefore, the expected number of indices to pick is:

\[\mathrm{E}[\text{number of picks}] = \frac{1}{1/n} = n\]

This might seem surprising at first: even though we’re picking randomly, we expect to make \(n\) picks on average. This is because we might pick the same (wrong) indices multiple times before finally hitting the right one.

C

Now we have \(k\) elements equal to \(x\), so each random pick has probability \(k/n\) of success (finding any one of the \(k\) matches). Using the same geometric distribution reasoning:

\[\mathrm{E}[\text{number of picks}] = \frac{1}{k/n} = \frac{n}{k}\]

This makes intuitive sense: if half the elements are \(x\) (\(k = n/2\)), we expect to find one in just 2 picks. If all elements are \(x\) (\(k = n\)), we find one on the first pick.
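The \(n/k\) formula is easy to check by simulation. The helper below is our own illustration, not part of the solution: it places the \(k\) matches at the first \(k\) indices, so a pick succeeds exactly when `randrange(n) < k`.

```python
import random

def expected_picks(n, k, trials=20000):
    """Monte Carlo estimate of the expected number of picks
    (counting repeats) until a match, when k of n elements match."""
    total = 0
    for _ in range(trials):
        picks = 0
        while True:
            picks += 1
            if random.randrange(n) < k:  # success with probability k/n
                break
        total += picks
    return total / trials
```

For example, `expected_picks(10, 1)` comes out near 10 and `expected_picks(10, 5)` near 2, matching \(n/k\).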

D

When \(x\) is not in the array, we must examine all \(n\) distinct elements before terminating. This is the coupon collector’s problem: how many random draws (with replacement) until we’ve seen all \(n\) distinct items?

Let \(T\) be the number of picks until all \(n\) elements are checked. We can partition the process into stages:

  • Stage 1: Pick until we see the first distinct element (always exactly 1 pick)
  • Stage 2: Pick until we see a second distinct element
  • In general, stage \(i\): pick until we see the \(i\)th distinct element

After seeing \(i-1\) distinct elements, the probability of seeing a new one is \((n-i+1)/n\). Let \(T_i\) be the number of picks in stage \(i\). Then \(T_i\) follows a geometric distribution with success probability \((n-i+1)/n\):

\[\mathrm{E}[T_i] = \frac{n}{n-i+1}\]

The total expected number of picks is:

\[\begin{align*} \mathrm{E}[T] &= \sum_{i=1}^{n} \mathrm{E}[T_i] \\ &= \sum_{i=1}^{n} \frac{n}{n-i+1} \\ &= n \sum_{i=1}^{n} \frac{1}{i} \\ &= n H_n \\ &= n(\ln n + O(1)) \end{align*}\]

where \(H_n\) is the \(n\)th harmonic number. Thus, we expect approximately \(n \ln n\) picks.
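The stage-by-stage sum can be computed exactly with a few lines of Python (a helper of our own, mirroring the derivation above):

```python
def expected_picks_no_match(n):
    """Exact expected number of picks when x is absent:
    sum of E[T_i] = n / (n - i + 1) over stages i = 1..n,
    which equals n * H_n."""
    return sum(n / (n - i + 1) for i in range(1, n + 1))
```

For instance, `expected_picks_no_match(2)` returns `3.0`, i.e. \(2 H_2 = 3\).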

E

For deterministic linear search with exactly one match, we scan from left to right until we find \(x\).

Average-case: Since all permutations are equally likely, \(x\) is equally likely to be at any position \(1, 2, \ldots, n\). The average position is:

\[\frac{1 + 2 + \cdots + n}{n} = \frac{n(n+1)}{2n} = \frac{n+1}{2}\]

Average-case running time: \(\Theta((n+1)/2) = \Theta(n)\)

Worst-case: \(x\) is at position \(n\), requiring \(n\) comparisons.

Worst-case running time: \(\Theta(n)\)
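The \((n+1)/2\) average can be verified by exhaustively placing the single match at each of the \(n\) positions. Both helper names below are ours; comparisons are counted as the 1-based position of the match.

```python
def deterministic_search(A, x):
    """Linear scan; returns the first index with A[i] == x (0-based),
    or None if x is absent."""
    for i, a in enumerate(A):
        if a == x:
            return i
    return None

def average_comparisons(n):
    """Average comparison count over the n equally likely positions
    of a single match."""
    total = 0
    for pos in range(n):
        A = [0] * n
        A[pos] = 1                                # the single match
        total += deterministic_search(A, 1) + 1   # comparisons = pos + 1
    return total / n
```

`average_comparisons(5)` returns `3.0`, matching \((n+1)/2\) for \(n = 5\).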

F

With \(k \geq 1\) matches, we find \(x\) when we encounter the first match position.

Average-case: Since all permutations are equally likely, the positions of the \(k\) matching elements form a uniformly random \(k\)-subset of \(\{1, 2, \ldots, n\}\), and the search stops at the first of these positions. We therefore need the expected position of the first match.

The expected position of the first match is:

\[\frac{n+1}{k+1}\]

This follows from a symmetry argument: the \(k\) matching elements divide the \(n-k\) non-matching elements into \(k+1\) “gaps,” each of expected size \((n-k)/(k+1)\), so the first match sits at expected position \((n-k)/(k+1) + 1 = (n+1)/(k+1)\).

Average-case running time: \(\Theta((n+1)/(k+1)) = \Theta(n/k)\)

Worst-case: All \(k\) matches are at the end of the array, so the first match is at position \(n-k+1\) and the search makes \(n-k+1\) comparisons.

Worst-case running time: \(\Theta(n-k+1)\), which is \(\Theta(n)\) whenever \(k\) is small relative to \(n\).
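The \((n+1)/(k+1)\) formula can be verified by brute force over all \(\binom{n}{k}\) equally likely placements of the matches (an illustrative helper of our own, feasible only for small \(n\)):

```python
from itertools import combinations

def expected_first_match_position(n, k):
    """Average 1-based position of the first match over all
    equally likely k-subsets of match positions."""
    firsts = [min(c) + 1 for c in combinations(range(n), k)]
    return sum(firsts) / len(firsts)
```

`expected_first_match_position(5, 2)` returns `2.0`, in agreement with \((5+1)/(2+1) = 2\).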

G

When \(x\) is not in the array, we must scan all \(n\) elements.

Average-case: Always \(n\) comparisons (independent of permutation).

Average-case running time: \(\Theta(n)\)

Worst-case: Also \(n\) comparisons.

Worst-case running time: \(\Theta(n)\)

H

\(\textsc{Scramble-Search}\) first randomly permutes the array (\(\Theta(n)\) time using algorithms from Section 5.3), then runs deterministic linear search on the permuted array.

Case \(k = 0\) (no matches):

  • Expected running time: \(\Theta(n)\) for permutation \(+ \Theta(n)\) for search \(= \Theta(n)\)
  • Worst-case running time: \(\Theta(n)\) (same as expected, since search always takes \(n\) comparisons)

Case \(k \geq 1\) (at least one match):

  • Expected running time: After permutation, the expected position of the first match is \((n+1)/(k+1)\) by the analysis in part (f). Total: \(\Theta(n + (n+1)/(k+1)) = \Theta(n)\)
  • Worst-case running time: The permutation might place all matches at the end. Total: \(\Theta(n)\) for permutation \(+ \Theta(n)\) for search \(= \Theta(n)\)

For all cases, both expected and worst-case running times are \(\Theta(n)\).
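A minimal Python sketch of \(\textsc{Scramble-Search}\) (our own rendering, not the textbook's exact routine): it shuffles a copy rather than the input, so the returned index refers to the shuffled copy and only presence or absence is meaningful to the caller.

```python
import random

def scramble_search(A, x):
    """Randomly permute a copy of A, then run linear search on it."""
    B = list(A)
    random.shuffle(B)         # Fisher-Yates shuffle, Theta(n) time
    for i, b in enumerate(B):
        if b == x:
            return i          # index into the shuffled copy
    return None
```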

I

Recommendation: \(\textsc{Deterministic-Search}\) (standard linear search)

Here’s the reasoning:

\(\textsc{Random-Search}\) has three major drawbacks:

  1. When \(k\) matches exist, it requires \(\Theta(n/k)\) expected picks, which seems comparable to \(\textsc{Deterministic-Search}\)’s \(\Theta(n/k)\) expected comparisons. However, \(\textsc{Random-Search}\) must maintain the checked array and might examine the same element multiple times, adding overhead.
  2. When no matches exist, it requires \(\Theta(n \ln n)\) picks, asymptotically worse than the \(\Theta(n)\) of deterministic search!
  3. The constant factors are worse: generating random numbers and maintaining tracking data structures adds computational overhead.

\(\textsc{Scramble-Search}\) inherits costs from both approaches:

  1. It pays \(\Theta(n)\) upfront to permute the array
  2. Then it runs linear search, which takes expected time \(\Theta(n/k)\) if \(k \geq 1\), or \(\Theta(n)\) if \(k = 0\)
  3. Total expected time is \(\Theta(n)\) in all cases, but with higher constant factors than plain linear search

\(\textsc{Deterministic-Search}\) is the clear winner:

  1. Simple and cache-friendly (sequential memory access)
  2. No overhead from random number generation or extra data structures
  3. Expected time \(\Theta(n/k)\) when \(k \geq 1\), \(\Theta(n)\) when \(k = 0\)
  4. Worst-case time \(\Theta(n)\)—optimal for unsorted arrays
  5. In practice, modern CPUs optimize sequential scanning with prefetching