Consider sorting \(n\) numbers stored in array \(A\) by first finding the smallest element of \(A\) and exchanging it with the element in \(A[1]\). Then find the second smallest element of \(A\), and exchange it with \(A[2]\). Continue in this manner for the first \(n-1\) elements of \(A\). Write pseudocode for this algorithm, which is known as selection sort. What loop invariant does this algorithm maintain? Why does it need to run for only the first \(n-1\) elements, rather than for all \(n\) elements? Give the best-case and worst-case running times of selection sort in \(\Theta\)-notation.

Pseudocode

$$\textsc {Selection-Sort }(A)$$$$\begin{aligned}1& \quad \textbf {for }i=1\textbf { to }A.length-1 \\2& \quad \qquad minIndex=i \\3& \quad \qquad \textbf {for }j=i+1\textbf { to }A.length \\4& \quad \qquad \qquad \textbf {if }A[j]<A[minIndex]\text { and }j\not=minIndex \\5& \quad \qquad \qquad \qquad minIndex=j \\6& \quad \qquad \text {swap }A[i]\text { with }A[minIndex] \\\end{aligned}$$

A python implementation of the above pseudocode is shared at end of the page, you can verify the workings for yourself.

Loop Invariant

At the start of the each iteration of the outer for loop of lines 1-6, the subarray \(A[1..i - 1]\) consists of \(i - 1\) smallest elements of \(A\), sorted in increasing order.

Why only first n - 1 elements

The algorithm needs to run for only the first \(n - 1\) elements, rather than for all \(n\) elements because the last iteration will compare \(A[n]\) with the minimum element in \(A[1 .. n - 1]\) in line 4 and swap them if necessary. So, there is no need to continue the algorithm for all the way to the last element.

Running Times

For both the best-case (sorted array) and worst-case (reverse sorted array), the algorithm will anyway take one element at a time and compare it with all the other elements. In other words, each of the \(n\) elements will be compared with rest of the \(n - 1\) elements. So, the running times for both scenario will be \(\Theta(n^2)\).

The above reasoning should be sufficient to understand or convey why the runtime would be \(\Theta(n^2)\). However, for the sake of completeness, an exhaustive mathematical proof is given below.

Runtime Analysis

Let’s assume the inner for loop in line 3-5 is executed for \(t_j\) times for \(j = 2, 3, \ldots, n\), where \(n = A.length\). Now note that, line 5 will be executed less than \(t_j - 1\) times in the average case, but it’ll still be of the order of \(n\).

For the sake of simplicity let’s assume the worst case, i.e. a reverse sorted array, when it’ll be executed exactly \(t_j - 1\) times. Note, this assumption is only for that particular line, which is not going to change our overall analysis, it will only make our calculation easier.

We can now calculate the cost and times for individual lines of the pseudocode as follows …

Line Cost Times
1 \(c_1\) \(n\)
2 \(c_2\) \(n - 1\)
3 \(c_3\) \(\sum_{j = 2}^n t_j\)
4 \(c_4\) \(\sum_{j = 2}^n (t_j - 1)\)
5 \(c_5\) \(\sum_{j = 2}^n (t_j - 1)\)
6 \(c_6\) \(n - 1\)

Now, for any arbitrary value of \(j\), the inner for loop (line 3-5) compares the previously computed minimum value with all elements in the subarray \(A[j..n]\). So the inner for loop executes \(n - j + 1\) times, i.e. \(t_j = (n - j + 1)\) for \(j = 2, 3, \ldots, n\). So, \(t_2 = n - 1, t_3 = n - 2, \ldots t_n = 1\). We can calculate the summations for line 3-5 as follows …

\[\begin{aligned} \sum_{j = 2}^n t_j & = (n - 1) + (n - 2) + \cdots + 1 \\ & = \frac {n(n - 1)} 2 \\ \\ \sum_{j = 2}^n (t_j - 1) & = \sum_{j = 2}^n t_j - \sum_{j = 2}^n 1 \\ & = \frac {n(n - 1)} 2 - (n - 1) \\ & = \frac {(n - 3)(n - 2)} 2 \end{aligned}\]

Therefore, we can calculate the running time as follows..

\[T(n) = c_1(n - 1) + (c_2 + c_6)n + c_3 \frac {n(n - 1)} 2 + (c_4 + c_5) \frac {(n - 1)(n - 2)} 2\]

For best-case scenario, as the array is already sorted, line #5 will not be executed ever. So, \(c_5 = 0\). Even with that (and without that for worst-case) the expression of \(T(n)\) will be reduced to the form \(an^2 + bn + c\), i.e. the algorithm will run at \(\Theta(n^2)\) time.

Python Code

# Selection Sort def SelectionSort(A): for i in range(len(A)): minIndex = i for j in range(i + 1, len(A)): if A[j] < A[minIndex] and j != minIndex: minIndex = j A[i], A[minIndex] = A[minIndex], A[i] # Test import random num_failed = 0 total_tests = 100 for i in range(total_tests): length = random.randint(2, 50) lst = [random.randint(0, 100) for _ in range(length)] SelectionSort(lst) # Check if list has been sorted for i in range(len(lst) - 1): if lst[i] > lst[i + 1]: num_failed += 1 print(f"Test #{i:<2}: List is not sorted") break if num_failed > 0: break if num_failed > 0: print(f"\nFailed") else: print(f"Passed {total_tests}/{total_tests} tests")