The 0-1 sorting lemma and columnsort
A compare-exchange operation on two array elements \(A[i]\) and \(A[j]\), where \(i < j\), has the form:
$$\textsc{Compare-Exchange}(A, i, j)$$$$\begin{aligned} 1& \quad \textbf{if }\,A[i]\,>\,A[j] \\ 2& \quad \qquad \text{exchange}\,A[i]\,\text{with}\,A[j] \\ \end{aligned}$$After the compare-exchange operation, we know that \(A[i] \leq A[j]\).
An oblivious compare-exchange algorithm operates solely by a sequence of prespecified compare-exchange operations. The indices of the positions compared in the sequence must be determined in advance, and although they can depend on the number of elements being sorted, they cannot depend on the values being sorted, nor can they depend on the result of any prior compare-exchange operation.
The 0-1 sorting lemma provides a powerful way to prove that an oblivious compare-exchange algorithm produces a sorted result. It states that if an oblivious compare-exchange algorithm correctly sorts all input sequences consisting of only 0s and 1s, then it correctly sorts all inputs containing arbitrary values.
You will prove the 0-1 sorting lemma by proving its contrapositive: if an oblivious compare-exchange algorithm fails to sort an input containing arbitrary values, then it fails to sort some 0-1 input. Assume that an oblivious compare-exchange algorithm X fails to correctly sort the array \(A[1 \mathinner{.\,.} n]\). Let \(A[p]\) be the smallest value in \(A\) that algorithm X puts into the wrong location, and let \(A[q]\) be the value that algorithm X moves to the location into which \(A[p]\) should have gone. Define an array \(B[1 \mathinner{.\,.} n]\) of 0s and 1s as follows:
\[B[i] = \begin{cases} 0 & \text{if } A[i] \leq A[p] \\ 1 & \text{if } A[i] > A[p] \end{cases}\]
- Argue that \(A[q] > A[p]\), so that \(B[p] = 0\) and \(B[q] = 1\).
- To complete the proof of the 0-1 sorting lemma, prove that algorithm X fails to sort array \(B\) correctly.
Now you will use the 0-1 sorting lemma to prove that a particular sorting algorithm, columnsort, works correctly. The algorithm works on a rectangular array of \(n\) elements with \(r\) rows and \(s\) columns (so that \(n = rs\)), subject to three restrictions: \(r\) must be even, \(s\) must be a divisor of \(r\), and \(r \geq 2s^2\). Columnsort operates in eight steps, regardless of the value of \(n\). The odd steps are all the same: sort each column individually. Each even step is a fixed permutation. Here are the steps:
- Sort each column.
- Transpose the array, but reshape it back to \(r\) rows and \(s\) columns. In other words, turn the leftmost column into the top \(r/s\) rows, in order; turn the next column into the next \(r/s\) rows, in order; and so on.
- Sort each column.
- Perform the inverse of the permutation performed in step 2.
- Sort each column.
- Shift the top half of each column into the bottom half of the same column, and shift the bottom half of each column into the top half of the next column to the right. Leave the top half of the leftmost column empty. Shift the bottom half of the last column into the top half of a new rightmost column, and leave the bottom half of this new column empty.
- Sort each column.
Perform the inverse of the permutation performed in step 6.
- Argue that we can treat columnsort as an oblivious compare-exchange algorithm, even if we do not know what sorting method the odd steps use.
Although it might seem hard to believe that columnsort actually sorts, you will use the 0-1 sorting lemma to prove that it does. A couple of definitions will help you apply the 0-1 sorting lemma. We say that an area of an array is clean if we know that it contains either all 0s or all 1s. Otherwise, the area might contain mixed 0s and 1s, and it is dirty. From here on, assume that the input array contains only 0s and 1s, and that we can treat it as an array with \(r\) rows and \(s\) columns.
- Prove that after steps 1-3, the array consists of some clean rows of 0s at the top, some clean rows of 1s at the bottom, and at most \(s\) dirty rows between them.
- Prove that after step 4, the array, read in column-major order, starts with a clean area of 0s, ends with a clean area of 1s, and has a dirty area of at most \(s^2\) elements in the middle.
- Prove that steps 5-8 produce a fully sorted 0-1 output. Conclude that columnsort correctly sorts all inputs containing arbitrary values.
- Now suppose that \(s\) does not divide \(r\). Prove that after steps 1-3, the array consists of some clean rows of 0s at the top, some clean rows of 1s at the bottom, and at most \(2s - 1\) dirty rows between them. How large must \(r\) be, compared with \(s\), for columnsort to correctly sort when \(s\) does not divide \(r\)?
- Suggest a simple change to step 1 that allows us to maintain the requirement that \(r \geq 2s^2\) even when \(s\) does not divide \(r\), and prove that with your change, columnsort correctly sorts.
A. Proof that \(A[q] > A[p]\)
Since \(A[p]\) is the smallest value that algorithm X places incorrectly, all values smaller than \(A[p]\) end up in their correct positions. The correct sorted position for \(A[p]\) is occupied instead by \(A[q]\). Since \(A[p]\) is the smallest misplaced value, \(A[q]\) cannot be smaller than \(A[p]\) (it would then be misplaced but smaller, contradicting our choice of \(A[p]\)). And \(A[q] \neq A[p]\) because \(A[q]\) is in \(A[p]\)’s correct position while \(A[p]\) is elsewhere. Therefore \(A[q] > A[p]\).
From the definition of \(B\): since \(A[p] \leq A[p]\), we have \(B[p] = 0\). Since \(A[q] > A[p]\), we have \(B[q] = 1\).
B. Proof that algorithm X fails to sort array \(B\)
Since X is oblivious, its sequence of compare-exchange operations depends only on the array size, not on element values. We show that every compare-exchange on \(A\) and \(B\) produces the same swap-or-no-swap decision.
Consider \(\textsc{Compare-Exchange}(i, j)\) with \(i < j\). The operation exchanges if and only if the value at position \(i\) exceeds the value at position \(j\).
If \(A[i] \leq A[p]\) and \(A[j] \leq A[p]\), then \(B[i] = B[j] = 0\). The exchange happens in \(A\) iff \(A[i] > A[j]\), but in \(B\) neither value exceeds the other (both 0), so no exchange. However, since both are 0 in \(B\), exchanging or not produces the same result, so the positions of 0s and 1s are the same either way.
If \(A[i] > A[p]\) and \(A[j] \leq A[p]\), then \(B[i] = 1\) and \(B[j] = 0\). In \(A\), we have \(A[i] > A[p] \geq A[j]\), so \(A[i] > A[j]\), meaning an exchange occurs. In \(B\), \(B[i] = 1 > 0 = B[j]\), so an exchange also occurs. Both exchange.
If \(A[i] \leq A[p]\) and \(A[j] > A[p]\), then \(B[i] = 0\) and \(B[j] = 1\). In \(A\), \(A[i] \leq A[p] < A[j]\), so no exchange. In \(B\), \(B[i] = 0 < 1 = B[j]\), so no exchange. Both skip.
If \(A[i] > A[p]\) and \(A[j] > A[p]\), then \(B[i] = B[j] = 1\). Both are 1 in \(B\), so (as in the first case) the result in \(B\) is the same regardless of whether an exchange occurs.
In every case, each element in \(B\) ends up at the same position as it would by tracking the \(\leq A[p]\) / \(> A[p]\) classification through algorithm X on input \(A\). Since X places \(A[q]\) (with \(B[q] = 1\)) where \(A[p]\) (with \(B[p] = 0\)) should go, X also places a 1 where a 0 should go in \(B\). So X fails to sort \(B\).
C. Columnsort as oblivious compare-exchange
Columnsort performs a fixed sequence of operations determined entirely by \(r\) and \(s\). The even steps (2, 4, 6, 8) are fixed permutations that move elements to predetermined positions without any comparisons. The odd steps (1, 3, 5, 7) sort individual columns, which can each be implemented as a fixed sequence of compare-exchange operations (using any oblivious sorting network, such as an odd-even merge network, applied independently to each column). Since every compare-exchange index is determined in advance by \(r\) and \(s\) alone, columnsort is an oblivious compare-exchange algorithm.
D. After steps 1-3: at most \(s\) dirty rows
Assume the input contains only 0s and 1s. After step 1 (sort each column), every column has all its 0s at the top and 1s at the bottom. Each column is a clean block of 0s followed by a clean block of 1s.
Step 2 transposes and reshapes. Reading the array in column-major order before step 2, the data is a sequence of \(s\) sorted columns. The transpose-and-reshape operation reads this column-major sequence and lays it out row by row into \(r/s\) rows per original column. Within each original column’s contribution, the 0s precede the 1s. The boundary between 0s and 1s within each original column maps to at most one “dirty” row in the reshaped array (a row containing both 0s and 1s). Since there are \(s\) columns, at most \(s\) rows are dirty after the reshape.
Step 3 sorts each column again. Each column now has at most \(s\) dirty entries scattered among its \(r\) entries. After sorting each column, the 0s move to the top and 1s to the bottom. The boundary between 0s and 1s in each column falls within the at-most-\(s\) dirty rows. Since \(r \geq 2s^2\), there are far more clean rows (at least \(r - s\)) than dirty ones.
After step 3, reading column by column, the top rows are all 0s (clean), the bottom rows are all 1s (clean), and the transition between 0 and 1 happens within a band of at most \(s\) consecutive rows, where different columns may transition at different rows within this band.
E. After step 4: dirty area of at most \(s^2\) elements
Step 4 inverts the permutation from step 2, returning to the original layout. After step 3, the dirty region spans at most \(s\) rows, each containing \(s\) elements, for at most \(s^2\) dirty elements total. The clean region of all-0 rows maps back to a contiguous block of 0s at the start (in column-major order), and the clean region of all-1 rows maps back to a contiguous block of 1s at the end.
So in column-major order after step 4, the array has a clean prefix of 0s, a dirty middle region of at most \(s^2\) elements, and a clean suffix of 1s.
F. Steps 5-8 produce sorted output
After step 4, we have at most \(s^2\) dirty elements in column-major order. Since \(r \geq 2s^2\), these \(s^2\) dirty elements fit within at most \(\lceil s^2 / r \rceil \cdot s \leq s\) columns, or equivalently, within at most \(s^2\) consecutive positions that span at most one column boundary.
Step 5 sorts each column. After this, every column is individually sorted (0s on top, 1s on bottom). The dirty elements are confined to at most two adjacent columns. The array is “almost sorted” in column-major order, with the only disorder at the boundary between these columns.
Step 6 shifts each column by half: the top \(r/2\) elements move to the bottom of the same column, and the bottom \(r/2\) elements move to the top of the next column. This interleaves the boundary region between adjacent columns into a single column. The at-most-\(r\) dirty elements from the two adjacent columns (after step 5) now land in a single column plus possibly its neighbors.
Step 7 sorts each column again. Since the only unsorted elements are confined to a small region, and each column is sorted independently, this step fixes the remaining disorder.
Step 8 inverts the shift from step 6, restoring the original column structure. Since step 7 sorted all columns and the shift/unshift is a fixed permutation that preserves column-major order, the result is fully sorted in column-major order.
Since columnsort is oblivious and correctly sorts all 0-1 inputs, the 0-1 sorting lemma guarantees it correctly sorts all inputs with arbitrary values.
G. When \(s\) does not divide \(r\): at most \(2s - 1\) dirty rows
When \(s\) does not divide \(r\), the reshape in step 2 is not as clean. Each original column contributes \(r\) elements that span \(\lceil r/s \rceil\) or \(\lfloor r/s \rfloor\) rows in the reshaped array, and column boundaries may fall in the middle of a row. Each such mid-row boundary creates an additional dirty row, since elements from two different sorted columns are mixed within the same row.
There are \(s\) columns, each contributing at most one 0-to-1 transition (one dirty row from the data) and at most one column boundary that may dirty another row. With \(s\) data transitions and up to \(s - 1\) column boundaries, the total is at most \(2s - 1\) dirty rows after step 3.
For columnsort to work, the dirty area after step 4 must fit within a half-column (\(r/2\) elements) so that steps 5-8 can clean it up. The dirty area has at most \((2s - 1) \cdot s\) elements. We need \((2s - 1) \cdot s \leq r/2\), giving \(r \geq 2s(2s - 1) = 4s^2 - 2s\). So \(r \geq 4s^2\) suffices (roughly doubling the original requirement).
H. Simple change to maintain \(r \geq 2s^2\)
Before step 1, pad each column so that its length is divisible by \(s\). Specifically, if \(r\) is not divisible by \(s\), add \(s - (r \bmod s)\) dummy rows filled with \(+\infty\) (a value larger than any real element) at the bottom of each column, increasing \(r\) to the next multiple of \(s\). After columnsort finishes, remove the dummy rows.
With this change, \(s\) divides \(r\), so the analysis from parts (d)-(f) applies directly, requiring only \(r \geq 2s^2\). The dummy elements do not affect correctness because they are larger than all real elements and end up in the bottom rows after sorting. The padding adds at most \(s - 1\) rows, which is negligible compared to \(r \geq 2s^2\).