Sorting variable-length items
- You are given an array of integers, where different integers may have different numbers of digits, but the total number of digits over all the integers in the array is \(n\). Show how to sort the array in \(O(n)\) time.
- You are given an array of strings, where different strings may have different numbers of characters, but the total number of characters over all the strings is \(n\). Show how to sort the strings in \(O(n)\) time.
(Note that the desired order here is the standard alphabetical order; for example, a < ab < b.)
A. Sorting variable-length integers
A shorter integer is always smaller than a longer one (assuming positive integers), so we can exploit length as a primary sort key. First, group the integers by their number of digits. This scan touches every digit once, taking \(O(n)\) time. Then, within each group of \(d\)-digit integers, apply radix sort with \(d\) passes of counting sort (base 10).
The total work across all groups is \(\sum n_i \cdot d_i\), where group \(i\) has \(n_i\) integers of \(d_i\) digits each. Since each digit belongs to exactly one integer, \(\sum n_i \cdot d_i = n\), so the total radix sort work is \(O(n)\). Finally, concatenating the sorted groups in order of increasing length takes \(O(n)\) time.
Total: \(O(n)\).
B. Sorting variable-length strings
Strings are trickier because lexicographic ordering is not simply length-based. The string “b” comes after “ab” even though it is shorter. The approach is to conceptually pad all strings to the same length \(\ell_{\max}\) with a special character that sorts before all alphabetic characters, then apply radix sort from right to left.
In practice, we do not actually pad. At each position \(i\) (processing from \(\ell_{\max}\) down to 1), we sort by the character at position \(i\), treating strings shorter than \(i\) as having the padding character. Each character in the original input contributes to exactly one position’s counting sort pass, so the total work across all passes is proportional to the total number of characters.
As an example, sorting \(\langle\) “b”, “ab”, “a” \(\rangle\) with padding (shown as \(\sqcup\)): padded to length 2 gives \(\langle\) “b\(\sqcup\)”, “ab”, “a\(\sqcup\)” \(\rangle\). After sorting by position 2: \(\langle\) “a\(\sqcup\)”, “b\(\sqcup\)”, “ab” \(\rangle\). After sorting by position 1: \(\langle\) “a\(\sqcup\)”, “ab”, “b\(\sqcup\)” \(\rangle\), which gives the correct order “a” < “ab” < “b”.
The total time is \(O(n + k \cdot \ell_{\max})\) where \(k\) is the number of strings. Since the total characters satisfy \(n \geq k\) (every string has at least one character) and \(n \geq \ell_{\max}\) (at least one string has the maximum length), this simplifies to \(O(n)\).