From abstract algebra we know that permutations are either the identity, a single cycle, or a product of disjoint cycles. Haddon (1990) used this insight for a simple sorting algorithm that works with the theoretical minimum of writes.

Any array we might want to sort is just a permutation of the sorted array. For example, the array [3, 2, 7, 6, 1, 8, 5, 4, 9] is obtained by the following permutation:

We can decompose this permutation into two cycles: 1 moves in fifth, 5 in seventh, 7 in third, and 3 in first position, which closes the first cycle (1 3 7 5); the second cycle is (4 6 8). These cyclic permutations are disjoint and thus can be performed in any order. 2 and 9 remain unchanged.

To get back to the sorted array, we can simply reverse the permutation by walking each cycle backwards. I recycled the diagram above, that is, I permuted the elements and reversed the arrows.

The permutation example above is an instance of what Haddon (1990) describes as the *special case*: the input array has *n* elements with all integer values from 1 to *n*, so each value tells us exactly where it should be in the sorted array.
The algorithm moves over each element from left to right.
If an element's value does not correspond to its position, we have found a cycle and can shift all its elements to their final positions in the sorted array, one by one, until we come back to the starting point.
To do this, we only need a single temporary variable and one write operation per element.
After completing a cycle, we keep moving from left to right to find the next cycle or convince ourselves that the array is already sorted.
Here is a Rust implementation (zero-based):

```
pub fn special_cycle_sort(xs: &mut [usize]) {
for i in 0..xs.len() {
let mut value = xs[i];
if i == value { continue; }
loop {
std::mem::swap(&mut xs[value], &mut value);
if i == value { break; }
}
xs[i] = value;
}
}
```

Each element either stays in place (if it is already in the right position) or is written to the final position immediately, so we have *≤ n* write operations on the array.

In the general case, the input array of length *n* has integer values out of *1* to *m* and each value may appear any number of times.
The idea of the algorithm stays the same, but we need some auxiliary memory and have to do a bit of book keeping to perform in *O(n+m)* time.

First we need a histogram of the input values and turn it into a cumulative histogram, such that it tells us the index of the first element of any given value in the sorted array.

```
input: [0, 3, 2, 2, 2, 3, 1, 0]
sorted: [0, 0, 1, 2, 2, 2, 3, 3]
-----------------------------------
values: 0 1 2 3
histogram: [2, 1, 3, 2]
cumulative: [0, 2, 3, 6]
```

The cumulative histogram in this example tells us that the value 1 first appears at index 2 in the sorted array, value 3 first appears in sixth position etc. We can, of course, also sort large integer values efficiently as long as their range is small by storing counts in the histogram under keys with subtracted minimum.

Like in the special case, the algorithm moves over each element from left to right, but the target position does not simply equal the element's value, but is retrieved from the cumulative histogram. Unlike in the special case, we only enter a cycle if the target position is less than the current element's position, so we always move the first element of a cycle to a lower position. This avoids unnecessary writes as I'll show in the example run below. Whenever we encounter an element that is in its final position, we need to update the position for this value in the auxiliary data structure (which then no longer is a cumulative histogram), so that it always points to the next valid position for each value. Here is a Rust implementation (again zero-based):

```
pub fn general_cycle_sort(xs: &mut [usize], max_val: usize) {
let mut aux = cumulative_histogram(xs, max_val);
for i in 0..xs.len() {
let mut value = xs[i];
let mut target = aux[value];
if i < target { continue; }
if i != target {
loop {
aux[value] += 1;
std::mem::swap(&mut xs[target], &mut value);
target = aux[value];
if i == target { break; }
}
xs[i] = value;
}
aux[value] += 1;
}
}
```

The following table traces the values of some variables during a sorting run on the example array above. I leave out commas and spaces in the arrays since all numbers are single digits, the results of write operations in the input and the auxiliary array are highlighted.

i | value | target | aux | xs |
---|---|---|---|---|

- | - | - | [0236] | [03222310] |

0 | 0 | 0 | [1236] | |

1 | 3 | 6 | ||

2 | 2 | 3 | ||

3 | 2 | 3 | [1246] | |

4 | 2 | 4 | [1256] | |

5 | 3 | 6 | ||

6 | 1 | 2 | [1356] | [03122310] |

6 | 2 | 5 | [1366] | [03122210] |

6 | 3 | 6 | [1367] | [03122230] |

7 | 0 | 1 | [2367] | [00122230] |

7 | 3 | 7 | [2368] | [00122233] |

In *i=0,3,4* the value is already found in the right position and we only update the auxiliary data structure.
In *i=6* we have *i > target* for the first time and we write the elements of this cycle into their final positions until *target == i*.
In *i=7* we detect the second cycle and move the remaining elements.
If we would start moving the elements of the first cycle in *i=2* (where *i < target*), we would have to shift all elements of value *2* by one position, resulting in unnecessary writes to the input array.

The algorithm for the general case is very similar to counting sort, but it works *in-place*, avoiding the allocation of an additional output array.
A disadvantage compared to counting sort is that cycle sort is *unstable*.

For simplicity, this post assumed natural numbers as input.
This easily extends to elements that have a natural number as a key, see Haddon (1990).
For input types with a hash function, an efficient variant can be implemented that sorts the elements according to their hashes.
Alternatively, in each iteration we can count the number of elements less or equal to the current element to find its position in the array, resulting in an algorithm with *O(n ^{2})* runtime.

Haddon, B. K. (1990). Cycle-sort: a linear sorting method. *The Computer Journal*, 33(4), 365-367.

*This page was moved from my old website.*