Rao–Sandelius shuffle

The Rao–Sandelius shuffle is a divide-and-conquer algorithm for shuffling a finite sequence. The algorithm takes a list of all the elements of the sequence, and for each element draws a uniform random value from 0 to k-1. The list is broken up into k subsequences by the random value chosen, which are further shuffled.^[1]

For an array with $n$ entries, the algorithm consumes $O(n\log n)$ random digits and performs $O(n\log n)$ swaps. This appears worse than the Fisher-Yates Shuffle, which consumes $O(n\log n)$ random digits and performs $O(n)$ swaps. However, because of the random access pattern of the Fisher-Yates shuffle, the Rao-Sandelius Shuffle can be faster for large $n$ due to its cache friendliness.^[2]

Original method using random digit table

Rao described an algorithm for shuffling the numbers 1 through n using a table of random decimal digits.^[3] Sandelius described a procedure for shuffling a deck with n cards using random decimal digits.^[4] Sandelius includes the optimization that for a subdeck of length 2, you only need a single random digit. In the original, the order is kept for an odd random digit, and swapped for an even random digit. In either case the algorithm is as follows:

For each element to be shuffled, select a random digit 0 through 9.
Group elements that selected the same random digit into sub-lists.
Place the sub-lists in numerical order based on the digit selected.
Repeat on any sub-lists of size 2 or greater.

Length 2 optimization

Sandelius included the following optimization:^[4] If the sub-list is of length 2, draw a single random digit. Swap the order if the digit is even, and leave it unchanged if the digit is odd.

Implementation with random bits

It is straightforward to adapt a version of this algorithm to a computer using k=2.

For each element to be shuffled, select a random bit.
Place all the elements that selected 0 ahead of all elements that selected 1.
Repeat on the 2 sub-lists. This step can be parallelized.

The Size-2 speedup can also be applied in this case. Preserving the order if 1 is chosen, and swapping if 0 is chosen.

Here is pseudocode for an in-place version of the algorithm (for a zero-based array):

function rs_shuffle(A, n):
  if n < 2 then return
  if n = 2 then:
    b ← uniform random bit
    if b = 0 then
       exchange A[0] and A[1]
       return
  rs_shuffle_large(A, n)

function rs_shuffle_large(A, n):
  front ← 0
  // Invariant 1: front is the index of the first
  // non-shuffled element
  back ← n - 1
  // Invariant 2: back is the index of the last
  // non-shuffled element
  outer: while true:
    b ← uniform random bit
    while b ≠ 1
      front ← front + 1
      if front > back then break outer
    // (*) front is now the index of an element
    // that belongs at the back.
    b ← uniform random bit
    while b ≠ 0
      back ← back - 1
      // Different due to (*) above
      if front ≥ back then break outer
    exchange A[front] and A[back]
    // Restore Invariant 1
    front ← front + 1
    // Restore Invariant 2
    back ← back - 1
    if front > back then break outer
  // Because the two halves are disjoint, these
  // two calls could be done in parallel
  rs_shuffle(A, front)
  rs_shuffle(A + front, n - front)

The above uses a variation on the Hoare partition scheme to reduce the average number of swaps.

Each algorithm pass is effectively an Inverse-GSR 2-shuffle.

Performance

For large (e.g. $10^{9}$ items) data sets, the RS shuffle outperforms the more common Fisher-Yates Shuffle. It does this for two reasons:

It exhibits much better cache locality.
It is parallelizable. The Fisher-Yates shuffle is not.

Performance vs Fisher-Yates was measured by the authors of the MergeShuffle algorithm^[5] and by the authors of the VarPhilox GPU Shuffle^[2]

References

↑ Bacher, Axel; Bodini, Olivier; Hwang, Hsien-Kuei; Tsai, Tsung-Hsi (February 2017). "Generating Random Permutations by Coin Tossing: Classical Algorithms, New Analysis, and Modern Implementation". ACM Transactions on Algorithms. 13 (2): 24:4. doi:10.1145/3009909. Retrieved 12 May 2024.
1 2 Mitchell, Rory; Stokes, Daniel; Frank, Eibe; Holmes, Geoffrey (February 2022). "Bandwidth-Optimal Random Shuffling for GPUs". ACM Transactions on Parallel Computing. 9 (1): 17. arXiv:2106.06161. doi:10.1145/3505287.
↑ Rao, C. Radhakrishna (August 1961). "Generation of random permutations of given number of elements using random sampling numbers" (PDF). The Indian Journal of Statistics. 23 (3): 305–307. Retrieved 12 May 2024.
1 2 Sandelius, Martin (July 1962). "A Simple Randomization Procedure". Journal of the Royal Statistical Society. 24 (2): 472–481. doi:10.1111/j.2517-6161.1962.tb00474.x.
↑ Bacher, Axel; Bodini, Olivier; Hollender, Alexandros; Lumbros, Jérémie (June 2018). "MERGESHUFFLE: A Very Fast, Parallel Random Permutation Algorithm" (PDF). CEUR Workshop Proceedings. 2113: 43–52. Retrieved 22 Dec 2025.

[1] Bacher, Axel; Bodini, Olivier; Hwang, Hsien-Kuei; Tsai, Tsung-Hsi (February 2017). "Generating Random Permutations by Coin Tossing: Classical Algorithms, New Analysis, and Modern Implementation". ACM Transactions on Algorithms. 13 (2): 24:4. doi:10.1145/3009909. Retrieved 12 May 2024.

[VarPhilox-2] 1 2 Mitchell, Rory; Stokes, Daniel; Frank, Eibe; Holmes, Geoffrey (February 2022). "Bandwidth-Optimal Random Shuffling for GPUs". ACM Transactions on Parallel Computing. 9 (1): 17. arXiv:2106.06161. doi:10.1145/3505287.

[3] Rao, C. Radhakrishna (August 1961). "Generation of random permutations of given number of elements using random sampling numbers" (PDF). The Indian Journal of Statistics. 23 (3): 305–307. Retrieved 12 May 2024.

[Sandelius-4] 1 2 Sandelius, Martin (July 1962). "A Simple Randomization Procedure". Journal of the Royal Statistical Society. 24 (2): 472–481. doi:10.1111/j.2517-6161.1962.tb00474.x.

[MergeShuffle-5] Bacher, Axel; Bodini, Olivier; Hollender, Alexandros; Lumbros, Jérémie (June 2018). "MERGESHUFFLE: A Very Fast, Parallel Random Permutation Algorithm" (PDF). CEUR Workshop Proceedings. 2113: 43–52. Retrieved 22 Dec 2025.

[1]

[2]

[3]

[4]

[5]