Sorting on two dimensional processor arraysAlgorithm LS3 sort 
A very simple algorithm for sorting on a twodimensional mesh is LS3 sort by Lang, Schimmler, Schmeck and Schröder [LSSS 85].
Algorithm LS3 sort is based on a merge algorithm that merges four sorted k/2×k/2arrays to one sorted k×karray. Sorting direction is the snake. The merge algorithm makes use of the basic operations shuffle and oets.
A deck of cards can be mixed by exactly interleaving its two halves. The permutation obtained by this mix is called perfect shuffle or shuffle for short. It is shown here for n = 8 elements:
0 1 2 3 4 5 6 7 
0 4 1 5 2 6 3 7 
On a processor array the shuffle permutation can be realized by a "triangle" of exchanges between neighbouring elements:
0 1 2 3 4 5 6 7 
0 4 1 5 2 6 3 7 
The operation oets stands for one step of oddeven transposition sort. In an even step all elements at even positions of the sorting order are compared with their right neighbour elements and exchanged if necessary. In an odd step the same is done with all elements at odd positions. Odd and even steps are applied alternately.
Procedure ls3merge(k)  
Input:  k×karray whose k/2×k/2subarrays are sorted in snakelike direction 
Output:  the k×karray sorted in snakelike direction 
Method: 

The correctness of procedure ls3merge is shown using the 01principle.
The initial situation is an array filled with zeroes and ones whose four subarrays are sorted in snakelike direction. This situation is shown in Figure 1a (zeroes are drawn white, ones gray). Each subarray contains a certain number of full rows (a, b, c and d, respectively) and possibly one incomplete row.
After the shuffle operation in Step 1 the situation of Figure 1b is obtained. In every double column there are a + b + c + d ones that stem from the full rows plus at most four more ones from the incomplete rows.
Case 1: a + b + c + d is even
Sorting the double columns in Step 2 yields a situation as shown in Figure 1c. The ones from the full rows form (a+b+c+d)/2 full rows. Additionally, in each double column there are at most 4 ones from the incomplete rows. These ones form an unsorted zone consisting of at most two rows.
 
Figure 1: Proof of correctness of procedure ls3merge  
Case 2: a + b + c + d is odd
If a + b + c + d is odd then the ones from the full rows form a step in each double column (Figure 2). Now if in one double column there were 0 and in another double column there were 4 additional ones from the incomplete rows, then an unsorted zone of three rows would arise.
 
Figure 2: Situation if a+b+c+d is odd  
However, this is only possible if all incomplete rows start from the same side (e.g. all from left or all from right). Since the sorting direction is the snake this implies that the numbers a, b, c, d are all even or all odd. But then their sum cannot be odd.
If a + b + c + d is odd then in every double column there are at least one or at most three additional ones. Therefore, also in this case the unsorted zone consists of at most two rows.
Eventually, in Step 3 of the procedure the unsorted zone is sorted. Since it consists of at most 2k elements, 2k oetssteps are sufficient.
By recursive application of procedure ls3merge an unsorted array is sorted:
Algorithm ls3sort(n)  
Input:  unsorted n×narray 
Output:  the sorted n×narray 
Method: 

An analysis of procedure ls3merge(k) yields the following number of operations:
Step 1:  k/2  
Step 2:  2k  
Step 3:  2k  
altogether:  4.5k 
Then algorithm LS3 sort requires T(n) = 4.5(n + n/2 + n/4 + ... + 2) operations; therefore it holds that
T(n) 9n
In [LSSS 85] it is shown that the time complexity of LS3 sort can be reduced to 7n by sorting the double columns in k rather than in 2k steps.
The following applet shows execution of LS3 sort with a 32×32array of zeroes and ones. For better illustration some operations are shown sequentially that would be performed in parallel on a twodimensional processor array.
Algorithm LS3 sort for sorting an n×narray has the same asymptotic time complexity as the algorithm of Thompson/Kung [TK 77], namely O(n). Regarding its constant it is slower by a factor of about 2. The quality of the algorithm lies in its simplicity together with asymptotic optimality.
[LSSS 85]  H.W. Lang, M. Schimmler, H. Schmeck, H. Schröder: Systolic Sorting on a MeshConnected Network. IEEE Transactions on Computers C34, 7, 652658 (1985)  
[TK 77]  C.D. Thompson, H.T. Kung: Sorting on a MeshConnected Parallel Computer. Communications of the ACM, 20, 4, 263271 (1977) 
Next: [4way mergesort] or 
H.W. Lang Hochschule Flensburg lang@hsflensburg.de Impressum © Created: 20.08.2001 Updated: 05.06.2016 