Sorting on two dimensional processor arrays
Algorithm LS3 sort
A very simple algorithm for sorting on a two-dimensional mesh is LS3 sort by Lang, Schimmler, Schmeck and Schröder [LSSS 85].
Algorithm LS3 sort is based on a merge algorithm that merges four sorted k/2 × k/2-arrays to one sorted k × k-array. Sorting direction is the snake. The merge algorithm makes use of the basic operations shuffle and oets.
A deck of cards can be mixed by exactly interleaving its two halves. The permutation obtained by this mix is called perfect shuffle or shuffle for short. It is shown here for n = 8 elements:
|0 1 2 3 4 5 6 7|
|0 4 1 5 2 6 3 7|
On a processor array the shuffle permutation can be realized by a "triangle" of exchanges between neighbouring elements:
|0 1 2 3 4 5 6 7|
|0 4 1 5 2 6 3 7|
The operation oets stands for one step of odd-even transposition sort. In an even step all elements at even positions of the sorting order are compared with their right neighbour elements and exchanged if necessary. In an odd step the same is done with all elements at odd positions. Odd and even steps are applied alternately.
|Input:||k × k-array whose k/2 × k/2-subarrays are sorted in snake-like direction|
|Output:||the k × k-array sorted in snake-like direction|
The correctness of procedure ls3-merge is shown using the 0-1-principle.
The initial situation is an array filled with zeroes and ones whose four subarrays are sorted in snake-like direction. This situation is shown in Figure 1a (zeroes are drawn white, ones gray). Each subarray contains a certain number of full rows (a, b, c and d, respectively) and possibly one incomplete row.
After the shuffle operation in Step 1 the situation of Figure 1b is obtained. In every double column there are a + b + c + d ones that stem from the full rows plus at most four more ones from the incomplete rows.
Case 1: a + b + c + d is even
Sorting the double columns in Step 2 yields a situation as shown in Figure 1c. The ones from the full rows form (a+b+c+d)/2 full rows. Additionally, in each double column there are at most 4 ones from the incomplete rows. These ones form an unsorted zone consisting of at most two rows.
|Figure 1: Proof of correctness of procedure ls3-merge|
Case 2: a + b + c + d is odd
If a + b + c + d is odd then the ones from the full rows form a step in each double column (Figure 2). Now if in one double column there were 0 and in another double column there were 4 additional ones from the incomplete rows, then an unsorted zone of three rows would arise.
|Figure 2: Situation if a+b+c+d is odd|
However, this is only possible if all incomplete rows start from the same side (e.g. all from left or all from right). Since the sorting direction is the snake this implies that the numbers a, b, c, d are all even or all odd. But then their sum cannot be odd.
If a + b + c + d is odd then in every double column there are at least one or at most three additional ones. Therefore, also in this case the unsorted zone consists of at most two rows.
Eventually, in Step 3 of the procedure the unsorted zone is sorted. Since it consists of at most 2k elements, 2k oets-steps are sufficient.
By recursive application of procedure ls3-merge an unsorted array is sorted:
|Input:||unsorted n × n-array|
|Output:||the sorted n × n-array|
An analysis of procedure ls3-merge(k) yields the following number of operations:
Then algorithm LS3 sort requires T(n) = 4.5(n + n/2 + n/4 + ... + 2) operations; therefore it holds that
In [LSSS 85] it is shown that the time complexity of LS3 sort can be reduced to 7n by sorting the double columns in k rather than in 2k steps.
The following applet shows execution of LS3 sort with a 32 × 32-array of zeroes and ones. For better illustration some operations are shown sequentially that would be performed in parallel on an n × n-processor array.
Algorithm LS3 sort for sorting an n × n-array has the same asymptotic time complexity as the algorithm of Thompson/Kung [TK 77], namely O(n). Regarding its constant it is slower by a factor of about 2. The quality of the algorithm lies in its simplicity together with asymptotic optimality.
|[LSSS 85]||H.W. Lang, M. Schimmler, H. Schmeck, H. Schröder: Systolic Sorting on a Mesh-Connected Network. IEEE Transactions on Computers C-34, 7, 652-658 (1985)|
|[TK 77]||C.D. Thompson, H.T. Kung: Sorting on a Mesh-Connected Parallel Computer. Communications of the ACM, 20, 4, 263-271 (1977)|
|Next: [4-way mergesort] or|
|H.W. Lang Fachhochschule Flensburg firstname.lastname@example.org Impressum © Created: 20.08.2001 Updated: 15.02.2016|