Sorting on two-dimensional processor arrays

Sorting on two-dimensional processor arrays

 German version  up

Problem

The definition of the sorting problem has to be generalized in order to cope with two-dimensional arrays (and moreover, arbitrary arrangements) of data.

Definition:  Let A be a set of data with some order relation less or equal. A data record is a mapping a : J right arrow A, where J is a finite index set. Another writing for a data record a is (ai)i element J .

Example:  A data sequence a0, ..., an-1 is a data record with J = {0, ..., n-1}.

An n×n-array of data is a data record with J = {0, ..., n-1} × {0, ..., n-1}.

Definition:  A sorting order is a bijective mapping  ρ : J  right arrow  {0, ..., |J|-1}.

Example:  For data sequences ascending order is expressed by ρ(i) = i, descending order by ρ(i) = n – 1 – i.

For n×n-arrays usually the following sorting orders are considered:

row-major:  ρ(i, j)  =   i·n + j
snake-like: 
ρ(i, j)  =  open brace
i·n + j       if  i even
i·n + n – 1 – j    if  i odd

Definition:  The sorting problem is defined to reorder a data record (ai)i element J to a data record (aφ(i))i element J such that:

 for all i, j element J  :  aφ(i)less or equalaφ(j)   for   ρ(i) < ρ(j)

Here φ is a permutation of the index set  J.

In the following, n×n-data records are to be sorted on an n×n-processor array. The processors are connected as a mesh, i.e. only adjacent processors can exchange data with each other. The data are associated with the processors in a natural way such that each processor (ij) contains data element aij. The sorting order is row-major.

Lower bounds

Assume that the smallest element is at the beginning in the lower right corner of the n×n-array, then at least 2n – 2 steps are necessary to move it to its place in the upper right corner. This trivial lower bound has been improved by Kunde to 3n – square root2n – 3 steps [Kun 87].

In Figure 1 an n×n-array is shown. The triangle in the lower right corner is called "joker zone", since data in this zone are left undetermined for the time being. Outside the joker zone the array is filled with all different data.

nxn-array with joker zone
Figure 1: n×n-array with joker zone

The data of the joker zone need at least 2n – square root2n – 2 steps to reach the upper left corner of the array. The element x that happens to be in the upper left corner at that time is therefore independent of the data of the joker zone: regardless of the data values of the joker zone, after 2n – square root2n – 2 steps there will always be the same element x in the upper left corner.

The data values of the joker zone can therefore be chosen in a way such that element x belongs to the right edge of the array in the sorted order. Since the joker zone contains n elements there is enough mass available to push x to the right edge. The distance that element x still has to move is n-1, corresponding to n-1 additional steps. This means that altogether at least 3n – square root2n – 3 steps are necessary to sort the array.

Sorting on n×n-processor arrays

Already in 1977 Thompson and Kung published a paper on sorting on n×n-processor arrays [TK 77]. Their algorithm s2-way Mergesort is optimal not only asymptotically but also concerning the constant of the highest-order term. The time complexity of the algorithm is 3n + O(n2/3 log(n)) comparison-exchange operations.

In the sequel, several more papers appeared that presented O(n)-sorting algorithms for n×n-processor arrays [LSSS 85][SchSh 86][MG 88]. Some of these algorithms are simpler than that of Thompson/Kung, but all are slower. Only the algorithm of Schnorr/Shamir [SchSh 86] also reaches 3n as the highest-order term.

In practice, there is a need for an algorithm that is simple and easy to implement. In the following, three algorithms are presented: LS3 Sort by Lang, Schimmler, Schmeck und Schröder [LSSS 85], 4-way mergesort by Schimmler [Schi 87] and rotatesort by Marberg and Gafni [MG 88].

References

[Kun 87]M. Kunde: Lower Bounds for Sorting on Mesh-Connected Architectures. Acta Informatica 24, 121-130 (1987)
[LSSS 85]H.W. Lang, M. Schimmler, H. Schmeck, H. Schröder: Systolic Sorting on a Mesh-Connected Network. IEEE Transactions on Computers C-34, 7, 652-658 (1985)
[MG 88]J.M. Marberg, E. Gafni: Sorting in Constant Number of Row and Column Phases on a Mesh. Algorithmica 3, 561-572 (1988)
[Schi 87]M. Schimmler: Fast Sorting on the Instruction Systolic Array. Bericht Nr. 8709, Institut für Informatik, Christian-Albrechts-Universität Kiel (1987)
[SchSh 86]C.P. Schnorr, A. Shamir: An Optimal Sorting Algorithm for Mesh-Connected Computers. Proc. of the 18th ACM Symposium on Theory of Computing, 255-261 (1986)
[TK 77]C.D. Thompson, H.T. Kung: Sorting on a Mesh-Connected Parallel Computer. Communications of the ACM, 20, 4, 263-271 (1977)

 

Next:   [LS3 sort]   or   up

 

homeH.W. Lang   Hochschule Flensburg   lang@hs-flensburg.de   Impressum   ©   Created: 01.03.1997   Updated: 05.06.2016
Valid HTML 4.01 Transitional