Instruction systolic array (ISA)
There are two directions in parallel computing: coarse-grained multiprocessors and fine-grained array processors. On this page, the design of a highly-integrated, fine-grained array processor is presented. Its applications range from numerical problems such as matrix multiplication or solution of linear equations to problems in image processing, computer graphics, cryptography, computer tomography, and many others.
The underlying parallel computer model is the instruction systolic array (ISA), an architectural concept suited for implementation in very high integration technology. The concept of the ISA focusses on short interconnections for data communication as well as for control communication.
The basic architecture is an nn array of mesh-connected processors. Each processor can communicate with its four direct neighbours. There is no global communication, even control flow is strictly local. This facilitates the synchronisation of the communication among the processors in the presence of clock skew.
The array is programmed in the ISA (instruction systolic array) style: One instruction after the other is entered into the upper left processor and propagated in diagonal wavefronts through the array. While travelling through the array, each instruction is executed by each processor – provided that execution of the instruction in this processor is not masked. The masking mechanism is realized by row and column selector bits that belong to each instruction. The instruction is executed in processor P(i, j), if the i-th row selector bit and the j-th column selector bit are 1; otherwise, a no-operation instruction is executed.
Figure 1 shows how an instruction (+) together with its row and column selector bits is moved through a 44 array. Selector bits with value 1 are drawn as blue and yellow boxes, selector bits with value 0 are drawn as gray boxes. In this example the second row selector bit and the last column selector are 0, therefore the instruction is not executed in the second row of the array and in the last column of the array.
|Figure 1: Stepwise execution of an instruction|
There is no need to wait until the instruction has moved through the whole array: the next instruction, say *, together with its selectors, can enter the array immediately after the +-instruction (Figure 2). In this example, the *-instruction is masked in the last row of the array.
|Figure 2: Pipelined execution of two instructions|
Data communication between the processors is realized using a special communication register C that is provided in each processor. This register can be read by the four direct neighbours of the processor. In order to establish a data communication between two adjacent processors P and Q, processor P writes a data item x into its communication register and, in the next instruction cycle, processor Q reads x from P's communication register.
Laisa is a Pascal-like programming language for ISA programs. It supports control structures like conditional statements and loops as well as procedures.
Elementary statements in Laisa are of the form
< instruction; row-selector; column-selector >.
Instructions can be register assignments of the form
set source-register, destination-register
or arithmetical or logical operations of the form
instructioncode source-register1, source-register2, destination-register
Registers can be any of the data registers R0, ..., R31, or the communication register C, or the communication registers of the western, northern, eastern or southern neighbour CW, CN, CE, CS, respectively, the latter only as source registers.
Selectors are sequences of 0's and 1's of length n, where n is the size of the processor array. There are different possibilities to denote selectors by expressions; the following examples illustrate this.
The first possibility is to denote selectors by a kind of regular expression, where the length of the selector is determined by a given n. Here, it is assumed that n = 8.
It is also possible to give the numbers of 0's and 1's explicitely, as in the following examples.
Finally, it is possible to denote selectors by the positions of the 1's, as is shown in the following examples.
Examples of statements are
< set R0, C; 1*; 1* >; < add R0, R1, R0; 1*; [2..n] >;
In the first statement, the contents of register R0 is moved into the communication register C, in all rows and all columns. In the second statement, the contents of registers R0 and R1 is added and written into register R0; this is done in all rows and in all columns except the first column.
|Next: [Communication primitives] or|
|H.W. Lang FH Flensburg email@example.com Impressum © Created: 17.09.1998 Updated: 11.10.2008|