
==Examples==
=== Polynomial evaluation ===
[[Horner's rule]] for evaluating a polynomial is:

:<math>
y = ( \ldots ( ( (a_n \cdot x + a_{n-1}) \cdot x + a_{n-2}) \cdot x + a_{n-3}) \cdot x + \ldots + a_1) \cdot x + a_0.
</math>

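This expression can be computed by a linear systolic array in which the processors are arranged in pairs. One processor of each pair multiplies its input by <math>x</math> and passes the result to the right, and the next adds <math>a_j</math> and passes the result to the right. The leftmost processor receives <math>a_n</math>, and after <math>n</math> such pairs (adding <math>a_{n-1}, \ldots, a_0</math> in turn) the rightmost processor outputs <math>y</math>.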
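A minimal cycle-level sketch of such an array, in Python, is given below. The function name <code>systolic_horner</code> is hypothetical, and one detail is an assumption not stated above: instead of storing <math>x</math> in the multipliers, each token carries its own evaluation point, so that a new point can enter the array on every cycle. Each token starts as <math>a_n</math> and moves one processor to the right per cycle through all <math>2n</math> processors.

<syntaxhighlight lang="python">
def systolic_horner(coeffs, xs):
    """Cycle-level sketch of a linear systolic array evaluating a polynomial.

    coeffs = [a_n, a_{n-1}, ..., a_0] (highest degree first), degree n >= 1.
    xs = evaluation points; one new point enters the array per cycle
    (assumption: each token carries its own x instead of x being stored).
    """
    if len(coeffs) < 2:
        raise ValueError("expected a polynomial of degree at least 1")
    # For j = n-1, ..., 0: a multiplier PE followed by an adder PE storing a_j.
    pes = []
    for a in coeffs[1:]:
        pes.append(("mul", None))
        pes.append(("add", a))
    # Each token is (partial result, x); every token starts as (a_n, x).
    feed = [(coeffs[0], x) for x in xs]
    regs = [None] * len(pes)  # token held by each PE after the last cycle; None = empty
    results = []
    for t in range(len(feed) + len(pes) - 1):
        new_regs = []
        for i, (op, a) in enumerate(pes):
            # Synchronous step: PE i reads what its left neighbour held last cycle.
            if i == 0:
                token = feed[t] if t < len(feed) else None
            else:
                token = regs[i - 1]
            if token is None:
                new_regs.append(None)
                continue
            partial, x = token
            partial = partial * x if op == "mul" else partial + a
            new_regs.append((partial, x))
        regs = new_regs
        if regs[-1] is not None:  # a finished value leaves the rightmost PE
            results.append(regs[-1][0])
    return results
</syntaxhighlight>

For <math>p(x) = 2x^3 - 3x^2 + 5</math>, <code>systolic_horner([2, -3, 0, 5], [0, 1, 2, -1])</code> returns <code>[5, 4, 9, 0]</code>. Once the pipeline is full, one value emerges per cycle.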

=== Convolution ===

Consider a chain of processing elements (PEs), each performing a [[Multiply–accumulate operation|multiply-accumulate operation]]. The chain processes input data (<math>x_i</math>) and weights (<math>w_i</math>) systolically, meaning that data flows through the array in a regular, rhythmic manner. Each weight stays fixed in its PE, while the input data and the partial sums (<math>y_i</math>) move in opposite directions.

Each PE performs the following operation:<math display="block">
\begin{aligned}
y_{out} &= y_{in} + w \cdot x_{in} \\
x_{out} &= x_{in}
\end{aligned} 
</math>where:

* <math>x_{in}</math> is the input data.
* <math>y_{in}</math> is the incoming partial sum.
* <math>w</math> is the weight stored in the PE.
* <math>x_{out}</math> is the output data (passed to the next PE).
* <math>y_{out}</math> is the updated partial sum.

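The input stream <math>\dots, x_3, 0, x_2, 0, x_1</math> enters from the left, with a zero (an idle slot) between consecutive inputs. The partial sums <math>y_1, y_2, y_3, \dots</math>, each initialized to zero, enter from the right and are likewise separated by one idle cycle, so that each partial sum meets a new input at every PE it passes. Suppose the array has <math>K</math> PEs, with <math>w_1</math> stored in the rightmost PE, <math>w_2</math> in its left neighbor, and so on. If <math>y_1</math> and <math>x_1</math> arrive at the rightmost PE in the same cycle, then the leftmost PE outputs<math display="block">
\begin{aligned}
y_1 &= w_1 x_1 + w_2 x_2 + \cdots + w_K x_K \\
y_2 &= w_1 x_2 + w_2 x_3 + \cdots + w_K x_{K+1} \\
&\vdots 
\end{aligned}
</math>that is, <math>y_i = \sum_{j=1}^{K} w_j x_{i+j-1}</math>. This is the one-dimensional convolution. Similarly, <math>n</math>-dimensional convolution can be computed by an <math>n</math>-dimensional array of PEs.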
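The data flow of this weights-stay array can be checked with a small cycle-level simulation. The following Python sketch is illustrative only; the function name <code>systolic_conv_w1</code> and its list-based registers are hypothetical. It assumes that <math>w_1</math> sits in the rightmost PE (so that <math>y_1</math> meets <math>x_1, x_2, \dots</math> in order), that the partial sums enter two cycles apart just like the inputs, and that only the complete outputs <math>y_1, \dots, y_{N-K+1}</math> of a finite input <math>x_1, \dots, x_N</math> are wanted.

<syntaxhighlight lang="python">
def systolic_conv_w1(weights, xs):
    """Cycle-level sketch of a linear array in which the weights stay put.

    weights = [w_1, ..., w_K], xs = [x_1, ..., x_N]. Returns [y_1, ..., y_M],
    M = N - K + 1, where y_i = w_1*x_i + w_2*x_{i+1} + ... + w_K*x_{i+K-1}.
    """
    K, N = len(weights), len(xs)
    M = N - K + 1
    if K == 0 or M <= 0:
        return []
    # Assumption: PE p (p = 0 is leftmost) stores w_{K-p}, so the rightmost PE stores w_1.
    stored = weights[::-1]

    def x_enter(t):
        # x_m enters the leftmost PE at cycle 2(m-1); odd cycles carry the 0 gap.
        m, odd = divmod(t, 2)
        return xs[m] if not odd and m < N else 0

    def y_enter(t):
        # y_i (initialized to 0) enters the rightmost PE at cycle K-1+2(i-1);
        # None marks an idle slot in the partial-sum stream.
        k = t - (K - 1)
        if k < 0:
            return None
        i, odd = divmod(k, 2)
        return 0 if not odd and i < M else None

    x_out = [0] * K     # x values leaving each PE at the end of the last cycle
    y_out = [None] * K  # partial sums leaving each PE at the end of the last cycle
    outputs = []
    t = 0
    while len(outputs) < M:
        # Move x one PE to the right and y one PE to the left, feeding the ends.
        x_in = [x_enter(t)] + x_out[:-1]
        y_in = y_out[1:] + [y_enter(t)]
        # Every PE fires in the same cycle: y_out = y_in + w*x_in, x_out = x_in.
        x_out = x_in
        y_out = [None if y is None else y + w * x
                 for y, w, x in zip(y_in, stored, x_in)]
        if y_out[0] is not None:  # a finished sum leaves the leftmost PE
            outputs.append(y_out[0])
        t += 1
    return outputs
</syntaxhighlight>

For example, <code>systolic_conv_w1([1, 10, 100], [1, 2, 3, 4, 5])</code> returns <code>[321, 432, 543]</code>. Because of the idle slots, each PE does useful work only on every other cycle in this arrangement.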

Many other systolic implementations of one-dimensional convolution exist, with different data flows; for example, the weights may move while the partial sums stay in place.<ref name=":0">{{Cite journal |last=Kung |date=January 1982 |title=Why systolic architectures? |url=https://www.semanticscholar.org/paper/Why-systolic-architectures-Kung/256bbe8e9fa3f5b72c24f1037ab734f9e7dd01c4 |journal=Computer |volume=15 |issue=1 |pages=37–46 |doi=10.1109/MC.1982.1653825 |issn=0018-9162}}</ref>

Figure 12 of Kung's paper<ref name=":0" /> shows an algorithm that performs on-the-fly [[Least squares|least-squares]] computations using one- and two-dimensional systolic arrays.