Editing Sparse matrix (section)

===Compressed sparse row (CSR, CRS or Yale format)===

The ''compressed sparse row'' (CSR) or ''compressed row storage'' (CRS) or Yale format represents a matrix {{math|'''M'''}} by three (one-dimensional) arrays, that respectively contain nonzero values, the extents of rows, and column indices. It is similar to COO, but compresses the row indices, hence the name. This format allows fast row access and matrix-vector multiplications ({{math|'''M'''''x''}}). The CSR format has been in use since at least the mid-1960s, with the first complete description appearing in 1967.<ref>{{cite conference |first1=Aydın |last1=Buluç |first2=Jeremy T. |last2=Fineman |first3=Matteo |last3=Frigo |first4=John R. |last4=Gilbert |first5=Charles E. |last5=Leiserson |authorlink5=Charles E. Leiserson |title=Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks |year=2009 |conference=ACM Symp. on Parallelism in Algorithms and Architectures |url=https://people.eecs.berkeley.edu/~aydin/csb2009.pdf |citeseerx=10.1.1.211.5256}}</ref>

The CSR format stores a sparse {{math|''m'' × ''n''}} matrix {{math|'''M'''}} in row form using three (one-dimensional) arrays {{math|(V, COL_INDEX, ROW_INDEX)}}. Let {{math|NNZ}} denote the number of nonzero entries in {{math|'''M'''}}. (Note that [[Zero-based numbering|zero-based indices]] shall be used here.)

* The arrays {{math|V}} and {{math|COL_INDEX}} are of length {{math|NNZ}}, and contain the non-zero values and the column indices of those values respectively
* {{math|COL_INDEX}} contains the column in which the corresponding entry {{math|V}} is located.
* The array {{math|ROW_INDEX}} is of length {{math|''m'' + 1}} and encodes the index in {{math|V}} and {{math|COL_INDEX}} where the given row starts. This is equivalent to {{math|ROW_INDEX[j]}} encoding the total number of nonzeros above row {{math|j}}.  The last element is {{math|NNZ}} , i.e., the fictitious index in {{Math|V}} immediately after the last valid index {{math|NNZ − 1}}.<ref name=Saad03>{{harvnb|Saad|2003|publisher=SIAM}}</ref>

For example, the matrix
<math display="block">\begin{pmatrix}
5 & 0 & 0 & 0 \\
0 & 8 & 0 & 0 \\
0 & 0 & 3 & 0 \\
0 & 6 & 0 & 0 \\
\end{pmatrix}</math>
is a {{math|4 × 4}} matrix with 4 nonzero elements, hence
 V         = [ 5 8 3 6 ]
 COL_INDEX = [ 0 1 2 1 ]
 ROW_INDEX = [ 0 1 2 3 4 ] 
assuming a zero-indexed language.

To extract a row, we first define:
 row_start = ROW_INDEX[row]
 row_end   = ROW_INDEX[row + 1]

Then we take slices from V and COL_INDEX starting at row_start and ending at row_end.

To extract the row 1 (the second row) of this matrix we set <code>row_start=1</code> and <code>row_end=2</code>. Then we make the slices <code>V[1:2] = [8]</code> and <code>COL_INDEX[1:2] = [1]</code>. We now know that in row 1 we have one element at column 1 with value 8.

In this case the CSR representation contains 13 entries, compared to 16 in the original matrix. The CSR format saves on memory only when {{math|NNZ < (''m'' (''n'' − 1) − 1) / 2}}.

Another example, the matrix
<math display="block">\begin{pmatrix}
10 & 20 &  0 &  0 &  0 &  0 \\
 0 & 30 &  0 & 40 &  0 &  0 \\
 0 &  0 & 50 & 60 & 70 &  0 \\
 0 &  0 &  0 &  0 &  0 & 80 \\
\end{pmatrix}</math>
is a {{math|4 × 6}} matrix (24 entries) with 8 nonzero elements, so
 V         = [ 10 20 30 40 50 60 70 80 ]
 COL_INDEX = [  0  1  1  3  2  3  4  5 ]   
 ROW_INDEX = [  0  2  4  7  8 ]

The whole is stored as 21 entries: 8 in {{math|V}}, 8 in {{math|COL_INDEX}}, and 5 in {{math|ROW_INDEX}}.

* {{math|ROW_INDEX}} splits the array {{math|V}} into rows: <code>(10, 20) (30, 40) (50, 60, 70) (80)</code>, indicating the index of {{math|V}} (and {{math|COL_INDEX}}) where each row starts and ends;
* {{math|COL_INDEX}} aligns values in columns: <code>(10, 20, ...) (0, 30, 0, 40, ...)(0, 0, 50, 60, 70, 0) (0, 0, 0, 0, 0, 80)</code>.

Note that in this format, the first value of {{math|ROW_INDEX}} is always zero and the last is always {{math|NNZ}}, so they are in some sense redundant (although in programming languages where the array length needs to be explicitly stored, {{math|NNZ}} would not be redundant). Nonetheless, this does avoid the need to handle an exceptional case when computing the length of each row, as it guarantees the formula {{math|ROW_INDEX[''i'' + 1] − ROW_INDEX[''i'']}} works for any row {{math|''i''}}. Moreover, the memory cost of this redundant storage is likely insignificant for a sufficiently large matrix.

The (old and new) Yale sparse matrix formats are instances of the CSR scheme. The old Yale format works exactly as described above, with three arrays; the new format combines {{math|ROW_INDEX}} and {{math|COL_INDEX}} into a single array and handles the diagonal of the matrix separately.<ref>{{citation |title=Sparse Matrix Multiplication Package (SMMP) |first1=Randolph E. |last1=Bank |first2=Craig C. |last2=Douglas |journal=Advances in Computational Mathematics |volume=1 |year=1993 |pages=127–137 |doi=10.1007/BF02070824 |s2cid=6412241 |url=http://www.mgnet.org/~douglas/Preprints/pub0034.pdf}}</ref>

For [[Logical matrix | logical]] [[Adjacency matrix | adjacency matrices]], the data array can be omitted, as the existence of an entry in the row array is sufficient to model a binary adjacency relation.

It is likely known as the Yale format because it was proposed in the 1977 Yale Sparse Matrix Package report from Department of Computer Science at Yale University.<ref>{{cite web |url=https://apps.dtic.mil/dtic/tr/fulltext/u2/a047724.pdf |archive-url=https://web.archive.org/web/20190406045412/https://apps.dtic.mil/dtic/tr/fulltext/u2/a047724.pdf |url-status=live |archive-date=April 6, 2019 |title=Yale Sparse Matrix Package |last1=Eisenstat  |first1=S. C. |last2=Gursky |first2=M. C. |last3=Schultz |first3=M. H. |last4=Sherman |first4=A. H. |date=April 1977 |language=English |access-date=6 April 2019 }}</ref>