Editing Cray-1 (section)

===Cray's approach===
Cray studied the failure of the STAR and learned from it{{citation needed|date=November 2020}}. He decided that in addition to fast vector processing, his design would also require excellent all-around scalar performance. That way when the machine switched modes, it would still provide superior performance. Additionally he noticed that the workloads could be dramatically improved in most cases through the use of [[processor register|registers]].

Just as earlier machines had ignored the fact that most operations were being applied to many data points, the STAR ignored the fact that those same data points would be repeatedly operated on. Whereas the STAR would read and process the same memory five times to apply five vector operations on a set of data, it would be much faster to read the data into the CPU's registers once, and then apply the five operations. However, there were limitations with this approach. Registers were significantly more expensive in terms of circuitry, so only a limited number could be provided. This implied that Cray's design would have less flexibility in terms of vector sizes. Instead of reading any sized vector several times as in the STAR, the Cray-1 would have to read only a portion of the vector at a time, but it could then run several operations on that data prior to writing the results back to memory. Given typical workloads, Cray felt that the small cost incurred by being required to break large sequential memory accesses into segments was a cost well worth paying.

Since the typical vector operation would involve loading a small set of data into the vector registers and then running several operations on it, the vector system of the new design had its own separate pipeline. For instance, the multiplication and addition units were implemented as separate hardware, so the results of one could be internally pipelined into the next, the instruction decode having already been handled in the machine's main pipeline. Cray referred to this concept as ''chaining'', as it allowed programmers to "chain together" several instructions and extract higher performance.