==The RISC concept==
{{Main|Reduced instruction set computer}}

Both RISC and MIPS were developed from the realization that the vast majority of programs use only a small minority of a processor's available instruction set. In a famous 1978 paper, [[Andrew S. Tanenbaum]] demonstrated that a complex 10,000-line high-level program could be represented using a simplified [[instruction set architecture]] with an 8-bit fixed-length opcode.<ref name=implications>{{cite journal |journal=Communications of the ACM |date=March 1978 |volume=21 |number=3 |pages=237–246 |first=Andrew |last=Tanenbaum |title=Implications of Structured Programming for Machine Architecture |doi=10.1145/359361.359454 |s2cid=3261560 |url=https://research.vu.nl/ws/files/110789436/11056 |doi-access=free}}</ref> This was roughly the same conclusion reached at [[IBM]], whose studies of its own code running on mainframes like the [[IBM 360]] showed that programs used only a small subset of all the instructions available. Both of these studies suggested that one could produce a much simpler CPU that would still run most real-world code. Another finding, not fully explored at the time, was Tanenbaum's note that 81% of the constants were either 0, 1, or 2.<ref name=implications/>

These realizations were taking place as the [[microprocessor]] market was moving from 8-bit to 16-bit designs, with 32-bit designs about to appear. Those designs were premised on the goal of replicating some of the more well-respected existing ISAs from the mainframe and minicomputer world. For instance, the [[National Semiconductor NS32000]] started out as an effort to produce a single-chip implementation of the [[VAX-11]], which had a rich instruction set with a wide variety of [[addressing mode]]s. The [[Motorola 68000]] was similar in general layout.

To provide this rich set of instructions, CPUs used [[microcode]] to decode the user-visible instruction into a series of internal operations. This microcode represented perhaps {{frac|4}} to {{frac|3}} of the transistors of the overall design. If, as these studies suggested, the majority of the opcodes would never be used in practice, then this significant resource was being wasted. Building the same processor with the unused instructions removed would make it smaller and thus less expensive, while spending those transistors on improving performance, rather than on decoding instructions that would never be used, would make it faster.

The RISC concept was to take advantage of both effects, producing a CPU at the same level of complexity as the 68000 but much faster. To do this, RISC concentrated on adding many more [[Processor register|registers]], small bits of memory holding temporary values that can be accessed very rapidly. This contrasts with normal [[main memory]], which might take several cycles to access. By providing more registers, and making sure the compilers actually used them, programs would run much faster. Additionally, the speed of the processor would be more closely defined by its clock speed, because less of its time would be spent waiting for memory accesses. Transistor for transistor, a RISC design would outperform a conventional CPU.

On the downside, the instructions being removed generally performed several "sub-instructions".
For instance, the <code>ADD</code> instruction of a traditional design generally came in several flavours: one that added the numbers in two registers and placed the result in a third, another that added numbers found in main memory and put the result in a register, and so on. The RISC designs, on the other hand, included only a single flavour of any particular instruction; <code>ADD</code>, for instance, would ''always'' use registers for all operands. This forced the programmer to write additional instructions to load values from memory when needed, making a RISC program "less dense".

In the era of expensive memory this was a real concern, notably because memory was also much slower than the CPU. Since a RISC design's <code>ADD</code> would actually require four instructions (two loads, an add, and a store), the machine would have to perform many more memory accesses to read the extra instructions, potentially slowing it down considerably. This was offset to some degree by the fact that the new designs used what was then a very large [[32-bit]] instruction word, allowing small constants to be folded directly into the instruction instead of having to be loaded separately. Additionally, the result of one operation is often used soon after by another, so by skipping the write to memory and keeping the result in a register, the program did not end up much larger and could, in theory, run much faster. For instance, a string of instructions carrying out a series of mathematical operations might require only a few loads from memory, while the majority of the numbers being used would be either constants in the instructions or intermediate values left in the registers from prior calculations. In a sense, this technique uses some registers to ''shadow'' memory locations: the registers act as proxies for the memory locations until their final values have been determined at the end of a group of instructions.
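The cost of the load-store trade-off can be made concrete with a small simulation. The following C program is a minimal sketch of a hypothetical load-store machine, not any real instruction set: because its <code>ADD</code> operates only on registers, computing <code>c = a + b</code> with all three values in memory takes two loads, an add, and a store.

<syntaxhighlight lang="c">
#include <stdint.h>
#include <stdio.h>

/* A minimal sketch, not any real ISA: a load-store machine in which
   arithmetic works only on registers, so c = a + b costs four
   instructions (two loads, an add, a store) where a memory-to-memory
   CISC ADD would cost one. All names and sizes are hypothetical. */

static uint32_t mem[16]; /* word-addressed main memory */
static uint32_t reg[8];  /* the register file */

static void load_(int rd, int addr)        { reg[rd] = mem[addr]; }           /* LOAD  rd, addr     */
static void add_(int rd, int rs1, int rs2) { reg[rd] = reg[rs1] + reg[rs2]; } /* ADD   rd, rs1, rs2 */
static void store_(int addr, int rs)       { mem[addr] = reg[rs]; }           /* STORE addr, rs     */

int main(void) {
    enum { A = 0, B = 1, C = 2 };  /* memory addresses of a, b, c */
    mem[A] = 7; mem[B] = 35;

    load_(1, A);    /* LOAD  r1, a      */
    load_(2, B);    /* LOAD  r2, b      */
    add_(3, 1, 2);  /* ADD   r3, r1, r2 */
    store_(C, 3);   /* STORE c, r3      */

    printf("c = %u\n", (unsigned)mem[C]);  /* prints "c = 42" */
    return 0;
}
</syntaxhighlight>

If a later instruction needed <code>c</code> again, it could read <code>r3</code> directly rather than reloading it from memory, which is the register "shadowing" described above.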
To the casual observer, it was not clear that the RISC concept would improve performance; it might even make it worse. The only way to be sure was to simulate it. The results of such simulations were clear: in test after test, every simulation showed an enormous overall benefit in performance from this design.

Where the two projects, RISC and MIPS, differed was in the handling of the registers. MIPS simply added lots of registers and left it to the compilers (or [[assembly language]] programmers) to make use of them. RISC, on the other hand, added circuitry to the CPU to assist the compiler. RISC used the concept of [[register window]]s, in which the entire "register file" was broken down into blocks, allowing the compiler to "see" one block for global variables and another for local variables.

The idea was to make one particularly common operation, the [[procedure call]], extremely easy to implement. Almost all [[programming language]]s use a system known as an ''activation record'' or ''stack frame'' for each procedure, which contains the address from which the procedure was called, the data (parameters) that were passed in, and space for any result values that need to be returned. In the vast majority of cases these frames are small, typically with three or fewer inputs and one or no outputs (and sometimes an input is reused as an output). In the Berkeley design, then, a register window was a set of several registers, enough that the entire procedure stack frame would most likely fit entirely within the window.

In this case, the call into and return from a procedure is simple and extremely fast. A single instruction sets up a new block of registers, a new register window, and then, with operands passed into the procedure in the "low end" of the new window, the program jumps into the procedure. On return, the results are placed in the window at the same end, and the procedure exits. The register windows are set up to overlap at the ends, so that the results from the call simply "appear" in the window of the caller, ''with no data having to be copied'' (a small simulation of this overlap appears at the end of this section). Thus the common procedure call does not have to interact with main memory, greatly accelerating it.

On the downside, this approach means that procedures with large numbers of local variables are problematic, while procedures with fewer variables waste registers, an expensive resource. There are a finite number of register windows in the design, e.g., eight, so procedures can only be nested that many levels deep before the register windowing mechanism reaches its limit; once the last window is reached, no new window can be set up for another nested call. Conversely, if procedures are only nested a few levels deep, the registers in windows above the deepest nesting level can never be accessed at all and are completely wasted. It was Stanford's work on compilers that led them to ignore the register window concept, believing that an efficient compiler could make better use of the registers than a fixed system in hardware. (The same reasoning would apply to a smart assembly language programmer.)
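The overlap mechanism can be sketched in C as follows. The window and overlap sizes here are made up for illustration (the real Berkeley designs used different dimensions): because the caller's "outgoing" registers are physically the same registers as the callee's "incoming" ones, a call merely slides the current window pointer, and no values are copied.

<syntaxhighlight lang="c">
#include <stdio.h>

/* A minimal sketch of overlapping register windows, with made-up
   sizes: each window holds 8 registers and overlaps its neighbour
   by 4, so the caller's outgoing registers are physically the same
   as the callee's incoming registers. Call/return copies no data. */

#define NWINDOWS 8
#define WINSIZE  8
#define OVERLAP  4
#define NREGS    (NWINDOWS * (WINSIZE - OVERLAP) + OVERLAP)

static int regfile[NREGS]; /* one large physical register file */
static int cwp = 0;        /* current window pointer (window index) */

/* Map register r of the current window to a physical register. */
static int *winreg(int r) {
    return &regfile[cwp * (WINSIZE - OVERLAP) + r];
}

static void call_proc(void) { cwp++; } /* a real design would spill to memory when the last window is reached */
static void ret_proc(void)  { cwp--; }

/* Callee: its two arguments are simply there, at the low (incoming)
   end of its window; it leaves the sum in the same place as the result. */
static void callee(void) {
    *winreg(0) = *winreg(0) + *winreg(1);
}

int main(void) {
    /* Caller places arguments at the high (outgoing) end of its
       window; physically these are the callee's incoming registers. */
    *winreg(WINSIZE - OVERLAP + 0) = 7;   /* first argument  */
    *winreg(WINSIZE - OVERLAP + 1) = 35;  /* second argument */

    call_proc(); /* slide the window: no registers are copied */
    callee();
    ret_proc();

    /* The result "appears" in the caller's outgoing registers. */
    printf("result = %d\n", *winreg(WINSIZE - OVERLAP + 0)); /* prints 42 */
    return 0;
}
</syntaxhighlight>

Note that <code>callee</code> receives no explicit arguments: after the window slides, its registers 0 and 1 are the very physical registers the caller wrote, which is why the common call does not touch main memory. The sketch also shows the limits described above: with eight windows, a ninth nested call would force a spill to memory, while shallow nesting leaves the upper windows unused.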