Editing Microcode (section)

==History==
===Early examples===
In 1947, the design of the [[Whirlwind (computer)|MIT Whirlwind]] introduced the concept of a control store as a way to simplify computer design and move beyond ''[[ad hoc]]'' methods. The control store is a [[diode matrix]]: a two-dimensional lattice, where one dimension accepts "control time pulses" from the CPU's internal clock, and the other connects to control signals on gates and other circuits. A "pulse distributor" takes the pulses generated by the [[CPU clock]] and breaks them up into eight separate time pulses, each of which activates a different row of the lattice. When the row is activated, it activates the control signals connected to it.<ref>{{Cite tech report |last1=Everett |first1=R.R. |last2=Swain |first2=F.E. |year=1947 |title=Whirlwind I Computer Block Diagrams |publisher=MIT Servomechanisms Laboratory |id=R-127 |url=http://www.cryptosmith.com/wp-content/uploads/2009/05/whirlwindr-127.pdf |access-date=June 21, 2006 |url-status=dead |archive-url=https://web.archive.org/web/20120617112919/http://www.cryptosmith.com/wp-content/uploads/2009/05/whirlwindr-127.pdf |archive-date=June 17, 2012}}</ref>

In 1951, [[Maurice Wilkes]]<ref>{{multiref|{{cite tech report |last=Wilkes |first=Maurice |year=1951 |title=The Best Way to Design an Automatic Calculating Machine |institution=[[University of Manchester]]}}|{{cite book |last=Wilkes |first=Maurice |chapter=The Best Way to Design an Automatic Calculating Machine |chapter-url=https://www.cs.princeton.edu/courses/archive/fall09/cos375/BestWay.pdf |editor-first=M. |editor-last=Campbell-Kelly |title=The early British computer conferences |publisher=MIT Press |date=1989 |isbn=978-0-262-23136-7 |pages=182–4 }}}}</ref> enhanced this concept by adding ''conditional execution'', a concept akin to a [[Conditional (programming)|conditional]] in computer software. His initial implementation consisted of a pair of matrices: the first one generated signals in the manner of the Whirlwind control store, while the second matrix selected which row of signals (the microprogram instruction word, so to speak) to invoke on the next cycle. Conditionals were implemented by providing a way that a single line in the control store could choose from alternatives in the second matrix. This made the control signals conditional on the detected internal signal. Wilkes coined the term ''microprogramming'' to describe this feature and distinguish it from a simple control store.

===The 360===
{{main|System/360}}
Microcode remained relatively rare in computer design as the cost of the ROM needed to store the code was not significantly different from the cost of custom control logic. This changed through the early 1960s with the introduction of mass-produced [[core memory]] and [[core rope]], which was far less expensive than dedicated logic based on diode arrays or similar solutions. The first to take real advantage of this was [[IBM]] in their 1964 [[System/360]] series. This allowed the machines to have a very complex instruction set, including operations that matched high-level language constructs like formatting binary values as decimal strings, encoding the complex series of internal steps needed for this task in low cost memory.<ref name=IBM>{{cite web |url=https://www.righto.com/2022/01/ibm360model50.html |title=Simulating the IBM 360/50 mainframe from its microcode |website=Ken Shirriff's blog |first=Ken |last=Shirriff}}</ref>

But the real value in the 360 line was that one could build a series of machines that were completely different internally, yet run the same ISA. For a low-end machine, one might use an 8-bit ALU that requires multiple cycles to complete a single 32-bit addition, while a higher end machine might have a full 32-bit ALU that performs the same addition in a single cycle. These differences could be implemented in control logic, but the cost of implementing a completely different decoder for each machine would be prohibitive. Using microcode meant all that changed was the code in the ROM. For instance, one machine might include a [[floating point unit]] and thus its microcode for multiplying two numbers might be only a few lines line, whereas on the same machine without the FPU this would be a program that did the same using multiple additions, and all that changed was the ROM.<ref name=IBM/>

The outcome of this design was that customers could use a low-end model of the family to develop their software, knowing that if more performance was ever needed, they could move to a faster version and nothing else would change. This lowered the barrier to entry and the 360 was a runaway success. By the end of the decade, the use of microcode was ''de rigueur'' across the mainframe industry.

===Moving up the line===
[[File:Motorola 68000 die.JPG|thumb|The microcode (and "nanocode") of the [[Motorola 68000]] is stored in the two large square blocks in the upper right and controlled by circuitry to the right of it. It takes up a significant amount of the total chip surface.]]
Early [[minicomputer]]s were far too simple to require microcode, and were more similar to earlier mainframes in terms of their instruction sets and the way they were decoded. But it was not long before their designers began using more powerful [[integrated circuit]]s that allowed for more complex ISAs. By the mid-1970s, most new minicomputers and [[superminicomputer]]s were using microcode as well, such as most models of the [[PDP-11]] and, most notably, most models of the [[VAX]], which included high-level instruction not unlike those found in the 360.<ref>{{cite book |date=May 1988 |title=VLSI VAX Micro-Architecture |first=Bob |last=Supnik |publisher=Digital Equipment |url=http://simh.trailing-edge.com/docs/microarch.pdf}}</ref>

The same basic evolution occurred with [[microprocessor]]s as well. Early designs were extremely simple, and even the more powerful 8-bit designs of the mid-1970s like the [[Zilog Z80]] had instruction sets that were simple enough to be implemented in dedicated logic. By this time, the control logic could be patterned into the same die as the CPU, making the difference in cost between ROM and logic less of an issue. However, it was not long before these companies were also facing the problem of introducing higher-performance designs but still wanting to offer [[backward compatibility]]. Among early examples of microcode in micros was the [[Intel 8086]].<ref name=microcode/>

Among the ultimate implementations of microcode in microprocessors is the [[Motorola 68000]]. This offered a highly [[orthogonal instruction set]] with a wide variety of [[addressing mode]]s, all implemented in microcode. This did not come without cost, according to early articles, about 20% of the chip's surface area (and thus cost) is the microcode system<ref>{{cite magazine |magazine=Byte |date= April 1983 |title=Design Philosophy Behind Motorola's MC68000 |first= Thomas |last= Starnes |url=http://www.easy68k.com/paulrsm/doc/dpbm68k1.htm}}</ref> and {{cn|reason=later estimates suggest approximately 23,000|date=December 2024}} of the systems 68,000 transistors were part of the microcode system.

===RISC enters===
While companies continued to compete on the complexity of their instruction sets, and the use of microcode to implement these was unquestioned, in the mid-1970s an internal project in IBM was raising serious questions about the entire concept. As part of a project to develop a high-performance all-digital [[telephone switch]], a team led by [[John Cocke (computer scientist)|John Cocke]] began examining huge volumes of performance data from their customer's 360 (and [[System/370]]) programs. This led them to notice a curious pattern: when the ISA presented multiple versions of an instruction, the [[compiler]] almost always used the simplest one, instead of the one most directly representing the code. They learned that this was because those instructions were always implemented in hardware, and thus run the fastest. Using the other instruction might offer higher performance on some machines, but there was no way to know what machine they were running on. This defeated the purpose of using microcode in the first place, which was to hide these distinctions.<ref name=risc>{{Cite journal
 | last1 = Cocke | first1 = John
 | last2 = Markstein | first2 = Victoria
 | doi = 10.1147/rd.341.0004
 | url = https://www.cis.upenn.edu/~milom/cis501-Fall11/papers/cocke-RISC.pdf
 | title = The evolution of RISC technology at IBM
 | journal = IBM Journal of Research and Development
 | volume = 34| issue = 1
 | pages = 4–11
 | date=January 1990
 }}</ref>

The team came to a radical conclusion: "Imposing microcode between a computer and its users imposes an expensive overhead in performing the most frequently executed instructions."<ref name=risc/>

The result of this discovery was what is today known as the [[RISC]] concept. The complex microcode engine and its associated ROM is reduced or eliminated completely, and those circuits instead dedicated to things like additional registers or a wider ALU, which increases the performance of every program. When complex sequences of instructions are needed, this is left to the compiler, which is the entire purpose of using a compiler in the first place. The basic concept was soon picked up by university researchers in California, where simulations suggested such designs would trivially outperform even the fastest conventional designs. It was one such project, at the [[University of California, Berkeley]], that introduced the term RISC.

The industry responded to the concept of RISC with both confusion and hostility, including a famous dismissive article by the VAX team at Digital.<ref name=comments>{{cite journal |url=https://dl.acm.org/doi/pdf/10.1145/641914.641918 |title=Comments on "The Case for the Reduced Instruction Computer" |first1=Douglas |last1=Clark |first2=William |last2=Strecker |date=September 1980 |journal=ACM|volume=8 |issue=6 |pages=34–38 |doi=10.1145/641914.641918 |s2cid=14939489 |url-access=subscription }}</ref> A major point of contention was that implementing the instructions outside of the processor meant it would spend much more time reading those instructions from memory, thereby slowing overall performance no matter how fast the CPU itself ran.<ref name=comments/> Proponents pointed out that simulations clearly showed the number of instructions was not much greater, especially when considering compiled code.<ref name=risc/>

The debate raged until the first commercial RISC designs emerged in the second half of the 1980s, which easily outperformed the most complex designs from other companies. By the late 1980s it was over; even DEC was abandoning microcode for their [[DEC Alpha]] designs, and CISC processors switched to using hardwired circuitry, rather than microcode, to perform many functions.  For example, the [[Intel 80486]] uses hardwired circuitry to fetch and decode instructions, using microcode only to execute instructions; register-register move and arithmetic instructions required only one microinstruction, allowing them to be completed in one clock cycle.<ref>{{cite conference|url=https://ieeexplore.ieee.org/document/63682|title=The execution pipeline of the Intel i486 CPU|book-title= Digest of Papers Compcon Spring '90. Thirty-Fifth IEEE Computer Society International Conference on Intellectual Leverage|publisher=[[IEEE]]|isbn=0-8186-2028-5|location=San Francisco, CA|doi=10.1109/CMPCON.1990.63682|url-access=subscription}}</ref>  The [[Pentium Pro]]'s fetch and decode hardware fetches instructions and decodes them into series of micro-operations that are passed on to the execution unit, which schedules and executes the micro-operations, possibly doing so [[out-of-order execution|out-of-order]].  Complex instructions are implemented by microcode that consists of predefined sequences of micro-operations.<ref>{{cite web|url=http://stffrdhrn.github.io/content/2019/Intel_PentiumPro.pdf|title=Pentium Pro Processor At 150, 166, 180, and 200 MHz|publisher=[[Intel]]|date=November 1995|type=Datasheet}}</ref>

Some processor designs use machine code that runs in a special mode, with special instructions, available only in that mode, that have access to processor-dependent hardware, to implement some low-level features of the instruction set. The DEC Alpha, a pure RISC design, used [[PALcode]] to implement features such as [[translation lookaside buffer]] (TLB) miss handling and interrupt handling,<ref name="axp-architecture-manual">{{cite book|url=http://bitsavers.org/pdf/dec/alpha/Sites_AlphaAXPArchitectureReferenceManual_2ed_1995.pdf|title=Alpha AXP Architecture Reference Manual|edition=Second|chapter=Part I / Common Architecture, Chapter 6 Common PALcode Architecture|publisher=[[Digital Press]]|date=1995|isbn=1-55558-145-5}}</ref> as well as providing, for Alpha-based systems running [[OpenVMS]], instructions requiring interlocked memory access that are similar to instructions provided by the [[VAX]] architecture.<ref name="axp-architecture-manual" /> CMOS [[IBM System/390]] CPUs, starting with the G4 processor, and [[z/Architecture]] CPUs use [[millicode]] to implement some instructions.<ref>{{cite journal|last=Rogers|first=Bob|title=The What and Why of zEnterprise Millicode|journal=IBM Systems Magazine|date=Sep–Oct 2012|url=http://www.ibmsystemsmag.com/mainframe/administrator/performance/millicode_rogers/|archive-url=https://web.archive.org/web/20121009085728/http://www.ibmsystemsmag.com/mainframe/administrator/performance/millicode_rogers/|archive-date=October 9, 2012|url-status=dead}}</ref>