Editing Supercomputer (section)
===Massively parallel designs=== {{Main|Supercomputer architecture|Parallel computer hardware}} [[File:BlueGeneL cabinet.jpg|thumb|upright|220px|A cabinet of the massively parallel [[Blue Gene]]/L, showing the stacked [[Blade server|blades]], each holding many processors]] The only computer to seriously challenge the Cray-1's performance in the 1970s was the [[ILLIAC IV]]. This machine was the first realized example of a true [[massively parallel]] computer, in which many processors worked together to solve different parts of a single larger problem. In contrast with the vector systems, which were designed to run a single stream of data as quickly as possible, in this concept the computer instead feeds separate parts of the data to entirely different processors and then recombines the results. The ILLIAC's design was finalized in 1966 with 256 processors and offered a speed of up to 1 GFLOPS, compared to the 1970s Cray-1's peak of 250 MFLOPS. However, development problems led to only 64 processors being built, and the system could never operate more quickly than about 200 MFLOPS while being much larger and more complex than the Cray. Another problem was that writing software for the system was difficult, and getting peak performance from it was a matter of serious effort. But the partial success of the ILLIAC IV was widely seen as pointing the way to the future of supercomputing. Cray argued against this, famously quipping that "If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?"<ref>{{cite web|url=https://www.brainyquote.com/quotes/seymour_cray_103779|title=Seymour Cray Quotes|website=BrainyQuote}}</ref> But by the early 1980s, several teams were working on parallel designs with thousands of processors, notably the [[Connection Machine]] (CM) that developed from research at [[MIT]]. The CM-1 used as many as 65,536 simplified custom [[microprocessor]]s connected together in a [[computer network|network]] to share data.
Several updated versions followed; the CM-5 supercomputer is a massively parallel processing computer capable of many billions of arithmetic operations per second.<ref>{{cite web|title=ComputerGK.com : Supercomputers |url=http://www.computergk.com/computers/supercomputers/ |date=3 October 2014 |author=Steve Nelson}}</ref> In 1982, [[Osaka University]]'s [[Supercomputing in Japan|LINKS-1 Computer Graphics System]] used a [[massively parallel]] processing architecture, with 514 [[microprocessor]]s, including 257 [[Zilog Z8000|Zilog Z8001]] [[Central processing unit|control processors]] and 257 [[iAPX]] [[IAPX 86|86/20]] [[Floating-point unit|floating-point processors]]. It was mainly used for rendering realistic [[3D computer graphics]].<ref>{{cite web|url=http://museum.ipsj.or.jp/en/computer/other/0013.html|title=LINKS-1 Computer Graphics System-Computer Museum|website=museum.ipsj.or.jp}}</ref> Fujitsu's VPP500 from 1992 is unusual since, to achieve higher speeds, its processors used [[GaAs]], a material normally reserved for microwave applications due to its toxicity.<ref>{{Cite web | url=https://www.fujitsu.com/global/about/corporate/history/products/computer/supercomputer/vpp500.html |title = VPP500 (1992) - Fujitsu Global}}</ref> [[Fujitsu]]'s [[Numerical Wind Tunnel]] supercomputer used 166 vector processors to gain the top spot in 1994 with a peak speed of 1.7 [[FLOPS|gigaFLOPS (GFLOPS)]] per processor.<ref>{{cite web|url=http://www.netlib.org/benchmark/top500/reports/report94/main.html |title=TOP500 Annual Report 1994 |publisher=Netlib.org |date=1 October 1996 |access-date=9 June 2012}}</ref><ref>{{Cite conference |author1=N. Hirose |author2=M. Fukuda |title=Proceedings High Performance Computing on the Information Superhighway. 
HPC Asia '97 |name-list-style=amp |year=1997 |chapter=Numerical Wind Tunnel (NWT) and CFD Research at National Aerospace Laboratory |pages=99–103 |conference=Proceedings of HPC-Asia '97 |publisher=IEEE Computer Society |doi=10.1109/HPC.1997.592130 |isbn=0-8186-7901-8 }}</ref> The [[Hitachi SR2201]] obtained a peak performance of 600 GFLOPS in 1996 by using 2048 processors connected via a fast three-dimensional [[crossbar switch|crossbar]] network.<ref>H. Fujii, Y. Yasuda, H. Akashi, Y. Inagami, M. Koga, O. Ishihara, M. Syazwan, H. Wada, T. Sumimoto, [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86.5625&rep=rep1&type=pdf Architecture and performance of the Hitachi SR2201 massively parallel processor system], Proceedings of 11th International Parallel Processing Symposium, April 1997, pages 233–241.</ref><ref>Y. Iwasaki, The CP-PACS project, Nuclear Physics B: Proceedings Supplements, Volume 60, Issues 1–2, January 1998, pages 246–254.</ref><ref>A.J. van der Steen, [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.79.7986&rep=rep1&type=pdf Overview of recent supercomputers], Publication of the NCF, Stichting Nationale Computer Faciliteiten, the Netherlands, January 1997.</ref> The [[Intel Paragon]] could have 1000 to 4000 [[Intel i860]] processors in various configurations and was ranked the fastest in the world in 1993. The Paragon was a [[Multiple instruction, multiple data|MIMD]] machine which connected processors via a high-speed two-dimensional mesh, allowing processes to execute on separate nodes, communicating via the [[Message Passing Interface]].<ref>''Scalable input/output: achieving system balance'' by Daniel A. Reed 2003 {{ISBN|978-0-262-68142-1}} page 182</ref> Software development remained a problem, but the CM series sparked off considerable research into this issue.
Similar designs using custom hardware were made by many companies, including the [[Evans & Sutherland ES-1]], [[MasPar]], [[nCUBE]], [[Intel iPSC]] and the [[Goodyear MPP]]. But by the mid-1990s, general-purpose CPU performance had improved so much that a supercomputer could be built using them as the individual processing units, instead of using custom chips. By the turn of the 21st century, designs featuring tens of thousands of commodity CPUs were the norm, with later machines adding [[GPGPU|graphic units]] to the mix.<ref name="Hoffman"/><ref name="Jouppi"/> In 1998, [[David A. Bader|David Bader]] developed the first [[Linux]] supercomputer using commodity parts.<ref name=fernbach>{{cite web| url= https://www.computer.org/press-room/2021-news/david-bader-to-receive-2021-ieee-cs-sidney-fernbach-award | title=David Bader Selected to Receive the 2021 IEEE Computer Society Sidney Fernbach Award|publisher=IEEE Computer Society|date=September 22, 2021 |accessdate= 2023-10-12}}</ref> While at the University of New Mexico, Bader sought to build a supercomputer running Linux using consumer off-the-shelf parts and a high-speed low-latency interconnection network. The prototype utilized an Alta Technologies "AltaCluster" of eight dual-processor, 333 MHz Intel Pentium II computers running a modified Linux kernel.
Bader ported a significant amount of software to provide Linux support for necessary components as well as code from members of the National Computational Science Alliance (NCSA) to ensure interoperability, as none of it had been run on Linux previously.<ref name=IEEEhistory>{{cite journal|last=Bader|first=David A.|journal=IEEE Annals of the History of Computing|title=Linux and Supercomputing: How My Passion for Building COTS Systems Led to an HPC Revolution|date=2021|volume=43|issue=3|pages=73–80|doi=10.1109/MAHC.2021.3101415|s2cid=237318907 |doi-access=free}}</ref> Using the successful prototype design, he led the development of "RoadRunner," the first Linux supercomputer for open use by the national science and engineering community via the National Science Foundation's National Technology Grid. RoadRunner was put into production use in April 1999. At the time of its deployment, it was considered one of the 100 fastest supercomputers in the world.<ref name=IEEEhistory/><ref name="AJRoadRunner">{{cite news|last=Fleck|first=John|title=UNM to crank up $400,000 supercomputer today|newspaper=[[Albuquerque Journal]]|date=April 8, 1999|page=D1}}</ref> Though Linux-based clusters using consumer-grade parts, such as [[Beowulf cluster|Beowulf]], existed prior to the development of Bader's prototype and RoadRunner, they lacked the scalability, bandwidth, and parallel computing capabilities to be considered "true" supercomputers.<ref name=IEEEhistory/> [[File:Processor families in TOP500 supercomputers.svg|thumb|right|The CPU share of [[TOP500]]]] [[File:2x2x2torus.svg|thumb|Diagram of a three-dimensional [[torus interconnect]] used by systems such as Blue Gene, Cray XT3, etc.]] Systems with a massive number of processors generally take one of two paths.
In the [[grid computing]] approach, the processing power of many computers, organized as distributed, diverse administrative domains, is opportunistically used whenever a computer is available.<ref name=Prodan>{{cite book |title=Grid computing: experiment management, tool integration, and scientific workflows |url=https://archive.org/details/gridcomputingexp00prod |url-access=limited |first1=Radu |last1=Prodan |first2=Thomas |last2=Fahringer |year=2007 |isbn=978-3-540-69261-4 |pages=[https://archive.org/details/gridcomputingexp00prod/page/n17 1]–4 |publisher=Springer }}</ref> In the other approach, many processors are used in proximity to each other, e.g. in a [[computer cluster]]. In such a centralized [[massively parallel]] system the speed and flexibility of the ''{{vanchor|interconnect}}'' becomes very important, and modern supercomputers have used various approaches ranging from enhanced [[Infiniband]] systems to three-dimensional [[torus interconnect]]s.<ref name=Bluenight >Knight, Will: "[https://www.newscientist.com/article/dn12145-ibm-creates-worlds-most-powerful-computer/ IBM creates world's most powerful computer]", ''NewScientist.com news service'', June 2007</ref><ref>{{cite web |author=N. R. Agida|year=2005 |title=Blue Gene/L Torus Interconnection Network {{pipe}} IBM Journal of Research and Development | volume= 45, No 2/3 March–May 2005 |page= 265 |url=http://www.cc.gatech.edu/classes/AY2008/cs8803hpc_spring/papers/bgLtorusnetwork.pdf |work=Torus Interconnection Network|display-authors=etal|archive-url=https://web.archive.org/web/20110815102821/http://www.cc.gatech.edu/classes/AY2008/cs8803hpc_spring/papers/bgLtorusnetwork.pdf|archive-date=15 August 2011}}</ref> The use of [[multi-core processor]]s combined with centralization is an emerging direction, e.g.
as in the [[Cyclops64]] system.<ref name="Cellular Computer Architecture Cyclops64' 2005, pages 132β143">{{Cite book | chapter-url=https://link.springer.com/content/pdf/10.1007/11577188_18.pdf |archive-url=https://ghostarchive.org/archive/20221009/https://link.springer.com/content/pdf/10.1007/11577188_18.pdf |archive-date=2022-10-09 |url-status=live |doi = 10.1007/11577188_18|isbn = 978-3-540-29810-6|chapter = Performance Modelling and Optimization of Memory Access on Cellular Computer Architecture Cyclops64|title = Network and Parallel Computing|series = Lecture Notes in Computer Science|year = 2005|last1 = Niu|first1 = Yanwei|last2 = Hu|first2 = Ziang|last3 = Barner|first3 = Kenneth|author3-link = Kenneth Barner|last4 = Gao|first4 = Guang R.|volume = 3779|pages = 132–143}}</ref><ref name=Guangming >''Analysis and performance results of computing betweenness centrality on IBM Cyclops64'' by Guangming Tan, Vugranam C. Sreedhar and Guang R. Gao [[The Journal of Supercomputing]] Volume 56, Number 1, pages 1–24, September 2011</ref> As the price, performance and [[Efficient energy use|energy efficiency]] of [[GPGPU|general-purpose graphics processing units]] (GPGPUs) have improved, a number of [[petaFLOPS]] supercomputers such as [[Tianhe-I]] and [[Nebulae (computer)|Nebulae]] have started to rely on them.<ref name=GPGPU >{{cite web |last=Prickett |first=Timothy |title=Top 500 supers – The Dawning of the GPUs |publisher=Theregister.co.uk |date=31 May 2010 |url=https://www.theregister.co.uk/2010/05/31/top_500_supers_jun2010/ }}</ref> However, other systems such as the [[K computer]] continue to use conventional processors such as [[SPARC]]-based designs and the overall applicability of [[GPGPU]]s in general-purpose high-performance computing applications has been the subject of debate, in that while a GPGPU may be tuned to score well on specific benchmarks, its overall applicability to everyday algorithms may be limited unless significant effort is spent to tune the
application to it.<ref name=HansH >{{cite book |chapter=Considering GPGPU for HPC Centers: Is It Worth the Effort? |author1=Hans Hacker|author2=Carsten Trinitis|author3=Josef Weidendorfer|author4=Matthias Brehm|title=Facing the Multicore-Challenge: Aspects of New Paradigms and Technologies in Parallel Computing|editor1=Rainer Keller|editor2=David Kramer|editor3=Jan-Philipp Weiss |year=2010 |isbn= 978-3-642-16232-9 |pages= 118–121 |chapter-url=https://books.google.com/books?id=-luqXPiew_UC&pg=PA118|publisher=Springer Science & Business Media}}</ref> However, GPUs are gaining ground, and in 2012 the [[Jaguar supercomputer|Jaguar]] supercomputer was transformed into [[Titan (supercomputer)|Titan]] by retrofitting CPUs with GPUs.<ref name=PC>{{cite web |title=Cray's Titan Supercomputer for ORNL Could Be World's Fastest |author=Damon Poeter |publisher=Pcmag.com |date=11 October 2011 |url=https://www.pcmag.com/article2/0,2817,2394515,00.asp}}</ref><ref>{{cite web |title=GPUs Will Morph ORNL's Jaguar into 20-Petaflop Titan |first= Michael |last=Feldman |publisher=Hpcwire.com |date=11 October 2011 |url=http://www.hpcwire.com/hpcwire/2011-10-11/gpus_will_morph_ornl_s_jaguar_into_20-petaflop_titan.html}}</ref><ref name=TitanReg>{{cite web |title=Oak Ridge changes Jaguar's spots from CPUs to GPUs |author= Timothy Prickett Morgan |publisher=Theregister.co.uk |date= 11 October 2011 |url=https://www.theregister.co.uk/2011/10/11/oak_ridge_cray_nvidia_titan/}}</ref> High-performance computers have an expected life cycle of about three years before requiring an upgrade.<ref>[http://www.netl.doe.gov/File%20Library/Research/onsite%20research/R-D190-2014Nov.pdf "The NETL SuperComputer"] {{Webarchive|url=https://web.archive.org/web/20150904034017/http://www.netl.doe.gov/File%20Library/Research/onsite%20research/R-D190-2014Nov.pdf |date=4 September 2015 }}.
page 2.</ref> The [[Gyoukou]] supercomputer is unique in that it uses both a massively parallel design and [[Server immersion cooling|liquid immersion cooling]].
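The three-dimensional torus interconnect mentioned above gives every node a wraparound link in each direction, so there are no edge nodes. A short sketch can make the wraparound concrete; `torus_neighbors` is a hypothetical helper written for this illustration, not any vendor's routing code.

```python
def torus_neighbors(node, dims):
    """Return the six neighbors of `node` in a 3-D torus of shape `dims`.

    Each coordinate wraps around modulo the dimension size, so every
    node has exactly two neighbors per dimension: no boundaries exist.
    """
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x - 1) % nx, y, z), ((x + 1) % nx, y, z),
        (x, (y - 1) % ny, z), (x, (y + 1) % ny, z),
        (x, y, (z - 1) % nz), (x, y, (z + 1) % nz),
    ]

# A "corner" node of a 4x4x4 torus still has six links, because each
# -1 hop wraps around to coordinate 3.
print(torus_neighbors((0, 0, 0), (4, 4, 4)))
```

This uniformity is what makes torus networks attractive for nearest-neighbor communication patterns, such as the stencil computations common in physics simulations, at the cost of longer worst-case paths than a crossbar.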