==History==
A video presentation of the history and technology of the Blue Gene project was given at the Supercomputing 2020 conference.<ref>{{citation|title=Supercomputing 2020 conference, Test of Time award video presentation|url=https://www.youtube.com/watch?v=jaF2gUTgME4|access-date=2024-06-23|archive-date=2024-06-23|archive-url=https://web.archive.org/web/20240623001343/https://www.youtube.com/watch?v=jaF2gUTgME4&gl=US&hl=en|url-status=live}}</ref>

In December 1999, IBM announced a US$100 million research initiative for a five-year effort to build a massively [[parallel computer]], to be applied to the study of biomolecular phenomena such as [[protein folding]].<ref>{{cite journal |url=http://www.research.ibm.com/journal/sj/402/allen.pdf |title=Blue Gene: A Vision for Protein Science using a Petaflop Supercomputer |journal=IBM Systems Journal |volume=40 |issue=2 |year=2001 |access-date=2017-10-23}}</ref> The research and development was pursued by a large multi-disciplinary team at the [[Thomas J. Watson Research Center|IBM T. J. Watson Research Center]], initially led by [[William R. Pulleyblank]].<ref>{{citation|url=http://www.businessweek.com/stories/2001-11-06/a-talk-with-the-brain-behind-blue-gene|archive-url=https://archive.today/20141211052804/http://www.businessweek.com/stories/2001-11-06/a-talk-with-the-brain-behind-blue-gene|url-status=dead|archive-date=December 11, 2014|journal=[[BusinessWeek]]|title=A Talk with the Brain behind Blue Gene|date=November 6, 2001}}</ref> The project had two main goals: to advance understanding of the mechanisms behind protein folding via large-scale simulation, and to explore novel ideas in massively parallel machine architecture and software. Major areas of investigation included how to use this novel platform to effectively meet its scientific goals, how to make such massively parallel machines more usable, and how to achieve performance targets at a reasonable cost through novel machine architectures.

The initial design for Blue Gene was based on an early version of the [[Cyclops64]] architecture, designed by [[Monty Denneau]]. In parallel, Alan Gara had started working on an extension of the [[QCDOC]] architecture into a more general-purpose supercomputer. The [[United States Department of Energy|US Department of Energy]] started funding the development of this system, and it became known as Blue Gene/L (L for Light). Development of the original Blue Gene architecture continued under the name Blue Gene/C (C for Cyclops) and, later, Cyclops64.

Architecture and chip logic design for the Blue Gene systems was done at the IBM T. J. Watson Research Center, chip design was completed and chips were manufactured by [[IBM Microelectronics]], and the systems were built at [[IBM Rochester|IBM Rochester, MN]].

In November 2004, a 16-[[Cabinet (computer)|rack]] system, with each rack holding 1,024 compute nodes, achieved first place in the [[TOP500]] list with a [[LINPACK benchmarks|LINPACK benchmark]] performance of 70.72 TFLOPS.<ref name=Top500/> It thereby overtook NEC's [[Earth Simulator]], which had held the title of the fastest computer in the world since 2002.
From 2004 through 2007 the Blue Gene/L installation at LLNL<ref>{{cite web |url=http://asc.llnl.gov/computing_resources/bluegenel/ |title=BlueGene/L |access-date=2007-10-05 |url-status=dead |archive-url=https://web.archive.org/web/20110718034455/https://asc.llnl.gov/computing_resources/bluegenel/ |archive-date=2011-07-18 }}</ref> gradually expanded to 104 racks, achieving 478 TFLOPS Linpack and 596 TFLOPS peak. The LLNL Blue Gene/L installation held first place in the TOP500 list for 3.5 years, until June 2008, when it was overtaken by IBM's Cell-based [[IBM Roadrunner|Roadrunner]] system at [[Los Alamos National Laboratory]], the first system to surpass the 1 petaFLOPS mark.

While the LLNL installation was the largest Blue Gene/L installation, many smaller installations followed. The November 2006 [[TOP500]] list showed 27 computers with the ''eServer Blue Gene Solution'' architecture. For example, three racks of Blue Gene/L were housed at the [[San Diego Supercomputer Center]].

While the TOP500 measures performance on a single benchmark application, Linpack, Blue Gene/L also set records for performance on a wider set of applications. Blue Gene/L was the first supercomputer ever to sustain over 100 [[FLOPS|TFLOPS]] on a real-world application, namely a three-dimensional molecular dynamics code (ddcMD) simulating the solidification (nucleation and growth processes) of molten metal under high pressure and temperature conditions. This achievement won the 2005 [[Gordon Bell Prize]]. In June 2006, [[National Nuclear Security Administration|NNSA]] and IBM announced that Blue Gene/L had achieved 207.3 TFLOPS on a quantum chemistry application ([[Qbox]]).<ref>{{Cite web|url=http://www.hpcwire.com/hpc/701665.html|archive-url=https://web.archive.org/web/20070928004334/http://www.hpcwire.com/hpc/701665.html|url-status=dead|title=hpcwire.com|archive-date=September 28, 2007}}</ref> At Supercomputing 2006,<ref>{{cite web|url=http://sc06.supercomputing.org|title=SC06|website=sc06.supercomputing.org|access-date=13 October 2017 |url-status=dead |archive-url=https://web.archive.org/web/20171013224817/http://sc06.supercomputing.org/ |archive-date=2017-10-13}}</ref> Blue Gene/L won the award in every class of the HPC Challenge awards.<ref>{{cite web |url=http://www.hpcchallenge.org/custom/index.html?lid=103&slid=212 |title=HPC Challenge Award Competition |access-date=2006-12-03 |url-status=dead |archive-url=https://web.archive.org/web/20061211183441/http://www.hpcchallenge.org/custom/index.html?lid=103&slid=212 |archive-date=2006-12-11 }}</ref>

In 2007, a team from the [[IBM Almaden Research Center]] and the [[University of Nevada, Reno|University of Nevada]] ran an [[artificial neural network]] almost half as complex as the brain of a mouse for the equivalent of a second (the network was run at 1/10 of normal speed for 10 seconds).<ref>{{cite news|title=Mouse brain simulated on computer|url=http://news.bbc.co.uk/2/hi/technology/6600965.stm|agency=BBC News|date=April 27, 2007|archive-url=https://web.archive.org/web/20070525081051/http://news.bbc.co.uk/2/hi/technology/6600965.stm|archive-date=2007-05-25}}</ref>

===The name===
The name Blue Gene comes from what the machine was originally designed to do: help biologists understand the processes of [[protein folding]] and [[genetics|gene development]].<ref>{{cite web|url=http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/bluegene/|archive-url=https://web.archive.org/web/20120403015415/http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/bluegene/|url-status=dead|archive-date=April 3, 2012|title=IBM100 - Blue Gene|date=7 March 2012|website=03.ibm.com|access-date=13 October 2017}}</ref> "Blue" is a traditional moniker that IBM uses for many of its products and [[IBM#Brand and reputation|the company itself]]. The original Blue Gene design was renamed "Blue Gene/C" and eventually [[Cyclops64]]. The "L" in Blue Gene/L comes from "Light", as that design's original name was "Blue Light". The "P" version was designed to be a [[Petascale computing|petascale]] design. "Q" is just the letter after "P".<ref>{{cite book|url=https://books.google.com/books?id=TTC7BQAAQBAJ&q=%22Blue+Gene%2FL%22+%22light%22&pg=PA318|title=Supercomputing: 28th International Supercomputing Conference, ISC 2013, Leipzig, Germany, June 16-20, 2013. Proceedings|first1=Julian M.|last1=Kunkel|first2=Thomas|last2=Ludwig|first3=Hans|last3=Meuer|date=12 June 2013|publisher=Springer|access-date=13 October 2017|via=Google Books|isbn=9783642387500}}</ref>

===Major features===
The Blue Gene/L supercomputer was unique in the following aspects:<ref>{{cite journal |url=http://www.research.ibm.com/journal/rd49-23.html |journal=IBM Journal of Research and Development |title=Blue Gene |volume=49 |issue=2/3 |year=2005 |archive-date=2006-12-01 |access-date=2006-12-14 |archive-url=https://web.archive.org/web/20061201043139/http://www.research.ibm.com/journal/rd49-23.html |url-status=live }}</ref>
* Trading processor speed for lower power consumption. Blue Gene/L used low-frequency, low-power embedded PowerPC cores with floating-point accelerators. While the performance of each chip was relatively low, the system could achieve better power efficiency for applications that could use large numbers of nodes.
* Dual processors per node with two working modes: co-processor mode, where one processor handles computation and the other handles communication, and virtual-node mode, where both processors are available to run user code and share the computation and communication load.
* System-on-a-chip design. All components of a node were embedded on a single chip, with the exception of 512 MB of external DRAM.
* A large number of nodes (scalable in increments of 1,024 up to at least 65,536).
* Three-dimensional [[torus interconnect]] with auxiliary networks for global communications (broadcast and reductions), I/O, and management.
* Lightweight OS per node for minimum system overhead (system noise).

===Architecture===
The Blue Gene/L architecture was an evolution of the QCDSP and [[QCDOC]] architectures. Each Blue Gene/L compute or I/O node was a single [[Application-specific integrated circuit|ASIC]] with associated [[Dynamic random access memory|DRAM]] memory chips. The ASIC integrated two 700 MHz [[PowerPC 440]] embedded processors, each with a double-pipeline, double-precision [[floating-point unit]] (FPU), a [[CPU cache|cache]] sub-system with built-in DRAM controller, and the logic to support multiple communication sub-systems. The dual FPUs gave each Blue Gene/L node a theoretical peak performance of 5.6 [[FLOPS|GFLOPS (gigaFLOPS)]]: each core's double-pipeline FPU could complete two fused multiply-adds (four floating-point operations) per cycle, so 2 cores × 4 operations × 700 MHz = 5.6 GFLOPS. The two CPUs were not [[Cache coherency|cache coherent]] with one another.

Compute nodes were packaged two per compute card, with 16 compute cards (thus 32 nodes) plus up to 2 I/O nodes per node board. A cabinet/rack contained 32 node boards, for up to 1,024 compute nodes per rack.<ref>{{cite web|url=https://asc.llnl.gov/computing_resources/bluegenel/configuration.html|title=BlueGene/L Configuration|first=Lynn|last=Kissel|website=asc.llnl.gov|access-date=13 October 2017|archive-date=17 February 2013|archive-url=https://web.archive.org/web/20130217032440/https://asc.llnl.gov/computing_resources/bluegenel/configuration.html|url-status=dead}}</ref>
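These clock and packaging figures account for the peak performance numbers quoted in the History section. The following minimal check of the arithmetic, written in C purely as an illustration (it is not Blue Gene software, and the variable names are invented for this example), multiplies out the figures given in this article:

<syntaxhighlight lang="c">
#include <stdio.h>

int main(void)
{
    /* Figures quoted in this article. */
    const double core_ghz        = 0.7;  /* 700 MHz PowerPC 440 cores            */
    const int    cores_per_node  = 2;
    const int    flops_per_cycle = 4;    /* two FMA pipelines, 2 operations each */
    const int    nodes_per_rack  = 1024; /* 32 node boards x 32 compute nodes    */
    const int    llnl_racks      = 104;  /* final LLNL configuration             */

    double node_gflops = core_ghz * cores_per_node * flops_per_cycle; /* 5.6   */
    double rack_tflops = node_gflops * nodes_per_rack / 1000.0;       /* ~5.73 */
    double llnl_tflops = rack_tflops * llnl_racks;                    /* ~596  */

    printf("per node: %.1f GFLOPS peak\n", node_gflops);
    printf("per rack: %.2f TFLOPS peak\n", rack_tflops);
    printf("104-rack LLNL system: %.0f TFLOPS peak\n", llnl_tflops);
    return 0;
}
</syntaxhighlight>

The result, about 596 TFLOPS, matches the peak figure cited above for the final 104-rack LLNL installation.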
By integrating all essential sub-systems on a single chip and using low-power logic, each compute or I/O node dissipated about 17 watts (including DRAM). The low power per node allowed aggressive packaging of up to 1,024 compute nodes, plus additional I/O nodes, in a standard [[19-inch rack]], within reasonable limits on electrical power supply and air cooling. The system's performance metrics, in terms of [[FLOPS per watt]], FLOPS per m<sup>2</sup> of floorspace, and FLOPS per unit cost, allowed it to scale to very high performance.

With so many nodes, component failures were inevitable. The system was able to electrically isolate faulty components, down to a granularity of half a rack (512 compute nodes), to allow the machine to continue to run.

Each Blue Gene/L node was attached to three parallel communications networks: a [[dimension|3D]] [[torus interconnect|toroidal network]] for peer-to-peer communication between compute nodes, a collective network for collective communication (broadcasts and reduce operations), and a global interrupt network for [[Barrier (computer science)|fast barriers]]. The I/O nodes, which ran the [[Linux]] [[operating system]], provided communication to storage and external hosts via an [[Ethernet]] network, and handled filesystem operations on behalf of the compute nodes. A separate, private Ethernet management network provided access to any node for configuration, [[booting]] and diagnostics.

To allow multiple programs to run concurrently, a Blue Gene/L system could be partitioned into electronically isolated sets of nodes. The number of nodes in a partition had to be a positive [[integer]] power of 2, with at least 2<sup>5</sup> = 32 nodes. To run a program on Blue Gene/L, a partition of the computer first had to be reserved. The program was then loaded and run on all the nodes within the partition, and no other program could access nodes within the partition while it was in use. Upon completion, the partition's nodes were released for future programs to use.

Blue Gene/L compute nodes used a minimal [[operating system]] supporting a single user program. Only a subset of [[POSIX]] calls was supported, and only one process could run at a time on a node in co-processor mode (or one process per CPU in virtual mode). Programmers needed to implement [[green threads]] to simulate local concurrency. Application development was usually performed in [[C (programming language)|C]], [[C++]], or [[Fortran]] using [[Message Passing Interface|MPI]] for communication.
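As an illustration of this programming model, the sketch below is a minimal, generic MPI program in C. It uses only standard MPI calls and is not specific to Blue Gene/L's toolchain; on Blue Gene/L, a reduction like the one shown would typically be carried by the collective network described above.

<syntaxhighlight lang="c">
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    double local, total;

    MPI_Init(&argc, &argv);               /* start the MPI runtime            */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id in the job     */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total processes in the partition */

    local = (double)rank;                 /* stand-in for a per-node partial result */

    /* Sum the per-node values onto rank 0 (a collective reduce operation). */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d processes = %.0f\n", size, total);

    MPI_Finalize();
    return 0;
}
</syntaxhighlight>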
Beyond these compiled languages, some scripting languages such as [[Ruby (programming language)|Ruby]]<ref>{{Cite web|title=Compute Node Ruby for Bluegene/L|website=www.ece.iastate.edu|url=http://www.ece.iastate.edu/~crb002/cnr.html|archive-url=https://web.archive.org/web/20090211071506/http://www.ece.iastate.edu:80/~crb002/cnr.html|url-status=dead|archive-date=February 11, 2009}}</ref> and [[Python (programming language)|Python]]<ref>{{cite conference |url=http://us.pycon.org/2011/home/ |title=Python for High Performance Computing |author=William Scullin |date=March 12, 2011 |location=Atlanta, GA |access-date=March 12, 2011 |archive-date=March 9, 2011 |archive-url=https://web.archive.org/web/20110309110314/http://us.pycon.org/2011/home/ |url-status=live }}</ref> were also ported to the compute nodes.

IBM published BlueMatter, the application developed to exercise Blue Gene/L, as open source.<ref>{{cite web|url=https://github.com/IBM/BlueMatter|title=Blue Matter source code|website=GitHub|access-date=February 28, 2020}}</ref> The source documents how the torus and collective interfaces were used by applications, and may serve as a base for others to exercise the current generation of supercomputers.
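To make the torus addressing used by such applications concrete, the following sketch computes a node's six nearest neighbors on a 3D torus with wraparound. The extents and function names are illustrative only and are not part of any Blue Gene API; 8 × 8 × 8 is chosen here because it gives 512 nodes, the half-rack granularity mentioned above.

<syntaxhighlight lang="c">
#include <stdio.h>

/* Illustrative torus extents: 8 x 8 x 8 = 512 nodes. */
#define NX 8
#define NY 8
#define NZ 8

/* Wraparound in one dimension: stepping off one edge re-enters on the
   opposite edge, which is what distinguishes a torus from a plain mesh. */
static int wrap(int c, int extent)
{
    return (c + extent) % extent;
}

int main(void)
{
    int x = 0, y = 3, z = 7; /* coordinates of an example node */

    /* Each node has exactly six nearest neighbors, two per dimension. */
    printf("x neighbors: (%d,%d,%d) (%d,%d,%d)\n",
           wrap(x - 1, NX), y, z, wrap(x + 1, NX), y, z);
    printf("y neighbors: (%d,%d,%d) (%d,%d,%d)\n",
           x, wrap(y - 1, NY), z, x, wrap(y + 1, NY), z);
    printf("z neighbors: (%d,%d,%d) (%d,%d,%d)\n",
           x, y, wrap(z - 1, NZ), x, y, wrap(z + 1, NZ));
    return 0;
}
</syntaxhighlight>

For the example node (0,3,7), the x-neighbors are (7,3,7) and (1,3,7), showing the wraparound at the x = 0 edge.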