Cell (processor)
{{short description|Multi-core microprocessor microarchitecture}} {{Use mdy dates|date=April 2025}} {{Infobox CPU architecture |name = Cell Broadband Engine (Cell/B.E.) |designer = STI ([[Sony]], [[Toshiba]] and [[IBM]]) |bits = [[64-bit]] |introduced = {{start date and age|2006|11}} |version = [[PowerPC 2.02]]<ref name="powerpc_archguide">{{Cite web |date=November 16, 2005 |title=PowerPC Architecture Book, Version 2.02 |url=https://www.ibm.com/developerworks/systems/library/es-archguide-v2.html |archive-url=https://web.archive.org/web/20201129235141/https://www.ibm.com/developerworks/systems/library/es-archguide-v2.html |archive-date=November 29, 2020 |website=[[IBM]]}}</ref> |design = [[RISC]] |type = [[Load–store architecture|Load–store]] |encoding = Fixed/Variable (Book E) |branching = [[Status register|Condition code]] |endianness = [[Bi-endian|Big/Bi]] |extensions = |open = |gpr = |fpr = |vpr = |image=CELL BE logo.svg{{!}}class=skin-invert |successor = }} The '''Cell Broadband Engine''' (Cell/B.E.) is a 64-bit [[multi-core processor]] and [[microarchitecture]] developed by [[Sony]], [[Toshiba]], and [[IBM]]—an alliance known as "STI". It combines a general-purpose [[PowerPC]] core, called the Power Processing Element (PPE), with multiple specialized [[coprocessor]]s, known as Synergistic Processing Elements (SPEs), which accelerate tasks such as [[multimedia]] and [[vector processing]].<ref name="ibmrpaper">{{Cite journal |last=Gschwind |first=Michael |last2=Hofstee |first2=H. 
Peter |last3=Flachs |first3=Brian |last4=Hopkins |first4=Martin |last5=Watanabe |first5=Yukio |last6=Yamazaki |first6=Takeshi |date=March–April 2006 |title=Synergistic Processing in Cell's Multicore Architecture |url=https://www.cs.tufts.edu/comp/150IPL/papers/gschwind06synergistic.pdf |journal=IEEE Micro |publisher=IEEE |volume=26 |issue=2 |pages=10–24 |doi=10.1109/MM.2006.41 |s2cid=17834015}}</ref> The architecture was developed over a four-year period beginning in March 2001, with Sony reporting a development budget of approximately {{US$|400 million|link=yes}}.<ref>{{Cite web |title=Cell Designer talks about PS3 and IBM Cell Processors |url=http://ps3.qj.net/Cell-Designer-talks-about-PS3-and-IBM-Cell-Processors/pg/49/aid/14805 |url-status=dead |archive-url=https://web.archive.org/web/20060821114005/http://ps3.qj.net/Cell-Designer-talks-about-PS3-and-IBM-Cell-Processors/pg/49/aid/14805 |archive-date=August 21, 2006 |access-date=March 22, 2007}}</ref> Its first major commercial application was in Sony's [[PlayStation 3]] home video game console, released in 2006. In 2008, a modified version of the Cell processor powered IBM's [[Roadrunner (supercomputer)|Roadrunner]], the first supercomputer to sustain one [[petaFLOPS]]. Other applications include high-performance computing systems from [[Mercury Computer Systems]] and specialized [[arcade system boards]]. 
Cell emphasizes [[memory coherence]], power efficiency, and peak [[Bandwidth (computing)|computational throughput]], but its design presented significant challenges for software development.<ref>{{Cite web |last=Shankland |first=Stephen |date=February 22, 2006 |title=Octopiler seeks to arm Cell programmers |url=http://news.cnet.com/Octopiler+seeks+to+arm+Cell+programmers/2100-1007_3-6042132.html |access-date=March 22, 2007 |website=CNET}}</ref> IBM offered a [[Linux]]-based [[software development kit]] to facilitate programming on the platform.<ref>{{Cite news |date=November 10, 2005 |title=Cell Broadband Engine Software Development Kit Version 1.0 |url=https://lwn.net/Articles/159564/ |access-date=March 22, 2007 |publisher=LWN}}</ref>{{POWER, PowerPC, and Power ISA}} ==History== [[Image:CELL BE processor PS3 board.jpg|thumb|Cell BE as it appears in the PS3 on the motherboard]] [[File:Peter portrait.jpg|thumb|[[Peter Hofstee]], one of the chief architects of the Cell microprocessor]] [[File:Michael Gschwind.jpg|thumb|Michael Gschwind, one of the chief architects of the Cell microprocessor]]In mid-2000, Sony, Toshiba, and IBM formed the STI alliance to develop a new microprocessor.<ref>Krewell, Kevin (February 14, 2005). "Cell Moves Into the Limelight". ''[[Microprocessor Report]]''.</ref> The STI Design Center opened in March 2001 in [[Austin, Texas]]. 
Over the next four years, more than 400 engineers collaborated on the project, with IBM contributing from eleven of its design centers.<ref name="kahle">{{Cite news |date=August 7, 2005 |title=Introduction to the Cell multiprocessor |url=http://researchweb.watson.ibm.com/journal/rd/494/kahle.html |url-status=dead |archive-url=https://web.archive.org/web/20070228043339/http://researchweb.watson.ibm.com/journal/rd/494/kahle.html |archive-date=February 28, 2007 |access-date=March 22, 2007 |publisher=IBM Journal of Research and Development}}</ref> Initial [[patents]] described a configuration with four [[Power Processing Element]]s (PPEs), each paired with eight Synergistic Processing Elements (SPEs), for a theoretical peak performance of 1 teraFLOPS.{{Citation needed|date=April 2025}} However, only a scaled-down design—one PPE with eight SPEs—was ultimately manufactured.<ref name="xbit-65">{{Cite web |title=IBM Produces Cell Processor Using New Fabrication Technology. |url=http://www.xbitlabs.com/news/cpu/display/20070312121941.html |url-status=dead |archive-url=https://web.archive.org/web/20070315000722/http://www.xbitlabs.com/news/cpu/display/20070312121941.html |archive-date=March 15, 2007 |access-date=March 12, 2007 |publisher=X-bit labs}}</ref> Fabrication of the initial Cell chip began on a [[90 nm process|90 nm]] SOI ([[silicon on insulator]]) process.<ref name="xbit-65" /> In March 2007, IBM transitioned production to a [[65 nm process|65 nm process]],<ref name="xbit-65" /><ref>{{Cite news |date=January 30, 2007 |title=65nm CELL processor production started |url=http://www.psu.com/node/7409 |url-status=dead |archive-url=https://web.archive.org/web/20070202024328/http://www.psu.com/node/7409 |archive-date=February 2, 2007 |access-date=May 18, 2007 |publisher=PlayStation Universe}}</ref> followed by a [[45 nm process|45 nm process]] announced in February 2008.<ref name="ArsTechnicaInterview">{{Cite web |date=August 18, 2009 |title=Sony answears our questions 
about the new PlayStation 3 |url=https://arstechnica.com/gaming/news/2009/08/sony-answers-our-questions-about-the-new-playstation-3.ars |access-date=August 19, 2009 |website=[[Ars Technica]]}}</ref> [[Bandai Namco Entertainment]] used the Cell processor in its [[Namco System 357]] and 369 arcade boards.{{Citation needed|date=April 2025}} In May 2008, IBM introduced the [[PowerXCell 8i]], a double-precision variant of the Cell processor, used in systems such as IBM's Roadrunner supercomputer, the first to achieve one petaFLOPS and the fastest until late 2009.<ref name="Gaudin 2008">{{Cite news |last=Gaudin |first=Sharon |date=June 9, 2008 |title=IBM's Roadrunner smashes 4-minute mile of supercomputing |url=http://www.computerworld.com/action/article.do?command=viewArticleBasic&taxonomyName=hardware&articleId=9095318&taxonomyId=12&intsrc=kc_top |url-status=dead |archive-url=https://web.archive.org/web/20081224001155/http://www.computerworld.com/action/article.do?command=viewArticleBasic&taxonomyName=hardware&articleId=9095318&taxonomyId=12&intsrc=kc_top |archive-date=December 24, 2008 |access-date=June 10, 2008 |work=Computerworld}}</ref><ref name="Fildes 2008">{{Cite news |last=Fildes |first=Jonathan |date=June 9, 2008 |title=Supercomputer sets petaflop pace |url=http://news.bbc.co.uk/2/hi/technology/7443557.stm |access-date=June 9, 2008 |work=BBC News}}</ref> IBM ceased development of higher-core-count Cell variants (such as a 32-APU version) in late 2009,<ref name="HPCwire">{{Cite web |date=October 27, 2009 |title=Will Roadrunner Be the Cell's Last Hurrah? 
|url=http://www.hpcwire.com/features/Will-Roadrunner-Be-the-Cells-Last-Hurrah-66707892.html |url-status=dead |archive-url=https://web.archive.org/web/20091031112643/http://www.hpcwire.com/features/Will-Roadrunner-Be-the-Cells-Last-Hurrah-66707892.html |archive-date=October 31, 2009}}</ref><ref name="HeiseOnline">{{Cite web |date=November 20, 2009 |title=SC09: IBM lässt Cell-Prozessor auslaufen |url=http://www.heise.de/newsticker/meldung/SC09-IBM-laesst-Cell-Prozessor-auslaufen-864497.html |access-date=November 21, 2009 |publisher=[[HeiseOnline]]}}</ref> but continued supporting existing Cell-based products.<ref name="DriverHeaven">{{Cite web |date=November 23, 2009 |title=IBM have not stopped Cell processor development |url=http://www.driverheaven.net/news.php?newsid=344 |url-status=dead |archive-url=https://web.archive.org/web/20091125175635/http://www.driverheaven.net/news.php?newsid=344 |archive-date=November 25, 2009 |access-date=November 24, 2009 |publisher=DriverHeaven.net}}</ref> ===Commercialization=== On May 17, 2005, Sony confirmed the Cell configuration used in the [[PlayStation 3]]: one PPE and seven SPEs.<ref name="CELLSpecs">{{Cite web |title=Cell Introduction |url=https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/D21E662845B95D4F872570AB0055404D/$file/2053_IBM_CellIntro.pdf |url-status=dead |archive-url=https://web.archive.org/web/20090326055101/http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/D21E662845B95D4F872570AB0055404D/$file/2053_IBM_CellIntro.pdf |archive-date=March 26, 2009 |access-date=January 14, 2008 |publisher=IBM}}</ref><ref name="e3ign">{{Cite web |last=Roper |first=Chris |date=May 17, 2005 |title=E3 2005: Cell Processor Technology Demos |url=http://gear.ign.com/articles/615/615521p1.html |access-date=March 22, 2007 |website=IGN}}</ref><ref>{{Cite news |last=Becker |first=David |date=February 7, 2005 |title=PlayStation 3 chip has split personality 
|url=http://news.cnet.com/PlayStation+3+chip+has+split+personality/2100-1043_3-5566340.html?tag=nl |access-date=May 18, 2007 |work=[[CNET]]}}</ref> To improve manufacturing [[Semiconductor device fabrication#Device test|yield]], the processor is initially fabricated with eight SPEs. After production, [[Wafer testing|each chip is tested]], and if a defect is found in one SPE, it is disabled using [[laser trimming]]. This approach minimizes waste by utilizing processors that would otherwise be discarded. Even in chips without defects, one SPE is intentionally disabled to ensure consistency across units.<ref>{{Cite web |title=Sony PlayStation 3 Cell Processor |url=http://moss.csc.ncsu.edu/~mueller/cluster/ps3/ |url-status=live |archive-url=https://web.archive.org/web/20071204202236/http://moss.csc.ncsu.edu/~mueller/cluster/ps3/ |archive-date=December 4, 2007 |access-date=January 14, 2008 |publisher=North Carolina State University}}</ref><ref name="GameDevelMag3">{{Cite news |last=Linklater |first=Martin |title=Optimizing Cell Code |work=Game Developer Magazine, April 2007 |pages=15–18 |quote=To increase fabrication yields, Sony ships PlayStation 3 Cell processors with only seven working SPEs. 
And from those seven, one SPE will be used by the operating system for various tasks, This leaves six SPEs for game programmer to use.}}</ref> Of the seven operational SPEs, six are available for developers to use in games and applications, while the seventh is reserved for the console's operating system.<ref name="GameDevelMag3" /> The chip operates at a clock speed of 3.2 GHz.<ref name="e3witpro">{{Cite news |last=Thurrott |first=Paul |date=May 17, 2005 |title=Sony Ups the Ante with PlayStation 3 |url=http://www.windowsitpro.com/Articles/ArticleID/46431/46431.html?Ad=1 |url-status=dead |archive-url=https://web.archive.org/web/20070930155439/http://www.windowsitpro.com/Articles/ArticleID/46431/46431.html?Ad=1 |archive-date=September 30, 2007 |access-date=March 22, 2007 |publisher=WindowsITPro}}</ref> Sony also used the Cell in its [[Zego]] high-performance media computing server. The PPE supports [[simultaneous multithreading]] (SMT) and can execute two threads, while each active SPE supports one thread. 
In the PlayStation 3 configuration, the Cell processor supports up to nine threads.{{Citation needed|date=April 2025}} On June 28, 2005, IBM and Mercury Computer Systems announced a partnership to use Cell processors in [[embedded systems]] for [[medical imaging]], [[aerospace]], and [[seismic processing]], among other fields.<ref name="mcs">{{Cite news |date=April 12, 2007 |title=Mercury Wins IBM PartnerWorld Beacon Award |url=http://www.supercomputingonline.com/article.php?sid=13477 |access-date=May 18, 2007 |publisher=Supercomputing Online}}{{dead link|date=August 2017|bot=medic}}{{cbignore|bot=medic}}</ref> Mercury uses the full Cell processor with eight active SPEs.{{Citation needed|date=April 2025}} Mercury later released [[blade server]]s and [[PCI Express]] accelerator cards based on the architecture.<ref name="gigaaccel180">{{Cite web |date=April 8, 2008 |title=Fixstars Releases Accelerator Board Featuring the PowerXCell 8i |url=http://www.fixstars.com/en/company/press/20080403.html |url-status=dead |archive-url=https://web.archive.org/web/20090105224210/http://www.fixstars.com/en/company/press/20080403.html |archive-date=January 5, 2009 |access-date=August 18, 2008 |publisher=Fixstars Corporation}}</ref> In 2006, IBM introduced the QS20 blade server, offering up to 410 gigaFLOPS per module in single-precision performance. 
The [[QS22]] blade, based on the PowerXCell 8i, was used in IBM's Roadrunner supercomputer.<ref name="Gaudin 2008" /><ref name="Fildes 2008" /> On April 8, 2008, Fixstars Corporation released a PCI Express accelerator board based on the PowerXCell 8i.<ref name="gigaaccel180" /> ==Overview== {{more citations needed|section|date=May 2025}} {{copy edit|section|date=May 2025}} The '''Cell Broadband Engine''', or ''Cell'' as it is more commonly known, is a microprocessor intended as a hybrid of conventional desktop processors (such as the [[Athlon 64]] and [[Core 2]] families) and more specialized high-performance processors, such as the [[NVIDIA]] and [[ATI (brand)|ATI]] graphics processors ([[Graphics processing unit|GPU]]s). The longer name indicates its intended use, namely as a component in current and future [[online distribution]] systems; as such, it may be used in high-definition displays and recording equipment, as well as [[high-definition television|HDTV]] systems. Additionally, the processor may be suited to [[digital imaging]] systems (medical, scientific, ''etc.'') and [[physical simulation]] (''e.g.'', scientific and [[structural engineering]] modeling). 
As used in the PlayStation 3 it has 250 million transistors.<ref>{{Cite web |date=July 13, 2006 |title=A Glimpse Inside The Cell Processor |url=https://www.gamedeveloper.com/programming/a-glimpse-inside-the-cell-processor |access-date=June 19, 2019 |website=[[Gamasutra]]}}</ref> In a simple analysis, the Cell processor can be split into four components: external input and output structures, the main processor called the ''Power Processing Element'' (PPE) (a two-way [[Simultaneous multithreading|simultaneous-multithreaded]] [[PowerPC 2.02]] core),<ref>{{Cite book |last=Koranne |first=Sandeep |url=https://link.springer.com/chapter/10.1007/978-1-4419-0308-2_2 |title=Practical Computing on the Cell Broadband Engine |date=July 15, 2009 |publisher=[[Springer Science+Business Media]] |isbn=978-1-4419-0307-5 |page=17 |chapter=Chapter 2 - The Power Processing Element (PPE) |doi=10.1007/978-1-4419-0308-2_2 |chapter-url=https://link.springer.com/chapter/10.1007/978-1-4419-0308-2_2}}</ref> eight fully functional co-processors called the ''Synergistic Processing Elements'', or SPEs, and a specialized high-bandwidth [[circular data bus]] connecting the PPE, input/output elements and the SPEs, called the ''Element Interconnect Bus'' or EIB. To achieve the high performance needed for mathematically intensive tasks, such as decoding/encoding [[MPEG]] streams, generating or transforming three-dimensional data, or undertaking [[Fourier analysis]] of data, the Cell processor marries the SPEs and the PPE via EIB to give access, via fully [[Direct memory access#Cache coherency|cache coherent]] [[Direct memory access|DMA (direct memory access)]], to both main memory and to other external data storage. To make the best of EIB, and to overlap computation and data transfer, each of the nine processing elements (PPE and SPEs) is equipped with a [[Direct memory access#DMA engine|DMA engine]]. 
Since an SPE's load/store instructions can access only its own local [[scratchpad memory]], each SPE depends entirely on DMA to transfer data to and from the main memory and other SPEs' local memories. A DMA operation can transfer either a single block of up to 16 KB, or a list of 2 to 2048 such blocks. One of the major design decisions in the architecture of Cell is the use of DMA as a central means of intra-chip data transfer, with a view to enabling maximal asynchrony and concurrency in data processing inside a chip.<ref name="geschwindpaper">{{Cite conference |last=Gschwind |first=Michael |year=2006 |title=Chip multiprocessing and the cell broadband engine |url=http://portal.acm.org/citation.cfm?id=1128023 |publisher=ACM |pages=1–8 |doi=10.1145/1128022.1128023 |isbn=1595933026 |access-date=June 29, 2008 |book-title=Proceedings of the 3rd conference on Computing frontiers - CF '06 |s2cid=14226551}}</ref> The PPE, which is capable of running a conventional operating system, has control over the SPEs and can start, stop, interrupt, and schedule processes running on the SPEs. To this end, the PPE has additional instructions relating to the control of the SPEs. Unlike SPEs, the PPE can read and write the main memory and the local memories of SPEs through the standard load/store instructions. The SPEs are not fully autonomous and require the PPE to prime them before they can do any useful work. Because most of the system's computational power comes from the Synergistic Processing Elements, the use of [[Direct memory access|DMA]] as a method of data transfer and the limited local [[memory footprint]] of each SPE pose a major challenge to software developers, who must carefully hand-tune programs to extract maximal performance from this CPU. 
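The DMA size limits described above (a single block of up to 16 KB, or a list of up to 2048 blocks) imply that a large buffer must be split before it can be moved into or out of a local store. The following is a minimal Python sketch of that planning step only; the function names are hypothetical, it does not call any Cell SDK API, and it ignores the alignment and size-granularity rules that real MFC transfers impose.

```python
MAX_DMA_BLOCK = 16 * 1024   # largest single DMA block, per the text (16 KB)
MAX_LIST_ENTRIES = 2048     # largest DMA list length, per the text


def split_into_blocks(total_bytes, max_block=MAX_DMA_BLOCK):
    """Return (offset, size) pairs that cover total_bytes, each size <= max_block.

    Hypothetical helper for illustration; alignment constraints are ignored.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    blocks = []
    offset = 0
    while offset < total_bytes:
        size = min(max_block, total_bytes - offset)
        blocks.append((offset, size))
        offset += size
    return blocks


def group_into_lists(blocks, max_entries=MAX_LIST_ENTRIES):
    """Group blocks into DMA lists of at most max_entries entries each."""
    return [blocks[i:i + max_entries] for i in range(0, len(blocks), max_entries)]


# A 40 KiB buffer needs two full blocks and one partial block.
plan = split_into_blocks(40 * 1024)
```

Under these assumptions, a buffer larger than 2048 × 16 KB (32 MB) cannot be covered by one list and would need several list commands issued in sequence.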
The PPE and bus architecture includes various modes of operation giving different levels of [[memory protection]], allowing areas of memory to be protected from access by specific processes running on the SPEs or the PPE. Both the PPE and SPE are [[RISC]] architectures with a fixed-width 32-bit instruction format. The PPE contains a 64-bit [[general-purpose register]] set (GPR), a 64-bit floating-point register set (FPR), and a 128-bit [[Altivec]] register set. The SPE contains 128-bit registers only. These can be used for scalar data types ranging from 8-bits to 64-bits in size, or for [[SIMD]] computations on various integer and floating-point formats. System memory addresses for both the PPE and SPE are expressed as 64-bit values. Local store addresses internal to the SPU (Synergistic Processor Unit) processor are expressed as a 32-bit word. In documentation relating to Cell, a word is always taken to mean 32 bits, a doubleword means 64 bits, and a quadword means 128 bits. <!-- Far from perfect but trending toward accuracy. Could not find either the virtual address range limit or the physical address range limit. 
Note that a "system address" on the SPU is an address passed to the SPU DMA controller; the LS has only 2^14 addressable locations (257K/16B) ~~~~ --> ===PowerXCell 8i=== In 2008, IBM announced a revised variant of the Cell called the '''PowerXCell 8i''',<ref name="cbe-programming-handbok">{{Cite book |url=http://www.iman1.jo/iman1/images/IMAN1-User-Site-Files/Programming/CellBE_PXCell_Handbook_v1.11_12May08_pub.pdf |title=Cell Broadband Engine Programming Handbook Including the PowerXCell 8i Processor |date=May 12, 2008 |publisher=[[IBM]] |series=Version 1.11 |access-date=March 10, 2018 |archive-url=https://web.archive.org/web/20180311081221/http://www.iman1.jo/iman1/images/IMAN1-User-Site-Files/Programming/CellBE_PXCell_Handbook_v1.11_12May08_pub.pdf |archive-date=March 11, 2018 |url-status=dead}}</ref> which is available in QS22 [[BladeCenter|Blade Servers]] from IBM. The PowerXCell is manufactured on a [[65 nm]] process, and adds support for up to 32 GB of slotted DDR2 memory, as well as dramatically improving [[double-precision floating-point]] performance on the SPEs from a peak of about 12.8 [[GFLOPS]] to 102.4 GFLOPS total for eight SPEs, which, coincidentally, is the same peak performance as the [[NEC SX-9]] vector processor released around the same time. 
The [[Roadrunner (supercomputer)|IBM Roadrunner]] supercomputer, the world's fastest during 2008–2009, consisted of 12,240 PowerXCell 8i processors, along with 6,562 [[Opteron|AMD Opteron]] processors.<ref name="beyond3dpowerxcell">{{Cite web |date=May 2008 |title=IBM announces PowerXCell 8i, QS22 blade server |url=http://www.beyond3d.com/content/news/640 |url-status=dead |archive-url=https://web.archive.org/web/20080616190441/http://www.beyond3d.com/content/news/640 |archive-date=June 16, 2008 |access-date=June 10, 2008 |publisher=Beyond3D |df=mdy-all}}</ref> PowerXCell 8i-powered supercomputers also occupied all of the top six places in the Green500 list, which ranks the world's supercomputers by MFLOPS/Watt ratio.<ref name="The Green 500 list, Nov 2009 ">{{Cite web |title=The Green500 List - November 2009 |url=http://www.green500.org/lists/2009/11/top/list.php |url-status=dead |archive-url=http://archive.wikiwix.com/cache/20110223210120/http://www.green500.org/lists/2009/11/top/list.php |archive-date=February 23, 2011 |df=mdy-all}}</ref> Besides the QS22 and supercomputers, the PowerXCell processor is also available as an accelerator on a PCI Express card and is used as the core processor in the [[QPACE]] project. 
Since the PowerXCell 8i removed the RAMBUS memory interface, and added significantly larger DDR2 interfaces and enhanced SPEs, the chip layout had to be reworked, which resulted in both larger chip die and packaging.<ref>{{Cite web |title=Packaging the Cell Broadband Engine Microprocessor for Supercomputer Applications |url=http://ecadigitallibrary.com/pdf/58thECTC/s31p4p67.pdf |url-status=dead |archive-url=https://web.archive.org/web/20140104204617/http://ecadigitallibrary.com/pdf/58thECTC/s31p4p67.pdf |archive-date=January 4, 2014 |access-date=January 4, 2014}}</ref> ==Architecture== [[File:Schema Cell.png|thumb]] While the Cell chip can have a number of different configurations, the basic configuration is a [[multi-core (computing)|multi-core]] chip composed of one "Power Processor Element" ("PPE") (sometimes called "Processing Element", or "PE"), and multiple "Synergistic Processing Elements" ("SPE").<ref name="cellbriefing">{{Cite news |date=February 7, 2005 |title=Cell Microprocessor Briefing |url=http://pc.watch.impress.co.jp/docs/2005/0208/kaigai153.htm |publisher=IBM, Sony Computer Entertainment Inc., Toshiba Corp.}}</ref> The PPE and SPEs are linked together by an internal high speed bus dubbed "Element Interconnect Bus" ("EIB"). ===Power Processor Element (PPE)=== {{Main | Power Processing Element}} [[File:PPE (Cell).png|thumb|PPE]] The ''PPE''<ref name="cc.gatech.edu">{{Cite web |last=Kim |first=Hyesoon |author-link=Hyesoon Kim |date=Spring 2011 |title=CS4803DGC Design and Programming of Game Console |url=https://faculty.cc.gatech.edu/~hyesoon/spr11/lec_cell.pdf}}</ref><ref>{{Cite book |last=Koranne |first=Sandeep |url=https://books.google.com/books?id=f9FxS-mdF8UC&pg=PA19 |title=Practical Computing on the Cell Broadband Engine |date=2009 |publisher=Springer Science+Business Media |isbn=9781441903082 |page=19}}</ref><ref>{{Cite web |last=Hofstee |first=H. 
Peter |date=2005 |title=All About the Cell Processor |url=http://www.research.ibm.com/people/a/ashwini/E3%202005%20Cell%20Blade%20reports/All_About_Cell_Cool_Chips_Final.pdf |url-status=dead |archive-url=https://web.archive.org/web/20110906154333/http://www.research.ibm.com/people/a/ashwini/E3%202005%20Cell%20Blade%20reports/All_About_Cell_Cool_Chips_Final.pdf |archive-date=September 6, 2011}}</ref> is the [[PowerPC]] based, dual-issue in-order two-way [[Simultaneous multithreading|simultaneous-multithreaded]] [[CPU]] core with a 23-stage pipeline acting as the controller for the eight SPEs, which handle most of the computational workload. PPE has limited out of order execution capabilities; it can perform loads out of order and has delayed execution pipelines. The PPE will work with conventional operating systems due to its similarity to other 64-bit PowerPC processors, while the SPEs are designed for vectorized floating point code execution. The PPE contains a 32 [[KiB]] level 1 instruction [[CPU cache|cache]], a 32 KiB level 1 data cache, and a 512 KiB level 2 cache. 
The size of a cache line is 128 bytes in all caches.<ref name="cbe-programming-handbok" />{{rp|pages=136–137,141}} Additionally, IBM has included an [[AltiVec]] (VMX) unit<ref name="seminar">{{Cite news |date=February 16, 2005 |title=Power Efficient Processor Design and the Cell Processor |url=http://www.cerc.utexas.edu/vlsi-seminar/spring05/slides/2005.02.16.hph.pdf |url-status=dead |archive-url=https://web.archive.org/web/20050426183838/http://www.cerc.utexas.edu/vlsi-seminar/spring05/slides/2005.02.16.hph.pdf |archive-date=April 26, 2005 |access-date=June 12, 2005 |publisher=IBM}}</ref> which is fully pipelined for [[single precision]] floating point (Altivec 1 does not support [[double precision]] floating-point vectors.), 32-bit [[Arithmetic logic unit|Fixed Point Unit (FXU)]] with 64-bit register file per thread, [[Load–store unit|Load and Store Unit (LSU)]], 64-bit [[Floating-point unit|Floating-Point Unit (FPU)]], [[Branch predictor|Branch Unit (BRU)]] and Branch Execution Unit(BXU).<ref name="cc.gatech.edu" /> PPE consists of three main units: Instruction Unit (IU), Execution Unit (XU), and vector/scalar execution unit (VSU). IU contains L1 instruction cache, branch prediction hardware, instruction buffers, and dependency checking logic. XU contains integer execution units (FXU) and load-store unit (LSU). VSU contains all of the execution resources for FPU and VMX. 
Each PPE can complete two double-precision operations per clock cycle using a scalar fused-multiply-add instruction, which translates to 6.4 [[GFLOPS]] at 3.2 GHz; or eight single-precision operations per clock cycle with a vector fused-multiply-add instruction, which translates to 25.6 GFLOPS at 3.2 GHz.<ref name="pacellperf">{{Cite web |last=Chen |first=Thomas |last2=Raghavan |first2=Ram |last3=Dale |first3=Jason |last4=Iwata |first4=Eiji |date=November 29, 2005 |title=Cell Broadband Engine Architecture and its first implementation |url=http://www.ibm.com/developerworks/power/library/pa-cellperf/ |url-status=dead |archive-url=https://web.archive.org/web/20121027092540/http://www.ibm.com/developerworks/power/library/pa-cellperf/ |archive-date=October 27, 2012 |access-date=September 9, 2012 |website=IBM developerWorks}}</ref><!-- use of KiB is intentional, please do not modify --> ====Xenon in Xbox 360==== The PPE was designed specifically for the Cell processor but during development, [[Microsoft]] approached IBM wanting a high-performance processor core for its [[Xbox 360]]. IBM complied and made the tri-core [[Xenon (processor)|Xenon processor]], based on a slightly modified version of the PPE with added VMX128 extensions.<ref>{{Cite web |last=Alexander |first=Leigh |date=January 16, 2009 |title=Processing The Truth: An Interview With David Shippy] |url=https://www.gamedeveloper.com/business/processing-the-truth-an-interview-with-david-shippy |website=[[Gamasutra]]}}</ref><ref>{{Cite news |last=Last |first=Jonathan V. 
|date=December 30, 2008 |title=Playing the Fool |url=https://www.wsj.com/articles/SB123069467545545011 |work=[[Wall Street Journal]]}}</ref> ===Synergistic Processing Element (SPE){{anchor|SPE}}=== {{hatnote|Not to be confused with Signal Processing Engine (SPE), an extension found on [[PowerPC e500]].}} [[File:SPE (cell).png|thumb|SPE]] Each SPE is a dual issue in order processor composed of a "Synergistic Processing Unit",<ref>{{Cite book |url=https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/02E544E65760B0BF87257060006F8F20/$file/SPU_ABI-Specification_1.9.pdf |title=SPU Application Binary Interface Specification |date=July 18, 2008 |access-date=January 24, 2015 |archive-url=https://web.archive.org/web/20141118214923/https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/02E544E65760B0BF87257060006F8F20/$file/SPU_ABI-Specification_1.9.pdf |archive-date=November 18, 2014 |url-status=dead}}</ref> SPU, and a "Memory Flow Controller", MFC ([[Direct memory access|DMA]], [[Memory management unit|MMU]], and [[Bus (computing)|bus]] interface). 
SPEs do not have any [[branch prediction]] hardware, which places a heavy burden on the compiler.<ref name="ibmresearch">{{Cite web |title=IBM Research - Cell |url=http://www.research.ibm.com/cell/ |url-status=dead |archive-url=https://web.archive.org/web/20050614003851/http://www.research.ibm.com/cell/ |archive-date=June 14, 2005 |access-date=June 11, 2005 |website=IBM}}</ref> Each SPE has six execution units divided between odd and even pipelines. The SPU runs a specially developed [[instruction set]] (ISA) with [[128-bit]] [[SIMD]] organization<ref name="seminar" /><ref name="ibmrpaper" /><ref name="spearch">{{Cite web |date=August 15, 2005 |title=A novel SIMD architecture for the Cell heterogeneous chip-multiprocessor |url=http://www.hotchips.org/archives/hc17/2_Mon/HC17.S1/HC17.S1T1.pdf |url-status=dead |archive-url=https://web.archive.org/web/20080709051040/http://www.hotchips.org/archives/hc17/2_Mon/HC17.S1/HC17.S1T1.pdf |archive-date=July 9, 2008 |access-date=January 1, 2006 |publisher=Hot Chips 17 |df=mdy-all}}</ref> for single- and double-precision instructions. With the current generation of the Cell, each SPE contains a 256 [[KiB]] [[1T-SRAM|embedded SRAM]] for instructions and data, called [[Scratchpad memory|"Local Storage"]] (not to be confused with "Local Memory", which in Sony's documents refers to the VRAM), which is visible to the PPE and can be addressed directly by software. Each SPE can support up to 4 [[GiB]] of local store memory. The local store does not operate like a conventional [[CPU cache]], since it is neither transparent to software nor does it contain hardware structures that predict which data to load. Each SPE contains a 128-bit, 128-entry [[register file]] and measures 14.5 mm<sup>2</sup> on a 90 nm process. An SPE can operate on sixteen 8-bit integers, eight 16-bit integers, four 32-bit integers, or four single-precision floating-point numbers in a single clock cycle, as well as a memory operation. 
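The lane counts listed above follow directly from dividing the 128-bit register width by the element width. The Python sketch below works through that arithmetic and models one lane-wise integer add. It is only an illustration: the function names are hypothetical, and the wraparound behavior is a modeling assumption rather than a description of any particular SPU instruction.

```python
REGISTER_BITS = 128              # SPE register width, from the text
REGISTER_COUNT = 128             # entries in the SPE register file, from the text
LOCAL_STORE_BYTES = 256 * 1024   # 256 KiB local store, from the text


def lanes(element_bits, register_bits=REGISTER_BITS):
    """Number of equal-width elements that fit in one SIMD register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return register_bits // element_bits


def simd_add(a, b, element_bits):
    """Lane-wise unsigned integer add with wraparound (illustrative model)."""
    n = lanes(element_bits)
    if len(a) != n or len(b) != n:
        raise ValueError(f"expected {n} lanes")
    mask = (1 << element_bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


# Sizes derived from the figures in the text.
register_file_bytes = REGISTER_COUNT * (REGISTER_BITS // 8)      # 2048 bytes
local_store_quadwords = LOCAL_STORE_BYTES // (REGISTER_BITS // 8)  # 16384 = 2**14
```

The last figure is the number of 16-byte quadwords that fit in a 256 KiB local store, so a quadword index needs 14 bits.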
Note that the SPU cannot directly access system memory; the 64-bit virtual memory addresses formed by the SPU must be passed from the SPU to the SPE memory flow controller (MFC) to set up a DMA operation within the system address space. <!-- Far from perfect but trending toward accuracy. Could not find either the virtual address range limit or the physical address range limit. Note that a "system address" on the SPU is an address passed to the SPU DMA controller; the LS has only 2^14 addressable locations (256K/16B) ~~~~ --> In one typical usage scenario, the system will load the SPEs with small programs (similar to [[thread (computing)|threads]]), chaining the SPEs together to handle each step in a complex operation. For instance, a [[set-top box]] might load programs for reading a DVD, video and audio decoding, and display and the data would be passed off from SPE to SPE until finally ending up on the TV. Another possibility is to partition the input data set and have several SPEs performing the same kind of operation in parallel. At 3.2 GHz, each SPE gives a theoretical 25.6 [[GFLOPS]] of single-precision performance. Compared to its [[personal computer]] contemporaries, the relatively high overall floating-point performance of a Cell processor seemingly dwarfs the abilities of the SIMD unit in CPUs like the [[Pentium 4]] and the [[Athlon 64]]. However, comparing only floating-point abilities of a system is a one-dimensional and application-specific metric. Unlike a Cell processor, such desktop CPUs are more suited to the general-purpose software usually run on personal computers. In addition to executing multiple instructions per clock, processors from Intel and AMD feature [[branch predictor]]s. The Cell is designed to compensate for this with compiler assistance, in which prepare-to-branch instructions are created. 
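The 25.6 GFLOPS single-precision figure quoted above can be reproduced with simple arithmetic. The Python sketch below assumes four 32-bit lanes per 128-bit register and counts a fused multiply-add as two floating-point operations per lane per cycle; that operation count is an assumption made here for illustration, and the function name is hypothetical.

```python
def peak_gflops(lanes, ops_per_lane_per_cycle, clock_ghz):
    """Theoretical peak in GFLOPS: lanes x operations per lane per cycle x clock (GHz)."""
    return lanes * ops_per_lane_per_cycle * clock_ghz


CLOCK_GHZ = 3.2   # clock speed from the text

# Assumption: 4 single-precision lanes, each doing one fused multiply-add
# (counted as 2 operations) per cycle.
spe_single = peak_gflops(lanes=4, ops_per_lane_per_cycle=2, clock_ghz=CLOCK_GHZ)

# Aggregate for eight SPEs, ignoring the PPE and any memory or bus limits.
eight_spes_single = 8 * spe_single
```

These are theoretical peaks only; sustained performance depends on how well DMA transfers and computation can be overlapped.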
For double-precision floating-point operations, as sometimes used in personal computers and often used in scientific computing, Cell performance drops by an order of magnitude, but still reaches 20.8 GFLOPS (1.8 GFLOPS per SPE, 6.4 GFLOPS per PPE). The PowerXCell 8i variant, which was specifically designed for double-precision, reaches 102.4 GFLOPS in double-precision calculations.<ref name="ppcnuxpowerxcell">{{Cite web |date=November 2007 |title=Cell successor with turbo mode - PowerXCell 8i |url=http://www.ppcnux.com/?q=node/7144 |url-status=dead |archive-url=https://web.archive.org/web/20090110230213/http://www.ppcnux.com/?q=node/7144 |archive-date=January 10, 2009 |access-date=June 10, 2008 |publisher=PPCNux}}</ref> Tests by IBM show that the SPEs can reach 98% of their theoretical peak performance running optimized parallel matrix multiplication.<ref name="pacellperf" /> [[Toshiba]] has developed a [[co-processor]] powered by four SPEs, but no PPE, called the [[SpursEngine]] designed to accelerate 3D and movie effects in consumer electronics. Each SPE has a local memory of 256 KB.<ref>{{Cite web |title=Supporting OpenMP on Cell |url=http://researcher.watson.ibm.com/researcher/files/us-zsura/iwomp07_cellOMP.pdf |url-status=dead |archive-url=https://web.archive.org/web/20190108125436/https://researcher.watson.ibm.com/researcher/files/us-zsura/iwomp07_cellOMP.pdf |archive-date=January 8, 2019 |website=[[Thomas J. Watson Research Center|IBM T. J Watson Research]]}}</ref> In total, the SPEs have 2 MB of local memory. ===Element Interconnect Bus (EIB)=== The EIB is a communication bus internal to the Cell processor which connects the various on-chip system elements: the PPE processor, the memory controller (MIC), the eight SPE coprocessors, and two off-chip I/O interfaces, for a total of 12 participants in the PS3 (the number of SPU can vary in industrial applications). The EIB also includes an arbitration unit which functions as a set of traffic lights. 
In some documents, IBM refers to EIB participants as 'units'. The EIB is presently implemented as a circular ring consisting of four 16-byte-wide unidirectional channels which counter-rotate in pairs. When traffic patterns permit, each channel can convey up to three transactions concurrently. As the EIB runs at half the system clock rate the effective channel rate is 16 bytes every two system clocks. At maximum [[Concurrency (computer science)|concurrency]], with three active transactions on each of the four rings, the peak instantaneous EIB bandwidth is 96 bytes per clock (12 concurrent transactions × 16 bytes wide / 2 system clocks per transfer). While this figure is often quoted in IBM literature, it is unrealistic to simply scale this number by processor clock speed. The arbitration unit [[#Bandwidth assessment|imposes additional constraints]]. IBM Senior Engineer [[David Krolak]], EIB lead designer, explains the concurrency model: {{blockquote|A ring can start a new op every three cycles. Each transfer always takes eight beats. That was one of the simplifications we made, it's optimized for streaming a lot of data. If you do small ops, it does not work quite as well. If you think of eight-car trains running around this track, as long as the trains aren't running into each other, they can coexist on the track.<ref name="Krolak">{{Cite web |date=2005-12-06 |title=Meet the experts: David Krolak on the Cell Broadband Engine EIB bus |url=http://www.ibm.com/developerworks/power/library/pa-expert9/ |access-date=2007-03-18 |publisher=IBM}}</ref>}} Each participant on the EIB has one 16-byte read port and one 16-byte write port. The limit for a single participant is to read and write at a rate of 16 bytes per EIB clock (for simplicity often regarded 8 bytes per system clock). 
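The peak figure of 96 bytes per clock, and the per-participant port limit, can be checked with a few lines of Python. This is a minimal sketch of the arithmetic described above; the constant and function names are chosen for this example and do not come from IBM documentation.

```python
RINGS = 4                   # unidirectional 16-byte channels
TRANSFERS_PER_RING = 3      # concurrent transactions per channel, at best
BYTES_PER_BEAT = 16         # channel width in bytes
SYSTEM_CLOCKS_PER_BEAT = 2  # the EIB runs at half the system clock


def peak_bytes_per_system_clock() -> int:
    """Peak instantaneous EIB throughput, in bytes per system clock."""
    concurrent_transfers = RINGS * TRANSFERS_PER_RING  # 12
    return concurrent_transfers * BYTES_PER_BEAT // SYSTEM_CLOCKS_PER_BEAT


def port_bytes_per_system_clock() -> int:
    """Limit for one read port or one write port of a single participant."""
    return BYTES_PER_BEAT // SYSTEM_CLOCKS_PER_BEAT


if __name__ == "__main__":
    print("EIB peak:", peak_bytes_per_system_clock(), "bytes per system clock")
    print("Per port:", port_bytes_per_system_clock(), "bytes per system clock")
```

The first value matches the 96 bytes per clock quoted in IBM literature, and the second matches the simplified figure of 8 bytes per system clock for one port.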
Each SPU processor contains a dedicated [[Direct memory access|DMA]] management queue capable of scheduling long sequences of transactions to various endpoints without interfering with the SPU's ongoing computations; these DMA queues can be managed locally or remotely as well, providing additional flexibility in the control model. Data flows on an EIB channel stepwise around the ring. Since there are twelve participants, the total number of steps around the channel back to the point of origin is twelve. Six steps is the longest distance between any pair of participants. An EIB channel is not permitted to convey data requiring more than six steps; such data must take the shorter route around the circle in the other direction. The number of steps involved in sending the packet has very little impact on transfer latency: the clock speed driving the steps is very fast relative to other considerations. However, longer communication distances are detrimental to the overall performance of the EIB as they reduce available concurrency. <!-- thinking about the Krolak interview, I have no justification for using the term hops, they could be opening the circuit end to end for the transaction; still, it seems more likely that it functions in hops and I do not feel like rewriting this passage right now; changed to steps and stepwise after seeing a comment by HappyVR using this term instead ~~~~ --> Despite IBM's original desire to implement the EIB as a more powerful cross-bar, the circular configuration they adopted to spare resources rarely represents a limiting factor on the performance of the Cell chip as a whole. In the worst case, the programmer must take extra care to schedule communication patterns where the EIB is able to function at high concurrency levels. 
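The routing rule described in this section (twelve stops on the ring, and no transfer travelling more than six steps) can be sketched as follows. The numbering of ring positions from 0 to 11 is hypothetical, since the physical order of the units is not given here, and the choice of clockwise for a six-step tie is an arbitrary assumption of this example.

```python
RING_SIZE = 12               # participants on the EIB, as stated in the text
MAX_STEPS = RING_SIZE // 2   # a channel never carries data further than this


def ring_route(src: int, dst: int, ring_size: int = RING_SIZE) -> tuple[str, int]:
    """Return (direction, steps) for the shorter way around the ring."""
    if not (0 <= src < ring_size and 0 <= dst < ring_size):
        raise ValueError("ring positions must be in range(ring_size)")
    clockwise = (dst - src) % ring_size
    counterclockwise = (src - dst) % ring_size
    if clockwise <= counterclockwise:  # ties go clockwise (assumption)
        return ("clockwise", clockwise)
    return ("counterclockwise", counterclockwise)


if __name__ == "__main__":
    print(ring_route(0, 3))
    print(ring_route(0, 9))
    longest = max(ring_route(s, d)[1]
                  for s in range(RING_SIZE) for d in range(RING_SIZE))
    print("longest route:", longest, "steps")
```

Because the shorter direction is always chosen, no route exceeds <code>MAX_STEPS</code>. Shorter routes also occupy fewer ring segments, which leaves more room for concurrent transfers.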
David Krolak explained: {{blockquote|Well, in the beginning, early in the development process, several people were pushing for a crossbar switch, and the way the bus is designed, you could actually pull out the EIB and put in a crossbar switch if you were willing to devote more silicon space on the chip to wiring. We had to find a balance between connectivity and area, and there just was not enough room to put a full crossbar switch in. So we came up with this ring structure which we think is very interesting. It fits within the area constraints and still has very impressive bandwidth.<ref name="Krolak" />}} ====Bandwidth assessment==== At 3.2 GHz, each channel flows at a rate of 25.6 GB/s. Viewing the EIB in isolation from the system elements it connects, achieving twelve concurrent transactions at this flow rate works out to an abstract EIB bandwidth of 307.2 GB/s. Based on this view many IBM publications depict available EIB bandwidth as "greater than 300 GB/s". This number reflects the peak instantaneous EIB bandwidth scaled by processor frequency.<ref>{{Cite web |title=Cell Multiprocessor Communication Network: Built for Speed |url=http://hpc.pnl.gov/people/fabrizio/papers/ieeemicro-cell.pdf |url-status=dead |archive-url=https://web.archive.org/web/20070107202021/http://hpc.pnl.gov/people/fabrizio/papers/ieeemicro-cell.pdf |archive-date=January 7, 2007 |access-date=March 22, 2007 |publisher=IEEE}}</ref> However, other technical restrictions are involved in the arbitration mechanism for packets accepted onto the bus. The IBM Systems Performance group explained: {{blockquote|Each unit on the EIB can simultaneously send and receive 16 bytes of data every bus cycle. The maximum data bandwidth of the entire EIB is limited by the maximum rate at which addresses are snooped across all units in the system, which is one per bus cycle. 
Since each snooped address request can potentially transfer up to 128 bytes, the theoretical peak data bandwidth on the EIB at 3.2 GHz is 128Bx1.6 GHz {{=}} 204.8 GB/s.<ref name="pacellperf" />}} This quote apparently represents the full extent of IBM's public disclosure of this mechanism and its impact. The EIB arbitration unit, the snooping mechanism, and interrupt generation on segment or page translation faults are not well described in the documentation set as yet made public by IBM.{{Citation needed|date=June 2009}} In practice, effective EIB bandwidth can also be limited by the ring participants involved. While each of the nine processing cores can sustain 25.6 GB/s read and write concurrently, the memory interface controller (MIC) is tied to a pair of XDR memory channels permitting a maximum flow of 25.6 GB/s for reads and writes combined and the two IO controllers are documented as supporting a peak combined input speed of 25.6 GB/s and a peak combined output speed of 35 GB/s. To add further to the confusion, some older publications cite EIB bandwidth assuming a 4 GHz system clock. This reference frame results in an instantaneous EIB bandwidth figure of 384 GB/s and an arbitration-limited bandwidth figure of 256 GB/s. All things considered the theoretic 204.8 GB/s number most often cited is the best one to bear in mind. The ''IBM Systems Performance'' group has demonstrated SPU-centric data flows achieving 197 GB/s on a Cell processor running at 3.2 GHz so this number is a fair reflection on practice as well.<ref name="pacellperf" /> ===Memory and I/O controllers=== Cell contains a dual channel [[Rambus]] XIO macro which interfaces to Rambus [[XDR DRAM|XDR memory]]. The memory interface controller (MIC) is separate from the XIO macro and is designed by IBM. The XIO-XDR link runs at 3.2 Gbit/s per pin. Two 32-bit channels can provide a theoretical maximum of 25.6 GB/s. The I/O interface, also a Rambus design, is known as [[FlexIO]]. 
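The bandwidth figures quoted above for the EIB and for the XDR memory link can be cross-checked with integer arithmetic. The sketch below works in MB/s to avoid floating-point rounding. The function names are made up for this example, and the figures are theoretical peaks, not measurements.

```python
def eib_figures_mb_per_s(system_clock_mhz: int) -> dict[str, int]:
    """Theoretical EIB figures for a given system clock, in MB/s."""
    eib_clock_mhz = system_clock_mhz // 2  # the EIB runs at half the system clock
    per_channel = 16 * eib_clock_mhz       # 16 bytes per EIB clock
    return {
        "per_channel": per_channel,
        "twelve_transfers": 12 * per_channel,   # peak instantaneous figure
        "snoop_limited": 128 * eib_clock_mhz,   # one 128-byte request per EIB clock
    }


def xdr_mb_per_s(mbit_per_pin: int = 3200, channels: int = 2,
                 bits_per_channel: int = 32) -> int:
    """Theoretical XDR bandwidth: pin rate times total data width, in MB/s."""
    return mbit_per_pin * channels * bits_per_channel // 8


if __name__ == "__main__":
    for clock in (3200, 4000):
        gb = {name: value / 1000 for name, value in eib_figures_mb_per_s(clock).items()}
        print(f"{clock} MHz system clock:", gb, "GB/s")
    print("XDR:", xdr_mb_per_s() / 1000, "GB/s")
```

At 3.2 GHz this yields 25.6, 307.2 and 204.8 GB/s for the EIB, and at 4 GHz it yields the 384 and 256 GB/s values found in older publications. The XDR result is 25.6 GB/s.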
The FlexIO interface is organized into 12 lanes, each lane being a unidirectional 8-bit wide point-to-point path. Five of the lanes are inbound to Cell, while the remaining seven are outbound. This provides a theoretical peak bandwidth of 62.4 GB/s (36.4 GB/s outbound, 26 GB/s inbound) at 2.6 GHz. The FlexIO interface can be clocked independently, typically at 3.2 GHz. Four inbound and four outbound lanes support memory coherency. ==Applications== {{Main|Cell microprocessor implementations}} ===Video processing card=== Some companies, such as [[Leadtek]], have released [[PCI-E]] cards based upon the Cell to allow for "faster than real time" transcoding of [[H.264]], [[MPEG-2]] and [[MPEG-4]] video.<ref>{{Cite web |date=November 12, 2009 |title=Leadtek PxVC1100 MPEG-2/H.264 Transcoding Card |url=http://www.legitreviews.com/leadtek-pxvc1100-mpeg-2h-264-transcoding-card_1134}}</ref> ===Blade server=== On August 29, 2007, IBM announced the [[BladeCenter]] QS21. Generating a measured 1.05 gigaFLOPS per watt, with a peak performance of approximately 460 GFLOPS, it was one of the most power-efficient computing platforms at the time of its announcement. A single BladeCenter chassis can achieve 6.4 teraFLOPS, and a standard 42U rack over 25.8 teraFLOPS.<ref>{{Cite press release |title=IBM Doubles Down on Cell Blade |date=August 29, 2007 |publisher=[[IBM]] |location=Armonk, New York |url=http://www.ibm.com/press/us/en/pressrelease/22258.wss |access-date=July 19, 2017}}</ref> On May 13, 2008, IBM announced the [[BladeCenter]] QS22. 
The QS22 introduces the PowerXCell 8i processor with five times the double-precision floating point performance of the QS21, and the capacity for up to 32 GB of DDR2 memory on-blade.<ref>{{Cite press release |title=IBM Offers High Performance Computing Outside the Lab |date=May 13, 2008 |publisher=[[IBM]] |location=Armonk, New York |url=http://www.ibm.com/press/us/en/pressrelease/24180.wss |access-date=July 19, 2017}}</ref> IBM has discontinued the Blade server line based on Cell processors as of January 12, 2012.<ref>{{Cite news |last=Morgan |first=Timothy Prickett |date=June 28, 2011 |title=IBM to snuff last Cell blade server |url=https://www.theregister.co.uk/2011/06/28/ibm_kills_qs22_blade/ |access-date=July 19, 2017 |work=The Register}}</ref> ===PCI Express board=== Several companies provide PCI-e boards utilising the IBM PowerXCell 8i. The performance is reported as 179.2 GFlops (SP), 89.6 GFlops (DP) at 2.8 GHz.<ref>{{Cite web |title=Fixstars Press Release |url=http://www.fixstars.com/en/company/press/20080403.html |url-status=dead |archive-url=https://web.archive.org/web/20090105224210/http://www.fixstars.com/en/company/press/20080403.html |archive-date=January 5, 2009 |access-date=August 18, 2008}}</ref><ref>{{Cite web |title=Cell-based coprocessor card runs Linux |url=http://www.linuxdevices.com/news/NS6832279023.html |url-status=dead |archive-url=https://web.archive.org/web/20090502220203/http://www.linuxdevices.com/news/NS6832279023.html |archive-date=May 2, 2009}}</ref> ===Console video games=== [[Sony]]'s [[PlayStation 3]] [[video game console]] was the first production application of the Cell processor, clocked at 3.2 [[GHz]] and containing seven out of eight operational SPEs, to allow Sony to increase the [[Fabrication (semiconductor)#Device test|yield]] on the processor manufacture. 
Only six of the seven SPEs are accessible to developers as one is reserved by the OS.<ref name="GameDevelMag">{{Cite news |last=Martin Linklater |title=Optimizing Cell Core |work=Game Developer Magazine, April 2007 |pages=15–18 |quote=To increase fabrication yields, Sony ships PlayStation 3 Cell processors with only seven working SPEs. And from those seven, one SPE will be used by the operating system for various tasks, This leaves six SPEs and 1 PPE for game programmers to use.}}</ref> ===Home cinema=== [[File:TOSHIBA 55X1 02.jpg|thumb|B-CAS cards in a Toshiba Cell Regza set-top box, based on the Cell Broadband Engine]] Toshiba has produced [[High-definition television|HDTVs]] using Cell. They presented a system to decode 48 [[Standard-definition television|standard definition]] [[MPEG-2]] streams simultaneously on a [[1080i|1920×1080]] screen.<ref name="techon">{{Cite news |date=April 25, 2005 |title=Toshiba Demonstrates Cell Microprocessor Simultaneously Decoding 48 MPEG-2 Streams |url=http://techon.nikkeibp.co.jp/english/NEWS_EN/20050425/104149/?ST=english |publisher=Tech-On!}}</ref><ref name="IEEE_Spectrum">{{Cite magazine |date=January 1, 2006 |title=Winner: Multimedia Monster |url=http://www.spectrum.ieee.org/jan06/2609 |url-status=dead |archive-url=https://web.archive.org/web/20060118103137/http://www.spectrum.ieee.org/jan06/2609 |archive-date=January 18, 2006 |access-date=January 22, 2006 |magazine=IEEE Spectrum |df=mdy-all}}</ref> This can enable a viewer to choose a channel based on dozens of thumbnail videos displayed simultaneously on the screen. === Laptop PCs === Toshiba produced a laptop, [[Qosmio]] G55, released in 2008, that contains Cell technology embedded into it. 
Its CPU otherwise is an [[Intel Core]] [[x86]]-based chip as is common on [[Toshiba computers]].<ref>{{Cite web |last=Eaton |first=Kit |date=July 15, 2008 |title=Toshiba Qosmio G55 is First Laptop With Cell Processor Aboard |url=https://gizmodo.com/toshiba-qosmio-g55-is-first-laptop-with-cell-processor-5025238 |access-date=November 22, 2024 |website=Gizmodo |language=en-US}}</ref> ===Supercomputing=== IBM's supercomputer, [[IBM Roadrunner]], was a hybrid of General Purpose x86-64 [[Opteron]] as well as Cell processors. This system assumed the #1 spot on the June 2008 Top 500 list as the first supercomputer to run at [[FLOPS|petaFLOPS]] speeds, having gained a sustained 1.026 petaFLOPS speed using the standard [[LINPACK benchmark]]. IBM Roadrunner used the PowerXCell 8i version of the Cell processor, manufactured using 65 nm technology and enhanced SPUs that can handle double precision calculations in the 128-bit registers, reaching double precision 102 GFLOPs per chip.<ref name="roadrunner">{{Cite web |title=Beyond a Single Cell |url=http://www.cs.utk.edu/~dongarra/cell2006/cell-slides/04-Ken-Koch.pdf |url-status=dead |archive-url=https://web.archive.org/web/20090708143100/http://www.cs.utk.edu/~dongarra/cell2006/cell-slides/04-Ken-Koch.pdf |archive-date=July 8, 2009 |access-date=April 6, 2017 |publisher=Los Alamos National Laboratory |df=mdy-all}}</ref><ref name="cellscientific">{{Cite web |last=Williams |first=Samuel |last2=Shalf |first2=John |last3=Oliker |first3=Leonid |last4=Husbands |first4=Parry |last5=Kamil |first5=Shoaib |last6=Yelick |first6=Katherine |date=2005 |title=The Potential of the Cell Processor for Scientific Computing |url=http://repositories.cdlib.org/cgi/viewcontent.cgi?article=4262&context=lbnl |access-date=April 6, 2017 |publisher=ACM Computing Frontiers}}</ref> ===Cluster computing=== {{Main | PlayStation 3 cluster}} Clusters of [[PlayStation 3]] consoles are an attractive alternative to high-end systems based on Cell blades. 
Innovative Computing Laboratory, a group led by [[Jack Dongarra]], in the Computer Science Department at the University of Tennessee, investigated such an application in depth.<ref name="scop3">{{Cite web |title=SCOP3: A Rough Guide to Scientific Computing On the PlayStation 3 |url=http://www.netlib.org/netlib/utk/people/JackDongarra/PAPERS/scop3.pdf |url-status=dead |archive-url=https://web.archive.org/web/20081015202416/http://www.netlib.org/netlib/utk/people/JackDongarra/PAPERS/scop3.pdf |archive-date=October 15, 2008 |access-date=May 8, 2007 |publisher=Computer Science Department, University of Tennessee |df=mdy-all}}</ref> Terrasoft Solutions is selling 8-node and 32-node PS3 clusters with [[Yellow Dog Linux]] pre-installed, an implementation of Dongarra's research. As first reported by ''[[Wired (magazine)|Wired]]'' on October 17, 2007,<ref>{{Cite news |last=Gardiner |first=Bryan |date=October 17, 2007 |title=Astrophysicist Replaces Supercomputer with Eight PlayStation 3s |url=https://www.wired.com/techbiz/it/news/2007/10/ps3_supercomputer/ |access-date=October 17, 2007 |work=[[Wired (website)|Wired]]}}</ref> an interesting application of using PlayStation 3 in a cluster configuration was implemented by Astrophysicist [[Gaurav Khanna (physicist)|Gaurav Khanna]], from the Physics department of [[University of Massachusetts Dartmouth]], who replaced time used on supercomputers with a cluster of eight PlayStation 3s. Subsequently, the next generation of this machine, now called the ''[[PlayStation 3]] Gravity Grid'', uses a network of 16 machines, and exploits the Cell processor for the intended application which is binary [[black hole]] coalescence using [[perturbation theory]]. 
In particular, the cluster performs astrophysical simulations of large [[supermassive black hole]]s capturing smaller compact objects and has generated numerical data that has been published multiple times in the relevant scientific research literature.<ref>{{Cite web |title=PS3 Gravity Grid |url=http://gravity.phy.umassd.edu/ps3.html |publisher=Gaurav Khanna, Associate Professor, College of Engineering, University of Massachusetts Dartmouth}}</ref> The Cell processor version used by the PlayStation 3 has a main CPU and 6 SPEs available to the user, giving the Gravity Grid machine a net of 16 general-purpose processors and 96 vector processors. The machine has a one-time cost of $9,000 to build and is adequate for black-hole simulations which would otherwise cost $6,000 per run on a conventional supercomputer. The black hole calculations are not memory-intensive and are highly localizable, and so are well-suited to this architecture. Khanna claims that the cluster's performance exceeds that of a 100+ Intel Xeon core based traditional Linux cluster on his simulations. 
The PS3 Gravity Grid gathered significant media attention through 2007,<ref>{{Cite web |last=Gaudin |first=Sharon |date=October 24, 2007 |title=PS3 cluster creates homemade, cheaper supercomputer |url=http://www.computerworld.com/s/article/9043942/PS3_cluster_creates_homemade_cheaper_supercomputer |website=Computerworld}}</ref> 2008,<ref>{{Cite news |last=Highfield |first=Roger |date=February 17, 2008 |title=Why scientists love games consoles |url=https://www.telegraph.co.uk/science/science-news/3325757/Why-scientists-love-games-consoles.html |url-status=dead |archive-url=https://web.archive.org/web/20090906114152/http://www.telegraph.co.uk/science/science-news/3325757/Why-scientists-love-games-consoles.html |archive-date=September 6, 2009 |work=The Daily Telegraph |location=London}}</ref><ref>{{Cite news |last=Peckham |first=Matt |date=December 23, 2008 |title=Nothing Escapes the Pull of a PlayStation 3, Not Even a Black Hole |url=https://www.washingtonpost.com/wp-dyn/content/article/2008/12/22/AR2008122201980.html |work=The Washington Post}}</ref> 2009,<ref>{{Cite web |last=Malik |first=Tariq |date=January 28, 2009 |title=Playstation 3 Consoles Tackle Black Hole Vibrations |url=http://www.space.com/businesstechnology/090128-playstation3-blackholes.html |website=[[Space.com]]}}</ref><ref>{{Cite news |last=Lyden |first=Jacki |date=February 21, 2009 |title=Playstation 3: A Discount Supercomputer? 
|url=https://www.npr.org/templates/story/story.php?storyId=100969805 |work=[[NPR]]}}</ref><ref>{{Cite news |last=Wallich |first=Paul |date=April 1, 2009 |title=The Supercomputer Goes Personal |url=https://spectrum.ieee.org/the-supercomputer-goes-personal |work=[[IEEE Spectrum]]}}</ref> and 2010.<ref>{{Cite news |date=September 4, 2010 |title=The PlayStation powered super-computer |url=https://www.bbc.co.uk/news/technology-11168150 |work=BBC News}}</ref><ref>{{Cite news |last=Farrell |first=John |date=November 12, 2010 |title=Black Holes and Quantum Loops: More Than Just a Game |url=https://blogs.forbes.com/johnfarrell/2010/11/12/black-holes-and-quantum-loops-more-than-just-a-game/ |work=Forbes}}</ref> The computational Biochemistry and Biophysics lab at the [[Universitat Pompeu Fabra]], in [[Barcelona]], deployed in 2007 a [[BOINC]] system called [[GPUGRID.net|PS3GRID]]<ref>{{Cite web |title=PS3GRID.net |url=http://www.ps3grid.net}}</ref> for collaborative computing based on the CellMD software, the first one designed specifically for the Cell processor. The United States [[Air Force Research Laboratory]] has deployed a PlayStation 3 cluster of over 1700 units, nicknamed the "Condor Cluster", for analyzing [[high-resolution]] [[satellite imagery]]. 
The Air Force claims the Condor Cluster would be the 33rd largest supercomputer in the world in terms of capacity.<ref>{{Cite web |date=November 30, 2010 |title=Defense Department discusses new Sony PlayStation supercomputer |url=http://blog.cleveland.com/metro/2010/11/defense_department_discusses_n.html}}</ref> The lab has opened up the supercomputer for use by universities for research.<ref>{{Cite web |title=PlayStation 3 Clusters Providing Low-Cost Supercomputing to Universities |url=http://www.govtech.com/technology/PlayStation-3-Providing-Supercomputing-to-Universities.html |url-status=dead |archive-url=https://web.archive.org/web/20130514024226/http://www.govtech.com/technology/PlayStation-3-Providing-Supercomputing-to-Universities.html |archive-date=May 14, 2013 |df=mdy-all}}</ref> ===Distributed computing=== With the help of the computing power of over half a million PlayStation 3 consoles, the distributed computing project [[Folding@home]] has been recognized by ''[[Guinness World Records]]'' as the most powerful distributed network in the world. The first record was achieved on September 16, 2007, as the project surpassed one [[FLOPS|petaFLOPS]], which had never previously been attained by a distributed computing network. Additionally, the collective efforts enabled PS3 alone to reach the petaFLOPS mark on September 23, 2007. In comparison, the world's second-most powerful supercomputer at the time, IBM's [[Blue Gene/L]], performed at around 478.2 teraFLOPS, which means Folding@home's computing power is approximately twice Blue Gene/L's (although the CPU interconnect in Blue Gene/L is more than one million times faster than the mean network speed in Folding@home). As of May 7, 2011, Folding@home runs at about 9.3 x86 petaFLOPS, with 1.6 petaFLOPS generated by 26,000 active PS3s alone. 
===Mainframes=== IBM announced on April 25, 2007, that it would begin integrating its Cell Broadband Engine Architecture microprocessors into the company's [[IBM Z|System z]] line of mainframes.<ref>{{Cite magazine |date=April 26, 2007 |title=IBM Mainframes Go 3-D |url=http://www.eweek.com/article2/0,1895,2122352,00.asp?kc=EWEWKEMLP042807BOE1 |access-date=May 18, 2007 |magazine=[[eWeek]]}}</ref> This has led to a [[gameframe]]. ===Password cracking=== The architecture of the processor makes it better suited to hardware-assisted cryptographic [[brute-force attack]] applications than conventional processors.<ref>{{Cite news |date=November 30, 2007 |title=PlayStation speeds password probe |url=http://news.bbc.co.uk/2/hi/technology/7118997.stm |access-date=January 17, 2011 |work=[[BBC News]]}}</ref> ==Software engineering== {{Main|Cell software development}} Due to the flexible nature of the Cell, its resources can be utilized in several ways, including but not limited to the following computing paradigms:<ref name="scei">{{Cite news |date=March 9, 2005 |title=CELL: A New Platform for Digital Entertainment |url=http://www.research.scea.com/research/html/CellGDC05/ |url-status=dead |archive-url=https://web.archive.org/web/20051028094125/http://www.research.scea.com/research/html/CellGDC05/ |archive-date=October 28, 2005 |publisher=Sony Computer Entertainment Inc. |df=mdy-all}}</ref> ===Job queue=== The PPE maintains a job queue, schedules jobs on the SPEs, and monitors progress. Each SPE runs a "mini kernel" whose role is to fetch a job, execute it, and synchronize with the PPE. ===Self-multitasking of SPEs=== The mini kernel and the scheduling are distributed across the SPEs. Tasks are synchronized using [[Mutual exclusion|mutexes]] or [[Semaphore (programming)|semaphores]] as in a conventional [[operating system]]. Ready-to-run tasks wait in a queue for an SPE to execute them. The SPEs use shared memory for all tasks in this configuration. 
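The job-queue model described above can be imitated on any general-purpose machine. In the following Python sketch, ordinary threads stand in for SPEs and the calling thread plays the role of the PPE. Nothing here uses a real Cell API: <code>NUM_SPES</code>, <code>spe_mini_kernel</code> and <code>run_jobs</code> are hypothetical names, and error handling is omitted for brevity.

```python
import queue
import threading

NUM_SPES = 6  # hypothetical worker count, chosen for illustration


def spe_mini_kernel(spe_id, jobs, results):
    """Worker loop: fetch a job, execute it, report the result, repeat."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel from the "PPE": no more work
            jobs.task_done()
            break
        job_id, func, arg = job
        results.put((job_id, spe_id, func(arg)))
        jobs.task_done()         # synchronize with the "PPE"


def run_jobs(work, num_spes=NUM_SPES):
    """'PPE' side: fill the queue, start the workers, wait, collect results."""
    jobs, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=spe_mini_kernel, args=(i, jobs, results))
               for i in range(num_spes)]
    for worker in workers:
        worker.start()
    for job_id, (func, arg) in enumerate(work):
        jobs.put((job_id, func, arg))
    for _ in workers:
        jobs.put(None)           # one sentinel per worker
    jobs.join()                  # block until every job has been processed
    for worker in workers:
        worker.join()
    by_id = {}
    while not results.empty():
        job_id, _spe_id, value = results.get()
        by_id[job_id] = value
    return [by_id[i] for i in range(len(work))]


if __name__ == "__main__":
    print(run_jobs([(lambda x: x * x, n) for n in range(10)]))
```

Results are keyed by job number because workers may finish in any order. On real hardware the jobs and their data would be moved into each SPE's local store by DMA, which this model does not attempt to represent.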
===Stream processing=== Each SPE runs a distinct program. Data comes from an input stream and is sent to SPEs. When an SPE has terminated the processing, the output data is sent to an output stream. This provides a flexible and powerful architecture for [[stream processing]], and allows explicit scheduling for each SPE separately. Other processors are also able to perform streaming tasks but are limited by the kernel loaded. ===Open source software development=== In 2005, patches enabling Cell support in the Linux kernel were submitted for inclusion by IBM developers.<ref>{{Cite web |last=Bergmann |first=Arnd |date=June 21, 2005 |title=ppc64: Introduce Cell/BPA platform, v3 |url=https://lkml.org/lkml/2005/6/21/390 |access-date=March 22, 2007}}</ref> Arnd Bergmann (one of the developers of the aforementioned patches) also described the Linux-based Cell architecture at [[LinuxTag]] 2005.<ref name="linuxtag">{{Cite web |title=The Cell Processor Programming Model |url=http://www.linuxtag.org/typo3site/freecongress-details.html?talkid=156 |url-status=dead |archive-url=https://web.archive.org/web/20051118073736/http://www.linuxtag.org/typo3site/freecongress-details.html?talkid=156 <!-- Bot retrieved archive --> |archive-date=November 18, 2005 |access-date=June 11, 2005 |website=LinuxTag 2005}}</ref> As of release 2.6.16 (March 20, 2006), the Linux kernel officially supports the Cell processor.<ref>{{Cite web |last=Shankland |first=Stephen |date=March 21, 2006 |title=Linux gets built-in Cell processor support |url=http://news.cnet.com/2100-7344_3-6052314.html |access-date=March 22, 2007 |website=CNET}}</ref> Both PPE and SPEs are programmable in C/C++ using a common API provided by libraries. 
[[Fixstars Solutions]] provides [[Yellow Dog Linux]] for IBM and Mercury Cell-based systems, as well as for the PlayStation 3.<ref>{{Cite web |title=Terra Soft to Provide Linux for PLAYSTATION3 |url=http://us.fixstars.com/news/2006/2006-10-17.shtml |url-status=dead |archive-url=https://web.archive.org/web/20090330150430/http://us.fixstars.com/news/2006/2006-10-17.shtml |archive-date=March 30, 2009}}</ref> Terra Soft strategically partnered with Mercury to provide a Linux Board Support Package for Cell, and support and development of software applications on various other Cell platforms, including the IBM BladeCenter JS21 and Cell QS20, and Mercury Cell-based solutions.<ref>[http://www.terrasoftsolutions.com/products/mercury/intro.shtml Terra Soft - Linux for Cell, PlayStation PS3, QS20, QS21, QS22, IBM System p, Mercury Cell, and Apple PowerPC] {{webarchive |url=https://web.archive.org/web/20070223120949/http://www.terrasoftsolutions.com/products/mercury/intro.shtml |date=February 23, 2007 }}</ref> Terra Soft also maintains the Y-HPC (High Performance Computing) Cluster Construction and Management Suite and Y-Bio gene sequencing tools. Y-Bio is built upon the RPM Linux standard for package management, and offers tools which help bioinformatics researchers conduct their work with greater efficiency.<ref>{{Cite web |date=August 31, 2007 |title=Y-Bio |url=http://www.terrasoftsolutions.com/products/y-bio/programs.shtml |url-status=dead |archive-url=https://web.archive.org/web/20070902135012/http://www.terrasoftsolutions.com/products/y-bio/programs.shtml |archive-date=September 2, 2007 |df=mdy-all}}</ref> IBM has developed a pseudo-filesystem for Linux coined "Spufs" that simplifies access to and use of the SPE resources. 
IBM is currently maintaining a Linux [[kernel (operating system)|kernel]] and [[GNU Debugger|GDB]] ports, while Sony maintains the [[GNU toolchain]] ([[GNU Compiler Collection|GCC]], [[GNU Binutils|binutils]]).<ref>{{Cite news |date=June 25, 2005 |title=Arnd Bergmann on Cell |url=http://www.ibm.com/developerworks/power/library/pa-expert4/ |publisher=IBM developerWorks}}</ref><ref>{{Cite journal |last=Gschwind |first=Michael |last2=Erb |first2=David |last3=Manning |first3=Sid |last4=Nutter |first4=Mark |date=June 2007 |title=An Open Source Environment for Cell Broadband Engine System Software |url=https://ieeexplore.ieee.org/document/4249810 |journal=IEEE Computer |volume=40 |issue=6 |pages=37–47 |doi=10.1109/MC.2007.192}}</ref> In November 2005, IBM released a "Cell Broadband Engine (CBE) Software Development Kit Version 1.0", consisting of a simulator and assorted tools, to its web site. Development versions of the latest kernel and tools for [[Fedora Linux|Fedora Core]] 4 are maintained at the [[Barcelona Supercomputing Center]] website.<ref>{{Cite web |title=Linux on Cell BE-based Systems |url=http://www.bsc.es/projects/deepcomputing/linuxoncell/ |url-status=dead |archive-url=https://web.archive.org/web/20070308121821/http://www.bsc.es/projects/deepcomputing/linuxoncell/ |archive-date=March 8, 2007 |access-date=March 22, 2007 |publisher=Barcelona Supercomputing Center}}</ref> In August 2007, Mercury Computer Systems released a Software Development Kit for PlayStation 3 for High-Performance Computing.<ref>{{Cite press release |title=Mercury Computer Systems Releases Software Development Kit for PLAYSTATION(R)3 for High-Performance Computing |date=August 3, 2007 |publisher=[[Mercury Computer Systems]] |url=http://www.mc.com/mediacenter/pressrelease.aspx?id=10454 |url-status=dead |archive-url=https://web.archive.org/web/20070818062554/http://www.mc.com/mediacenter/pressrelease.aspx?id=10454 |archive-date=August 18, 2007}}</ref> In November 2007, Fixstars 
Corporation released the "CVCell" module, which aims to accelerate several important [[OpenCV]] APIs on Cell. In a series of software calculation tests, they recorded execution times on a 3.2 GHz Cell processor that were between 6 and 27 times faster than those of the same software on a 2.4 GHz Intel Core 2 Duo.<ref>{{Cite web |date=November 28, 2007 |title="CVCell" - Module developed by Fixstars that accelerates OpenCV Library for the Cell/B.E. processor |url=http://www.fixstars.com/en/company/press/20071128.html |url-status=dead |archive-url=https://web.archive.org/web/20100717014736/http://www.fixstars.com/en/company/press/20071128.html |archive-date=July 17, 2010 |access-date=December 12, 2008 |publisher=Fixstars Corporation |df=mdy-all}}</ref> In October 2009, IBM released an [[OpenCL]] driver for POWER6 and CBE. This allows programs written in the cross-platform API to be easily run on Cell PSE.<ref>{{Cite web |date=September 2, 2023 |title=IBM Releases OpenCL Drivers for POWER6 and Cell/B.E. |url=https://www.khronos.org/news/permalink/ibm-releases-opencl-drivers-for-power6-and-cell-b.e/ |website=The Khronos Group |language=en}}</ref> ==Gallery== Illustrations of the different generations of Cell/B.E. processors and the PowerXCell 8i. The images are not to scale; all Cell/B.E. packages measure 42.5×42.5 mm, and the PowerXCell 8i measures 47.5×47.5 mm. <gallery> File:Cell-BE-90nm-lid.jpg|The 90 nm Cell/B.E. that shipped with the first PlayStation 3. It is usually seen with its lid on, as the lid is glued on and not easily removed. File:Cell-BE-90nm.jpg|The 90 nm Cell/B.E. that shipped with the first PlayStation 3. It has its [[Decapping|lid removed]] to show the size of the processor die underneath. File:Cell-BE-90-underside.jpg|The underside of the 90 nm Cell/B.E. processor showing its 1242 solder balls, each 0.6 mm in diameter, and its array of 35 capacitors File:Cell-BE-65nm.jpg|The 65 nm Cell/B.E. that shipped with updated PlayStation 3s. 
It has its lid removed to show the size of the processor die underneath. File:Cell-BE-45nm.jpg|The 45 nm Cell/B.E. that shipped with updated PlayStation 3s such as the Slim and Super Slim versions. It has its lid removed to show the size of the processor die underneath. File:PowerXCell-8i.jpg|The 65 nm high-performance PowerXCell 8i with extra capacitors on top due to decoupling needed for noise introduced by the DDR2 interface </gallery> ==See also== * [[Sony Toshiba IBM Center of Competence for the Cell Processor|STI Center of Competence for the Cell Processor]] * [[Adapteva|Adapteva Epiphany architecture]], a similar network-on-a-chip with local stores and DMA, but more cores and easier off-core communication. * [[Vision Processing Unit]], an emerging class of processor with some similar features * [[Multiprocessor system on a chip]] * [[Cell software development]] * [[Xenon (processor)]] * [[PowerPC]] == Notes == {{notelist}} ==References== {{Reflist|colwidth=30em}} ==External links== * [http://www.ibm.com/developerworks/power/cell/ Cell Broadband Engine resource center] * [https://web.archive.org/web/20050923093239/http://cell.scei.co.jp/ Sony Computer Entertainment Incorporated's Cell resource page] * [http://www.cmpware.com/Docs/ProductBrief_3.0_CellBE.pdf Cmpware Configurable Multiprocessor Development Kit for Cell BE] * [https://web.archive.org/web/20120510093208/http://www.realworldtech.com/page.cfm?ArticleID=rwt021005084318 ISSCC 2005: The CELL Microprocessor, a comprehensive overview of the CELL microarchitecture] * [https://web.archive.org/web/20120209214616/http://members.forbes.com/forbes/2006/0130/076.html Holy Chip!] 
* [http://www.ibm.com/developerworks/power/library/pa-tacklecell5/index.html The little broadband engine that could] * [https://arstechnica.com/articles/paedia/cpu/cell-1.ars Introducing the IBM/Sony/Toshiba Cell Processor — Part I: the SIMD processing units] * [https://arstechnica.com/articles/paedia/cpu/cell-2.ars Introducing the IBM/Sony/Toshiba Cell Processor -- Part II: The Cell Architecture] * [http://www.gamezero.com/team-0/articles/interviews/dr_h_peter_hofstee/ The Soul of Cell: An interview with Dr. H. Peter Hofstee] {{Sony Corp}} {{Cell microprocessor segments}} {{Navboxes|list1= {{IBM}} {{Sony}} {{Toshiba}} {{PlayStation 3}} {{RISC-based processor architectures}} }} {{DEFAULTSORT:Cell (Microprocessor)}} [[Category:Cell BE architecture|*]] [[Category:IBM microprocessors]] [[Category:PowerPC microprocessors]] [[Category:SIMD computing]] [[Category:Sony semiconductors]] [[Category:Power microprocessors]] [[Category:64-bit microprocessors]]