Cell (processor)
===Synergistic Processing Element (SPE){{anchor|SPE}}===
{{hatnote|Not to be confused with Signal Processing Engine (SPE), an extension found on [[PowerPC e500]].}}
[[File:SPE (cell).png|thumb|SPE]]
Each SPE is a dual-issue, in-order processor composed of a "Synergistic Processing Unit" (SPU)<ref>{{Cite book |url=https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/02E544E65760B0BF87257060006F8F20/$file/SPU_ABI-Specification_1.9.pdf |title=SPU Application Binary Interface Specification |date=July 18, 2008 |access-date=January 24, 2015 |archive-url=https://web.archive.org/web/20141118214923/https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/02E544E65760B0BF87257060006F8F20/$file/SPU_ABI-Specification_1.9.pdf |archive-date=November 18, 2014 |url-status=dead}}</ref> and a "Memory Flow Controller" (MFC), which comprises the [[Direct memory access|DMA]], [[Memory management unit|MMU]], and [[Bus (computing)|bus]] interface. SPEs do not have any [[branch prediction]] hardware, which places a heavy burden on the compiler.<ref name="ibmresearch">{{Cite web |title=IBM Research - Cell |url=http://www.research.ibm.com/cell/ |url-status=dead |archive-url=https://web.archive.org/web/20050614003851/http://www.research.ibm.com/cell/ |archive-date=June 14, 2005 |access-date=June 11, 2005 |website=IBM}}</ref> Each SPE has six execution units divided between odd and even pipelines. The SPU runs a specially developed [[instruction set]] (ISA) with [[128-bit]] [[SIMD]] organization<ref name="seminar" /><ref name="ibmrpaper" /><ref name="spearch">{{Cite web |date=August 15, 2005 |title=A novel SIMD architecture for the Cell heterogeneous chip-multiprocessor |url=http://www.hotchips.org/archives/hc17/2_Mon/HC17.S1/HC17.S1T1.pdf |url-status=dead |archive-url=https://web.archive.org/web/20080709051040/http://www.hotchips.org/archives/hc17/2_Mon/HC17.S1/HC17.S1T1.pdf |archive-date=July 9, 2008 |access-date=January 1, 2006 |publisher=Hot Chips 17 |df=mdy-all}}</ref> for single and double
precision instructions. With the current generation of the Cell, each SPE contains a 256 [[KiB]] [[1T-SRAM|embedded SRAM]] for instructions and data, called [[Scratchpad memory|"Local Storage"]] (not to be confused with "Local Memory", which in Sony's documents refers to the VRAM), which is visible to the PPE and can be addressed directly by software. Each SPE can support up to 4 [[GiB]] of local store memory. The local store does not operate like a conventional [[CPU cache]], since it is neither transparent to software nor does it contain hardware structures that predict which data to load. Each SPE contains a 128-entry [[register file]] of 128-bit registers and measures 14.5 mm<sup>2</sup> on a 90 nm process. An SPE can operate on sixteen 8-bit integers, eight 16-bit integers, four 32-bit integers, or four single-precision floating-point numbers in a single clock cycle, as well as perform a memory operation. The SPU cannot directly access system memory; the 64-bit virtual memory addresses formed by the SPU must be passed from the SPU to the SPE memory flow controller (MFC) to set up a DMA operation within the system address space. <!-- Far from perfect but trending toward accuracy. Could not find either the virtual address range limit or the physical address range limit. Note that a "system address" on the SPU is an address passed to the SPU DMA controller; the LS has only 2^14 addressable locations (256K/16B) ~~~~ -->

In one typical usage scenario, the system loads the SPEs with small programs (similar to [[thread (computing)|threads]]) and chains the SPEs together so that each handles one step of a complex operation. For instance, a [[set-top box]] might load programs for reading a DVD, decoding video and audio, and driving the display, and the data would be passed from SPE to SPE until finally reaching the TV. Another possibility is to partition the input data set and have several SPEs perform the same kind of operation in parallel.
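The explicit staging described above can be illustrated with a toy model. The following Python sketch is not Cell SDK code: the names <code>LocalStore</code>, <code>dma_get</code>, <code>dma_put</code> and <code>increment_all</code> are hypothetical, and the chunk size is arbitrary. It only mimics the idea that an SPU works on data that has been copied into its 256 KiB local store, rather than reading system memory directly, and it checks the SIMD lane counts implied by a 128-bit register.

```python
"""Toy model of an SPE local store with explicit DMA-style copies (illustrative only)."""

LOCAL_STORE_BYTES = 256 * 1024   # 256 KiB local store per SPE (from the text)
REGISTER_BITS = 128              # width of one SPU register (from the text)


def simd_lanes(element_bits: int) -> int:
    """Number of elements of the given width that fit in one 128-bit register."""
    return REGISTER_BITS // element_bits


def _check_range(addr: int, size: int, limit: int, what: str) -> None:
    """Reject transfers that would fall outside the given memory."""
    if addr < 0 or size < 0 or addr + size > limit:
        raise ValueError(f"{what} transfer out of range")


class LocalStore:
    """Hypothetical stand-in for one SPE's local store."""

    def __init__(self) -> None:
        self.mem = bytearray(LOCAL_STORE_BYTES)

    def dma_get(self, system_memory, sys_addr: int, ls_addr: int, size: int) -> None:
        """Copy `size` bytes from system memory into the local store."""
        _check_range(sys_addr, size, len(system_memory), "system memory")
        _check_range(ls_addr, size, LOCAL_STORE_BYTES, "local store")
        self.mem[ls_addr:ls_addr + size] = system_memory[sys_addr:sys_addr + size]

    def dma_put(self, system_memory: bytearray, sys_addr: int, ls_addr: int, size: int) -> None:
        """Copy `size` bytes from the local store back to system memory."""
        _check_range(sys_addr, size, len(system_memory), "system memory")
        _check_range(ls_addr, size, LOCAL_STORE_BYTES, "local store")
        system_memory[sys_addr:sys_addr + size] = self.mem[ls_addr:ls_addr + size]


def increment_all(system_memory: bytearray, chunk: int = 16 * 1024) -> None:
    """Add 1 (mod 256) to every byte, staging one chunk at a time in the local store."""
    ls = LocalStore()
    for offset in range(0, len(system_memory), chunk):
        size = min(chunk, len(system_memory) - offset)
        ls.dma_get(system_memory, offset, 0, size)    # stage input
        for i in range(size):                         # compute on local data only
            ls.mem[i] = (ls.mem[i] + 1) % 256
        ls.dma_put(system_memory, offset, 0, size)    # write results back
```

Unlike a cache, nothing in this model moves data automatically: the program decides what is copied in, where it lands in the local store, and when results are written back.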
At 3.2 GHz, each SPE gives a theoretical 25.6 [[GFLOPS]] of single-precision performance. Compared to its [[personal computer]] contemporaries, the relatively high overall floating-point performance of a Cell processor seemingly dwarfs the abilities of the SIMD unit in CPUs like the [[Pentium 4]] and the [[Athlon 64]]. However, comparing only the floating-point abilities of systems yields a one-dimensional, application-specific metric. Unlike a Cell processor, such desktop CPUs are better suited to the general-purpose software usually run on personal computers. In addition to executing multiple instructions per clock, processors from Intel and AMD feature [[branch predictor]]s. The Cell is designed to compensate for its lack of branch prediction hardware with compiler assistance, in which prepare-to-branch instructions are created. For double-precision floating-point operations, as sometimes used in personal computers and often used in scientific computing, Cell performance drops by an order of magnitude, but still reaches 20.8 GFLOPS (1.8 GFLOPS per SPE, 6.4 GFLOPS per PPE). The PowerXCell 8i variant, which was specifically designed for double precision, reaches 102.4 GFLOPS in double-precision calculations.<ref name="ppcnuxpowerxcell">{{Cite web |date=November 2007 |title=Cell successor with turbo mode - PowerXCell 8i |url=http://www.ppcnux.com/?q=node/7144 |url-status=dead |archive-url=https://web.archive.org/web/20090110230213/http://www.ppcnux.com/?q=node/7144 |archive-date=January 10, 2009 |access-date=June 10, 2008 |publisher=PPCNux}}</ref>

Tests by IBM show that the SPEs can reach 98% of their theoretical peak performance running optimized parallel matrix multiplication.<ref name="pacellperf" />

[[Toshiba]] has developed a [[co-processor]] called the [[SpursEngine]], powered by four SPEs but no PPE, which is designed to accelerate 3D and movie effects in consumer electronics.
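The peak figures quoted above can be cross-checked with simple arithmetic. The Python sketch below rests on two assumptions that the text does not state: that the single-precision peak counts one four-wide fused multiply-add per cycle (two floating-point operations per lane), and that the double-precision total is for a full Cell with eight SPEs and one PPE. The function names are illustrative only.

```python
"""Back-of-the-envelope check of the peak GFLOPS figures quoted in the text."""

CLOCK_GHZ = 3.2       # clock rate quoted in the text
SP_LANES = 4          # four single-precision values per 128-bit register
FLOPS_PER_FMA = 2     # assumption: a fused multiply-add counts as two FLOPs
NUM_SPES = 8          # assumption: a full Cell with eight SPEs and one PPE


def spe_single_precision_gflops(clock_ghz: float = CLOCK_GHZ) -> float:
    """Theoretical single-precision peak of one SPE."""
    return clock_ghz * SP_LANES * FLOPS_PER_FMA


def cell_double_precision_gflops(per_spe: float = 1.8, per_ppe: float = 6.4) -> float:
    """Double-precision total from the per-unit figures given in the text."""
    return NUM_SPES * per_spe + per_ppe


if __name__ == "__main__":
    print(round(spe_single_precision_gflops(), 1))    # 25.6
    print(round(cell_double_precision_gflops(), 1))   # 20.8
    # If the PowerXCell 8i figure is spread evenly over eight SPEs (an assumption):
    print(round(102.4 / NUM_SPES, 1))                 # 12.8
```

Under these assumptions the quoted 25.6 and 20.8 GFLOPS figures are mutually consistent, and the PowerXCell 8i figure would correspond to 12.8 GFLOPS per SPE.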
Each SPE has a local memory of 256 KB.<ref>{{Cite web |title=Supporting OpenMP on Cell |url=http://researcher.watson.ibm.com/researcher/files/us-zsura/iwomp07_cellOMP.pdf |url-status=dead |archive-url=https://web.archive.org/web/20190108125436/https://researcher.watson.ibm.com/researcher/files/us-zsura/iwomp07_cellOMP.pdf |archive-date=January 8, 2019 |website=[[Thomas J. Watson Research Center|IBM T. J Watson Research]]}}</ref> In total, the SPEs have 2 MB of local memory.