Parallel computing
====Specialized parallel computers====
Within parallel computing, there are specialized parallel devices that remain niche areas of interest. While not [[Domain-specific programming language|domain-specific]], they tend to be applicable to only a few classes of parallel problems.

=====Reconfigurable computing with field-programmable gate arrays=====
[[Reconfigurable computing]] is the use of a [[field-programmable gate array]] (FPGA) as a co-processor to a general-purpose computer. An FPGA is, in essence, a computer chip that can rewire itself for a given task.

FPGAs can be programmed with [[hardware description language]]s such as [[VHDL]]<ref>{{Cite journal|last1=Valueva|first1=Maria|last2=Valuev|first2=Georgii|last3=Semyonova|first3=Nataliya|last4=Lyakhov|first4=Pavel|last5=Chervyakov|first5=Nikolay|last6=Kaplun|first6=Dmitry|last7=Bogaevskiy|first7=Danil|date=2019-06-20|title=Construction of Residue Number System Using Hardware Efficient Diagonal Function|journal=Electronics|language=en|volume=8|issue=6|pages=694|doi=10.3390/electronics8060694|issn=2079-9292|quote=All simulated circuits were described in very high speed integrated circuit (VHSIC) hardware description language (VHDL). Hardware modeling was performed on Xilinx FPGA Artix 7 xc7a200tfbg484-2.|doi-access=free}}</ref> or [[Verilog]].<ref>{{Cite book|last1=Gupta|first1=Ankit|last2=Suneja|first2=Kriti|title=2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS) |chapter=Hardware Design of Approximate Matrix Multiplier based on FPGA in Verilog |date=May 2020|chapter-url=https://ieeexplore.ieee.org/document/9121004|location=Madurai, India|publisher=IEEE|pages=496–498|doi=10.1109/ICICCS48265.2020.9121004|isbn=978-1-7281-4876-2|s2cid=219990653}}</ref> Several vendors have created [[C to HDL]] languages that attempt to emulate the syntax and semantics of the [[C programming language]], with which most programmers are familiar.
The best-known C to HDL languages are [[Mitrionics|Mitrion-C]], [[Impulse C]], and [[Handel-C]]. Specific subsets of [[SystemC]] based on C++ can also be used for this purpose. AMD's decision to open its [[HyperTransport]] technology to third-party vendors has become the enabling technology for high-performance reconfigurable computing.<ref name="DAmour">D'Amour, Michael R., Chief Operating Officer, DRC Computer Corporation. "Standard Reconfigurable Computing". Invited speaker at the University of Delaware, February 28, 2007.</ref> According to Michael R. D'Amour, Chief Operating Officer of DRC Computer Corporation, "when we first walked into AMD, they called us 'the [[CPU socket|socket]] stealers.' Now they call us their partners."<ref name="DAmour"/>

=====General-purpose computing on graphics processing units (GPGPU)=====
{{main|GPGPU}}
[[File:NvidiaTesla.jpg|right|thumbnail|Nvidia's [[Nvidia Tesla|Tesla GPGPU card]]]]
General-purpose computing on [[graphics processing unit]]s (GPGPU) is a fairly recent trend in computer engineering research. GPUs are co-processors that have been heavily optimized for [[computer graphics]] processing.<ref>Boggan, Sha'Kia and Daniel M. Pressel (August 2007). [https://discover.dtic.mil/results/?q=ARL-SR-154 GPUs: An Emerging Platform for General-Purpose Computation] (PDF). ARL-SR-154, U.S. Army Research Lab. Retrieved on November 7, 2007.</ref> Computer graphics processing is a field dominated by data parallel operations—particularly [[linear algebra]] [[Matrix (mathematics)|matrix]] operations.

In the early days, GPGPU programs used the normal graphics APIs for executing programs. However, several new programming languages and platforms have since been built to do general-purpose computation on GPUs, with both [[Nvidia]] and [[AMD]] releasing programming environments with [[CUDA]] and [[AMD FireStream#Software Development Kit|Stream SDK]], respectively.
Other GPU programming languages include [[BrookGPU]], [[PeakStream]], and [[RapidMind]]. Nvidia has also released specific products for computation in their [[Nvidia Tesla|Tesla series]]. The technology consortium Khronos Group has released the [[OpenCL]] specification, which is a framework for writing programs that execute across platforms consisting of CPUs and GPUs. [[AMD]], [[Apple Inc.|Apple]], [[Intel]], [[Nvidia]], and others support [[OpenCL]].

=====Application-specific integrated circuits=====
{{main|Application-specific integrated circuit}}
Several [[application-specific integrated circuit]] (ASIC) approaches have been devised for dealing with parallel applications.<ref>Maslennikov, Oleg (2002). [https://doi.org/10.1007%2F3-540-48086-2_30 "Systematic Generation of Executing Programs for Processor Elements in Parallel ASIC or FPGA-Based Systems and Their Transformation into VHDL-Descriptions of Processor Element Control Units".] ''Lecture Notes in Computer Science'', '''2328/2002:''' p. 272.</ref><ref>{{cite book|last=Shimokawa|first=Y.|author2=Fuwa, Y. |author3=Aramaki, N. |title=[Proceedings] 1991 IEEE International Joint Conference on Neural Networks |chapter=A parallel ASIC VLSI neurocomputer for a large number of neurons and billion connections per second speed |date=18–21 November 1991|volume=3|pages=2162–2167|doi=10.1109/IJCNN.1991.170708|isbn=978-0-7803-0227-3|s2cid=61094111}}</ref><ref>{{cite journal|last=Acken|first=Kevin P.|author2=Irwin, Mary Jane |author3=Owens, Robert M.|title=A Parallel ASIC Architecture for Efficient Fractal Image Coding |journal=The Journal of VLSI Signal Processing|date=July 1998|volume=19|issue=2|pages=97–113|doi=10.1023/A:1008005616596|bibcode=1998JSPSy..19...97A |s2cid=2976028}}</ref> Because an ASIC is (by definition) specific to a given application, it can be fully optimized for that application. As a result, for a given application, an ASIC tends to outperform a general-purpose computer.
However, ASICs are created by [[photolithography|UV photolithography]]. This process requires a mask set, which can be extremely expensive, costing over a million US dollars.<ref>Kahng, Andrew B. (June 21, 2004) "[http://www.future-fab.com/documents.asp?grID=353&d_ID=2596 Scoping the Problem of DFM in the Semiconductor Industry] {{webarchive|url=https://web.archive.org/web/20080131221732/http://www.future-fab.com/documents.asp?grID=353&d_ID=2596 |date=2008-01-31 }}." University of California, San Diego. "Future design for manufacturing (DFM) technology must reduce design [non-recoverable expenditure] cost and directly address manufacturing [non-recoverable expenditures]—the cost of a mask set and probe card—which is well over $1 million at the 90 nm technology node and creates a significant damper on semiconductor-based innovation."</ref> (The smaller the transistors required for the chip, the more expensive the mask will be.) Meanwhile, performance increases in general-purpose computing over time (as described by [[Moore's law]]) tend to wipe out these gains in only one or two chip generations.<ref name="DAmour"/> The high initial cost, and the tendency to be overtaken by Moore's-law-driven general-purpose computing, have rendered ASICs unfeasible for most parallel computing applications. However, some have been built. One example is the PFLOPS [[RIKEN MDGRAPE-3]] machine, which uses custom ASICs for [[molecular dynamics]] simulation.

=====Vector processors=====
{{main|Vector processor}}
[[File:Cray 1 IMG 9126.jpg|right|thumbnail|The [[Cray-1]] is a vector processor.]]
A vector processor is a CPU or computer system that can execute the same instruction on large sets of data. Vector processors have high-level operations that work on linear arrays of numbers or vectors. An example vector operation is ''A'' = ''B'' × ''C'', where ''A'', ''B'', and ''C'' are each 64-element vectors of 64-bit [[floating-point]] numbers.<ref name=PH751>Patterson and Hennessy, p. 751.</ref> They are closely related to Flynn's SIMD classification.<ref name=PH751/>

[[Cray]] became famous for its vector-processing computers in the 1970s and 1980s. However, vector processors—both as CPUs and as full computer systems—have generally disappeared. Modern [[Instruction set|processor instruction sets]] do include some vector processing instructions, such as [[Freescale Semiconductor]]'s [[AltiVec]] and [[Intel]]'s [[Streaming SIMD Extensions]] (SSE).
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)