===Software architecture===

By the standards of general-purpose processors, DSP instruction sets are often highly irregular; while traditional instruction sets are made up of more general instructions that allow them to perform a wider variety of operations, instruction sets optimized for digital signal processing contain instructions for common mathematical operations that occur frequently in DSP calculations. Both traditional and DSP-optimized instruction sets are able to compute any arbitrary operation, but an operation that might require multiple [[ARM architecture family|ARM]] or [[x86]] instructions might require only one instruction in a DSP-optimized instruction set.

One implication for software architecture is that hand-optimized [[assembly language|assembly-code]] [[Subroutine|routines]] are commonly packaged into libraries for re-use, instead of relying on advanced compiler technologies to handle essential algorithms. Even with modern compiler optimizations, hand-optimized assembly code is often more efficient, and many common algorithms involved in DSP calculations are hand-written to take full advantage of the architectural optimizations.
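As a hedged illustration of the point above (not drawn from any particular DSP's manual), the inner loop of a finite impulse response (FIR) filter is the canonical case: each iteration is a multiply, an add, and pointer updates, which a DSP with a single-cycle multiply–accumulate (MAC) instruction can execute as one instruction per tap, while a generic RISC typically needs several.

```c
#include <stdint.h>

/* Sketch of a 4-tap FIR inner product. The loop body is exactly the
 * multiply-accumulate pattern that a DSP's MAC instruction encodes in
 * one instruction; function and variable names are illustrative only. */
int32_t fir4(const int16_t *x, const int16_t *h)
{
    int32_t acc = 0;                       /* wide accumulator */
    for (int i = 0; i < 4; i++)
        acc += (int32_t)x[i] * h[i];       /* one MAC per tap */
    return acc;
}
```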
====Instruction sets====
*[[Multiply–accumulate operation|Multiply–accumulate]] (MAC) operations, including [[fused multiply–add]] (FMA)
**used extensively in all kinds of [[matrix (mathematics)|matrix]] operations
***[[convolution]] for filtering
***[[dot product]]
***[[Horner scheme|polynomial evaluation]]
**Fundamental DSP algorithms depend heavily on multiply–accumulate performance
***[[Finite impulse response|FIR filters]]
***[[Fast Fourier transform]] (FFT)
*Related instructions:
**[[Single instruction, multiple data|SIMD]]
**[[VLIW]]
*Specialized instructions for [[modular arithmetic|modulo]] addressing in [[circular buffer|ring buffers]] and bit-reversed addressing mode for [[Fast Fourier transform|FFT]] cross-referencing
*DSPs sometimes use time-stationary encoding to simplify hardware and increase coding efficiency.{{Citation needed|date=March 2020|reason=Please link to something that defines 'time-stationary encoding'}}
*Multiple arithmetic units may require [[memory architecture]]s able to support several accesses per instruction cycle – typically reading two data values from two separate data buses and the next instruction (from the instruction cache, or a third program memory) simultaneously.<ref>[http://users.ece.utexas.edu/~bevans/courses/ee382c/lectures/02_signal_processing/project1.html "Memory and DSP Processors"].</ref><ref>{{Cite web |url=http://www.bores.com/courses/intro/chips/6_mem.htm |title=DSP processors: memory architectures |access-date=2020-03-03 |archive-date=2020-02-17 |archive-url=https://web.archive.org/web/20200217084008/http://www.bores.com/courses/intro/chips/6_mem.htm |url-status=dead}}</ref><ref>[http://www.dspguide.com/ch28/3.htm "Architecture of the Digital Signal Processor"].</ref><ref>[https://www.synopsys.com/designware-ip/technical-bulletin/performance-coding-advantages.html "ARC XY Memory DSP Option"].</ref>
*Special loop controls, such as architectural support for executing a few instruction words in a very tight loop without overhead for instruction fetches or exit testing – such as [[zero-overhead looping]]<ref>[https://microchipdeveloper.com/dsp0201:zero-overhead-loops "Zero Overhead Loops"].</ref><ref>[https://www.analog.com/media/en/dsp-documentation/processor-manuals/ADSP-BF533_hwr_rev3.6.pdf "ADSP-BF533 Blackfin Processor Hardware Reference"]. p. 4-15.</ref> and hardware loop buffers.<ref>[https://www.analog.com/media/en/technical-documentation/technical-articles/350395352047424547665311ProgrammingTechniquesForDSPs.pdf "Understanding Advanced Processor Features Promotes Efficient Coding"].</ref><ref>{{cite book | chapter-url=https://link.springer.com/content/pdf/10.1007/3-540-46423-9_11.pdf | doi=10.1007/3-540-46423-9_11 | chapter=Techniques for Effectively Exploiting a Zero Overhead Loop Buffer | title=Compiler Construction | series=Lecture Notes in Computer Science | date=2000 | last1=Uh | first1=Gang-Ryung | last2=Wang | first2=Yuhong | last3=Whalley | first3=David | last4=Jinturkar | first4=Sanjay | last5=Burns | first5=Chris | last6=Cao | first6=Vincent | volume=1781 | pages=157–172 | isbn=978-3-540-67263-0 }}</ref>

====Data instructions====
*[[Saturation arithmetic]], in which operations that produce overflows accumulate at the maximum (or minimum) values that the register can hold rather than wrapping around (maximum + 1 does not overflow to minimum as in many general-purpose CPUs; instead it stays at maximum). Sometimes various sticky-bit operation modes are available.
*[[Fixed-point arithmetic]] is often used to speed up arithmetic processing.
*Single-cycle operations to increase the benefits of [[Pipeline (computing)|pipelining]].
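The saturation behavior described above can be sketched in portable C (DSPs provide it in hardware; this software emulation is illustrative only, with invented function names):

```c
#include <stdint.h>

/* Sketch of saturating 16-bit addition: the result clamps at
 * INT16_MAX / INT16_MIN instead of wrapping around as plain
 * two's-complement addition on a general-purpose CPU would. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;        /* widen so the sum cannot overflow */
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}
```

For audio, clamping at full scale produces mild clipping distortion, whereas wrap-around would turn a loud positive peak into a loud negative one, which is far more audible.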
====Program flow====
*[[Floating-point]] unit integrated directly into the [[datapath]]
*[[Pipeline (computing)|Pipelined]] architecture
*Highly parallel [[multiplier–accumulator]]s (MAC units)
*Hardware-controlled [[Control flow#Loops|looping]], to reduce or eliminate the overhead required for looping operations
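The modulo addressing for ring buffers mentioned in the instruction-set list can be illustrated in software (a hedged sketch with invented names; on a DSP the wrap-around is performed by the address-generation hardware, with no explicit mask or branch in the loop):

```c
#include <stdint.h>

/* Sketch of a delay line using a power-of-two ring buffer: the index
 * wraps via a single bitmask, mimicking what DSP modulo addressing
 * does automatically for arbitrary buffer sizes. */
#define RING_SIZE 8                         /* must be a power of two */
#define RING_MASK (RING_SIZE - 1)

typedef struct {
    int16_t  data[RING_SIZE];
    uint32_t head;                          /* free-running write index */
} ring_t;

void ring_push(ring_t *r, int16_t sample)
{
    r->data[r->head & RING_MASK] = sample;  /* wrap via mask, not branch */
    r->head++;
}

/* Read the sample written 'delay' pushes ago (0 = most recent). */
int16_t ring_tap(const ring_t *r, uint32_t delay)
{
    return r->data[(r->head - 1 - delay) & RING_MASK];
}
```

Delay lines indexed this way are the storage backbone of FIR filters and echo effects, which is why DSPs dedicate addressing hardware to them.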