SSE3
{{Short description|CPU instruction set}}
{{Distinguish|SSSE3}}
'''SSE3''', '''Streaming SIMD Extensions 3''', also known by its [[Intel]] code name '''Prescott New Instructions''' ('''PNI'''),<ref name=":1">{{Cite web |last1=Shimpi |first1=Anand Lal |last2=Wilson |first2=Derek |title=Intel's Pentium 4 E: Prescott Arrives with Luggage |url=https://www.anandtech.com/show/1230 |access-date=2023-04-10 |website=www.anandtech.com}}</ref> is the third iteration of the [[Streaming SIMD Extensions|SSE]] instruction set for the [[IA-32]] (x86) architecture. Intel introduced SSE3 in early 2004 with the [[Pentium 4#Prescott|Prescott]] revision of their [[Pentium 4]] CPU.<ref name=":1" /> In April 2005, [[AMD]] introduced a subset of SSE3 in revision E (Venice and San Diego) of their [[Athlon 64]] CPUs.<ref>{{Cite web |last=Shimpi |first=Anand Lal |title=Industry Update - Q4-2004: AMD adds SSE3 Support, Intel's 925/915 not selling and more |url=https://www.anandtech.com/show/1532 |access-date=2023-04-10 |website=www.anandtech.com}}</ref>

The earlier [[SIMD]] instruction sets on the [[x86]] platform, from oldest to newest, are [[MMX (instruction set)|MMX]], [[3DNow!]] (developed by AMD, no longer supported on newer CPUs), [[Streaming SIMD Extensions|SSE]], and [[SSE2]]. SSE3 contains 13 new instructions over [[SSE2]].<ref>{{Cite web |title=Intel Instruction Set Extensions Technology |url=https://www.intel.com/content/www/us/en/support/articles/000005779/processors.html |access-date=2023-04-10 |website=Intel |language=en}}</ref>

==Changes==
The most notable change is the capability to work horizontally in a register, as opposed to the more or less strictly vertical operation of all previous SSE instructions.
More specifically, instructions to add and subtract the multiple values stored within a single register have been added.<ref name=":2">{{Cite web |last=Wright |first=Christopher |title=SSE3 Instruction Set |url=https://softpixel.com/~cwright/programming/simd/sse3.php |access-date=2023-04-10 |website=softpixel.com |language=en}}</ref> These instructions can be used to speed up the implementation of a number of [[Digital signal processing|DSP]] and [[3D computer graphics|3D]] operations. There is also a new instruction to convert floating-point values to integers without having to change the global rounding mode, thus avoiding costly [[Instruction pipeline|pipeline]] stalls. Finally, the extension adds <code>LDDQU</code>, an alternative misaligned integer vector load that has better performance on [[NetBurst]]-based platforms for loads that cross cache-line boundaries.<ref>{{Cite web |title=LDDQU – Load Unaligned Integer 128 Bits |url=https://www.felixcloutier.com/x86/lddqu |access-date=2023-04-10 |website=www.felixcloutier.com}}</ref>

==CPUs with SSE3==
*[[AMD]]:
**[[Opteron]] (since Stepping E4<ref>{{Cite web |last=Wilson |first=Derek |title=AMD K8 E4 Stepping: SSE3 Performance |url=https://www.anandtech.com/show/1618 |access-date=2023-04-10 |website=www.anandtech.com}}</ref>)
**[[Sempron]] (since Palermo Stepping E3)
**[[Athlon 64]] (since Venice Stepping E3 and San Diego Stepping E4)
**[[Athlon 64|Athlon 64 FX]] (since San Diego Stepping E4)
**[[Athlon 64 X2]]
**[[Phenom 64 X2]]
**[[AMD Turion|Turion]] family
**[[AMD 10h|K10]] family
**[[AMD Accelerated Processing Unit|APU]] family (including models without a GPU)
**[[AMD FX|FX Series]]
**[[Zen (microarchitecture)|Zen]] family
*[[Intel]]:
**[[Celeron D]]
**[[Celeron]] (starting with Core microarchitecture)
**[[Pentium 4]] (since Prescott)
**[[Pentium D]]
**[[Pentium Extreme Edition]] (but NOT Pentium 4 Extreme Edition)
**[[Pentium Dual-Core]]
**[[Pentium]] (starting with Core microarchitecture)
**[[Intel Core|Core]]
**[[Xeon]] (since Nocona<ref>{{Cite web |date=2004-08-18 |title=Intel Xeon 3.4GHz ['Nocona' core] |url=https://hexus.net/business/reviews/enterprise/822-intel-xeon-34ghz-nocona-core/ |access-date=2023-04-10 |website=HEXUS}}</ref>)
**[[Intel Atom|Atom]]
*[[VIA Technologies|VIA]]/[[Centaur Technology|Centaur]]:
**[[VIA C7|C7]]
**[[VIA Nano|Nano]]
*[[Transmeta Efficeon]] TM88xx with Code Morphing software update (NOT Model Numbers TM86xx)

==New instructions==
===Common instructions===
====Arithmetic====
;<code>ADDSUBPD</code>
:''Add-Subtract-Packed-Double''<ref name=":0">{{Cite web |title=SSE3 Instructions - x86 Assembly Language Reference Manual |url=https://docs.oracle.com/cd/E53394_01/html/E54851/gntby.html |access-date=2023-04-10 |website=docs.oracle.com}}</ref>
:*Input: { A0, A1 }, { B0, B1 }
:*Output: { A0 − B0, A1 + B1 }
;<code>ADDSUBPS</code>
:''Add-Subtract-Packed-Single''<ref name=":0" />
:*Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
:*Output: { A0 − B0, A1 + B1, A2 − B2, A3 + B3 }

====AOS (Array Of Structures)====
;<code>HADDPD</code>
:''Horizontal-Add-Packed-Double''<ref name=":0" />
:*Input: { A0, A1 }, { B0, B1 }
:*Output: { A0 + A1, B0 + B1 }
;<code>HADDPS</code>
:''Horizontal-Add-Packed-Single''<ref name=":0" />
:*Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
:*Output: { A0 + A1, A2 + A3, B0 + B1, B2 + B3 }
;<code>HSUBPD</code>
:''Horizontal-Subtract-Packed-Double''<ref name=":0" />
:*Input: { A0, A1 }, { B0, B1 }
:*Output: { A0 − A1, B0 − B1 }
;<code>HSUBPS</code>
:''Horizontal-Subtract-Packed-Single''<ref name=":0" />
:*Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
:*Output: { A0 − A1, A2 − A3, B0 − B1, B2 − B3 }
;<code>LDDQU</code>
:As stated above, this is an alternative misaligned integer vector load.<ref name=":0" /> It can be helpful for video compression tasks.
;<code>[[MOVDDUP]]</code>, <code>MOVSHDUP</code>, <code>MOVSLDUP</code><ref name=":2" />
:These are useful for complex-number arithmetic and for waveform calculations such as audio processing.
;<code>FISTTP</code>
:Like the older x87 <code>FISTP</code> instruction, but ignores the floating-point control register's rounding mode settings and uses the "chop" (truncate) mode instead.<ref name=":2" /> This allows omission of the expensive saving and reloading of the control register in languages such as C, where the standard requires float-to-int conversion to truncate.

===Other instructions===
;<code>MONITOR</code>, <code>MWAIT</code>
:The <code>MONITOR</code> instruction is used to specify a memory address for monitoring, while the <code>MWAIT</code> instruction puts the processor into a low-power state and waits for a write event to the monitored address.<ref name=":2" />

==References==
{{reflist}}

==External links==
*[https://web.archive.org/web/20060531094837/http://www.xbitlabs.com/articles/cpu/display/prescott_10.html X-bit Labs]

{{Multimedia extensions}}

{{DEFAULTSORT:Sse3}}
[[Category:X86 instructions]]
[[Category:SIMD computing]]
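As a rough illustration of the lane patterns listed under "New instructions", the following Python sketch models <code>ADDSUBPS</code>, <code>HADDPS</code> and <code>HSUBPS</code> on four-element lists, and shows how they might combine into a horizontal sum and a packed complex multiply. It is a scalar reference model only, not the hardware instructions or any compiler intrinsic, and it ignores single-precision rounding. The function names are hypothetical helpers chosen to mirror the mnemonics. The lane behaviour assumed for <code>MOVSLDUP</code> and <code>MOVSHDUP</code> (duplicating the even and odd lanes respectively) and the complex-multiply sequence are not spelled out in this article and are stated here as assumptions.

```python
# Scalar model of selected SSE3 lane patterns. Each "register" is a list of
# four numbers, with lane 0 first. All helper names are hypothetical.

def addsubps(a, b):
    """Model of ADDSUBPS: subtract in even lanes, add in odd lanes."""
    return [a[0] - b[0], a[1] + b[1], a[2] - b[2], a[3] + b[3]]

def haddps(a, b):
    """Model of HADDPS: sums of adjacent pairs of a, then of b."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]

def hsubps(a, b):
    """Model of HSUBPS: differences of adjacent pairs of a, then of b."""
    return [a[0] - a[1], a[2] - a[3], b[0] - b[1], b[2] - b[3]]

def movsldup(a):
    """Assumed model of MOVSLDUP: duplicate the even lanes."""
    return [a[0], a[0], a[2], a[2]]

def movshdup(a):
    """Assumed model of MOVSHDUP: duplicate the odd lanes."""
    return [a[1], a[1], a[3], a[3]]

def mulps(a, b):
    """Vertical (lane-wise) multiply, as in the original SSE MULPS."""
    return [x * y for x, y in zip(a, b)]

def swap_pairs(a):
    """Swap the two lanes of each adjacent pair (a shuffle, not SSE3)."""
    return [a[1], a[0], a[3], a[2]]

def horizontal_sum(v):
    """Sum all four lanes using two horizontal adds."""
    t = haddps(v, v)        # [v0+v1, v2+v3, v0+v1, v2+v3]
    return haddps(t, t)[0]  # lane 0 holds v0+v1+v2+v3

def complex_mul_packed(a, b):
    """Multiply two pairs of complex numbers stored as [re, im, re, im]."""
    re_part = mulps(movsldup(a), b)              # [ar*br, ar*bi, ...]
    im_part = mulps(movshdup(a), swap_pairs(b))  # [ai*bi, ai*br, ...]
    # Even lanes: ar*br - ai*bi (real); odd lanes: ar*bi + ai*br (imaginary).
    return addsubps(re_part, im_part)
```

For example, <code>complex_mul_packed([1, 2, 3, 4], [5, 6, 7, 8])</code> multiplies (1 + 2i) by (5 + 6i) and (3 + 4i) by (7 + 8i) in one pass. The alternating subtract and add of <code>ADDSUBPS</code> is what lets the real and imaginary parts be finished in a single step in this sketch.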