== Comparison to x86-64 SSE ==

Both VMX/AltiVec and [[Streaming SIMD Extensions|SSE]] feature 128-bit vector registers that can represent sixteen 8-bit signed or unsigned chars, eight 16-bit signed or unsigned shorts, four 32-bit ints or four [[IEEE floating-point standard|32-bit]] floating-point variables.  Both provide [[CPU cache|cache]]-control instructions intended to minimize [[cache pollution]] when working on streams of data.
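The element layouts listed above can be pictured with a plain C union. This is only an illustrative sketch: the type name <code>vec128_model</code> and its member names are hypothetical and are not part of either instruction set's programming interface, and the sketch assumes that <code>float</code> is a 32-bit type on the target.

```c
#include <stdint.h>

/* Hypothetical model of one 128-bit vector register.
 * Every member views the same 16 bytes with a different element width. */
typedef union {
    uint8_t  u8[16];   /* sixteen 8-bit elements  */
    uint16_t u16[8];   /* eight 16-bit elements   */
    uint32_t u32[4];   /* four 32-bit integers    */
    float    f32[4];   /* four 32-bit floats (assumes a 32-bit float) */
} vec128_model;

/* All four views must cover exactly 128 bits. */
_Static_assert(sizeof(vec128_model) == 16, "model must be 128 bits wide");
```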

They also exhibit important differences. Unlike [[SSE2]], VMX/AltiVec supports a special [[RGB color model|RGB]] "[[pixel]]" data type, but it does not operate on 64-bit double-precision floats, and there is no way to move data directly between scalar and [[vector processor|vector]] registers. In keeping with the "load/store" model of the PowerPC's [[RISC]] design, the vector registers, like the scalar registers, can only be loaded from and stored to memory. However, VMX/AltiVec provides a more complete set of "horizontal" operations that work across all the elements of a vector, and it allows more combinations of data type and operation. Thirty-two 128-bit vector registers are provided, compared to eight for SSE and SSE2 (extended to 16 in [[x86-64]]), and most VMX/AltiVec instructions take three register operands, compared to only two register/register or register/memory operands on [[IA-32]].

VMX/AltiVec is also unique in its support for a flexible vector [[permute instruction]], in which each byte of a resulting vector value can be taken from any byte of either of two other vectors, parametrized by yet another vector.  This allows for sophisticated manipulations in a single instruction.
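The selection rule described above can be modelled in scalar C. The following is a sketch rather than the hardware definition: the function name <code>perm_model</code> is hypothetical, bytes are indexed in array order, and the sketch assumes that only the low five bits of each control byte are used to pick one of the 32 source bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a two-source byte permute.
 * For each result byte i, ctl[i] selects a byte from the 32-byte
 * concatenation of a (indices 0..15) and b (indices 16..31).
 * Assumption: only the low 5 bits of each control byte are used. */
void perm_model(const uint8_t a[16], const uint8_t b[16],
                const uint8_t ctl[16], uint8_t out[16])
{
    uint8_t tmp[16]; /* temporary so that out may alias a, b, or ctl */

    for (int i = 0; i < 16; i++) {
        uint8_t sel = ctl[i] & 0x1F;
        tmp[i] = (sel < 16) ? a[sel] : b[sel - 16];
    }
    memcpy(out, tmp, sizeof tmp);
}
```

With a control vector of 31, 30, ..., 16, for example, this model returns the bytes of the second source in reverse order. Other control vectors express interleaving, byte swapping, or small table lookups in the same single step.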

Recent versions{{when|date=October 2020}} of the [[GNU Compiler Collection]] (GCC), the [[IBM VisualAge]] compiler and other compilers provide [[intrinsic function|intrinsics]] to access VMX/AltiVec instructions directly from [[C (programming language)|C]] and [[C++]] programs. As of version 4, GCC also includes [[Automatic vectorization|auto-vectorization]] capabilities that attempt to create VMX/AltiVec-accelerated binaries without the programmer having to use intrinsics directly.

The "vector" type keyword is introduced to permit the declaration of native vector types, e.g., "<code>vector unsigned char foo;</code>" declares a 128-bit vector variable named "foo" containing sixteen 8-bit unsigned chars. The full complement of arithmetic and binary operators is defined on vector types so that the normal C expression language can be used to manipulate vector variables. There are also overloaded intrinsic functions such as "<code>vec_add</code>" that emit the appropriate [[opcode]] based on the type of the elements within the vector, and very strong type checking is enforced. In contrast, the Intel-defined data types for IA-32 SIMD registers declare only the size of the vector register (128 or 64 bits) and, in the case of a 128-bit register, whether it contains integers or floating-point values. The programmer must select the appropriate intrinsic for the data types in use, e.g., "<code>_mm_add_epi16(x,y)</code>" for adding two vectors containing eight 16-bit integers.
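The difference between the two intrinsic styles can be seen by writing the same eight-lane 16-bit addition for both. The sketch below is illustrative only: the wrapper name <code>add_u16x8</code> is hypothetical, the choice of code path relies on the compiler defining <code>__ALTIVEC__</code> or <code>__SSE2__</code>, and a scalar loop is included as a fallback so that the code compiles on other targets.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical wrapper: out[i] = a[i] + b[i] for eight 16-bit lanes,
 * with each lane wrapping modulo 2^16. */
void add_u16x8(const uint16_t a[8], const uint16_t b[8], uint16_t out[8])
{
#if defined(__ALTIVEC__)
    /* The element type is part of the vector type, so the overloaded
     * vec_add picks the 16-bit add from the operand types. */
    vector unsigned short va, vb, vr;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = vec_add(va, vb);
    memcpy(out, &vr, sizeof vr);
#elif defined(__SSE2__)
    /* __m128i only says "128 bits of integers"; the lane width is
     * chosen by the intrinsic name (_epi16 = 16-bit lanes). */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vr = _mm_add_epi16(va, vb);
    _mm_storeu_si128((__m128i *)out, vr);
#else
    /* Scalar fallback for targets with neither extension enabled. */
    for (int i = 0; i < 8; i++) {
        out[i] = (uint16_t)(a[i] + b[i]);
    }
#endif
}
```

In the AltiVec path, changing the declarations to <code>vector unsigned char</code> would make the same <code>vec_add</code> call operate on sixteen 8-bit lanes. In the SSE2 path the data type would stay <code>__m128i</code>, and the programmer would have to switch to a different intrinsic by hand.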