===Instructions===
====Loads and stores====
Load and store instructions use a three-operand format: two operands supply the values that form the address, and one names the register to read from or write to. The address is the sum of the two address operands; the second may be a constant or a register. Loads take the value at the address and place it in the register specified by the third operand, whereas stores take the value in the register specified by the first operand and place it at the address. To make this more obvious, the [[assembler language]] writes the address operands in square brackets, separated by a plus sign, instead of using a comma-separated list. Examples:<ref name=ncsu/>

 ld [%L1+%L2],%L3  !load the 32-bit value at address %L1+%L2 and put the value into %L3
 ld [%L1+8],%L2    !load the value at %L1+8 into %L2
 ld [%L1],%L2      !as above, but no offset, which is the same as +%G0
 st %L1,[%I2]      !store the value in %L1 into the location stored in %I2
 st %G0,[%I1+8]    !clear the memory at %I1+8
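
The address calculation used by these examples can be modeled in a few lines. The following is a minimal Python sketch, not SPARC code; the names <code>sign_extend</code> and <code>effective_address</code> are hypothetical helpers introduced for illustration, and the register contents are made-up values. It assumes the constant form of the second operand is the 13-bit signed immediate described for ALU operations.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def effective_address(rs1, rs2=None, simm13=0):
    """Address = first operand + (second register, or sign-extended constant)."""
    second = rs2 if rs2 is not None else sign_extend(simm13, 13)
    return (rs1 + second) & MASK32

# ld [%L1+8],%L2 with a hypothetical %L1 = 0x1000
print(hex(effective_address(0x1000, simm13=8)))      # 0x1008
# ld [%L1],%L2 is the same as adding %G0, which always reads as zero
print(hex(effective_address(0x1000, rs2=0)))         # 0x1000
```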

Due to the widespread use of non-32-bit data, such as 16-bit or 8-bit integral data or 8-bit bytes in strings, there are instructions that load and store 16-bit half-words and 8-bit bytes, as well as instructions that load and store 32-bit words. During a load, the byte and half-word instructions read only the byte or half-word at the indicated location and then fill the rest of the target register either with zeros (unsigned load) or with the value of the uppermost bit of the byte or half-word (signed load). During a store, they discard the upper bits in the register and store only the lower bits. There are also instructions that transfer a 64-bit double word, such as a double-precision value used for [[floating-point arithmetic]], reading or writing eight bytes using the indicated register and the "next" one; the indicated register must be even-numbered, so if the destination of a load is L2, L2 and L3 will be set. The complete list of load and store instructions for the general-purpose registers in 32-bit SPARC is {{code|LD}}, {{code|ST}}, {{code|LDUB}} (unsigned byte), {{code|LDSB}} (signed byte), {{code|LDUH}} (unsigned half-word), {{code|LDSH}} (signed half-word), {{code|LDD}} (load double), {{code|STB}} (store byte), {{code|STH}} (store half-word), {{code|STD}} (store double).<ref name=ncsu/>
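
The difference between unsigned and signed loads, and the truncation performed by narrow stores, can be illustrated with a small model. This is a Python sketch under stated assumptions rather than a description of any implementation: <code>load</code> and <code>store</code> are hypothetical helper names, memory is a plain <code>bytearray</code>, and big-endian byte order is assumed.

```python
def load(memory, address, size, signed):
    """Read `size` bytes (big-endian) and extend them to a 32-bit register value."""
    raw = int.from_bytes(memory[address:address + size], "big")
    if signed and raw & (1 << (8 * size - 1)):
        raw -= 1 << (8 * size)          # copy the top bit into the upper bits
    return raw & 0xFFFFFFFF

def store(memory, address, size, register):
    """Discard the upper bits of the register and write the low `size` bytes."""
    low = register & ((1 << (8 * size)) - 1)
    memory[address:address + size] = low.to_bytes(size, "big")

mem = bytearray([0x80, 0x01, 0xFF, 0x7F])
print(hex(load(mem, 0, 1, signed=False)))   # LDUB-like: 0x80
print(hex(load(mem, 0, 1, signed=True)))    # LDSB-like: 0xffffff80
print(hex(load(mem, 0, 2, signed=True)))    # LDSH-like: 0xffff8001
store(mem, 3, 1, 0x12345678)                # STB-like: only 0x78 is written
```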

In SPARC V9, registers are 64-bit. The {{code|LD}} instruction, renamed {{code|LDUW}}, clears the upper 32 bits in the register and loads the 32-bit value into the lower 32 bits; the {{code|ST}} instruction, renamed {{code|STW}}, discards the upper 32 bits of the register and stores only the lower 32 bits. The new {{code|LDSW}} instruction sets the upper bits in the register to the value of the uppermost bit of the word and loads the 32-bit value into the lower bits. The new {{code|LDX}} instruction loads a 64-bit value into the register, and the {{code|STX}} instruction stores all 64 bits of the register.
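
The effect of the V9 word instructions on a 64-bit register can be sketched as follows. The Python functions below are hypothetical models named after the instructions they imitate; each takes the 32-bit word read from memory (or the 64-bit register to be stored) and returns the resulting value.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def lduw(word):
    """Zero the upper 32 bits, keep the word in the lower 32."""
    return word & MASK32

def ldsw(word):
    """Copy bit 31 of the word into all of the upper 32 bits."""
    word &= MASK32
    return (word - (1 << 32)) & MASK64 if word & 0x80000000 else word

def stw(register):
    """Only the lower 32 bits of the register reach memory."""
    return register & MASK32

print(hex(lduw(0x80000000)))   # 0x80000000
print(hex(ldsw(0x80000000)))   # 0xffffffff80000000
```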

The {{code|LDF}}, {{code|LDDF}}, and {{code|LDQF}} instructions load a single-precision, double-precision, or quad-precision value from memory into a floating-point register; the {{code|STF}}, {{code|STDF}}, and {{code|STQF}} instructions store a single-precision, double-precision, or quad-precision floating-point register into memory.

The [[memory barrier]] instruction, MEMBAR, serves two interrelated purposes: it articulates order constraints among memory references and facilitates explicit control over the completion of memory references. For example, all effects of the stores that appear prior to the MEMBAR instruction must be made visible to all processors before any loads following the MEMBAR can be executed.<ref>{{Cite web |url=https://www.fujitsu.com/hk/imagesgig5/sparc64ixfx-extensions.pdf#page=103 |title=SPARC64 IXfx Extensions Fujitsu Limited Ver 12, 2 Dec. 2013 |pages=103–104 |accessdate=2023-12-17}}</ref>

====ALU operations====
Arithmetic and logical instructions also use a three-operand format, with the first two being the operands and the last being the location to store the result. The middle operand can be a register or a 13-bit signed integer constant; the other operands are registers. Any of the register operands may point to G0; pointing the result to G0 discards the results, which can be used for tests. Examples include:<ref name=ncsu/>

 add %L1,%L2,%L3   !add the values in %L1 and %L2 and put the result in %L3
 add %L1,1,%L1     !increment %L1
 add %G0,%G0,%L4   !clear any value in %L4

The list of mathematical instructions is {{code|ADD}}, {{code|SUB}}, {{code|AND}}, {{code|OR}}, {{code|XOR}}, and negated versions {{code|ANDN}}, {{code|ORN}}, and {{code|XNOR}}. One quirk of the SPARC design is that most arithmetic instructions come in pairs, with one version setting the NZVC condition code bits in the [[status register]], and the other not setting them, with the default being ''not'' to set the codes. This is so that the compiler has a way to move instructions around when trying to fill delay slots. If one wants the condition codes to be set, this is indicated by adding {{code|cc}} to the instruction:<ref name=ncsu/>

 subcc %L1,10,%G0  !compare %L1 to 10 and ignore the result, but set the flags
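
How a subtraction sets the four condition code bits can be modeled as below. This is a minimal Python sketch with a hypothetical function name, <code>subcc</code>, returning the result together with the flags; it assumes the usual two's-complement definitions, with C acting as a borrow indicator for subtraction.

```python
MASK32 = 0xFFFFFFFF

def subcc(a, b):
    """Return (result, flags) for a 32-bit subtract that sets N, Z, V and C."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    flags = {
        "N": result >> 31,                       # result is negative
        "Z": int(result == 0),                   # result is zero
        "V": ((a ^ b) & (a ^ result)) >> 31,     # signed overflow
        "C": int(a < b),                         # unsigned borrow
    }
    return result, flags

# subcc %L1,10,%G0 with a hypothetical %L1 = 10: the result is thrown away,
# but Z is set, so a following "be" would be taken.
_, flags = subcc(10, 10)
print(flags["Z"])   # 1
```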

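{{code|add}} and {{code|sub}} also have another modifier, X, which includes the carry bit from the condition codes as an extra input to the operation, so that values wider than 32 bits can be handled one word at a time:

 addx %L1,100,%L1  !add 100 plus the carry bit to the value in %L1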
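
A typical use of the X forms is multi-word arithmetic, in which the carry out of the low word is added into the high word. The following Python sketch models that pattern; <code>addcc</code>, <code>addx</code>, and <code>add64</code> are hypothetical function names chosen to mirror the instructions, and the carry is passed explicitly rather than through a status register.

```python
MASK32 = 0xFFFFFFFF

def addcc(a, b):
    """32-bit add that also reports the carry out of bit 31."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32

def addx(a, b, carry):
    """32-bit add that takes the carry bit as an extra input."""
    return ((a & MASK32) + (b & MASK32) + carry) & MASK32

def add64(a_hi, a_lo, b_hi, b_lo):
    """Add two 64-bit values held as (high word, low word) pairs."""
    lo, carry = addcc(a_lo, b_lo)     # low words first, remember the carry
    hi = addx(a_hi, b_hi, carry)      # high words plus that carry
    return hi, lo

print(add64(0, 0xFFFFFFFF, 0, 1))     # (1, 0)
```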

SPARC V7 does not have multiplication or division instructions, but it does have {{code|MULSCC}}, which does one step of a multiplication, testing one bit and conditionally adding the multiplicand to the product. This was because {{code|MULSCC}} can complete in one clock cycle, in keeping with the RISC philosophy. SPARC V8 added {{code|UMUL}} (unsigned multiply), {{code|SMUL}} (signed multiply), {{code|UDIV}} (unsigned divide), and {{code|SDIV}} (signed divide) instructions, each in a version that does not update the condition codes and a version that does. {{code|MULSCC}} and the multiply instructions use the Y register to hold the upper 32 bits of the product; the divide instructions use it to hold the upper 32 bits of the dividend. The {{code|RDY}} instruction reads the value of the Y register into a general-purpose register; the {{code|WRY}} instruction writes the value of a general-purpose register to the Y register.<ref name="sparc-v8-whitepaper" />{{rp|page=32}} SPARC V9 added {{code|MULX}}, which multiplies two 64-bit values and produces a 64-bit result, {{code|SDIVX}}, which divides a 64-bit signed dividend by a 64-bit signed divisor and produces a 64-bit signed quotient, and {{code|UDIVX}}, which divides a 64-bit unsigned dividend by a 64-bit unsigned divisor and produces a 64-bit unsigned quotient; none of those instructions use the Y register.<ref name="sparc-v9-whitepaper" />{{rp|page=199}}
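
The role of the Y register in the V8 multiply and divide instructions can be sketched in Python as follows. The functions <code>umul</code> and <code>udiv</code> are hypothetical models of the unsigned forms only; condition codes, overflow handling, and division-by-zero traps are deliberately left out, and the quotient is assumed to fit in 32 bits.

```python
MASK32 = 0xFFFFFFFF

def umul(a, b):
    """Return (rd, y): the low and high 32 bits of the 64-bit unsigned product."""
    product = (a & MASK32) * (b & MASK32)
    return product & MASK32, product >> 32

def udiv(y, a, b):
    """Divide the 64-bit dividend formed from Y (upper) and a (lower) by b."""
    dividend = ((y & MASK32) << 32) | (a & MASK32)
    return dividend // (b & MASK32)      # assumes b != 0 and the quotient fits

rd, y = umul(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(rd), hex(y))                   # 0x1 0xfffffffe
```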

====Branching====
Conditional branches test condition codes in a [[status register]], as seen in many instruction sets such as the [[IBM System/360 architecture]] and successors and the [[x86]] architecture. This means that a test and branch is normally performed with two instructions; the first is an ALU instruction that sets the condition codes, followed by a branch instruction that examines one of those flags. The SPARC does not have specialized test instructions; tests are performed using normal ALU instructions with the destination set to %G0. For instance, to test if a register holds the value 10 and then branch to code that handles it, one would write:

 subcc %L1,10,%G0 !subtract 10 from %L1, setting the zero flag if %L1 is 10
 be WASEQUAL      !if the zero flag is set, branch to the address marked WASEQUAL

In a conditional branch instruction, the '''icc''' or '''fcc''' field specifies the condition being tested. The 22-bit displacement field is the address, relative to the current PC, of the target, in words, so that conditional branches can go forward or backward up to 8 megabytes. The ''ANNUL'' (A) bit is used to get rid of some delay slots. If it is 0 in a conditional branch, the delay slot is executed as usual. If it is 1, the delay slot is only executed if the branch is taken. If it is not taken, the instruction following the conditional branch is skipped.
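
The displacement arithmetic and the annul rule described above can be modeled as follows. This is a Python sketch with hypothetical helper names (<code>encode_disp22</code>, <code>branch_target</code>, <code>delay_slot_executes</code>); it assumes the displacement is measured from the address of the branch instruction itself, and the annul helper covers only the conditional case described in the text, not {{code|BA}} or {{code|BN}}.

```python
def encode_disp22(pc, target):
    """Return the 22-bit field for a branch at `pc` going to `target`."""
    offset = target - pc
    if offset % 4:
        raise ValueError("target must be word-aligned relative to pc")
    words = offset // 4
    if not -(1 << 21) <= words < (1 << 21):
        raise ValueError("target is outside the reach of a 22-bit displacement")
    return words & 0x3FFFFF

def branch_target(pc, disp22):
    """Sign-extend the field, scale it to bytes, and add it to the PC."""
    words = disp22 - (1 << 22) if disp22 & (1 << 21) else disp22
    return (pc + 4 * words) & 0xFFFFFFFF

def delay_slot_executes(annul, taken):
    """Conditional branch: the slot is skipped only if A=1 and not taken."""
    return taken or not annul

print(hex(encode_disp22(0x1000, 0x0FF8)))    # 0x3ffffe, i.e. -2 words
print(hex(branch_target(0x1000, 0x3FFFFE)))  # 0xff8
```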

There are a wide variety of conditional branches: {{code|BA}} (branch always, essentially a jmp), {{code|BN}} (branch never), {{code|BE}} (equals), {{code|BNE}} (not equals), {{code|BL}} (less than), {{code|BLE}} (less or equal), {{code|BLEU}} (less or equal, unsigned), {{code|BG}} (greater), {{code|BGE}} (greater or equal), {{code|BGU}} (greater unsigned), {{code|BPOS}} (positive), {{code|BNEG}} (negative), {{code|BCC}} (carry clear), {{code|BCS}} (carry set), {{code|BVC}} (overflow clear), {{code|BVS}} (overflow set).<ref name="sparc-v8-whitepaper" />{{rp|119{{hyp}}120}}

The FPU and CP have sets of condition codes separate from the integer condition codes and from each other; two additional sets of branch instructions were defined to test those condition codes. Adding an F to the front of the branch instruction in the list above performs the test against the FPU's condition codes,<ref name="sparc-v8-whitepaper" />{{rp|121{{hyp}}122}} while, in SPARC V8, adding a C tests the flags in the otherwise undefined CP.<ref name="sparc-v8-whitepaper" />{{rp|123{{hyp}}124}}

The {{code|CALL}} (jump to subroutine) instruction uses a 30-bit [[program counter]]-relative ''word'' offset. Because the target address specifies the start of a word, not a byte, 30 bits are all that is needed to reach any address in the 4-gigabyte address space.<ref name=ncsu/> The CALL instruction deposits its own address, from which the return address is derived, in register R15, also known as output register O7.
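
Why a 30-bit word offset suffices for a 32-bit address space can be checked with a short model. In this Python sketch, <code>encode_disp30</code> and <code>call</code> are hypothetical helpers and the addresses are made-up examples; it assumes addresses wrap around modulo 2<sup>32</sup> and that the value written to O7 is the address of the call instruction itself.

```python
MASK32 = 0xFFFFFFFF

def encode_disp30(pc, target):
    """Word offset from the call at `pc` to a word-aligned `target`."""
    return ((target - pc) >> 2) & 0x3FFFFFFF

def call(pc, disp30):
    """Return (target, o7): where execution continues and what O7 receives."""
    target = (pc + 4 * disp30) & MASK32   # wraps around the 4 GB space
    o7 = pc                               # address of the call instruction
    return target, o7

disp = encode_disp30(0x10000, 0x4000)     # a backward call
print(hex(disp))                          # 0x3fffd000
print(hex(call(0x10000, disp)[0]))        # 0x4000
```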

The {{code|JMPL}} (jump and link) instruction is a three-operand instruction, with two operands representing values for the target address and one operand for a register in which to deposit the return address. The address is created by adding the two address operands to produce a 32-bit address. The second address operand may be a constant or a register.

====Large constants====
As the instruction opcode takes up some bits of the 32-bit instruction word, there is no way to load a 32-bit constant using a single instruction. This is significant because addresses are manipulated through registers and are 32 bits wide. To address this, the special-purpose {{code|SETHI}} instruction copies its 22-bit immediate operand into the high-order 22 bits of any specified register and sets each of the low-order 10 bits to 0. In general use, {{code|SETHI}} is followed by an {{code|or}} instruction that supplies the lower 10 bits of the value. To simplify this, the assembler includes the {{code|%hi(X)}} and {{code|%lo(X)}} macros. For example:<ref name=ncsu/>

 sethi %hi(0x89ABCDEF),%L1       !sets the upper 22 bits of L1
 or    %L1,%lo(0x89ABCDEF),%L1   !sets the lower 10 bits of L1 by ORing
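
The split performed by the two macros can be reproduced with ordinary bit operations. The Python sketch below uses hypothetical function names (<code>hi</code>, <code>lo</code>, <code>sethi</code>) that mirror the assembler syntax; it models the arithmetic only, not the assembler itself.

```python
def hi(x):
    """Upper 22 bits of a 32-bit value, as used for the SETHI immediate."""
    return (x & 0xFFFFFFFF) >> 10

def lo(x):
    """Lower 10 bits of the value, supplied by the following OR."""
    return x & 0x3FF

def sethi(imm22):
    """Place the immediate in the top 22 bits and clear the low 10 bits."""
    return (imm22 & 0x3FFFFF) << 10

value = 0x89ABCDEF
reg = sethi(hi(value))        # reg now holds 0x89abcc00
reg |= lo(value)              # OR in the remaining 0x1ef
print(hex(reg))               # 0x89abcdef
```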

The hi and lo macros are evaluated at assembly time, not runtime, so they carry no performance penalty, yet they make it clearer that L1 is set to a single value, not two unrelated ones. To make this even easier, the assembler also includes a "synthetic instruction", {{code|set}}, that performs these two operations in a single line:

 set   0x89ABCDEF,%L1

This outputs the two instructions above if the value does not fit in a 13-bit signed immediate; otherwise it emits a single {{code|or}} that combines the value with %G0.<ref name=ncsu/>