Sum-addressed decoder
==Overview==
The L1 [[Cache (computing)|data cache]] is usually the most critical CPU resource, because few things improve [[instructions per cycle]] (IPC) as directly as a larger data cache, yet a larger data cache takes longer to access, and [[Instruction pipeline|pipelining]] the data cache makes IPC worse. One way to reduce the latency of an L1 data cache access is to fuse the address generation sum operation with the decode operation in the cache SRAM.

The address generation sum operation must still be performed, because other units in the memory pipe use the resulting virtual address. That sum is performed in parallel with the fused add/decode described here.

The most profitable recurrence to accelerate is a load, followed by a use of that load in a chain of integer operations leading to another load. Assuming that load results are bypassed with the same priority as integer results, this recurrence can be summarized as a load followed by another load, as if the program were following a linked list.

The rest of this page assumes an [[instruction set architecture]] (ISA) with a single addressing mode (register+offset), a virtually indexed data cache, and sign-extending loads that may be variable-width. Most [[RISC]] ISAs fit this description. In ISAs such as the [[Intel]] [[x86]], three or four inputs are summed to generate the virtual address. Multiple-input additions can be reduced to a two-input addition with carry-save adders, and the remaining problem is as described below.

The critical recurrence, then, is an [[Adder (electronics)|adder]], a [[Binary decoder|decoder]], the SRAM word line, the SRAM bit line(s), the sense amp(s), the byte steering [[multiplexer|muxes]], and the bypass muxes.

For this example, a direct-mapped 16 [[Kibibyte|KB]] data cache that returns doubleword (8-byte) aligned values is assumed. Each line of the SRAM is 8 bytes, and there are 2048 lines, addressed by Addr[13:3].
The sum-addressed SRAM idea applies equally well to set-associative caches.
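
As a rough illustration of the conventional (unfused) path that the sum-addressed decoder is meant to shorten, the following Python sketch models the example cache above: a carry-save step that reduces three address inputs to two, the full add, the selection of Addr[13:3], and a one-hot decode onto 2048 word lines. The function and constant names are hypothetical, and the 64-bit address width is an assumption made only for this sketch; it models behavior, not circuit timing.

```python
# Behavioral sketch of the conventional add-then-decode path for the example
# cache: direct-mapped, 16 KB, 8-byte lines, 2048 lines indexed by Addr[13:3].
# All names here are illustrative; the 64-bit address width is an assumption.

ADDR_MASK = (1 << 64) - 1   # assumed 64-bit virtual addresses
NUM_LINES = 2048            # 16 KB / 8 bytes per line
INDEX_LO = 3                # lowest index bit: Addr[3]


def carry_save_reduce(a, b, c):
    """Reduce three addends to two whose sum equals a + b + c (mod 2**64)."""
    partial_sum = a ^ b ^ c                                  # per-bit sum, no carries
    carries = (((a & b) | (a & c) | (b & c)) << 1) & ADDR_MASK  # per-bit majority, shifted
    return partial_sum, carries


def row_index(base, offset):
    """Full add followed by selecting Addr[13:3] as the SRAM row index."""
    addr = (base + offset) & ADDR_MASK
    return (addr >> INDEX_LO) & (NUM_LINES - 1)


def decode(index):
    """One-hot decode of an 11-bit index onto 2048 word lines."""
    return [1 if line == index else 0 for line in range(NUM_LINES)]


if __name__ == "__main__":
    # A carry out of Addr[2:0] changes the selected row, so the index cannot
    # be formed from the upper bits of base and offset alone.
    print(row_index(0x7, 0x1))
    # Three-input address (for example base + scaled index + displacement)
    # reduced to a two-input add before indexing.
    s, c = carry_save_reduce(0x1000, 0x20, 0x8)
    print(row_index(s, c))
```

In this model the adder and the decoder run back to back, which is the serial delay that a fused add/decode aims to remove.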