Lookup table
== Examples ==

=== Trivial hash function ===
For a [[trivial hash function]] lookup, the unsigned [[raw data]] value is used ''directly'' as an index into a one-dimensional table to extract a result. For small ranges, this can be among the fastest lookups, even exceeding binary search in speed, with zero branches and [[constant time]] execution.<ref>{{cite book|last1=Cormen|first1=Thomas H.|title=Introduction to algorithms|date=2009|publisher=MIT Press|location=Cambridge, Mass.|isbn=9780262033848|pages=253–255|edition=3rd|url=https://mitpress.mit.edu/books/introduction-algorithms|accessdate=26 November 2015}}</ref>

==== Counting bits in a series of bytes ====
One discrete problem that is expensive to solve on many computers is counting the number of bits that are set to 1 in a (binary) number, sometimes called the ''[[Hamming weight|population function]]''. For example, the decimal number "37" is "00100101" in binary, so it contains three bits that are set to binary "1".<ref name="apress11">{{cite book|doi=10.1007/978-1-4302-4159-1_26|publisher=Apress|url=https://link.springer.com/chapter/10.1007/978-1-4302-4159-1_26|title=Developing for Performance. In: packetC Programming|author1=Jungck P.|author2=Duncan R.|author3=Mulcahy D.|year=2011|isbn=978-1-4302-4159-1}}</ref>{{rp|p=282}}

A simple example of [[C (programming language)|C]] code, designed to count the 1 bits in an ''int'', might look like this:{{r|apress11|p=283}}

<syntaxhighlight lang="c">
int count_ones(unsigned int x)
{
    int result = 0;
    while (x != 0) {
        x = x & (x - 1);
        result++;
    }
    return result;
}
</syntaxhighlight>

The above implementation requires up to 32 iterations for an evaluation of a 32-bit value, which can potentially take several [[Cycles per instruction|clock cycles]] due to [[Control flow|branching]].
It can be "[[loop unrolling|unrolled]]" into a lookup table, which in turn uses a [[trivial hash function]] for better performance.{{r|apress11|p=282-283}}

The array ''bits_set'', with 256 entries, is constructed by giving the number of one bits set in each possible byte value (e.g. 0x00 = 0, 0x01 = 1, 0x02 = 1, and so on). Although a [[Runtime (program lifecycle phase)|runtime]] algorithm could be used to generate the ''bits_set'' array, doing so wastes clock cycles given the table's small size, so a precomputed table is used; alternatively, a [[compile time]] script could dynamically generate the table and append it to the [[source file]]. The sum of ones in each byte of the [[Integer (computer science)|integer]] can then be calculated through a [[trivial hash function]] lookup on each byte, effectively avoiding branches and yielding a considerable improvement in performance.{{r|apress11|p=284}}

<syntaxhighlight lang="c">
int count_ones(unsigned int input_value)
{
    union four_bytes {
        unsigned int big_int;
        unsigned char each_byte[4];
    } operand;
    operand.big_int = input_value;

    static const unsigned char bits_set[256] = {
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
        4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8};

    return bits_set[operand.each_byte[0]] +
           bits_set[operand.each_byte[1]] +
           bits_set[operand.each_byte[2]] +
           bits_set[operand.each_byte[3]];
}
</syntaxhighlight>

=== Lookup tables in image processing ===
{{original research section|date=October 2021}}
[[File:Red Green Blue 16 bit Look up Table Sample.svg|thumb|right|Red (A), Green (B), Blue (C) 16-bit lookup table file sample. (Lines 14 to 65524 not shown)]]

{{blockquote|"Lookup tables (LUTs) are an excellent technique for optimizing the evaluation of functions that are expensive to compute and inexpensive to cache. ... For data requests that fall between the table's samples, an interpolation algorithm can generate reasonable approximations by averaging nearby samples."<ref>[https://developer.nvidia.com/gpugems/gpugems2/part-iii-high-quality-rendering/chapter-24-using-lookup-tables-accelerate-color GPU Gems 2, Chapter 24: Using Lookup Tables to Accelerate Color Transformations]</ref>}}

In data analysis applications, such as [[image processing]], a lookup table (LUT) can be used to transform the input data into a more desirable output format. For example, a grayscale picture of the planet Saturn could be transformed into a color image to emphasize the differences in its rings.

In image processing, lookup tables are often called '''[[3D LUT|LUT]]'''s (or 3D LUTs) and give an output value for each of a range of index values. One common LUT, called the ''colormap'' or ''[[palette (computing)|palette]]'', is used to determine the colors and intensity values with which a particular image will be displayed. In [[computed tomography]], "windowing" refers to a related concept for determining how to display the intensity of measured radiation.

=== Discussion ===
{{original research section|date=October 2021}}
A classic example of reducing run-time computations using lookup tables is obtaining the result of a [[trigonometry]] calculation, such as the [[sine]] of a value.<ref>{{cite web |author1=Sasao, T.; Butler, J. T.; Riedel, M. D.
|title=Application of LUT Cascades to Numerical Function Generators |url=https://apps.dtic.mil/sti/citations/ADA596280 |website=Defense Technical Information Center |publisher=Naval Postgraduate School, Monterey, CA, Dept. of Electrical and Computer Engineering |access-date=17 May 2024}}</ref>

Calculating trigonometric functions can substantially slow a computing application. The same application can finish much sooner if it first precalculates the sine of a number of values, for example for each whole number of degrees (the table can be defined as static variables at compile time, reducing repeated run-time costs). When the program requires the sine of a value, it can use the lookup table to retrieve the closest sine value from a memory address, and may also interpolate to the sine of the desired value, instead of calculating it by mathematical formula. Lookup tables can thus be used by mathematics [[coprocessor]]s in computer systems. An error in a lookup table was responsible for Intel's infamous [[Pentium FDIV bug|floating-point divide bug]].

Functions of a single variable (such as sine and cosine) may be implemented by a simple array. Functions involving two or more variables require multidimensional array indexing techniques. The latter case may thus employ a two-dimensional array '''power[x][y]''' to replace a function that calculates '''x<sup>y</sup>''' for a limited range of x and y values. Functions that have more than one result may be implemented with lookup tables that are arrays of structures.

As mentioned, there are intermediate solutions that use tables in combination with a small amount of computation, often using [[interpolation]]. Pre-calculation combined with interpolation can produce higher accuracy for values that fall between two precomputed values. This technique requires slightly more time but can greatly enhance accuracy in applications that require it.
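The '''power[x][y]''' idea above can be sketched in C as follows. This is a minimal illustration, not from the cited sources: the bounds <code>MAX_X</code>/<code>MAX_Y</code> and the function names are assumptions chosen for the example.

```c
/* Hypothetical sketch: replace repeated exponentiation x^y for small
 * non-negative integer arguments with a precomputed 2-D table. */
#define MAX_X 10
#define MAX_Y 10

static long power_table[MAX_X][MAX_Y];

/* Fill the table once, e.g. during program start-up. */
static void init_power_table(void)
{
    for (int x = 0; x < MAX_X; x++) {
        long p = 1;                      /* x^0 == 1 by convention */
        for (int y = 0; y < MAX_Y; y++) {
            power_table[x][y] = p;       /* store x^y */
            p *= x;                      /* next power of x */
        }
    }
}

/* A single indexed load replaces a run-time exponentiation
 * for arguments within the precomputed range. */
static long power_lookup(int x, int y)
{
    return power_table[x][y];
}
```

After one call to <code>init_power_table()</code>, every in-range <code>power_lookup(x, y)</code> is branch-free array indexing, which is the trade-off discussed above: memory for the table in exchange for avoided computation.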
Depending on the values being precomputed, [[precomputation]] with interpolation can also be used to shrink the lookup table size while maintaining accuracy.

While often effective, employing a lookup table may nevertheless result in a severe penalty if the computation that the LUT replaces is relatively simple. Memory retrieval time and the complexity of memory requirements can increase application operation time and system complexity relative to what would be required by straight formula computation. The possibility of [[cache pollution|polluting the cache]] may also become a problem. Table accesses for large tables will almost certainly cause a [[cache miss]]. This phenomenon is increasingly becoming an issue as processors outpace memory. A similar issue appears in [[rematerialization]], a [[compiler optimization]]. In some environments, such as the [[Java (programming language)|Java programming language]], table lookups can be even more expensive due to mandatory bounds checking, which involves an additional comparison and branch for each lookup.

There are two fundamental limitations on when it is possible to construct a lookup table for a required operation. One is the amount of available memory: one cannot construct a lookup table larger than the space available for the table, although it is possible to construct disk-based lookup tables at the expense of lookup time. The other is the time required to compute the table values in the first instance; although this usually needs to be done only once, if it takes a prohibitively long time, it may make the use of a lookup table an inappropriate solution. As previously stated, however, tables can be statically defined in many cases.

=== Computing sines ===
Most computers only perform basic arithmetic operations and cannot directly calculate the [[sine]] of a given value.
Instead, they use the [[CORDIC]] algorithm or a complex formula such as the following [[Taylor series]] to compute the value of sine to a high degree of precision:<ref name="sharif14">{{cite journal|journal=Journal of Circuits, Systems and Computers|volume=23|number=4|year=2014|first=Haidar|last=Sharif|doi=10.1142/S0218126614500510|url=https://www.worldscientific.com/doi/abs/10.1142/S0218126614500510|title=High-performance mathematical functions for single-core architectures|publisher=World Scientific}}</ref>{{rp|p=5}}

:<math>\sin(x) \approx x - \frac{x^3}{6} + \frac{x^5}{120} - \frac{x^7}{5040}</math> (for ''x'' close to 0)

However, this can be expensive to compute, especially on slow processors, and there are many applications, particularly in traditional [[computer graphics]], that need to compute many thousands of sine values every second. A common solution is to initially compute the sine of many evenly distributed values, and then, to find the sine of ''x'', choose the sine of the value closest to ''x'' through an array indexing operation.
This will be close to the correct value because sine is a [[continuous function]] with a bounded rate of change.{{r|sharif14|p=6}} For example:<ref>{{cite book|title=The Art of Assembly Language, 2nd Edition|author=Randall Hyde|date=1 March 2010|publisher=No Starch Press|isbn=978-1593272074|url=https://www.ic.unicamp.br/~pannain/mc404/aulas/pdfs/Art%20Of%20Intel%20x86%20Assembly.pdf|via=University of Campinas Institute of Computing}}</ref>{{rp|p=545–548}}

<syntaxhighlight lang="abap">
real array sine_table[-1000..1000]
for x from -1000 to 1000
    sine_table[x] = sine(pi * x / 1000)

function lookup_sine(x)
    return sine_table[round(1000 * x / pi)]
</syntaxhighlight>

[[File:Interpolation example linear.svg|thumb|right|Linear interpolation on a portion of the sine function]]

Unfortunately, the table requires quite a bit of space: if IEEE double-precision floating-point numbers are used, over 16,000 bytes would be required. We can use fewer samples, but then our precision will significantly worsen. One good solution is [[linear interpolation]], which draws a line between the two points in the table on either side of the value and locates the answer on that line. This is still quick to compute, and much more accurate for [[smooth function]]s such as the sine function. Here is an example using linear interpolation:

<syntaxhighlight lang="abap">
function lookup_sine(x)
    x1 = floor(x*1000/pi)
    y1 = sine_table[x1]
    y2 = sine_table[x1+1]
    return y1 + (y2-y1)*(x*1000/pi-x1)
</syntaxhighlight>

Linear interpolation produces an interpolated function that is continuous, but will not, in general, have continuous [[derivative]]s. For smoother table-lookup interpolation that is continuous '''and''' has a continuous [[first derivative]], one should use the [[cubic Hermite spline#Interpolation on the unit interval with matched derivatives at endpoints|cubic Hermite spline]].
When using interpolation, the size of the lookup table can be reduced by using ''[[nonuniform sampling]]'', which means that where the function is close to straight, we use few sample points, while where it changes value quickly we use more sample points to keep the approximation close to the real curve. For more information, see [[interpolation]].