Horner's method
== Polynomial evaluation and long division ==

Given the polynomial <math display="block">p(x) = \sum_{i=0}^n a_i x^i = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n,</math> where <math>a_0, \ldots, a_n</math> are constant coefficients, the problem is to evaluate the polynomial at a specific value <math>x_0</math> of <math>x.</math>

For this, a new sequence of constants is defined [[Recurrence relation|recursively]] as follows:

{{NumBlk||<math display="block">\begin{align} b_n & := a_n \\ b_{n-1} & := a_{n-1} + b_n x_0 \\ & ~~~ \vdots \\ b_1 & := a_1 + b_2 x_0 \\ b_0 & := a_0 + b_1 x_0. \end{align}</math>|{{EquationRef|1}}}}

Then <math>b_0</math> is the value of <math>p(x_0)</math>.

To see why this works, the polynomial can be written in the form <math display="block">p(x) = a_0 + x \bigg(a_1 + x \Big(a_2 + x \big(a_3 + \cdots + x(a_{n-1} + x \, a_n) \cdots \big) \Big) \bigg) \ .</math>

Thus, by iteratively substituting the <math>b_i</math> into the expression,

<math display="block">\begin{align} p(x_0) & = a_0 + x_0\Big(a_1 + x_0\big(a_2 + \cdots + x_0(a_{n-1} + b_n x_0) \cdots \big)\Big) \\ & = a_0 + x_0\Big(a_1 + x_0\big(a_2 + \cdots + x_0 b_{n-1}\big)\Big) \\ & ~~ \vdots \\ & = a_0 + x_0 b_1 \\ & = b_0. \end{align}</math>

Now, it can be proven that

{{NumBlk||<math display="block"> p(x) = \left(b_1 + b_2 x + b_3 x^2 + b_4 x^3 + \cdots + b_{n-1} x^{n-2} + b_n x^{n-1}\right) \left(x - x_0\right) + b_0. </math>|{{EquationRef|2}}}}

This expression constitutes Horner's practical application, as it offers a very quick way of determining the outcome of <math display="block">p(x) / (x-x_0), </math> with <math>b_0</math> (which is equal to <math>p(x_0)</math>) being the division's remainder, as is demonstrated by the examples below. If <math>x_0</math> is a root of <math>p(x)</math>, then <math>b_0 = 0</math> (meaning the remainder is <math>0</math>), which means that <math>x-x_0</math> is a factor of <math>p(x)</math>.
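The recurrence above amounts to a single loop over the coefficients. As a minimal sketch (the function name <code>horner</code> and the coefficient ordering are illustrative, not part of any standard library):

```python
def horner(coeffs, x0):
    """Evaluate p(x0) via the recurrence b_n = a_n, b_i = a_i + b_{i+1} * x0.

    coeffs lists a_n, ..., a_1, a_0 (highest degree first); the running
    value b takes on b_n, b_{n-1}, ..., b_0 in turn.
    """
    b = 0
    for a in coeffs:
        b = b * x0 + a
    return b  # b_0, which equals p(x0)
```

For example, for <math>p(x) = x^2 - 2</math> at <math>x_0 = 3</math>, <code>horner([1, 0, -2], 3)</code> returns 7.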
To find the consecutive <math>b</math>-values, start by determining <math>b_n</math>, which is simply equal to <math>a_n</math>. Then work recursively using the formula <math display="block"> b_{n-1} = a_{n-1} + b_{n}x_0 </math> until you arrive at <math>b_0</math>.

=== Examples ===

Evaluate <math>f(x)=2x^3-6x^2+2x-1</math> for <math>x=3</math>. We use [[synthetic division]] as follows:

 ''x''{{sub|0}} │ ''x''{{sup|3}}  ''x''{{sup|2}}  ''x''{{sup|1}}  ''x''{{sup|0}}
  3 │   2   −6    2   −1
    │        6    0    6
    └─────────────────────
        2    0    2    5

The entries in the third row are the sum of those in the first two. Each entry in the second row is the product of the {{mvar|x}}-value ({{val|3}} in this example) with the third-row entry immediately to the left. The entries in the first row are the coefficients of the polynomial to be evaluated. Then the remainder of <math>f(x)</math> on division by <math>x-3</math> is {{val|5}}.

But by the [[polynomial remainder theorem]], we know that the remainder is <math>f(3)</math>. Thus, <math>f(3) = 5</math>.

In this example, if <math>a_3 = 2, a_2 = -6, a_1 = 2, a_0 = -1</math>, we can see that <math>b_3 = 2, b_2 = 0, b_1 = 2, b_0 = 5</math>, the entries in the third row. So synthetic division (which was actually invented and published by Ruffini 10 years before Horner's publication) is easier to use; it can be shown to be equivalent to Horner's method.

As a consequence of the polynomial remainder theorem, the entries in the third row are the coefficients of the second-degree polynomial, the quotient of <math>f(x)</math> on division by <math>x-3</math>. The remainder is {{val|5}}. This makes Horner's method useful for [[polynomial long division]].

Divide <math>x^3-6x^2+11x-6</math> by <math>x-2</math>:

  2 │   1   −6   11   −6
    │        2   −8    6
    └─────────────────────
        1   −4    3    0

The quotient is <math>x^2-4x+3</math>.

Let <math>f_1(x)=4x^4-6x^3+3x-5</math> and <math>f_2(x)=2x-1</math>.
Divide <math>f_1(x)</math> by <math>f_2(x)</math> using Horner's method.

  0.5 │   4   −6    0    3   −5
      │        2   −2   −1    1
      └────────────────────────────
          2   −2   −1    1   −4

The third row is the sum of the first two rows, divided by {{val|2}}. Each entry in the second row is the product of {{val|1}} with the third-row entry to the left. The answer is <math display="block">\frac{f_1(x)}{f_2(x)}=2x^3-2x^2-x+1-\frac{4}{2x-1}.</math>

=== Efficiency ===

Evaluation using the monomial form of a degree-<math>n</math> polynomial requires at most <math>n</math> additions and <math>(n^2+n)/2</math> multiplications, if powers are calculated by repeated multiplication and each monomial is evaluated individually. The cost can be reduced to <math>n</math> additions and <math>2n-1</math> multiplications by evaluating the powers of <math>x</math> by iteration.

If numerical data are represented in terms of digits (or bits), then the naive algorithm also entails storing approximately <math>2n</math> times the number of bits of <math>x</math>: the evaluated polynomial has approximate magnitude <math>x^n</math>, and one must also store <math>x^n</math> itself. By contrast, Horner's method requires only <math>n</math> additions and <math>n</math> multiplications, and its storage requirement is only <math>n</math> times the number of bits of <math>x</math>. Alternatively, Horner's method can be computed with <math>n</math> [[fused multiply–add]]s.

Horner's method can also be extended to evaluate the first <math>k</math> derivatives of the polynomial with <math>kn</math> additions and multiplications.<ref>{{harvnb|Pankiewicz|1968}}.</ref>

Horner's method is optimal, in the sense that any algorithm to evaluate an arbitrary polynomial must use at least as many operations.
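The derivative extension mentioned above can be sketched by carrying a second accumulator through the same loop; the function name and the highest-degree-first coefficient order are illustrative assumptions, not from the source:

```python
def horner_with_derivative(coeffs, x0):
    """Evaluate p(x0) and p'(x0) in one Horner pass.

    coeffs lists a_n, ..., a_1, a_0 (highest degree first).
    Uses the product rule: if p -> p*x + a, then p' -> p'*x + p.
    """
    p = 0   # running value of p(x0)
    dp = 0  # running value of p'(x0)
    for a in coeffs:
        dp = dp * x0 + p  # update derivative before p changes
        p = p * x0 + a
    return p, dp
```

For <math>f(x)=2x^3-6x^2+2x-1</math> at <math>x=3</math> this returns the pair (5, 20), matching <math>f(3)=5</math> and <math>f'(3)=6\cdot 9-12\cdot 3+2=20</math>.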
[[Alexander Ostrowski]] proved in 1954 that the number of additions required is minimal.<ref>{{harvnb|Ostrowski|1954}}.</ref> [[Victor Pan]] proved in 1966 that the number of multiplications is minimal.<ref>{{harvnb|Pan|1966}}.</ref> However, when <math>x</math> is a matrix, [[Polynomial evaluation#Matrix polynomials|Horner's method is not optimal]].

This assumes that the polynomial is evaluated in monomial form and no [[preconditioning]] of the representation is allowed, which makes sense if the polynomial is evaluated only once. However, if preconditioning is allowed and the polynomial is to be evaluated many times, then [[Polynomial evaluation#Evaluation with preprocessing|faster algorithms are possible]]. They involve a transformation of the representation of the polynomial. In general, a degree-<math>n</math> polynomial can be evaluated using only {{floor|''n''/2}}+2 multiplications and <math>n</math> additions.<ref>{{harvnb|Knuth|1997}}.</ref>

====Parallel evaluation====
{{see also|Estrin's scheme}}

A disadvantage of Horner's rule is that all of the operations are [[data dependency|sequentially dependent]], so it is not possible to take advantage of [[instruction level parallelism]] on modern computers. In most applications where the efficiency of polynomial evaluation matters, many low-order polynomials are evaluated simultaneously (for each pixel or polygon in computer graphics, or for each grid square in a numerical simulation), so it is not necessary to find parallelism within a single polynomial evaluation.
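That batch pattern, running the same Horner recurrence in lockstep over many evaluation points, can be sketched in plain Python; each list element stands in for one SIMD lane or pixel, and the function name is illustrative:

```python
def horner_batch(coeffs, xs):
    """Evaluate one polynomial at many points at once.

    Every iteration applies b = b*x + a across the whole batch, the
    shape that maps directly onto SIMD lanes or per-pixel threads.
    """
    bs = [0.0] * len(xs)
    for a in coeffs:
        bs = [b * x + a for b, x in zip(bs, xs)]
    return bs
```

Here the data dependency runs only along the coefficient loop; the work across points is fully independent.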
If, however, one is evaluating a single polynomial of very high order, it may be useful to break it up as follows:

<math display="block">\begin{align} p(x) & = \sum_{i=0}^n a_i x^i \\[1ex] & = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n \\[1ex] & = \left( a_0 + a_2 x^2 + a_4 x^4 + \cdots\right) + \left(a_1 x + a_3 x^3 + a_5 x^5 + \cdots \right) \\[1ex] & = \left( a_0 + a_2 x^2 + a_4 x^4 + \cdots\right) + x \left(a_1 + a_3 x^2 + a_5 x^4 + \cdots \right) \\[1ex] & = \sum_{i=0}^{\lfloor n/2 \rfloor} a_{2i} x^{2i} + x \sum_{i=0}^{\lfloor n/2 \rfloor} a_{2i+1} x^{2i} \\[1ex] & = p_0(x^2) + x p_1(x^2). \end{align}</math>

More generally, the summation can be broken into ''k'' parts: <math display="block">p(x) = \sum_{i=0}^n a_i x^i = \sum_{j=0}^{k-1} x^j \sum_{i=0}^{\lfloor n/k \rfloor} a_{ki+j} x^{ki} = \sum_{j=0}^{k-1} x^j p_j(x^k),</math> where the inner summations may be evaluated using separate parallel instances of Horner's method. This requires slightly more operations than the basic Horner's method, but allows ''k''-way [[SIMD]] execution of most of them. Modern compilers generally evaluate polynomials this way when advantageous, although for [[floating-point]] calculations this requires enabling (unsafe) reassociative math.{{Citation needed|date=February 2022}}

Another use of breaking a polynomial down this way is to calculate steps of the inner summations in an alternating fashion to take advantage of [[instruction-level parallelism]].

===Application to floating-point multiplication and division===
{{main|binary logarithm#Calculation|multiplication algorithm#Shift and add}}

Horner's method is a fast, code-efficient method for multiplication and division of binary numbers on a [[microcontroller]] with no [[binary multiplier|hardware multiplier]]. One of the binary numbers to be multiplied is represented as a trivial polynomial, where (using the above notation) <math>a_i = 1</math> and <math>x = 2</math>.
Then, ''x'' (or ''x'' to some power) is repeatedly factored out. In this [[binary numeral system]] (base 2), <math>x = 2</math>, so powers of 2 are repeatedly factored out.

====Example====

For example, to find the product of two numbers, 0.15625 and ''m'':

<math display="block">\begin{align} (0.15625) m & = (0.00101_b) m = \left( 2^{-3} + 2^{-5} \right) m = \left(2^{-3}\right)m + \left(2^{-5}\right)m \\ & = 2^{-3} \left(m + \left(2^{-2}\right)m\right) = 2^{-3} \left(m + 2^{-2} (m)\right). \end{align}</math>

====Method====

To find the product of two binary numbers ''d'' and ''m'':
# A register holding the intermediate result is initialized to ''d''.
# Begin with the least significant (rightmost) non-zero bit in ''m''.
{{ordered list | list-style-type = lower-alpha | start = 2
| Count (to the left) the number of bit positions to the next most significant non-zero bit. If there are no more-significant bits, then take the value of the current bit position.
| Using that value, perform a left-shift operation by that number of bits on the register holding the intermediate result.}}
# If all the non-zero bits were counted, then the intermediate result register now holds the final result. Otherwise, add ''d'' to the intermediate result, and continue in step 2 with the next most significant bit in ''m''.

====Derivation====

In general, for a binary number with bit values (<math> d_3 d_2 d_1 d_0 </math>) the product is <math display="block"> (d_3 2^3 + d_2 2^2 + d_1 2^1 + d_0 2^0)m = d_3 2^3 m + d_2 2^2 m + d_1 2^1 m + d_0 2^0 m. </math>

At this stage in the algorithm, it is required that terms with zero-valued coefficients be dropped, so that only binary coefficients equal to one are counted; thus the problem of multiplication or [[division by zero]] is not an issue, despite the implication in the factored equation: <math display="block"> = d_0\left(m + 2 \frac{d_1}{d_0} \left(m + 2 \frac{d_2}{d_1} \left(m + 2 \frac{d_3}{d_2} (m)\right)\right)\right).
</math>

The denominators all equal one (or the term is absent), so this reduces to <math display="block"> = d_0(m + 2 {d_1} (m + 2 {d_2} (m + 2 {d_3} (m)))),</math> or equivalently (as consistent with the "method" described above) <math display="block"> = d_3(m + 2^{-1} {d_2} (m + 2^{-1}{d_1} (m + {d_0} (m)))). </math>

In binary (base-2) math, multiplication by a power of 2 is merely a [[arithmetic shift|register shift]] operation. Thus, multiplying by 2 is calculated in base-2 by an [[arithmetic shift]]. The factor (2<sup>−1</sup>) is a right arithmetic shift, a (0) results in no operation (since 2<sup>0</sup> = 1 is the multiplicative [[identity element]]), and a (2<sup>1</sup>) results in a left arithmetic shift. The multiplication product can now be quickly calculated using only arithmetic shift operations, addition and [[subtraction]].

The method is particularly fast on processors supporting a single-instruction shift-and-addition-accumulate. Compared to a C floating-point library, Horner's method sacrifices some accuracy; however, it is nominally 13 times faster (16 times faster when the "[[canonical signed digit]]" (CSD) form is used) and uses only 20% of the code space.<ref>{{harvnb|Kripasagar|2008|p=62}}.</ref>

=== Other applications ===

Horner's method can be used to convert between different positional [[numeral system]]s – in which case ''x'' is the base of the number system, and the ''a''<sub>''i''</sub> coefficients are the digits of the base-''x'' representation of a given number – and can also be used if ''x'' is a [[matrix (math)|matrix]], in which case the gain in computational efficiency is even greater. However, for such cases [[Polynomial evaluation#Matrix polynomials|faster methods are known]].<ref>{{harvnb|Higham|2002|loc=Section 5.4}}.</ref>
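The numeral-system conversion just described is exactly one Horner pass with ''x'' equal to the base, reading the digits most significant first; the function name below is illustrative:

```python
def digits_to_value(digits, base):
    """Combine base-`base` digits (most significant first) into a number
    using Horner's recurrence with x = base."""
    value = 0
    for d in digits:
        value = value * base + d
    return value
```

For instance, <code>digits_to_value([1, 0, 1, 1], 2)</code> gives 11, the value of binary 1011, and <code>digits_to_value([2, 5, 5], 10)</code> gives 255.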