I was in the library the other day and an introduction to numerical mathematics1 fell into my hands. It's a nice little book, but pretty dense and maybe not ideal for self-study. Already in the second chapter I had to work things out with pen and paper. This is a clean and more verbose version of my paper notes.
A polynomial is defined as the following function. $$ p_n(x) = \sum_{i=0}^n a_i x^i $$ where the coefficients \(a_i\) can be real or complex numbers. If \(a_n \neq 0\), we say that the degree of the polynomial is \(n.\) To evaluate a polynomial for some \(x=x_0\), we can simply plug this value into the formula and calculate the result. Since there are \(n+1\) terms in the sum, we'll need \(n\) additions, and because each term has \(i = 0, 1, \ldots, n\) factors \(x\), we'll need \(1 + 2 + \cdots n = \frac{n (n+1)}{2}\) multiplications.
We can do with less, though. With a bit of clever nesting, we get the same result with only \(n\) additions and \(n\) multiplications. $$ p_n(x) = a_0 + x_0 ( a_1 + x_0 ( a_2 + \ldots + x_0( a_{n-1} + x_0 a_n )) \cdots ) $$
The expression on the right should have \(n-1\) pairs of parentheses if fully expanded. This is called Horner's method.
Instead of the equation above with the handwavy dots, we can also give a short recursive definition of the same method. $$ \begin{align*} b_n &= a_n\\ b_i &= a_i + x_0 b_{i+1},\quad i = n-1, n-2, \ldots, 0 \end{align*} $$
Applying this definition, of course, yields the same result.
Another thing we can do with the recursive definition is to solve for \(a_i = b_i - x_0 b_{i+1}\) and plug this back into our polynomial definition. It's not obvious why we would want to do this, but we'll just go ahead. $$ \begin{align*} p_n(x) &= \sum_{i=0}^n a_i x^i \\ &= \sum_{i=0}^{n-1} a_i x^i + a_n x^n\\ &= \sum_{i=0}^{n-1} ( b_i - x_0 b_{i+1}) x^i + b_n x^n\\ \end{align*} $$
We pulled the base case of the recursion (\(b_n = a_n\)) out of the sum and did the substitution. With a bit of shuffling we get an interesting formula which involves a new polynomial2 \(p_{n-1} = \sum_{i=1}^n b_i x^{i-1}\) and the value of \(p_n\) at \(x=x_0,\) which is \(b_0.\) $$ \begin{align*} p_n(x) &= b_0 + \sum_{i=1}^{n-1} b_i x^i - x_0 \sum_{i=1}^n b_i x^{i-1} + b_n x^n\\ &= b_0 + x \sum_{i=1}^n b_i x^{i-1} - x_0 \sum_{i=1}^n b_i x^{i-1}\\ &= \left( \sum_{i=1}^n b_i x^{i-1} \right) (x - x_0) + b_0\\ &= p_{n-1}(x) (x - x_0) + p_n(x_0) \end{align*} $$
What's the significance of this formula? It's a special case of synthetic division called Ruffini's rule, in which the divisor is a linear factor \(x - x_0.\) The quotient \(p_{n-1}\) is a polynomial of one degree less than the dividend \(p_n.\)
Manually performing such a division is best done in a table, for example, with \(p_3 = 2x^3 + x^2 - 4x -7\) and \(x_0 = 2,\) we can set up the following table.
\(x_0 = 2\) | \(a_3 = 2\) | \(a_2 = 1\) | \(a_1 = -4\) | \(a_0 = -7\) |
\(x_0 b_3 = 4\) | \(x_0 b_2 = 10\) | \(x_0 b_1 = 12\) | ||
\(b_3 = 2\) | \(b_2 = 5\) | \(b_1 = 6\) | \(b_0 = 5\) |
The first row has \(x_0\) and the coefficients of the dividend. From the recursive definition of Horner's rule, we know that \(b_3 = a_3.\) Knowing \(b_3,\) we can calculate the first entry in the second row \(x_0 b_3 = 4\). Adding this value to the coefficient above gives us \(b_2 = a_2 + x_0 b_3 = 5\) (following the second case of Horner's rule), and so on. The result can be read off the last row: \(b_1\) to \(b_3\) are the coefficients of \(p_2,\) and \(b_0 = p_3(x_0)\) is the remainder. $$ p_3(x) = ( 2x^2 + 5x + 6 ) ( x - 2 ) + 5 $$
But the usefulness of Horner's method doesn't end here.
If we take the derivative of \(p_n(x) = p_{n-1}(x) (x - x_0) + p_n(x_0)\), the second summand is dropped because it's constant while we can apply the product rule to the first summand. $$ p_n'(x) = p_{n-1}'(x) (x - x_0) + p_{n-1}(x) $$
What about the second derivative? We simply calculate the first derivative of the first derivative. $$ \begin{align*} p_n''(x) &= p_{n-1}''(x) (x - x_0) + p_{n-1}'(x) + p_{n-1}'(x)\\ &= p_{n-1}''(x) (x - x_0) + 2 \cdot p_{n-1}'(x) \end{align*} $$
I think I see a pattern here. Let's replace the ticks with a superscript number and calculate the third derivative. $$ \begin{align*} p_n^{(3)}(x) &= p_{n-1}^{(3)}(x) (x - x_0) + p_{n-1}^{(2)}(x) + 2 \cdot p_{n-1}^{(2)}(x)\\ &= p_{n-1}^{(3)}(x) (x - x_0) + 3 \cdot p_{n-1}^{(2)}(x) \end{align*} $$
Clearly we're seeing this pattern for \(0 \leq k \leq n\): $$ p_n^{(k)}(x) = p_{n-1}^{(k)}(x) (x - x_0) + k \cdot p_{n-1}^{(k-1)}(x) $$
Now if we're only interested in calculating the derivative at a certain \(x = x_0\), the formula simplifies. $$ p_n^{(k)}(x_0) = k \cdot p_{n-1}^{(k-1)}(x_0) $$
This equation can be applied recursively, so $$ k \cdot p_{n-1}^{(k-1)}(x_0) = k (k-1) \cdot p_{n-2}^{(k-2)}(x_0) $$ and following through with these applications $$ p_n^{(k)}(x_0) = k! \cdot p_{n-k}(x_0). $$
This means that we can calculate the kth derivative of \(p_n\) by \(k+1\) successive applications of Horner's rule. For example, we can continue our table from above. I omit the coefficient names for readability and to avoid confusion between as and bs.
\(x_0 = 2\) | \(2\) | \(1\) | \(-4\) | \(-7\) |
\(4\) | \(10\) | \(12\) | ||
\(2\) | \(5\) | \(6\) | \(5 = p_3(2)\) | |
\(4\) | \(18\) | |||
\(2\) | \(9\) | \(24 = \frac{p'_3(2)}{1!} \) | ||
\(4\) | ||||
\(2\) | \(13 = \frac{p^{(2)}_3(2)}{2!}\) | |||
\(2 = \frac{p^{(3)}_3(2)}{3!}\) |
With each application of Horner's rule, the degree of the polynomial decrements by one, so we need \(n + (n-1) + \ldots + (n-k) = (k+1) \frac{2n - k}{2}\) additions and multiplications.
1 Opfer, Gerhard (2008). Numerische Mathematik für Anfänger, 5. Auflage. Vieweg-Teubner.
2 The indices of the coefficients run from \(1\) through \(n\) here, but that doesn't really matter. To comply with our initial definition of a polynomial, we could simply shift all the \(b\) indices by \(-1\), so \(p_n(x_0) = b_{-1}\), or define the recursion as \(b_{n-1} = a_n,\) \(b_j = a_{j+1} + x_0 b_{j+1}\) for \(j = n-2, n-3, \ldots, 0\) and \(p_n(x_0) = a_0 + x_0 b_0.\) The latter is what the book does, but I found it a bit unwieldly (basically you need to understand the whole derivation before you know why the recursion was set up in this weird way).