Fundamental Theorem of Algebra: Two Proofs

Introduction

Fundamental theorem of algebra is one of the most famous results provided in higher secondary courses of mathematics. Normally it is mentioned in chapter related to complex numbers where the reader is made aware of the power of complex numbers in solving polynomial equations. The theorem guarantees that any non-constant polynomial with real or complex coefficients has a complex root.

At the same time this famous theorem also belongs to the category of "theorems whose proof lies beyond the scope of the book/syllabus." I somehow don't see why its statement belongs to the syllabus or what purpose does it really have in an introductory chapter on complex numbers unless just to let the students keep wondering and finally forgetting it. Anyway this theorem has been proved in a variety of ways and the first proof which I read was presented in my favorite book A Course of Pure Mathematics. In this book Hardy provided two proofs of this significant theorem, both of which have analytical flavor. Since this theorem is ultimately dependent on the completeness of real number system, it is useless to expect a proof devoid of analytical processes. However, I found two proofs one of which uses elementary results from analysis and the other one uses the bare minimum of analysis and is mostly algebraical in nature. These I will present in the current post.

First Proof

Let us first prove simpler cases of this theorem. In particular we start with binomial equation of the form $z^{n} = a + ib$ where $n$ is a positive integer and $a + ib$ is some fixed complex number. For $n = 1$ the solution is obvious and for $n = 2$ we have the standard formula for square root of a complex number: $$z = \pm\sqrt{\frac{\sqrt{a^{2} + b^{2}} + a}{2}} \pm i\sqrt{\frac{\sqrt{a^{2} + b^{2}} - a}{2}}$$ where both the $\pm$ signs are same if $b > 0$ and opposite if $b < 0$ (if $b = 0$ then $z$ is either purely real or purely imaginary and then the $\pm$ sign gives both values of $z$). In the use of above formula we implicitly assume that non-negative real numbers possess square roots (check $\sqrt{a^{2} + b^{2}}$ and $\sqrt{\left\{\sqrt{a^{2} + b^{2}} \pm a\right\} / 2}$ above). This simple fact is a non-algebraic property of the real number system.

If $n > 2$ then we can always write $n = 2^{p}m$ where $p \geq 0$ is an integer and $m$ is an odd positive integer. By a series of $p$ square root extractions we then reach an equation of the form $z^{m} = c + id$. Hence there is no loss of generality if we assume $n$ to be odd in the starting equation. Thus we have to solve $z^{n} = a + ib$ where $n$ is odd. Now if $b = 0$ then the equation has real coefficients and then we know that every polynomial equation of odd degree with real coefficients has a real root. Hence we can assume that $b \neq 0$. And then we expect $z$ to have non-zero imaginary part.

Let us write $z = c(d + i)$ where $c, d$ are real with $c \neq 0$ and then we have $$c^{n}(d + i)^{n} = a + ib,\,\, \Rightarrow c^{n} = \frac{a + ib}{(d + i)^{n}}$$ Since $c^{n}$ is real we can change $i$ with $-i$ in expression for $c^{n}$ above to get \begin{align}&c^{n} = \frac{a + ib}{(d + i)^{n}} = \frac{a - ib}{(d - i)^{n}}\notag\\ &\Rightarrow (a - ib)(d + i)^{n} - (a + ib)(d - i)^{n} = 0\notag\\ &\Rightarrow i(a - ib)(d + i)^{n} - i(a + ib)(d - i)^{n} = 0\tag{1}\end{align} The LHS of this equation does not change by replacing $i$ with $-i$ and hence it is real. And if we write the LHS in the form of $a_{0}d^{n} + a_{1}d^{n - 1} + \cdots + a_{n}$ all the coefficients $a_{0}, a_{1}, a_{2},\ldots$ will be real. Also it can be seen that $a_{0} = i\{a -ib -a - ib\} = 2b \neq 0$. Thus we have an equation in $d$ which is of odd degree and having real coefficients. Thus there is a real number $d$ which satisfies equation $(1)$. And because of the way this equation was formed the expression $(a + ib)/(d + i)^{n}$ is real and hence we have another equation $c^{n} = (a + ib)/(d + i)^{n}$ in $c$ which is of odd degree and with real coefficients. It follows that there is a real number $c$ satisfying this equation. Hence we have found a complex number $z = c(d + i) = cd + ic$ which satisfies $z^{n} = a + ib$. We have thus shown that given any positive integer $n$ and any complex number $a + ib$ there is a complex number $z$ which satisfies $z^{n} = a + ib$.

The above solution of the binomial equation is taken from the book The Taylor Series by P. Dienes and this seems to be the smartest and shortest solution of the binomial equation. To proceed further we will need another result from real analysis namely the fact that a continuous function on a closed interval attains its extreme values. Actually a generalization of this result for functions of two variables is needed. The proof for the two variable version is similar to one provided in the linked post.

Let us now focus on the general polynomial equation $$a_{0}z^{n} + a_{1}z^{n - 1} + a_{2}z^{n - 2} + \cdots + a_{n} = 0\tag{2}$$ where $a_{0}, a_{1}, a_{2}, \ldots$ are arbitrary complex with $a_{0} \neq 0$. If $a_{n} = 0$ then $z = 0$ satisfies $(2)$ and hence there is nothing to prove. Therefore we also assume $a_{n} \neq 0$. We have to establish that there is a complex number $z$ satisfying the above equation. Let the LHS of equation $(2)$ be denoted by $F(z)$. If $z = x + iy$ then we can write $F(x + iy) = f(x, y) + ig(x, y)$ where $f, g$ are polynomials in $x, y$ with real coefficients. It follows that $|F(x + iy)| = \sqrt{f^{2}(x, y) + g^{2}(x, y)}$ is a function of two real variables which is continuous everywhere. We will denote this function by the symbol $|F(z)|$.

We have \begin{align} |F(z)| &= |a_{0}z^{n} + a_{1}z^{n - 1} + a_{2}z^{n - 2} + \cdots + a_{n}|\notag\\ &= |a_{0}z^{n}|\left|1 + \frac{a_{1}}{z} + \frac{a_{2}}{z^{2}} + \cdots + \frac{a_{n}}{z^{n}}\right|\notag\\ &\geq |a_{0}z^{n}|\left|1 - \left|\frac{a_{1}}{z} + \frac{a_{2}}{z^{2}} + \cdots + \frac{a_{n}}{z^{n}}\right|\right| \tag{3}\end{align} Let $A$ denote the maximum of $|a_{1}|, |a_{2}|, \ldots, |a_{n}|$ and let $R > A + 1$ be a real number. If $|z| \geq R$ then \begin{align} \left|\frac{a_{1}}{z} + \frac{a_{2}}{z^{2}} + \cdots + \frac{a_{n}}{z^{n}}\right| &\leq \left|\frac{a_{1}}{z}\right| + \left|\frac{a_{2}}{z^{2}}\right| + \cdots + \left|\frac{a_{n}}{z^{n}}\right|\notag\\ &\leq \frac{A}{R} + \frac{A}{R^{2}} + \cdots + \frac{A}{R^{n}}\notag\\ &\leq \frac{A}{R - 1} < 1\notag\end{align} and therefore \begin{align} \left|1 - \left|\frac{a_{1}}{z} + \frac{a_{2}}{z^{2}} + \cdots + \frac{a_{n}}{z^{n}}\right|\right| &\geq 1 - \frac{A}{R - 1}\notag\\ &= \frac{R - A - 1}{R - 1}\notag\end{align} It follows from $(3)$ that $$|F(z)| \geq |a_{0}|R^{n}\frac{R - A - 1}{R - 1}$$ whenever $|z| \geq R$.

Since the expression $|a_{0}|R^{n}(R - A - 1)/(R - 1) \to \infty$ as $R \to \infty$ it follows that we can choose a suitable value of $R > A + 1$ such that $|a_{0}|R^{n}(R - A - 1)/(R - 1) > |a_{n}|$. It thus follows that there is a real number $R > 0$ with the property that $|F(z)| > |a_{n}| = |F(0)|$ whenever $|z| \geq R$.

Let us now consider the function $|F(z)|$ in the closed disk $|z| \leq R$. Clearly by continuity the function $|F(z)|$ attains its minimum value, say $m$, in the closed disk $|z| \leq R$. Since $|F(0)| = |a_{n}| < |F(z)|$ when $|z| = R$, it follows that the minimum value $m$ of $|F(z)|$ is attained at an interior point $z_{0}$ so that $|z_{0}| < R$. We will now show that $|F(z_{0})| = m = 0$ and thereby $F(z_{0}) = 0$ so that $z_{0}$ is a root of the equation $(2)$.

On the contrary let us assume that $|F(z_{0})| = m > 0$. Let $z = z_{0} + h$ and then we can write \begin{align}F(z) &= F(z_{0} + h)\notag\\ &= a_{0}(z_{0} + h)^{n} + a_{1}(z_{0} + h)^{n - 1} + \cdots + a_{n - 1}(z_{0} + h) + a_{n}\notag\\ &= (a_{0}z_{0}^{n} + a_{1}z_{0}^{n - 1} + \cdots + a_{n - 1}z_{0} + a_{n}) + b_{1}h + b_{2}h^{2} + \cdots + b_{n}h^{n}\notag\\ &= F(z_{0}) + b_{1}h + b_{2}h^{2} + \cdots + b_{n}h^{n}\notag\end{align} where $b_{1}, b_{2}, \ldots$ are dependent only on $z_{0}$ and $b_{n} = a_{0} \neq 0$.

Let $b_{k}$ be the first non-zero coefficient and then we can write $$F(z) = F(z_{0}) + b_{k}h^{k}\left(1 + \frac{b_{k + 1}}{b_{k}}h + \cdots + \frac{b_{n}}{b_{k}}h^{n - k}\right) = F(z_{0}) + b_{k}h^{k}(1 + g(h))$$ where $g(h)$ tends to zero with $h$. Therefore \begin{align} |F(z)| &= |F(z_{0}) + b_{k}h^{k} + b_{k}h^{k}g(h)|\notag\\ &\leq |F(z_{0}) + b_{k}h^{k}| + |b_{k}h^{k}g(h)|\notag\end{align} Since $g(h)$ tends to zero with $h$ we can find a $\delta > 0$ such that $|g(h)| < 1$ whenever $|h| \leq \delta$. Let us choose $\delta$ so small that $|b_{k}|\delta^{k}/m < 1$ and $|g(h)| < 1$ for $|h| \leq \delta$. Also $\delta$ should satisfy another condition $\delta < R - |z_{0}|$ so that the disk $|z - z_{0}| \leq \delta$ lies entirely in the interior of the disk $|z| \leq R$.

We have already established that binomial equation has a complex root therefore it is possible to find a complex number $h$ satisfying the binomial equation (of degree $k$) $b_{k}h^{k} = -F(z_{0})|b_{k}|\delta^{k}/m$ and with this value of $h$ we will also have $|h| = \delta$ so that $|g(h)| < 1$ and hence \begin{align} |F(z)| &\leq |F(z_{0}) + b_{k}h^{k}| + |b_{k}h^{k}g(h)|\notag\\ &< \left|F(z_{0}) - F(z_{0})\frac{|b_{k}|\delta^{k}}{m}\right| + |b_{k}h^{k}|\notag\\ &= |F(z_{0})| \left|1 - \frac{|b_{k}|\delta^{k}}{m}\right| + |b_{k}|\delta^{k}\notag\\ &= m\left(1 - \frac{|b_{k}|\delta^{k}}{m}\right) + |b_{k}|\delta^{k}\notag\\ &= m = |F(z_{0})|\notag\end{align} Thus we have found a value $z = z_{0} + h$ with $|z| < R$ such that $|F(z)| < |F(z_{0})|$ and this contradicts that $|F(z)|$ attains its minimum value at $z_{0}$ in disk $|z| \leq R$. What we have proved so far is that for polynomial functions $F(z)$ it is always possible to decrease the modulus $|F(z)|$ by changing $z$ slightly. And hence polynomial functions must reach the least possible modulus $0$ and thus every polynomial equation has a root in the complex number system.

Since the above proof crucially depends on the solution of binomial equation it makes sense to see alternative approaches to the solution of binomial equations. It turns out binomial equations can be solved directly using the above method of proof. As discussed earlier it is sufficient to handle equations of the form $F(z) = z^{n} - c$ where $c$ is non zero complex number and $n$ is odd. Applying the above proof to this special binomial equation we can find a disk $|z| \leq R$ such that $|F(z)|$ attains its minimum value $m = |F(z_{0})|$ at an interior point of the disk $|z| \leq R$. And then we can write $F(z) = F(z_{0} + h) = F(z_{0}) + nz_{0}^{n - 1}h + \cdots$ so that if $z_{0} \neq 0$ then we have $b_{1} = nz_{0}^{n - 1} \neq 0$ and then the remaining part of the proof needs the solution of a linear equation.

If on the other $z_{0} = 0$ then we have $F(z) = F(h) = h^{n} - c$ and $F(z_{0}) = -c$. In this case if we set $h = \pm \delta, \pm i\delta$ where $\delta$ is a real number satisfying $0 < \delta < R$ then we can see that $F(z) = \pm \delta^{n} - c, \pm i\delta^{n} - c$ (note the use of odd integer $n$ here) and clearly one of these values of $F(z)$ is near to origin compared to $F(z_{0}) = -c$ (the point $-c$ moves closer to origin for a slight displacement in one of the directions left, right, up, down in Agrand diagram) and thus we see that we can get value of $z$ in interior of disk $|z| \leq R$ for which $|F(z)| < |F(z_{0})|$ and thus the above proof is completed. On the other hand if we are ready to use a little bit of trigonometry then using polar form representation and DeMoivre's Theorem we can easily find roots of binomial equation $z^{n} = c$.

The above proof based on the minimum value of $|F(z)|$ is the second proof presented in Hardy's Pure Mathematics. Hardy prefers to use polar form and DeMoivre's theorem to solve the binomial equations. Hardy also mentions that use of polar form can be avoided and the proof can be applied to binomial equations (as we have shown in last two paragraphs) directly and he attributes this particular approach for binomial equations to his friend and collaborator J. E. Littlewood.

For people familiar with the theory of analytic functions, it is easy to observe that by expressing $F(z) = F(z_{0} + h)$ in the form $F(z_{0}) + b_{1}h + \cdots + b_{n}h^{n}$ we have tried to expand $F(z)$ in a Taylor series about point $z_{0}$ and thereby shown that $F(z)$ is analytic. Also in the initial part we had to establish the existence of $R$ which can be a large number depending on the coefficients of the polynomial $F(z)$. Thus we effectively use the fact that a polynomial is an entire function (i.e. analytic in any finite portion of the complex plane). For entire functions this result can be generalized and it can be shown that any entire function takes every complex value except possibly one exceptional value. For example $e^{z}$ takes all values except $0$. This deep result goes by the name of Little Picard's Theorem.

Second Proof

The second proof is completely algebraic (in the sense of Modern/Abstract Algebra of groups/rings/fields) except for the use of the result which guarantees existence of real root of polynomial equations of odd degree with real coefficients.

First of all we can see that the roots of a quadratic equation $az^{2} + bz + c = 0$ are given by the formula $$z = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}$$ irrespective of the fact whether $a, b, c$ are real or complex numbers. The existence of the square root $\sqrt{b^{2} - 4ac}$ is guaranteed by the square root formula given in the first proof above. Also it is clear that in this case there are two roots, say $\alpha, \beta$, such that $\alpha + \beta = -b/a, \alpha\beta = c/a$ and $az^{2} + bz + c = a(z - \alpha)(z - \beta)$. Conversely any $\alpha, \beta$ satisfying $\alpha + \beta = a, \alpha\beta = b$ are given as roots of the quadratic equation $z^{2} - az + b = 0$.

Now we will show that in order to prove the fundamental theorem of algebra it is sufficient to prove that any non-constant polynomial with real coefficients has a complex root. Let us then assume that every non-constant polynomial with real coefficients has a complex root. Let $f(z) = a_{0}z^{n} + a_{1}z^{n - 1} + \cdots + a_{n - 1}z + a_{n}$ be a polynomial with complex coefficients. Let $g(z) = \overline{a_{0}}z^{n} + \overline{a_{1}}z^{n - 1} + \cdots + \overline{a_{n - 1}}z + \overline{a_{n}}$ be the polynomial obtained from $f(z)$ by replacing the coefficients with their conjugates. Clearly the polynomial $h(z) = f(z)g(z)$ has real coefficients and hence has a complex root $\alpha$. Now $h(\alpha) = 0$ implies that $f(\alpha)g(\alpha) = 0$ so that either $f(\alpha) = 0$ or $g(\alpha) = 0$. If $f(\alpha) = 0$ then $f(z)$ has a complex root $\alpha$. On the other hand if $g(\alpha) = 0$ then \begin{align}&\overline{a_{0}}\alpha^{n} + \overline{a_{1}}\alpha^{n - 1} + \cdots + \overline{a_{n - 1}}\alpha + \overline{a_{n}} = 0\notag\\ &\Rightarrow \overline{\overline{a_{0}}\alpha^{n} + \overline{a_{1}}\alpha^{n - 1} + \cdots + \overline{a_{n - 1}}\alpha + \overline{a_{n}}} = 0\notag\\ &\Rightarrow a_{0}{\overline{\alpha}}^{n} + a_{1}{\overline{\alpha}}^{n - 1} + \cdots + a_{n - 1}\overline{\alpha} + a_{n} = 0\notag\\ &\Rightarrow f(\overline{\alpha}) = 0\notag\end{align} so that $\overline{\alpha}$ is a root of $f(z)$.

We now prove the result that every non-constant polynomial with real coefficients has a complex root. Let $f(z)$ denote such a polynomial of degree $n$. Let us write $n = 2^{k}m$ where $k \geq 0$ is an integer and $m$ is odd positive integer. We will use induction on $k$. Clearly for $k = 0$ the result holds as $n = m$ is odd and the coefficients of $f(z)$ are real. Now let us assume that the result holds whenever $n = 2^{k - 1}m'$ where $m'$ is any odd positive integer. It can be easily shown (though the formal argument is unnecessarily complicated and presented routinely in textbooks of Modern Algebra) that for any polynomial $f(z)$ over any field $F$ there is a field $L \supseteq F$ which contains all roots of $f(z)$. Thus for our polynomial $f(z)$ with real coefficients of degree $n = 2^{k}m$ we have a field $L \supseteq \mathbb{R}$ containing all the roots $z_{1}, z_{2}, \ldots, z_{n}$. The challenge is to prove that at least one of these roots lies in the field $\mathbb{C}$ of complex numbers.

For any real number $t$ let us define a polynomial (with coefficients in $L$) $$g_{t}(z) = \prod_{1\,\leq\, i\, <\, j\,\leq\, n}(z - z_{i} - z_{j} -tz_{i}z_{j})$$ and since any permutation of the roots $z_{1}, z_{2},\ldots, z_{n}$ does not change the polynomial $g_{t}(z)$ it follows that the coefficients of $g_{t}(z)$ are formed of elementary symmetric functions of $z_{1}, z_{2}, \ldots, z_{n}$. Since $z_{1}, z_{2}, \ldots, z_{n}$ are roots of an equation with real coefficients it follows their symmetric functions have real values. Therefore $g_{t}(z)$ has real coefficients. Now the degree of $g_{t}(z)$ is $n(n - 1)/2 = 2^{k - 1}m(n - 1) = 2^{k - 1}m'$ where $m' = m(n - 1)$ is odd.

By our induction hypothesis the polynomial $g_{t}(z)$ has a complex root $z_{i} + z_{j} + tz_{i}z_{j}$. Thus for each real number $t$ we have some pair $(i, j)$ such that $z_{i} + z_{j} + tz_{i}z_{j}$ is a complex number. Since the number of pairs $(i, j)$ is finite and the set of real numbers is infinite, we must have at least one pair $(i, j)$ mapped to two distinct real numbers $s$ and $t$. Thus for some pair $(i, j)$ we have both $z_{i} + z_{j} + sz_{i}z_{j}$ and $z_{i} + z_{j} + tz_{i}z_{j}$ as complex numbers. By subtraction it follows that $(s - t)z_{i}z_{j}$ is a complex number. It now follows that $z_{i}z_{j}$ is complex and hence $z_{i} + z_{j} = (z_{i} + z_{j} + sz_{i}z_{j}) - sz_{i}z_{j}$ is also a complex number. Now $z_{i}, z_{j}$ are roots of $z^{2} - (z_{i} + z_{j})z + z_{i}z_{j} = 0$ which has complex coefficients. By quadratic formula (as explained in beginning of the proof) both $z_{i}, z_{j}$ are complex numbers. We have thus found a complex root of $f(z)$ namely $z_{i}$. By induction the proof is now complete.

I must say that when I read this proof I was absolutely astounded at the algebraic nature of the proof. Before knowing this proof I strongly believed that the fundamental theorem could not be proved without the use of any serious analytical argument. Compared to the somewhat analytical proof presented earlier, I now believe that this proof is much more suitable for students who are not well versed in analysis and have basic training in algebraic rules of symbol manipulation.

Print/PDF Version

Fundamental Theorem of Algebra: Two Proofs

Introduction

First Proof

Second Proof

2 comments :: Fundamental Theorem of Algebra: Two Proofs

Post a Comment

Paramanand's Math Notes

About

Bibliography

Topics

Archives