In one of the earlier posts we indicated that Johann H. Lambert proved the irrationality of $\exp(x)$ or $e^{x}$ for non-zero rational $x$ by means of the continued fraction expansion of $\tanh x$. In this post we provide another proof of the irrationality of $e^{x}$ which is based on a completely different approach. I first read this proof in Carl Ludwig Siegel's wonderful book
*Transcendental Numbers* and I was amazed by the simplicity and novelty of Siegel's argument.

This proof by Siegel is based on approximation of $\exp(x)$ via rational functions (i.e. ratios of polynomials). Such approximants are now famously known as Padé approximants. Here we will not develop the full theory of Padé approximation, but rather concentrate on obtaining one such approximation for $e^{x}$.

### Padé approximation for $\exp(x)$

The basic idea in approximating $\exp(x)$ by a rational function is to choose two polynomials $p(x), q(x)$ such that when the quotient $p(x)/q(x)$ is expanded as a Taylor series in powers of $x$, it matches a pre-determined number of terms of the Taylor series for $\exp(x)$. Specifically we choose two polynomials $A(x), B(x)$ of degree $n$ such that $$\exp(x) + \frac{A(x)}{B(x)} = ax^{2n + 1} + \cdots$$ so that the Taylor series for the rational function $E(x) = -\dfrac{A(x)}{B(x)}$ matches the first $(2n + 1)$ terms (those of degree $0$ to $2n$) of the Taylor series for $\exp(x)$. The error of this approximation is measured by $R(x) = B(x)e^{x} + A(x)$ and clearly we have $$R(x) = B(x)e^{x} + A(x) = cx^{2n + 1} + \cdots\tag{1}$$ Our main objective is to determine these polynomials $A(x), B(x)$ such that they have *integer coefficients* and also to obtain effective bounds for the error $R(x)$.

Since each of the polynomials $A(x), B(x)$ is of degree $n$, determining them involves calculating $2 (n + 1) = 2n + 2$ coefficients. The criterion that the Taylor series for $E(x) = -A(x)/B(x)$ should match the first $(2n + 1)$ terms of the Taylor series for $e^{x}$, i.e. that the coefficients of $x^{0}, x^{1}, \ldots, x^{2n}$ in $R(x)$ vanish, gives us $(2n + 1)$ homogeneous linear equations in these $(2n + 2)$ coefficients, and hence we are assured of a non-trivial solution. Since there are more unknowns than equations, the solution is not unique. This is also obvious directly: if $A(x), B(x)$ do our job then any constant multiple of them, say $rA(x), rB(x)$, also does the job because the ratio of these polynomials is unchanged. This constant $r$ will be fixed later (in an implicit manner) to suit our needs.
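The post deliberately avoids solving these linear equations, but for small $n$ it is straightforward to do so, and this gives an independent check on the polynomials obtained later. Below is a minimal sketch, assuming $B_{n} \neq 0$ so that we may scale to $B_{n} = 1$ (a normalisation chosen here for illustration); with $B_{n}$ fixed, the $(2n + 1)$ equations determine the remaining $2n + 1$ coefficients. The function name `pade_by_linear_algebra` is hypothetical, and all arithmetic is exact via Python's `fractions` module.

```python
from fractions import Fraction
from math import factorial


def pade_by_linear_algebra(n):
    """Solve for A, B of degree n with B(x)e^x + A(x) = O(x^(2n+1)), normalised by B_n = 1.

    Returns (A, B) as lists of Fractions, lowest degree first.
    """
    # Unknowns: A_0..A_n (columns 0..n), then B_0..B_{n-1} (columns n+1..2n).
    size = 2 * n + 1
    rows = []
    for j in range(size):  # the coefficient of x^j in R(x) must vanish, j = 0..2n
        row = [Fraction(0)] * (size + 1)  # last column holds the right-hand side
        if j <= n:
            row[j] = Fraction(1)  # contribution of A_j
        for i in range(min(j, n) + 1):  # contribution B_i / (j - i)! from B(x)e^x
            coeff = Fraction(1, factorial(j - i))
            if i < n:
                row[n + 1 + i] += coeff
            else:
                row[size] -= coeff  # B_n = 1 is known, so move its term to the RHS
        rows.append(row)
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        piv = rows[col][col]
        rows[col] = [v / piv for v in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    sol = [rows[k][size] for k in range(size)]
    return sol[: n + 1], sol[n + 1 :] + [Fraction(1)]


A, B = pade_by_linear_algebra(2)
print([str(a) for a in A])  # ['-12', '-6', '-1']
print([str(b) for b in B])  # ['12', '-6', '1']
```

For $n = 2$ the sketch gives $A(x) = -x^{2} - 6x - 12$ and $B(x) = x^{2} - 6x + 12$, i.e. the approximation $e^{x} \approx (12 + 6x + x^{2})/(12 - 6x + x^{2})$.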

However we don't compute the coefficients of $A(x), B(x)$ by solving linear equations but rather follow Siegel's beautiful approach based on the algebra of differential operators. Let us denote the process of taking the derivative by an operator $\textbf{D}$ so that $$\textbf{D}f(x) = \frac{d}{dx}f(x) = f'(x) \tag{2}$$ Multiple applications of this operator will be denoted by powers of $\textbf{D}$ so that $$\textbf{D}^{2}f(x) = \textbf{D}\{\textbf{D}f(x)\} = \textbf{D}f'(x) = f''(x)$$ We further define a polynomial in $\textbf{D}$ with constant coefficients $a_{0}, a_{1}, \ldots, a_{n}$ as an operator like $$P(\textbf{D}) = a_{0} + a_{1}\textbf{D} + \cdots + a_{n - 1}\textbf{D}^{n - 1} + a_{n}\textbf{D}^{n}$$ such that $$P(\textbf{D})f(x) = a_{0}f(x) + a_{1}f'(x) + \cdots + a_{n - 1}f^{(n - 1)}(x) + a_{n}f^{(n)}(x)\tag{3}$$ Since powers of $\textbf{D}$ commute with each other, it is possible to multiply two such polynomial operators $P(\textbf{D}), Q(\textbf{D})$ to get a single polynomial operator via the usual rules of multiplying polynomials.
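Equation $(3)$ and the remark about multiplying operators can be made concrete with a few lines of Python. In this sketch (function names are hypothetical) a polynomial $f$ is stored as its coefficient list, lowest degree first, and $P(\textbf{D})$ as the list $[a_{0}, a_{1}, \ldots, a_{n}]$. Applying the product $P(\textbf{D})Q(\textbf{D})$ gives the same result as applying $Q(\textbf{D})$ and then $P(\textbf{D})$.

```python
def derivative(f):
    """D applied to a polynomial given as coefficients [f_0, f_1, ...] (lowest degree first)."""
    return [k * f[k] for k in range(1, len(f))]


def apply_operator(P, f):
    """Apply P(D) = P[0] + P[1] D + ... + P[m] D^m to the polynomial f, as in equation (3)."""
    result = [0] * len(f)
    g = list(f)  # g runs through f, f', f'', ...
    for a in P:
        for k, c in enumerate(g):
            result[k] += a * c
        g = derivative(g)
        if not g:  # all higher derivatives of a polynomial vanish
            break
    return result


def multiply_operators(P, Q):
    """Multiply P(D) and Q(D) by the usual rules for polynomials in D."""
    prod = [0] * (len(P) + len(Q) - 1)
    for i, a in enumerate(P):
        for j, b in enumerate(Q):
            prod[i + j] += a * b
    return prod


f = [0, 0, 0, 1]  # x^3
P = [1, 1]        # 1 + D
Q = [2, 0, 1]     # 2 + D^2
print(apply_operator(P, apply_operator(Q, f)))      # [6, 6, 6, 2]
print(apply_operator(multiply_operators(P, Q), f))  # [6, 6, 6, 2]
```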

With the basics of differential operator symbolism available, we can now observe that $$\textbf{D}e^{x}f(x) = e^{x}f'(x) + e^{x}f(x) = e^{x}\{f(x) + f'(x)\} = e^{x}(1 + \textbf{D})f(x)$$ and using the same rule repeatedly we get $$\textbf{D}^{n}e^{x}f(x) = e^{x}(1 + \textbf{D})^{n}f(x)\tag{4}$$ We now apply the operator $\textbf{D}^{n + 1}$ to equation $(1)$ and, noting that $A(x)$ is a polynomial of degree $n$ so that $\textbf{D}^{n + 1}A(x) = 0$, we get \begin{align} \textbf{D}^{n + 1}R(x) &= \textbf{D}^{n + 1}e^{x}B(x) + \textbf{D}^{n + 1}A(x)\notag\\ &= e^{x}(1 + \textbf{D})^{n + 1}B(x)\notag\\ &= c_{0}x^{n} + \cdots\tag{5} \end{align} where $c_{0} = (2n + 1)(2n)(2n - 1)\cdots (n + 1)c$. It now follows that $$(1 + \textbf{D})^{n + 1}B(x) = e^{-x}\{c_{0}x^{n} + \cdots\} = c_{0}x^{n}$$ because the power series $e^{-x}\{c_{0}x^{n} + \cdots\}$ starts with $c_{0}x^{n}$, while the expression $(1 + \textbf{D})^{n + 1}B(x)$ is a polynomial of degree at most $n$, so all the terms beyond $c_{0}x^{n}$ must vanish. We now choose $c_{0} = 1$ to keep our calculations simple (this fixes $r$ implicitly and thereby the polynomials $A(x), B(x)$ are determined uniquely). We thus have $$B(x) = (1 + \textbf{D})^{- n - 1}x^{n}\tag{6}$$
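Read as a recipe, equation $(6)$ says: expand $(1 + \textbf{D})^{-n - 1}$ by the binomial series, keep the terms up to $\textbf{D}^{n}$, and apply them to $x^{n}$. The paragraphs that follow discuss why this is legitimate. The sketch below (function names are hypothetical, polynomials are coefficient lists with lowest degree first) does exactly that for $B(x)$, does the same for $A(x)$ using the analogous formula $(7)$ derived below, and then checks that $(1 + \textbf{D})^{n + 1}B(x) = x^{n}$ and $(-1 + \textbf{D})^{n + 1}A(x) = x^{n}$.

```python
from math import comb, factorial


def apply_poly_in_D(P, f):
    """Apply P(D) = P[0] + P[1] D + ... to the polynomial f (coefficients, lowest degree first)."""
    result = [0] * len(f)
    g = list(f)  # g runs through f, f', f'', ...
    for a in P:
        if not g:  # all further derivatives of a polynomial vanish
            break
        for k, c in enumerate(g):
            result[k] += a * c
        g = [k * g[k] for k in range(1, len(g))]
    return result


def binomial_operator(shift, m):
    """Coefficients of (shift + D)^m as a polynomial in D."""
    return [comb(m, j) * shift ** (m - j) for j in range(m + 1)]


def siegel_B(n):
    """B(x) = (1 + D)^(-n-1) x^n, keeping only the terms up to D^n of the binomial series
    (1 + D)^(-n-1) = sum_k (-1)^k C(n+k, k) D^k."""
    B = [0] * (n + 1)
    for k in range(n + 1):
        falling = factorial(n) // factorial(n - k)  # D^k x^n = falling * x^(n-k)
        B[n - k] = (-1) ** k * comb(n + k, k) * falling
    return B


def siegel_A(n):
    """A(x) = (-1 + D)^(-n-1) x^n = (-1)^(n+1) (1 - D)^(-n-1) x^n, truncated the same way."""
    A = [0] * (n + 1)
    for k in range(n + 1):
        falling = factorial(n) // factorial(n - k)
        A[n - k] = (-1) ** (n + 1) * comb(n + k, k) * falling
    return A


n = 3
B, A = siegel_B(n), siegel_A(n)
print(B)  # [-120, 60, -12, 1]
print(A)  # [120, 60, 12, 1]
# Check (1 + D)^(n+1) B(x) = x^n and (-1 + D)^(n+1) A(x) = x^n.
print(apply_poly_in_D(binomial_operator(1, n + 1), B))   # [0, 0, 0, 1]
print(apply_poly_in_D(binomial_operator(-1, n + 1), A))  # [0, 0, 0, 1]
```

For $n = 3$ this gives $B(x) = x^{3} - 12x^{2} + 60x - 120$ and $A(x) = x^{3} + 12x^{2} + 60x + 120$, both with integer coefficients.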

*What's that!!* We haven't yet defined negative powers of differential operators, so the above equation seems meaningless. But the beauty of this operator algebra is that using the binomial theorem for a general exponent we can expand $(1 + \textbf{D})^{-n - 1}$ into a power series in $\textbf{D}$, namely $$(1 + \textbf{D})^{-n - 1} = \sum_{k = 0}^{\infty}(-1)^{k}\binom{n + k}{k}\textbf{D}^{k},$$ and apply it to $x^{n}$ to get $B(x)$. Also we need this series only up to $\textbf{D}^{n}$ because higher powers of $\textbf{D}$ applied to $x^{n}$ give $0$. Since the binomial coefficients are integers and $\textbf{D}^{k}x^{n} = n(n - 1)\cdots(n - k + 1)x^{n - k}$, it follows that $B(x)$ has integer coefficients.

This *highly intuitive but non-rigorous* argument is very clever and can be made rigorous by learning more operator algebra. However we don't follow that route and instead show directly that $B(x)$ has integer coefficients. When we proceed in this manner we need to remember that equation $(6)$ is just another way of saying that $(1 + \textbf{D})^{n + 1}B(x) = x^{n}$. In the same manner, multiplying equation $(1)$ by $e^{-x}$ and applying $\textbf{D}^{n + 1}$ to the resulting equation we get $$(-1 + \textbf{D})^{n + 1}A(x) = x^{n}$$ which we express more fashionably as $$A(x) = (-1 + \textbf{D})^{-n - 1}x^{n}\tag{7}$$ We now proceed to analyze the coefficients of $B(x)$. Let $$B(x) = B_{0} + B_{1}x + \cdots + B_{n}x^{n}$$ and from $(1 + \textbf{D})^{n + 1}B(x) = x^{n}$ we get $$\left(1 + (n + 1)\textbf{D} + \frac{n(n + 1)}{2}\textbf{D}^{2} + \cdots + (n + 1)\textbf{D}^{n}\right)(B_{0} + B_{1}x + \cdots + B_{n}x^{n}) = x^{n}$$ (the term $\textbf{D}^{n + 1}$ has been dropped since it annihilates $B(x)$). Clearly the term containing $x^{n}$ on the LHS is $B_{n}x^{n}$ so that $B_{n} = 1$. Similarly the terms containing $x^{n - 1}$ on the LHS are $B_{n - 1}x^{n - 1}$ and $(n + 1)\textbf{D}B_{n}x^{n}$ so that $B_{n - 1} + n(n + 1)B_{n} = 0$ and hence $B_{n - 1} = -n(n + 1)$. Thus the coefficients $B_{i}$ can be calculated starting with $B_{n}$ and then evaluating $B_{n - 1}, B_{n - 2},\dots$ and so on. Also the equation determining $B_{i}$ (for $i < n$) is always of the form $$B_{i} + b_{1}B_{i + 1} + \cdots + b_{n - i}B_{n} = 0$$ where $b_{1}, b_{2}, \ldots, b_{n - i}$ are integers (each is a binomial coefficient from $(1 + \textbf{D})^{n + 1}$ multiplied by the integer produced by differentiating a power of $x$). Since $B_{n} = 1$, working downwards from $B_{n}$ shows that all the $B_{i}$ are integers. In the same manner we can show that the polynomial $A(x)$ also has integer coefficients.

### Estimation of error term $R(x)$

From equations $(5)$ and $(6)$ we can see that $$\textbf{D}^{n + 1}R(x) = e^{x}(1 + \textbf{D})^{n + 1}B(x) = e^{x}x^{n}$$ and hence $R(x) = \textbf{D}^{-n - 1}e^{x}x^{n}$. Fortunately it is much easier to handle negative powers of $\textbf{D}$ than expressions like $(1 + \textbf{D})^{-n - 1}x^{n}$ encountered earlier. We define the integral operator $\textbf{J}$ as $$\textbf{J}f(x) = \int_{0}^{x}f(t)\,dt\tag{8}$$ It is easily seen that $\textbf{D}\textbf{J}f(x) = f(x)$, and with the extra assumption $f(0) = 0$ we also have $\textbf{J}\textbf{D}f(x) = f(x)$. Hence for functions with $f(0) = 0$ the operators $\textbf{D}$ and $\textbf{J}$ are inverses of each other and we can write $\textbf{D}\textbf{J} = \textbf{J}\textbf{D} = 1$. We define powers of $\textbf{J}$ as repeated applications of $\textbf{J}$. Since $R(x) = cx^{2n + 1} + \cdots$ we have $R(0) = R'(0) = \cdots = R^{(n)}(0) = 0$, so applying $\textbf{J}$ to $\textbf{D}^{n + 1}R(x)$ a total of $n + 1$ times gives back $R(x)$. Thus it follows that $$R(x) = \textbf{J}^{n + 1}e^{x}x^{n}\tag{9}$$ Using integration by parts it can easily be shown that powers of $\textbf{J}$ can also be expressed as a single integral, and we have $$\textbf{J}^{n + 1}f(x) = \frac{1}{n!}\int_{0}^{x}f(t)(x - t)^{n}\,dt\tag{10}$$ for any continuous function $f$. Using equations $(9), (10)$ we finally have an expression for the error term $R(x)$ as \begin{align} R(x) &= \frac{1}{n!}\int_{0}^{x}e^{t}t^{n}(x - t)^{n}\,dt\notag\\ &= \frac{x^{2n + 1}}{n!}\int_{0}^{1}t^{n}(1 - t)^{n}e^{xt}\,dt\text{ (substituting }t = xu\text{ and renaming }u\text{ as }t)\tag{11} \end{align} For real $x$ the integrand $t^{n}(1 - t)^{n}e^{xt}$ is positive on $(0, 1)$, so it follows from the above equation that $R(x) \neq 0$ whenever $x$ is a non-zero real number.

### Irrationality of $\exp(x)$

In order to prove that $e^{x}$ is irrational for non-zero rational $x$ it is sufficient to prove that $e^{x}$ is irrational when $x$ is a positive integer, say $x = m$. Indeed, if $x = m/k$ with integers $m \neq 0$, $k > 0$ and $e^{x}$ were rational, then $e^{m} = (e^{x})^{k}$ would be rational, and so would $e^{|m|}$ (as $e^{-m} = 1/e^{m}$). Let us assume that $e^{m} = p/q$ where $p, q$ are positive integers with no common factors. Now from equation $(1)$ we have $$R(m) = e^{m}B(m) + A(m) = \frac{p}{q}\cdot B(m) + A(m)$$ and hence $$qR(m) = pB(m) + qA(m)$$ Since the polynomials $A(x), B(x)$ have integer coefficients, the RHS of the above equation is an integer, and since $m \neq 0$ we have $R(m) \neq 0$. Therefore $qR(m)$ is a non-zero integer and hence $|qR(m)| \geq 1$. Now, since $0 \leq t^{n}(1 - t)^{n}e^{mt} \leq e^{m}$ for $0 \leq t \leq 1$, we can see from equation $(11)$ that $$|qR(m)| \leq q\cdot\frac{m^{2n + 1}}{n!}\cdot e^{m} = p \cdot\frac{m^{2n + 1}}{n!}$$ Clearly we can choose the integer $n$ as large as we please (increasing $n$ increases the accuracy of the Padé approximation), and since $m^{2n + 1}/n! \to 0$ as $n \to \infty$ it is possible to choose a positive integer $n$ (depending on $m, p$) such that $p \cdot\dfrac{m^{2n + 1}}{n!} < 1$, so that $|qR(m)| < 1$. This contradicts the fact that $|qR(m)| \geq 1$, and hence $e^{m}$ must be irrational. We have thus shown that

**Theorem:** *If $x$ is a non-zero rational number then $e^{x}$ is irrational*.

The above technique of Siegel can also be used to establish the irrationality of $\pi^{2}$ (and hence the irrationality of $\pi$) with a very limited amount of additional work. This we do next.
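Before moving on, the two ingredients of the proof of the theorem can be checked exactly for small $n$. First, the coefficients of $R(x) = B(x)e^{x} + A(x)$ vanish up to $x^{2n}$ and agree beyond that with the series obtained by expanding $e^{xt}$ inside $(11)$; term-by-term integration (a Beta integral) gives the coefficient $(n + k)!/(k!\,(2n + 1 + k)!)$ of $x^{2n + 1 + k}$. Second, one can see how large $n$ must be for $p\,m^{2n + 1}/n! < 1$. This is a sketch with hypothetical function names; $A(x), B(x)$ are computed from equations $(6)$ and $(7)$ via the binomial series, and `p` is only an illustrative number, since no such $p$ actually exists.

```python
from fractions import Fraction
from math import comb, factorial


def siegel_polys(n):
    """Coefficient lists (lowest degree first) of A(x) and B(x) from equations (6) and (7)."""
    A = [0] * (n + 1)
    B = [0] * (n + 1)
    for k in range(n + 1):
        falling = factorial(n) // factorial(n - k)  # D^k x^n = falling * x^(n-k)
        B[n - k] = (-1) ** k * comb(n + k, k) * falling
        A[n - k] = (-1) ** (n + 1) * comb(n + k, k) * falling
    return A, B


def remainder_coeff(A, B, j):
    """Exact coefficient of x^j in R(x) = B(x) e^x + A(x)."""
    c = Fraction(A[j]) if j < len(A) else Fraction(0)
    for i, b in enumerate(B[: j + 1]):
        c += Fraction(b, factorial(j - i))
    return c


def integral_coeff(n, k):
    """Coefficient of x^(2n+1+k) obtained by expanding e^(xt) in equation (11)
    and integrating term by term."""
    return Fraction(factorial(n + k), factorial(k) * factorial(2 * n + 1 + k))


def first_good_n(m, p):
    """Smallest n >= 1 with p * m^(2n+1) / n! < 1, compared exactly in integers.
    Here p stands for a hypothetical numerator in e^m = p/q."""
    n = 1
    while p * m ** (2 * n + 1) >= factorial(n):
        n += 1
    return n


A, B = siegel_polys(3)
print(all(remainder_coeff(A, B, j) == 0 for j in range(7)))  # True
print(remainder_coeff(A, B, 7), integral_coeff(3, 0))         # 1/840 1/840
print(first_good_n(2, 8))                                     # 12
```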

### Irrationality of $\pi^{2}$

In order to use the material presented so far to get any information about the nature of $\pi$ we must be able to connect $\pi$ somehow with the exponential function $e^{x}$. Fortunately Euler did this for us a long time ago and we have the beautiful equation $$e^{i\pi} + 1 = 0\tag{12}$$ Putting $x = i\pi$ in equation $(11)$ and using $(i\pi)^{2n + 1} = (-1)^{n}i\pi^{2n + 1}$ we get \begin{align} R(i\pi) &= (-1)^{n}i\frac{\pi^{2n + 1}}{n!}\int_{0}^{1}t^{n}(1 - t)^{n}e^{i\pi t}\,dt\notag\\ &= (-1)^{n}i\frac{\pi^{2n + 1}}{n!}\int_{0}^{1}t^{n}(1 - t)^{n}\{\cos \pi t + i\sin \pi t\}\,dt\notag\\ &= (-1)^{n}i\frac{\pi^{2n + 1}}{n!}\int_{0}^{1}t^{n}(1 - t)^{n}\cos \pi t\,dt\notag\\ &\,\,\,\,\,\,\,\,+ (-1)^{n + 1}\frac{\pi^{2n + 1}}{n!}\int_{0}^{1}t^{n}(1 - t)^{n}\sin \pi t\,dt\notag\\ &= 0 + (-1)^{n + 1}\frac{\pi^{2n + 1}}{n!}\int_{0}^{1}t^{n}(1 - t)^{n}\sin \pi t\,dt\notag\\ &\,\,\,\,\,\,\,\,\text{ (the substitution }t \mapsto 1 - t\text{ leaves }t^{n}(1 - t)^{n}\text{ unchanged and }\cos \pi(1 - t) = -\cos \pi t)\notag\\ &= (-1)^{n + 1}\frac{\pi^{2n + 1}}{n!}\int_{0}^{1}t^{n}(1 - t)^{n}\sin \pi t\,dt\tag{13} \end{align} Since $\sin \pi t > 0$ for $0 < t < 1$, it follows that $R(i\pi)$ is a non-zero real number for all positive integers $n$. Next we need to analyze the expression for $R(i\pi)$ in terms of the polynomials $A(x), B(x)$. Before we do that we need one relation between $A(x)$ and $B(x)$. Replacing $x$ with $(-x)$ in the equation $$(1 + \textbf{D})^{n + 1}B(x) = x^{n}$$ and noting that this changes $\textbf{D}$ into $-\textbf{D}$ as well, we get $$(1 - \textbf{D})^{n + 1}B(-x) = (-1)^{n}x^{n}$$ or $$(-1)^{n + 1}(-1 + \textbf{D})^{n + 1}B(-x) = (-1)^{n}x^{n}$$ or $$(-1 + \textbf{D})^{n + 1}(-B(-x)) = x^{n}$$ Comparing with the equation $(-1 + \textbf{D})^{n + 1}A(x) = x^{n}$, and noting that $(-1 + \textbf{D})^{n + 1}$ maps a non-zero polynomial to a non-zero polynomial of the same degree (so this equation has only one polynomial solution), we see that $B(-x) = -A(x)$ and hence $A(-x) = -B(x)$. 
Next we have \begin{align} R(x) &= A(x) + B(x)e^{x}\notag\\ R(-x) &= A(-x) + B(-x)e^{-x}\notag\\ &= -B(x) - A(x)e^{-x}\notag \end{align} and hence on putting $x = i\pi$ and using $e^{\pm i\pi} = -1$ we get \begin{align} R(i\pi) &= A(i\pi) - B(i\pi)\notag\\ R(-i\pi) &= -B(i\pi) + A(i\pi)\notag \end{align} Since $-B(i\pi) = A(-i\pi)$, we finally have $$R(i\pi) = R(-i\pi) = A(i\pi) + A(-i\pi)$$ It is easy to observe that the polynomial $C(x) = A(x) + A(-x)$ consists only of even powers of $x$, and hence it is effectively a polynomial $D(x^{2})$ in $x^{2}$ of degree $k = [n/2]$ with integer coefficients. And $R(i\pi) = D(-\pi^{2})$, which again shows that $R(i\pi)$ is a real number.

Let's now suppose that $\pi^{2} = a/b$ where $a, b$ are positive integers with no common factor. It is then clear that $R(i\pi) = D(-\pi^{2})$ is a rational number whose denominator divides $b^{k}$, and hence $b^{k}R(i\pi)$ is an integer. From our expression $(13)$ for $R(i\pi)$ as an integral we see that $b^{k}R(i\pi)$ is a non-zero integer, and hence $\left|b^{k}R(i\pi)\right| \geq 1$ for all positive integers $n$. On the other hand, since $0 \leq t^{n}(1 - t)^{n}\sin \pi t \leq 1$ for $0 \leq t \leq 1$ and $k \leq n/2$, equation $(13)$ gives $$\left|b^{k}R(i\pi)\right| \leq b^{k}\cdot\frac{\pi^{2n + 1}}{n!} \leq b^{n/2}\cdot\frac{\pi^{2n + 1}}{n!}$$ and the RHS, which equals $\pi(\sqrt{b}\,\pi^{2})^{n}/n!$, can be made less than $1$ by choosing $n$ sufficiently large. Hence we arrive at a contradiction if $n$ is chosen suitably. This proves that we can't have $\pi^{2} = a/b$ for any positive integers $a, b$, i.e. $\pi^{2}$ is irrational.
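A numerical sanity check of this argument: the sketch below (hypothetical function names, double-precision floating point, and a composite Simpson rule chosen here for the integral) verifies $B(-x) = -A(x)$ and compares $D(-\pi^{2})$, built from the even-degree coefficients of $A(x)$, with the integral $(13)$ for small $n$. Because $D(-\pi^{2})$ is a sum of large terms that nearly cancel, floating point is only trustworthy here for small $n$.

```python
import math
from math import comb, factorial


def siegel_polys(n):
    """Coefficient lists (lowest degree first) of A(x) and B(x) from equations (6) and (7)."""
    A = [0] * (n + 1)
    B = [0] * (n + 1)
    for k in range(n + 1):
        falling = factorial(n) // factorial(n - k)  # D^k x^n = falling * x^(n-k)
        B[n - k] = (-1) ** k * comb(n + k, k) * falling
        A[n - k] = (-1) ** (n + 1) * comb(n + k, k) * falling
    return A, B


def check_symmetry(n):
    """True if B(-x) = -A(x), compared coefficient by coefficient."""
    A, B = siegel_polys(n)
    return [(-1) ** d * B[d] for d in range(n + 1)] == [-a for a in A]


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def R_from_D(n):
    """R(i*pi) = D(-pi^2), where D(y) = sum_j 2 * A_(2j) * y^j comes from A(x) + A(-x)."""
    A, _ = siegel_polys(n)
    D = [2 * A[d] for d in range(0, n + 1, 2)]  # integer coefficients of D
    y = -math.pi ** 2
    return sum(c * y ** j for j, c in enumerate(D))


def R_from_integral(n):
    """Equation (13), with the integral evaluated numerically."""
    integral = simpson(lambda t: t ** n * (1 - t) ** n * math.sin(math.pi * t), 0.0, 1.0)
    return (-1) ** (n + 1) * math.pi ** (2 * n + 1) / factorial(n) * integral


for n in range(1, 7):
    print(n, check_symmetry(n), R_from_D(n), R_from_integral(n))
```

For $n = 1$, for example, $D(y) = 4$ is constant and the integral in $(13)$ equals $4/\pi^{3}$, so both sides give $R(i\pi) = 4$.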

**Note:** For another proof of the irrationality of $e^{x}$ for non-zero rational $x$ see this question on MSE.
Nice post and amazing proofs!

Anonymous

October 28, 2015 at 2:50 PM

Do you know of any book on transcendental analysis?

Unknown

November 10, 2015 at 11:35 AM

@Sayan Chattopadhyay,

There are many books available on Transcendental Numbers which you can find by Google Search. The better ones among them are by Baker, Siegel. I don't know if these are what you want.

Paramanand

November 10, 2015 at 12:40 PM