Paramanand's Math Notes

Ramanujan's take on Chudnovsky series for 1/π(PI): Part 2

2026-01-03T23:49:00.014+05:30

In the previous post we have handled the evaluation of $P_n=P(-e^{-\pi\sqrt{n}}) $ for $n=11,27$. We will evaluate $P_n$ for $n=19,35$ in the current post and also discuss an empirical approach for $n=43,67,163$. Finally we will use the information in table given by Ramanujan to obtain certain series for $1/\pi$ (including the famous one by Chudnovsky brothers).

Evaluation of $P_n=P(-e^{-\pi\sqrt{n}}) =P(-q) $

To handle the case $n=19$ we start with a complicated modular equation of degree $19$ $$(\sqrt{kl} +\sqrt {k'l'} - 1)^5+ 112(\sqrt{kl}+\sqrt{k'l'}-1)^2\sqrt{klk'l'}\\-256(\sqrt{klk'l'}-\sqrt{kl} - \sqrt{k'l'})\sqrt{klk'l'} =0$$ Putting $l=k', k=l'$ we get $$(2\sqrt{kk'}-1)^5+112(2\sqrt{kk'}-1)^2kk'-256kk'(kk'-2\sqrt{kk'})=0$$ Putting $a=\sqrt{2}G_{19}^{-6}=2\sqrt{kk'}$ we get $$(a-1)^5+28(a-1)^2a^2-16a^4+64a^3=0$$ or $$a^5+7a^4+18a^3+18a^2+5a-1=0$$ Luckily $a=-1$ turns out to be a root and hence there is a factor $(a+1)$ involved here. We then have $$(a+1)(a^4+6a^3+12a^2+6a-1)=0$$ Since $a>0$ we have $$a^4+6a^3+12a^2+6a-1=0$$ Again we get a factor of $(a+1)$ and then $$(a+1)(a^3+5a^2+7a-1)=0$$ Since $a>0$ it turns out that $a=\sqrt{2}G_{19}^{-6}$ is a root of $$x^3+5x^2+7x-1\tag{1}$$ and the above polynomial is irreducible as neither $1$ nor $-1$ is a root.

Next we need the formula given by Ramanujan as $$nP(q^{2n})-P(q^2)=\frac{24KL}{\pi^2}\left\{1+kl+k'l'+\sqrt{kl}+\sqrt{k'l'}-\sqrt{klk'l'}\right\}\tag{2} $$ where $n=19$ and $k$ is of degree $19$ over $l$ so that nome $q$ corresponds to $l$ and $q^n$ to $k$. Putting $q=e^{-\pi/\sqrt{19}}$ so that $q^n=e^{-\pi\sqrt{19}}$ and $k=l', k'=l$ we get $$nP(q^{2n})-P(q^2)=6\left(\frac{2K}{\pi}\right)^2\frac{K'}{K}(1+kk'+2\sqrt{kk'})$$ or $$nP(q^{2n})-P(q^2)=6\sqrt{19}\left(\frac{2K}{\pi}\right)^2(1+(a^2/4)+a)=\frac{3\sqrt{19}}{2}\left(\frac{2K}{\pi}\right)^2(a+2)^2$$ Combining the above equation with the identity $$nP(q^{2n})+P(q^2)=\frac{6\sqrt{n}}{\pi}, q=e^{-\pi/\sqrt {n}}\tag{3} $$ we get $$2P(q^{2n})=\frac{6}{\pi\sqrt{n}}+\frac{3}{2\sqrt{19}}\left(\frac{2K}{\pi}\right)^2(a+2)^2$$ Switching notation such that $k, K$ correspond to nome $q=e^{-\pi\sqrt {19}}$ we get $$2P(q^2)-\frac{6}{\pi\sqrt {n}} =\frac{3}{2\sqrt{19}}\left(\frac{2K}{\pi}\right) ^2(a+2)^2$$ Using the identity $$2P(q^2)-P(-q)=\left(\frac{2K}{\pi}\right)^2(1-2k^2)=\left(\frac{2K}{\pi}\right)^2\sqrt{1-G^{-24}}\tag{4}$$ we get $$\sqrt{n} P(-q) - \frac{6}{\pi}=\left(\frac{2K}{\pi}\right)^2\frac{3(a+2)^2-\sqrt {19(4-a^4)}}{2}$$ And then using identity $$Q(-q) = \left(\frac{2K}{\pi}\right)^4(1-4G^{-24})$$ we get $$\frac{1}{\sqrt{Q_n}}\left (\sqrt{n}P_n-\frac{6}{\pi}\right) =\frac{3(a+2)^2-\sqrt{19(4-a^4)}} {2\sqrt{1-a^4}}\tag{5}$$ To simplify the expression on right we evaluate some integral powers of $a$ \begin{align} a^3&=-5a^2-7a+1\notag\\ a^4&=-5a^3-7a^2+a=18a^2+36a-5\notag\\ \frac{1}{a}&=a^2+5a+7\notag\\ \frac{1}{a^2}&=a+5+\frac{7}{a}=7a^2+36a+54\notag \end{align} We can observe that $$\frac{1}{a^2}=a^2+6(a^2+6a+9)=a^2+6(a+3)^2$$ and thus on multiplying by $a^2$ we get $$1-a^4=6a^2(a+3)^2$$ ie $$\sqrt{1-a^4}=a(a+3)\sqrt{6}$$ Further $$4-a^4=9(1-4a-2a^2)$$ and hence we can write the expression on right of equation $(5)$ as $$\frac{3}{2}\cdot\frac{(a+2)^2-\sqrt{19(1-4a-2a^2)}}{a(a+3)\sqrt{6}}$$ and this 
equals the proposed value $\sqrt{6}$ if $$\sqrt{19(1-4a-2a^2)}=(a+2)^2-4a(a+3)=-3a^2-8a+4$$ Using the numerical value $a\approx 0.1304$ we can confirm that the right hand side of the above equation is positive, and hence the equation holds if the squares of both sides are equal, i.e. if $$19(1-4a-2a^2)=(3a^2+8a-4)^2$$ or $$19(1-4a-2a^2)=9a^4+64a^2+16+48a^3-24a^2-64a$$ or $$3-12a-78a^2=9a^4+48a^3$$ or $$1-4a-26a^2=3a^4+16a^3$$ or $$1-4a-26a^2=3(18a^2+36a-5)+16(-5a^2-7a+1)$$ or $$1-4a-26a^2=1-4a-26a^2$$ which is true.
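The case $n=19$ can be sanity-checked in floating point. The following is a minimal Python sketch and not part of the original derivation. It assumes the product formula $G^{24}=2^{-6}q^{-1}\prod_{j\geq 1}(1+q^{2j-1})^{24}$ for the class invariant, and the helper names (`cubic`, `bisect`) are purely illustrative.

```python
import math

def cubic(x):
    # Polynomial (1): x^3 + 5x^2 + 7x - 1
    return x**3 + 5 * x**2 + 7 * x - 1

def bisect(f, lo, hi, steps=200):
    # Plain bisection; assumes f(lo) < 0 < f(hi).
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Positive root of (1); cubic(0) = -1 < 0 and cubic(1) = 12 > 0.
a_root = bisect(cubic, 0.0, 1.0)

# Independent value of a = sqrt(2) * G_19^(-6) from the product formula
# (an assumption stated in the lead-in), with q = exp(-pi*sqrt(19)).
q = math.exp(-math.pi * math.sqrt(19))
g24 = 1 / (64 * q)
for j in range(1, 20):
    g24 *= (1 + q ** (2 * j - 1)) ** 24
a_from_g = math.sqrt(2) * g24 ** (-0.25)

# Right hand side of equation (5), to be compared with sqrt(6).
rhs5 = (3 * (a_root + 2) ** 2 - math.sqrt(19 * (4 - a_root**4))) / (
    2 * math.sqrt(1 - a_root**4)
)

print(a_root, a_from_g)
print(rhs5, math.sqrt(6))
```

Agreement of the two printed pairs only confirms the identities to machine precision; the algebraic argument above remains the actual proof.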

The case $n=35$ is a bit complicated as I haven't been able to find a suitable modular equation of degree $35$ which can be used to evaluate the class invariant $G_{35}$. Instead I use the table by Ramanujan to conclude that $G_{35}^8$ is a root of $$x^3-(60+28\sqrt {5})x^2-4\tag{6}$$ and its numerical value is around $122.61017$.

To simplify typing we use the symbol $a=\sqrt {5}$ in handling the case $n=35$. Next we can note that $b=G_{35}^4$ is a root of $$x^6-(60+28a)x^4-4$$ and the above polynomial can be factored as $$(x^3-(6+2a)x^2-2(1+a)x-2)(x^3+(6+2a)x^2-2(1+a)x+2)$$ in polynomial ring $\mathbb{Q} (a) [x] $. The above factorization was done using Magma calculator online and one can verify it manually. The first factor has a positive root and thus $b=G_{35}^4$ is a root of $$x^3-(6+2a)x^2-2(1+a)x-2\tag{7}$$ Replacing $x$ with $x^2$ doesn't lead to any further factorization and thus $G_{35}^2$ is of degree $6$ over $\mathbb{Q} (a) $.

Luckily we can note that $2b$ is a root of $$x^3-(12+4a)x^2-(8+8a)x-16$$ and replacing $x$ with $x^2$ gives us the factorization (again via Magma) $$(x^3-4x^2+2(1-a)x-4)(x^3+4x^2+2(1-a)x+4)$$ with first factor having a positive root and thus $\sqrt {2}G_{35}^2$ is a root of $$x^3-4x^2+2(1-a)x-4$$ It follows that $c=G_{35}^2$ is a root of $$x^3-2\sqrt{2}x^2+(1-a)x-\sqrt{2}$$ and hence $$\sqrt {2}=\frac{c(c^2+1-a)}{2c^2+1}=\frac{c(b+1-a)}{2b+1}$$ as $b=c^2$. Therefore $$\frac{\sqrt{2}} {c} =\sqrt{\frac{2} {b}}=\frac{b+1-a}{2b+1}\tag{8}$$ is a rational function of $b$ with coefficients in $\mathbb{Q} (a) $ and this fact will be used later.

We also need to evaluate the expression $1/(2b+1)$ as a polynomial in $b$. While one can use Magma for this, the verification by hand is bit lengthy and hence we use algebra directly. We note that $(2b+1)$ is a root of $$(x-1)^3-(12+4a)(x-1)^2-(8+8a)(x-1)-16 $$ or $$x^3-(15+4a)x^2+19x-(21-4a)$$ and hence $$\frac{21-4a}{2b+1}=(2b+1)^2-(15+4a)(2b+1)+19=4b^2-(26+8a)b+5-4a$$ and noting that $$(21-4a)(21+4a)=19^2 $$ we get $$\frac{1}{2b+1}=\frac{1}{19^2}((84+16a)b^2-(706+272a)b+25-64a)$$ Next we evaluate the expression $\sqrt{2/b}=(b+1-a)/(2b+1)$ and this turns out to be simpler in form than $1/(2b+1)$ and we have $$\sqrt{\frac{2}{b}}=\frac{b+1-a}{2b+1}=\frac{1}{19}(-2(1+2a)b^2+(53+30a)b+3(9-a))\tag{9}$$
We now use the formula given by Ramanujan $$nP(q^{2n})-P(q^2)=\frac{4KL}{\pi^2}\left\{2(\sqrt{kl}+\sqrt{k'l'}-\sqrt {klk'l'}) +(4klk'l')^{-1/6}(1-\sqrt{kl}-\sqrt{k'l'})^3\right \}$$ for $n=35$. Here we assume $k$ to be of degree $35$ over $l$ with $q$ being nome corresponding to $l$ and $q^n$ the nome corresponding to $k$. Putting $q=e^{-\pi/\sqrt {35}}$ so that $q^n=e^{-\pi\sqrt{35}}$ and $k=l', l=k'$ and $L/K=K'/K=\sqrt {35}$ we get $$nP(q^{2n})-P(q^2)=\sqrt {35}\left (\frac{2K}{\pi}\right) ^2\left(2(2\sqrt {kk'} - kk') +(4k^2k'^2)^{-1/6}(1-2\sqrt{kk'})^{3}\right)$$ Using $2kk'=G_{35}^{-12}$ we get $$nP(q^{2n})-P(q^2)=\sqrt {35}\left(\frac{2K}{\pi}\right)^2\left(2(\sqrt{2}G_{35}^{-6}-G_{35}^{-12}/2)+G_{35}^4(1-\sqrt{2}G_{35}^{-6})^3\right)$$ We now have $$\sqrt{2}G_{35}^{-6}=\frac{\sqrt {2}}{G_{35}^2\cdot G_{35}^4}=\frac{1}{b}\sqrt{\frac{2}{b}}$$ and then $$nP(q^{2n})-P(q^2)=\sqrt {35}\left(\frac{2K}{\pi}\right)^2\left\{2\left(\frac{1}{b}\sqrt{\frac{2}{b}}-\frac{1}{2b^3}\right)+b\left(1- \frac{1}{b}\sqrt{\frac{2}{b}} \right)^3\right\}$$ Combining this with identity $$nP(q^{2n})+P(q^2)=\frac{6\sqrt {n}} {\pi}, q=e^{-\pi/\sqrt{n}} $$ we get $$2P(q^{2n})=\frac{6}{\pi\sqrt{n}}+\frac{1}{\sqrt{35}}\left(\frac{2K}{\pi}\right)^2\left\{2\left(\frac{1}{b}\sqrt{\frac{2}{b}}-\frac{1}{2b^3}\right)+b\left(1-\frac{1}{b}\sqrt{\frac{2}{b}}\right)^3\right\}$$ Changing the notation a bit so that $q=e^{-\pi\sqrt{35}}$ corresponds to $k$ we have $$2P(q^2)=\frac{6}{\pi\sqrt{n}}+\frac{1}{\sqrt{35}}\left(\frac{2K}{\pi}\right)^2\left\{2\left(\frac{1}{b}\sqrt{\frac{2}{b}}-\frac{1}{2b^3}\right)+b\left(1-\frac{1}{b}\sqrt{\frac{2}{b}}\right)^3\right\}$$ Using the identity $$2P(q^2)-P(-q)=\left(\frac{2K}{\pi}\right)^2(1-2k^2)=\left(\frac{2K}{\pi}\right)^2\sqrt{1-G^{-24}}$$ we get $$\sqrt{n} P(-q)-\frac{6}{\pi}=\frac{1}{b^3}\left(\frac{2K}{\pi}\right)^2\left\{2b^2\sqrt{\frac{2}{b}}-1+b\left(b-\sqrt{\frac{2}{b}}\right)^3-\sqrt{35(b^6-1)}\right\}$$ and finally using $$Q(-q) 
=\left(\frac{2K}{\pi}\right)^4(1-4G^{-24})$$ we get $$\frac{1}{\sqrt{Q_n}}\left(\sqrt{n}P_n-\frac{6}{\pi}\right)=\frac{1}{\sqrt{b^6-4}}\left\{2b^2\sqrt{\frac{2}{b}}-1+b\left(b-\sqrt{\frac{2}{b}}\right)^3-\sqrt{35(b^6-1)}\right\}$$ We have to show that this expression equals $$(2+\sqrt{5})\sqrt{\frac{2}{\sqrt{5}}}$$ Using equation $(6)$ we have $$b^6-4=(60+28a)b^4$$ and $$\sqrt{(60+28a)(2/a)}=\sqrt{56+24a}=6+2a$$ and thus we have to establish that $$2b^2\sqrt{\frac{2}{b}}-1+b\left(b-\sqrt{\frac{2}{b}}\right)^3-\sqrt{35(b^6-1)}=(2+a) (6+2a)b^2=(22+10a)b^2$$ or $$2b^2\sqrt{\frac{2}{b}}+b\left(b-\sqrt{\frac{2}{b}}\right)^3-(22+10a)b^2-1=\sqrt{35(b^6-1)}$$ We have $$b-\sqrt{\frac{2}{b}}=\frac{1}{19}(2(1+2a)b^2-2(17+15a)b-3(9-a))$$ and $$2b^2\sqrt{\frac{2} {b}} =\frac{1}{19}(2(9-a)b^2+4(9-a)b+4(1+2a))$$ Next we have $$\left(b-\sqrt{\frac{2}{b}} \right) ^2=\frac{1}{19}(4(9-a)b^2-8(10+a)b-2(15+11a))$$ and $$\left(b-\sqrt{\frac{2}{b}}\right) ^3=\frac{4}{19}((15+11a)b^2+2(15+11a)b+2(8-3a))$$ and $$b\left(b-\sqrt{\frac{2}{b}}\right) ^3=\frac{8}{19}((115+59a)b^2+(78+23a)b+(15+11a))$$ Using these expressions above we get $$2b^2\sqrt{\frac{2}{b}}+b\left(b-\sqrt{\frac{2}{b}}\right)^3-(22+10a)b^2-1\\=\frac{1}{19}((520+280a)b^2+(660+180a)b+(105+96a))$$ The expression on right is a positive real number (because $a, b$ are positive as well) and hence our job is done if we show that square of this number equals $$35(b^6-1)=35((60+28a)b^4+3)\\=35((7120+3184a)b^2 + (4280+1912a)b + 1283+576a) $$ Here the first equality follows from $b^6-4=(60+28a)b^4$ and the second from reducing $b^4$ via polynomial $(7)$. I haven't verified the equality of the square using hand calculation and relied on Magma for the same. All other calculations dealing with $a, b$ have been verified both via pen / paper and Magma. One should also note that Magma's function Sqrt when applied to $35(b^6-1)$ gives the negative square root. 
This is because number field calculations are not treated as calculations in an ordered field and hence if one wishes to use such functions one should try both positive and negative values and if needed check using numerical evaluation of the expressions as well.

The Magma code below creates the field extensions $K=\mathbb{Q} (a), L=K(b) $ and then one can evaluate expressions in field $L$ as polynomials in $b$ with coefficients in $K$. The code can be run using online Magma calculator.

R<x>:= PolynomialRing(Integers()) ;
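f := x^2-5;
// K = Q(a) with a = sqrt(5)
K<a> := NumberField(f) ;
T<y> := PolynomialRing(K) ;
// sextic satisfied by b = G_35^4, from (6) with x replaced by y^2
Factorization(y^6-(60+28*a)*y^4-4);
// g is the cubic factor (7) which has a positive root
g := y^3-(6+2*a)*y^2-2*(1+a)*y-2;
// sextic satisfied by sqrt(2)*G_35^2
Factorization(y^6-(12+4*a)*y^4-(8+8*a)*y^2-16);
// L = K(b) with b a root of g
L<b> := ext<K|g>;
// compare Sqrt(2/b) with the rational expression (8); Sqrt may return either sign
Sqrt(2/b);
(b+1-a)/(2*b+1);
// the two outputs below are to be compared, keeping the sign remark in mind
2*b^2*Sqrt(2/b)+b*(b-Sqrt(2/b))^3-(22+10*a)*b^2-1;
Sqrt(35*(b^6-1));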
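
The same identities for $n=35$ can also be cross-checked in floating point. The following Python sketch is an assumption-laden companion to the Magma session, not a replacement for it: it works with decimal approximations of $a=\sqrt{5}$ and $b=G_{35}^4$ (obtained by bisection on polynomial $(7)$), so it only confirms the results to machine precision.

```python
import math

a = math.sqrt(5)

def cubic7(x):
    # Polynomial (7): x^3 - (6+2a)x^2 - 2(1+a)x - 2
    return x**3 - (6 + 2 * a) * x**2 - 2 * (1 + a) * x - 2

# Bisection for the positive root b; cubic7(10) < 0 < cubic7(12).
lo, hi = 10.0, 12.0
for _ in range(200):
    mid = (lo + hi) / 2
    if cubic7(mid) < 0:
        lo = mid
    else:
        hi = mid
b = (lo + hi) / 2

s = math.sqrt(2 / b)                      # sqrt(2/b), taken positive
eq8 = (b + 1 - a) / (2 * b + 1)           # right side of (8)
eq9 = (-2 * (1 + 2 * a) * b**2 + (53 + 30 * a) * b + 3 * (9 - a)) / 19  # (9)

# Value of (sqrt(n) P_n - 6/pi) / sqrt(Q_n) in terms of b, and the target.
lhs = (2 * b**2 * s - 1 + b * (b - s) ** 3 - math.sqrt(35 * (b**6 - 1))) / math.sqrt(b**6 - 4)
target = (2 + a) * math.sqrt(2 / a)

print(b * b)          # should be close to G_35^8
print(s, eq8, eq9)
print(lhs, target)
```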

More generally if there is a formula of type $$nP(q^{2n})-P(q^2)=\frac {4KL}{\pi^2} \cdot A_n(k, l) $$ then $$\frac{1}{\sqrt{Q_n}}\left(\sqrt{n} P_n-\frac{6}{\pi}\right)=\frac{G_n^{12}\cdot A_n(k, k') - \sqrt {n(G_n^{24}-1)}} {\sqrt{G_n^{24}-4} } $$ Ramanujan also gave formulas of type $$nP(-q^n) - P(-q) =\frac{4KL}{\pi^2}\cdot B_n(k,l)$$ for some odd positive integer values of $n$ and if such a formula is available then $$\frac{1}{\sqrt{Q_n}}\left(\sqrt{n} P_n-\frac{6}{\pi}\right)=\frac{G_n^{12}\cdot B_n(k, k') } {2\sqrt {G_n^{24}-4} }$$
There is also an empirical approach which works nicely for $n=11,19,43,67,163$. For each of these values of $n$ one can show that $G_n^8$ is a root of the equation $$x^3-u_nx^2-4$$ where $u_n$ is a positive integer. More specifically we have $$u_n=8,24,240,1320,160080$$ for the values of $n$ being considered respectively. And then we get $$\sqrt {G_n^{24}-4}=\sqrt {u_n} G_n^8$$ And one can check that $$d_n=\sqrt{\frac{u_n}{Q_n}} \left(\sqrt{n} P_n-\frac{6}{\pi}\right)$$ is a positive integer as well. Thus one can try to evaluate the above expression numerically and take the nearest integer to this numerical value as $d_n$ and make an empirical claim that $$\frac{1}{\sqrt{Q_n}}\left(\sqrt{n} P_n-\frac{6}{\pi}\right)=\frac{d_n}{\sqrt{u_n}}$$ Further for $n=43,67,163$ the expressions $P_n, Q_n$ are practically equal to $1$ and it is sufficient to evaluate just $$\sqrt{u_n} \left(\sqrt{n} - \frac{6}{\pi}\right)$$ and take $d_n$ to be nearest integer. The empirical claim however needs to be verified analytically as we have done above.
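
The empirical recipe can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the Lambert series are truncated, and the values of $u_n$ are the ones listed above.

```python
import math

def lambert(x, power, terms=40):
    # Truncated sum_{j>=1} j^power * x^j / (1 - x^j).
    total, xj = 0.0, 1.0
    for j in range(1, terms + 1):
        xj *= x
        total += j**power * xj / (1 - xj)
    return total

U = {11: 8, 19: 24, 43: 240, 67: 1320, 163: 160080}

def d_value(n):
    # d_n = sqrt(u_n / Q_n) * (sqrt(n) P_n - 6/pi) with P_n, Q_n at -exp(-pi sqrt(n)).
    x = -math.exp(-math.pi * math.sqrt(n))
    p_n = 1 - 24 * lambert(x, 1)
    q_n = 1 + 240 * lambert(x, 3)
    return math.sqrt(U[n] / q_n) * (math.sqrt(n) * p_n - 6 / math.pi)

def d_shortcut(n):
    # Same quantity with P_n and Q_n replaced by 1 (reasonable for n = 43, 67, 163).
    return math.sqrt(U[n]) * (math.sqrt(n) - 6 / math.pi)

for n in U:
    print(n, d_value(n), d_shortcut(n))
```

For $n=11,19$ the shortcut is visibly off from an integer, which is why the full values of $P_n, Q_n$ are needed there.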

Let us further note that $G_n^4$ is a root of $$x^6-u_nx^4-4$$ and this can be factored into two cubics as $$(x^3+px^2+qx-2)(x^3-px^2+qx+2)$$ and using a bit of algebra one can evaluate $p, q$. Comparing coefficients gives $2q-p^2=-u_n$ and $q^2+4p=0$, so that $p=-q^2/4$ and $q^4-32q-16u_n=0$. More specifically if $v_n$ is a root of $$x^4-32x-16u_n$$ then $G_n^4$ is a root of $$x^3-(v_n^2/4)x^2+v_nx-2$$ For the values of $n$ being considered here $v_n$ turns out to be an even integer and then $$\sqrt{G_n^{12}-1} =\pm((v_n/2)G_n^4-1)$$ where the $\pm$ sign is same as that of $v_n$. For completeness sake let us note that $$v_n=4,-4, 8,-12,-40$$ for the above mentioned values of $n$ respectively.
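
The listed pairs $(u_n, v_n)$ can be checked with exact rational arithmetic. The snippet below is a small Python sketch (the helper `poly_mul` and the dictionary `PAIRS` are illustrative names) which verifies both the quartic for $v_n$ and the factorization of $x^6-u_nx^4-4$ into the two cubics.

```python
from fractions import Fraction

def poly_mul(p, q):
    # Multiply polynomials given as coefficient lists, lowest degree first.
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

# n -> (u_n, v_n) as listed in the text.
PAIRS = {11: (8, 4), 19: (24, -4), 43: (240, 8), 67: (1320, -12), 163: (160080, -40)}

def check(u, v):
    quartic_ok = v**4 - 32 * v - 16 * u == 0
    p = Fraction(v * v, 4)
    first = [Fraction(-2), Fraction(v), -p, Fraction(1)]   # x^3 - (v^2/4)x^2 + v x - 2
    second = [Fraction(2), Fraction(v), p, Fraction(1)]    # x^3 + (v^2/4)x^2 + v x + 2
    target = [-4, 0, 0, 0, -u, 0, 1]                       # x^6 - u x^4 - 4
    return quartic_ok and poly_mul(first, second) == target

results = {n: check(u, v) for n, (u, v) in PAIRS.items()}
print(results)
```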

For the specific case of $n=35$ both $u_n, v_n$ turn out to be algebraic integers from $\mathbb{Q} (\sqrt{5})$ with $$u_n=60+28\sqrt{5},v_n=-2(1+\sqrt{5})$$ Also when we evaluate the expression $$\sqrt{60+28\sqrt{5}}\left(\sqrt{35}-\frac{6}{\pi}\right)$$ numerically we get $44.3606214$ and the first four decimal digits match those of $10\sqrt{5}$. Subtracting $10\sqrt{5}$ from the above result we get $21.9999416$ and thus the expected value of the expression $$\frac{1}{\sqrt{Q_n}}\left(\sqrt{n} P_n-\frac{6}{\pi}\right)$$ is $$\frac{22+10\sqrt{5}}{\sqrt{60+28\sqrt{5}}}=(2+\sqrt{5})\sqrt{\frac{2}{\sqrt{5}}}$$ and we can verify this analytically as mentioned earlier.

While evaluating $P_n$ we have to deal with expressions like $$\sqrt{b^6-4},\sqrt{n(b^6-1)}$$ where $b=G_n^4$. We have mentioned earlier that $$\sqrt{b^6-4}=\sqrt{u_n}b^2,\sqrt{b^3-1}=\operatorname{sgn}(v_n)((v_n/2)b-1)$$ and it turns out that the expression $\sqrt{n(b^3+1)}$ also lies in $\mathbb{Q} (b) $ so that $\sqrt{n(b^6-1)}\in\mathbb{Q}(b)$ and this strange coincidence helps us verify the exact radical expression for $P_n$.

$P(-q)$ and Chudnovsky series for $1/\pi$

The values of $P_n=P(-q),Q_n=Q(-q),R_n=R(-q),q=e^{-\pi\sqrt{n}}$ in the table given by Ramanujan indicate that Ramanujan had managed to discover some more series for $1/\pi$ (apart from those listed in his famous paper Modular equations and approximations to $\pi$), although he never wrote them explicitly. And these series famously include the one given by Chudnovsky brothers.

Now I show how the information in the page from Ramanujan's lost notebook can be used to obtain series for $1/\pi$. It is also important to understand that Ramanujan presented the values in table in exactly the form needed to get these series. In particular the coefficient of $Q_n^3$ has a factor $n$ and Ramanujan specifically mentions that for $n=163$ we have $$53360^3+1=3^3\cdot 7^2\cdot 11^2\cdot 19^2\cdot 127^2\cdot 163$$ The factorization for coefficient of $Q_n^3$ is needed to get the desired series as will be shown below.

Let us start with hypergeometric identity $$\left(\frac{2K}{\pi}\right)^2=\frac{1}{\sqrt{1-4G^{-24}}}{}_3F_2\left(\frac{1}{6},\frac{5}{6},\frac{1}{2};1,1;-\frac{27G^{-24}}{(1-4G^{-24})^3}\right)\tag{10}$$ Also let us use the simplified notation $$f(x) ={}_3F_2\left(\frac{1}{6},\frac{5}{6},\frac{1}{2};1,1;x\right),g(x) =xf'(x) \tag{11}$$ in what follows.

Using derivatives of elliptic integrals we can prove that $$P(-q) = \left(\frac{2K}{\pi}\right)^2(1-2k^2)+3kk'^2\frac{d}{dk} \left(\frac{2K}{\pi}\right)^2\tag{12}$$ Applying the above formula on equation $(10)$ and in the process doing reasonable amount of symbol shunting we arrive at the following beautiful result $$P(-q) =\frac{(1-2k^2)(1+8G^{-24})}{(1-4G^{-24})^{3/2}}\left(f(v(k))+6g(v(k))\right),v(k)=-\frac{27G^{-24}}{(1-4G^{-24})^3}\tag{13}$$ Next let $n>3$ be an odd positive integer and we use $q=e^{-\pi\sqrt{n}} $ so that $$P_n=P(-q), Q_n=Q(-q), R_n=R(-q), G_n=G$$ If we carefully observe the formula $(13)$ we notice at once that the first factor on right is $$\frac{(1-2k^2)(1+8G_n^{-24})}{(1-4G_n^{-24})^{3/2}}=\sqrt {\frac{R_n^2} {Q_n^3}}$$ and the values of the right hand side are given using the relationship between $Q_n, R_n$ provided by Ramanujan. It should be noted that the factorization of coefficient of $Q_n^3$ (given in relation between $Q_n, R_n$) helps in finding the square root on right side of the above equation.

Let us now set $$p_n=\frac{1}{\sqrt{Q_n}}\left(\sqrt{n}P_n-\frac{6}{\pi}\right)$$ so that $$P_n=\frac{6}{\pi\sqrt{n}}+\frac{p_n\sqrt{Q_n}}{\sqrt{n}}=\frac{6}{\pi\sqrt{n}}+\frac{p_n}{\sqrt{n}}\left(\frac{2K}{\pi}\right)^2\sqrt{1-4G_n^{-24}}$$ and using equation $(10)$ together with the notation $(11)$ we get $$P_n= \frac{6}{\pi\sqrt{n}}+\frac{p_n}{\sqrt{n}}\cdot f(v(k)) \tag{14}$$ Comparing equations $(13),(14)$ and using the values $p_n$ and the relations between $Q_n, R_n$ we get desired series for $1/\pi$ as $$\frac{6}{\pi}=\left(\sqrt{\frac{nR_n^2}{Q_n^3}}-p_n\right)f(v(k))+6\sqrt{\frac{nR_n^2}{Q_n^3}}g(v(k))\tag{15}$$ For $n=11,19,43,67,163$ the relation between $Q_n, R_n$ and value $v(k) $ is easily found. One can prove with some effort that for each of these values of $n$ the value $x_n=G_n^8$ is a root of $$x^3-u_nx^2-4=0$$ where $u_n$ is a positive integer and $y_n=x_n^3=G_n^{24}$ is a root of $$y^3 - (u_n^3 + 12)y^2 + 48y - 64=0$$ The values of $u_n$ are $$u_n=8,24,240,1320,160080$$ for the above values of $n$ respectively. Let us now note that $$v(k) =-\frac{27G_n^{-24}}{(1-4G_n^{-24})^3}=-\frac{27y_n^2}{(y_n-4)^3}=-\frac{27}{u_n^3}$$ And further we have \begin{align} \frac{R_n^2} {Q_n^3}& =\frac{(1-2k^2)^2(1+8G_n^{-24})^2}{(1-4G_n^{-24})^{3}}\notag\\ &=\frac{(1-G_n^{-24})(1+8G_n^{-24})^2} {(1-4G_n^{-24}) ^3} \notag\\ &=\frac{y_n^3+15y_n^2+48y_n-64} {(y_n-4)^3 } \notag\\ &=1+\frac{27}{u_n^3}\notag \end{align} so that using the values of $u_n$ the relations between $Q_n, R_n$ are available.

Using these values the series $(15)$ becomes $$\frac{6u_n\sqrt{u_n}}{\pi}\\=\left(\sqrt{n(u_n^3+27)}-p_nu_n\sqrt{u_n}\right)f\left(-\frac{27}{u_n^3}\right)+6\sqrt{n(u_n^3+27)}\cdot g \left(-\frac{27}{u_n^3}\right) \tag{16}$$ It should be noted that the expressions $$\sqrt{n(u_n^3+27)},p_n\sqrt{u_n}$$ are positive integers for these particular values of $n$ (we have discussed the empirical approach to evaluate the value of $p_n$ using the fact that $p_n\sqrt{u_n} $ is a positive integer earlier). For $n=35$ we have $u_n=60+28\sqrt{5}$ and these expressions are algebraic integers from field $\mathbb{Q} (\sqrt{5})$. For $n=27$ we have $u_n^3=192000$ and $$\sqrt {n(u_n^3+27)},p_n\sqrt{u_n^3}$$ are positive integers.

The above formula holds for all positive integers $n>3$ with $u_n=(G_n^{24}-4)/G_n^{16}$ and in practice one uses odd values of $n$ because the evaluation and form of $G_n$ is simpler for odd $n$. Thus for example we have $u_n=15/4,p_n=\sqrt{3/5}$ for $n=7$. It turns out that for $n=7,11,19,27,43,67,163$ the series for $1/\pi$ involve only one quadratic irrationality and other numbers appearing in the series are rational.

We exhibit two series for $1/\pi$ based on above approach for $n=11,163$. For $n=11$ we have $u_n=8,p_n=\sqrt{2}$ and $$ \sqrt{n(u_n^3+27)}-p_nu_n\sqrt{u_n}=\sqrt{11\cdot 539}-32=45$$ and $$ \sqrt{n(u_n^3+27)}=77$$ and using equation $(16)$ we get the desired series as $$\frac{96\sqrt{2}}{\pi}=45f(-27/512)+6\cdot 77g(-27/512) $$ or $$\frac{32\sqrt{2}}{\pi}=15f(-27/512)+154g(-27/512)$$ or $$\frac{32\sqrt{2}}{\pi}=\sum_{j\geq 0}(-1)^j\frac{(1/6)_j(5/6)_j(1/2)_j} {(j!) ^3}(15+154j)\left(\frac{3} {8}\right)^{3j}$$ For $n=163$ we have $$u_n=160080,p_n=362\sqrt{\frac{3}{3335}}$$ so that $$\sqrt{n(u_n^3+27)}-p_nu_n\sqrt{u_n}\\=\sqrt{163\cdot 27(53360^3+1)}-160080\cdot 362\sqrt{\frac{3}{3335}}\sqrt{160080}\tag{17}$$ Now we use the factorization $$53360^3+1=3^3\cdot 7^2\cdot 11^2\cdot 19^2\cdot 127^2\cdot 163$$ given by Ramanujan to simplify the expression in $(17)$ as $$27\cdot 163\cdot 7\cdot 11\cdot 19\cdot 127-160080\cdot 362\cdot 12=122322681$$ and we have $$\sqrt{n(u_n^3+27)} =817710201$$ and using equation $(16)$ we get $$\frac{6\cdot 160080\sqrt {160080}}{\pi} =122322681f(-1/53360^3)+817710201\cdot 6g(-1/53360^3)$$ or $$\frac{106720\sqrt{160080}}{\pi}=\sum_{j\geq 0}(-1)^j\frac{(1/6)_j(5/6)_j(1/2)_j}{(j!)^3}\frac{13591409+545140134j} {53360^{3j}}$$ which is the famous series by Chudnovsky brothers.
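
Both series, and the integer bookkeeping behind the $n=163$ case, are easy to test numerically. The following Python sketch uses ordinary double precision, so it confirms the two identities only to about 15 significant digits; the function name `series` and its parameters are illustrative.

```python
import math

def series(x, const, slope, terms=60):
    # sum_j c_j * (const + slope*j) * x^j with c_j = (1/6)_j (5/6)_j (1/2)_j / (j!)^3.
    total, coeff = 0.0, 1.0          # coeff holds c_j * x^j
    for j in range(terms):
        total += coeff * (const + slope * j)
        coeff *= (j + 1 / 6) * (j + 5 / 6) * (j + 1 / 2) / (j + 1) ** 3 * x
    return total

# n = 11
lhs11 = 32 * math.sqrt(2) / math.pi
rhs11 = series(-27 / 512, 15, 154)

# n = 163: integer arithmetic behind equation (17)
root163 = 27 * 163 * 7 * 11 * 19 * 127        # sqrt(n(u_n^3 + 27))
const163 = root163 - 160080 * 362 * 12        # coefficient of f

lhs163 = 106720 * math.sqrt(160080) / math.pi
rhs163 = series(-1 / 53360**3, 13591409, 545140134)

print(lhs11, rhs11)
print(root163, const163)
print(lhs163, rhs163)
```

Because the $n=163$ series gains roughly 14 digits per term, its first two terms already exhaust double precision.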

Note: The above presentation is based on a thread on MathOverflow I posted some time ago.


Ramanujan's take on Chudnovsky series for 1/π(PI): Part 1

2025-12-23T14:08:00.009+05:30

We have discussed a proof of Chudnovsky series for $1/\pi$ in this post based on Ramanujan's ideas as presented in this post. However a serious look at one of the pages from his lost notebook suggests that Ramanujan used a slightly different approach to obtain Chudnovsky type series and he also performed all the desired calculations needed to get the series in explicit form. This is what we intend to discuss in the current post.

A page from Ramanujan's Lost Notebook

On page 211 of his Lost Notebook (original manuscript) Ramanujan provides a table of relations between his functions $Q,R$ and values of a cryptic expression alongside certain positive integers

Page 211 from the Lost Notebook

Bruce C. Berndt and his collaborators tried to make some guesses and discovered correctly the functions $Q,R$ as well as the cryptic expression involved and its relation to the integers listed at the beginning of each row in the table. Based on their research let us demystify the above page and to that end we assume $n$ to be a positive integer and define \begin{align} P(q)&= 1-24\sum_{j=1}^{\infty}\frac{jq^j}{1-q^j}\tag{1}\\ Q(q)&= 1+240\sum_{j=1}^{\infty}\frac{j^3q^j}{1-q^j}\tag{2}\\ R(q)&= 1-504\sum_{j=1}^{\infty}\frac{j^5q^j}{1-q^j}\tag{3} \end{align} and $$P_n=P(-e^{-\pi\sqrt{n}}),Q_n=Q(-e^{-\pi\sqrt{n}}),R_n=R(-e^{-\pi\sqrt{n}}).\tag{4}$$ The table by Ramanujan above then consists of relations between $Q_n,R_n$ for different values of $n$ listed in first column. The cryptic expression in last column is $(\sqrt{n}P_n -(6/\pi))/\sqrt{Q_n}$. Ramanujan barely writes this expression with just first term $1$ for $P_n$ and two terms for $Q_n$. Let us then rewrite the table by Ramanujan as \begin{array}{|c|c|c|} \hline n& \text{Relation between } Q_n\text{ and } R_n&\frac{\sqrt{n} P_n-(6/\pi)}{\sqrt{Q_n}}\\ \hline 11& (8^3+27)Q_n^3-8^3R_n^2=0&\sqrt{2}\\ \hline 19&(8^3+1)Q_n^3-8^3R_n^2=0&\sqrt{6}\\ \hline 27&(40^3+9)Q_n^3-40^3R_n^2=0&3\sqrt{\frac{6}{5}}\\ \hline 43&(80^3+1)Q_n^3-80^3R_n^2=0&6\sqrt{\frac{3}{5}}\\ \hline 67&(440^3+1)Q_n^3-440^3R_n^2=0&19\sqrt{\frac{6}{55}}\\ \hline 163&(53360^3+1)Q_n^3-53360^3R_n^2=0&362\sqrt{\frac{3}{3335}}\\ \hline 35&((60+28\sqrt{5})^3+27)Q_n^3-(60+28\sqrt{5})^3R_n^2=0&(2+\sqrt{5})\sqrt{\frac{2}{\sqrt{5}}} \\ \hline \end{array} The table thus deals with values of Ramanujan's functions $P(-q),Q(-q),R(-q)$ with nome $q=e^{-\pi\sqrt{n}}$ for certain integer values of $n$. 
These functions can be evaluated in terms of elliptic moduli and integrals associated with nome $q$ as follows \begin{align} P(-q)&=\left(\frac{2K(k)}{\pi}\right)^2\left(\frac{6E(k)}{K(k)}+4k^2-5\right)\tag{5a}\\ Q(-q)&=\left(\frac{2K(k)}{\pi}\right)^4\left(1-4G^{-24}\right)\tag{5b}\\ R(-q)&=\left(\frac{2K(k)}{\pi}\right)^6\left(1-2k^2\right)\left(1+8G^{-24}\right)\tag{5c}\\ \end{align} where $G=(2kk')^{-1/12}$ is one of Ramanujan's class invariants, $k\in(0,1)$ is the elliptic modulus, $k'=\sqrt{1-k^2}$ is the complementary modulus and elliptic integrals $K,E$ are given by $$K(k)=\int_0^{\pi/2}\frac{dx}{\sqrt{1-k^2\sin^2x}},E(k)=\int_0^{\pi/2}\sqrt{1-k^2\sin^2x}\,dx\tag{6}$$ and nome $q$ is related to elliptic integrals via $$q=\exp\left(-\pi\frac{K(k')}{K(k)}\right).$$ The function $P(q)$ is crucial in obtaining series for $1/\pi$ as described by Ramanujan in his famous paper "Modular equations and approximations to $\pi$" and I have discussed his approach in detail in my blog posts (see posts starting from here). The approach described in these posts deals with values of $P(q^2)$ and its connection with Dedekind's eta function and elliptic integral $K=K(k)$ \begin{align} \eta(q)&=q^{1/24}\prod_{j=1}^{\infty}(1-q^j)\tag{7a}\\ P(q^2)&=12q\frac{d}{dq}\log\eta(q^2)\tag{7b}\\ P(q^2)&= \left(\frac{2K}{\pi}\right)^2(1-2k^2)+\frac{3kk'^2}{2}\frac{d}{dk} \left(\frac{2K}{\pi}\right)^2\tag{7c} \end{align} We also have the relation $$2P(q^2)-P(-q)=\left(\frac{2K}{\pi}\right)^2(1-2k^2)\tag{8}$$ and thus the values of $P_n=P(-q)$ given in lost notebook can be used to calculate the values of $P(q^2)$ and one can follow the procedure mentioned in Ramanujan's paper to obtain series for $1/\pi$ by choosing a suitable hypergeometric series for $(2K/\pi)^2$. However it is better to deal with the function $P(-q)$ directly and while doing so we will also see the usefulness of the relationship between $Q_n,R_n$ given in table above.
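
The relations between $Q_n$ and $R_n$ in the table can be tested directly from the Lambert series $(2),(3)$. The Python sketch below is only a numerical illustration with hypothetical helper names. It encodes each row as a pair $(c,d)$ meaning $(c^3+d)Q_n^3-c^3R_n^2=0$, and it is restricted to $n\leq 43$ because for $n=67,163$ the quantity $R_n^2/Q_n^3-1$ is too close to the rounding error of double precision.

```python
import math

def lambert(x, power, terms=40):
    # Truncated sum_{j>=1} j^power * x^j / (1 - x^j).
    total, xj = 0.0, 1.0
    for j in range(1, terms + 1):
        xj *= x
        total += j**power * xj / (1 - xj)
    return total

def ratio_excess(n):
    # R_n^2 / Q_n^3 - 1 with Q_n, R_n evaluated at -exp(-pi sqrt(n)).
    x = -math.exp(-math.pi * math.sqrt(n))
    q_n = 1 + 240 * lambert(x, 3)
    r_n = 1 - 504 * lambert(x, 5)
    return r_n * r_n / q_n**3 - 1

# Row n -> (c, d) where the table states (c^3 + d) Q_n^3 - c^3 R_n^2 = 0.
ROWS = {
    11: (8, 27),
    19: (8, 1),
    27: (40, 9),
    35: (60 + 28 * math.sqrt(5), 27),
    43: (80, 1),
}

for n, (c, d) in ROWS.items():
    print(n, ratio_excess(n), d / c**3)
```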

Evaluation of $P_n=P(-q) $

Let us observe that $$P(-q) =24q\frac {d} {dq} \log\eta(-q) =q\frac{d} {dq} \log\eta^{24}(-q) =q\frac{d} {dq} \log(-\eta^{24}(-q))$$ and we have \begin{align} -\eta^{24}(-q)&=q\prod_{j\geq 1}(1-(-q)^j)^{24}\notag\\ &=q\prod_{j\geq 1}(1+q^{2j-1})^{24}(1-q^{2j})^{24}\notag\\ &=\left(q^{-1}\prod_{j\geq 1}(1+q^{2j-1})^{24}\right) \left(q^2\prod_{j\geq 1}(1-q^{2j}) ^{24}\right) \notag\\ &=2^6(2kk')^{-2}\eta^{24}(q^2)\notag\\ &=2^4(kk')^{-2}\cdot 2^{-8}(2K/\pi)^{12}(kk')^{4}\notag\\ &=2^{-4}\left(\frac{2K}{\pi}\right)^{12} (kk') ^2\notag \end{align} Using logarithmic differentiation and noting that $$\frac{dq} {dk} =\frac{\pi^2q}{2kk'^2K^2}$$ we arrive at $$P(-q) =\left(\frac {2K}{\pi}\right)^2\left(\frac{6E}{K}+4k^2-5\right)$$ mentioned earlier. A similar procedure leads us to $$P(q^2)= \left(\frac {2K}{\pi}\right)^2\left(\frac{3E}{K}+k^2-2\right)$$ and then we get $$2P(q^2)-P(-q)= \left(\frac {2K}{\pi}\right)^2\left(1-2k^2\right)$$ which is also mentioned earlier.

Let us now take another elliptic modulus $l$ which is of degree $n$ over $k$ so that its nome is $q^n$ and the corresponding elliptic integral is denoted by $L$ and $L'/L=nK'/K$. Then we have $$\eta^{24}(-q^n)=-2^{-4}(2L/\pi)^{12}(ll')^{2}$$ Next we have $$nP(-q^n) =q\frac{d} {dq} \log\eta^{24}(-q^n)$$ so that \begin{align} nP(-q^n) - P(-q) &=q\frac{d} {dq} \log\frac{\eta^{24}(-q^n)}{\eta^{24}(-q)} \notag\\ &=q\frac{dk} {dq} \frac{d} {dk} \log\left\{\left(\frac{L} {K} \right) ^{12}\left(\frac{ll'} {kk'} \right) ^2\right\} \notag\\ &=\left(\frac{2K}{\pi}\right) ^2kk'^2\frac{d} {dk} \log\left\{\left(\frac{L} {K} \right) ^{6}\frac{ll'} {kk'} \right\} \notag\\ &=\frac{4KL}{\pi^2}mkk'^2\frac{d} {dk} \log\frac{ll'} {m^6kk'}\notag \end{align} where $m=K/L$ is the multiplier. Since $m$ as well as $l$ are algebraic functions of $k$ it follows that $$nP(-q^n) - P(-q) =\frac{4KL} {\pi^2}B_n(l, k)\tag{9} $$ where $B_n(l, k) $ is an algebraic function of $l, k$. If we put $l=k'$ so that $L=K'$ and $K'/K=1/\sqrt{n}$ then $q=e^{-\pi/\sqrt {n}} $ and we get $$nP(-e^{-\pi\sqrt{n}} ) - P(-e^{-\pi/\sqrt{n}} ) =\left(\frac{2L}{\pi}\right)^2B_n\tag{10}$$ where $L$ corresponds to nome $e^{-\pi\sqrt{n}} $ and $B_n$ is some algebraic number dependent on $n$.

Next note that $$\frac{\eta^{24}(-e^{-\pi\sqrt{n}})} {\eta^{24}(-e^{-\pi/\sqrt{n}}) } =\left(\frac{K'} {K} \right) ^{12}=n^{-6}$$ and logarithmic differentiation with respect to $n$ gives us $$nP(-e^{-\pi\sqrt{n}}) +P(-e^{-\pi/\sqrt{n}}) =\frac{12\sqrt{n}}{\pi}\tag{11}$$ Using equations $(10),(11)$ we see that $$P(-e^{-\pi\sqrt{n}}) =\frac{6}{\pi\sqrt{n}}+\left(\frac{2K}{\pi} \right) ^2\frac{B_n}{2n}$$ where elliptic integral $K$ corresponds to nome $e^{-\pi\sqrt{n}} $. Ramanujan perhaps calculated certain values of this expression for different values of $n$ and figured out that multiplying the above expression by $\sqrt{n} $ and further dividing the result by $\sqrt{Q(-q)} $ to get rid of factor $(2K/\pi) ^2 $ leads to very simple algebraic numbers. And this probably led to the table of values of the expression $(\sqrt{n} P_n-(6/\pi)) /\sqrt{Q_n} $.

Ramanujan gave the value $$\frac{1}{\sqrt{Q_n}}\left(\sqrt{n}P_n-\frac{6}{\pi}\right)=\sqrt{2}$$ for $n=11$ and we show the calculations needed to obtain this value. We need two ingredients here with the first one being the modular equation of degree $11$ given by Ramanujan $$\sqrt{kl} +\sqrt{k'l'} +2(4klk'l')^{1/6}=1\tag{12}$$ The equation is symmetric in $k, l$ and hence works when modulus $l$ is of degree $11$ over modulus $k$ as well as when $k$ is of degree $11$ over $l$. Let us assume the latter here and let $k, l$ correspond to nomes $e^{-\pi\sqrt{11}},e^{-\pi/\sqrt{11}} $ respectively so that $l=k'$ and then the equation $(12)$ reduces to $$2\sqrt{kk'}+2(2kk')^{1/3}=1$$ Let us further observe that $G=G_{11}=(2kk')^{-1/12}$ and hence the above equation can be written in terms of $G$ as $$2(G^{-12}/2)^{1/2}+2G^{-4}=1$$ or $$\sqrt{2}+2G^{2}=G^6$$ so that $a=G^2=G_{11}^2$ is the root of $$f(x) = x^3-2x-\sqrt{2}\tag{13}$$ It can be checked that this equation has no roots in $\mathbb{Q} (\sqrt{2})$ and hence $a$ is of degree $3$ over $\mathbb{Q} (\sqrt{2})$ and hence of degree $6$ over $\mathbb{Q} $.

The minimal polynomial for $a$ over rationals is then $$(x^3-2x-\sqrt{2})(x^3-2x+\sqrt{2})$$ or $$x^6-4x^4+4x^2-2$$ so that $b=a^2=G_{11}^4$ is a root of $$g(x) =x^3-4x^2+4x-2\tag{14}$$ Next ingredient we need is the expression for $nP(q^{2n})-P(q^2)$ for $n=11$ provided by Ramanujan as $$nP(q^{2n})-P(q^2)=\frac{8KL}{\pi^2}\left\{2(1+kl+k'l')+\sqrt{kl} +\sqrt{k'l'} - \sqrt{klk'l'} \right\}\tag{15}$$ and here we can again assume $k$ being of degree $11$ over $l$ so that $l$ corresponds to $q$ and $k$ to $q^{n} $ and $K'/K=nL'/L$. Putting $q=e^{-\pi/\sqrt{11}}$ so that $l=k'$ and $L/K=K'/K=\sqrt {11}$ we get $$nP(q^{2n})-P(q^2)=\sqrt{11}\left(\frac{2K}{\pi}\right)^2(4(1+2kk')+4\sqrt{kk'}-2kk')$$ or $$nP(q^{2n})-P(q^2)=\sqrt{11}\left(\frac{2K}{\pi}\right)^2(4(1+G^{-12})+2\sqrt{2}G^{-6}-G^{-12})$$ or $$nP(q^{2n})-P(q^2)=\sqrt{11}\left(\frac{2K}{\pi}\right)^2(4+3b^{-3}+2\sqrt{2}a^{-3})$$ Using $\sqrt{2}=a^3-2a$ we get $$nP(q^{2n})-P(q^2)=\sqrt{11}\left(\frac{2K}{\pi}\right)^2\frac{6b^3-4b^2+3}{b^3}$$ We need another identity $$nP(q^{2n})+P(q^2)=\frac{6\sqrt{n}}{\pi}$$ which holds for $q=e^{-\pi/\sqrt{n}} $. 
Using this identity together with previous equation we get $$2P(e^{-2\pi\sqrt {11}})=\frac{6}{\pi\sqrt{11}}+\frac{1}{\sqrt{11}} \left(\frac{2K}{\pi}\right)^2\frac{6b^3-4b^2+3}{b^3}\tag{16}$$ Next we use the identity $$2P(q^2)-P(-q)=\left(\frac{2K}{\pi}\right)^2(1-2k^2)$$ with $q=e^{-\pi\sqrt{11}}$ to get $$P(-q)=\frac{6}{\pi\sqrt{11}}+\left(\frac {2K}{\pi}\right)^2\left(\frac{6b^3-4b^2+3}{b^3\sqrt{11}}-(1-2k^2)\right)$$ or $$\sqrt{n} P_n-\frac{6}{\pi}=\left(\frac {2K}{\pi}\right)^2\left(\frac{6b^3-4b^2+3}{b^3}-\frac{\sqrt{11(b^6-1)}}{b^3}\right)\tag{17}$$ as $$(1-2k^2)^2=1-G^{-24}=1-b^{-6}$$ To evaluate $Q_n$ we note that $$Q(-q) =\left(\frac{2K}{\pi}\right) ^4(1-4G^{-24})=\left(\frac{2K}{\pi}\right) ^4\frac{b^6-4}{b^6}$$ and hence $$\frac{1}{\sqrt{Q_n}}\left(\sqrt{n}P_n-\frac{6}{\pi}\right)=\frac{6b^3-4b^2+3-\sqrt{11(b^6-1)}}{\sqrt {b^6-4}}.$$ The above expression can be simplified using equation $(14)$ which says that $$b^3-4b^2+4b-2=0$$ or $$b(b^2+4)=4b^2+2.$$ On squaring the above equation we get $$b^2(b^2+4)^2=(4b^2+2)^2$$ so that $b^2=G_{11}^8$ is a root of $$x(x+4)^2=(4x+2)^2$$ or $$x^3-8x^2-4=0.$$ This means that $$\sqrt {b^6-4}=\sqrt{8b^4}=2\sqrt {2}b^2.$$ We can thus prove that $$\frac{6b^3-4b^2+3-\sqrt{11(b^6-1)}}{\sqrt{b^6-4}}=\sqrt {2}$$ if $$\sqrt{11(b^6-1)}=6b^3-8b^2+3$$ or $$11(b^6-1)=(6(b^3-4b^2+4b-2)+16b^2-24b+15)^2$$ or $$11(8b^4+3)=(16b^2-24b+15)^2$$ or $$88b^4+33=256b^4+576b^2+480b^2+225-768b^3-720b$$ or $$168b^4-768b^3+1056b^2-720b+192=0$$ or $$7b^4-32b^3+44b^2-30b+8=0$$ or $$(7b-4)(b^3-4b^2+4b-2)=0$$ and this is true via equation $(14)$.
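
The value $\sqrt{2}$ for $n=11$ can be confirmed numerically. The following Python sketch is illustrative only: it finds $b=G_{11}^4$ by bisection on polynomial $(14)$ and, as an independent check, also computes it from the product $G^{24}=2^{-6}q^{-1}\prod_{j\geq 1}(1+q^{2j-1})^{24}$ which follows from the eta function calculation above.

```python
import math

def g(x):
    # Polynomial (14): x^3 - 4x^2 + 4x - 2
    return x**3 - 4 * x**2 + 4 * x - 2

# Bisection for b = G_11^4; g(2) = -2 < 0 < g(3) = 1 and g is increasing on [2, 3].
lo, hi = 2.0, 3.0
for _ in range(200):
    mid = (lo + hi) / 2
    if g(mid) < 0:
        lo = mid
    else:
        hi = mid
b = (lo + hi) / 2

# Independent value of b from the infinite product with q = exp(-pi*sqrt(11)).
q = math.exp(-math.pi * math.sqrt(11))
g24 = 1 / (64 * q)
for j in range(1, 20):
    g24 *= (1 + q ** (2 * j - 1)) ** 24
b_from_product = g24 ** (1 / 6)

# (sqrt(n) P_n - 6/pi) / sqrt(Q_n) expressed through b.
value = (6 * b**3 - 4 * b**2 + 3 - math.sqrt(11 * (b**6 - 1))) / math.sqrt(b**6 - 4)

print(b, b_from_product)
print(value, math.sqrt(2))
print(6 * b**3 - 8 * b**2 + 3)   # positive, so taking square roots is legitimate
```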

The case for $n=27$ can be handled with more ease using modular equations of degree $3$. Let the moduli $k, l$ and elliptic integrals $K, L$ correspond to nomes $q^3,q$ respectively so that $k$ is of degree $3$ over $l$. Ramanujan established that $$3P(q^6)-P(q^2)=\frac{4KL}{\pi^2}\left\{1+kl+k'l'\right\}\tag{18}$$ Using Landen transformation we can replace $q^2$ by $q$, $K$ by $(1+k)K$, $L$ by $(1+l)L$, $k$ by $2\sqrt{k}/(1+k)$, $l$ by $2\sqrt{l}/(1+l)$ in above formula to get $$3P(q^3)-P(q)=\frac{4(1+k)(1+l)KL}{\pi^2}\left\{1+\frac{4\sqrt {kl}} {(1+k)(1+l)} +\frac{(1-k)(1-l)}{(1+k)(1+l)}\right\}$$ and after some simplification we get $$3P(q^3)-P(q)=\frac{8KL}{\pi^2}\left\{1+\sqrt{kl}\right\}^2$$ Next we change $q$ to $-q$. This changes $K$ to $k'K$ and $L$ to $l'L$. To see how this transforms the expression $\sqrt{kl} $ let us observe that $$\sqrt{kl} =4q\frac{\psi(q^2)\psi(q^6)}{\phi(q)\phi(q^3)}$$ where $$\phi(q) =\sum_{n\in\mathbb{Z}} q^{n^2}$$ and $$\psi(q) =\sum_{n\geq 0}q^{n(n+1)/2}$$ are Ramanujan theta functions. The function $\phi(q) $ is same as $\vartheta_3(q)$ of Jacobi and one can now observe that changing $q$ to $-q$ changes $\sqrt {kl} $ to $-\sqrt {kl} /\sqrt{k'l'} $ and hence we get $$3P(-q^3)-P(-q)=\frac{8KL}{\pi^2}\left\{\sqrt{k'l'}-\sqrt{kl}\right\}^2\tag{19}$$ We are going to use this equation repeatedly in what follows. 
We will also need the following formulas related to multiplier $m=L/K$ established by Ramanujan: \begin{align} m&=\left\{1+4\left(\frac{(kk')^3}{ll'}\right)^{1/4}\right\} ^{1/2}\tag{20a}\\ k^{2} &= \frac{(m - 1)^{3}(m + 3)}{16m}\tag{20b} \\ l^{2} &= \frac{(m - 1)(m + 3)^{3}}{16m^{3}}\tag{20c} \\ k'^{2} &= \frac{(m + 1)^{3}(3 - m)}{16m}\tag{20d} \\ l'^{2} &= \frac{(m + 1)(3 - m)^{3}}{16m^{3}}\tag{20e} \end{align} Next let us now use moduli related to $n=27$ and let $k, l, r, s$ be elliptic moduli and $K, L, R, S$ be elliptic integrals corresponding to nomes $q^{27},q^9,q^3,q=\exp(-\pi/3\sqrt{3})$ respectively so that $s=k', r=l'$ and further $k, l, r$ are of degree $3$ over $l, r, s$ respectively. Also we have $$l=\frac{\sqrt{3}-1}{2\sqrt{2}},l'=\frac{\sqrt{3}+1}{2\sqrt{2}}$$ so that $ll'=1/4$. Using $(19)$ repeatedly we have $$3P(-q^3)-P(-q)=\frac{8RS}{\pi^2}(\sqrt{r's'}-\sqrt{rs})^2=\frac{72KL}{\pi^2}(\sqrt{k'l'}-\sqrt{kl} )^2\tag{21}$$ and $$3P(-q^9)-P(-q^3)=\frac{8LR}{\pi^2}(\sqrt{l'r'}-\sqrt{lr})^2=0\tag{22}$$ as $lr=l'r'$ and $$3P(-q^{27})-P(-q^9)=\frac{8KL}{\pi^2}(\sqrt{k'l'}-\sqrt{kl})^2\tag{23}$$ Multiplying equation $(23)$ by $9$, $(22)$ by $3$ and then adding these to $(21)$ we get $$27P(-q^{27})-P(-q)=\frac{144KL}{\pi^2}(\sqrt{k'l'}-\sqrt{kl})^2\tag{24}$$ Noting that $k$ is of degree $3$ over $l$ we have using equations $(20b),\dots,(20e)$ $$\sqrt {kl} - \sqrt{k'l'} =\frac{(m-1)(m+3)-(m+1)(3-m)}{4m}=\frac{m^2-3}{2m}$$ and hence $$27P(-q^{27})-P(-q)=\frac{4K^2}{\pi^2}\cdot\frac{9(m^2-3)^2}{m}$$ Next we note that the value $2kk'$ equals $G_{27}^{-12}$ and it is known that $$G_{27}=\frac{\sqrt[12]{2}} {\sqrt[3]{\sqrt[3]{2}-1}} $$ and hence $$2kk'=\frac{(a-1)^4}{2}$$ where $a=2^{1/3}$. 
We also note that $ll'=1/4$ and use the equation $(20a) $ to get $$m^2=1+2^{7/4}(2kk')^{3/4}=1+2(a-1)^3=1+2(a^3-3a^2+3a-1)=3+6a-6a^2$$ or $$m=\sqrt{3}\sqrt{1+2a-2a^2}$$ and thus $$\frac{9(m^2-3)^2}{m}=\frac{324(a^2+2a-4)}{\sqrt{3}\sqrt{1+2a-2a^2}}$$ and finally we have $$27P(-q^{27})-P(-q)=\frac{4K^2}{\pi^2}\cdot\frac{324(a^2+2a-4)}{\sqrt{3}\sqrt{1+2a-2a^2}}\tag{25}$$ We also have the identity $$27P(-q^{27})+P(-q)=\frac{12\sqrt{n}}{\pi}=\frac{36\sqrt{3}}{\pi}$$ From the above two equations we get $$\sqrt {n} P_n-\frac{6}{\pi}=\sqrt{27}P(-q^{27})-\frac{6}{\pi}=\frac{4K^2}{\pi^2}\cdot\frac{18(a^2+2a-4)}{\sqrt{1+2a-2a^2}}$$ Next we have $$Q_n=\left(\frac{2K}{\pi}\right)^4(1-4G_{27}^{-24})$$ or $$\sqrt{Q_n} =\left(\frac{2K}{\pi}\right)^2\sqrt{1-(a-1)^8}$$ and thus $$\frac{1}{\sqrt{Q_n}}\left(\sqrt {n} P_n-\frac{6}{\pi}\right)=\frac{18(a^2+2a-4)}{\sqrt{(1+2a-2a^2) (1-(a-1)^8) }}\tag{26} $$ The square of right hand side is a rational function of $a$ and one can use the relation $a^3=2$ to reduce numerator and denominator each to quadratic polynomials in $a$. First we deal with square of denominator which equals $$(1+2a-2a^2)(1-(a-1)^8)$$ which is same as $$(1+2a-2a^2)(-a^8 + 8a^7 - 28a^6 + 56a^5 - 70a^4 + 56a^3 - 28a^2 + 8a)$$ Reducing the above using $a^3=2$ we get $$(1+2a-2a^2)(-4a^2+32a-112+112a^2-140a+112-28a^2+8a)$$ or $$20a(1+2a-2a^2)(4a-5)=20a(-8a^3 + 18a^2 - 6a - 5) $$ and the above finally equals $$60a(6a^2-2a-7)=60(6a^3-2a^2-7a)=60(12-7a-2a^2)$$ The square of numerator of right side of $(26)$ is $$324(a^2+2a-4)^2=324(a^4+4a^2+16+4a^3-16a-8a^2)$$ which reduces to $$648(12-7a-2a^2)$$ Thus the square of right hand side of $(26)$ equals $648/60=54/5$ and hence the expression on right hand side of $(26)$ equals $\sqrt{54/5}=3\sqrt{6/5}$.
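The reduction using $a^3=2$ is mechanical and can be checked with exact integer arithmetic. The sketch below is an illustration and not part of the original argument: it represents a polynomial in $a$ by its coefficient list `[c0, c1, c2]` (meaning $c_0+c_1a+c_2a^2$), and the helper names are hypothetical.

```python
def mul_mod(p, q):
    """Multiply two reduced polynomials in a and reduce the product with a^3 = 2."""
    out = [0, 0, 0]
    for i, pc in enumerate(p):
        for j, qc in enumerate(q):
            e = i + j
            # a^e = 2^(e // 3) * a^(e % 3) because a^3 = 2
            out[e % 3] += pc * qc * 2 ** (e // 3)
    return out

def power_mod(p, n):
    """n-th power of p by repeated multiplication, reduced with a^3 = 2."""
    result = [1, 0, 0]
    for _ in range(n):
        result = mul_mod(result, p)
    return result

numerator = [-4, 2, 1]                      # a^2 + 2a - 4
num_sq = mul_mod(numerator, numerator)      # (a^2 + 2a - 4)^2
pow8 = power_mod([-1, 1, 0], 8)             # (a - 1)^8
one_minus = [1 - pow8[0], -pow8[1], -pow8[2]]
den_sq = mul_mod([1, 2, -2], one_minus)     # (1 + 2a - 2a^2)(1 - (a - 1)^8)

print(num_sq, den_sq)
```

Both results are multiples of $12-7a-2a^2$, and $324$ times the first divided by the second gives the ratio $54/5$ obtained above.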

We postpone the evaluation of $P_n$ for other values of $n$ to the next post to keep the length of the current post at a manageable level. In the next post we will also obtain certain nice series for $1/\pi$ based on these ideas.


The General Binomial Theorem: Part 2

2016-07-23T14:21:00.000+05:30

In the previous post we established the general binomial theorem using Taylor's theorem which uses derivatives in a crucial manner. In this post we present another approach to the general binomial theorem by studying more about the properties of the binomial series itself. Needless to say, this approach requires some basic understanding about infinite series and we will assume that the reader is familiar with ideas of convergence/divergence of an infinite series and some of the tests for convergence of a series.

Apart from the basic understanding of infinite series we will require a fundamental theorem concerning the multiplication of two infinite series. This we present next.

Multiplication of Infinite Series

Consider two infinite series $$\sum a_{n} = a_{1} + a_{2} + \cdots + a_{n} + \cdots$$ and $$\sum b_{n} = b_{1} + b_{2} + \cdots + b_{n} + \cdots$$ If we multiply them term by term (i.e. multiply each term of the first series by the first term of the second series, then multiply each term of the first series by the second term of the second series and so on) we get a sum (whose meaning we are yet to define) like the following \begin{align} a_{1}b_{1} &+ a_{1}b_{2} + a_{1}b_{3} + \cdots +\notag\\ a_{2}b_{1} &+ a_{2}b_{2} + a_{2}b_{3} + \cdots +\notag\\ a_{3}b_{1} &+ a_{3}b_{2} + a_{3}b_{3} + \cdots +\notag\\ \cdots &+ \cdots +\notag\\ a_{n}b_{1} &+ a_{n}b_{2} + \cdots +\notag\\ \cdots &+ \cdots +\notag \end{align} In order to give this two-dimensional expression a meaning it is better to regroup the terms and arrange them in linear fashion to make a new infinite series. One particularly nice way to group terms is to add the diagonal entries and get terms like $(a_{1}b_{1}), (a_{1}b_{2} + a_{2}b_{1}), \cdots$ and in general the $n^{\text{th}}$ group is $$c_{n} = a_{1}b_{n} + a_{2}b_{n - 1} + \cdots + a_{n - 1}b_{2} + a_{n}b_{1}$$ and then we have another series $$\sum c_{n} = c_{1} + c_{2} + \cdots + c_{n} + \cdots$$ and since the series $\sum c_{n}$ eventually captures all the terms which are obtained by multiplying the terms of series $\sum a_{n}$ and $\sum b_{n}$ it is reasonable to expect that the sum of series $\sum c_{n}$ will be the product of sums of $\sum a_{n}$ and $\sum b_{n}$. This is in fact true under very general circumstances and we will establish a result to that effect. Before we proceed to do that we first establish certain lemmas on sequences.

Lemma 1: If $\lim_{n \to \infty}a_{n} = A$ then $$\lim_{n \to \infty}\frac{a_{1} + a_{2} + \cdots + a_{n}}{n} = A$$ Let $t_{n} = a_{n} - A$ so that $t_{n} \to 0$ and we can see that $$\frac{a_{1} + a_{2} + \cdots + a_{n}}{n} = \frac{t_{1} + t_{2} + \cdots + t_{n}}{n} + A$$ and hence it suffices to prove that $$\frac{t_{1} + t_{2} + \cdots + t_{n}}{n} \to 0$$ under the condition that $t_{n} \to 0$. Consider $s_{n} = [\sqrt{n}]$ (greatest integer not exceeding $\sqrt{n}$) so that $s_{n}$ is an integer and $s_{n} \to \infty, s_{n}/n \to 0$ as $n \to \infty$. Let $\epsilon > 0$ be given. Since $t_{n} \to 0$ there is a positive integer $n_{0}$ such that $|t_{n}| < \epsilon / 2$ for all $n \geq n_{0}$. Further since $s_{n} \to \infty$ it follows that there is a positive integer $m_{1}$ such that $s_{n} > n_{0}$ for $n \geq m_{1}$. Thus it follows that $|t_{n}| < \epsilon/2$ for $n \geq s_{m_{1}}$. Again $t_{n} \to 0$ hence the sequence $t_{n}$ is bounded and let $|t_{n}| < K$ for all $n$. Further note that $s_{n}/n \to 0$ hence there is a positive integer $m_{2}$ such that $|s_{n}/n| < \epsilon/2K$ for all $n \geq m_{2}$. Let $$T_{n} = \frac{t_{1} + t_{2} + \cdots + t_{n}}{n}$$ Then for $n \geq m = \max(m_{1}, m_{2})$ \begin{align} |T_{n}| &= \left|\frac{t_{1} + t_{2} + \cdots + t_{n}}{n}\right|\notag\\ &\leq \frac{|t_{1}| + |t_{2}| + \cdots + |t_{s_{m}}|}{n} + \frac{|t_{s_{m} + 1}| + |t_{s_{m} + 2}| + \cdots + |t_{n}|}{n}\notag\\ &< \frac{s_{m}K}{n} + \frac{n - s_{m}}{n}\cdot\frac{\epsilon}{2}\notag\\ &< \frac{s_{n}}{n}\cdot K + \frac{\epsilon}{2}\notag\\ &< \frac{\epsilon}{2K}\cdot K + \frac{\epsilon}{2}\notag\\ &= \epsilon\notag \end{align} and therefore $T_{n} \to 0$ as $n \to \infty$. This completes the proof of the lemma.

Using this lemma we prove that:

Lemma 2: If $a_{n} \to A, b_{n} \to B$ as $n \to \infty$ then $$\lim_{n \to \infty}\frac{a_{1}b_{n} + a_{2}b_{n - 1} + \cdots + a_{n - 1}b_{2} + a_{n}b_{1}}{n} = AB$$ Let $a_{n} = A + t_{n}$ so that $t_{n} \to 0$ as $n \to \infty$. Then \begin{align} c_{n} &= \frac{a_{1}b_{n} + a_{2}b_{n - 1} + \cdots + a_{n - 1}b_{2} + a_{n}b_{1}}{n}\notag\\ &= A\cdot\frac{b_{1} + b_{2} + \cdots + b_{n}}{n} + \frac{t_{1}b_{n} + t_{2}b_{n - 1} + \cdots + t_{n - 1}b_{2} + t_{n}b_{1}}{n}\notag \end{align} Since $b_{n} \to B$ it is bounded by some $K$ so that $|b_{n}| < K$ for all $n$. Now the first term in the above equation tends to $AB$ (by Lemma 1) and the absolute value of second fraction is less than $$K\cdot\frac{|t_{1}| + |t_{2}| + \cdots + |t_{n}|}{n}$$ which also tends to $K \cdot 0 = 0$ by Lemma 1. Hence it follows that $c_{n} \to AB$ as $n \to \infty$.
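A small numerical illustration of Lemma 2 (and of Lemma 1, which is the special case $b_{n} = 1$) can be sketched as follows. The sequences $a_{n} = 2 + 1/n$ and $b_{n} = 3 - 1/n^{2}$ are arbitrary examples chosen here, not taken from the text.

```python
def convolution_mean(a, b, n):
    """(a_1 b_n + a_2 b_{n-1} + ... + a_n b_1) / n for sequences given as functions."""
    return sum(a(i) * b(n + 1 - i) for i in range(1, n + 1)) / n

def seq_a(i):
    return 2 + 1 / i       # tends to A = 2

def seq_b(i):
    return 3 - 1 / i**2    # tends to B = 3

def ones(i):
    return 1               # constant sequence, reduces Lemma 2 to Lemma 1

for n in (10, 100, 1000, 10000):
    print(n, convolution_mean(seq_a, seq_b, n), convolution_mean(seq_a, ones, n))
```

The first column of values approaches $AB = 6$ and the second approaches $A = 2$, although rather slowly (the error behaves roughly like $(\log n)/n$ for these particular sequences).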

We are now ready to prove Abel's theorem on multiplication of infinite series:

Abel's Theorem: Let $a_{n}, b_{n}$ be any two sequences and let \begin{align} A_{n} &= a_{1} + a_{2} + \cdots + a_{n}\notag\\ B_{n} &= b_{1} + b_{2} + \cdots + b_{n}\notag\\ c_{n} &= a_{1}b_{n} + a_{2}b_{n - 1} + \cdots + a_{n - 1}b_{2} + a_{n}b_{1}\notag\\ C_{n} &= c_{1} + c_{2} + \cdots + c_{n}\notag \end{align} If $A_{n} \to A, B_{n} \to B, C_{n} \to C$ as $n \to \infty$ then $C = AB$.

The theorem says that if $\sum a_{n}, \sum b_{n}$ are two convergent series and $\sum c_{n}$ is the series formed by term by term multiplication of $\sum a_{n}, \sum b_{n}$ (with terms grouped in a manner explained above) and further if $\sum c_{n}$ is convergent then $\sum c_{n} = (\sum a_{n})(\sum b_{n})$.

The proof of the theorem is based on the simple idea that we can write $a_{n} = A_{n} - A_{n - 1}, b_{n} = B_{n} - B_{n - 1}$ for $n > 1$ and $a_{1} = A_{1}, b_{1} = B_{1}$. We make the convention that $A_{0} = B_{0} = 0$ so that the above relations hold for all $n$. Using these relations it is easily proven that \begin{align} C_{n} &= a_{1}B_{n} + a_{2}B_{n - 1} + \cdots + a_{n}B_{1}\notag\\ &= b_{1}A_{n} + b_{2}A_{n - 1} + \cdots + b_{n}A_{1}\notag\\ \sum_{i = 1}^{n}C_{i} &= A_{1}B_{n} + A_{2}B_{n - 1} + \cdots + A_{n}B_{1}\notag \end{align} By lemma 2 it is clear that $$\frac{1}{n}\cdot\sum_{i = 1}^{n}C_{i} = \frac{A_{1}B_{n} + A_{2}B_{n - 1} + \cdots + A_{n}B_{1}}{n} \to AB$$ and further $C_{n} \to C$ therefore by lemma 1 $$\frac{1}{n}\cdot\sum_{i = 1}^{n}C_{i} \to C$$ and therefore $C = AB$.
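To see the theorem in action, the sketch below computes $A_{n}, B_{n}, C_{n}$ for two geometric series. The choice $a_{n} = (1/2)^{n - 1}, b_{n} = (1/3)^{n - 1}$ is an example made here for illustration, so that $A = 2, B = 3/2$ and the product series should have sum $C = 3$.

```python
def cauchy_partial_sums(a, b, n):
    """Return (A_n, B_n, C_n) for sequences a, b given as functions of the index i >= 1."""
    A = sum(a(i) for i in range(1, n + 1))
    B = sum(b(i) for i in range(1, n + 1))
    C = 0.0
    for k in range(1, n + 1):
        # c_k = a_1 b_k + a_2 b_{k-1} + ... + a_k b_1
        C += sum(a(i) * b(k + 1 - i) for i in range(1, k + 1))
    return A, B, C

def geo_half(i):
    return 0.5 ** (i - 1)

def geo_third(i):
    return (1 / 3) ** (i - 1)

A, B, C = cauchy_partial_sums(geo_half, geo_third, 60)
print(A, B, C, A * B)
```

Here all three series converge absolutely, so the example does not exercise the full strength of Abel's theorem, which needs only the convergence of the three series.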

Multiplication of Power Series

The above rule for multiplication of infinite series can be used to multiply special kinds of series called power series. A power series is of the form $$f(x) = \sum_{n = 0}^{\infty}a_{n}x^{n} = a_{0} + a_{1}x + a_{2}x^{2} + \cdots\tag{1}$$ where $a_{n}$ is a sequence and $x$ is a variable. If for some values of $x$ lying in a certain set $A$ the above series is convergent then its sum defines a function from $A \to \mathbb{R}$. Clearly the series is convergent if $x = 0$ and it is possible that it may be convergent for other values of $x$. Power series have a special property that if they are convergent for $x = r$ then they are convergent for all values of $x$ with $|x| < |r|$. This is not difficult to prove.

Let the series $(1)$ be convergent for $x = r \neq 0$. Then $a_{n}r^{n} \to 0$ and hence $|a_{n}r^{n}| < K$ for all $n$ and some $K$. Let $|x| = r' < |r|$ then $$|a_{n}x^{n}| = |a_{n}r^{n}|\left(\frac{r'}{|r|}\right)^{n} < K\left(\frac{r'}{|r|}\right)^{n}$$ and the series is (absolutely) convergent by comparison with the convergent geometric series $\sum(r'/|r|)^{n}$.

Thus the region of convergence of a power series is always an interval of the form $(-R, R)$, possibly together with one or both of the end points $x = \pm R$ where the series may or may not converge; here $R$ may be $0$ (the series converges only for $x = 0$) or $\infty$. We say that $R$ is the radius of convergence (the word radius is used because if $x$ is treated as a complex variable then the region of convergence turns out to be a disc of radius $R$).

Using the rule for multiplication of infinite series we can multiply two power series $$f(x) = \sum_{n = 0}^{\infty}a_{n}x^{n}, g(x) = \sum_{n = 0}^{\infty}b_{n}x^{n}$$ to get another power series $$h(x) = \sum_{n = 0}^{\infty}c_{n}x^{n}$$ where $$c_{n} = a_{0}b_{n} + a_{1}b_{n - 1} + \cdots + a_{n - 1}b_{1} + a_{n}b_{0}$$ Whether this new series is convergent or not needs to be determined separately.

Exponential Property of the Binomial Series

We will now study a very important property of the binomial series $$f(x, n) = 1 + \binom{n}{1}x + \binom{n}{2}x^{2} + \cdots = \sum_{m = 0}^{\infty}\binom{n}{m}x^{m}\tag{2}$$ where $n$ is a real number. If $p, q$ are real and we multiply the two binomial series $f(x, p), f(x, q)$ then we get another series $$h(x) = \sum_{m = 0}^{\infty}c_{m}x^{m}$$ where $$c_{m} = \binom{p}{0}\binom{q}{m} + \binom{p}{1}\binom{q}{m - 1} + \cdots + \binom{p}{m - 1}\binom{q}{1} + \binom{p}{m}\binom{q}{0}\tag{3}$$ The challenge is to evaluate the coefficient $c_{m}$ in terms of a simple formula.

Note that if $p, q$ are positive integers then both the series $f(x, p), f(x, q)$ turn out to be finite as $\binom{p}{m}, \binom{q}{m}$ become $0$ as $m$ exceeds $p$ and $q$. Moreover in this case we know that $$f(x, p) = (1 + x)^{p}, f(x, q) = (1 + x)^{q}$$ and therefore $$f(x, p)f(x, q) = (1 + x)^{p + q}$$ and the coefficient of $x^{m}$ in $(1 + x)^{p + q}$ is $\binom{p + q}{m}$ (it will become $0$ as soon as $m$ exceeds $p + q$). Thus it follows from multiplication of series that $$\binom{p}{0}\binom{q}{m} + \binom{p}{1}\binom{q}{m - 1} + \cdots + \binom{p}{m - 1}\binom{q}{1} + \binom{p}{m}\binom{q}{0} = \binom{p + q}{m}$$ for all positive integers $p, q$ and all non-negative integers $m$. Note that by definition the expression $\binom{p}{m}$ is a polynomial in $p$ and thus we see that the above equation is an identity between polynomials of two variables $p, q$ which holds for all positive integral values of $p, q$. Thus this identity must be true identically for all real values of $p, q$ and thus the evaluation of $c_{m}$ from equation $(3)$ is complete and we have $$c_{m} = \binom{p}{0}\binom{q}{m} + \binom{p}{1}\binom{q}{m - 1} + \cdots + \binom{p}{m - 1}\binom{q}{1} + \binom{p}{m}\binom{q}{0} = \binom{p + q}{m}\tag{4}$$ for all real values of $p, q$. The series $h(x) = \sum c_{m}x^{m}$ thus turns out to be a binomial series $f(x, p + q)$. From Abel's theorem on multiplication of infinite series it follows that $$f(x, p)f(x, q) = f(x, p + q)$$ when all the three binomial series involved are convergent. By ratio test it is easily proved that binomial series is convergent whenever $|x| < 1$. Thus we have the following result:
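The identity $(4)$ for non-integer indices can be checked exactly using rational arithmetic. The following sketch is an illustration only; the helper names and the sample values $p = 1/2, q = -3/4$ are choices made here.

```python
from fractions import Fraction

def gen_binom(n, m):
    """Generalized binomial coefficient n(n-1)...(n-m+1)/m! for rational n, integer m >= 0."""
    result = Fraction(1)
    for j in range(m):
        result = result * (n - j) / (j + 1)
    return result

def cauchy_coefficient(p, q, m):
    """c_m from equation (3): sum of binom(p, i) * binom(q, m - i) for i = 0..m."""
    return sum(gen_binom(p, i) * gen_binom(q, m - i) for i in range(m + 1))

p, q = Fraction(1, 2), Fraction(-3, 4)
checks = [cauchy_coefficient(p, q, m) == gen_binom(p + q, m) for m in range(8)]
print(checks)
```

Because `Fraction` arithmetic is exact, each comparison tests the identity itself for the given $p, q, m$ rather than an approximation to it.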

The binomial series $$f(x, n) = 1 + \binom{n}{1}x + \binom{n}{2}x^{2} + \cdots = \sum_{m = 0}^{\infty}\binom{n}{m}x^{m}$$ is convergent for all values of $x$ with $|x| < 1$ and if $|x| < 1$ then it satisfies the following functional equation $$f(x, p)f(x, q) = f(x, p + q)\tag{5}$$ for all real values of $p, q$.

The above functional equation reminds us of the property satisfied by exponential function namely $F(x + y) = F(x)F(y)$ as far as the parameter $p$ of $f(x, p)$ is concerned. By simple algebraic arguments we can thus establish that $$f(x, n) = \{f(x, 1)\}^{n} = (1 + x)^{n}$$ for all rational values of $n$. This completes the proof of general binomial theorem when index $n$ is rational. The extension to irrational values of $n$ is achieved by noting that $f(x, n)$ is a continuous function of $n$ for a fixed value of $x$ with $|x| < 1$.
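The "simple algebraic arguments" can be spelled out as follows (a sketch; here $r, s$ denote integers with $s > 0$, symbols introduced only for this remark, and $|x| < 1$ throughout). Putting $p = q = n/2$ in $(5)$ gives $$f(x, n) = \{f(x, n/2)\}^{2} \geq 0$$ and putting $p = n, q = -n$ gives $$f(x, n)f(x, -n) = f(x, 0) = 1$$ so that $f(x, n) \neq 0$ and hence $f(x, n) > 0$ for all real $n$. Repeated use of $(5)$ gives $f(x, r) = \{f(x, 1)\}^{r} = (1 + x)^{r}$ for positive integers $r$, and then $f(x, -r) = 1/f(x, r) = (1 + x)^{-r}$. Finally $$\{f(x, r/s)\}^{s} = f(x, r) = (1 + x)^{r}$$ and since $f(x, r/s) > 0$ we can take the positive $s^{\text{th}}$ root to get $f(x, r/s) = (1 + x)^{r/s}$.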

Behavior of Binomial Series at $x = \pm 1$

For the sake of completeness it is best to discuss the behavior of the binomial series $f(x, n)$ in equation $(2)$ for $x = \pm 1$. Let's first consider $x = 1$ and then we need to check the convergence (and find the sum if it is convergent) of the series $$f(1, n) = 1 + \binom{n}{1} + \binom{n}{2} + \cdots = \sum_{m = 0}^{\infty}\binom{n}{m}$$ The general term of this series is $$a_{m} = \binom{n}{m} = \frac{n(n - 1)(n - 2)\cdots (n - m + 1)}{m!}$$ and clearly if $n \leq -1$ then $$\left|\binom{n}{m}\right| \geq 1$$ and thus the general term of the series does not tend to $0$. Therefore the binomial series does not converge if $n \leq -1$. If $n > -1$ (and $n$ is not a non-negative integer, as otherwise the series terminates) then we can see that $$\frac{a_{m + 1}}{a_{m}} = \frac{n - m}{(m + 1)} = -\left(1 - \frac{n + 1}{m + 1}\right)$$ so that $a_{m}$ ultimately alternates in sign and since $n > -1$ the terms decrease in absolute value after a certain value of $m$. Next we note that for $m > n$ $$\log\left|\frac{a_{m + 1}}{a_{m}}\right| = \log\left(\frac{m - n}{m + 1}\right) = \log\left(1 - \frac{n + 1}{m + 1}\right) < - \frac{n + 1}{m + 1}$$ and hence $$\log\left|\frac{a_{m + p}}{a_{m}}\right| < - (n + 1)\sum_{i = 1}^{p}\frac{1}{m + i}$$ and the expression on right tends to $-\infty$ if $p \to \infty$. It follows that $a_{m + p} \to 0$ as $p \to \infty$. Thus $a_{m} \to 0$ as $m \to \infty$. It follows from Leibniz test for alternating series that the series under discussion is convergent if $n > -1$. To calculate its sum we need to apply Taylor's series on the function $g(x) = (1 + x)^{n}$ and we have $$2^{n} = g(1) = 1 + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{m - 1} + R_{m}$$ where the remainder $R_{m}$ in Lagrange's form is given by $$R_{m} = \frac{g^{(m)}(\theta)}{m!} = \binom{n}{m}(1 + \theta)^{n - m}$$ and hence for $m > n$ we have $$|R_{m}| < \left|\binom{n}{m}\right| = |a_{m}|$$ and since $a_{m} \to 0$ as $m \to \infty$ for $n > -1$, it follows that $R_{m} \to 0$ as $m \to \infty$ for $n > -1$. Thus we have the following result:

The binomial series $$f(x, n) = 1 + \binom{n}{1}x + \binom{n}{2}x^{2} + \cdots + \binom{n}{m}x^{m} + \cdots$$ for $x = 1$ is convergent only when $n > - 1$ and then its sum as expected is equal to $2^{n}$.

Next we consider the behavior of series $f(x, n)$ for $x = -1$. If we write $n = -p$ then we can see that the general term is given by $$(-1)^{m}\binom{n}{m} = (-1)^{m}\binom{-p}{m} = \frac{p(p + 1)\cdots (p + m - 1)}{m!}$$ and we can sum the series directly to get \begin{align} 1 + p &+ \frac{p(p + 1)}{2!} + \cdots + \frac{p(p + 1)\cdots (p + m - 1)}{m!}\notag\\ &= \frac{(p + 1)(p + 2)\cdots (p + m)}{m!}\notag\\ &= (-1)^{m}\binom{n - 1}{m}\notag \end{align} We have seen earlier that the final expression tends to $0$ as $m \to \infty$ if $n - 1 > -1$ i.e. $n > 0$. If $n = 0$ the final expression is equal to $1$ for all $m$ (the series then consists of the single term $1$). If $n < 0$, so that $p > 0$, the final expression equals $\prod_{j = 1}^{m}(1 + p/j) \geq 1 + p\sum_{j = 1}^{m}1/j$ and hence tends to $\infty$. Hence, apart from the trivial case $n = 0$, the binomial series for $x = -1$ is convergent if and only if $n > 0$ and then its sum is $0$.

To summarize, the binomial series $$f(x, n) = 1 + nx + \frac{n(n - 1)}{2!}x^{2} + \frac{n(n - 1)(n - 2)}{3!}x^{3} + \cdots$$ has the following behavior:

if $n=0$ or a positive integer then the series terminates at $(n+1)$'th term and for $n=0$ the sum is $1$ otherwise the sum is $(1+x)^{n}$ regardless of the value of $x$.

it is convergent for all real values of $x$ with $|x| < 1$ and all real values of $n$ and its sum is $(1 + x)^{n}$.

for $x = 1$ it is convergent for all real values of $n$ with $n > -1$ and its sum is $2^{n}$.

for $x = -1$ it is convergent for all real values of $n$ with $n > 0$ and its sum is $0$.

for any other combination of real values of $x, n$ the series does not converge.
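The boundary cases in the summary above can be explored numerically with a short sketch like the following. The function name and the sample index $n = 1/2$ are illustrative choices; convergence at $x = \pm 1$ is slow, so only rough agreement should be expected.

```python
def binomial_partial_sum(n, x, terms):
    """Sum of the first `terms` terms of the binomial series f(x, n)."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        # binom(n, m+1) x^(m+1) is obtained from binom(n, m) x^m
        term *= (n - m) / (m + 1) * x
    return total

print(binomial_partial_sum(0.5, 1.0, 20000), 2 ** 0.5)   # x = 1, n > -1
print(binomial_partial_sum(0.5, -1.0, 20000))            # x = -1, n > 0, tends to 0
print(binomial_partial_sum(-0.5, -1.0, 20000))           # x = -1, n < 0, grows without bound
print(binomial_partial_sum(3, 2.0, 10), 3.0 ** 3)        # positive integer n, series terminates
```

For $x = -1$ and $n = 1/2$ the partial sums decrease to $0$ roughly like $1/\sqrt{\pi m}$, which is why many terms are needed to see the limit.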


The General Binomial Theorem: Part 1

2016-07-20T15:53:00.000+05:30

Introduction

One of the most basic algebraic formulas which a student encounters in the high school curriculum is the following $$(a + b)^{2} = a^{2} + 2ab + b^{2}$$ and its variant for $(a - b)^{2}$. After many exercises and problems one encounters another formula of similar nature namely $$(a + b)^{3} = a^{3} + 3a^{2}b + 3ab^{2} + b^{3}$$ and one wonders if there are similar formulas for higher powers of $(a + b)$.

This optimism is amply rewarded and further down the mathematical curriculum one encounters a general formula for positive integral powers of $(a + b)$ and the result is important enough to deserve a name: The Binomial Theorem. The Binomial Theorem states that if $n$ is a positive integer then $$(a + b)^{n} = a^{n} + \binom{n}{1}a^{n - 1}b + \cdots + \binom{n}{r}a^{n - r}b^{r} + \cdots + \binom{n}{n - 1}ab^{n - 1} + b^{n}\tag{1}$$ and the right hand side above is sometimes written compactly as $$\sum_{r = 0}^{n}\binom{n}{r}a^{n - r}b^{r}$$ The symbol $\displaystyle \binom{n}{r}$ is called a binomial coefficient and is defined for all numbers $n$ and non-negative integers $r$ by $$\binom{n}{0} = 1,\, \binom{n}{r} = \frac{n(n - 1)(n - 2)\cdots(n - r + 1)}{r!}\text{ for }r \geq 1\tag{2}$$ The binomial theorem has an easy proof based on induction and it is purely an algebraic result.

Our topic of discussion today is an extension of the above result for the case when $n$ is not necessarily a positive integer. In that case the result is known by the name The General Binomial Theorem or Binomial Theorem for General Index and it transcends the powers of algebra and belongs more properly to the field of mathematical analysis. We turn to this powerful result next.

The General Binomial Theorem

The general binomial theorem does not try to deal with the expression $(a + b)^{n}$ for all values of $n, a, b$; rather it is a conditional result and is presented in the following form:

If $x, n$ are real numbers with $|x| < 1$ then $$(1 + x)^{n} = 1 + \binom{n}{1}x + \binom{n}{2}x^{2} + \cdots = \sum_{k = 0}^{\infty}\binom{n}{k}x^{k}\tag{3}$$ Thus the formula for $(1 + x)^{n}$ is no longer a finite expression but rather an infinite series and we have a condition that $|x| < 1$. The series involved is called the binomial series and it is absolutely convergent when $|x| < 1$ and divergent when $|x| > 1$. We will talk later about its behavior when $|x| = 1$. The proof of the above formula is difficult and it belongs to the infamous category of theorems whose proofs lie beyond the scope of the book/syllabus.

In order to prove the general binomial theorem we need two results from differential calculus:

$\displaystyle \lim_{x \to a}\frac{x^{n} - a^{n}}{x - a} = na^{n - 1}$ for real $n, a$ and $a > 0$. This result is already established as one of the standard limits in an earlier post.
Taylor's Theorem with Cauchy's Form of Remainder: This we discuss next along with its proof.

Taylor's Theorem with Cauchy's Form of Remainder

We have already encountered a version of Taylor's theorem with Peano's form of remainder in an earlier post (see another proof available on MSE). Here we need a stronger version of Taylor's theorem and undoubtedly it needs stronger hypotheses to remain valid. We state the theorem below:

Taylor's Theorem: Let $n$ be a positive integer. If $f(x)$ is a function such that $f^{(n - 1)}(x)$ is continuous in $[a, a + h]$ and $f^{(n)}(x)$ exists in $(a, a + h)$ then $$f(a + h) = f(a) + hf'(a) + \frac{h^{2}}{2!}f''(a) + \cdots + \frac{h^{n - 1}}{(n - 1)!}f^{(n - 1)}(a) + R_{n}\tag{4}$$ where $$R_{n} = \frac{(1 - \theta)^{n - p}h^{n}f^{(n)}(a + \theta h)}{p(n - 1)!}\tag{5}$$ for some number $\theta \in (0, 1)$ and any chosen integer $p$ with $1 \leq p \leq n$.

The proof is based on Rolle's Theorem. Let us put $b = a + h$ and define a function $F$ by $$F(x) = f(b) - f(x) - (b - x)f'(x) - \cdots - \frac{(b - x)^{n - 1}}{(n - 1)!}f^{(n - 1)}(x)$$ and another function $G$ by $$G(x) = F(x) - \left(\frac{b - x}{b - a}\right)^{p}F(a)$$ where $p$ is some integer between $1$ and $n$. Clearly $G(x)$ is continuous on $[a, b]$ and differentiable on $(a, b)$ and we have $G(a) = G(b) = 0$. Therefore by Rolle's Theorem there is a value $c \in (a, b)$ such that $G'(c) = 0$. Moreover since $c \in (a, b)$ we can write $c = a + \theta(b - a)$ for some $\theta \in (0, 1)$ and thus $c = a + \theta h$. The derivative $G'(x)$ is given by $$G'(x) = \frac{p(b - x)^{p - 1}}{(b - a)^{p}}F(a) - \frac{(b - x)^{n - 1}}{(n - 1)!}f^{(n)}(x)$$ and $G'(c) = 0$ implies that $$\frac{p}{(b - a)^{p}}F(a) = \frac{(b - c)^{n - p}}{(n - 1)!}f^{(n)}(c)$$ or $$F(a) = \frac{(1 - \theta)^{n - p}h^{n}f^{(n)}(a + \theta h)}{p(n - 1)!} = R_{n}$$ Noting the value of $F(a)$ we see that the proof of the theorem is complete.

The term $R_{n}$ is said to be the remainder after $n$ terms in the Taylor's series expansion for $f(a + h)$. When $p = n$ we obtain the Lagrange's form of Remainder namely $$R_{n} = \frac{h^{n}}{n!}f^{(n)}(a + \theta h)\tag{Lagrange}$$ and if $p = 1$ then we get Cauchy's form of Remainder $$R_{n} = \frac{(1 - \theta)^{n - 1}h^{n}}{(n - 1)!}f^{(n)}(a + \theta h)\tag{Cauchy}$$ Note that the statement of Taylor's theorem and the proof above assumes that $h > 0$ but it is easily seen that it holds even if $h < 0$. A particular instance of this theorem is used to find infinite series expansion of certain functions. If we put $a = 0$ and $h = x$ in the Taylor's theorem we obtain the following result which goes by the name of Maclaurin's series: $$f(x) = f(0) + xf'(0) + \frac{x^{2}}{2!}f''(0) + \cdots + \frac{x^{n - 1}}{(n - 1)!}f^{(n - 1)}(0) + R_{n}\tag{6}$$ where $$R_{n} = \frac{(1 - \theta)^{n - p}x^{n}f^{(n)}(\theta x)}{p(n - 1)!}$$ for some $\theta \in (0, 1)$ and $p$ an integer between $1$ and $n$.
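The role of $\theta$ in the remainder formula $(5)$ can be made concrete with a small numerical experiment. The sketch below assumes $f(x) = e^{x}$, $a = 0$, $h = 1$ and $n = 3$ (choices made here for illustration, as every derivative of $e^{x}$ is $e^{x}$) and searches by bisection for a $\theta \in (0, 1)$ matching the actual remainder for each $p$; the function names are hypothetical.

```python
import math

def remainder_form(theta, n, p, h=1.0):
    """Right hand side of (5) for f = exp and a = 0."""
    return ((1 - theta) ** (n - p) * h**n * math.exp(theta * h)
            / (p * math.factorial(n - 1)))

def actual_remainder(n, h=1.0):
    """R_n = f(a + h) minus the first n Taylor terms, for f = exp and a = 0."""
    return math.exp(h) - sum(h**k / math.factorial(k) for k in range(n))

def find_theta(n, p, h=1.0, steps=200):
    """Bisection on [0, 1]; assumes remainder_form - R_n changes sign on the interval."""
    target = actual_remainder(n, h)
    lo, hi = 0.0, 1.0
    sign_lo = remainder_form(lo, n, p, h) - target
    for _ in range(steps):
        mid = (lo + hi) / 2
        value = remainder_form(mid, n, p, h) - target
        if (value > 0) == (sign_lo > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for p in (1, 2, 3):   # p = 1 is Cauchy's form, p = 3 = n is Lagrange's form
    print(p, find_theta(3, p))
```

The three values of $\theta$ are different, which illustrates that $\theta$ depends on the chosen form of the remainder as well as on $h$ and $n$.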

As can be seen the value of $R_{n}$ depends on $\theta$ as well as $x$ and $p$. Also note that the value of $\theta$ itself is based on $x$. Normally for $p$ we use one of the two choices mentioned above (i.e. use Lagrange's or Cauchy's form of remainder). Let's then write $R_{n}(x)$ for $R_{n}$ assuming that a reasonable choice of $p$ has been made. If for certain values of $x$ we can ensure that $R_{n}(x) \to 0$ as $n \to \infty$ then from equation $(6)$ we obtain an infinite series for $f(x)$ as $$f(x) = f(0) + xf'(0) + \frac{x^{2}}{2!}f''(0) + \cdots + \frac{x^{n}}{n!}f^{(n)}(0) + \cdots\tag{7}$$ and this is valid for all those values of $x$ for which $R_{n}(x) \to 0$ as $n \to \infty$. In practice we usually show that for values of $x$ in a certain range and any $\theta \in (0, 1)$ the expression $R_{n}(x) \to 0$ as $n \to \infty$ and thus obtain the above Maclaurin's series for $f(x)$.

The easiest examples of such series are obtained for $f(x) = \sin x, \cos x, e^{x}$ as follows: \begin{align} \sin x &= x - \frac{x^{3}}{3!} + \frac{x^{5}}{5!} - \cdots + (-1)^{n}\frac{x^{2n + 1}}{(2n + 1)!} + \cdots\tag{8a}\\ \cos x &= 1 - \frac{x^{2}}{2!} + \frac{x^{4}}{4!} - \cdots + (-1)^{n}\frac{x^{2n}}{(2n)!} + \cdots\tag{8b}\\ e^{x} &= 1 + x + \frac{x^{2}}{2!} + \cdots + \frac{x^{n}}{n!} + \cdots\tag{8c} \end{align} and in each case it is easily proven that $R_{n}(x) \to 0$ as $n \to \infty$ for all real values of $x$. Therefore the above series expansions are valid for all real values of $x$.

Proof of The General Binomial Theorem

Now it is time to apply Taylor's theorem on function $f(x) = (1 + x)^{n}$ where $n, x$ are real numbers and $x > -1$ so that $1 + x > 0$. Clearly the limit formula $$\lim_{x \to a}\frac{x^{n} - a^{n}}{x - a} = na^{n - 1}\tag{9}$$ mentioned above implies that the derivative of $g(x) = x^{n}$ is $g'(x) = nx^{n - 1}$ for real $n$ and $x > 0$. Therefore $f'(x) = n(1 + x)^{n - 1}$ and it is easily seen that $f^{(m)}(x)$ exists for all positive integers $m$ and $x > -1$. Moreover we have $$f^{(m)}(0) = n(n - 1)(n - 2)\cdots(n - m + 1) = m!\binom{n}{m}$$ and by Taylor's theorem we have $$f(x) = f(0) + xf'(0) + \cdots + \frac{x^{m - 1}}{(m - 1)!}f^{(m - 1)}(0) + R_{m}(x)$$ or $$(1 + x)^{n} = 1 + \binom{n}{1}x + \cdots + \binom{n}{m - 1}x^{m - 1} + R_{m}(x)\tag{10}$$ where we use the Cauchy's form of remainder \begin{align} R_{m}(x) &= \frac{(1 - \theta)^{m - 1}x^{m}f^{(m)}(\theta x)}{(m - 1)!}\notag\\ &= \frac{n(n - 1)\cdots(n - m + 1)}{(m - 1)!}\cdot\frac{(1 - \theta)^{m - 1}x^{m}}{(1 + \theta x)^{m - n}}\notag \end{align} If $|x| < 1$ then it is easily checked that the expression $\dfrac{1 - \theta}{1 + \theta x}$ lies in $(0, 1)$ for all $\theta \in (0, 1)$. Further $(1 + \theta x)^{n - 1}$ is less than $(1 + |x|)^{n - 1}$ if $n > 1$ and it is less than $(1 - |x|)^{n - 1}$ if $n < 1$. Hence $$|R_{m}(x)| < |n|(1 \pm |x|)^{n - 1}\left|\binom{n - 1}{m - 1}\right||x|^{m} = r_{m}\tag{11}$$ Clearly we can see that $$\lim\limits_{m \to \infty}\dfrac{r_{m + 1}}{r_{m}} = |x| < 1$$ therefore $r_{m} \to 0$ as $m \to \infty$. It follows from equation $(11)$ that $R_{m}(x) \to 0$ as $m \to \infty$ for all real values of $n$ and $x$ with $|x| < 1$. Therefore taking limits as $m \to \infty$ in equation $(10)$ we get the general binomial theorem $$(1 + x)^{n} = 1 + \binom{n}{1}x + \binom{n}{2}x^{2} + \cdots = \sum_{m = 0}^{\infty}\binom{n}{m}x^{m}$$ for all real $n, x$ with $|x| < 1$.
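The estimate $(11)$ can be compared with the actual remainder numerically. The following sketch is illustrative only: the function names and the sample values of $n, x$ are assumptions made here, and the check is restricted to small $m$ so that floating point rounding does not interfere.

```python
def gen_binom(n, m):
    """Generalized binomial coefficient n(n-1)...(n-m+1)/m! for real n, integer m >= 0."""
    result = 1.0
    for j in range(m):
        result *= (n - j) / (j + 1)
    return result

def remainder(n, x, m):
    """Actual R_m(x): (1 + x)^n minus the first m terms of the binomial series."""
    partial = sum(gen_binom(n, j) * x**j for j in range(m))
    return (1 + x) ** n - partial

def cauchy_bound(n, x, m):
    """The bound r_m of equation (11); the sign in (1 +/- |x|) depends on whether n > 1."""
    base = 1 + abs(x) if n > 1 else 1 - abs(x)
    return abs(n) * base ** (n - 1) * abs(gen_binom(n - 1, m - 1)) * abs(x) ** m

for n, x in ((0.5, 0.5), (2.5, -0.6)):
    for m in (1, 5, 10, 15):
        print(n, x, m, abs(remainder(n, x, m)), cauchy_bound(n, x, m))
```

In each printed row the third number from the right is $m$, followed by $|R_{m}(x)|$ and the bound $r_{m}$; both shrink roughly like $|x|^{m}$, in line with the ratio computation above.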

Note that our proof of the binomial theorem is based on the derivative formula $(x^{n})' = nx^{n - 1}$ which in turn is based on the limit formula $(9)$ and therefore one should not use binomial theorem in proving the derivative formula for $x^{n}$. However many calculus textbooks perform a kind of intellectual fraud by presenting the following proof of derivative formula for $x^{n}$: \begin{align} (x^{n})' &= \lim_{h \to 0}\frac{(x + h)^{n} - x^{n}}{h}\notag\\ &= \lim_{h \to 0}\frac{x^{n}(1 + (h/x))^{n} - x^{n}}{h}\notag\\ &= \lim_{h \to 0}\dfrac{x^{n}\left(1 + \dfrac{nh}{x} + \dfrac{n(n - 1)h^{2}}{2x^{2}} + \cdots\right) - x^{n}}{h}\notag\\ &= \lim_{h \to 0}nx^{n - 1} + \frac{n(n - 1)}{2}hx^{n - 2} + \cdots\notag\\ &= nx^{n - 1}\notag \end{align} The use of general binomial theorem is justified because as $h \to 0$ we can ensure that $|h/x| < 1$. The real problem lies in the circular nature of the proof above because the binomial theorem itself is proved using this derivative formula. Note that if $n$ is a positive integer then it is possible to use the binomial theorem to express $(x + h)^{n}$ as a finite sum and then obtain the same result.

We can salvage the above calculation of derivative of $x^{n}$ if we can somehow establish the binomial theorem for general index without the use of derivatives. Surprisingly it is possible to prove the general binomial theorem without using theorems of differential calculus and we will have a look at it in the next post.


Theories of Circular Functions: Part 3

2016-03-08T17:06:00.001+05:30

Continuing our journey from last two posts we present some more approaches to the development of the theory of circular functions. One approach is based on the use of infinite series and requires basic knowledge of theory of infinite series. This approach is particularly well suited for treating circular functions as functions of a complex variable, but we will limit ourselves to the case of real variables only.

Before we start I should also mention in passing that it is possible to develop the theory of circular functions on the basis of infinite products also but dealing with infinite series is simpler and more popular.

Circular Functions as Infinite Series

We start with the definitions \begin{align} \sin x &= x - \frac{x^{3}}{3!} + \frac{x^{5}}{5!} - \frac{x^{7}}{7!} + \cdots\tag{1}\\ \cos x &= 1 - \frac{x^{2}}{2!} + \frac{x^{4}}{4!} - \frac{x^{6}}{6!} + \cdots\tag{2} \end{align} Since both the series above are convergent for all values of $x$ (check via ratio test), it follows that the functions $\sin x, \cos x$ are defined for all values of $x$. Moreover, since the above series belong to the category of power series (series of the form $\sum a_{n}x^{n}$), it follows that the functions defined above are continuous and differentiable in the interior of region of convergence of the series involved. Thus both $\sin x, \cos x$ are continuous as well as differentiable for all $x$. Since the power series can also be differentiated term by term it follows from the above definitions that $$(\sin x)' = \cos x,\, (\cos x)' = -\sin x\tag{3}$$ Let's now consider the function $$f(x) = \cos^{2}x + \sin^{2}x$$ We note that $$f'(x) = -2\cos x\sin x + 2\sin x\cos x = 0$$ so that $f(x)$ is constant and therefore $f(x) = f(0) = 1$ and we get $$\cos^{2}x + \sin^{2}x = 1\tag{4}$$ for all values of $x$. From the above identity it also follows that both $\sin x, \cos x$ are bounded and $$|\sin x| \leq 1,\,|\cos x| \leq 1\tag{5}$$ for all $x$ (a fact not obvious from the definitions $(1), (2)$).

Like we saw in the previous post it is easy to establish the addition formulas \begin{align} \sin(x \pm y) &= \sin x\cos y \pm \cos x\sin y\tag{6a}\\ \cos(x \pm y) &= \cos x\cos y \mp \sin x\sin y\tag{6b} \end{align} once the derivatives for $\sin, \cos$ are available.

In this approach based on the infinite series the real challenge is to introduce the number $\pi$. To that end let us estimate the value of $\cos 2$. For $0 < x \leq 2$ the series for $\cos x$ is an alternating series whose terms decrease in absolute value from the term $x^{2}/2!$ onwards, and therefore $\cos x$ is less than the $n^{\text{th}}$ partial sum of the series if $n$ is odd and $\cos x$ is greater than the $n^{\text{th}}$ partial sum of the series if $n$ is even. Thus we have $$\cos 2 < 1 - \frac{2^{2}}{2!} + \frac{2^{4}}{4!} = 1 - 2 + \frac{2}{3} = -\frac{1}{3} < 0$$ and clearly $\cos 0 = 1 > 0$ therefore by intermediate value theorem it follows that there is a value $\xi \in (0, 2)$ for which $\cos \xi = 0$. Further there is a first value of $\xi \in [0, 2]$ for which $\cos \xi = 0$ and $\cos x > 0$ for all $x \in [0, \xi)$. This value of $\xi$ is important in the development of the theory of circular functions and we define $\pi = 2\xi$ where $\xi$ is the first value of $x \in [0, 2]$ for which $\cos x = 0$.
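
The definition $\pi = 2\xi$ can be tried out numerically. The following Python sketch locates the sign change of the series for $\cos x$ in $[0, 2]$ by bisection. The names `cos_series`, `xi`, `pi_estimate` are illustrative assumptions of mine, and the sketch takes for granted that the zero found by bisection is the first one (which is the case if $\cos x$ is strictly decreasing on $(0, 2)$, a fact not proved in the paragraph above).

```python
from math import factorial

def cos_series(x, terms=30):
    """Partial sum of 1 - x^2/2! + x^4/4! - ... using `terms` terms."""
    return sum((-1) ** k * x ** (2 * k) / factorial(2 * k)
               for k in range(terms))

# cos_series(0) = 1 > 0 and cos_series(2) < 0, so a zero lies in (0, 2)
lo, hi = 0.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    if cos_series(mid) > 0:
        lo = mid      # the zero lies to the right of mid
    else:
        hi = mid      # the zero lies at mid or to its left

xi = (lo + hi) / 2
pi_estimate = 2 * xi
print(pi_estimate)
```

Bisection is chosen here only because it mirrors the intermediate value theorem argument used in the text; any other root finding method would serve equally well.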

Thus we have $\cos (\pi/2) = 0$ and $\cos x > 0$ for all $x \in [0, \pi/2)$. Since $(\sin x)' = \cos x$ it follows that $\sin x$ is strictly increasing in $[0, \pi/2]$ and hence $\sin x > \sin 0 = 0$ for all $x \in (0, \pi/2]$. It now follows from equation $(4)$ that $\sin (\pi/2) = 1$. Using addition formulas we can now prove that $\sin \pi = 0, \cos \pi = -1$ and further establish the formulas $$\cos \left(\frac{\pi}{2} - x\right) = \sin x, \sin \left(\frac{\pi}{2} - x\right) = \cos x\tag{7a}$$ $$\cos \left(\frac{\pi}{2} + x\right) = -\sin x, \sin \left(\frac{\pi}{2} + x\right) = \cos x\tag{7b}$$ $$\cos (x + \pi) = -\cos x, \sin (x + \pi) = -\sin x\tag{7c}$$ From these formulas it is easy to show that $\sin x, \cos x$ are periodic functions with period $2\pi$. Thus we complete the development of theory of circular functions based on the infinite series. Another not so common approach to the theory of circular functions defines them as the solutions to the differential equation $y'' + y = 0$ which we discuss next.

Circular Functions as Solutions to the Equation $y'' + y = 0$

A slightly unusual way to look at circular functions $\cos x, \sin x$ is to view them as the solutions to the differential equation $$\frac{d^{2}y}{dx^{2}} + y = 0$$ Putting $y = f(x)$ we can rewrite the equation as $$f''(x) + f(x) = 0\tag{8}$$ Also let us fix the initial conditions as $f(0) = 0, f'(0) = 1$ so that we are first dealing with the solution $f(x) = \sin x$. From equation $(8)$ it is obvious that the solution we are expecting is infinitely differentiable for all $x$ and using initial conditions we can get the values of $f^{(n)}(0)$ for all $n$ and using Taylor series for $f(x)$ we reach the infinite series for $\sin x$ presented earlier. However it is of interest to find the solution of differential equation $(8)$ without the use of the infinite series.

First we show that the initial conditions uniquely determine the solution. Thus we show that if $f(x)$ satisfies the differential equation $(8)$ and the initial conditions $f(0) = f'(0) = 0$ then $f(x) = 0$ for all values of $x$. The thing to note here is that the function $$g(x) = \{f'(x)\}^{2} + \{f(x)\}^{2}$$ has the derivative $$g'(x) = 2f'(x)f''(x) + 2f(x)f'(x) = 2f'(x)\{f''(x) + f(x)\} = 0$$ and hence $g(x) = g(0) = 0$ for all $x$. Thus $f'(x) = f(x) = 0$ for all $x$. So we are guaranteed that if a solution to the differential equation $(8)$ exists then it is unique and fully dependent on the initial values of $f(0)$ and $f'(0)$.

Next we solve the equation $(8)$ with the initial conditions $f(0) = 0, f'(0) = 1$. As before if $$g(x) = \{f'(x)\}^{2} + \{f(x)\}^{2}$$ then $g(x) = g(0) = 1$ for all $x$ and hence $$f'(x) = \sqrt{1 - \{f(x)\}^{2}}$$ (we choose $+$ sign for square root so that the relation holds true for values of $f(0)$ and $f'(0)$) and if we invert the equation $y = f(x)$ by $x = f^{-1}(y)$ then we get $$\frac{dy}{dx} = f'(x) = \sqrt{1 - y^{2}}$$ or $$\frac{dx}{dy} = \frac{1}{\sqrt{1 - y^{2}}}$$ and we are led to the function $$x = f^{-1}(y) = \int_{0}^{y}\frac{dt}{\sqrt{1 - t^{2}}}\tag{9}$$ and thus $y = f(x)$ is the inverse of this integral. The existence of this integral justifies the existence of the solution of the differential equation $(8)$. Note however that the integral in equation $(9)$ defines $x$ as a strictly monotone function of $y$ only in the interval $(-1, 1)$ and using improper integrals its definition is extended to the closed interval $[-1, 1]$.

We now introduce $\pi$ as $$\pi = 2\int_{0}^{1}\frac{dx}{\sqrt{1 - x^{2}}}\tag{10}$$ so that the function $x = f^{-1}(y)$ maps $[-1, 1]$ to $[-\pi/2, \pi/2]$. Thus the function $y = f(x)$ is strictly increasing and maps $[-\pi/2, \pi/2]$ to $[-1, 1]$. This unique solution to $$f''(x) + f(x) = 0,\,f(0) = 0,\,f'(0) = 1$$ is denoted by $\sin x$ and its derivative $f'(x)$ (which also satisfies $(8)$) is denoted by $\cos x$. So far these functions have been defined only in the interval $[-\pi/2, \pi/2]$. From the differential equation we obtain the relations $$(\sin x)' = \cos x,\, (\cos x)' = -\sin x$$ and using these derivatives it is easy to prove the addition formulas for $\sin, \cos$ provided the arguments for $\sin, \cos$ lie in the interval $[-\pi/2, \pi/2]$. It now makes sense to use the addition formulas and extend the domain of definition of these functions via the following relations $$\sin \pi = 0, \cos \pi = -1, \sin (x + \pi) = -\sin x, \cos (x + \pi) = -\cos x\tag{11}$$ The functions $\sin x, \cos x$ are now defined for all values of $x$ and satisfy the differential equation $(8)$ everywhere. From the equation $(11)$ it also follows that these functions are periodic with period $2\pi$.

Once the functions $\sin x, \cos x$ are available we can find general solution to the differential equation $(8)$. Let $$h(x) = f(x) - f(0)\cos x - f'(0)\sin x$$ and then we have $h(0) = h'(0) = 0$ and $h''(x) + h(x) = 0$ so that $h(x)$ also satisfies the differential equation $(8)$ and as we proved before $h(x) = h(0) = 0$ for all $x$ and hence the solution to the equation $(8)$ is given by $$f(x) = f(0)\cos x + f'(0)\sin x$$ and we clearly see that the initial values $f(0), f'(0)$ determine the solution completely.
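
The formula $f(x) = f(0)\cos x + f'(0)\sin x$ can be checked numerically. The sketch below integrates $y'' + y = 0$ with the classical fourth order Runge-Kutta method and compares the result with the closed form, using the library functions `math.cos` and `math.sin` as reference values. The function name `rk4_solution`, the step count and the sample initial values are assumptions made purely for illustration.

```python
import math

def rk4_solution(f0, df0, x_end, steps=2000):
    """Integrate y'' + y = 0, y(0) = f0, y'(0) = df0 up to x_end.

    The equation is rewritten as the system y' = v, v' = -y and
    advanced with the classical fourth order Runge-Kutta method.
    """
    h = x_end / steps
    y, v = f0, df0
    for _ in range(steps):
        k1y, k1v = v, -y
        k2y, k2v = v + h / 2 * k1v, -(y + h / 2 * k1y)
        k3y, k3v = v + h / 2 * k2v, -(y + h / 2 * k2y)
        k4y, k4v = v + h * k3v, -(y + h * k3y)
        y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return y

f0, df0, x = 2.0, -3.0, 1.7
numeric = rk4_solution(f0, df0, x)
closed_form = f0 * math.cos(x) + df0 * math.sin(x)
print(numeric, closed_form)
```

The two printed values should agree to many decimal places, which illustrates (but of course does not prove) that the initial values determine the solution completely.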

The above result regarding solution of $y'' + y = 0$ can also be used to establish addition formulas for $\sin x, \cos x$. Consider the function $f(x) = \sin (x + a)$ which clearly satisfies the differential equation $f''(x) + f(x) = 0$ and we have $f'(x) = \cos(x + a)$. Thus we have $$f(0) = \sin a, f'(0) = \cos a$$ By the argument in the previous paragraph we must have \begin{align} \sin (x + a) &= f(x)\notag\\ &= f(0)\cos x + f'(0)\sin x\notag\\ &= \sin a\cos x + \cos a\sin x\notag \end{align} which is the desired addition formula. The formula for $\cos(x + a)$ can also be derived in similar manner.

Theories of Circular Functions: Part 2

2016-03-07T16:35:00.000+05:30

In the last post we covered the traditional approach towards the theory of circular functions which is based on geometric notions related to a circle. In my opinion this approach is the easiest to understand and is therefore commonly described in almost any trigonometry textbook (but without the theoretical justification of the length of arcs and the area of sectors). However it is interesting to also have an approach which is independent of any geometrical notions. In this post we will introduce the circular functions as inverses to certain integrals.

When using integrals to define inverse circular functions there are two possibilities: either start with $\arcsin x$ as the integral of $(1 - x^{2})^{-1/2}$ or $\arctan x$ as the integral of $(1 + x^{2})^{-1}$. We prefer the second approach as it is technically easier to handle.

$\arctan x$ as an Integral

Let us define the inverse tangent function denoted by symbol $\arctan$ (or $\tan^{-1}$) as follows $$\arctan x = \int_{0}^{x}\frac{dt}{1 + t^{2}}\tag{1}$$ From the above definition it is clear that $\arctan x$ is defined for all real $x$ and is a strictly increasing function of $x$. Since $$(\arctan x)' = \frac{1}{1 + x^{2}}$$ and $\arctan 0 = 0$ it follows that $$\lim_{x \to 0}\frac{\arctan x}{x} = 1\tag{2}$$ We introduce the number $\pi$ as $$\pi = 4\arctan 1 = 4\int_{0}^{1}\frac{dt}{1 + t^{2}}\tag{3}$$ We next show that $$\lim_{x \to \infty}\arctan x = \frac{\pi}{2}\tag{4}$$ Clearly we can see that \begin{align} \arctan x &= \int_{0}^{x}\frac{dt}{1 + t^{2}}\notag\\ &= \int_{0}^{1}\frac{dt}{1 + t^{2}} + \int_{1}^{x}\frac{dt}{1 + t^{2}}\notag\\ &= \frac{\pi}{4} + \int_{1/x}^{1}\frac{du}{1 + u^{2}}\text{ (by putting }u = 1/t)\notag\\ &= \frac{\pi}{4} + \int_{0}^{1}\frac{du}{1 + u^{2}} - \int_{0}^{1/x}\frac{du}{1 + u^{2}}\notag\\ &= \frac{\pi}{2} - \arctan(1/x)\tag{5} \end{align} Noting that $\arctan x$ is continuous and letting $x \to \infty$ we get $\lim_{x \to \infty}\arctan x = \dfrac{\pi}{2}$ and hence equation $(4)$ is established. Since $\arctan x$ is an odd function (because its derivative is even function and $\arctan 0 = 0$) it follows that $\lim_{x \to -\infty}\arctan x = -\dfrac{\pi}{2}$. It is now clear that $\arctan x$ maps the whole of $\mathbb{R}$ to the interval $(-\pi/2, \pi/2)$ and since it is strictly monotone, there is an inverse function denoted by symbol $\tan$ and we thus define the function $\tan x$ for $x \in (-\pi/2, \pi/2)$ by the equation $$\tan x = y \text{ if }\arctan y = x\tag{6}$$ It should be obvious now that the function $\tan x$ is continuous and differentiable for all $x \in (-\pi/2, \pi/2)$. Moreover $\tan x \to \infty$ as $x \to \pi/2$ and $\tan x \to -\infty$ as $x \to -\pi/2$. 
From equation $(2)$ we get $$\lim_{x \to 0}\frac{\tan x}{x} = 1\tag{7}$$ Also note that the equation $(5)$ can be recast in the form $$\tan\left(\dfrac{\pi}{2} - x\right) = \dfrac{1}{\tan x}\tag{8}$$ where $0 < x < \pi/2$.
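
The definitions $(1)$ and $(3)$ and the relation $(5)$ are easy to test numerically. The following Python sketch approximates the integral by the composite Simpson rule. The function name `arctan_integral` and the number of subintervals are my own assumptions for the purpose of illustration.

```python
def arctan_integral(x, n=2000):
    """Composite Simpson approximation of the integral of 1/(1 + t^2)
    from 0 to x, using n subintervals (n must be even)."""
    h = x / n
    total = 1.0 + 1.0 / (1.0 + x * x)   # integrand at the two endpoints
    for i in range(1, n):
        t = i * h
        total += (4 if i % 2 else 2) / (1.0 + t * t)
    return total * h / 3

# equation (3): pi = 4 arctan 1
pi_estimate = 4 * arctan_integral(1.0)
print(pi_estimate)

# equation (5): arctan x + arctan(1/x) = pi/2 for x > 0
x = 5.0
print(arctan_integral(x) + arctan_integral(1 / x), pi_estimate / 2)
```

The integrand $1/(1 + t^{2})$ is smooth on the whole real line, which is one practical reason why this integral is easier to handle than the one for $\arcsin x$.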

The challenge now is to define the functions $\sin x, \cos x$ for all values of $x$. One can first try to define $\tan x$ for all $x$ by making it periodic with period $\pi$. By making $\tan x$ periodic with period $\pi$ we ensure that $\tan x$ is defined for all $x$ except when $x$ is an odd multiple of $\pi/2$. Next we define $\sin x, \cos x$ for $x \in (-\pi/2, \pi/2)$ by the formulas $$\sin x = \frac{\tan x}{\sqrt{1 + \tan^{2}x}},\,\cos x = \frac{1}{\sqrt{1 + \tan^{2}x}}\tag{9}$$ We immediately obtain the fundamental identity $$\cos^{2}x + \sin^{2}x = 1\tag{10}$$ for all $x \in (-\pi/2, \pi/2)$. From the above definition it is obvious that $\sin x \to 1$ as $x \to (\pi/2)^{-}$ and $\cos x \to 0$ as $x \to (\pi/2)^{-}$. Similarly $\sin x \to -1$ as $x \to (-\pi/2)^{+}$ and $\cos x \to 0$ as $x \to (-\pi/2)^{+}$. We thus define $$\sin\left(\frac{\pi}{2}\right) = 1,\,\sin\left(-\frac{\pi}{2}\right) = -1,\,\cos\left(\frac{\pi}{2}\right) = 0,\,\cos\left(-\frac{\pi}{2}\right) = 0\tag{11}$$ in order to make $\sin x, \cos x$ continuous in interval $[-\pi/2, \pi/2]$. To extend the definition of $\sin x, \cos x$ for all $x$ we further define $$\sin (x + \pi) = -\sin x,\,\cos(x + \pi) = -\cos x\tag{12}$$ It now follows that $\sin x, \cos x$ are defined for all $x$ and are periodic with period $2\pi$ (while $\tan x$ is periodic with period $\pi$). With the definitions set up as above we can see that the equation $(8)$ above leads us to $$\sin\left(\frac{\pi}{2} - x\right) = \cos x,\,\cos\left(\frac{\pi}{2} - x\right) = \sin x\tag{13}$$ for $0 < x < \pi/2$. Further we also note that $\sin x$ is an odd function and $\cos x$ is an even function. It is easy to show that the functions $\sin x, \cos x$ are continuous everywhere and therefore from equation $(7)$ we get $$\lim_{x \to 0}\frac{\sin x}{x} = 1\tag{14}$$
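
To see the definitions $(6)$ and $(9)$ in action one can invert the integral numerically. The Python sketch below solves $\arctan y = x$ for $y$ by Newton's method (using $(\arctan y)' = 1/(1 + y^{2})$) and then builds $\sin x, \cos x$ from the formulas $(9)$. The helper names and the fixed iteration count are illustrative assumptions, and the library values `math.sin`, `math.cos` are used only for comparison.

```python
import math

def arctan_integral(y, n=2000):
    """Composite Simpson approximation of the integral of 1/(1 + t^2)
    from 0 to y, using n subintervals (n must be even)."""
    h = y / n
    total = 1.0 + 1.0 / (1.0 + y * y)
    for i in range(1, n):
        t = i * h
        total += (4 if i % 2 else 2) / (1.0 + t * t)
    return total * h / 3

def tan_from_integral(x, iterations=40):
    """Solve arctan_integral(y) = x for y by Newton's method.
    Intended only for -pi/2 < x < pi/2."""
    y = x
    for _ in range(iterations):
        y -= (arctan_integral(y) - x) * (1.0 + y * y)
    return y

def sin_cos_from_tan(x):
    """Formulas (9): sin and cos expressed through tan."""
    t = tan_from_integral(x)
    r = math.sqrt(1.0 + t * t)
    return t / r, 1.0 / r

s, c = sin_cos_from_tan(1.0)
print(s, math.sin(1.0))
print(c, math.cos(1.0))
```

The sketch covers only the interval $(-\pi/2, \pi/2)$; the extension to all $x$ via $(11)$ and $(12)$ is not implemented here.
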
The circular functions defined above are differentiable wherever they are defined. The differentiability of $\tan x$ follows from that of $\arctan x$ and clearly if $\tan x = y$ then $x = \arctan y$ so that $$\frac{dx}{dy} = \frac{1}{1 + y^{2}} = \frac{1}{1 + \tan^{2}x} = \cos^{2}x$$ and therefore $$(\tan x)' = \frac{dy}{dx} = \frac{1}{\cos^{2}x}\tag{15}$$ for all $x \in (-\pi/2, \pi/2)$. Since $\tan (x + \pi) = \tan x$ and $\cos^{2}(x + \pi) = \cos^{2}x$ it follows that the formula $(15)$ holds for all $x$ where $\tan x$ is defined. It is now an easy matter to calculate the derivative of $\sin x$ and $\cos x$. Thus we have \begin{align} (\sin x)' &= \frac{d}{dx}\left(\frac{\tan x}{\sqrt{1 + \tan^{2}x}}\right)\notag\\ &= \dfrac{(1 + \tan^{2}x)\sqrt{1 + \tan^{2}x} - \tan x\cdot\dfrac{\tan x(1 + \tan^{2}x)}{\sqrt{1 + \tan^{2}x}}}{1 + \tan^{2}x}\notag\\ &= \frac{1}{\sqrt{1 + \tan^{2}x}}\notag\\ &= \cos x\notag \end{align} which is valid for all $x \in (-\pi/2, \pi/2)$. The same differentiation formula applies for $x = \pm\pi/2$ also. To check this let $f(x) = \sin x$ and we have \begin{align} f_{+}'(\pi/2) &= \lim_{h \to 0^{+}}\frac{\sin(\pi/2 + h) - \sin (\pi/2)}{h}\notag\\ &= \lim_{h \to 0^{+}}\frac{-\sin(\pi/2 + h - \pi) - 1}{h}\notag\\ &= \lim_{h \to 0^{+}}\frac{-\sin(-\pi/2 + h) - 1}{h}\notag\\ &= \lim_{h \to 0^{+}}\frac{\sin(\pi/2 - h) - 1}{h}\notag\\ &= \lim_{h \to 0^{+}}\frac{\cos h - 1}{h}\notag\\ &= -\lim_{h \to 0^{+}}\frac{1 - \cos^{2}h}{h(1 + \cos h)}\notag\\ &= -\frac{1}{2}\lim_{h \to 0^{+}}\frac{\sin^{2}h}{h^{2}}\cdot h\notag\\ &= -\frac{1}{2}\cdot 1^{2}\cdot 0\notag\\ &= 0 = \cos (\pi/2)\notag \end{align} and similarly we can show that $f_{-}'(\pi/2) = 0$. Thus we see that the formula $(\sin x)' = \cos x$ holds for all $x \in (-\pi/2, \pi/2]$ and due to equation $(12)$ it holds for all values of $x$. Similarly we can establish that $(\cos x)' = -\sin x$ for all values of $x$.

Addition Formulas for Circular Functions

The addition formulas for circular functions are a cakewalk once we have the derivatives available for circular functions. Let $x, y$ be real variables connected by the equation $x + y = k$ so that their sum is a constant. Then it follows that $\dfrac{dy}{dx} = -1$. We now consider the function $$f(x) = \sin x\cos y + \cos x\sin y$$ and observe that $$f'(x) = \cos x\cos y + \sin x\sin y - \sin x\sin y - \cos x\cos y = 0$$ and hence $f(x)$ is a constant independent of $x$. Thus $$f(x) = f(k) = \sin k\cos 0 + \cos k\sin 0 = \sin k = \sin (x + y)$$ It follows that we have $$\sin (x + y) = \sin x\cos y + \cos x\sin y$$ and similarly we can establish the formula $$\cos (x + y) = \cos x\cos y - \sin x\sin y$$ Changing the sign of $y$ in these formulas we finally obtain \begin{align} \sin(x \pm y) &= \sin x\cos y \pm \cos x\sin y\tag{16a}\\ \cos(x \pm y) &= \cos x\cos y \mp \sin x\sin y\tag{16b} \end{align} With these addition formulas available all the formulas of trigonometry can be established and the theory of circular functions as inverses to certain integrals is now complete.

Theories of Circular Functions: Part 1

2016-03-02T14:19:00.000+05:30

While answering certain questions on MSE in last few weeks it occurred to me that ample confusion is prevalent among students (and instructors alike) regarding a theoretically sound development of circular (or trigonometric) functions. In the past I had hinted at two usual approaches to trigonometry, but I guess that was not enough and hence I am writing this series on the development of circular functions (like I did for the exponential and logarithmic functions earlier).

Like the case of logarithmic and exponential functions I will restrict myself to the theory of circular functions of real variables only. Thus the functions $\sin x, \cos x$ will be defined for all real $x$ in a systematic and sound manner. Based on these definitions their elementary properties (like addition formulas) and analytic properties (continuity and differentiability) will also be derived.

What is the $x$ in $\sin x$?

While dealing with functions of a real variable $x$ it is expected that given a real number $x$, it should be possible to find the value of the function at point $x$ in a definite manner so that for each given $x$ in a certain domain there is a unique value of the function. Sometimes the value of the function is obtained by the use of algebraic manipulations of $x$ with other numbers, but when dealing with circular functions like $\sin x, \cos x$ the way to obtain $\sin x, \cos x$ given the value of $x$ is very very very (!) indirect. And that is the whole problem with these functions. Beginners are almost always kept in the dark about "What is the $x$ in $\sin x$?". It appears that the only way to figure out the $x$ in $\sin x$ is to talk something about circles first (that's the reason for the name circular functions!) and then define $\sin x, \cos x$ in terms of quantities related to a circle. Thus we start off with a unit circle with center at origin whose equation is given by $x^{2} + y^{2} = 1$.

In the above figure $A$ is the point where the circle meets the positive half of the $X$-axis and $P$ is an arbitrary point on the circle (and for purposes of illustration we have taken $P$ to be in the first quadrant). The introduction of the point $P$ and $A$ on the circle leads to two important things which we need here: the first is the arc $AP$ and second sector $AOP$. Using integral calculus it is easy to prove that the arc $AP$ possesses a length and the sector $AOP$ has an area. If the point $P$ has coordinates $(a, b)$ then it is easy to show that \begin{align} L &= \text{length of arc }AP = \int_{a}^{1}\frac{dx}{\sqrt{1 - x^{2}}}\tag{1}\\ A &= \text{area of sector }AOP\notag\\ &= \text{area of }\Delta OPB + \text{area of region }APB\notag\\ &= \frac{ab}{2} + \int_{a}^{1}\sqrt{1 - x^{2}}\,dx\tag{2} \end{align} Note that the existence of the area of sector $AOP$ is dependent on the continuity of the function $f(x) = \sqrt{1 - x^{2}}$ on interval $[0, 1]$ and the existence of length of arc $AP$ follows from the fact that the function $f(x) = \sqrt{1 - x^{2}}$ is of bounded variation on $[0, 1]$ (because it is monotone on that interval). We now show that there is a direct relationship between the length of arc $AP$ and the area of sector $AOP$ namely $L = 2A$ so that the length of an arc of unit circle is twice the area of the corresponding sector. This fact is a direct consequence of the equation of unit circle (i.e. it is an inherent property of a circle) and it has nothing to do with the nature of circular functions (to be defined later).

Clearly we have via integration by parts \begin{align} \int\sqrt{1 - x^{2}}\,dx &= \int 1\cdot\sqrt{1 -x^{2}}\,dx\notag\\ &= x\sqrt{1 - x^{2}} - \int x\cdot\frac{-x}{\sqrt{1 - x^{2}}}\,dx\notag\\ &= x\sqrt{1 - x^{2}} - \int \frac{1 - x^{2} - 1}{\sqrt{1 - x^{2}}}\,dx\notag\\ &= x\sqrt{1 - x^{2}} - \int \sqrt{1 - x^{2}}\,dx + \int \frac{1}{\sqrt{1 - x^{2}}}\,dx\notag\\ \Rightarrow 2\int\sqrt{1 - x^{2}}\,dx &= x\sqrt{1 - x^{2}} + \int \frac{1}{\sqrt{1 - x^{2}}}\,dx\notag\\ \Rightarrow \int\sqrt{1 - x^{2}}\,dx &= \frac{x\sqrt{1 - x^{2}}}{2} + \frac{1}{2}\int \frac{1}{\sqrt{1 - x^{2}}}\,dx\notag\\ \Rightarrow \int_{a}^{1}\sqrt{1 - x^{2}}\,dx &= -\frac{a\sqrt{1 - a^{2}}}{2} + \frac{1}{2}\int_{a}^{1} \frac{1}{\sqrt{1 - x^{2}}}\,dx\notag\\ \Rightarrow \int_{a}^{1}\frac{dx}{\sqrt{1 - x^{2}}} &= 2\left(\frac{ab}{2} + \int_{a}^{1}\sqrt{1 - x^{2}}\,dx\right)\notag \end{align} From equations $(1)$ and $(2)$ it is now obvious that $L = 2A$.
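
The relation $L = 2A$ can also be observed numerically without using any circular functions. In the Python sketch below the arc length is approximated by the length of an inscribed polygon and the area of region $APB$ by the midpoint rule. The function name, the number of subdivisions and the sample point $P(0.6, 0.8)$ are my own assumptions for illustration.

```python
import math

def arc_length_and_sector_area(a, n=200_000):
    """Approximate L (length of arc AP) and A (area of sector AOP)
    for the point P = (a, sqrt(1 - a^2)) with 0 <= a < 1."""
    b = math.sqrt(1.0 - a * a)
    h = (1.0 - a) / n
    length = 0.0      # length of the inscribed polygon from P to A
    region = 0.0      # area under the circle between x = a and x = 1
    x_prev, y_prev = a, b
    for i in range(1, n + 1):
        x = a + i * h
        y = math.sqrt(max(0.0, 1.0 - x * x))  # guard against rounding at x = 1
        length += math.hypot(x - x_prev, y - y_prev)
        xm = x - h / 2                        # midpoint of the subinterval
        region += math.sqrt(1.0 - xm * xm) * h
        x_prev, y_prev = x, y
    # equation (2): triangle OPB plus region APB
    return length, a * b / 2 + region

L, A = arc_length_and_sector_area(0.6)
print(L, 2 * A)
```

Both approximations converge somewhat slowly near $x = 1$ because the slope of $\sqrt{1 - x^{2}}$ is unbounded there, which is why a large number of subdivisions is used.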

Both $L$ and $A$ are dependent on the position of point $P$ on the circle. Let's make the convention that when the length of arc $AP$ is measured anti-clockwise starting from $A$ to $P$ then the arc length $L$ is positive. If the length of arc $AP$ is measured clockwise starting from $A$ to $P$ then the arc length $L$ is negative. Also let the same convention hold for the area of sector $AOP$ so that the relation $L = 2A$ holds for both positive and negative values of $L$ and $A$. Further we can see that as the point $P$ moves in anti-clockwise direction from point $A$ the corresponding arc length $L$ increases and once the point $P$ completes one revolution and reaches back to the starting point $A$ the arc length $L$ reaches its maximum value namely the circumference of the circle. Let's again make the convention that the point $P$ is allowed to make as many revolutions as required so that $P$ can traverse the full circumference any number of times. This will ensure that we can have the value of arc length $L$ as large as we want. Similarly these revolutions can be made in clockwise direction also to ensure that the arc length $L$ can have large negative values. Thus by convention we allow the length $L$ of arc $AP$ (or area $A$ of sector $AOP$) to take any real value based on position of point $P$ on unit circle. In so doing we also observe that there can be multiple values of $L$ or $A$ for the same position of $P$ based on the number of revolutions of point $P$ (in clockwise or anti-clockwise direction).

Thus although the dependence of $L$ on position of $P$ is direct, it does not match the definition of a functional dependence. The magic happens when we invert the dependence of $L$ on position of $P$ and start to think of the position of point $P$ as a function of the arc length $L$. We thus arrive at the definition of circular functions $\sin x, \cos x$. More formally, let $x$ be a real number and based on the sign of $x$ let point $P$ move on unit circle (in anti-clockwise direction if $x > 0$ and in clockwise direction if $x < 0$) starting with point $A(1, 0)$ and reach a position such that the length of arc $AP$ is $x$ (or the area of sector $AOP$ is $x/2$). Then by definition the point $P$ has the coordinates $(\cos x, \sin x)$.

This is the way some elementary textbooks define the circular functions, but they typically miss the justification of area of sectors and length of arcs. Also the fundamental relation between the area of a sector and the length of corresponding arc is not established directly, but rather this link is shown via the use of analytic properties of circular functions (for more details see my answer on MSE). Further as we had discussed earlier, the length of arc $AP$ is also taken to be the measure of angle $AOP$ and we sometimes say that $\sin x$ is the sine of angle $AOP$. A definition of circular functions is not complete unless we introduce the number $\pi$ (like the definition of logarithm is not complete without the introduction of number $e$). We define $\pi$ as the area of the unit circle so that the circumference of the unit circle is by definition $2\pi$. This is equivalent to the following integral formulas $$\pi = 4\int_{0}^{1}\sqrt{1 - x^{2}}\,dx = 2\int_{0}^{1}\frac{dx}{\sqrt{1 - x^{2}}}\tag{3}$$

Elementary Properties of Circular Functions

Since the point $P(\cos x, \sin x)$ lies on the unit circle it follows from the equation of the circle that $$\cos^{2}x + \sin^{2}x = 1\tag{4}$$ With the definition of $\pi$ available we have the following results available immediately $$\cos 0 = 1, \sin 0 = 0, \cos \frac{\pi}{2} = 0, \sin \frac{\pi}{2} = 1\tag{5a}$$ $$\cos \pi = -1, \sin \pi = 0, \cos \frac{3\pi}{2} = 0, \sin\frac{3\pi}{2} = -1\tag{5b}$$ $$\cos 2\pi = 1, \sin 2\pi = 0\tag{5c}$$ Further note that if the point $P$ moves in anti-clockwise direction on unit circle and reaches a position such that length of arc $AP$ is $x$ and if $P'$ is the position when point $P$ moves the same distance in clockwise direction so that length of arc $AP'$ is $-x$ then both the points $P$ and $P'$ have same $x$-coordinate, but their $y$-coordinates are of opposite signs. Hence we have $$\sin (-x) = -\sin x, \cos (-x) = \cos x\tag{6}$$ Further note that if point $P$ on unit circle has coordinates $(a, b)$ and we rotate $P$ by a right angle in anti-clockwise direction then $P$'s coordinates change to $(-b, a)$. It thus follows that $$\cos\left(\frac{\pi}{2} + x\right) = -\sin x, \sin\left(\frac{\pi}{2} + x\right) = \cos x\tag{7}$$ Using $(6)$ and $(7)$ together we get $$\cos\left(\frac{\pi}{2} - x\right) = \sin x, \sin\left(\frac{\pi}{2} - x\right) = \cos x\tag{8}$$ Using these formulas repeatedly it is possible to compute $\cos(n(\pi/2) \pm x), \sin (n(\pi/2) \pm x)$ where $n$ is an integer. Moreover we should note that $\cos x$ is positive if $x$ lies in first or fourth quadrant and $\sin x$ is positive if $x$ lies in first or second quadrant. From the above formulas we also conclude that $\sin x, \cos x$ are both periodic with period $2\pi$. 
Next we establish the addition formulas for the circular functions mentioned below \begin{align} \cos(a \pm b) &= \cos a\cos b \mp \sin a\sin b\tag{9a}\\ \sin(a \pm b) &= \sin a\cos b \pm \cos a\sin b\tag{9b} \end{align} Consider points $P_{1} (\cos a, \sin a), P_{2}(\cos b, \sin b)$ on the unit circle. Then from elementary geometry we know that $$P_{1}P_{2} = \sqrt{2 - 2\cos (a - b)}$$ and from distance formula we know that $$P_{1}P_{2} = \sqrt{(\cos a - \cos b)^{2} + (\sin a - \sin b)^{2}}$$ Equating the above expressions for $P_{1}P_{2}$ we get $$2 - 2\cos(a - b) = \cos^{2}a + \sin^{2}a + \cos^{2}b + \sin^{2}b - 2\{\cos a\cos b + \sin a\sin b\}$$ or $$\cos (a - b) = \cos a\cos b + \sin a\sin b$$ Changing the sign of $b$ and using equation $(6)$ we get the formula for $\cos(a + b)$. Replacing $a$ with $(\pi/2) - a$ in equation $(9a)$ we get the equation $(9b)$ and thus the addition formulas for the circular functions are established.

Analytic Properties of Circular Functions

Next we establish that the circular functions $\cos x, \sin x$ are continuous and differentiable everywhere and in so doing we will also find their derivatives. In order to proceed further we establish the inequality $$\sin x < x < \tan x\tag{10}$$ for $0 < x < \pi/2$ where $\tan x$ is the tangent function defined by $\tan x = \dfrac{\sin x}{\cos x}$. Note that the tangent function is defined for all values of $x$ for which $\cos x \neq 0$. For values of $x$ where $\cos x = 0$ the function $\tan x$ is not defined. Consider the following figure

where $P(\cos x, \sin x)$ is a point on the unit circle and $P$ lies in the first quadrant so that $0 < x < \pi/2$. Let $AT$ be the tangent to circle at point $A$ and let $OP$ intersect $AT$ in $T$. Since $PB$ is perpendicular to $OA$ it follows that the triangles $OPB$ and $OTA$ are similar and hence $$TA = \frac{TA}{OA} = \frac{PB}{OB} = \frac{\sin x}{\cos x} = \tan x$$ Now consider the areas of $\Delta OAP, \Delta OAT$ and sector $OAP$. From the figure it is obvious that $$\text{area of }\Delta OAP < \text{area of sector }OAP < \text{area of }\Delta OAT$$ or $$\frac{1}{2}\cdot OA\cdot PB < \frac{x}{2} < \frac{1}{2}\cdot OA \cdot TA$$ or $$\sin x < x < \tan x$$ Thus we have $0 < \sin x < x$ for $0 < x < \pi/2$ and taking limits when $x \to 0^{+}$ we get $\lim_{x \to 0^{+}}\sin x = 0$. Since $\sin (-x) = -\sin x$ it follows now that $$\lim_{x \to 0}\sin x = 0\tag{11a}$$ Further we can see that when $-\pi/2 < x < \pi/2$ then $\cos x$ is positive and thus $\cos x = \sqrt{1 - \sin^{2}x}$ and hence $$\lim_{x \to 0}\cos x = 1\tag{11b}$$ We can now prove that $\sin x, \cos x$ are continuous everywhere. Clearly we have \begin{align} \lim_{x \to a}\sin x &= \lim_{h \to 0}\sin (a + h)\notag\\ &= \lim_{h \to 0}\sin a\cos h + \cos a\sin h\notag\\ &= \sin a \cdot 1 + \cos a\cdot 0\notag\\ &= \sin a\notag \end{align} so that $\sin x$ is continuous everywhere and continuity of $\cos x$ is proved in exactly the same manner. To obtain derivatives of circular functions we establish the fundamental limit $$\lim_{x \to 0}\frac{\sin x}{x} = 1\tag{12}$$ From inequality $(10)$ we get $$\sin x < x < \frac{\sin x}{\cos x}$$ or $$\cos x < \frac{\sin x}{x} < 1$$ for $0 < x < \pi/2$. Taking limits when $x \to 0^{+}$ we get $\lim\limits_{x \to 0^{+}}\dfrac{\sin x}{x} = 1$. Also since $(\sin (-x))/(-x) = (\sin x)/x$, it follows that the same result holds when $x \to 0^{-}$. Thus the limit formula $(12)$ is established. 
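
The inequality $(10)$ and the resulting squeeze $\cos x < (\sin x)/x < 1$ can be observed on a few sample values. The Python sketch below uses the library functions from the `math` module; the sample points are arbitrary choices of mine in the interval $(0, \pi/2)$.

```python
import math

samples = [0.001, 0.1, 0.5, 1.0, 1.5]

# inequality (10): sin x < x < tan x for 0 < x < pi/2
inequality_holds = [math.sin(x) < x < math.tan(x) for x in samples]

# the squeeze used for the limit (12): cos x < sin x / x < 1
squeeze_holds = [math.cos(x) < math.sin(x) / x < 1 for x in samples]

print(inequality_holds, squeeze_holds)

# both bounds approach 1 as x -> 0+
for x in (0.1, 0.01, 0.001):
    print(x, math.cos(x), math.sin(x) / x)
```

A check on finitely many points is of course no substitute for the geometric argument above; it merely shows how tightly $(\sin x)/x$ is squeezed between $\cos x$ and $1$ for small $x$.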
We now calculate the derivative of $\sin x$ as follows \begin{align} (\sin x)' &= \lim_{h \to 0}\frac{\sin (x + h) - \sin x}{h}\notag\\ &= \lim_{h \to 0}\frac{\sin x\cos h + \cos x\sin h - \sin x}{h}\notag\\ &= \lim_{h \to 0}\sin x \cdot\frac{\cos h - 1}{h} + \cos x\cdot\frac{\sin h}{h}\notag\\ &= \sin x\lim_{h \to 0}\frac{\cos^{2} h - 1}{h(\cos h + 1)} + \cos x\cdot 1\notag\\ &= \cos x - \sin x\lim_{h \to 0}\frac{\sin^{2}h}{h^{2}}\cdot\frac{h}{1 + \cos h}\notag\\ &= \cos x - \sin x \cdot 1\cdot\frac{0}{1 + 1}\notag\\ &= \cos x\notag \end{align} Similarly we can show that $(\cos x)' = -\sin x$. We have now established both the algebraic and analytic properties of circular functions and the development of theory of circular functions is complete. In the next post we will discuss other approaches (not based on geometric notions) to develop the theory of circular functions.

Irrationality of exp(x)

2015-08-04T11:52:00.000+05:30

In one of the earlier posts we indicated that Johann H. Lambert proved the irrationality of $\exp(x)$ or $e^{x}$ for non-zero rational $x$ by means of continued fraction expansion of $\tanh x$. In this post we provide another proof for irrationality of $e^{x}$ which is based on a completely different approach. I first read this proof from Carl Ludwig Siegel's wonderful book Transcendental Numbers and I was amazed by the simplicity and novelty of Siegel's argument.

This proof by Siegel is based on approximation of $\exp(x)$ via rational functions (i.e. ratio of polynomials). Such approximants are now famously known as Padé approximants. Here we will not develop the full theory of Padé approximation, but rather concentrate on obtaining one such approximation for $e^{x}$.

Padé approximation for $\exp(x)$

The basic idea in approximating $\exp(x)$ by a rational function is to choose two polynomials $p(x), q(x)$ such that when the quotient $p(x)/q(x)$ is expanded as a Taylor series in powers of $x$ then it should match up to a pre-determined number of terms of the Taylor series for $\exp(x)$. Specifically we choose two polynomials $A(x), B(x)$ of degree $n$ such that $$\exp(x) + \frac{A(x)}{B(x)} = ax^{2n + 1} + \cdots$$ so that the Taylor series for the rational function $E(x) = -\dfrac{A(x)}{B(x)}$ matches the first $(2n + 1)$ terms of the Taylor series for $\exp(x)$. Also the error of this approximation is determined by $R(x) = B(x)e^{x} + A(x)$ and clearly we have $$R(x) = B(x)e^{x} + A(x) = cx^{2n + 1} + \cdots\tag{1}$$ Our main objective is to determine these polynomials $A(x), B(x)$ such that they have integer coefficients and also to obtain effective bounds for the error $R(x)$.

Since each polynomial $A(x), B(x)$ is of degree $n$ their determination involves calculating $2 (n + 1) = 2n + 2$ coefficients. The criterion that Taylor series for $E(x) = -A(x)/B(x)$ should match till first $(2n + 1)$ terms of the Taylor series for $e^{x}$ gives us $(2n + 1)$ linear equations to calculate these $(2n + 2)$ coefficients and hence we are assured of a non-trivial solution. Also there are more coefficients to be found than the number of equations available and hence there will be no unique solution. This is obvious directly also: if $A(x), B(x)$ do our job then any constant multiple of them say $rA(x), rB(x)$ also do the job because the ratio of these polynomials is unchanged. This constant $r$ will be fixed later (in an implicit manner) to suit our needs.

However we don't compute the coefficients of $A(x), B(x)$ by solving linear equations but rather follow Siegel's beautiful approach based on algebra of differential operators. Let us denote the process of calculating derivative by an operator $\textbf{D}$ so that $$\textbf{D}f(x) = \frac{d}{dx}f(x) = f'(x) \tag{2}$$ Multiple applications of this operator will be denoted via powers of $\textbf{D}$ so that $$\textbf{D}^{2}f(x) = \textbf{D}\{\textbf{D}f(x)\} = \textbf{D}f'(x) = f''(x)$$ We further define a polynomial in $\textbf{D}$ as an operator like $$P(\textbf{D}) = a_{0} + a_{1}\textbf{D} + \cdots + a_{n - 1}\textbf{D}^{n - 1} + a_{n}\textbf{D}^{n}$$ such that $$P(\textbf{D})f(x) = a_{0}f(x) + a_{1}f'(x) + \cdots + a_{n - 1}f^{(n - 1)}(x) + a_{n}f^{(n)}(x)\tag{3}$$ Since powers of $\textbf{D}$ commute with each other it is possible to multiply two such polynomial operators $P(\textbf{D}), Q(\textbf{D})$ to get a single polynomial operator via the usual rules of multiplying polynomials.

With the basics in differential operator symbolism available, we can now observe that $$\textbf{D}e^{x}f(x) = e^{x}f'(x) + e^{x}f(x) = e^{x}\{f(x) + f'(x)\} = e^{x}(1 + \textbf{D})f(x)$$ and using the same rule multiple times we get $$\textbf{D}^{n}e^{x}f(x) = e^{x}(1 + \textbf{D})^{n}f(x)\tag{4}$$ We now apply the operator $\textbf{D}^{n + 1}$ to equation $(1)$ and noting that $A(x)$ is a polynomial of degree $n$ we get \begin{align} \textbf{D}^{n + 1}R(x) &= \textbf{D}^{n + 1}e^{x}B(x) + \textbf{D}^{n + 1}A(x)\notag\\ &= e^{x}(1 + \textbf{D})^{n + 1}B(x)\notag\\ &= c_{0}x^{n} + \cdots\tag{5} \end{align} where $c_{0} = (2n + 1)(2n)(2n - 1)\cdots (n + 1)c$. It now follows that $$(1 + \textbf{D})^{n + 1}B(x) = e^{-x}\{c_{0}x^{n} + \cdots\} = c_{0}x^{n}$$ because the expression $(1 + \textbf{D})^{n + 1}B(x)$ is a polynomial of degree at most $n$. We now choose $c_{0} = 1$ to make our calculations simple (this will fix the $r$ implicitly and thereby polynomials $A(x), B(x)$ will be determined uniquely). We thus have $$B(x) = (1 + \textbf{D})^{- n - 1}x^{n}\tag{6}$$ What's that!! We haven't yet defined negative powers of differential operators so the above equation seems meaningless, but the beauty of this operator algebra is that using binomial theorem for general exponent we can express $(1 + \textbf{D})^{-n - 1}$ into a power series in $\textbf{D}$ and apply it on $x^{n}$ to get $B(x)$. Also we need to have this series only till $\textbf{D}^{n}$ because higher powers of $\textbf{D}$ applied on $x^{n}$ will lead to $0$. Hence $B(x)$ will have integer coefficients.

This highly intuitive but non-rigorous argument is very clever and can be made rigorous by learning more of operator algebra. However we don't follow that route and show directly that $B(x)$ has integer coefficients. Also when we proceed in this manner we need to remember that equation $(6)$ is just another way to express that $(1 + \textbf{D})^{n + 1}B(x) = x^{n}$. In the same manner multiplying equation $(1)$ by $e^{-x}$ and applying $\textbf{D}^{n + 1}$ on the resulting equation we get $$(-1 + \textbf{D})^{n + 1}A(x) = x^{n}$$ which we express more fashionably as $$A(x) = (-1 + \textbf{D})^{-n - 1}x^{n}\tag{7}$$ We now proceed to analyze the coefficients of $B(x)$. Let $$B(x) = B_{0} + B_{1}x + \cdots + B_{n}x^{n}$$ and from $(1 + \textbf{D})^{n + 1}B(x) = x^{n}$ we get $$\left(1 + (n + 1)\textbf{D} + \frac{n(n + 1)}{2}\textbf{D}^{2} + \cdots + (n + 1)\textbf{D}^{n}\right)(B_{0} + B_{1}x + \cdots + B_{n}x^{n}) = x^{n}$$ Clearly the term containing $x^{n}$ on LHS is $B_{n}x^{n}$ so that $B_{n} = 1$. Similarly the terms containing $x^{n - 1}$ on LHS are $B_{n - 1}x^{n - 1}$ and $(n + 1)\textbf{D}B_{n}x^{n}$ so that $B_{n - 1} + n(n + 1)B_{n} = 0$ and hence $B_{n - 1} = -n(n + 1)$. Thus we note that the coefficients $B_{i}$ can be calculated starting with $B_{n}$ and then evaluating $B_{n - 1}, B_{n - 2},\dots$ and so on. Also the equation to determine $B_{i}$ (except $B_{n}$) is always of the form $$B_{i} + b_{1}B_{i + 1} + \cdots + b_{n - i}B_{n} = 0$$ where $b_{1}, b_{2}, \ldots, B_{i + 1}, B_{i + 2}, \ldots$ are integers. It follows that all the $B_{i}$ are integers. In the same manner we can show that the polynomial $A(x)$ also has integer coefficients.
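The back-substitution described above is easy to try out. The following is a minimal sketch (the function names `solve_operator` and `remainder_coeff` are our own, not from Siegel) which solves $(\pm 1 + \textbf{D})^{n + 1}P(x) = x^{n}$ coefficient by coefficient using integer arithmetic only, and then checks that $B(x)e^{x} + A(x)$ has no terms below $x^{2n + 1}$.

```python
from fractions import Fraction
from math import comb, factorial


def solve_operator(n, sign):
    """Solve (sign + D)^(n+1) P(x) = x^n for a polynomial P of degree n.

    sign=+1 gives B(x), sign=-1 gives A(x). Returns [P_0, ..., P_n].
    Only integer additions and multiplications are used, mirroring the
    back-substitution argument in the text.
    """
    coeffs = [0] * (n + 1)
    for m in range(n, -1, -1):            # P_n first, then P_(n-1), ...
        rhs = 1 if m == n else 0          # coefficient of x^m in x^n
        for j in range(1, n - m + 1):     # term C(n+1, j) sign^(n+1-j) D^j
            falling = factorial(m + j) // factorial(m)  # D^j x^(m+j) = falling * x^m
            rhs -= comb(n + 1, j) * sign ** (n + 1 - j) * coeffs[m + j] * falling
        coeffs[m] = sign ** (n + 1) * rhs  # sign^(n+1) is its own inverse
    return coeffs


def remainder_coeff(A, B, k):
    """Exact coefficient of x^k in R(x) = B(x) e^x + A(x)."""
    total = Fraction(A[k]) if k < len(A) else Fraction(0)
    for i, b in enumerate(B):
        if i <= k:
            total += Fraction(b, factorial(k - i))
    return total


n = 3
B = solve_operator(n, +1)
A = solve_operator(n, -1)
print(B, A)  # [-120, 60, -12, 1] [120, 60, 12, 1]
print([remainder_coeff(A, B, k) for k in range(2 * n + 2)])
```

For $n = 3$ this gives $B(x) = x^{3} - 12x^{2} + 60x - 120$ and $A(x) = x^{3} + 12x^{2} + 60x + 120$, and the only non-zero coefficient printed in the last line is the one for $x^{7}$.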

Estimation of error term $R(x)$

From equations $(5)$ and $(6)$ we can see that $$\textbf{D}^{n + 1}R(x) = e^{x}(1 + \textbf{D})^{n + 1}B(x) = e^{x}x^{n}$$ and hence $R(x) = \textbf{D}^{-n - 1}e^{x}x^{n}$. Fortunately it is much easier to handle negative powers of $\textbf{D}$ than to handle expressions like $(1 + \textbf{D})^{-n - 1}x^{n}$ encountered earlier. We define the integral operator $\textbf{J}$ as $$\textbf{J}f(x) = \int_{0}^{x}f(t)\,dt\tag{8}$$ It is easily seen that $\textbf{D}\textbf{J}f(x) = f(x)$ and if we have the extra assumption that $f(0) = 0$ then $\textbf{J}\textbf{D}f(x) = f(x)$ and hence for functions with $f(0) = 0$ both the operators $\textbf{D}$ and $\textbf{J}$ are inverses of each other and we can write $\textbf{D}\textbf{J} = \textbf{J}\textbf{D} = 1$. Clearly we can define powers of $\textbf{J}$ as repeated application of $\textbf{J}$ and powers of $\textbf{J}$ and $\textbf{D}$ commute with each other. Thus it follows that $$R(x) = \textbf{J}^{n + 1}e^{x}x^{n}\tag{9}$$ Using integration by parts it can be easily shown that powers of $\textbf{J}$ can also be expressed as an integral and we have $$\textbf{J}^{n + 1}f(x) = \frac{1}{n!}\int_{0}^{x}f(t)(x - t)^{n}\,dt\tag{10}$$ where $f(0) = 0$. Using equations $(9), (10)$ we finally have an expression for the error term $R(x)$ as \begin{align} R(x) &= \frac{1}{n!}\int_{0}^{x}e^{t}t^{n}(x - t)^{n}\,dt\notag\\ &= \frac{x^{2n + 1}}{n!}\int_{0}^{1}t^{n}(1 - t)^{n}e^{xt}\,dt\text{ (putting }t = xu\text{ and then renaming }u\text{ as }t)\tag{11} \end{align} For real $x$ the integrand in the last integral is positive on $(0, 1)$, and hence it follows from the above equation that if $x \neq 0$ is real then $R(x) \neq 0$.
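Equation $(11)$ can be checked exactly, term by term. The sketch below assumes the closed forms obtained by expanding $(6)$ and $(7)$ with the binomial series, namely $B_{k} = (-1)^{n - k}\binom{2n - k}{n - k}\frac{n!}{k!}$ and $A_{k} = (-1)^{n + 1}\binom{2n - k}{n - k}\frac{n!}{k!}$, and the Beta integral $\int_{0}^{1}t^{n + j}(1 - t)^{n}\,dt = \frac{(n + j)!\,n!}{(2n + j + 1)!}$. The function names are our own.

```python
from fractions import Fraction
from math import comb, factorial


def pade_B(n):
    """Coefficients [B_0..B_n] of B(x) = (1 + D)^(-n-1) x^n."""
    return [(-1) ** (n - k) * comb(2 * n - k, n - k) * (factorial(n) // factorial(k))
            for k in range(n + 1)]


def pade_A(n):
    """Coefficients [A_0..A_n] of A(x) = (-1 + D)^(-n-1) x^n."""
    return [(-1) ** (n + 1) * comb(2 * n - k, n - k) * (factorial(n) // factorial(k))
            for k in range(n + 1)]


def coeff_from_polys(n, k):
    """Coefficient of x^k in B(x) e^x + A(x)."""
    A, B = pade_A(n), pade_B(n)
    total = Fraction(A[k]) if k <= n else Fraction(0)
    for i, b in enumerate(B):
        if i <= k:
            total += Fraction(b, factorial(k - i))
    return total


def coeff_from_integral(n, k):
    """Coefficient of x^k in (x^(2n+1)/n!) * int_0^1 t^n (1-t)^n e^(xt) dt."""
    j = k - (2 * n + 1)
    if j < 0:
        return Fraction(0)
    # expand e^(xt) and integrate t^(n+j) (1-t)^n with the Beta integral
    return Fraction(factorial(n + j), factorial(j) * factorial(2 * n + j + 1))


for n in range(1, 5):
    for k in range(2 * n + 8):
        assert coeff_from_polys(n, k) == coeff_from_integral(n, k)
print("equation (11) agrees with B(x)e^x + A(x) for n = 1..4")
```

In particular the leading coefficient is $c = n!/(2n + 1)!$, which is consistent with the normalization $c_{0} = 1$ chosen earlier.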

Irrationality of $\exp(x)$

In order to prove that $e^{x}$ is irrational for non-zero rational $x$ it is sufficient to prove that $e^{x}$ is irrational when $x$ is a positive integer say $x = m$. Let us assume that $e^{m} = p/q$ where $p, q$ are positive integers with no common factors. Now from equation $(1)$ we have $$R(m) = e^{m}B(m) + A(m) = \frac{p}{q}\cdot B(m) + A(m)$$ and hence $$qR(m) = pB(m) + qA(m)$$ Since polynomials $A(x), B(x)$ have integer coefficients it follows that the RHS of the above equation is an integer and since $m \neq 0$ it follows that $R(m) \neq 0$ therefore $qR(m)$ is a non-zero integer and hence $|qR(m)| \geq 1$. Now we can see from equation $(11)$ that $$|qR(m)| \leq q\cdot\frac{m^{2n + 1}}{n!}\cdot e^{m} = p \cdot\frac{m^{2n + 1}}{n!}$$ Clearly we can choose the integer $n$ as large as we please (increasing $n$ increases the accuracy of Padé approximation) and since $m^{2n + 1}/n! \to 0$ as $n \to \infty$ it is possible to choose a positive integer $n$ (depending on $m, p$) such that $p \cdot\dfrac{m^{2n + 1}}{n!} < 1$ so that $|qR(m)| < 1$. This is contrary to the fact that $|qR(m)| \geq 1$ and hence $e^{m}$ must be irrational. We have thus shown that

Theorem: If $x$ is a non-zero rational number then $e^{x}$ is irrational.

The above technique of Siegel can also be used to establish the irrationality of $\pi^{2}$ (and hence irrationality of $\pi$) with very limited amount of additional work. This we do next.

Irrationality of $\pi^{2}$

In order to use the material presented so far to get any information about nature of $\pi$ we must be able to connect $\pi$ somehow with the exponential function $e^{x}$. Fortunately Euler did it for us a long time ago and we have the beautiful equation $$e^{i\pi} + 1 = 0\tag{12}$$ Putting $x = i\pi$ in equation $(11)$ we get \begin{align} R(i\pi) &= (-1)^{n}i\frac{\pi^{2n + 1}}{n!}\int_{0}^{1}t^{n}(1 - t)^{n}e^{i\pi t}\,dt\notag\\ &= (-1)^{n}i\frac{\pi^{2n + 1}}{n!}\int_{0}^{1}t^{n}(1 - t)^{n}\{\cos \pi t + i\sin \pi t\}\,dt\notag\\ &= (-1)^{n}i\frac{\pi^{2n + 1}}{n!}\int_{0}^{1}t^{n}(1 - t)^{n}\cos \pi t\,dt\notag\\ &\,\,\,\,\,\,\,\,+ (-1)^{n + 1}\frac{\pi^{2n + 1}}{n!}\int_{0}^{1}t^{n}(1 - t)^{n}\sin \pi t\,dt\notag\\ &= 0 + (-1)^{n + 1}\frac{\pi^{2n + 1}}{n!}\int_{0}^{1}t^{n}(1 - t)^{n}\sin \pi t\,dt\notag\\ &\,\,\,\,\,\,\,\,\text{ (as }\cos \pi(1 - t) = -\cos \pi t)\notag\\ &= (-1)^{n + 1}\frac{\pi^{2n + 1}}{n!}\int_{0}^{1}t^{n}(1 - t)^{n}\sin \pi t\,dt\tag{13} \end{align} It follows that $R(i\pi)$ is a non-zero real number for all positive integers $n$. Next we need to analyze the expression of $R(i\pi)$ in terms of polynomials $A(x), B(x)$. Before we do that we need one relation between $A(x), B(x)$. Replacing $x$ with $(-x)$ in the equation $$(1 + \textbf{D})^{n + 1}B(x) = x^{n}$$ and noting that this changes $\textbf{D}$ into $-\textbf{D}$ as well we get $$(1 - \textbf{D})^{n + 1}B(-x) = (-1)^{n}x^{n}$$ or $$(-1)^{n + 1}(-1 + \textbf{D})^{n + 1}B(-x) = (-1)^{n}x^{n}$$ or $$(-1 + \textbf{D})^{n + 1}(-B(-x)) = x^{n}$$ Comparing with the equation $(-1 + \textbf{D})^{n + 1}A(x) = x^{n}$ we see that $B(-x) = -A(x)$ and hence $A(-x) = -B(x)$. 
Next we have \begin{align} R(x) &= A(x) + B(x)e^{x}\notag\\ R(-x) &= A(-x) + B(-x)e^{-x}\notag\\ &= -B(x) - A(x)e^{-x}\notag \end{align} and hence on putting $x = i\pi$ we get \begin{align} R(i\pi) &= A(i\pi) - B(i\pi)\notag\\ R(-i\pi) &= -B(i\pi) + A(i\pi)\notag \end{align} and we finally have $$R(i\pi) = R(-i\pi) = A(i\pi) + A(-i\pi)$$ It is easy to observe that the polynomial $C(x) = A(x) + A(-x)$ consists of only even powers of $x$ and hence it is effectively a polynomial $D(x^{2})$ in $x^{2}$ of degree $k = [n/2]$ with integer coefficients. And $R(i\pi) = D(-\pi^{2})$ which again shows that $R(i\pi)$ is a real number.
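As a sanity check of the last few steps, here is a small numerical sketch (our own helper names; the integral in $(13)$ is approximated by a composite Simpson rule, so the agreement is only up to rounding). It verifies $B(-x) = -A(x)$ on the coefficients and compares $R(i\pi) = A(i\pi) + A(-i\pi)$ with the integral expression $(13)$.

```python
import math
from math import comb, factorial


def pade_A(n):
    """Coefficients [A_0..A_n] of A(x), from expanding (-1 + D)^(-n-1) x^n."""
    return [(-1) ** (n + 1) * comb(2 * n - k, n - k) * (factorial(n) // factorial(k))
            for k in range(n + 1)]


def pade_B(n):
    """Coefficients [B_0..B_n] of B(x), from expanding (1 + D)^(-n-1) x^n."""
    return [(-1) ** (n - k) * comb(2 * n - k, n - k) * (factorial(n) // factorial(k))
            for k in range(n + 1)]


def R_from_polys(n):
    """R(i*pi) = A(i*pi) + A(-i*pi): only even powers survive, (i*pi)^2 = -pi^2."""
    A = pade_A(n)
    return sum(2 * A[k] * (-(math.pi ** 2)) ** (k // 2) for k in range(0, n + 1, 2))


def R_from_integral(n, steps=2000):
    """Right hand side of (13), with the integral done by Simpson's rule."""
    h = 1.0 / steps

    def g(t):
        return t ** n * (1 - t) ** n * math.sin(math.pi * t)

    s = g(0.0) + g(1.0)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * g(i * h)
    integral = s * h / 3
    return (-1) ** (n + 1) * math.pi ** (2 * n + 1) / factorial(n) * integral


for n in range(1, 5):
    print(n, R_from_polys(n), R_from_integral(n))
```

For $n = 1$ both expressions give $4$ (up to rounding), since $A(x) = x + 2$ and $\int_{0}^{1}t(1 - t)\sin \pi t\,dt = 4/\pi^{3}$.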

Let's now suppose that $\pi^{2} = a/b$ where $a, b$ are positive integers with no common factor. It is then clear that $R(i\pi) = D(-\pi^{2})$ is a rational number with denominator $b^{k}$ and hence $b^{k}R(i\pi)$ is an integer and from our expression of $R(i\pi)$ as an integral we see that the expression $b^{k}R(i\pi)$ is a non-zero integer and hence $\left|b^{k}R(i\pi)\right| \geq 1$ for all positive integers $n$. From equation $(13)$ we can see that $$\left|b^{k}R(i\pi)\right| \leq b^{k}\cdot\frac{\pi^{2n + 1}}{n!} \leq b^{n/2}\cdot\frac{\pi^{2n + 1}}{n!}$$ and the RHS can be made less than $1$ if we choose $n$ sufficiently large. Hence we arrive at a contradiction if $n$ is chosen suitably. This proves that we can't have $\pi^{2} = a/b$ for any positive integers $a, b$.

Note: For another proof of irrationality of $e^{x}$ for non-zero rational $x$ see this question on MSE.


Measuring An Angle

2014-11-02T14:04:00.001+05:30

Today we will focus on a topic from elementary geometry, namely the concept of measurement of angles. The idea of an angle is a simple one in the sense that it is made by two rays emanating from the same point. But the measurement of angles is not as simple as it appears. Many theorems in elementary geometry deal with ideas which involve the concept of measurement of angles but they assume the understanding of this measurement in an implicit fashion.

Measuring Angles

For example we have the properties of a transversal cutting two parallel lines and the corresponding as well as alternate angles being equal. What do we mean when we say that the two corresponding angles (in the last statement) are equal? Well luckily the concept of equality (or congruence) of geometrical shapes is much easier to define and comprehend. Two shapes are equal if they can be exactly superimposed on one another. Thus for two angles to be equal we need to note that when we superimpose one angle on the other the vertices as well as the rays of both the angles must coincide exactly.

In the same manner by superimposing angles such that the vertices and at least one of the rays of both angles coincide we can compare two angles and determine which of the angles is greater or smaller. This is similar to comparing two line segments by the use of a divider. However it turns out that practical applications of geometry not only require us to compare line segments but rather also measure them in a precise manner. Similar is the case with angles. A lot of practical applications of geometry (and more properly trigonometry) require us to measure angles in a precise manner.

Naturally measuring anything requires two fundamental ideas:

Defining a standard unit for measurement
An additive law defining how any magnitude of the "entity being measured" can be expressed in terms of the standard unit.

For example in case of line segments we can take any specific line segment $OA$ and define its measure (or length) to be $1$ and then the length of any other line segment $PQ$ must be seen in comparison with the length of standard line segment $OA$ whose length we have defined to be $1$. More technically if line segment $PQ$ has a point $R$ in between $P$ and $Q$ such that the line segment $OA$ can be superimposed exactly with both the line segments $PR$ and $RQ$ then we say that the line segment $PQ$ has length $2$. The extension to rational lengths is made possible by a very standard construction of dividing the unit line segment $OA$ into a finite number of line segments of equal length. Finally the extension to irrational lengths is an assumption which links the branches of elementary geometry and real analysis.

The point of discussion in previous paragraph is to emphasize the fact that the concept of length of a line segment is a very fundamental one in terms of measuring magnitudes of geometrical shapes. Once we have the grasp of length it is possible to define the concept of areas of geometrical shapes (at least those shapes which are made up of line segments only). Extension to lengths and areas of shapes consisting of general curves is made possible via real analysis (via the concept of a definite integral). For the present discussion we will assume this extension of lengths and areas for any geometrical shape and proceed to measure angles.

Measuring Angles: Step 1

Intuitively the way we visualize an angle and think about comparison of angles, it is almost obvious that a greater angle somehow signifies that its rays are farther apart from one another compared to those of a smaller angle. Thus the measurement of an angle must be related somehow to the spacing between its rays. But again it is obvious that as we move away from the vertex of an angle the rays look diverging apart from each other more and more. Hence to concretely measure the spacing between the rays it is important to fix a distance from the vertex at which we want to measure the spacing between two rays.

So let $POQ$ be an angle which we wish to measure and let $A, B$ be points on rays $OP, OQ$ respectively such that the length of line segments $OA$ and $OB$ is $1$. An intuitive measure of $\angle POQ$ must be linked with a measure of separation of the points $A$ and $B$. What could be simpler than just measuring the length of line segment $AB$? So let us define the measure of $\angle POQ$ to be the length of line segment $AB$ where $OA = OB = 1$.

Figure: Measuring an Angle: Step 1

An equally simple measure of $\angle POQ$ could be the area of $\Delta AOB$. In fact using area would really give a measure of space between the rays $OP$ and $OQ$ and the line segment $AB$.

However if we observe carefully both the definitions (either using length of $AB$ or area of $\Delta AOB$) suffer from two defects:

The comparison of angles fails miserably, and
The additive property also fails in a very obvious manner

The failure of comparison of angles (using the area of $\Delta AOB$ as a measure of $\angle POQ$) is shown via the following figure:

Figure: Failure of Comparison of Angles

In the above figure we have $OA = OB = OC = 1$ and it is obvious that the $\angle ROQ$ is significantly larger than the $\angle POQ$, but the area of $\Delta COB$ is much less than the area of $\Delta AOB$. However note that had we used the length $AB$ as measure of $\angle POQ$ then the comparison would have remained valid in the above figure because length of $BC$ is obviously greater than that of $AB$. In order to see the failure of length $AB$ as a measure of $\angle POQ$ we need to understand that associated with a configuration of two rays of $OP$ and $OQ$ emanating from same point $O$ there are two angles possible: one the interior angle (smaller one) between $OP$ and $OQ$ and another the exterior angle which is the portion of plane outside the rays $OP$ and $OQ$ and which is significantly larger than the interior angle. When we consider this view it is obvious that both these angles must be measured by same length $AB$. And it might turn out that a significantly larger angle has a comparatively smaller measure.

The additive property of measurement of angles clearly does not hold with either of these definitions based on length (of $AB$) and area (of $\Delta AOB$), as is evident from the figure below:

Figure: Failure of Additivity

The length $BC$ is clearly not equal to the sum of lengths of $AB$ and $AC$, rather it is slightly less than this sum. However from the same figure above it is obvious that the measure of $\angle ROQ$ must be equal to the sum of measures of $\angle ROP$ and $\angle POQ$. The same problem is seen with the areas of the triangles also as the area of $\Delta COB$ is slightly less than the sum of areas of $\Delta COA$ and $\Delta AOB$.

Measuring Angles: Step 2

What went wrong with the previous definition of measure of angles? What else do we need to associate with the gap between two rays emanating from the same point? Well there is no other suitable choice than the previous two approaches based on length of a line segment and area of a triangle. But they suffer from obvious defects as seen in previous paragraphs. Is it possible to fix these defects? Let us give it a try and focus on the defects more closely and perhaps we might be able to fix them.

Let's take the issue of additivity. We find that the additivity fails by a very small margin. Consider the following figure:

Figure: Fixing the issue of additivity

If we compare this figure with the previous figure then we note that the additivity fails here by an even smaller margin. To be explicit the difference between the length $BD$ and the sum of lengths of $AB, AC$ and $CD$ is very small. Again the difference between area of $\Delta DOB$ and sum of areas of $\Delta DOC, \Delta COA$ and $\Delta AOB$ is very small. What is so different about this figure compared to the last figure? Here one angle has been expressed as the sum of three smaller angles whereas in previous case an angle was expressed as sum of two smaller angles. Intuitively it is obvious that if we express an angle as a sum of 4 smaller angles then the margin of failure of additivity in the measurement of angles will be even less. Thus to fix the issue of additivity we need to express an angle as a sum of as many smaller angles as possible.
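The shrinking margin can also be seen numerically. The sketch below is only an illustration under our own assumptions: we place the points on the unit circle using `math.cos` and `math.sin` (which already presuppose radian measure, so this is not a definition), split an angle into equal parts, and add up the chord lengths and the triangle areas. The function names are hypothetical.

```python
import math


def unit_points(theta, parts):
    """Points on the unit circle splitting the angle theta into equal parts."""
    return [(math.cos(i * theta / parts), math.sin(i * theta / parts))
            for i in range(parts + 1)]


def chord_sum(theta, parts):
    """Total length of the polygonal line through the points."""
    pts = unit_points(theta, parts)
    return sum(math.dist(pts[i], pts[i + 1]) for i in range(parts))


def triangle_area_sum(theta, parts):
    """Total area of the triangles with vertex O and consecutive points."""
    pts = unit_points(theta, parts)
    return sum(0.5 * abs(pts[i][0] * pts[i + 1][1] - pts[i + 1][0] * pts[i][1])
               for i in range(parts))


theta = 1.0
for parts in (1, 2, 3, 4, 100):
    print(parts, chord_sum(theta, parts), 2 * triangle_area_sum(theta, parts))
```

Both columns increase with the number of parts and approach the same limiting value, which is what the argument above leads us to expect.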

Leaving aside the practical problem of drawing an angle as a sum of say $100$ smaller angles, we can see that this is the way forward. Thus from a theoretical point of view the best option would be to let the number of smaller angles tend to infinity. When the number of smaller angles tends to infinity then we see that the polygonal line $BACD$ turns into an arc of a circle of radius $1$ and centre $O$. And the region $OBACD$ becomes the corresponding sector associated with the arc. The situation is illustrated in the figure below:

Figure: Measuring an Angle as Sum of Many Many Angles

Starting from an intuitive approach we have finally reached the modern definition of the measure of an angle. If $POQ$ is an angle then its measure is defined to be the length of arc $AB$ of a circle centered at $O$ and radius $1$ such that $A$ lies on ray $OP$ and $B$ lies on ray $OQ$. The measure of $\angle POQ$ can also be defined as twice the area of sector $AOB$. The unit of measurement of angles as defined above is called a radian and there is no specific symbol used for this unit of measurement.

Figure: Measure of an Angle

The factor of $2$ comes because of a specific relation between the circumference of a unit circle ($2\pi$) and its area ($\pi$) which is based on a fact of integral calculus namely $$\int_{0}^{1}\frac{dx}{\sqrt{1 - x^{2}}} = 2\cdot\int_{0}^{1}\sqrt{1 - x^{2}}\,dx = \frac{\pi}{2}$$ where the integral on the left represents the length of a quadrant of a unit circle and the integral on the right represents the area of corresponding sector. We thus see that measuring an angle is intimately connected to measuring stuff related to a circle. No wonder we used semi-circular protractors in school to measure angles in degrees!


Ramanujan's Generating Function for Partitions Modulo 7

2014-07-26T17:12:00.001+05:30

Ramanujan's Partition Congruences

Based on the empirical analysis of a table of partitions Ramanujan conjectured his famous partition congruences $$\boxed{\begin{align}p(5n + 4)&\equiv 0\pmod{5}\notag\\ p(7n + 5)&\equiv 0\pmod{7}\notag\\ p(11n + 6)&\equiv 0\pmod{11}\notag\end{align}}\tag{1}$$ and gave some of the most beautiful proofs for them (see here). In addition to these proofs he gave the following generating functions for $p(5n + 4), p(7n + 5)$: $$\sum_{n = 0}^{\infty}p(5n + 4)q^{n} = 5\frac{\{(1 - q^{5})(1 - q^{10})(1 - q^{15})\cdots\}^{5}}{\{(1 - q)(1 - q^{2})(1 - q^{3})\cdots\}^{6}}\tag{2}$$ and \begin{align}\sum_{n = 0}^{\infty}p(7n + 5)q^{n}&= 7\frac{\{(1 - q^{7})(1 - q^{14})(1 - q^{21})\cdots\}^{3}}{\{(1 - q)(1 - q^{2})(1 - q^{3})\cdots\}^{4}}\notag\\ &\,\,\,\,\,\,\,\,+ 49q\frac{\{(1 - q^{7})(1 - q^{14})(1 - q^{21})\cdots\}^{7}}{\{(1 - q)(1 - q^{2})(1 - q^{3})\cdots\}^{8}}\tag{3}\end{align} We have already established $(2)$ in one of our posts and this post deals with the identity $(3)$ concerning generating function of partitions modulo $7$.

The identity $(3)$ is exceedingly difficult to prove using elementary techniques and most of the proofs involve heavy symbolic manipulation which is normally done using software like MAPLE or MACSYMA. I tried to obtain some help from MSE to get a proof which used something possible via hand calculation but did not get any good answers there. I chanced to read an old paper titled "Some Identities involving the Partition Function" by Oddmund Kolberg which provided very nice and elementary proofs of the identities $(2)$ and $(3)$ in a systematic fashion. This post tries to elaborate on the concise proof available in that paper.

The Basic Technique

The main idea of the proof is to start with the following well known identities of Euler and Jacobi: $$f(-q) = (1 - q)(1 - q^{2})(1 - q^{3})\cdots = \sum_{n = -\infty}^{\infty}(-1)^{n}q^{n(3n + 1)/2}\tag{4}$$ and $$f^{3}(-q) = \{(1 - q)(1 - q^{2})(1 - q^{3})\cdots\}^{3} = \sum_{n = 0}^{\infty}(-1)^{n}(2n + 1)q^{n(n + 1)/2}\tag{5}$$ and split each of the series above based on the powers of $q$ modulo some prime number. Here we focus on the specific case where the prime number concerned is $7$. Thus we can write \begin{align}f(-q)&= \sum_{n = -\infty}^{\infty}(-1)^{n}q^{n(3n + 1)/2}\notag\\ &= g_{0} + g_{1} + g_{2} + g_{3} + g_{4} + g_{5} + g_{6}\notag\\ &= \sum_{s = 0}^{6}g_{s}\notag\\ &= \sum_{s = 0}^{6}\left(\sum_{n(3n + 1)/2\,\equiv\, s\pmod{7}}(-1)^{n}q^{n(3n + 1)/2}\right)\tag{6}\end{align} and \begin{align}f^{3}(-q)&= \sum_{n = 0}^{\infty}(-1)^{n}(2n + 1)q^{n(n + 1)/2}\notag\\ &= h_{0} + h_{1} + h_{2} + h_{3} + h_{4} + h_{5} + h_{6}\notag\\ &= \sum_{s = 0}^{6}h_{s}\notag\\ &= \sum_{s = 0}^{6}\left(\sum_{n \geq 0,\,n(n + 1)/2\,\equiv\,s\pmod{7}}(-1)^{n}(2n + 1)q^{n(n + 1)/2}\right)\tag{7}\end{align} Clearly the expression $n(3n + 1)/2$ falls into one of the four classes $0, 1, 2, 5$ modulo $7$ so that $$g_{3} = g_{4} = g_{6} = 0$$ Similarly the expression $n(n + 1)/2$ falls into one of the four classes $0, 1, 3, 6$ modulo $7$ and we have $$h_{2} = h_{4} = h_{5} = 0$$ On the other hand we can evaluate $g_{2}$ by noting that \begin{align}n(3n + 1)/2 &\equiv 2\pmod{7}\notag\\ \Leftrightarrow 24\{n(3n + 1)/2\} + 1&\equiv 49 \equiv 0\pmod{7}\notag\\ \Leftrightarrow (6n + 1)^{2}&\equiv 0\pmod{7}\notag\\ \Leftrightarrow 6n + 1&\equiv 0\pmod{7}\notag\\ \Leftrightarrow n&\equiv 1\pmod{7}\notag\end{align} We thus have \begin{align}g_{2}&= \sum_{n = -\infty}^{\infty}(-1)^{7n + 1}q^{(7n + 1)(21n + 4)/2}\notag\\ &= -\sum_{n = -\infty}^{\infty}(-1)^{7n}q^{2}q^{49n(3n + 1)/2}\notag\\ &= -q^{2}f(-q^{49})\notag\end{align} In a similar fashion we can calculate $h_{6} = -7q^{6}f^{3}(-q^{49})$ so that $h_{6} = 7g_{2}^{3}$.
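The residue classes and the value of $g_{2}$ can be confirmed with truncated power series. The following sketch (helper names are our own) builds $f(-q)$ from its product, picks out the terms whose exponent is $2$ modulo $7$ and compares them with $-q^{2}f(-q^{49})$ up to a chosen order.

```python
def pentagonal(n):
    return n * (3 * n + 1) // 2


def triangular(n):
    return n * (n + 1) // 2


def residue_classes(exponent, modulus=7):
    """Residues taken by exponent(n); both exponents above are periodic mod 7."""
    return sorted({exponent(n) % modulus for n in range(modulus)})


def euler_product(order, step=1):
    """Coefficients of prod_{n>=1} (1 - q^(step*n)) up to q^(order-1)."""
    coeffs = [0] * order
    coeffs[0] = 1
    for n in range(step, order, step):
        for k in range(order - 1, n - 1, -1):  # multiply in place by (1 - q^n)
            coeffs[k] -= coeffs[k - n]
    return coeffs


def class_part(series, s, modulus=7):
    """Keep only the terms whose exponent is congruent to s."""
    return [c if k % modulus == s else 0 for k, c in enumerate(series)]


ORDER = 300
f = euler_product(ORDER)
g2 = class_part(f, 2)
f49 = euler_product(ORDER, step=49)
expected_g2 = [0, 0] + [-c for c in f49[:ORDER - 2]]   # -q^2 f(-q^49)

print(residue_classes(pentagonal))   # [0, 1, 2, 5]
print(residue_classes(triangular))   # [0, 1, 3, 6]
print(g2 == expected_g2)             # True
```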

We thus have the following values of $g_{s}, h_{s}$: \begin{align}g_{2} = -q^{2}f(-q^{49}),\, g_{3} = g_{4} = g_{6} &= 0\notag\\ h_{6} = -7q^{6}f^{3}(-q^{49}) = 7g_{2}^{3},\,h_{2} = h_{4} = h_{5} &= 0\tag{8}\end{align} Using the relation $\sum h_{s} = (\sum g_{s})^{3}$ we get $\sum h_{s} = (g_{0} + g_{1} + g_{2} + g_{5})^{3}$ and on equating the terms based on their modulo classes on each side (after cubing RHS) we get $$\boxed{\begin{align}3(g_{0}^{2}g_{2} + g_{1}^{2}g_{0} + g_{2}^{2}g_{5})&= h_{2} = 0\notag\\ 3(g_{1}^{2}g_{2} + g_{2}^{2}g_{0} + g_{5}^{2}g_{1})&= h_{4} = 0\notag\\ 3(g_{0}^{2}g_{5} + g_{2}^{2}g_{1} + g_{5}^{2}g_{2})&= h_{5} = 0\notag\\ g_{2}^{3} + 6g_{0}g_{1}g_{5}&= h_{6} = 7g_{2}^{3}\notag\end{align}}\tag{9}$$ Putting $\alpha = g_{0}/g_{2}, \beta = g_{1}/g_{2}, \gamma = g_{5}/g_{2}$ we can rewrite the above equations as $$\boxed{\begin{align}\alpha \beta^{2} + \alpha^{2} + \gamma &= 0\notag\\ \beta \gamma^{2} + \beta^{2} + \alpha &= 0\notag\\ \gamma \alpha^{2} + \gamma^{2} + \beta &= 0\notag\\ \alpha\beta\gamma &= 1\notag\end{align}}\tag{10}$$ We now relate these quantities with the partition function $p(n)$.
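The relations $(9)$ are identities between power series, so they can be tested on truncations. A minimal sketch, with our own helper names and an arbitrary truncation order:

```python
def euler_product(order):
    """Coefficients of f(-q) = prod_{n>=1} (1 - q^n) up to q^(order-1)."""
    coeffs = [0] * order
    coeffs[0] = 1
    for n in range(1, order):
        for k in range(order - 1, n - 1, -1):
            coeffs[k] -= coeffs[k - n]
    return coeffs


def series_mul(a, b, order):
    """Product of two power series truncated to q^(order-1)."""
    out = [0] * order
    for i, x in enumerate(a):
        if x == 0:
            continue
        for j, y in enumerate(b):
            if i + j >= order:
                break
            out[i + j] += x * y
    return out


def prod3(a, b, c, order):
    return series_mul(series_mul(a, b, order), c, order)


def relation_residuals(order=120):
    """Left minus right hand sides of the four relations in (9), after
    dividing out the constant factors 3 and 6."""
    f = euler_product(order)
    g = {s: [c if k % 7 == s else 0 for k, c in enumerate(f)] for s in range(7)}
    g0, g1, g2, g5 = g[0], g[1], g[2], g[5]
    add = lambda *terms: [sum(t) for t in zip(*terms)]
    r2 = add(prod3(g0, g0, g2, order), prod3(g1, g1, g0, order), prod3(g2, g2, g5, order))
    r4 = add(prod3(g1, g1, g2, order), prod3(g2, g2, g0, order), prod3(g5, g5, g1, order))
    r5 = add(prod3(g0, g0, g5, order), prod3(g2, g2, g1, order), prod3(g5, g5, g2, order))
    r6 = [x - y for x, y in zip(prod3(g0, g1, g5, order), prod3(g2, g2, g2, order))]
    return r2, r4, r5, r6


print(all(not any(r) for r in relation_residuals()))  # True
```

The last residual checks $g_{0}g_{1}g_{5} = g_{2}^{3}$, which is the relation $\alpha\beta\gamma = 1$ in $(10)$.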

Relation with Partition Function $p(7n + 5)$

We have the generating function of the partition function $p(n)$ given by the formula $$\sum_{n = 0}^{\infty}p(n)q^{n} = \frac{1}{f(-q)}\tag{11}$$ We split the series on the left based on powers of $q$ modulo $7$ so that $\sum p(n)q^{n} = \sum_{s = 0}^{6}P_{s}$ with $P_{s}$ defined by $$P_{s} = \sum_{n = 0}^{\infty}p(7n + s)q^{7n + s}$$ We thus have the following equations \begin{align}P_{0} + P_{1} + P_{2} + P_{3} + P_{4} + P_{5} + P_{6} &= \frac{1}{f(-q)}\notag\\ g_{0} + g_{1} + g_{2} + g_{3} + g_{4} + g_{5} + g_{6} &= f(-q)\notag\end{align} Multiplying the above equations and arranging terms based on their modulo classes we get \begin{align}g_{0}P_{0} + g_{6}P_{1} + g_{5}P_{2} + g_{4}P_{3} + g_{3}P_{4} + g_{2}P_{5} + g_{1}P_{6}&= 1\notag\\ g_{1}P_{0} + g_{0}P_{1} + g_{6}P_{2} + g_{5}P_{3} + g_{4}P_{4} + g_{3}P_{5} + g_{2}P_{6}&= 0\notag\\ g_{2}P_{0} + g_{1}P_{1} + g_{0}P_{2} + g_{6}P_{3} + g_{5}P_{4} + g_{4}P_{5} + g_{3}P_{6}&= 0\notag\\ g_{3}P_{0} + g_{2}P_{1} + g_{1}P_{2} + g_{0}P_{3} + g_{6}P_{4} + g_{5}P_{5} + g_{4}P_{6}&= 0\notag\\ g_{4}P_{0} + g_{3}P_{1} + g_{2}P_{2} + g_{1}P_{3} + g_{0}P_{4} + g_{6}P_{5} + g_{5}P_{6}&= 0\notag\\ g_{5}P_{0} + g_{4}P_{1} + g_{3}P_{2} + g_{2}P_{3} + g_{1}P_{4} + g_{0}P_{5} + g_{6}P_{6}&= 0\notag\\ g_{6}P_{0} + g_{5}P_{1} + g_{4}P_{2} + g_{3}P_{3} + g_{2}P_{4} + g_{1}P_{5} + g_{0}P_{6}&= 0\notag\end{align} Using Cramer's rule we can calculate the desired sum $P_{5}$ corresponding to $p(7n + 5)$ as $P_{5} = D_{5}/D$ where $D$ and $D_{5}$ are the following determinants $$D = \begin{vmatrix}g_{0} & g_{6} & g_{5} & g_{4} & g_{3} & g_{2} & g_{1}\\
g_{1} & g_{0} & g_{6} & g_{5} & g_{4} & g_{3} & g_{2}\\
g_{2} & g_{1} & g_{0} & g_{6} & g_{5} & g_{4} & g_{3}\\
g_{3} & g_{2} & g_{1} & g_{0} & g_{6} & g_{5} & g_{4}\\
g_{4} & g_{3} & g_{2} & g_{1} & g_{0} & g_{6} & g_{5}\\
g_{5} & g_{4} & g_{3} & g_{2} & g_{1} & g_{0} & g_{6}\\
g_{6} & g_{5} & g_{4} & g_{3} & g_{2} & g_{1} & g_{0}\end{vmatrix}\tag{12}$$ $$D_{5} = -\begin{vmatrix}g_{1} & g_{0} & g_{6} & g_{5} & g_{4} & g_{2}\\
g_{2} & g_{1} & g_{0} & g_{6} & g_{5} & g_{3}\\
g_{3} & g_{2} & g_{1} & g_{0} & g_{6} & g_{4}\\
g_{4} & g_{3} & g_{2} & g_{1} & g_{0} & g_{5}\\
g_{5} & g_{4} & g_{3} & g_{2} & g_{1} & g_{6}\\
g_{6} & g_{5} & g_{4} & g_{3} & g_{2} & g_{0}\end{vmatrix}\tag{13}$$ Calculating the determinants above is a bit of a challenge, but it is made possible by the vanishing of some of the $g_{s}$ and the relations $(9)$.

Evaluation of determinants $D, D_{5}$

If we take a closer look at $D$ we can see that it is the determinant of a circulant matrix $A$ and therefore we can easily find the eigenvalues of this matrix and thereby calculate the determinant $D$ as a product of eigenvalues. If $\omega$ is a $7^{\text{th}}$ root of unity (including $\omega = 1$ also) then we can see that the expression $$g_{0} + \omega g_{1} + \omega^{2}g_{2} + \omega^{3}g_{3}+ \omega^{4}g_{4}+ \omega^{5}g_{5}+ \omega^{6}g_{6}$$ is an eigenvalue of $A$ and the vector $$(1, \omega^{-1}, \omega^{-2}, \omega^{-3}, \omega^{-4}, \omega^{-5}, \omega^{-6})$$ is the corresponding eigenvector. If $\omega$ is a primitive $7^{\text{th}}$ root of unity then all the $7^{\text{th}}$ roots of unity are given by powers of $\omega$ and thus for each of $t = 0, 1, 2, 3, 4, 5, 6$ we obtain the eigenvalue $$\lambda_{t} = g_{0} + \omega^{t} g_{1} + \omega^{2t}g_{2} + \omega^{3t}g_{3}+ \omega^{4t}g_{4}+ \omega^{5t}g_{5}+ \omega^{6t}g_{6}$$ and therefore we have the determinant $D$ given by $$D = \prod_{t = 0}^{6}(g_{0} + \omega^{t} g_{1} + \omega^{2t}g_{2} + \omega^{3t}g_{3}+ \omega^{4t}g_{4}+ \omega^{5t}g_{5}+ \omega^{6t}g_{6}) = \prod_{t = 0}^{6}\sum_{s = 0}^{6}\omega^{st}g_{s}$$ Using the definition of $g_{s} = g_{s}(q)$ we can easily see that $\omega^{st}g_{s}(q) = g_{s}(\omega^{t}q)$ and hence we have \begin{align}D&= \prod_{t = 0}^{6}\sum_{s = 0}^{6}\omega^{st}g_{s} = \prod_{t = 0}^{6}\sum_{s = 0}^{6}g_{s}(\omega^{t}q) = \prod_{t = 0}^{6}f(-\omega^{t}q)\notag\\ &= \prod_{t = 0}^{6}\prod_{n = 1}^{\infty}(1 - \omega^{nt}q^{n}) = \prod_{n = 1}^{\infty}\prod_{t = 0}^{6}(1 - \omega^{tn}q^{n})\notag\\ &= \prod_{n \not\equiv 0\pmod{7}}(1 - q^{7n})\prod_{n \equiv 0\pmod{7}}(1 - q^{n})^{7} = \frac{f^{8}(-q^{7})}{f(-q^{49})}\tag{14}\end{align} Before calculating $D_{5}$ we need to perform a direct calculation of the determinant $D$ using the usual definition of a determinant. 
To simplify the calculation we use the fact that $$g_{0} = \alpha g_{2}, g_{1} = \beta g_{2}, g_{5} = \gamma g_{2}, g_{3} = g_{4} = g_{6} = 0$$ and then we have $$D = g_{2}^{7}\begin{vmatrix}\alpha & 0 & \gamma & 0 & 0 & 1 & \beta\\
\beta & \alpha & 0 & \gamma & 0 & 0 & 1\\
1 & \beta & \alpha & 0 & \gamma & 0 & 0\\
0 & 1 & \beta & \alpha & 0 & \gamma & 0\\
0 & 0 & 1 & \beta & \alpha & 0 & \gamma\\
\gamma & 0 & 0 & 1 & \beta & \alpha & 0\\
0 & \gamma & 0 & 0 & 1 & \beta & \alpha\end{vmatrix} = g_{2}^{7}D'$$ Thanks to the presence of the zeroes in $D'$ and the fact that $\alpha\beta\gamma = 1$ the determinant $D'$ can be calculated by hand with not so unreasonable effort and we get \begin{align}D'&= (\alpha^{7} + \beta^{7} + \gamma^{7}) - 7(\alpha\beta^{5} + \beta\gamma^{5} + \gamma\alpha^{5})\notag\\ &\,\,\,\,\,\,\,\,+ 14(\alpha^{2}\beta^{3} + \beta^{2}\gamma^{3} + \gamma^{2}\alpha^{3}) + 8\tag{15}\end{align} In the same manner we can calculate $D_{5} = -g_{2}^{6}D'_{5}$ where $$D'_{5} = \begin{vmatrix}\beta & \alpha & 0 & \gamma & 0 & 1\\
1 & \beta & \alpha & 0 & \gamma & 0\\
0 & 1 & \beta & \alpha & 0 & 0\\
0 & 0 & 1 & \beta & \alpha & \gamma\\
\gamma & 0 & 0 & 1 & \beta & 0\\
0 & \gamma & 0 & 0 & 1 & \alpha\end{vmatrix}$$ and \begin{align}D'_{5}&= (\alpha\beta^{5} + \beta\gamma^{5} + \gamma\alpha^{5}) - 4(\alpha^{2}\beta^{3} + \beta^{2}\gamma^{3} + \gamma^{2}\alpha^{3})\notag\\ &\,\,\,\,\,\,\,\,+ 3(\alpha^{3}\beta + \beta^{3}\gamma + \gamma^{3}\alpha) - 8\tag{16}\end{align} Since we know the value of $D'$ in a closed form as $$D' = \frac{D}{g_{2}^{7}} = -\frac{f^{8}(-q^{7})}{q^{14}f^{8}(-q^{49})}\tag{17}$$ the real issue is to evaluate $D'_{5}$ in a closed form. For this we try to find relation between $D'$ and $D'_{5}$. We use a new set of variables $y_{1}, y_{2}, y_{3}$ defined by $$y_{1} = \alpha^{3}\beta, y_{2} = \beta^{3}\gamma, y_{3} = \gamma^{3}\alpha\tag{18}$$ and then apply equation $(10)$ to evaluate the following expressions \begin{align}\alpha^{2}\beta^{3}&= y_{1}y_{2} = - y_{1} - 1\tag{19a}\\ \beta^{2}\gamma^{3} &= y_{2}y_{3} = - y_{2} - 1\tag{19b}\\ \gamma^{2}\alpha^{3} &= y_{3}y_{1} = - y_{3} - 1\tag{19c}\end{align} $$y_{1}y_{2}y_{3} = 1\tag{20}$$ $$\alpha\beta^{5} = y_{1} - y_{2} + 1, \beta\gamma^{5} = y_{2} - y_{3} + 1, \gamma\alpha^{5} = y_{3} - y_{1} + 1\tag{21}$$ \begin{align}\alpha^{7}&= - y_{1}^{2} + y_{1} - y_{3} - 1\tag{22a}\\ \beta^{7}&= - y_{2}^{2} + y_{2} - y_{1} - 1\tag{22b}\\ \gamma^{7}&= - y_{3}^{2} + y_{3} - y_{2} - 1\tag{22c}\end{align} Using equation $(15)$ and the above relations between $y_{i}$ we get \begin{align}D'&= -(y_{1}^{2} + y_{2}^{2} + y_{3}^{2}) - 14(y_{1} + y_{2} + y_{3}) - 58\notag\\ &= -(y_{1} + y_{2} + y_{3})^{2} - 16(y_{1} + y_{2} + y_{3}) - 64\notag\\ &= -(y_{1} + y_{2} + y_{3} + 8)^{2}\notag\end{align} Using equation $(17)$ we get $$y_{1} + y_{2} + y_{3} + 8 = \pm\frac{f^{4}(-q^{7})}{q^{7}f^{4}(-q^{49})}$$ and clearly we can see that the expression $y_{1}$ begins with $-q^{-7}$ so that we need to take the negative sign on RHS in above equation. 
We finally have $$y_{1} + y_{2} + y_{3} = -\frac{f^{4}(-q^{7})}{q^{7}f^{4}(-q^{49})} - 8\tag{23}$$ In the same manner we can see that $$D'_{5} = 7(y_{1} + y_{2} + y_{3}) + 7 = -7\frac{f^{4}(-q^{7})}{q^{7}f^{4}(-q^{49})} - 49$$ We now have \begin{align}\sum_{n = 0}^{\infty}p(7n + 5)q^{7n + 5}&= P_{5} = \frac{D_{5}}{D} = \frac{-g_{2}^{6}D'_{5}}{g_{2}^{7}D'} = -\frac{D'_{5}}{g_{2}D'}\notag\\ &= \frac{1}{q^{2}f(-q^{49})}\cdot\frac{q^{14}f^{8}(-q^{49})}{f^{8}(-q^{7})}\left(7\frac{f^{4}(-q^{7})}{q^{7}f^{4}(-q^{49})} + 49\right)\notag\\ &= q^{5}\left(7\frac{f^{3}(-q^{49})}{f^{4}(-q^{7})} + 49q^{7}\frac{f^{7}(-q^{49})}{f^{8}(-q^{7})}\right)\notag\end{align} Cancelling $q^{5}$ from both sides and replacing $q^{7}$ by $q$ we get Ramanujan's identity $$\sum_{n = 0}^{\infty}p(7n + 5)q^{n} = 7\frac{f^{3}(-q^{7})}{f^{4}(-q)} + 49q\frac{f^{7}(-q^{7})}{f^{8}(-q)}$$ The above proof is totally elementary and it is probable that Ramanujan did something similar to obtain his formula. The same technique of splitting the series for $f(-q)$ and $1/f(-q)$ into multiple parts based on the residue classes of the powers of $q$ can be used to prove the identity $(2)$ for partitions modulo $5$. In the above technique the only hard part is the calculation of determinants by brute force which is somewhat laborious. For a person like Ramanujan with unmatched powers of symbolic manipulation this would have been child's play.
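The identity for $\sum p(7n + 5)q^{n}$ can be checked numerically by expanding both sides as truncated power series with integer coefficients. The following is only a sanity check under our own conventions, not part of the proof: the helper names (`euler_f`, `mul`, `power`, `inverse`, `check_mod7_identity`) are hypothetical, and series are stored as plain lists of coefficients.

```python
def euler_f(step, n):
    """Coefficients of f(-q^step) = prod_{m>=1} (1 - q^(step*m)) up to q^(n-1)."""
    c = [1] + [0] * (n - 1)
    for m in range(step, n, step):
        # multiply the truncated series by (1 - q^m)
        for i in range(n - 1, m - 1, -1):
            c[i] -= c[i - m]
    return c


def mul(a, b, n):
    """Product of two series (each of length >= n), truncated to n terms."""
    out = [0] * n
    for i in range(n):
        if a[i]:
            for j in range(n - i):
                out[i + j] += a[i] * b[j]
    return out


def power(a, e, n):
    """e-th power of a series, truncated to n terms."""
    out = [1] + [0] * (n - 1)
    for _ in range(e):
        out = mul(out, a, n)
    return out


def inverse(a, n):
    """Reciprocal of a series with constant term 1, truncated to n terms."""
    b = [1] + [0] * (n - 1)
    for m in range(1, n):
        b[m] = -sum(a[k] * b[m - k] for k in range(1, m + 1))
    return b


def check_mod7_identity(n_terms=15):
    size = 7 * n_terms                       # enough terms to read p(7n + 5)
    p = inverse(euler_f(1, size), size)      # 1/f(-q) generates p(n)
    lhs = [p[7 * n + 5] for n in range(n_terms)]
    f1_inv = p[:n_terms]
    f7 = euler_f(7, n_terms)
    t1 = mul(power(f7, 3, n_terms), power(f1_inv, 4, n_terms), n_terms)
    t2 = mul(power(f7, 7, n_terms), power(f1_inv, 8, n_terms), n_terms)
    # 7 f^3(-q^7)/f^4(-q) + 49 q f^7(-q^7)/f^8(-q)
    rhs = [7 * t1[n] + (49 * t2[n - 1] if n else 0) for n in range(n_terms)]
    return lhs, rhs


lhs, rhs = check_mod7_identity(15)
print(lhs == rhs)
```

Agreement of the first few coefficients proves nothing, of course, but it is a quick way to catch a sign or exponent slip in the final formula.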

Partition Congruence $p(49n + 47) \equiv 0\pmod{49}$

Ramanujan's motivation for the identity $(3)$ was to prove the partition congruence $$p(49n + 47) \equiv 0\pmod{49}\tag{24}$$ and he gave a very short proof of this congruence in the following manner. Using the binomial theorem it is easy to check that $$(1 - q)^{7} = 1 - q^{7} + 7J$$ where $J$ is a power series in $q$ with integral coefficients and hence we can write $$\frac{1 - q^{7}}{(1 - q)^{7}} = 1 + 7J$$ where $J$ again represents some power series in $q$ with integer coefficients (note that each instance of $J$ may represent a different power series in $q$). Replacing $q$ with $q^{2}, q^{3}, \ldots$ and multiplying the results we get $$\frac{(1 - q^{7})(1 - q^{14})(1 - q^{21})\cdots}{\{(1 - q)(1 - q^{2})(1 - q^{3})\cdots\}^{7}} = \frac{f(-q^{7})}{f^{7}(-q)}= 1 + 7J\tag{25}$$ Using identity $(3)$ we can write $$\dfrac{{\displaystyle \sum_{n = 0}^{\infty}p(7n + 5)q^{n + 1}}}{7f^{2}(-q^{7})} = qf^{3}(-q)\frac{f(-q^{7})}{f^{7}(-q)} + 7q^{2}\frac{f^{5}(-q^{7})}{f^{8}(-q)}$$ and using $(25)$ we can now write $$\dfrac{{\displaystyle \sum_{n = 0}^{\infty}p(7n + 5)q^{n + 1}}}{7f^{2}(-q^{7})} = qf^{3}(-q) + 7J = \sum_{n = 0}^{\infty}(-1)^{n}(2n + 1)q^{n(n + 1)/2 + 1} + 7J$$ We next need to analyze the coefficient of $q^{7n}$ on both sides. On the RHS we can see that \begin{align}\frac{n(n + 1)}{2} + 1 &\equiv 0\pmod{7}\notag\\ \Leftrightarrow n(n + 1) + 2 &\equiv 0\pmod{7}\notag\\ \Leftrightarrow n(n + 1) &\equiv 5\pmod{7}\notag\\ \Leftrightarrow 4n(n + 1) + 1 &\equiv 0\pmod{7}\notag\\ \Leftrightarrow (2n + 1)^{2} &\equiv 0\pmod{7}\notag\\ \Leftrightarrow (2n + 1) &\equiv 0\pmod{7}\notag\end{align} so that the coefficient of $q^{7n}$ on the RHS is a multiple of $7$. Since $f^{2}(-q^{7})$ is a power series in $q^{7}$ with integer coefficients, multiplying both sides by $7f^{2}(-q^{7})$ only combines coefficients of powers $q^{7m}$ among themselves. It follows that the coefficient of $q^{7n}$ in $\sum_{n = 0}^{\infty}p(7n + 5)q^{n + 1}$ is a multiple of $49$. Thus we have $p(49n + 47)\equiv 0\pmod{49}$.
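The congruence $(24)$ is easy to test for small $n$. The sketch below computes $p(n)$ by the standard "add one part size at a time" dynamic programme; the function name `partition_numbers` is our own choice for illustration.

```python
def partition_numbers(limit):
    """Return [p(0), p(1), ..., p(limit)] using a simple dynamic programme."""
    p = [1] + [0] * limit
    for part in range(1, limit + 1):
        # allow parts of size `part` in addition to all smaller parts
        for n in range(part, limit + 1):
            p[n] += p[n - part]
    return p


p = partition_numbers(49 * 5 + 47)
residues = [p[49 * n + 47] % 49 for n in range(6)]
print(residues)
```

The same table can be used to check the weaker congruence $p(7n + 5) \equiv 0 \pmod{7}$ which follows immediately from the identity for $\sum p(7n + 5)q^{n}$.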


A Continued Fraction for Error Function by Ramanujan

2014-06-21T16:00:00.002+05:30

Today's post is inspired by this question I asked some time back on MSE. After some effort (offering a bounty) I received a very good answer from a user on MSE. The question was first posed by Ramanujan in the "Journal of Indian Mathematical Society" 6th issue as Question no. 541, page 79 in the following manner:

Prove that $$\left(1 + \frac{1}{1\cdot 3} + \frac{1}{1\cdot 3\cdot 5} + \cdots\right) + \left(\cfrac{1}{1+}\cfrac{1}{1+}\cfrac{2}{1+}\cfrac{3}{1+}\cfrac{4}{1+\cdots}\right) = \sqrt{\frac{\pi e}{2}}\tag{1}$$ It turns out that the first series is intimately connected with the error function given by $$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_{0}^x e^{-t^2}\,dt\tag{2}$$ In his famous letter to G. H. Hardy, dated 16th January 1913, Ramanujan gave the following continued fraction for the integral used in the definition of the error function given above: $$\int_{0}^{a}e^{-x^{2}}\,dx = \frac{\sqrt{\pi}}{2} - \cfrac{e^{-a^{2}}}{2a+}\cfrac{1}{a+}\cfrac{2}{2a+}\cfrac{3}{a+}\cfrac{4}{2a+\cdots}\tag{3}$$ In this post we will establish the relations $(1)$ and $(3)$ along the lines indicated in the answer on MSE.

Summing the Given Series

We start by summing the given series $$S = 1 + \frac{1}{1\cdot 3} + \frac{1}{1\cdot 3\cdot 5} + \cdots$$ Clearly if we define a function $f(x)$ by $$f(x) = x + \frac{x^{3}}{1\cdot 3} + \frac{x^{5}}{1\cdot 3\cdot 5} + \cdots\tag{4}$$ then $S = f(1)$. It turns out that if we differentiate the series for $f(x)$ term by term then we obtain \begin{align} f'(x) &= 1 + x^{2} + \frac{x^{4}}{1\cdot 3} + \frac{x^{6}}{1\cdot 3\cdot 5} + \cdots\notag\\ &= 1 + x\left(x + \frac{x^{3}}{1\cdot 3} + \frac{x^{5}}{1\cdot 3\cdot 5} + \cdots\right)\notag\\ &= 1 + xf(x)\notag \end{align} so that the function $y = f(x)$ satisfies the following first order differential equation $$\frac{dy}{dx} - xy = 1, y(0) = 0$$ Clearly the integrating factor here is $e^{-x^{2}/2}$ and hence we have $$y = f(x) = e^{x^{2}/2}\int_{0}^{x}e^{-t^{2}/2}\,dt$$ and hence $$S = f(1) = \sqrt{e}\int_{0}^{1}e^{-t^{2}/2}\,dt$$ We can rewrite the sum $S$ in the following manner \begin{align} S &= \sqrt{e}\int_{0}^{1}e^{-t^{2}/2}\,dt\notag\\ &= \sqrt{e}\int_{0}^{\infty}e^{-t^{2}/2}\,dt - \sqrt{e}\int_{1}^{\infty}e^{-t^{2}/2}\,dt\notag\\ &= \sqrt{\frac{\pi e}{2}} - \sqrt{e}\int_{1}^{\infty}e^{-t^{2}/2}\,dt\notag \end{align} where the first integral is easily calculated from the famous integral $$\int_{0}^{\infty}e^{-t^{2}}\,dt = \frac{\sqrt{\pi}}{2}\tag{5}$$ It follows that the equation $(1)$ can be established if we show that $$\sqrt{e}\int_{1}^{\infty}e^{-t^{2}/2}\,dt = \cfrac{1}{1+}\cfrac{1}{1+}\cfrac{2}{1+}\cfrac{3}{1+}\cfrac{4}{1+\cdots}\tag{6}$$ It turns out that it is much easier to consider the function $$\phi(x) = e^{x^{2}/2}\int_{x}^{\infty}e^{-t^{2}/2}\,dt\tag{7}$$ and derive a continued fraction for $\phi(x)$ and then put $x = 1$ to get $(6)$.
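The closed form for $S$ can be compared with a direct summation of the series. In the sketch below the integral $\int_{0}^{1}e^{-t^{2}/2}\,dt$ is rewritten as $\sqrt{\pi/2}\,\operatorname{erf}(1/\sqrt{2})$ by the substitution $t = u\sqrt{2}$, so that the standard library function `math.erf` can be used; the function names `series_S` and `closed_S` are our own.

```python
import math


def series_S(terms=40):
    """Partial sum of 1 + 1/(1*3) + 1/(1*3*5) + ... with the given number of terms."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term /= 2 * k + 3      # next term divides by the next odd number
    return total


def closed_S():
    """sqrt(e) * integral_0^1 exp(-t^2/2) dt, expressed through erf."""
    return math.sqrt(math.e) * math.sqrt(math.pi / 2) * math.erf(1 / math.sqrt(2))


print(series_S(), closed_S())
```

The terms of the series decay faster than geometrically, so forty terms are far more than double precision needs.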

Derivatives of $\phi(x)$

If we differentiate $\phi(x)$ we obtain $$\phi'(x) = xe^{x^{2}/2}\int_{x}^{\infty}e^{-t^{2}/2}\,dt - e^{x^{2}/2}\cdot e^{-x^{2}/2} = x\phi(x) - 1\tag{8}$$ and a further differentiation gives $$\phi''(x) = \phi(x) + x\phi'(x) = \phi(x) + x(x\phi(x) - 1) = (1 + x^{2})\phi(x) - x$$ It is now easy to guess the pattern of $n^{\text{th}}$ derivative of $\phi(x)$ and we have $$\phi^{(n)}(x) = P_{n}(x)\phi(x) - Q_{n}(x)\tag{9}$$ where $P_{n}(x), Q_{n}(x)$ are polynomials. We will derive many properties of these polynomials starting with the following $$\boxed{\begin{align}P_{n + 1}(x) &= xP_{n}(x) + P_{n}'(x)\notag\\ Q_{n + 1}(x) &= P_{n}(x) + Q_{n}'(x)\notag\end{align}}\tag{10}$$ Clearly we can see that \begin{align} P_{n + 1}(x)\phi(x) - Q_{n + 1}(x) &= \phi^{(n + 1)}(x) = \{\phi^{(n)}(x)\}'\notag\\ &= \{P_{n}(x)\phi(x) - Q_{n}(x)\}'\notag\\ &= P_{n}(x)\phi'(x) + P_{n}'(x)\phi(x) - Q_{n}'(x)\notag\\ &= P_{n}(x)\{x\phi(x) - 1\} + P_{n}'(x)\phi(x) - Q_{n}'(x)\notag\\ &= \{xP_{n}(x) + P_{n}'(x)\}\phi(x) - \{P_{n}(x) + Q_{n}'(x)\}\notag \end{align} and we get the equations $(10)$ by comparing the LHS and RHS of the above equation.

The initial values of these polynomials are $P_{0}(x) = 1, P_{1}(x) = x$ and $Q_{0}(x) = 0, Q_{1}(x) = 1$. Differentiating the fundamental relation $\phi'(x) = x\phi(x) - 1$ $n$ times using Leibniz rule we get the following relation $$\phi^{(n + 1)}(x) = x\phi^{(n)}(x) + n\phi^{(n - 1)}(x)$$ if $n \geq 1$ and therefore we get \begin{align} P_{n + 1}(x)\phi(x) - Q_{n + 1}(x) &= x\{P_{n}(x)\phi(x) - Q_{n}(x)\} + n\{P_{n - 1}(x)\phi(x) - Q_{n - 1}(x)\}\notag\\ &= \{xP_{n}(x) + nP_{n - 1}(x)\}\phi(x) - \{xQ_{n}(x) + nQ_{n - 1}(x)\}\notag \end{align} and thus by comparing LHS and RHS we get $$P_{n + 1}(x) = xP_{n}(x) + nP_{n - 1}(x),\, Q_{n + 1}(x) = xQ_{n}(x) + nQ_{n - 1}(x)$$ Comparing the above recursive relation of $P_{n}(x)$ with the previous one given in equation $(10)$ we also get $P_{n}'(x) = nP_{n - 1}(x)$. We summarize our results as follows $$\boxed{\begin{align} P_{0}(x) &= 1,\, P_{1}(x) = x\notag\\ Q_{0}(x) &= 0,\, Q_{1}(x) = 1\notag\\ P_{n + 1}(x) &= xP_{n}(x) + nP_{n - 1}(x),\, n\geq 1\notag\\ Q_{n + 1}(x) &= xQ_{n}(x) + nQ_{n - 1}(x),\, n\geq 1\notag\\ P_{n}'(x) &= nP_{n - 1}(x),\,n\geq 1\notag \end{align}}\tag{11}$$ Next we establish the following identity which seems like a cross multiplication of $P$ and $Q$ $$Q_{n + 1}(x)P_{n}(x) - P_{n + 1}(x)Q_{n}(x) = (-1)^{n} n!\tag{12}$$ Clearly by direct substitution it holds for $n = 0, 1$ and we can try induction on $n$ to prove it for all $n$. We assume that the equation $(12)$ above holds for $n = m$ and try to prove it for $n = m + 1$. We have \begin{align} F(x) &= Q_{m + 2}(x)P_{m + 1}(x) - P_{m + 2}(x)Q_{m + 1}(x)\notag\\ &= \{xQ_{m + 1}(x) + (m + 1)Q_{m}(x)\}P_{m + 1}(x) - \{xP_{m + 1}(x) + (m + 1)P_{m}(x)\}Q_{m + 1}(x)\notag\\ &= - (m + 1)\{Q_{m + 1}(x)P_{m}(x) - P_{m + 1}(x)Q_{m}(x)\}\notag\\ &= - (m + 1)(-1)^{m}m! = (-1)^{m + 1}(m + 1)!\notag \end{align} and therefore the relation holds for $n = m + 1$ and hence by induction equation $(12)$ holds for all values of $n$.
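The relations collected in $(10)$, $(11)$ and $(12)$ can be checked mechanically for small $n$. The following sketch stores a polynomial as a list of integer coefficients (index equal to the power of $x$) and builds $P_{n}, Q_{n}$ from the three-term recurrence; all helper names are hypothetical and chosen only for this illustration.

```python
from math import factorial


def trim(a):
    """Drop trailing zero coefficients (keep at least one entry)."""
    a = list(a)
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a


def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
                 for i in range(n)])


def scale(c, a):
    return trim([c * t for t in a])


def mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return trim(out)


def derivative(a):
    return trim([i * a[i] for i in range(1, len(a))] or [0])


def build(count):
    """Return lists P, Q with P[n], Q[n] for 0 <= n < count (count >= 2)."""
    P, Q = [[1], [0, 1]], [[0], [1]]
    for n in range(1, count - 1):
        # P_{n+1} = x P_n + n P_{n-1}, and the same recurrence for Q
        P.append(add([0] + P[n], scale(n, P[n - 1])))
        Q.append(add([0] + Q[n], scale(n, Q[n - 1])))
    return P, Q


def cross(P, Q, n):
    """Q_{n+1} P_n - P_{n+1} Q_n as a coefficient list."""
    return add(mul(Q[n + 1], P[n]), scale(-1, mul(P[n + 1], Q[n])))


P, Q = build(12)
print([cross(P, Q, n) for n in range(5)])
```

For each $n$ the cross product collapses to a constant polynomial, which is exactly the content of $(12)$.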

Next we need an inequality concerning the sign of $\phi^{(n)}(x)$. We have $$(-1)^{n}\phi^{(n)}(x) > 0,\, n \geq 0\tag{13}$$ for all $x$. We have \begin{align} \int_{0}^{\infty}e^{-tx}e^{-t^{2}/2}\,dt &= e^{x^{2}/2}\int_{0}^{\infty}e^{-(x + t)^{2}/2}\,dt\notag\\ &= e^{x^{2}/2}\int_{x}^{\infty}e^{-z^{2}/2}\,dz\text{ (by putting }z = x + t)\notag\\ &= \phi(x)\notag \end{align} and hence, differentiating $n$ times with respect to $x$ under the integral sign, $$\phi^{(n)}(x) = (-1)^{n}\int_{0}^{\infty}t^{n}e^{-tx}e^{-t^{2}/2}\,dt$$ and the relation $(13)$ is now obvious because the integrand is positive.

Series Expansion for $P_{n}(x)$

We will now focus more on the polynomial $P_{n}(x)$. It turns out that we have $$P_{n}(x) = e^{-x^{2}/2}\frac{d^{n}}{dx^{n}}\{e^{x^{2}/2}\}\tag{14}$$ Clearly if we let $a_{n}(x) = e^{-x^{2}/2}\{e^{x^{2}/2}\}^{(n)}$ then $a_{0}(x) = 1$ and $a_{1}(x) = x$. Also we have \begin{align} a_{n + 1}(x) &= e^{-x^{2}/2}\{e^{x^{2}/2}\}^{(n + 1)}\notag\\ &= e^{-x^{2}/2}\{xe^{x^{2}/2}\}^{(n)}\notag\\ &= x\left(e^{-x^{2}/2}\{e^{x^{2}/2}\}^{(n)}\right) + ne^{-x^{2}/2}\{e^{x^{2}/2}\}^{(n - 1)}\notag\\ &= xa_{n}(x) + na_{n - 1}(x)\notag \end{align} so that $a_{n}(x)$ satisfies the same recurrence relation $(11)$ as $P_{n}(x)$. At the same time the initial values of $a_{n}(x)$ for $n = 0, 1$ also match with those of $P_{n}(x)$ and hence $a_{n}(x) = P_{n}(x)$ for all $x$ and $n$.

Next we start with the famous integral $$\frac{\sqrt{\pi}}{2} = \int_{0}^{\infty}e^{-x^{2}}\,dx$$ and put $z = x\sqrt{2}$ to get $$\sqrt{\frac{\pi}{2}} = \int_{0}^{\infty}e^{-z^{2}/2}\,dz = \frac{1}{2}\int_{-\infty}^{\infty}e^{-z^{2}/2}\,dz$$ Putting $z = t - x$ we get $$\sqrt{2\pi} = e^{-x^{2}/2}\int_{-\infty}^{\infty}e^{tx}e^{-t^{2}/2}\,dt$$ and therefore we have $$e^{x^{2}/2} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{tx}e^{-t^{2}/2}\,dt$$ and differentiating this relation $n$ times with respect to $x$ we get \begin{align} P_{n}(x)\,&= \frac{e^{-x^{2}/2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty}t^{n}e^{tx}e^{-t^{2}/2}\,dt\notag\\ &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}t^{n}e^{-(t - x)^{2}/2}\,dt\notag\\ &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}(u + x)^{n}e^{-u^{2}/2}\,du\notag\\ &= \sum_{k = 0}^{n}\binom{n}{k}x^{n - k}\left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}u^{k}e^{-u^{2}/2}\,du\right)\notag\\ &= \sum_{0 \leq 2k \leq n}\binom{n}{2k}x^{n - 2k}\left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}u^{2k}e^{-u^{2}/2}\,du\right)\notag\\ &= \sum_{0 \leq 2k \leq n}\binom{n}{2k}x^{n - 2k}\left(\frac{2}{\sqrt{2\pi}}\int_{0}^{\infty}u^{2k}e^{-u^{2}/2}\,du\right)\notag\\ &= \sum_{0 \leq 2k \leq n}\binom{n}{2k}x^{n - 2k}\left(\frac{2^{k}}{\sqrt{\pi}}\int_{0}^{\infty}t^{k - 1/2}e^{-t}\,dt\right)\text{ (by putting }u^{2} = 2t)\notag\\ &= \sum_{0 \leq 2k \leq n}\binom{n}{2k}x^{n - 2k}\cdot\dfrac{2^{k}\Gamma\left(k + \dfrac{1}{2}\right)}{\sqrt{\pi}}\notag\\ &= \sum_{0 \leq 2k \leq n}\binom{n}{2k}x^{n - 2k}\cdot\dfrac{(2k)!}{2^{k}k!}\notag \end{align} and we thus obtain the series for $P_{n}(x)$ given by $$P_{n}(x) = \sum_{0 \leq 2k \leq n}\binom{n}{2k}\frac{(2k)!}{2^{k}k!}x^{n - 2k}\tag{15}$$ It follows that $P_{n}(x)$ is even function if $n$ is even and is an odd function if $n$ is odd. 
Also using the greatest $k$ such that $2k\leq n$ in the above series we see that $$P_{2n}(x) \geq \frac{(2n)!}{2^{n}n!},\,\frac{P_{2n + 1}(x)}{x}\geq \frac{(2n + 1)!}{2^{n}n!}\tag{16}$$ for all $x$.
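The explicit formula $(15)$ can be compared with the recurrence $P_{n + 1}(x) = xP_{n}(x) + nP_{n - 1}(x)$ for small $n$. The sketch below represents each polynomial by its list of integer coefficients; the names `p_recurrence` and `p_closed` are our own.

```python
from math import comb, factorial


def p_recurrence(n):
    """Coefficients of P_n (index = power of x) from the three-term recurrence."""
    prev, cur = [1], [0, 1]            # P_0 = 1, P_1 = x
    if n == 0:
        return prev
    for m in range(1, n):
        nxt = [0] + cur                # x * P_m
        for i, c in enumerate(prev):
            nxt[i] += m * c            # + m * P_{m-1}
        prev, cur = cur, nxt
    return cur


def p_closed(n):
    """Coefficients of P_n from the explicit sum (15)."""
    coeffs = [0] * (n + 1)
    for k in range(n // 2 + 1):
        coeffs[n - 2 * k] = comb(n, 2 * k) * factorial(2 * k) // (2 ** k * factorial(k))
    return coeffs


print(p_closed(4))   # coefficients of x^4 + 6x^2 + 3, lowest power first
```

The integer division is exact here because $(2k)!/(2^{k}k!) = 1\cdot 3\cdot 5\cdots(2k - 1)$.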

Continued Fraction for $\phi(x)$

From $(13)$ we see that $\phi^{(2n)}(x) > 0$ for all $n$ and $x$ and hence noting that $P_{2n}(x) > 0$ for all $x$ (from $(16)$) we get $$\frac{Q_{2n}(x)}{P_{2n}(x)} < \phi(x)$$ for all $x$ and non-negative integers $n$. Similarly for $x > 0$ we see that $\phi^{(2n + 1)}(x) < 0$ and therefore $$\phi(x) < \frac{Q_{2n + 1}(x)}{P_{2n + 1}(x)}$$ for non-negative integers $n$.

It follows that if $x > 0$ and $n$ is a non-negative integer then $$\frac{Q_{2n}(x)}{P_{2n}(x)} < \phi(x)< \frac{Q_{2n + 1}(x)}{P_{2n + 1}(x)},\, \frac{Q_{2n + 2}(x)}{P_{2n + 2}(x)} < \phi(x)< \frac{Q_{2n + 1}(x)}{P_{2n + 1}(x)}\tag{17}$$ and this leads to $$0 < \phi(x) - \frac{Q_{2n}(x)}{P_{2n}(x)} < \frac{Q_{2n + 1}(x)}{P_{2n + 1}(x)} - \frac{Q_{2n}(x)}{P_{2n}(x)} = \frac{(2n)!}{P_{2n}(x)P_{2n + 1}(x)}$$ and $$0 < \frac{Q_{2n + 1}(x)}{P_{2n + 1}(x)} - \phi(x) < \frac{Q_{2n + 1}(x)}{P_{2n + 1}(x)} - \frac{Q_{2n + 2}(x)}{P_{2n + 2}(x)} = \frac{(2n + 1)!}{P_{2n + 1}(x)P_{2n + 2}(x)}$$ by using the relation $(12)$. We can combine the above inequalities and write $$0 < \left|\phi(x) - \frac{Q_{n}(x)}{P_{n}(x)}\right| < \frac{n!}{P_{n}(x)P_{n + 1}(x)}\tag{18}$$ for all $x > 0$ and all non-negative integers $n$. Using the estimates for $P_{n}(x)$ given in $(16)$ we see that $$\lim_{n \to \infty}\frac{Q_{n}(x)}{P_{n}(x)} = \phi(x)\tag{19}$$ for all $x > 0$.

From the recurrence relations $(11)$ it is now clear that $Q_{n}(x)/P_{n}(x)$ is the $n^{\text{th}}$ convergent of the continued fraction $$\cfrac{1}{x + }\cfrac{1}{x + }\cfrac{2}{x + }\cfrac{3}{x + }\cfrac{4}{x + \cdots}$$ and from $(19)$ it follows that these convergents do converge to $\phi(x)$ if $x > 0$. It follows that $$\phi(x) = e^{x^{2}/2}\int_{x}^{\infty}e^{-t^{2}/2}\,dt = \cfrac{1}{x + }\cfrac{1}{x + }\cfrac{2}{x + }\cfrac{3}{x + }\cfrac{4}{x + \cdots}\tag{20}$$ for positive values of $x$. Putting $x = 1$ we get equation $(6)$ and thus the problem posed by Ramanujan in equation $(1)$ is solved. The continued fraction in equation $(3)$ is equivalent to $$e^{a^{2}}\int_{a}^{\infty}e^{-t^{2}}\,dt = \cfrac{1}{2a + }\cfrac{1}{a + }\cfrac{2}{2a + }\cfrac{3}{a + }\cfrac{4}{2a + \cdots}$$ and it can be proved in the manner similar to the continued fraction of $\phi(x)$. We only need to note that if $$\psi(x) = e^{x^{2}}\int_{x}^{\infty}e^{-t^{2}}\,dt
$$ then $\psi'(x) = 2x\psi(x) - 1$ so that the polynomials $P_{n}(x), Q_{n}(x)$ corresponding to $\psi^{(n)}(x) = P_{n}(x)\psi(x) - Q_{n}(x)$ will have the following recurrence relations $$\boxed{\begin{align} P_{0}(x) &= 1,\, P_{1}(x) = 2x\notag\\ Q_{0}(x) &= 0,\, Q_{1}(x) = 1\notag\\ P_{n + 1}(x) &= 2xP_{n}(x) + 2nP_{n - 1}(x)\notag\\ Q_{n + 1}(x) &= 2xQ_{n}(x) + 2nQ_{n - 1}(x)\notag \end{align}}\tag{21}$$ and like the case of $\phi(x)$ these recurrence relations lead to the continued fraction $$\psi(x) = e^{x^{2}}\int_{x}^{\infty}e^{-t^{2}}\,dt = \cfrac{1}{2x +}\cfrac{2}{2x +}\cfrac{4}{2x +}\cfrac{6}{2x +}\cfrac{8}{2x + \cdots}$$ and it can be easily simplified by cancelling $2$ from numerator and denominator at appropriate terms in the continued fraction to get the final version as mentioned by Ramanujan $$\psi(x) = e^{x^{2}}\int_{x}^{\infty}e^{-t^{2}}\,dt = \cfrac{1}{2x +}\cfrac{1}{x +}\cfrac{2}{2x +}\cfrac{3}{x +}\cfrac{4}{2x + \cdots}\tag{22}$$
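The continued fraction $(20)$ and Ramanujan's relation $(1)$ can be tested numerically. In the sketch below $\phi(x)$ is computed as $e^{x^{2}/2}\sqrt{\pi/2}\,\operatorname{erfc}(x/\sqrt{2})$ via the standard library function `math.erfc`, and the $n^{\text{th}}$ convergent is evaluated from the innermost term outwards (an implementation choice of ours which avoids the large values of $P_{n}, Q_{n}$). The function names are hypothetical.

```python
import math


def phi(x):
    """e^{x^2/2} * integral_x^inf e^{-t^2/2} dt, written through erfc."""
    return math.exp(x * x / 2) * math.sqrt(math.pi / 2) * math.erfc(x / math.sqrt(2))


def convergent(x, n):
    """n-th convergent Q_n(x)/P_n(x) of 1/(x+ 1/(x+ 2/(x+ 3/(x+ ...))))."""
    tail = 0.0
    for k in range(n - 1, 0, -1):      # partial numerators n-1, ..., 2, 1
        tail = k / (x + tail)
    return 1.0 / (x + tail)


def series_S(terms=40):
    """Partial sum of 1 + 1/(1*3) + 1/(1*3*5) + ..."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term /= 2 * k + 3
    return total


x = 1.0
print(convergent(x, 10), phi(x), convergent(x, 11))
print(series_S() + convergent(x, 400), math.sqrt(math.pi * math.e / 2))
```

The first line of output illustrates the bracketing in $(17)$: even convergents lie below $\phi(x)$ and odd convergents above. Convergence at $x = 1$ is fairly slow, which is why a few hundred terms are used in the second line.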
Note: Some of the proofs are taken from an online paper by Omran Kouba (who provided an answer to my question posted on MSE).


Playing With Partitions: Euler's Pentagonal Theorem

2014-05-21T22:27:00.000+05:30

This post was originally written for MSE blog. Please click here to read.


Theories of Exponential and Logarithmic Functions: Part 3

2014-05-11T14:46:00.000+05:30

In this concluding post on the theories of exponential and logarithmic functions we will present the most intuitive and obvious approach to define the expression $a^{b}$ directly without going to the number $e$ and the function $\log x$. This approach stems from the fact that an irrational number can be approximated by rational numbers and we can find as good approximations as we want. The idea is that if $b$ is irrational and $a > 0$ then we have many rational approximations $b', b'', \ldots $ to $b$ and the numbers $a^{b'}, a^{b''}, \ldots$ would be the approximations to the number $a^{b}$ being defined. Inherent in such a procedure is the belief that we can find as good approximations to $a^{b}$ as we want by choosing sufficiently good rational approximations to $b$. Thus we can see that the numbers $$2^{1}, 2^{1.4}, 2^{1.41}, 2^{1.414}, 2^{1.4142}, \ldots$$ are approximations to the number $2^{\sqrt{2}}$.
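The approximations $2^{1}, 2^{1.4}, 2^{1.41}, \ldots$ can be tabulated directly. The sketch below builds the decimal truncations of $\sqrt{2}$ exactly as rational numbers using `math.isqrt` and then evaluates the rational powers in floating point; the floating point power operator is used only as a convenient stand-in for the rational powers $a^{p/q}$ assumed in the text, and the function names are our own.

```python
from fractions import Fraction
from math import isqrt, sqrt


def sqrt2_truncations(digits):
    """Decimal truncations 1, 1.4, 1.41, ... of sqrt(2) as exact fractions."""
    return [Fraction(isqrt(2 * 10 ** (2 * d)), 10 ** d) for d in range(digits + 1)]


def power_approximations(a, digits):
    """Values a^(b_n) for the truncations b_n of sqrt(2)."""
    return [a ** float(b) for b in sqrt2_truncations(digits)]


for b, value in zip(sqrt2_truncations(8), power_approximations(2.0, 8)):
    print(f"{float(b):.8f}  {value:.10f}")
```

The exponents form a non-decreasing sequence of rationals bounded above by $\sqrt{2}$, and the printed values increase towards $2^{\sqrt{2}}$ accordingly.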

Intuitive Definition of $a^{b}$

In order to make the definition described in previous paragraph more precise we start with the assumption that $a > 0$ and $b$ is any real number. If $\{b_{n}\}$ is a sequence of rational numbers such that $\lim_{n \to \infty}b_{n} = b$ then we define $$a^{b} = \lim_{n \to \infty}a^{b_{n}}\tag{1}$$ While this seems to be the most obvious approach to the theory of exponential and logarithmic functions it has two stumbling blocks:

We need to show that the limit in definition $(1)$ exists
If there are two sequences $\{b_{n}\}, \{c_{n}\}$ both tending to $b$ then we must show that $\lim_{n \to \infty}a^{b_{n}} = \lim_{n \to \infty}a^{c_{n}}$

Both these statements are obvious if $a = 1$. In the further development of the theory we can assume that $a > 1$ and the case $0 < a < 1$ can be treated by noting that $a^{b} = 1/(1/a)^{b}$ where $b$ is rational and $(1/a) > 1$.

We first show that there is a sequence $b_{n}$ tending to $b$ such that $\lim_{n \to \infty}a^{b_{n}}$ exists. Clearly we can choose any monotonically increasing sequence $b_{n}$ of rationals which tends to $b$. By any theory of real numbers (Dedekind cuts or Cantor's construction of reals via Cauchy sequences of rationals) it is possible to find such a sequence $b_{n}$ of rationals. In practice we may choose $b_{n}$ to consist of decimal approximations to $b$, adding one extra digit with each term of the sequence. Note that such a sequence $b_{n}$ is bounded above (because it is increasing and tends to the limit $b$). This means that there is a rational number $B$ such that $b_{n} \leq B$ for all $n$. Now we can see that the sequence $x_{n} = a^{b_{n}}$ is increasing (because $b_{n}$ is increasing and $a > 1$) and at the same time it is bounded above by $a^{B}$. It follows that $x_{n}$ tends to a positive limit.

Next we show that if there are two sequences $b_{n}, c_{n}$ of rationals each tending to $b$ then $\lim_{n \to \infty}a^{b_{n}} = \lim_{n \to \infty}a^{c_{n}}$. Clearly if we put $d_{n} = b_{n} - c_{n}$ then $d_{n}$ is a sequence of rationals tending to $0$. We will first show that $a^{d_{n}} \to 1$ as $n \to \infty$. Let $\epsilon > 0$ be an arbitrary number. We know from previous post that $a^{1/n} \to 1$ and $a^{-1/n} \to 1$ as $n \to \infty$ hence there is an integer $m$ such that $$|a^{-1/n} - 1| < \epsilon,\,\, |a^{1/n} - 1| < \epsilon\tag{2}$$ whenever $n \geq m$. Since $d_{n} \to 0$ it follows that there is a positive integer $N$ such that $|d_{n}| < 1/m $ for all $n \geq N$. This means that $$a^{-1/m} < a^{d_{n}} < a^{1/m}\tag{3}$$ for all $n \geq N$. From equations $(2)$ and $(3)$ we can see that $$|a^{d_{n}} - 1| < \epsilon$$ whenever $n \geq N$. It follows that $a^{d_{n}} \to 1$ as $n \to \infty$. Now we can see that $$\lim_{n \to \infty}a^{b_{n}} = \lim_{n \to \infty}a^{d_{n} + c_{n}} = \lim_{n \to \infty}a^{d_{n}}\lim_{n \to \infty}a^{c_{n}} = \lim_{n \to \infty}a^{c_{n}}$$ Thus both the stumbling blocks in definition $(1)$ are taken care of and we can proceed further.

We next prove that if $b > 0$ then $a^{b} > 1$. Since $b > 0$ there must be an increasing sequence $b_{n}$ of positive rationals tending to $b$. Clearly we have $$b_{n} > b_{1} > 0$$ for all $n > 1$ and hence it follows that $$a^{b_{n}} > a^{b_{1}} > 1$$ for all $n > 1$. Taking limits as $n \to \infty$ we see that $a^{b} \geq a^{b_{1}} > 1$.

We are now in a position to study the function $f(x) = a^{x}$. As before we will only need to consider the case $a > 1$ and reader can state and prove results for the case $0 < a < 1$ himself. First we see that the law of exponents is valid i.e. $$a^{x}a^{y} = a^{x + y}\tag{4}$$ for all $x, y$. Clearly let $x_{n}, y_{n}$ be sequences of rationals tending to $x, y$ so that $(x_{n} + y_{n}) \to (x + y)$. Then we have $a^{x_{n}}a^{y_{n}} = a^{x_{n} + y_{n}}$ and taking limits when $n \to \infty$ we are done. In similar fashion we can prove usual laws of exponents.

We next show that $f(x)$ is strictly increasing for all $x$. Let $x > y$ so that $x - y > 0$ and then $a^{x - y} > 1$. Multiplying by $a^{y} > 0$ on both sides we get $a^{x - y}a^{y} > a^{y}$ i.e. $a^{x} > a^{y}$. From this point it is easy to establish most of the common inequalities regarding exponents and we will not go into details. In particular the inequalities related to $\alpha, \beta, r, s$ in the last post all hold true for all positive real numbers $r, s$. However there is a caveat. The inequalities may become weak if we try to construct sequences of positive rationals tending to $r, s$ and then letting $n \to \infty$. But we don't really need the strict versions of those inequalities.

It will be important now to establish the simple but fundamental limit $$\lim_{x \to 0}a^{x} = 1\tag{5}$$ Let $\epsilon > 0$ be arbitrary. Then there is a positive integer $m$ such that $$|a^{-1/n} - 1| < \epsilon,\,\, |a^{1/n} - 1| < \epsilon$$ for $n \geq m$. Clearly if we choose $x \in (-1/m, 1/m)$ then $$a^{-1/m} < a^{x} < a^{1/m}$$ and hence $|a^{x} - 1| < \epsilon$. Thus we can choose $\delta = 1/m$ and then we can see that $|a^{x} - 1| < \epsilon$ whenever $0 < |x| < \delta$. Thus $a^{x} \to 1$ as $x \to 0$.

Just as we showed in the last post that $n(x^{1/n} - 1)$ decreases as $n$ increases, we can show that the function $g(x) = \dfrac{a^{x} - 1}{x}$ is increasing for all $x > 0$. Also the function $g(x)$ satisfies $(a - 1) \geq g(x) \geq a^{x - 1} (a - 1)$ for all $a > 1$ and $0 < x \leq 1$. It follows that as $x \to 0^{+}$ the limit $$\lim_{x \to 0^{+}}\frac{a^{x} - 1}{x} = \phi(a)$$ exists and $\phi(a) \geq (a - 1)/a$. If $x \to 0^{-}$ then we put $x = -y$ so that $y \to 0^{+}$ and then we have $$\lim_{x \to 0^{-}}\frac{a^{x} - 1}{x} = \lim_{y \to 0^{+}}\frac{a^{y} - 1}{ya^{y}} = \phi(a)/1 = \phi(a)$$ It follows that the limit $$\lim_{x \to 0}\frac{a^{x} - 1}{x} = \phi(a)\tag{6}$$ defines a function $\phi(a)$ which is defined for all $a > 0$.

We can also see that $\phi(1) = 0$ and $(a - 1)/a \leq \phi(a) \leq (a - 1)$ for $a > 1$. Thus $\phi(a) > 0$ for $a > 1$ and it can be easily seen that $\phi(1/a) = -\phi(a)$ so that $\phi(a) < 0$ for $0 < a < 1$. It can be established that $\phi(ab) = \phi(a) + \phi(b)$ for all positive $a, b$ and $\phi(a/b) = \phi(a) - \phi(b)$. Thus if $a > b$ then $a/b > 1$ and we have $\phi(a/b) > 0$ so that $\phi(a) > \phi(b)$.

It follows that the function $\phi(a)$ is defined for all $a > 0$ and it is strictly increasing function of $a$. It satisfies the properties $$\phi(1) = 0, \phi(1/a) = -\phi(a), \phi(ab) = \phi(a) + \phi(b), \phi(a/b) = \phi(a) - \phi(b)$$ It is now time to identify this function $\phi(a)$ with $\log a$ and thus we define $$\log x = \lim_{b \to 0}\frac{x^{b} - 1}{b}\tag{7}$$ for all $x > 0$. Just to repeat the properties of $\log$ function established in terms of $\phi$ we have \begin{align} \log 1 = 0, \log (1/x) = -\log x,\tag{8a}\\ \log(xy) = \log x + \log y,\tag{8b}\\ \log(x/y) = \log x - \log y\tag{8c} \end{align} for all positive $x, y$. Also $\log x$ is a strictly increasing function of $x$ and $$\frac{x - 1}{x}\leq \log x \leq (x - 1)\tag{9}$$ for $x > 1$. From this last equation we can easily show that $$\lim_{x \to 1}\frac{\log x}{x - 1} = 1\text{ or equivalently }\lim_{x \to 0}\frac{\log(1 + x)}{x} = 1\tag{10}$$ and this limit shows that the derivative of $\log x$ is $1/x$ (see previous post).
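The definition $(7)$ and the bounds $(9)$ can be illustrated numerically. The sketch below evaluates the quotient $(x^{b} - 1)/b$ for $b = 1, 1/2, 1/4, \ldots$ and compares the result with the library function `math.log`, which is used here only as an independent reference value; the function name `quotient` is our own.

```python
import math


def quotient(x, b):
    """The difference quotient (x^b - 1)/b whose limit as b -> 0 defines log x."""
    return (x ** b - 1.0) / b


x = 3.0
values = [quotient(x, 2.0 ** (-k)) for k in range(21)]
for k in (0, 5, 10, 20):
    print(k, values[k])
print("reference:", math.log(x))
```

For $x > 1$ the values decrease as $b$ decreases to $0$, starting from $x - 1$ at $b = 1$ and staying above $(x - 1)/x$, in agreement with $(9)$.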

We can also find derivative of $f(x) = a^{x}$. We have $$(a^{x})' = \lim_{h \to 0}\frac{a^{x + h} - a^{x}}{h} = a^{x} \lim_{h \to 0}\frac{a^{h} - 1}{h} = a^{x}\log a\tag{11}$$ We next show that $$\log a^{b} = b\log a\tag{12}$$ for all $a > 0$ and all $b$. Clearly from the property $\log(ab) = \log a + \log b$ it is easy to show that $(12)$ holds if $b$ is rational. If $b$ is irrational then we can take a sequence $b_{n}$ of rationals tending to $b$ and then we have $\log a^{b_{n}} = b_{n}\log a$. Taking limits when $n \to \infty$ and noting that both $\log x$ and $a^{x}$ are continuous we get $\log a^{b} = b\log a$.

Now $\log 2 > 0$ and hence given any number $M > 0$ we can find an integer $n$ such that $n \log 2 > M$. Let $N = 2^{n}$. If $x > N$ then $$\log x > \log N = \log 2^{n} = n\log 2 > M$$ It follows that $\log x \to \infty$ as $x \to \infty$. Noting that $\log (1/x) = -\log x$ we get $\log x \to -\infty$ as $x \to 0^{+}$. It follows that $\log x$ is a strictly increasing function from $\mathbb{R}^{+}$ to $\mathbb{R}$ and hence there is a unique number $e > 1$ such that $\log e = 1$.

Then for any real $x$ we have $\log e^{x} = x\log e = x$ so that if $y = e^{x}$ then $x = \log y$. It turns out that the specific power $e^{x}$ is the inverse of the logarithm function. Also we note that $(e^{x})' = e^{x}\log e = e^{x}$ so that $e^{x}$ is its own derivative. We immediately get the formula $$\lim_{x \to 0}\frac{e^{x} - 1}{x} = 1\tag{13}$$ We will now show that $$e^{x} = \lim_{t \to \infty}\left(1 + \frac{x}{t}\right)^{t} = \lim_{t \to -\infty}\left(1 + \frac{x}{t}\right)^{t}\tag{14}$$ Note that we can more easily combine these two limits into one and show instead that $$\lim_{h \to 0}(1 + xh)^{1/h} = e^{x}$$ Clearly we can see that \begin{align} \lim_{h \to 0}\log(1 + xh)^{1/h} &= \lim_{h \to 0}\frac{\log(1 + xh)}{h}\notag\\ &= \lim_{h \to 0}x\cdot\frac{\log(1 + xh)}{xh} = x\notag \end{align} and by continuity of $\log x$ and $e^{x}$ we get $\lim_{h \to 0}(1 + xh)^{1/h} = e^{x}$. If we replace $t$ in $(14)$ by positive integer $n$ then we immediately get the usual formulas $$e^{x} = \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^{n} = \lim_{n \to \infty}\left(1 - \frac{x}{n}\right)^{-n},\,\, e = \lim_{n \to \infty}\left(1 + \frac{1}{n}\right)^{n}\tag{15}$$ Note that the infinite series for $e^{x}$ can either be proved using limits above or using the Taylor's series approach based on derivatives. We have now been able to derive all the common properties of $a^{x}, e^{x}, \log x$ and have established the derivative formulas. The interested reader can carry on from here and establish any property of these functions without any hassle.

Note: This post is inspired by this and this answer on MSE.


Theories of Exponential and Logarithmic Functions: Part 2

2014-05-10T13:37:00.002+05:30

Exponential Function as a Limit

In the last post we developed the theory of exponential and logarithmic function using the standard approach of defining logarithm as an integral. In this post we will examine various alternative approaches to develop a coherent theory of these functions. We will start with the most common definition of $\exp(x)$ as the limit of a specific sequence. For users of MSE this is the approach outlined in this answer on MSE.

Thus for all $x \in \mathbb{R}$ we define $$\exp(x) = \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^{n}\tag{1}$$ In order that this definition is valid we must show that the limit in question exists for all $x$. Clearly it exists for $x = 0$ and the limit is $1$ so that $\exp(0) = 1$. Let us first suppose that $x > 0$. Then we have by the binomial theorem \begin{align} F(x, n) &= \left(1 + \frac{x}{n}\right)^{n}\notag\\ &= 1 + x + \frac{n(n - 1)}{2!}\left(\frac{x}{n}\right)^{2} + \cdots + \text{ up to }(n + 1)\text{ terms}\notag\\ &= 1 + x + \dfrac{\left(1 - \dfrac{1}{n}\right)}{2!}x^{2} + \dfrac{\left(1 - \dfrac{1}{n}\right)\left(1 - \dfrac{2}{n}\right)}{3!}x^{3} + \cdots\notag \end{align} It can be easily seen that as $n$ increases both the number of terms in the expansion of $F(x, n)$ and the value of each term (from the third term onwards) increase. It follows that for $x > 0$ the sequence $F(x, n)$ increases as $n$ increases.

Next we need to show that the sequence $F(x, n)$ is bounded above by some constant independent of $n$. Clearly we can see that $$F(x, n) \leq 1 + x + \frac{x^{2}}{2!} + \cdots + \frac{x^{n}}{n!}$$ and if the series on the right is extended as an infinite series then it is convergent. It follows that $F(x, n)$ is bounded by a fixed constant independent of $n$. Thus we can see that $F(x, n)$ is an increasing and bounded sequence and hence $\lim_{n \to \infty}F(x, n)$ exists. It follows that the definition $(1)$ for $\exp(x)$ makes sense for $x > 0$.

We next need to see that the same definition is valid for negative $x$ also. To see that we need to consider another sequence $G(x, n) = \left(1 - \dfrac{x}{n}\right)^{-n}$ for $x > 0$. We will show that this sequence decreases. Clearly we have by the binomial theorem (for negative integral exponent) \begin{align} G(x, n) &= 1 + x + \frac{n(n + 1)}{2!}\left(\frac{x}{n}\right)^{2} + \cdots\notag\\ &= 1 + x + \dfrac{\left(1 + \dfrac{1}{n}\right)}{2!}x^{2} + \dfrac{\left(1 + \dfrac{1}{n}\right)\left(1 + \dfrac{2}{n}\right)}{3!}x^{3} + \cdots\notag \end{align} Note that the above derivation is valid only when $0 < x/n < 1$ and this will be the case after a certain value of $n$. Clearly each term in the series above is decreasing as $n$ increases. It follows that for $x > 0$ the sequence $G(x, n)$ starts to decrease after a certain value of $n$ onwards. It should also be clear that after a certain value of $n$ onwards we have $G(x, n) \geq 1 + x$. It follows that $\lim_{n \to \infty}G(x, n)$ exists for $x > 0$ and this limit is not less than $(1 + x)$. If we note carefully we find that $G(x, n) = 1/F(-x, n)$ and hence it follows that $\lim_{n \to \infty}F(-x, n)$ exists for all $x > 0$ and this limit is positive.

We have thus shown that $\lim_{n \to \infty}F(x, n)$ exists and is positive for all values of $x$. Thus the function $\exp(x)$ is well defined and is positive for all values of $x$. In other words $\exp(x)$ is a function from $\mathbb{R}$ to $\mathbb{R}^{+}$. We next show that $$\exp(x + y) = \exp(x)\exp(y),\,\, \exp(-x) = 1/\exp(x)\tag{2}$$ for all $x, y$. It would be better if we first show $\exp(-x) = 1/\exp(x)$ for all $x$. Clearly this holds for $x = 0$ and from the nature of the relation it is sufficient to prove it for $x > 0$. Again the functions $F(x, n), G(x, n)$ help us in a very nice way.

First we can see that $$\frac{F(x, n)}{G(x, n)} = \left(1 - \frac{x^{2}}{n^{2}}\right)^{n}$$ and after a certain value of $n$ onwards the expression on the right is less than $1$. It follows that $G(x, n) > F(x, n)$ after a certain value of $n$ onwards. Also note that $F(x, n)$ increases and tends to $\exp(x)$ as $n \to \infty$ hence $F(x, n) \leq \exp(x)$. We can now see that if $f(x, n) = G(x, n) - F(x, n)$ then \begin{align} 0 < f(x, n) &= \left(1 - \frac{x}{n}\right)^{-n} - \left(1 + \frac{x}{n}\right)^{n}\notag\\ &= \left(1 + \frac{x}{n}\right)^{n}\left\{\left(1 - \frac{x^{2}}{n^{2}}\right)^{-n} - 1\right\}\notag\\ &\leq \exp(x)\left\{\left(1 - \frac{x^{2}}{n}\right)^{-1} - 1\right\}\notag\\ &= \frac{x^{2}\exp(x)}{n - x^{2}}\notag \end{align} Letting $n \to \infty$ and using Squeeze theorem we get $\lim_{n \to \infty}G(x, n) = \lim_{n \to \infty}F(x, n)$ and noting that $G(x, n) = 1/F(-x, n)$ it follows that $\exp(-x) = 1/\exp(x)$. At the same time we have shown that $$\exp(x) = \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^{n} = \lim_{n \to \infty}\left(1 - \frac{x}{n}\right)^{-n}\tag{3}$$ Using the above result it is easy to establish the series for $\exp(x)$ namely $$\exp(x) = 1 + x + \frac{x^{2}}{2!} + \frac{x^{3}}{3!} + \cdots\tag{4}$$ The result is clearly true for $x = 0$. For $x > 0$ let us define $$E(x, n) = 1 + x + \frac{x^{2}}{2!} + \cdots + \frac{x^{n}}{n!}$$ so that the limit $\lim_{n \to \infty}E(x, n) = E(x)$ gives the infinite series on the right side of equation $(4)$. From the expansions of $F(x, n), G(x, n)$ via binomial theorem we can easily see that $$F(x, n) \leq E(x, n) \leq G(x, n)$$ Letting $n \to \infty$ and noting that $\lim_{n \to \infty}F(x, n) = \lim_{n \to \infty}G(x, n) = \exp(x)$ we see that $\exp(x) = \lim_{n \to \infty}E(x, n)$. 
To establish the same result for $x < 0$ we need only note that the infinite series $E(x)$ satisfies the relation $E(x)E(-x) = 1$ for all $x$ and this is clearly matching the relation $\exp(x)\exp(-x) = 1$.
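As a quick numerical sanity check of the sandwich $F(x, n) \leq E(x, n) \leq G(x, n)$ and of the bound on $G(x, n) - F(x, n)$ obtained above, here is a minimal Python sketch. The function names `F`, `G`, `E` simply mirror the notation of the text, the sample value $x = 1.5$ is my own choice, and `math.exp` is used only as a reference value for $\exp(x)$. Floating point arithmetic can illustrate the inequalities but of course proves nothing.

```python
import math

def F(x, n):
    """(1 + x/n)^n, the increasing sequence from the text."""
    return (1 + x / n) ** n

def G(x, n):
    """(1 - x/n)^(-n), meaningful for n > x."""
    return (1 - x / n) ** (-n)

def E(x, n):
    """Partial sum 1 + x + x^2/2! + ... + x^n/n!, built term by term."""
    total, term = 1.0, 1.0
    for k in range(1, n + 1):
        term *= x / k          # term is now x^k / k!
        total += term
    return total

x = 1.5  # illustrative value, any x > 0 with n > x^2 would do
for n in (10, 100, 1000):
    gap = G(x, n) - F(x, n)
    bound = x * x * math.exp(x) / (n - x * x)
    print(n, F(x, n), E(x, n), G(x, n), gap, bound)
```

Each printed row should show the three quantities in increasing order and the gap staying below the bound $x^{2}\exp(x)/(n - x^{2})$.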

We now establish $\exp(x + y) = \exp(x)\exp(y)$. Note that since $\exp(-x) = 1/\exp(x)$ we only need to establish the relation for positive values of $x, y$. It holds trivially if $ x = 0$ or $y = 0$. Let us consider the function $f(x, y, n) = F(x, n)F(y, n) - F(x + y, n)$. For $n > xy$ we have \begin{align} 0 < f(x, y, n) &= \left(1 + \frac{x}{n}\right)^{n}\left(1 + \frac{y}{n}\right)^{n} - \left(1 + \frac{x + y}{n}\right)^{n}\notag\\ &= \left(1 + \frac{x + y}{n} + \frac{xy}{n^{2}}\right)^{n} - \left(1 + \frac{x + y}{n}\right)^{n}\notag\\ &= \left(1 + \frac{x + y}{n}\right)^{n}\left\{\left(1 + \frac{xy}{n(n + x + y)}\right)^{n} - 1\right\}\notag\\ &< \exp(x + y)\left\{\left(1 + \frac{xy}{n^{2}}\right)^{n} - 1\right\}\notag\\ &= \exp(x + y)\left\{\frac{xy}{n} + \frac{(1 - 1/n)}{2!}\left(\frac{xy}{n}\right)^{2} + \cdots\right\}\notag\\ &< \exp(x + y)\left\{\frac{xy}{n} + \left(\frac{xy}{n}\right)^{2} + \cdots\right\}\notag\\ &= \frac{xy\exp(x + y)}{n - xy}\notag \end{align} and letting $n \to \infty$ we see that $f(x, y, n) \to 0$ so that $\lim_{n \to \infty}F(x, n)F(y, n) = \lim_{n \to \infty}F(x + y, n)$ and we have thus established that $\exp(x)\exp(y) = \exp(x + y)$.
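The estimate $0 < F(x, n)F(y, n) - F(x + y, n) < xy\exp(x + y)/(n - xy)$ can be observed numerically. The following Python sketch is only an illustration under my own assumptions: the helper names `product_gap` and `gap_bound` are hypothetical, the sample values $x = 1, y = 2$ are arbitrary, and `math.exp` stands in for $\exp(x + y)$.

```python
import math

def F(x, n):
    """(1 + x/n)^n as in the text."""
    return (1 + x / n) ** n

def product_gap(x, y, n):
    """f(x, y, n) = F(x, n) F(y, n) - F(x + y, n)."""
    return F(x, n) * F(y, n) - F(x + y, n)

def gap_bound(x, y, n):
    """Upper bound xy exp(x + y) / (n - xy), valid for n > xy."""
    return x * y * math.exp(x + y) / (n - x * y)

x, y = 1.0, 2.0  # illustrative positive values
for n in (10, 100, 10_000):
    print(n, product_gap(x, y, n), gap_bound(x, y, n))
```

Both columns shrink roughly like $1/n$, which is exactly what forces $\exp(x)\exp(y) = \exp(x + y)$ in the limit.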

We can now easily prove the fundamental limit $$\lim_{x \to 0}\frac{\exp(x) - 1}{x} = 1\tag{5}$$ We first establish the limit when $x \to 0^{+}$. We have \begin{align} \lim_{x \to 0^{+}}\frac{\exp(x) - 1}{x} &= \lim_{x \to 0^{+}}\dfrac{{\displaystyle \lim_{n \to \infty}\left(1 + \dfrac{x}{n}\right)^{n} - 1}}{x}\notag\\ &= \lim_{x \to 0^{+}}\lim_{n \to \infty}\frac{1}{x}\left\{\left(1 + \dfrac{x}{n}\right)^{n} - 1\right\}\notag\\ &= \lim_{x \to 0^{+}}\lim_{n \to \infty}\frac{1}{x}\left\{\left(1 + x + \dfrac{(1 - 1/n)}{2!}x^{2} + \cdots\right) - 1\right\}\notag\\ &= \lim_{x \to 0^{+}}\lim_{n \to \infty}\left(1 + \dfrac{(1 - 1/n)}{2!}x + \dfrac{(1 - 1/n)(1 - 2/n)}{3!}x^{2} + \cdots\right)\notag\\ &= \lim_{x \to 0^{+}}\lim_{n \to \infty}(1 + \psi(x, n))\notag \end{align} where $\psi(x, n)$ is a finite sum defined by $$\psi(x, n) = \frac{(1 - 1/n)}{2!}x + \cdots + \frac{(1 - 1/n)(1 - 2/n)\cdots(1 - (n - 1)/n)}{n!}x^{n - 1}$$ For fixed positive $x$, the function $\psi(x, n)$ is increasing as $n$ increases and it is clearly bounded above by the convergent series $$F(x) = \frac{x}{2!} + \frac{x^{2}}{3!} + \cdots$$ Hence $\lim_{n \to \infty}\psi(x, n) = \psi(x)$ exists and we have $0 \leq \psi(x) \leq F(x)$. Again we can see that if $0 < x < 2$ then $$F(x) \leq \frac{x}{2} + \frac{x^{2}}{2^{2}} + \cdots = \frac{x}{2 - x}$$ Taking limits as $x \to 0^{+}$ we see that $F(x) \to 0$ as $x \to 0^{+}$. Hence $\lim_{x \to 0^{+}}\psi(x) = 0$. We thus have \begin{align} \lim_{x \to 0^{+}}\frac{\exp(x) - 1}{x} &= \lim_{x \to 0^{+}}\lim_{n \to \infty}1 + \psi(x, n)\notag\\ &= \lim_{x \to 0^{+}}1 + \psi(x)\notag\\ &= 1 + 0 = 1\notag \end{align} This result also implies that $\exp(x) = 1 + x\left(\dfrac{\exp(x) - 1}{x}\right) \to 1 + 0\cdot 1 = 1$ as $x \to 0^{+}$. Now let us assume $x \to 0^{-}$ and put $x = -y$ so that $y \to 0^{+}$. 
We then have \begin{align} \lim_{x \to 0^{-}}\frac{\exp(x) - 1}{x} &= \lim_{y \to 0^{+}}\frac{1 - \exp(-y)}{y}\notag\\ &= \lim_{y \to 0^{+}}\frac{\exp(y) - 1}{y\exp(y)} = 1/1 = 1\notag \end{align} It has thus been established that $(\exp(x) - 1)/x \to 1$ as $x \to 0$.

We then come to fundamental result $$\frac{d}{dx}\{\exp(x)\} = \exp(x)\tag{6}$$ We have \begin{align} \{\exp(x)\}' &= \lim_{h \to 0}\frac{\exp(x + h) - \exp(x)}{h}\notag\\ &= \lim_{h \to 0}\frac{\exp(x)\exp(h) - \exp(x)}{h}\notag\\ &= \exp(x)\lim_{h \to 0}\frac{\exp(h) - 1}{h}\notag\\ &= \exp(x)\cdot 1 = \exp(x)\notag \end{align} and since $\exp(x) > 0$ for all $x$ it follows that $\exp(x)$ is strictly increasing, continuous and differentiable for all $x$. This allows us to define the inverse function $y = \log x$ if $x = \exp(y)$ and using rules of differentiation of inverse functions we can show that $(\log x)' = 1/x$. Once the derivatives for these functions are established it is an easy exercise to establish their common properties and thus I leave it to the reader to establish these results.

An alternative approach based on the same definition $(1)$ is available in my answer on MSE. This approach is simpler compared to the one provided above and hence I reproduce it here with some more details. The starting point is to establish that the limit in $(1)$ exists for any positive integer $x$ and this is first done for $x = 1$. Clearly we know that the sequence $F(x, n)$ is increasing and for $x = 1$ it is bounded above by the sum $$1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{n!}$$ which is less than $$1 + 1 + \frac{1}{2} + \frac{1}{2^{2}} + \cdots = 3$$ Therefore the sequence $F(1, n)$ tends to a limit which is normally denoted by $e$ and clearly $2 < e < 3$.

Next we show that if $x$ is any positive integer then $F(x, n)$ tends to the limit $e^{x}$. We assume $x > 1$ as the case $x = 1$ is already handled. This is done via simple algebraic manipulation as shown below \begin{align} \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^{n} &= \lim_{n \to \infty}\left(\frac{n + 1}{n}\cdot\frac{n + 2}{n + 1}\cdots \frac{n + x}{n + x - 1}\right)^{n}\notag\\ &= \lim_{n \to \infty}\left(1 + \frac{1}{n}\right)^{n}\left(1 + \frac{1}{n + 1}\right)^{n}\cdots\left(1 + \frac{1}{n + x - 1}\right)^{n}\notag \end{align} Each term (except the first) in the product on right is of the form $$\left(1 + \frac{1}{n + i}\right)^{n}$$ where $i$ is a positive integer independent of $n$. This can be written as $$\dfrac{\left(1 + \dfrac{1}{n + i}\right)^{n + i}}{\left(1 + \dfrac{1}{n + i}\right)^{i}}$$ Clearly the numerator tends to $e$ and denominator tends to $1$ so that each term $$\left(1 + \frac{1}{n + i}\right)^{n}$$ tends to $e$. Since the number of such terms is $x$ it follows that the desired limit is $e^{x}$.

Next we need to establish the result that $F(x, n)$ tends to a limit even when $x$ is not a positive integer. First we deal with real number $x > 0$. Let $m$ be a positive integer with $m > x$. Then clearly $F(x, n) < F(m, n)$. Since $F(m, n)$ is strictly increasing and tends to $e^{m}$ (shown in last paragraph), it follows that $F(m, n) < e^{m}$ and therefore $F(x, n) < e^{m}$. Thus $F(x, n)$ is bounded above and tends to a limit. Moreover since $F(x, n)$ is strictly increasing it follows that the limit is not less than $F(x, 1) = 1 + x$. Therefore we have $\exp(x) \geq 1 + x$ for all $x > 0$.

Next we handle the case when $x < 0$. Let $x = -y$ so that $y > 0$ and $F(y, n)$ tends to a positive limit denoted by $\exp(y)$. Consider the product $$F(x, n) F(y, n) = \left(1 - \frac{x^{2}}{n^{2}}\right)^{n}$$ If $n > |x|$ then by Bernoulli's inequality we have $$1 - \frac{x^{2}}{n} \leq \left(1 - \frac{x^{2}}{n^{2}}\right)^{n} \leq 1$$ and hence by Squeeze theorem the product $F(x, n)F(y, n)$ tends to $1$ as $n \to \infty$. It now follows that $F(x, n)$ tends to a positive limit $1/\exp(y)$. We have thus shown that the limit $(1)$ exists for all $x$ and is positive. At the same time we have also proved that $\exp(x)\exp(-x) = 1$ for all $x$.

Next let $0 < x < 1$ and then $\exp(-x) = 1/\exp(x)$ so that $$\exp(-x) = \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^{-n} = \lim_{n \to \infty}1/F(x, n)$$ and by the general binomial theorem for any index we can see that $$\frac{1}{F(x, n)} \geq 1 - x$$ so that $\exp(-x) \geq 1 - x$ for all $0 < x < 1$. We thus have the inequality $$\exp(x) \geq 1 + x$$ for all $x \in (-1, \infty)$. We are now ready to prove the fundamental limit concerning the exponential function namely that $(\exp(x) - 1)/x \to 1$ as $x \to 0$.

Let $0 < x < 1$ then we have $$\exp(x) \geq 1 + x,\, \exp(-x) \geq 1 - x$$ which shows that $$\frac{\exp(x) - 1}{x} \geq 1,\, \exp(x) \leq \frac{1}{1 - x}$$ The last inequality leads us to $$\frac{\exp(x) - 1}{x} \leq \frac{1}{1 - x}$$ and thus we have $$1 \leq \frac{\exp(x) - 1}{x} \leq \frac{1}{1 - x}$$ for $0 < x < 1$. Letting $x \to 0^{+}$ we see that $$\lim_{x \to 0^{+}}\frac{\exp(x) - 1}{x} = 1$$ This limit also shows that $\exp(x) \to 1$ as $x \to 0^{+}$. Using this fact and the relation $\exp(x)\exp(-x) = 1$ it is easy to prove that $(\exp(x) - 1)/x \to 1$ as $x \to 0^{-}$. Thus the fundamental limit concerning the exponential function is established.

Next we prove the functional equation $\exp(x + y) = \exp(x)\exp(y)$ for all $x, y$. To prove this we need the following result:
Lemma: If $a_{n}$ is a sequence of real or complex terms such that $n(a_{n} - 1) \to 0$ as $n \to \infty$ then $a_{n}^{n} \to 1$ as $n \to \infty$.

To prove the lemma let us write $b_{n} = a_{n} - 1$ so that $nb_{n} \to 0$. We have $$a_{n}^{n} = (1 + b_{n})^{n} = 1 + nb_{n} + \frac{n(n - 1)}{2}b_{n}^{2} + \cdots$$ It follows that $$|a_{n}^{n} - 1| \leq |nb_{n}| + \frac{|nb_{n}|^{2}}{2} + \frac{|nb_{n}|^{3}}{2^{2}} + \cdots \leq \dfrac{|nb_{n}|}{1 - \dfrac{|nb_{n}|}{2}}$$ Since $nb_{n} \to 0$ it follows from the above inequality that $a_{n}^{n} \to 1$.

We now set $$a_{n} = \dfrac{\left(1 + \dfrac{x + y}{n}\right)}{\left(1 + \dfrac{x}{n}\right)\left(1 + \dfrac{y}{n}\right)}$$ and it is easy to check that $n(a_{n} - 1) \to 0$ and hence $a_{n}^{n} \to 1$. Thus we get $\exp(x + y) = \exp(x)\exp(y)$. Using this functional equation and the limit $\dfrac{\exp(x) - 1}{x} \to 1$ as $x \to 0$ we can prove as before that derivative of $\exp(x)$ is $\exp(x)$ itself. And from the positive derivative we get the strictly increasing nature of $\exp(x)$ and existence of inverse function $\log x$ and thus theory of these functions can be developed completely.
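To see the lemma at work on this particular $a_{n}$, one can tabulate $n(a_{n} - 1)$, $a_{n}^{n}$ and the estimate $|nb_{n}|/(1 - |nb_{n}|/2)$ from the proof of the lemma. This is a rough Python sketch: the names `a` and `lemma_bound` and the sample values $x = 1, y = 2$ are my own choices and not part of the argument.

```python
def a(x, y, n):
    """The ratio (1 + (x+y)/n) / ((1 + x/n)(1 + y/n)) fed into the lemma."""
    return (1 + (x + y) / n) / ((1 + x / n) * (1 + y / n))

def lemma_bound(t):
    """|t| / (1 - |t|/2) with t = n(a_n - 1); only meaningful when |t| < 2."""
    return abs(t) / (1 - abs(t) / 2)

x, y = 1.0, 2.0  # illustrative values
for n in (10, 1000, 100_000):
    an = a(x, y, n)
    t = n * (an - 1)
    # columns: n, n(a_n - 1), a_n^n, bound on |a_n^n - 1|
    print(n, t, an ** n, lemma_bound(t))
```

The second column behaves like $-xy/n$, so it tends to $0$, and the third column is squeezed towards $1$ as the lemma asserts.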

Approach Based on Differential Equations

Another approach which has its origins in this answer at MSE is very surprising and novel. It begins with the definition of $\exp(x)$ as the unique solution to the differential equation $$\frac{dy}{dx} = y, \,\, y(0) = 1\tag{7}$$ The fundamental challenge is to show that there exists such a solution. We thus need to find a function $f(x)$ such that $f'(x) = f(x)$ and $f(0) = 1$. We try to analyze the properties of such a function $f(x)$.

We first show that $f(x) > 0$ for all $x$. Let $g(x) = f(x)f(-x)$ so that $$g'(x) = f'(x)f(-x) - f(x)f'(-x) = f(x)f(-x) - f(x)f(-x) = 0$$ so that $g(x)$ is a constant and $g(x) = g(0) = f(0)f(0) = 1$. Thus $f(x)f(-x) = 1$ and hence $f(x) \neq 0$ for all $x$. It follows by continuity that $f(x)$ must be of constant sign and since $f(0) > 0$ it follows that $f(x) > 0$ for all $x$.

This means that $f'(x) = f(x) > 0$ so that $f(x)$ is strictly increasing, continuous and differentiable for all $x$ and hence possesses an inverse $F(x)$ such that $x = f(y)$ implies $y = F(x)$. Clearly by the rules of differentiation we get $F'(x) = 1/x$ and $F(1) = 0$ so that we have $F(x) = \int_{1}^{x}(dt/t)$ and then we reach the definition of $\log x = F(x)$ as an integral and derive all the properties of $F(x)$ and $f(x)$. This shows that there is a genuine function $f(x)$ which satisfies the differential equation $(7)$.

For the uniqueness part if we assume that there are two solutions $f(x), h(x)$ then their difference $\phi(x) = f(x) - h(x)$ satisfies $\phi'(x) = \phi(x)$ and $\phi(0) = 0$. We will show that such a function must be identically zero. Let us suppose on the contrary that there is a number $a$ for which $\phi(a) \neq 0$. Consider the function $\psi(x) = \phi(a + x)\phi(a - x)$ and then we have $$\psi'(x) = \phi'(a + x)\phi(a - x) - \phi(a + x)\phi'(a - x) = 0$$ so that $\psi(x)$ is a constant and $$\phi(a + x)\phi(a - x) = \psi(x) = \psi(0) = \phi(a)\phi(a) > 0$$ Putting $x = a$ we get $\phi(2a)\phi(0) > 0$ which contradicts $\phi(0) = 0$. This shows that equation $(7)$ has a unique solution and the definition of $\exp(x)$ is unambiguous. Since we have found the derivatives of $\exp(x)$ and its inverse we may easily deduce all the properties of these functions.

Logarithm as a Limit

Another approach which is an extension of an exercise problem in Hardy's Pure Mathematics comes from the equation $$\log x = \lim_{n \to \infty}n(\sqrt[n]{x} - 1) = \lim_{n \to \infty}n(x^{1/n} - 1)\tag{8}$$ and we take it as the definition of $\log x$ for $x > 0$. Clearly we must show that the limit above exists for $x > 0$. This is a bit tricky, but not that difficult either. We first show that $\sqrt[n]{x} \to 1$ as $n \to \infty$, otherwise the limit in $(8)$ won't exist.

This is clearly the case when $x = 1$, so let us take the case $x > 1$. Then we can see that $\sqrt[n]{x} > 1$ and therefore let $\sqrt[n]{x} = 1 + h$ where $h > 0$. Note that $h$ depends on $x$ as well as $n$. We then have $x = (1 + h)^{n} \geq 1 + nh$ so we have $$0 < h \leq \frac{x - 1}{n}$$ Taking limits as $n \to \infty$ we see that $h \to 0$ so that $\sqrt[n]{x} = 1 + h \to 1$. If $0 < x < 1$ then we can write $x = 1/y$ where $y > 1$. Then we can see that $\sqrt[n]{x} = 1/\sqrt[n]{y} \to 1/1 = 1$ as $n \to \infty$. Hence we have established that $\sqrt[n]{x} \to 1$ as $n \to \infty$ when $x > 0$.

Next we need to establish some inequalities. Let $\alpha, \beta$ be two numbers with $0 < \beta < 1 < \alpha$ and let $r$ be a positive integer. We can easily see that for $i = 0, 1, 2, \ldots, (r - 1)$ we have $\alpha^{i} < \alpha^{r}$ and adding these inequalities we get $$r\alpha^{r} > 1 + \alpha + \alpha^{2} + \cdots + \alpha^{r - 1}$$ Multiplying by $(\alpha - 1) > 0$ we get $$r\alpha^{r}(\alpha - 1) > \alpha^{r} - 1$$ Next we add $r(\alpha^{r} - 1)$ to both sides to get $$r(\alpha^{r + 1} - 1) > (r + 1)(\alpha^{r} - 1)$$ and thus we obtain $$\frac{\alpha^{r + 1} - 1}{r + 1} > \frac{\alpha^{r} - 1}{r}\text{ where }\alpha > 1\tag{9}$$ Similarly we can show that $$\frac{1 - \beta^{r + 1}}{r + 1} < \frac{1 - \beta^{r}}{r}\text{ where }0 < \beta < 1\tag{10}$$ It now follows that if $r, s$ are positive integers with $r > s$ then $$\frac{\alpha^{r} - 1}{r} > \frac{\alpha^{s} - 1}{s},\,\,\frac{1 - \beta^{r}}{r} < \frac{1 - \beta^{s}}{s}\tag{11}$$ where $0 < \beta < 1 < \alpha$.

Next we show that the above inequalities remain valid even when $r, s$ are positive rational numbers with $r > s$. Let $r = a/b, s = c/d$ where $a, b, c, d$ are positive integers with $ad > bc$. Let $\alpha = \gamma^{bd}$ so that $\gamma$ is the $(bd)^{\text{th}}$ root of $\alpha$. Then $\gamma > 1$ and by $(11)$ we have $$\frac{\gamma^{ad} - 1}{ad} > \frac{\gamma^{bc} - 1}{bc}$$ and further multiplication by $bd$ gives $$\frac{\alpha^{r} - 1}{r} > \frac{\alpha^{s} - 1}{s}$$ In the same manner we can show the inequality for $\beta$. It is now established that the inequalities $(11)$ are valid for all rational $r, s$ with $0 < s < r$. Putting $s = 1$ we get $$\alpha^{r} - 1 > r(\alpha - 1),\,\, 1 - \beta^{r} < r(1 - \beta)\tag{12}$$ where $r > 1$. Again putting $r = 1$ in $(11)$ we get $$\alpha^{s} - 1 < s(\alpha - 1),\,\,1 - \beta^{s} > s(1 - \beta)\tag{13}$$ where $0 < s < 1$. We will go one step further and note that if $\alpha > 1$ then $0 < 1/\alpha < 1$ and if $0 < \beta < 1$ then $1/\beta > 1$. Hence in the inequalities $(12), (13)$ we can replace $\alpha$ by $1/\beta$ and $\beta$ by $1/\alpha$. Doing so we get $$\alpha^{r} - 1 < r\alpha^{r - 1}(\alpha - 1),\,\, 1 - \beta^{r} > r\beta^{r - 1}(1 - \beta)$$ and $$\alpha^{s} - 1 > s\alpha^{s - 1}(\alpha - 1),\,\, 1 - \beta^{s} < s\beta^{s - 1}(1 - \beta)$$ We thus finally obtain $$r\alpha^{r - 1}(\alpha - 1) > \alpha^{r} - 1 > r(\alpha - 1),\,\,s\alpha^{s - 1}(\alpha - 1) < \alpha^{s} - 1 < s(\alpha - 1)\tag{14}$$ and $$r\beta^{r - 1}(1 - \beta) < 1 - \beta^{r} < r(1 - \beta),\,s\beta^{s - 1}(1 - \beta) > 1 - \beta^{s} > s(1 - \beta)\tag{15}$$ We are now ready to analyze the limit of function $\phi(x, n) = n(x^{1/n} - 1)$ which is used in definition $(8)$. Clearly if $x = 1$ then $\phi(x, n) = 0$ and hence the limit is $0$.
Let us suppose that $x > 1$ then from inequality $(11)$ (putting $\alpha = x, r = 1/n, s = 1/(n + 1)$) we see that $\phi(x, n) > \phi(x, n + 1)$ so that the function $\phi(x, n)$ is strictly decreasing as $n$ increases. Also note that the function $\phi(x, n) > 0$ if $x > 1$. It follows that the limit of $\phi(x, n)$ exists as $n \to \infty$. If in $(14)$ we put $\alpha = x, s = 1/n$ then we can see that $$\phi(x, n) > \sqrt[n]{x}\left(1 - \frac{1}{x}\right) > 1 - \frac{1}{x}$$ so it follows that $\phi(x, n) \to l$ where $l \geq 1 - (1/x) > 0$. Hence $\phi(x, n)$ tends to a positive limit as $n \to \infty$. It follows that $\log x = \lim_{n \to \infty}n(\sqrt[n]{x} - 1)$ is defined for $x \geq 1$ and $\log 1 = 0$ and $\log x > 0$ if $x > 1$.

To analyze the limit $(8)$ when $0 < x < 1$ we put $x = 1/y$ so that $y > 1$ and then $\phi(x, n) = n(1 - \sqrt[n]{y})/\sqrt[n]{y} = -\phi(y, n)/\sqrt[n]{y}$ and thus $\phi(x, n) \to -\log y = -\log(1/x) < 0$. Thus we see that the function $\log x$ is defined for all $x > 0$ and is negative when $0 < x < 1$, positive when $x > 1$ and we also have $\log (1/x) = -\log x$ for all $x > 0$. We next establish the fundamental property of $\log x$ namely $$\log (xy) = \log x + \log y\tag{16}$$ Clearly we have \begin{align} \log(xy) &= \lim_{n \to \infty}n(\sqrt[n]{xy} - 1)\notag\\ &= \lim_{n \to \infty}n(\sqrt[n]{xy} - \sqrt[n]{y} + \sqrt[n]{y} - 1)\notag\\ &= \lim_{n \to \infty}n\sqrt[n]{y}(\sqrt[n]{x} - 1) + n(\sqrt[n]{y} - 1)\notag\\ &= 1\cdot \log x + \log y = \log x + \log y\notag \end{align} Replacing $y$ by $1/y$ and noting that $\log (1/y) = -\log y$ we can see that $\log(x/y) = \log x - \log y$. If $ x > y > 0$ then we can see that $x / y > 1$ and then $\log (x/y) > 0$ i.e. $\log x > \log y$ so that $\log x$ is a strictly increasing function of $x$ for all $x > 0$.
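The behaviour of $\phi(x, n) = n(\sqrt[n]{x} - 1)$ described above (strictly decreasing in $n$, bounded below by $1 - 1/x$ when $x > 1$, and additive in the limit) is easy to watch numerically. The Python sketch below is only an illustration: `phi` mirrors the notation of the text, the sample values $x = 5, y = 3$ are arbitrary, and the library function `math.log` is used purely as a reference value.

```python
import math

def phi(x, n):
    """n * (x^(1/n) - 1), the sequence whose limit defines log x in (8)."""
    return n * (x ** (1.0 / n) - 1.0)

x, y = 5.0, 3.0  # illustrative values greater than 1
ns = (1, 2, 4, 8, 16, 1024, 2 ** 20)
for n in ns:
    # phi(xy, n) - phi(x, n) - phi(y, n) equals phi(x, n) * phi(y, n) / n exactly
    defect = phi(x * y, n) - phi(x, n) - phi(y, n)
    print(n, phi(x, n), defect)
print("reference log x:", math.log(x))
```

The additivity defect is $\phi(x, n)\phi(y, n)/n$, which follows from the same splitting $\sqrt[n]{xy} - 1 = \sqrt[n]{y}(\sqrt[n]{x} - 1) + (\sqrt[n]{y} - 1)$ used in the proof of $(16)$, and it visibly tends to $0$.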

We next need to establish the derivative of $\log x$. We first establish the limit formula $$\lim_{x \to 1}\frac{\log x}{x - 1} = 1\text{ or equivalently }\lim_{x \to 0}\frac{\log(1 + x)}{x} = 1\tag{17}$$ We establish the first form which deals with $x \to 1$ and first let's focus on $x \to 1^{+}$ so that $x > 1$. Putting $\alpha = x, s = 1/n$ in $(14)$ we get $$x^{1/n}\cdot\frac{x - 1}{x} < n(\sqrt[n]{x} - 1) < x - 1$$ Taking limits as $n \to \infty$ we get $$\frac{x - 1}{x}\leq \log x \leq x - 1$$ and dividing by $(x - 1) > 0$ we get $$\frac{1}{x}\leq \frac{\log x}{x - 1} \leq 1$$ and letting $x \to 1^{+}$ we get our desired limit as $1$. If $x \to 1^{-}$ so that $0 < x < 1$ then we put $x = 1/y$ and $y \to 1^{+}$. Then we have $$\lim_{x \to 1^{-}}\frac{\log x}{x - 1} = \lim_{y \to 1^{+}}\frac{\log(1/y)}{(1/y) - 1} = \lim_{y \to 1^{+}}\frac{y\log y}{y - 1} = 1\cdot 1 = 1$$ Thus the limit formula $(17)$ is established. It is now an easy matter to calculate derivative of $\log x$. We have \begin{align} \{\log x\}' &= \lim_{h \to 0}\frac{\log (x + h) - \log x}{h}\notag\\ &= \lim_{h \to 0}\dfrac{\log \left(\dfrac{x + h}{x}\right)}{h}\notag\\ &= \lim_{h \to 0}\dfrac{\log \{1 + (h/x)\}}{h/x}\cdot\frac{1}{x}\notag\\ &= 1\cdot\frac{1}{x} = \frac{1}{x}\notag \end{align} Once we have established the derivative of $\log x$ it is now an easy route to other properties of $\log x$ and defining $\exp(x)$ as the inverse of $\log x$. However I must mention one point about inverses. Suppose that $\phi(x, n) = n(\sqrt[n]{x} - 1) = y$ then we automatically get $x = \left(1 + \dfrac{y}{n}\right)^{n} = F(y, n)$ so that the functions $\phi(x, n)$ and $F(x, n)$ are natural inverses of each other and they maintain this relationship even in the limit when $n \to \infty$ each giving rise to $\log x$ and $\exp(x)$ respectively.
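The remark about inverses can be checked directly: for every fixed $n$ the composition $F(\phi(x, n), n)$ returns $x$. A small Python sketch, with `phi` and `F` named after the functions in the text and with sample values chosen by me, is given below. Agreement is only up to floating point rounding.

```python
import math

def phi(x, n):
    """n * (x^(1/n) - 1), the finite-n version of log x."""
    return n * (x ** (1.0 / n) - 1.0)

def F(y, n):
    """(1 + y/n)^n, the finite-n version of exp(y)."""
    return (1.0 + y / n) ** n

for x in (0.25, 2.0, 7.5):      # illustrative positive values
    for n in (1, 10, 1000):
        print(x, n, F(phi(x, n), n))
```

Every printed value in the last column should reproduce the corresponding $x$.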

Approach Based on Infinite Series

Another approach to the theory of exponential and logarithmic functions defines $$\exp(x) = 1 + x + \frac{x^{2}}{2!} + \frac{x^{3}}{3!} + \cdots\tag{18}$$ and with this definition most of the properties are obtained with very little effort. The property $\exp(x + y) = \exp(x)\exp(y)$ follows immediately by multiplication of infinite series and the binomial theorem. Also the fundamental limit $(\exp(x) - 1)/x \to 1$ is almost obvious and then the derivative formula $\{\exp(x)\}' = \exp(x)$ follows. I will prove that $\exp(x) \neq 0$ for any $x$. Clearly by the functional relation $\exp(x + y)=\exp(x)\exp(y)$ it follows that if $\exp(x)$ vanishes at one point then it vanishes everywhere and since $\exp(0) = 1$ it follows that $\exp(x)\neq 0$ for all $x$. Now it is easy to see that $\exp(x) = \exp(x/2)\exp(x/2) > 0$ for all $x$. The equation $(\exp(x))' = \exp(x) > 0$ gives the strictly increasing nature of $\exp(x)$ and thus the existence of its inverse. Readers are advised to complete the theory based on this approach.
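The step "multiplication of infinite series and the binomial theorem" amounts to the identity $\sum_{k = 0}^{m}\frac{x^{k}}{k!}\cdot\frac{y^{m - k}}{(m - k)!} = \frac{(x + y)^{m}}{m!}$ for each degree $m$. The following Python sketch verifies it in exact rational arithmetic for a few degrees. The helper names `cauchy_term` and `series_term` and the sample rationals are hypothetical choices of mine, and the check says nothing about convergence, which the text takes for granted.

```python
from fractions import Fraction
from math import factorial

def cauchy_term(x, y, m):
    """Degree-m term of the Cauchy product of the series for exp(x) and exp(y)."""
    return sum(
        Fraction(x) ** k / factorial(k) * Fraction(y) ** (m - k) / factorial(m - k)
        for k in range(m + 1)
    )

def series_term(z, m):
    """Degree-m term z^m / m! of the series for exp(z)."""
    return Fraction(z) ** m / factorial(m)

x, y = Fraction(2, 3), Fraction(-5, 4)  # illustrative rational values
for m in range(8):
    print(m, cauchy_term(x, y, m), series_term(x + y, m))
```

The two printed fractions agree for every $m$, which is the binomial theorem divided through by $m!$.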

In the next post we will provide the most elementary approach which starts by defining the general power $a^{b}$ directly for $a > 0$ and any real $b$. This approach is the most difficult one to handle but it is more appealing to those students who have not been introduced to calculus.


Theories of Exponential and Logarithmic Functions: Part 1

2014-05-08T14:00:00.000+05:30

In the past few months I saw a lot of questions on MSE regarding exponential and logarithmic functions. Most students were more used to the idea of defining $e$ by $\lim\limits_{n \to \infty}\left(1 + \dfrac{1}{n}\right)^{n}$ and then defining the exponential function as $e^{x}$. I tried to answer some of these questions and based on the suggestion of a user, I am trying to consolidate my answers into a series of posts here. One thing which I must mention here is that most students do have an intuitive idea of the exponential and logarithmic functions but many lack a sound theoretical foundation. In this series of posts I will provide multiple approaches to develop a theory of exponential and logarithmic functions. We will restrict ourselves to real variables only.

Properties of Exponential and Logarithmic Functions

Let us first revise common properties of these functions which any coherent theory must explain and establish. I first list down the properties of exponential function $\exp(x)$:

$\exp(x)$ is a function from $\mathbb{R}$ to $\mathbb{R}^{+}$ which is strictly increasing, continuous and differentiable for all $x$.
$\exp(0) = 1, \exp(x + y) = \exp(x)\exp(y), \exp(-x) = 1/\exp(x)$ for all $x, y \in \mathbb{R}$.
$e = \exp(1)$ and $e = \lim\limits_{n \to \infty}\left(1 + \dfrac{1}{n}\right)^{n}$
For all rational $x$ we have $\exp(x) = e^{x}$ so that the function $\exp(x)$ can be used to define irrational exponents for a specific base $e$.
$\lim\limits_{x \to 0}\dfrac{\exp(x) - 1}{x} = 1$
$\dfrac{d}{dx}\{\exp(x)\} = \exp(x)$
$\exp(x) = \lim\limits_{n \to \infty}\left(1 + \dfrac{x}{n}\right)^{n}$ for all $x \in \mathbb{R}$
$\displaystyle \exp(x) = 1 + x + \dfrac{x^{2}}{2!} + \dfrac{x^{3}}{3!} + \cdots = \sum_{n = 0}^{\infty}\frac{x^{n}}{n!}$ for all $x \in \mathbb{R}$

Next we list down the properties of logarithmic function:

$\log x$ is a function from $\mathbb{R}^{+}$ to $\mathbb{R}$ which is strictly increasing, continuous and differentiable for all $x > 0$.
$\log 1 = 0, \log(xy) = \log x + \log y, \log (1/x) = -\log x$ for all $x, y > 0$
$\log e = 1$
$\log a^{b} = b \log a$ for $a > 0$ and rational $b$ so that $a^{b}$ can be defined for irrational $b$ as $\exp(b\log a)$.
$\lim\limits_{x \to 0}\dfrac{\log (1 + x)}{x} = 1$
$\dfrac{d}{dx}\{\log x\} = \dfrac{1}{x}$
$\log x = \lim\limits_{n \to \infty}n(\sqrt[n]{x} - 1)$
$\displaystyle \log (1 + x) = x - \frac{x^{2}}{2} + \frac{x^{3}}{3} - \cdots = \sum_{n = 1}^{\infty}(-1)^{n - 1}\frac{x^{n}}{n}$ for all $x$ satisfying $-1 < x \leq 1$.

Once we have defined $a^{x}$ for $a > 0$ and all $x$ by $\exp(x\log a)$ we have additional properties of the general exponential function $a^{x}$:

$a^{x + y} = a^{x}a^{y}$ for all $x, y \in \mathbb{R}$
$\lim\limits_{x \to 0}\dfrac{a^{x} - 1}{x} = \log a$
$\dfrac{d}{dx}\{a^{x}\} = a^{x}\log a$
$\lim\limits_{h \to 0}(1 + xh)^{1/h} = \exp(x)$
$\lim\limits_{x \to \infty}\dfrac{\log x}{x^{a}} = 0$ for all $a > 0$.
$\lim\limits_{x \to \infty}\dfrac{x^{a}}{\exp(x)} = 0$ for all $a > 0$.

I have listed down the elementary properties of exponential and logarithmic functions and any coherent theory of these functions must establish these properties in a non-circular fashion. Note that the functions $\exp(x)$ and $\log x$ are inverses to each other and hence many of the properties of $\log x$ can be deduced via the corresponding properties of $\exp(x)$ and vice-versa. Hence we only need to establish one of the two corresponding properties. We now start with the easiest, simplest, but non-intuitive theory.

Definition of Logarithm as an Integral

This is the most usual approach found in books on "mathematical analysis", but when I studied this in Hardy's Pure Mathematics I was really amazed at its beauty and elegance. We start by defining $\log x$ as $$\log x = \int_{1}^{x}\frac{dt}{t}\tag{1}$$ for all $x > 0$. Clearly this definition is valid because if $x > 0$ then the function $1/t$ is defined and continuous in $[1, x]$ (or $[x, 1]$ if $x < 1$) and a continuous function is integrable. We immediately get some of the properties of $\log x$ namely $$\log 1 = 0, \frac{d}{dx}\{\log x\} = \frac{1}{x}\tag{2}$$ so that the derivative $(\log x)' = 1/x > 0$ and therefore $\log x$ is strictly increasing for $x > 0$. It follows that if $x < 1$ then $\log x < \log 1 = 0$ and if $x > 1$ then $\log x > \log 1 = 0$. Thus $\log x$ is negative if $0 < x < 1$ and positive if $x > 1$.

Next we see that the derivative of $\log x $ at $x = 1$ is $1/1 = 1$ and hence $$\lim_{x \to 1}\frac{\log x}{x - 1} = 1,\,\lim_{x \to 0}\dfrac{\log(1 + x)}{x} = 1\tag{3}$$ A novice reader may find this definition of $\log x$ sort of heavily crafted to meet the needs of continuity and differentiability and probably far removed from the basic functional property of $\log x$ namely $\log (xy) = \log x + \log y$. However it turns out that it is a matter of simple calculus: \begin{align} \log (xy) &= \int_{1}^{xy}\frac{dt}{t}\notag\\ &= \int_{1}^{x}\frac{dt}{t} + \int_{x}^{xy}\frac{dt}{t}\notag\\ &= \log x + \int_{1}^{y}\frac{d(vx)}{vx}\text{ (putting }t = vx)\notag\\ &= \log x + \int_{1}^{y}\frac{dv}{v}\notag\\ &= \log x + \log y\tag{4} \end{align} Putting $y = 1/x$ and noting that $\log 1 = 0$ we get $\log (1/x) = -\log x$. Next we define the number $e$ by $\log e = 1$. Since $\log x$ is strictly increasing the definition of $e$ is unambiguous and we have $e > 1$. Let us then analyze the expression $\log a^{b}$. First we start with $b$ as a positive integer. Then we have \begin{align} \log a^{b} &= \log (a\cdot a\cdots b\text{ times }\cdot a)\notag\\ &= \log a + \log a + \cdots b\text{ times}\notag\\ &= b\log a\notag \end{align} If $b = 0$ then $\log a^{b} = \log a^{0} = \log 1 = 0 = 0 \log a = b\log a$ so that the relation $\log a^{b} = b\log a$ holds for all non-negative integers $b$. If on the other hand $b$ is a negative integer, say $b = -m$ then $\log a^{b} = \log a^{-m} = \log (1/a^{m}) = -\log a^{m} = -m\log a = b\log a$ so that the desired relation is valid for negative integers $b$.

Suppose that $b$ is a rational number, say $b = p/q$ where $p$ is an integer and $q$ is a positive integer. Then we have $p\log a = \log a^{p} = \log (a^{p/q})^{q} = q\log a^{b}$ so that $\log a^{b} = (p/q)\log a = b\log a$. Thus the relation $$\log a^{b} = b\log a\tag{5}$$ holds for all $a > 0$ and all rational numbers $b$.
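The properties derived so far from definition $(1)$ can be illustrated by approximating the integral numerically. The Python sketch below uses a composite midpoint rule. The function name `log_integral` and the number of subintervals are assumptions of mine and not part of the theory, and the comparisons hold only up to the quadrature error.

```python
def log_integral(x, steps=20_000):
    """Midpoint-rule approximation of the integral of 1/t from 1 to x, for x > 0.

    For 0 < x < 1 the step h is negative, so the result is negative as it should be.
    """
    h = (x - 1.0) / steps
    return h * sum(1.0 / (1.0 + (i + 0.5) * h) for i in range(steps))

# log(xy) = log x + log y
print(log_integral(6.0), log_integral(2.0) + log_integral(3.0))
# log(1/x) = -log x
print(log_integral(0.5), -log_integral(2.0))
# log a^b = b log a with a = 8, b = 2/3
print(log_integral(8.0 ** (2.0 / 3.0)), (2.0 / 3.0) * log_integral(8.0))
```

Each printed pair should agree to several decimal places.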

We now analyze the behavior of $\log x$ as $x \to \infty$. Suppose $N > 0$ is any pre-assigned number. Now note that $\log 2 > 0$ and hence we can choose an integer $n > 0$ such that $n > N/\log 2 > 0$ so that $n\log 2 > N$. Let us now choose $x > 2^{n}$ so that $\log x > \log 2^{n} = n\log 2 > N$. It thus follows that $\log x \to \infty$ as $x \to \infty$. Replacing $x $ by $1/x$ we can see that $\log x \to -\infty$ as $x \to 0^{+}$.

We now move to the exponential function. Exponential function $\exp(x)$ is defined as the inverse of the logarithm function defined above. Thus we write $y = \exp(x)$ if $\log y = x$. Note that the inverse function exists and is strictly increasing, continuous, differentiable in the domain because the original function $\log $ is strictly increasing, continuous and differentiable. Also $\exp(x)$ is a function from $\mathbb{R}$ to $\mathbb{R}^{+}$. Immediate consequences of this definition are $$\exp(0) = 1, \exp(1) = e, \exp(x + y) = \exp(x)\exp(y)\tag{6}$$ for all $x, y$. The derivative of $\exp(x)$ can be calculated via the technique of differentiation of inverse functions: $$y = \exp(x) \Rightarrow \log y = x \Rightarrow \frac{dx}{dy} = \frac{1}{y}\Rightarrow \frac{dy}{dx} = y = \exp(x)$$ so that the exponential function $\exp(x)$ is its own derivative. Therefore the derivative of $\exp(x)$ at $x = 0$ is $\exp(0) = 1$ so that $$\lim_{x \to 0}\frac{\exp(x) - 1}{x} = 1\tag{7}$$
Let's now establish the fundamental limit formula for $\exp(x)$ namely $$\exp(x) = \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^{n}\tag{8}$$ This is easy to establish if we observe that $\log$ function is continuous and therefore \begin{align} \log\left\{\lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^{n}\right\} &= \lim_{n \to \infty}\left\{\log\left(1 + \frac{x}{n}\right)^{n}\right\}\notag\\ &= \lim_{n \to \infty}n\log\left(1 + \frac{x}{n}\right)\notag\\ &= \lim_{n \to \infty}x\cdot\frac{\log\{1 + (x/n)\}}{x/n}\notag\\ &= x\cdot 1 = x\notag \end{align} and then $$\lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^{n} = \exp(x)$$ In exactly the same manner we can prove that $$\lim_{n \to \infty}\left(1 - \frac{x}{n}\right)^{-n} = \exp(x)\tag{9}$$ and thus we obtain the link to the common definition of $e$ as $$e = \exp(1) = \lim_{n \to \infty}\left(1 + \frac{1}{n}\right)^{n}$$
We now use the Taylor's Theorem namely $$f(a + h) = f(a) + hf'(a) + \frac{h^{2}}{2!}f''(a) + \cdots + \frac{h^{n - 1}}{(n - 1)!}f^{(n - 1)}(a) + \frac{h^{n}}{n!}f^{(n)}(a + \theta h)$$ where $f^{(n)}$ exists in a certain neighbourhood of $a$ (with $(a + h)$ also lying in the same neighborhood) and $\theta$ being some number in $(0, 1)$ to get the series for $\exp(x)$. We just need to replace $h$ by $x$ and put $a = 0$ and note that the remainder term tends to zero as $n \to \infty$. Thus we get $$\exp(x) = 1 + x + \frac{x^{2}}{2!} + \cdots + \frac{x^{n}}{n!} + \cdots\tag{10}$$
Again suppose that $x$ is rational so that $e^{x}$ is defined and then we have $\log e^{x} = x\log e = x$ so that $e^{x} = \exp(x)$ for rational $x$. This is expressed in a grand fashion as $$\left(1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots\right)^{x} = 1 + x + \frac{x^{2}}{2!} + \frac{x^{3}}{3!} + \cdots$$ where $x$ is rational.

The next step is to define the general power $a^{b}$ as $$a^{b} = \exp(b\log a)\tag{11}$$ where $a > 0$ and $b$ is any real number. Because of the equation $(5)$ above it matches with the usual definition of $a^{b}$ when $b$ is rational. The laws of exponents are proved easily using the laws applicable to exponential and logarithm functions. Using this definition we can see that $$\frac{d}{dx}\{a^{x}\} = \frac{d}{dx}\{\exp(x\log a)\} = \exp(x\log a)\log a = a^{x}\log a\tag{12}$$ which implies that the derivative of $a^{x}$ at $x = 0$ is $a^{0}\log a = \log a$ and this means that $$\lim_{x \to 0}\frac{a^{x} - 1}{x} = \log a\tag{13}$$ Putting $x = 1/n$ and letting $n \to \infty$ we get the limit formula $$\lim_{n \to \infty}n(\sqrt[n]{a} - 1) = \log a\tag{14}$$
The final result we establish out of this theory is the logarithmic series $$\log(1 + x) = x - \frac{x^{2}}{2} + \frac{x^{3}}{3} - \cdots\tag{15}$$ for $-1 < x \leq 1$. We note that \begin{align} \log(1 + x) &= \int_{0}^{x}\frac{dt}{1 + t}\notag\\ &= \int_{0}^{x}\left(1 - t + t^{2} - t^{3} + \cdots + (-1)^{n - 1}t^{n - 1} + \frac{(-1)^{n}t^{n}}{1 + t}\right)\,dt\notag\\ &= x - \frac{x^{2}}{2} + \frac{x^{3}}{3} - \cdots + \frac{(-1)^{n - 1}x^{n}}{n} + R_{n}\notag \end{align} where $$R_{n} = (-1)^{n}\int_{0}^{x}\frac{t^{n}}{1 + t}\,dt$$ Clearly if $0 \leq x \leq 1$ then we have $$0 \leq |R_{n}| \leq \int_{0}^{x}t^{n}\,dt = \frac{x^{n + 1}}{n + 1} \leq \frac{1}{n + 1}$$ so that $R_{n} \to 0$ as $n \to \infty$. If $-1 < x < 0$ then we can put $x = -y$ and see that $$|R_{n}| = \int_{0}^{y}\frac{t^{n}}{1 - t}\,dt \leq \frac{1}{1 - y}\int_{0}^{y}t^{n}\,dt = \frac{y^{n + 1}}{(1 - y)(n + 1)}$$ so that $R_{n} \to 0$ as $n \to \infty$. Thus we have established the logarithmic series.
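The two remainder estimates, $x^{n + 1}/(n + 1)$ for $0 \leq x \leq 1$ and $y^{n + 1}/((1 - y)(n + 1))$ for $x = -y$ with $0 < y < 1$, can be compared with the actual truncation error of the series. This is a hedged Python sketch: `log_series` and `remainder_bound` are names I made up, the sample points are arbitrary, and `math.log1p` serves only as the reference value of $\log(1 + x)$.

```python
import math

def log_series(x, n):
    """Partial sum x - x^2/2 + ... + (-1)^(n-1) x^n / n."""
    return sum((-1) ** (k - 1) * x ** k / k for k in range(1, n + 1))

def remainder_bound(x, n):
    """Bound on |R_n| from the text, for -1 < x <= 1."""
    if x >= 0:
        return x ** (n + 1) / (n + 1)
    y = -x
    return y ** (n + 1) / ((1 - y) * (n + 1))

for x in (1.0, 0.5, -0.5, -0.9):   # illustrative points in (-1, 1]
    for n in (5, 50, 500):
        error = abs(math.log1p(x) - log_series(x, n))
        print(x, n, error, remainder_bound(x, n))
```

At $x = 1$ the error decays only like $1/n$, while inside the interval it decays geometrically, in line with the two bounds.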

Next we come to the limit formula $$\exp(x) = \lim_{h \to 0}(1 + xh)^{1/h}\tag{15}$$ This can be shown by taking logs and showing that the new expression tends to $x$ as $h \to 0$ so that the original limit is $\exp(x)$. We have also mentioned two fundamental limits $$\lim_{x \to \infty}\frac{\log x}{x^{a}} = 0,\,\, \lim_{x \to \infty}\frac{x^{a}}{\exp(x)} = 0\tag{16}$$ for $a > 0$. For the first limit let us take a number $b$ such that $0 < b < a$ and note that $$\log x = \int_{1}^{x}\frac{dt}{t} < \int_{1}^{x} \frac{dt}{t^{1 - b}} = \frac{x^{b} - 1}{b} < \frac{x^{b}}{b}$$ for $x > 1$. Then we can see that $$0 < \frac{\log x}{x^{a}} < \frac{1}{bx^{a - b}}$$ and letting $x \to \infty$ we see that $\dfrac{\log x}{x^{a}} \to 0$ as $x \to \infty$.

The second limit can be established by the use of exponential series. Clearly we can find an integer $n$ such that $n > a$ and then by exponential series $\exp(x) > x^{n}/n!$ for $x > 0$ so that $0 < x^{a}/\exp(x) < n!/x^{n - a}$ and taking limit as $x \to \infty$ we see that $x^{a}/\exp(x) \to 0$.
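A quick numerical look at the limit formula for $\exp(x)$ and at the two fundamental limits may help. This is only a sketch: the function names and the sample values ($x = 2$, $a = 1/2$, $a = 3$) are chosen here for illustration.

```python
import math

def compound(x: float, h: float) -> float:
    """Return (1 + x*h)**(1/h), which should approach exp(x) as h -> 0."""
    return (1.0 + x * h) ** (1.0 / h)

def log_ratio(x: float, a: float) -> float:
    """Return log(x) / x**a, the quantity in the first fundamental limit."""
    return math.log(x) / x ** a

def power_ratio(x: float, a: float) -> float:
    """Return x**a / exp(x), the quantity in the second fundamental limit."""
    return x ** a / math.exp(x)

for h in (1e-2, 1e-4, 1e-6):
    print(h, compound(2.0, h), math.exp(2.0))
for x in (1e2, 1e4, 1e8):
    print(x, log_ratio(x, 0.5))
for x in (10.0, 50.0, 100.0):
    print(x, power_ratio(x, 3.0))
```

The printed values only suggest the behaviour; the inequalities in the text are what actually prove the limits.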

We have thus established all the common properties of exponential and logarithmic functions. It will be found that most of the proofs depend on establishing the derivatives of these functions. While discussing alternative theories in later posts we will try to establish the derivatives and let the reader carry on from that point onwards.


Abel and the Insolvability of the Quintic: Part 4

2014-01-04T14:21:00.000+05:30

We now turn to the goal of this series, namely to establish the fact that the general polynomial of degree $5$ or higher is not solvable by radicals over its field of coefficients. Here Abel's argument is quite terse and I have not been able to fully comprehend some parts of it. Also proofs of some statements are not provided by Abel because they appeared quite obvious to him. We will provide here a proof which is based on Ruffini's arguments and its later simplification by Wantzel.

The idea of the proof is to study the field extension $K = \mathbb{C}(x_{1}, x_{2}, \ldots, x_{n})$ of $F = \mathbb{C}(s_{1}, s_{2}, \ldots, s_{n})$ with regard to the symmetries under various permutations of the indeterminates $x_{i}$. Clearly the elements of the base field $F$ are invariant under all the possible permutations of $x_{i}$. But a general element of $K$ (for example $x_{1} + 2x_{2} + \cdots + nx_{n}$) is invariant only under the identity permutation. We need to analyze the behavior of a radical extension $R$ of $F$ which is contained in $K$ with regard to invariance under the permutations of $x_{i}$.

Insolvability of the General Polynomial of Degree $n \geq 5$

Let the general polynomial of degree $n$ be denoted by $P(x)$ with $x_{1}, x_{2}, \ldots, x_{n}$ as its roots so that $$P(x) = (x - x_{1})(x - x_{2})\cdots(x - x_{n}) = x^{n} - s_{1}x^{n - 1} + \cdots + (-1)^{n}s_{n}$$ where $s_{1}, s_{2}, \ldots, s_{n}$ are the elementary symmetric functions of the roots $x_{i}$. The base field of the coefficients is $F = \mathbb{C}(s_{1}, s_{2}, \ldots, s_{n})$ and clearly the splitting field of $P(x)$ over $F$ is $K = \mathbb{C}(x_{1}, x_{2}, \ldots, x_{n})$.

We start with a very surprising and curious result regarding the behavior of radical expressions with respect to two specific permutations of the $x_{i}$. In order to define these two permutations it is absolutely important that we have $n \geq 5$. Such permutations don't exist if $n < 5$.

Theorem 12: Let $u, a \in K = \mathbb{C}(x_{1}, x_{2}, \ldots, x_{n})$ and $p$ be a prime number such that $u^{p} = a$. Let $n \geq 5$ and let $\sigma, \tau$ be two permutations of $x_{i}$'s defined as follows: $$\sigma: x_{1} \to x_{2} \to x_{3} \to x_{1}, x_{i} \to x_{i} \textit{ for }i > 3$$ and $$\tau: x_{3} \to x_{4} \to x_{5} \to x_{3}, x_{i} \to x_{i} \textit{ for } i = 1, 2\textit{ and }i > 5$$ If $a$ is invariant under the permutations $\sigma, \tau$ then so is $u$.

Clearly if $a = 0$ then $u = 0$ and the theorem is trivially true in this case. So let $a \neq 0$ so that $u \neq 0$. Now we have $u^{p} = a$ so that $\sigma(u^{p}) = \sigma(a) = a$ or $(\sigma(u))^{p} = a = u^{p}$. We then have $(\sigma(u)/u)^{p} = 1$ so that $\sigma(u)/u$ is some $p^{\text{th}}$ root of unity, say $\omega_{\sigma}$ and then we have $\sigma(u) = \omega_{\sigma}u$.

Applying the permutation $\sigma$ to this equation we get $$\sigma^{2}(u) = \sigma(\omega_{\sigma}u) = \omega_{\sigma}\sigma(u) = \omega_{\sigma}^{2}u$$ and similarly $\sigma^{3}(u) = \omega_{\sigma}^{3}u$. But $\sigma^{3}$ is the identity permutation and hence we get $\omega_{\sigma}^{3}u = u$ so that $\omega_{\sigma}^{3} = 1$.

Following exactly the same reasoning we get $\tau(u) = \omega_{\tau}u$ where $\omega_{\tau}$ is some $p^{\text{th}}$ root of unity and we have $\omega_{\tau}^{3} = 1$. Now it's time to do some permutation algebra. Clearly we have $$\sigma \circ \tau: x_{1} \to x_{2} \to x_{3} \to x_{4} \to x_{5} \to x_{1}, x_{i} \to x_{i} \text{ for }i > 5$$ and $$\sigma^{2} \circ \tau: x_{1} \to x_{3} \to x_{4} \to x_{5} \to x_{2} \to x_{1}, x_{i} \to x_{i} \text{ for }i > 5$$ Again we can see that $$(\sigma \circ \tau)(u) = \omega_{\sigma}\omega_{\tau}u,\, (\sigma^{2}\circ \tau)(u) = \omega_{\sigma}^{2}\omega_{\tau}u$$ Since both $(\sigma\circ\tau)^{5}$ and $(\sigma^{2}\circ\tau)^{5}$ are identity permutations it follows that $$(\omega_{\sigma}\omega_{\tau})^{5} = (\omega_{\sigma}^{2}\omega_{\tau})^{5} = 1$$ Also we have previously obtained $$\omega_{\sigma}^{3} = \omega_{\tau}^{3} = 1$$ Now it is clear that $$\omega_{\sigma} = (\omega_{\sigma}^{3})^{2}(\omega_{\sigma}\omega_{\tau})^{5}(\omega_{\sigma}^{2}\omega_{\tau})^{-5} = 1\cdot 1\cdot 1 = 1$$ and, using $\omega_{\sigma} = 1$ just obtained, $$\omega_{\tau} = (\omega_{\tau}^{3})^{2}\omega_{\sigma}^{5}(\omega_{\sigma}\omega_{\tau})^{-5} = 1\cdot 1\cdot 1 = 1$$ We thus have $\sigma(u) = \omega_{\sigma}u = u, \tau(u) = \omega_{\tau}u = u$ so that $u$ is invariant under the permutations $\sigma$ and $\tau$.
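The permutation algebra used in this proof is easy to verify mechanically. The following sketch represents permutations of the indices $1, \ldots, 5$ as Python dictionaries; the helper names `compose` and `power` are chosen here for illustration, and the composition convention (apply the right-hand permutation first) is the one used in the text.

```python
def compose(f, g):
    """Return the permutation f∘g as a dict (apply g first, then f)."""
    return {x: f[g[x]] for x in g}

def power(f, m):
    """Return f composed with itself m times (m = 0 gives the identity)."""
    result = {x: x for x in f}
    for _ in range(m):
        result = compose(f, result)
    return result

identity = {i: i for i in range(1, 6)}
sigma = {1: 2, 2: 3, 3: 1, 4: 4, 5: 5}   # x1 -> x2 -> x3 -> x1
tau = {1: 1, 2: 2, 3: 4, 4: 5, 5: 3}     # x3 -> x4 -> x5 -> x3

sigma_tau = compose(sigma, tau)              # expected 5-cycle 1->2->3->4->5->1
sigma2_tau = compose(power(sigma, 2), tau)   # expected 5-cycle 1->3->4->5->2->1

print(sigma_tau, power(sigma_tau, 5) == identity)
print(sigma2_tau, power(sigma2_tau, 5) == identity)
```

The whole argument rests on the fact that the orders $3$ and $5$ are coprime, which is why two $3$-cycles whose products are $5$-cycles force $\omega_{\sigma} = \omega_{\tau} = 1$.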

We are now ready to prove the insolvability of the polynomial $P(x)$ by radicals if $n \geq 5$. We have the precise statement of the theorem as follows:

Theorem 13: If $n \geq 5$ then the general polynomial of degree $n$ given by $$P(x) = (x - x_{1})(x - x_{2})\cdots(x - x_{n}) = x^{n} - s_{1}x^{n - 1} + \cdots + (-1)^{n}s_{n}$$ is not solvable by radicals over $\mathbb{Q}(s_{1}, s_{2}, \ldots, s_{n})$ nor over $\mathbb{C}(s_{1}, s_{2}, \ldots, s_{n})$.

By theorem 6 of this post it is sufficient to show that the polynomial $P(x)$ is not solvable by radicals over $F = \mathbb{C}(s_{1}, s_{2}, \ldots, s_{n})$. Let us suppose on the contrary that the polynomial $P(x)$ is solvable by radicals over $F$. This means there is a radical extension $R$ of $F$ which contains a root $x_{i}$ of $P(x)$. By renumbering the $x_{i}$'s we can ensure that $R$ contains $x_{1}$ in particular. Now by the theorem of natural irrationalities proved in the last post we can assume that $R$ is contained in $K = \mathbb{C}(x_{1}, x_{2}, \ldots, x_{n})$. Let the height of $R$ over $F$ be $h$. Clearly $h$ cannot be zero as it would mean $R = F$ and hence every member of $R$ including the root $x_{1}$ would have to be invariant under all the permutations of the $x_{i}$'s, which is clearly not the case for $x_{1}$. Hence $h > 0$. Let $F = R_{0} \subseteq R_{1} \subseteq R_{2} \subseteq \ldots \subseteq R_{h} = R$ be the tower of radical extensions leading from $F$ to $R$. Here each $R_{i}$ is a radical extension of height $1$ of $R_{i - 1}$.

Consider first the radical extension $R_{1}$ of height $1$ over $F$. Clearly we have a member $u \in R_{1}$ and a prime $p$ such that $R_{1} = F(u)$ and $u^{p} = a \in F$ is not a $p^{\text{th}}$ power in $F$. Clearly the element $a \in F$ is invariant under all the permutations of the $x_{i}$'s and since $n \geq 5$, the element $a$ is invariant under the two permutations $\sigma$ and $\tau$ defined in theorem 12 above. It follows from theorem 12 that the element $u$ is also invariant under $\sigma, \tau$. Since every element of $F$ is also invariant under these two permutations it follows that every element of $F(u) = R_{1}$ is also invariant under $\sigma, \tau$.

Now considering the field $R_{2}$ as a radical extension of height $1$ over $R_{1}$ and repeating the same argument we see that every element of $R_{2}$ is also invariant under $\sigma, \tau$. Continuing this process for each of the fields $R_{i}$ we finally see that every member of $R_{h} = R$ is invariant under $\sigma, \tau$. But we have $x_{1} \in R$ which is clearly not invariant under $\sigma$ and therefore we obtain a contradiction. It follows that our initial assumption of the solvability of $P(x)$ by radicals over $F$ is wrong and thereby our proof is complete.

Note: The treatment of the insolvability of general polynomial of degree $5$ or more in this series of posts is taken from the wonderful book Galois Theory of Algebraic Equations by Jean-Pierre Tignol. Readers are advised to go through this beautiful book for further development in these topics. This has been the first understandable presentation of Abel's proof I have found in literature and online articles. Most of the Modern Algebra textbooks totally ignore the contributions of Abel or just mention them as a historical note and jump straightaway to the beautiful theories of Galois. Tignol's book discusses all the historical developments leading up to Galois Theory with the exposition of the contributions from various mathematicians like Gauss, Lagrange, Abel and finally Galois.


Abel and the Insolvability of the Quintic: Part 3

2014-01-03T16:42:00.001+05:30

The proof for the non-solvability of polynomial equation of degree $5$ (or more) by radicals obviously has to proceed via method of contradiction. Abel therefore assumed that such a solution was possible for a quintic and then figured out the most general form of such a solution. At the same time Abel observed that the radical expressions occurring in such a form must themselves be rational expressions of the roots desired. This was a key part which Abel proved for the first time. This result was later termed as the Theorem of Natural Irrationalities.

Abel's approach to the proof of this central result proceeds in multiple stages. Abel shows that an algebraic function of order $1$ (say $v = f_{1}(f_{0}^{1/m}, a, b, c)$) can always be expressed as a linear combination of powers of the radical expression ($u = f_{0}^{1/m}$) used to generate this function. Also when this is done, it is possible to express the coefficients of this linear combination (as well as the radical $u$) as rational functions of the original algebraic function ($v$) and its counterparts. The same procedure can then be carried inductively to algebraic functions of any order. We will rewrite Abel's proof in the modern notation of radical extensions.

Irreducibility of the Polynomial $x^{p} - a$

We first need to establish the irreducibility of the polynomial $f(x) = x^{p} - a \in F[x]$ where $p$ is a prime number and $a \in F$ is not a $p^{\text{th}}$ power in field $F$. In order to establish this we first show that the powers of $a$ are also not $p^{\text{th}}$ powers in $F$. More precisely we show that if $k = 1, 2, \ldots, p - 1$ then $a^{k}$ is not a $p^{\text{th}}$ power in $F$.

Assume on the contrary that $a^{k} = b^{p}$ for some $b \in F$. Now we note that $k$ and $p$ are coprime so there exist integers $m, n$ such that $mk + np = 1$. And therefore $$a = a^{mk + np} = (a^{k})^{m}a^{np} = b^{mp}a^{np} = (b^{m}a^{n})^{p}$$ which contradicts the fact that $a$ is not a $p^{\text{th}}$ power in $F$.

Let us now assume that $f(x) = x^{p} - a$ is reducible in $F[x]$. This means that we have polynomials $g(x), h(x) \in F[x]$ such that $f(x) = g(x)h(x)$. Also without any loss of generality we can assume that $g(x), h(x)$ are monic polynomials (i.e. the coefficient of the highest power of $x$ in each of them is $1$). Let $K$ be the splitting field of $f(x)$ over $F$. Thus $f(x)$ can be factored as a product of linear factors over $K$. The roots of $f(x)$ are the $p^{\text{th}}$ roots of $a$, which are obtained by multiplying one such root $u$ with the $p^{\text{th}}$ roots of unity, and it follows that $K$ contains all the roots of the form $\omega u$ where $\omega$ is any $p^{\text{th}}$ root of unity. Let $\mu_{p}$ denote the set of all $p^{\text{th}}$ roots of unity. Then we can write $$f(x) = \prod_{\omega \in \mu_{p}}(x - \omega u)$$ Clearly both the polynomials $g(x), h(x)$ also split into linear factors over $K$, and some of the linear factors of $f(x)$ make up $g(x)$ while the remaining ones make up $h(x)$. Thus we can write $$g(x) = \prod_{\omega \in I}(x - \omega u), h(x) = \prod_{\omega \in J}(x - \omega u)$$ where $I, J$ are sets with $I \cap J = \emptyset, I \cup J = \mu_{p}$.

Let the constant term of $g(x)$ be denoted by $b$ so that $b \in F$. Now from the above relation we can see that $b = (-u)^{k}\prod_{\omega \in I}\omega$ where $k$ is the number of elements in $I$. Since $\omega^{p} = 1$, it follows that $\{(-1)^{k}b\}^{p} = u^{kp} = a^{k}$. This means that $a^{k}$ is a $p^{\text{th}}$ power in $F$. This is not possible if $k = 1, 2, \ldots, p - 1$. Thus either $k = 0$ or $k = p$. If $k = 0$ then $I = \emptyset$ and $g(x) = 1$. If $k = p$ then $J = \emptyset $ and $h(x) = 1$. Thus we have either $g(x) = 1$ or $h(x) = 1$ and therefore the polynomial $f(x) = x^{p} - a$ is irreducible over $F$. We can formally state this as:

Theorem 7: Let $p$ be a prime number and let $a $ be a member of a field $F$ which is not a $p^{\text{th}}$ power in $F$. Then the polynomial $f(x) = x^{p} - a$ is irreducible over $F$.

Properties of Radical Extension $R = F(u)$

Next we consider a radical extension $R$ of a field $F$ of height $1$. This means that there is a prime number $p$ and an element $a \in F$ which is not a $p^{\text{th}}$ power in $F$ and an element $u \in R$ such that $R = F(u)$ and $u^{p} = a$. Now each member of $R$ can be expressed as a rational function of $u$ with coefficients in $F$. It is easy to show that we can express members of $R$ as polynomials in $u$ with coefficients in $F$. To do this we must show that if $g(u)$ is a non-zero polynomial expression in $u$ with coefficients in $F$, then $1/g(u)$ can also be expressed as a polynomial in $u$ with coefficients in $F$.

Note that since $u^{p} = a$, powers of $u$ in $g(u)$ which are greater than $(p - 1)$ can be replaced with lower powers of $u$ and members of $F$. Thus we can assume that $g(u)$ has no powers of $u$ greater than $(p - 1)$. Let $g(x) \in F[x]$ be the corresponding (non-zero) polynomial. Then the degree of $g(x)$ can be at most $(p - 1)$. Also the polynomial $f(x) = x^{p} - a$ is irreducible over $F$. It follows that both the polynomials $f(x)$ and $g(x)$ are relatively prime to each other. Hence there exist polynomials $a(x), b(x) \in F[x]$ such that $f(x)a(x) + g(x)b(x) = 1$. Now putting $x = u$ and noting that $f(u) = 0$ we get $g(u)b(u) = 1$ so that $b(u) = 1/g(u)$. Thus we have been able to express $1/g(u)$ as a polynomial $b(u)$.
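A concrete instance may make this clearer. Take the hypothetical example $F = \mathbb{Q}$, $p = 3$, $a = 2$, so that $u = \sqrt[3]{2}$, and $g(u) = 1 + u$. Here $f(x)\cdot(-1/3) + g(x)\cdot(1 - x + x^{2})/3 = 1$, so we expect $1/(1 + u) = (1 - u + u^{2})/3$. The sketch below checks this with exact rational arithmetic; the helper `mul_mod` is written here only for illustration.

```python
from fractions import Fraction

def mul_mod(f, g, p, a):
    """Multiply polynomials f, g in u (coefficient lists, lowest degree first)
    and reduce using u**p = a, returning a coefficient list of length p."""
    result = [Fraction(0)] * p
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            q, r = divmod(i + j, p)          # u**(i+j) = a**q * u**r
            result[r] += Fraction(fi) * Fraction(gj) * Fraction(a) ** q
    return result

p, a = 3, 2                                   # u is a cube root of 2
g = [1, 1, 0]                                 # g(u) = 1 + u
b = [Fraction(1, 3), Fraction(-1, 3), Fraction(1, 3)]  # b(u) = (1 - u + u^2)/3
product = mul_mod(g, b, p, a)
print(product)                                # coefficients of g(u) * b(u)
```

In general the polynomial $b(x)$ would be found by the extended Euclidean algorithm applied to $f(x)$ and $g(x)$; here it was simply read off from the factorization of $1 + u^{3}$.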

It follows now that any rational expression $h(u)/g(u) \in F(u) = R$ can be expressed as polynomial in $u$ with coefficients in $F$. Note further that degree of such a polynomial can always be made less than $p$ by using the relation $u^{p} = a$. We have thus shown that any member $v \in F(u) = R$ can be expressed in the form $$v = a_{0} + a_{1}u + a_{2}u^{2} + \cdots + a_{p - 1}u^{p - 1}$$ where $a_{0}, a_{1}, \ldots, a_{p - 1} \in F$. We will further establish that this expression of $v$ in terms of $u$ is unique. Clearly if we have $$v = a_{0} + a_{1}u + a_{2}u^{2} + \cdots + a_{p - 1}u^{p - 1} = b_{0} + b_{1}u + b_{2}u^{2} + \cdots + b_{p - 1}u^{p - 1}$$ then $u$ is the root of a polynomial $g(x) = c_{0} + c_{1}x + c_{2}x^{2} + \cdots + c_{p - 1}x^{p - 1} \in F[x]$ where $c_{i} = a_{i} - b_{i}$. Clearly the degree of $g(x)$ is at most $(p - 1)$ and $u$ is a root of an irreducible polynomial $f(x) = x^{p} - a$ of degree $p$. Hence we must have $g(x) = 0$ identically and therefore $c_{i} = 0$ for all $i$. Thus $a_{i} = b_{i}$ and therefore the expression of $v$ in terms of $u$ is unique. What we have proved so far is that:

Theorem 8: Let $R$ be a radical extension of a field $F$ of height $1$ so that $R = F(u)$ where $u \in R$ is such that $u^{p} = a \in F$ where $p$ is a prime number and $a$ is not a $p^{\text{th}}$ power in $F$. Then every element $v \in R$ can be expressed as a linear combination $$v = a_{0} + a_{1}u + a_{2}u^{2} + \cdots + a_{p - 1}u^{p - 1}$$ where $a_{0}, a_{1}, \ldots, a_{p - 1} \in F$ in a unique manner.

Abel goes further and states that if in the above result $v \notin F$ then we can choose a suitable $u \in R$ such that the coefficient $a_{1}$ can be made unity. This we state as a theorem next:

Theorem 9: Let $R$ be a radical extension of a field $F$ of height $1$. Let $v \in R$ and $v \notin F$. Then an element $u \in R$ can be chosen such that $R = F(u)$ and a prime number $p$ can be found with $u^{p} \in F$ and $u^{p}$ not being a $p^{\text{th}}$ power in $F$ such that $$v = a_{0} + u + a_{2}u^{2} + \cdots + a_{p - 1}u^{p - 1}$$ where $a_{0}, a_{2}, \ldots, a_{p - 1} \in F$ and this expression for $v$ in terms of $u$ is unique.

Let $R = F(u')$ where $u' \in R$ is such that $u'^{p} = a' \in F$ and $a'$ is not a $p^{\text{th}}$ power in $F$. Also we have a unique expression for $v$ in terms of $u'$ as $$v = a'_{0} + a'_{1}u' + \cdots + a'_{p - 1}u'^{p - 1}$$ where $a'_{i} \in F$. Since $v \notin F$ it follows that one of the coefficients $a'_{1}, a'_{2}, \ldots, a'_{p - 1}$ must be non-zero. Let the first such non-zero coefficient be $a'_{k}$.

Let $u = a'_{k}u'^{k}$ so that $u^{p} = {a'}_{k}^{p}{a'}^{k}$. Since $k \in \{1, 2, \ldots, p - 1\}$ it follows that ${a'}^{k}$ is not a $p^{\text{th}}$ power in $F$ and hence $u^{p}$ is also not a $p^{\text{th}}$ power in $F$. Now it is clear that $F(u) \subseteq R$ and thus we need to prove that $R \subseteq F(u)$. This can be achieved if we show that every member of $R$ can be expressed as a rational function of $u$. We will show that in fact we can express every member of $R$ as a polynomial in $u$ with coefficients in $F$. Note that every element of $R$ can be expressed as a polynomial in $u'$ so it makes sense to first express powers of $u'$ in terms of $u$.

If $i \in \{1, 2, \ldots, p - 1\}$ then $u^{i} = {a'}_{k}^{i}{u'}^{ik}$. As $i$ varies over $1, 2, \ldots, p - 1$, so does $ik \pmod {p}$ (but in a different order) and hence we can write $ik = pm + j$ where $m$ is an integer and $j \in \{1, 2, \ldots, p - 1\}$. Thus $u^{i} = {a'}_{k}^{i}{a'}^{m}{u'}^{j}$. Since the correspondence between $i$ and $j$ is one to one, it follows that we can express every power of $u'$ as ${u'}^{j} = \{{a'}_{k}^{i}{a'}^{m}\}^{-1}u^{i}$ i.e. as a multiple of a power of $u$. It thus follows that every member of $R$ can be expressed as a polynomial in $u$ with coefficients in $F$. By the irreducibility of $(x^{p} - u^{p})$ over $F$ it follows that such an expression is unique. Also note that by the way we have chosen $u$ the coefficient of $u$ in this expression for $v$ is $1$ and hence we can write $$v = a_{0} + u + a_{2}u^{2} + \cdots + a_{p - 1}u^{p - 1}$$ in a unique fashion.
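The key arithmetic fact here is that multiplication by $k$ permutes the non-zero residues modulo the prime $p$. A small sketch (with the hypothetical helper `exponent_table` and the sample values $p = 7$, $k = 3$ chosen for illustration) lists the decomposition $ik = pm + j$:

```python
def exponent_table(p, k):
    """For i = 1..p-1 write i*k = p*m + j with 0 <= j < p
    and return the list of triples (i, m, j)."""
    return [(i,) + divmod(i * k, p) for i in range(1, p)]

p, k = 7, 3
for i, m, j in exponent_table(p, k):
    # each line records u**i = a_k'**i * a'**m * u'**j
    print(i, m, j)
```

Each value $j \in \{1, 2, \ldots, p - 1\}$ appears exactly once in the last column, which is what allows every power of $u'$ to be written as a multiple of a power of $u$.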

Next Abel uses a very ingenious argument and shows the dependence of the coefficients $a_{0}, a_{2}, \ldots, a_{p - 1}$ on $v$ in a very direct manner by expressing them as rational functions of $v$ and its counterparts (or "conjugates"). In fact the radical $u$ also gets expressed in the same manner in terms of $v$. This we state below:

Theorem 10: Let $R$ be a radical extension of $F$ with $u \in R$ and a prime $p$ such that $u^{p} = a \in F$ is not a $p^{\text{th}}$ power in $F$ and $R = F(u)$. Also assume that $F$ contains a primitive $p^{\text{th}}$ root of unity $\zeta$. Let $v \in R - F$ be a root of a polynomial $f(x) \in F[x]$ and let $$v = a_{0} + u + a_{2}u^{2} + \cdots + a_{p - 1}u^{p - 1}$$ be the unique expression of $v$ in terms of $u$. Then $R$ contains $p$ roots of $f(x)$ and $u, a_{0}, a_{2}, \ldots, a_{p - 1}$ are rational expressions of these $p$ roots of $f(x)$ with coefficients in $\mathbb{Q}(\zeta)$.

Let $g(x) = f(a_{0} + x + a_{2}x^{2} + \cdots + a_{p - 1}x^{p - 1}) \in F[x]$ be a polynomial. Then the fact that $f(v) = 0$ implies that $g(u) = 0$ so that $u$ is a root of $g(x)$. At the same time $u$ is also a root of $h(x) = x^{p} - a$. Since $h(x)$ is irreducible it follows that $h(x)$ divides $g(x)$ and hence every root of $h(x)$ is a root of $g(x)$. Now the roots of $h(x)$ are of the form $\zeta^{i}u$ where $i = 0, 1, 2, \ldots, p - 1$, and it follows that $g(\zeta^{i}u) = 0$. Let $$v_{i} = a_{0} + a_{1}\zeta^{i}u + a_{2}\zeta^{2i}u^{2} + \cdots + a_{p - 1}\zeta^{(p - 1)i}u^{p - 1}$$ with $a_{1} = 1$ so that $v = v_{0}$ and each $v_{i}$ is a root of $f(x)$. Each $v_{i}$ lies in $R$ and the $v_{i}$ are distinct (their coefficients of $u$, namely $\zeta^{i}$, are distinct and by theorem 8 the expression in terms of $u$ is unique), so it follows that $R$ contains $p$ roots of $f(x)$. Now we can see that $$\sum_{i = 0}^{p - 1}\zeta^{-ik}v_{i} = \sum_{j = 0}^{p - 1}\left(\sum_{i = 0}^{p - 1}\zeta^{(j - k)i}\right)a_{j}u^{j}$$ Clearly if $j \neq k$ then $\zeta^{j - k}$ is a $p^{\text{th}}$ root of unity different from $1$ and hence the sum with index $i$ on the right side above vanishes. If $j = k$ then the same sum evaluates to $p$. It follows that $$\sum_{i = 0}^{p - 1}\zeta^{-ik}v_{i} = pa_{k}u^{k}$$ for all $k = 0, 1, 2, \ldots, p - 1$. Putting $k = 1$ and noting that $a_{1} = 1$ we see that $u$ is expressed as a rational function of the roots $v_{i}$ with coefficients in $\mathbb{Q}(\zeta)$ and then $$a_{k} = \dfrac{{\displaystyle \sum_{i = 0}^{p - 1}\zeta^{-ik}v_{i}}}{pu^{k}}$$ which shows that $a_{k}$ is also expressed as a rational function of the roots $v_{i}$ with coefficients in $\mathbb{Q}(\zeta)$.
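The averaging identity at the heart of this proof can be illustrated numerically. The sketch below uses floating point complex numbers with the hypothetical choices $p = 5$, $u = \sqrt[5]{2}$ and arbitrary sample coefficients (with $a_{1} = 1$); it is only a sanity check of the formula $\sum_{i}\zeta^{-ik}v_{i} = pa_{k}u^{k}$, not a computation inside an abstract field.

```python
import cmath

p = 5
zeta = cmath.exp(2j * cmath.pi / p)      # primitive p-th root of unity
u = 2 ** (1 / p)                         # real fifth root of 2 (sample radical)
coeffs = [3, 1, -2, 0.5, 4]              # sample a_0, a_1 = 1, a_2, a_3, a_4

def conjugate(i):
    """Return v_i = sum_j a_j * zeta**(i*j) * u**j."""
    return sum(coeffs[j] * zeta ** (i * j) * u ** j for j in range(p))

v = [conjugate(i) for i in range(p)]

def filtered_sum(k):
    """Return sum_i zeta**(-i*k) * v_i, which should equal p * a_k * u**k."""
    return sum(zeta ** (-i * k) * v[i] for i in range(p))

recovered_u = filtered_sum(1) / p        # uses a_1 = 1
recovered_coeffs = [filtered_sum(k) / (p * recovered_u ** k) for k in range(p)]
print(recovered_u, recovered_coeffs)
```

Up to rounding errors, `recovered_u` agrees with `u` and `recovered_coeffs` with `coeffs`, so both the radical and the coefficients are obtained from the conjugates $v_{i}$ alone.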

Also note the usefulness of the fact that $a_{1} = 1$ in the above derivation. Without this it would not have been possible to express $u$ in terms of $v_{i}$. Theorem 9 is therefore significant, as it allows us to make the coefficient $a_{1} = 1$.

We are now ready to prove the famous result of Abel namely:

Theorem of Natural Irrationalities

Let $F = \mathbb{C}(s_{1}, s_{2}, \ldots, s_{n})$ and $K = \mathbb{C}(x_{1}, x_{2}, \ldots, x_{n})$ where $x_{i}$ are $n$ indeterminates and the $s_{i}$ are their elementary symmetric functions. Thus $K$ represents the field of rational functions in $n$ indeterminates with complex coefficients and $F$ represents the field of symmetric rational functions of the same $n$ indeterminates with complex coefficients. Clearly $F \subseteq K$. With this notation we have the following theorem:

Theorem 11: If an element $v \in K$ lies in a radical extension of $F$, then $K$ contains a radical extension of $F$ which contains $v$.

The theorem says that if we wish to express any of $x_{i}$ in a radical extension of the base field $F$ then all the intermediate fields in this tower of radical extensions can be taken to be contained in $K$. Note that the theorem crucially depends on the fact that the base field contains roots of unity. The theorem is false if we replace field $\mathbb{C}$ with $\mathbb{Q}$. The term "natural irrationalities" refers to the fact that the irrationalities (i.e. radicals) used in expressing $v$ can be assumed to lie in $K$ rather than some external field.

We will use induction on the height $h$ of the radical extension $R$ of $F$ which contains $v \in K$. If $h = 0$ then $R = F$ and then there is nothing to prove. So let $h > 0$. Also let us assume as the induction hypothesis that any element of $K$ which is contained in a radical extension of $F$ of height less than $h$ is also contained in a radical extension of $F$ which lies in $K$.

Let $v \in K$ be contained in a radical extension $R$ of $F$ of height $h$. Then we have a radical extension $R_{1}$ of $F$ of height $(h - 1)$ and $R$ is a radical extension of height $1$ of $R_{1}$. If $v \in R_{1}$ then the proof is done by induction hypothesis. So let $v \in R - R_{1}$. We have an element $u \in R$ and a prime $p$ such that $R = R_{1}(u)$ and $u^{p} \in R_{1}$ with $u^{p}$ not being a $p^{\text{th}}$ power in $R_{1}$. Then by theorem 9 we can express $v$ as $$v = a_{0} + u + a_{2}u^{2} + \cdots + a_{p - 1}u^{p - 1}$$ where $a_{0}, a_{2}, \ldots, a_{p - 1} \in R_{1}$.

Since $v \in K$ it follows that $v$ is a root of a polynomial with coefficients in $F$ and hence in $R_{1}$ (clearly $v$ and its images under various permutations of $x_{i}$ will be the roots of this polynomial). Now by theorem 10, we can deduce that $u, a_{0}, a_{2}, \ldots, a_{p - 1}$ are rational functions of these roots and hence lie in $K$. At the same time $u^{p}$ and the coefficients $a_{i}$ lie in $R_{1}$ which is a radical extension of $F$ of height $(h - 1)$. Therefore by induction hypothesis each of $u^{p}, a_{0}, a_{2}, \ldots, a_{p - 1}$ lies in a radical extension of $F$ contained in $K$. By theorem 3 of the last post there is a single radical extension of $F$, say $R'$, which contains all these elements and is itself contained in $K$. Since $u^{p} \in R'$, it follows by theorem 1 of the last post that $R'(u)$ is a radical extension of $R'$ and hence a radical extension of $F$. Since $u \in K$ we have $R'(u) \subseteq K$, and since $v$ is a polynomial in $u$ with coefficients in $R'$ we have $v \in R'(u)$. Thus we have a radical extension of $F$ which contains $v$ and is contained in $K$.

Using this theorem we will prove in the next post that the general polynomial of degree $5$ or higher is not solvable by radicals over the field of its coefficients.


Abel and the Insolvability of the Quintic: Part 2

2014-01-02T16:24:00.002+05:30

In the last post we defined the concept of a radical field extension along the lines of the definition of algebraic functions given by Abel. In the current post we will study some properties of such field extensions which will ultimately enable us to study the field extension $\mathbb{C}(x_{1}, x_{2}, \ldots, x_{n})$ of $\mathbb{C}(s_{1}, s_{2}, \ldots, s_{n})$ where $s_{1}, s_{2}, \ldots, s_{n}$ are elementary symmetric functions of the indeterminates $x_{1}, x_{2}, \ldots, x_{n}$.

While discussing properties of radical extensions it will be found that at certain times it is useful to let the base field contain all the roots of unity. Abel and his contemporaries always assumed the existence of roots of unity as a given while dealing with solution of algebraic equations.

Properties of Radical Extensions

We first show that in the definition of a radical extension $R$ of $F$ with $R = F(u)$ where $u \in R$ with $u^{p} \in F$ we may drop the requirement that $p$ be a prime. This is possible if we assume that the base field $F$ contains roots of unity.

Theorem 1: Let $R \supseteq F$ be a field extension such that $R = F(u)$ where $u \in R$ is such that $u^{n} \in F$ for some positive integer $n$. If $F$ contains a primitive $n^{\text{th}}$ root of unity then $R$ is a radical extension of $F$.

We proceed by induction. Starting with $n = 1$ we see that $u \in F$ so that $R = F$ and clearly $R$ is then a radical extension of height $0$ of $F$. So let's suppose that $n > 1$ and that the result holds for all exponents of $u$ up to $n - 1$.

Let us first suppose that $n = rs$ where $r, s$ are positive integers greater than $1$ and less than $n$. Since we have $u^{n} = u^{rs} = (u^{r})^{s}$, it follows by the induction hypothesis that $R = F(u)$ is a radical extension of $F(u^{r})$ and $F(u^{r})$ is a radical extension of $F$. Hence it follows that $R$ is a radical extension of $F$.

If $n$ is prime then we need to consider two cases: either $u^{n}$ is an $n^{\text{th}}$ power in $F$ or it is not an $n^{\text{th}}$ power in $F$. In the latter case we are done by the definition of a radical extension. So we only need to consider the case that $u^{n} = a^{n}$ for some $a \in F$. In this case we may assume that $a \neq 0$ otherwise $u = 0$ and $R = F$ so that $R$ is a radical extension of height $0$ of $F$. Now we have $(u/a)^{n} = 1$ so that $u/a$ is an $n^{\text{th}}$ root of unity. Since $F$ contains a primitive $n^{\text{th}}$ root of unity, it contains all the $n^{\text{th}}$ roots of unity. Hence $u/a \in F$ so that $u \in F$ and therefore $R = F$ so that $R$ is a radical extension of height $0$ of $F$.

Next we see how we can generate radical extensions based on some given radical extension $R$ of $F$. The following result discusses the scenario where we have towers of field extensions $F\subseteq R\subseteq K$ and $F\subseteq L\subseteq K$. If $R$ is a radical extension of $F$ then based on it we can create a radical extension $S$ of $L$ which also includes $R$.

Theorem 2: Let fields $F, R, L, K$ be such that $F \subseteq R\subseteq K$ and $F\subseteq L\subseteq K$. Also let us assume that $\mathbb{C} \subseteq F$. If $R$ is a radical extension of $F$ then there is a radical extension $S$ of $L$ which is contained in $K$ and contains $R$.

The above theorem can be graphically represented as below:

Here the lines represent containment / field extension with the convention that a field at the top contains the field below. Red lines indicate radical extensions.

The idea of the proof is that we adjoin the radicals $u$ which are used in creation of $R$ from $F$ to the field $L$ to create a radical extension $S$. This will naturally contain $R$ and be contained in $K$ because all such radicals are part of $K$. We carry this idea in a formal fashion in the proof that follows.

We use induction on the height of radical extension $R$ of $F$. Let $h$ be this height. If $h = 0$, then $R = F$ and we can take $S = L$ which contains $R = F$ and is contained in $K$ and clearly $S$ is a radical extension of $L$ of height $0$. So we have verified the theorem in case $h = 0$. Now we assume that the result holds for all radical extensions $R$ of $F$ with height less than $h$.

Since $R$ is a radical extension of $F$ of height $h$, it follows that we have a radical extension $R_{1}$ of $F$ of height $h - 1$ and $R$ is a radical extension of height $1$ of $R_{1}$. This means that we have a member $u \in R$ such that $R = R_{1}(u)$ and $u^{p} \in R_{1}$ where $p$ is a certain prime and $u^{p}$ is not a $p^{\text{th}}$ power in $R_{1}$.

Clearly by the induction hypothesis we have a radical extension $S_{1}$ of $L$ which contains $R_{1}$ and is contained in $K$. Now $u^{p} \in S_{1}$ and since $S_{1}$ contains $\mathbb{C}$ and thus all the roots of unity we can use Theorem 1 above to deduce that $S = S_{1}(u)$ is a radical extension of $S_{1}$ and hence of $L$. Clearly since $u \in R \subseteq K$ and $S_{1} \subseteq K$ therefore $S = S_{1}(u) \subseteq K$. Also $R_{1}\subseteq S_{1} \subseteq S$ and $u \in S$ so that $R = R_{1}(u) \subseteq S$. Thus we have found a radical extension $S$ of $L$ which contains $R$ and is contained in $K$.

The above result is very useful in combining multiple radical extensions to create one radical extension. We have the following result in this connection:

Theorem 3: Let $\mathbb{C}\subseteq F\subseteq K$ be a field extension and let each of the elements $v_{1}, v_{2}, \ldots, v_{n}$ in $K$ lie in a radical extension of $F$ contained in $K$. Then there is a single radical extension of $F$ contained in $K$ which contains all the elements $v_{1}, v_{2}, \ldots, v_{n}$.

Clearly if $n = 1$ the result holds. So let us suppose that there is a single radical extension $L$ of $F$ contained in $K$ which contains all the elements $v_{1}, v_{2}, \ldots, v_{n - 1}$. Also let $R$ be the radical extension of $F$ contained in $K$ which contains $v_{n}$. Using Theorem 2 we can see that there is a radical extension $S$ of $L$ which contains $R$ and is contained in $K$, and thus we see that $S$ is a radical extension of $F$ contained in $K$ which contains all the elements $v_{1}, v_{2}, \ldots, v_{n}$.

Roots of Unity and Radicals

Note that the above results are dependent on the fact that the base field $F$ contains roots of unity. However if this is not the case then we have a remarkable result which shows that all the roots of unity can be obtained via radical extensions. This was established by Gauss using his theory of periods. We establish this result along the lines of Gauss using induction.

Theorem 4: If $n$ is a positive integer and $F$ is a field then the $n^{\text{th}}$ roots of unity lie in a radical extension of $F$.

We need to establish this only for the primitive $n^{\text{th}}$ roots of unity as the other roots are their powers. For $n = 1$ the result is trivial and hence let $n > 1$. Let us assume that all $k^{\text{th}}$ roots of unity lie in a radical extension of $F$ for all $ k = 1, 2, \ldots, n - 1$.

Clearly if $n$ is composite, say $n = rs$ with $r, s$ being positive integers greater than $1$ and less than $n$ and $\zeta$ is a primitive $n^{\text{th}}$ root of unity then $\zeta^{r}$ is an $s^{\text{th}}$ root of unity and hence by the induction hypothesis lies in a radical extension $R_{1}$ of $F$. Again by the induction hypothesis we can find a radical extension $R_{2}$ of $R_{1}$ which contains all $r^{\text{th}}$ roots of unity. Now $\zeta^{r}\in R_{1} \subseteq R_{2}$ and $R_{2}$ contains all $r^{\text{th}}$ roots of unity, therefore by theorem 1, $R_{2}(\zeta)$ is a radical extension of $R_{2}$ and hence of $F$ which contains the primitive $n^{\text{th}}$ root $\zeta$.

If $n$ is prime then the argument is a bit tricky but Gauss uses the technique of Lagrange resolvents. As usual let $g$ be a primitive root of $n$ and let $\zeta$ be a primitive $n^{\text{th}}$ root of unity. We set $\zeta_{i} = \zeta^{g^{i}}$. Then $\zeta_{0}, \zeta_{1}, \zeta_{2}, \ldots, \zeta_{n - 2}$ are the $(n - 1)$ primitive $n^{\text{th}}$ roots of unity. Let $\omega$ be an $(n - 1)^{\text{th}}$ root of unity other than $1$ and we form the Lagrange resolvent $$t(\omega) = \zeta_{0} + \omega\zeta_{1} + \omega^{2}\zeta_{2} + \cdots + \omega^{n - 2}\zeta_{n - 2}$$ and then we can see that the cyclic permutation of $\zeta_{0}, \zeta_{1}, \ldots, \zeta_{n - 2}, \zeta_{0}$ in that order changes the expression $t(\omega)$ into $\omega^{-1}t(\omega)$. It follows that the expression $\{t(\omega)\}^{n - 1}$ remains invariant by the application of this permutation.

Now note that if we calculate the expression $\{t(\omega)\}^{n - 1}$ it will involve multiplying the various $\zeta$'s and since these are primitive $n^{\text{th}}$ roots of unity their products will also be $n^{\text{th}}$ roots of unity (a product equal to $1$ can be rewritten as $-(\zeta_{0} + \zeta_{1} + \cdots + \zeta_{n - 2})$). It follows that the expression $\{t(\omega)\}^{n - 1}$ can be expressed in the form of a linear combination of the $\zeta$'s where the coefficients will be certain polynomials in $\omega$. Thus we have $$\{t(\omega)\}^{n - 1} = a_{0}\zeta_{0} + a_{1}\zeta_{1} + \cdots + a_{n - 2}\zeta_{n - 2}$$ where $a_{i}$ are polynomials in $\omega$. Now we apply the cyclic permutation of the $\zeta$'s to the above expression and note that doing so does not change the LHS. Thus we get \begin{align} \{t(\omega)\}^{n - 1} &= a_{0}\zeta_{0} + a_{1}\zeta_{1} + \cdots + a_{n - 2}\zeta_{n - 2}\notag\\ &= a_{0}\zeta_{1} + a_{1}\zeta_{2} + \cdots + a_{n - 2}\zeta_{0}\notag\\ &= a_{0}\zeta_{2} + a_{1}\zeta_{3} + \cdots + a_{n - 2}\zeta_{1}\notag\\ &= \cdots\notag\\ &= a_{0}\zeta_{n - 2} + a_{1}\zeta_{0} + \cdots + a_{n - 2}\zeta_{n - 3}\notag \end{align} Adding these equations we get $$(n - 1)\{t(\omega)\}^{n - 1} = \left(\sum_{i = 0}^{n - 2}a_{i}\right)\left(\sum_{i = 0}^{n - 2}\zeta_{i}\right)$$ and since the $\zeta$'s sum to $-1$ it follows that the expression $\{t(\omega)\}^{n - 1}$ is a polynomial in $\omega$. By the induction hypothesis the $(n - 1)^{\text{th}}$ root $\omega$ lies in a radical extension $R_{1}$ of $F$ and hence the expression $\{t(\omega)\}^{n - 1}$ also lies in $R_{1}$. It follows by theorem 1 that $R_{1}(t(\omega))$ is a radical extension of $R_{1}$ and hence of $F$. Considering all the $(n - 1)^{\text{th}}$ roots of unity it is clear that we can find a radical extension $R_{2}$ of $F$ which contains all such expressions $t(\omega)$. It can now be easily checked that $$\zeta_{i} = \frac{1}{n - 1}\sum_{\omega}\omega^{-i}t(\omega)$$ (the sum running over all $(n - 1)^{\text{th}}$ roots of unity, with $t(1) = -1$) and therefore $\zeta_{i} \in R_{2}$. 
Thus we see that the primitive $n^{\text{th}}$ root $\zeta$ lies in a radical extension of $F$. This completes the proof.
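The two facts used in the prime case (invariance of $\{t(\omega)\}^{n - 1}$ under the cyclic permutation, and the recovery formula for $\zeta_{i}$) can be checked numerically. The following is only a floating point sanity check for small odd primes, not part of the proof; the function names `primitive_root` and `resolvent_check` are chosen here for illustration.

```python
import cmath

def primitive_root(n):
    """Smallest primitive root modulo an odd prime n (brute force search)."""
    for g in range(2, n):
        if len({pow(g, i, n) for i in range(n - 1)}) == n - 1:
            return g
    raise ValueError("no primitive root found")

def resolvent_check(n):
    """Largest numerical deviation from the identities used in the proof."""
    g = primitive_root(n)
    zeta = cmath.exp(2j * cmath.pi / n)
    # zeta_i = zeta^(g^i), i = 0..n-2: the primitive n-th roots in Gauss's order
    zs = [zeta ** pow(g, i, n) for i in range(n - 1)]
    # all (n-1)-th roots of unity, including omega = 1 (for which t(1) = -1)
    omegas = [cmath.exp(2j * cmath.pi * k / (n - 1)) for k in range(n - 1)]

    def t(omega, roots):
        # Lagrange resolvent: sum of omega^j * roots[j]
        return sum(omega ** j * r for j, r in enumerate(roots))

    shifted = zs[1:] + zs[:1]  # cyclic permutation zeta_i -> zeta_{i+1}
    worst = 0.0
    for w in omegas:
        # the permutation turns t(omega) into omega^{-1} t(omega) ...
        worst = max(worst, abs(t(w, shifted) - t(w, zs) / w))
        # ... so the (n-1)-th power is unchanged
        worst = max(worst, abs(t(w, shifted) ** (n - 1) - t(w, zs) ** (n - 1)))
    # recover each zeta_i as (1/(n-1)) * sum over omega of omega^{-i} t(omega)
    for i in range(n - 1):
        recovered = sum(w ** (-i) * t(w, zs) for w in omegas) / (n - 1)
        worst = max(worst, abs(recovered - zs[i]))
    return worst

for n in (3, 5, 7, 11):
    print(n, resolvent_check(n))
```

The printed deviations should be at the level of rounding error, which is consistent with (but of course does not prove) the identities above.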

We next want to prove an important result regarding solvability of polynomials over a field. If $P(x)$ is a polynomial over field $F$ and $L \supseteq F$ is a field extension then $P(x)$ can also be regarded as a polynomial over field $L$. We will establish that if $P(x)$ is solvable by radicals over $F$ then it is also solvable by radicals over $L$. This also shows that if $P(x)$ is not solvable by radicals over $L$ then it is not solvable by radicals over $F$. Thus in case we are trying to establish non-solvability by radicals of some polynomial, it does no harm to extend the field of coefficients. It therefore makes sense to always enlarge the field of coefficients to include roots of unity and thereby all the complex numbers. By doing this we achieve a lot of simplicity (via the use of theorems 1, 2, 3 above) in our proofs without losing any generality. We first prove a preliminary result.

Theorem 5: Let $F\subseteq L$ be a field extension. If $R$ is a radical extension of $F$ then there is a radical extension $S$ of $L$ such that $R$ can be identified with a subfield of $S$.

This result should be contrasted with theorem 2 as it does away with the requirement that base field $F$ should contain $\mathbb{C}$.

Again as usual the proof will be by induction on the height $h$ of $R$ over $F$. If $h = 0$ then $R = F$ and we can take $S = L$ which contains $R = F$. If $h = 1$ then $R = F(u)$ with $u^{p} = a \in F$ where $a$ is not a $p^{\text{th}}$ power in $F$. Now consider the polynomial $P(x) = x^{p} - a$. It is a polynomial over $F$ as well as over $L$. Hence there is a splitting field $K \supseteq L$ which contains all the roots of $P(x)$. Since $u$ is a root of this polynomial we can identify this $u$ with some member of $K$. Since $K \supseteq L \supseteq F$ it follows that $K$ contains all the rational expressions in $u$ with coefficients in $F$. In this sense $K \supseteq R = F(u)$.

If $a$ is not a $p^{\text{th}}$ power in $L$, then $L(u)$ is a radical extension of $L$ of height $1$ and clearly it contains both $F$ and $u$ and therefore $R = F(u)$. We thus have a radical extension of $L$ which contains $R$.

If $a$ is a $p^{\text{th}}$ power in $L$ say $ a = b^{p}$ with $b \in L$ then we have $u^{p} = b^{p}$ so that $(u/b)^{p} = 1$ so that $u/b$ is a $p^{\text{th}}$ root of unity. Clearly via theorem 4 there is a radical extension $S$ of $L$ which contains $u/b$ and hence contains $u$. Clearly then $S$ contains $F$ and $u$ and therefore $R = F(u)$. This completes the proof when $R$ is a radical extension of height $h = 1$ of $F$.

If $h > 1$, then we have a radical extension $R_{1}$ of $F$ of height $h - 1$ and $R$ is a radical extension of height $1$ of $R_{1}$. By the induction hypothesis we may assume that there is a radical extension $S_{1}$ of $L$ which contains $R_{1}$. Now we have the scenario that $R_{1} \subseteq S_{1}$ and $R$ is a radical extension of $R_{1}$ of height $1$. Clearly from the proof for height $h = 1$ we can see that there is a radical extension $S$ of $S_{1}$ and hence of $L$ which contains $R$.

We now come to the final result of this post which shows that solvability by radicals of a polynomial is not affected by extending the field of coefficients.

Theorem 6: Let $P(x)$ be a polynomial over field $F$. If $P(x)$ is solvable by radicals over $F$ then it is also solvable by radicals over any field extension $L \supseteq F$.

Let $R$ be a radical extension of $F$ containing a root $r$ of $P(x)$. Clearly by theorem 5, the field $R$ can be assumed to be contained in some radical extension $S$ of $L$ and hence $r \in S$. It thus follows that the radical extension $S$ of $L$ contains a root $r$ of $P(x)$ and hence $P(x)$ is solvable by radicals over $L$.

We have covered the groundwork regarding radical extensions and these results will be used in the next post to establish the fundamental theorem of natural irrationalities which was first proved by Abel.


Abel and the Insolvability of the Quintic: Part 1

2013-12-07T12:15:00.000+05:30

Introduction

Most of the students come across the solution of linear and quadratic equations in their secondary classes. While the solution of a linear equation $ax + b = 0$ with $a, b$ being rational does not present any difficulties (because the solution $x$ itself turns out to be a rational number), a quadratic equation of the form $ax^{2} + bx + c = 0$ (with $a, b, c$ rational) does present significant challenges. For one thing the solution may not be rational and sometimes may not be even real. Usually one encounters the use of square roots to solve such an equation. Fortunately there is a standard formula for solving such equations $$x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}$$ so that the equation can be solved directly in terms of its literal coefficients.

Many mathematicians tried to extend these ideas to solve the equations of third and fourth degrees. Thus during the 16th century Cardano solved the cubic and Ferrari solved the quartic equation. Later in the 18th century Lagrange published his classic work "Réflexions sur la résolution algébrique des équations" in which he unified the existing methods of solving equations up to degree $4$. He hoped that unifying all the available approaches into one coherent theory would help in solving higher degree equations. But neither Lagrange nor any other mathematician was able to provide a solution to quintics (equations of degree $5$) or higher degree equations. Then in 1824 a young Norwegian mathematician Niels Henrik Abel proved that it is not possible to solve a quintic equation in the same way as it is possible to solve equations of degree $2, 3$ or $4$.

In this series of posts we will study the above mentioned result of Abel and its very tricky and non-obvious proof.

Solution of Algebraic Equations Through Radicals

Before we start onto a discussion of Abel's theorem, it is better to discuss the solution of quadratic and cubic equations and note some similarities in these approaches. Since we have already mentioned the quadratic formula for the solution of a quadratic equation we discuss the solution of cubic equations. In this regard, we follow the approach of Cardano.

Let the cubic equation be given by $$a_{0}x^{3} + a_{1}x^{2} + a_{2}x + a_{3} = 0\tag{1}$$ By dividing the equation by $a_{0}$ we can always assume that the coefficient of $x^{3}$ is $1$. Hence let's assume the equation is of the form $x^{3} + a_{1}x^{2} + a_{2}x + a_{3} = 0$ where $a_{1}, a_{2}, a_{3}$ are real or complex numbers. What we want here is a formula consisting of $a_{1}, a_{2}, a_{3}$ through which we can get the solution of the equation by substituting numerical values of $a_{1}, a_{2}, a_{3}$. Thus what we need is a general formula for expressing the root of the equation in terms of its literal coefficients. If we assume that $\alpha, \beta, \gamma$ are roots of the equation then we get $\alpha + \beta + \gamma = -a_{1}$ so that $$(\alpha + a_{1}/3) + (\beta + a_{1}/3) + (\gamma + a_{1}/3) = 0$$ and thus if we put $y = x + a_{1}/3$ we will get a cubic equation in $y$ whose sum of roots is $0$ so that the equation won't have a term containing $y^{2}$. Thus by a simple linear substitution it is possible to reduce any cubic equation to the form $$x^{3} + ax + b = 0\tag{2}$$ We will solve this standard form of the cubic equation. Let us then assume that the solution is of the form $x = A + B$ so that $$x^{3} = (A + B)^{3} = A^{3} + B^{3} + 3AB(A + B)$$ which leads to $$x^{3} - 3ABx - (A^{3} + B^{3}) = 0\tag{3}$$ Comparing this with the original equation $(2)$ we get $3AB = -a, A^{3} + B^{3} = -b$ so that $A^{3}B^{3} = -a^{3}/27$. Therefore $A^{3}, B^{3}$ are the roots of $t^{2} + bt - (a^{3}/27) = 0$ i.e. $$ t = \dfrac{-b \pm \sqrt{b^{2} + \dfrac{4a^{3}}{27}}}{2} = -\frac{b}{2} \pm \sqrt{\frac{b^{2}}{4} + \frac{a^{3}}{27}}$$ and thus we get the solution for the cubic as $$x = A + B = \sqrt[3]{-\frac{b}{2} + \sqrt{\frac{b^{2}}{4} + \frac{a^{3}}{27}}} + \sqrt[3]{-\frac{b}{2} - \sqrt{\frac{b^{2}}{4} + \frac{a^{3}}{27}}}\tag{4}$$ The solution is thus seen to be composed of nested radicals. 
The square roots are supposed to generate two values and both of them are taken care of in the above formula. Similarly the cube roots are supposed to generate $3$ values each and thus the above expression seems to give $9$ values of $x$. However there is the constraint $AB = -a/3$ which restricts our choices and we get only three pairs of values of $A, B$ and hence $3$ values of $x$. If $A, B$ represent one pair of values then the other two pairs are $A\omega, B\omega^{2}$ and $A\omega^{2}, B\omega$ where $\omega$ is a primitive cube root of unity given by $\omega^{2} + \omega + 1 = 0$.
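The recipe above can be tried out numerically. The following is a minimal sketch in floating point complex arithmetic, assuming numerical coefficients $a, b$; the function name `cardano_roots` and the tolerance `1e-14` are choices made here for illustration. It picks one cube root $A$, obtains $B$ from the constraint $AB = -a/3$ and then forms the three pairs $(A\omega^{k}, B\omega^{-k})$.

```python
import cmath

def cardano_roots(a, b):
    """Three roots of x^3 + a*x + b = 0 via x = A + B with A*B = -a/3."""
    omega = complex(-0.5, 3 ** 0.5 / 2)         # primitive cube root of unity
    root = cmath.sqrt(b * b / 4 + a ** 3 / 27)  # the inner square root
    A3 = -b / 2 + root
    if abs(A3) < 1e-14:   # A^3 = 0 can only happen when a = 0; use the other sign
        A3 = -b / 2 - root
    if abs(A3) < 1e-14:   # a = b = 0: the equation is x^3 = 0
        return [0j, 0j, 0j]
    A = A3 ** (1 / 3)     # any one cube root of A^3 (the principal value)
    B = -a / (3 * A)      # B is forced by the constraint A*B = -a/3
    # the pairs (A, B), (A*omega, B*omega^2), (A*omega^2, B*omega)
    return [A * omega ** k + B * omega ** (-k) for k in range(3)]

# Example: x^3 - 6x - 9 = 0 has the root x = 3
for r in cardano_roots(-6, -9):
    print(r, abs(r ** 3 - 6 * r - 9))
```

Taking $B$ from the constraint instead of extracting a second, independent cube root is exactly what cuts the apparent $9$ values down to $3$.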

If we observe carefully we find that the solution ultimately requires the use of roots of unity as well as a series of nested radicals. We thus say that a general cubic equation (by general we mean an equation with literal coefficients) can be solved by radicals. The solution of a general quartic equation is also composed of a series of nested radicals and roots of unity so that a quartic equation is also solvable by radicals.

However all attempts to solve equations of higher degrees met with failure and some people started suspecting that the problem itself is insoluble. While equations like $x^{5} - 2 = 0$ did have a solution in terms of radicals, namely $x = \sqrt[5]{2}$, no such solution was forthcoming for a general quintic. While investigating this problem Abel tried to think of the most general form of a solution by radicals and then figured out that such a solution for a quintic equation led to a contradiction. This way by a very long and clever argument he established in 1824 that a general quintic could not be solved by radicals.

Before we present Abel's proof it is better to describe the concept of "solvability by radicals" in a slightly more formal manner. To do that we need to understand the technicality of radicals appearing in the solution of cubic and quadratic equations. Thus for example if we consider a quadratic equation $x^{2} + ax + b = 0$ and its solution $x = (-a \pm \sqrt{a^{2} - 4b})/2$ we see that apart from the expression composed of the coefficients $a, b$ and algebraic operations $+, -, \times, /$ we also need a square root operation to generate new quantities like $\sqrt{a^{2} - 4b}$. The idea of a square root is handled very smartly by proposing the existence of a quantity $u$ such that $u^{2} = a^{2} - 4b$ and allowing the quantity $u$ to be operated with existing quantities $a, b$ using the usual algebraic operation of $+, -, \times, /$ and following the standard algebraical rules of these operations.

In effect we are starting with a field containing rational expressions in $a, b$ and then proposing a new element $u$ and making rational expressions in $u, a, b$. Also whenever possible we replace $u^{2}$ and higher powers of $u$ by rational expressions in $a, b, u$ so that any rational expression in $u, a, b$ is of the form $Au + B$ where $A, B$ are rational expressions in $a, b$. While dealing with a rational expression it is obvious that the coefficients must include rational numbers and to account for the roots of unity we propose that the coefficients in these rational expressions can be complex numbers. The field of rational expressions in literals $a, b$ with complex coefficients is denoted by $\mathbb{C}(a, b)$. When viewed in this way we see that the field of rational functions in $u, a, b$ i.e. $\mathbb{C}(u, a, b)$ includes $\mathbb{C}(a, b)$ and many other new elements of the form $Au + B$ with $A \neq 0$. We thus say that the field $\mathbb{C}(u, a, b)$ is an extension of the field $\mathbb{C}(a, b)$.

Radical Extensions

It is to be noted that if $(a^{2} - 4b)$ is a perfect square (i.e. square of some quantity in $\mathbb{C}(a, b)$) so that $u \in \mathbb{C}(a, b)$ then there is no field extension and we have $\mathbb{C}(u, a, b) = \mathbb{C}(a, b)$. Hence for the concept of field extensions to work it is necessary to posit a further constraint that there is no member $u \in \mathbb{C}(a, b)$ such that $u^{2} = a^{2} - 4b$. Such a quantity $u$ is then called a radical and the corresponding field extension $\mathbb{C}(u, a, b)$ is called a radical extension of the field $\mathbb{C}(a, b)$.

More formally we say that a field $R$ is a radical extension of height $1$ of a field $F$, if there exist $u \in R, a \in F$ and a prime number $p$ such that $u^{p} = a$ and there is no member in $F$ whose $p^{\text{th}}$ power is $a$ and every member of $R$ is a rational expression in $u$ and members of $F$.

In the above case we write $R = F(u)$. Unless otherwise stated the base field $F$ will be assumed to be a field of rational expressions in a finite number of literals with complex coefficients i.e. $F = \mathbb{C}(x_{1}, x_{2}, \ldots, x_{n})$. For completeness we say that $R$ is a radical extension of $F$ of height $0$ if $R = F$.
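To make the idea of "rational expressions in $u$" concrete, here is a small sketch of arithmetic in a radical extension of height $1$ with $p = 2$. As a simplifying assumption it uses the base field $\mathbb{Q}$ and $u^{2} = 2$ as a stand-in for $\mathbb{C}(a, b)$ and $u^{2} = a^{2} - 4b$; the class name `QuadElem` is hypothetical. Every element is stored as $Au + B$, and products and inverses are again of this form.

```python
from fractions import Fraction

class QuadElem:
    """Element A*u + B of Q(u), where u*u = d and d is not a square in Q."""

    def __init__(self, A, B, d=2):
        self.A, self.B, self.d = Fraction(A), Fraction(B), Fraction(d)

    def __mul__(self, other):
        # (A1 u + B1)(A2 u + B2) = (A1 B2 + A2 B1) u + (B1 B2 + A1 A2 d)
        return QuadElem(self.A * other.B + other.A * self.B,
                        self.B * other.B + self.A * other.A * self.d,
                        self.d)

    def inverse(self):
        # 1/(A u + B) = (-A u + B)/(B^2 - A^2 d); for a nonzero element the
        # denominator cannot vanish because d is not a square in Q
        n = self.B ** 2 - self.A ** 2 * self.d
        return QuadElem(-self.A / n, self.B / n, self.d)

    def __eq__(self, other):
        return (self.A, self.B, self.d) == (other.A, other.B, other.d)

    def __repr__(self):
        return f"({self.A})*u + ({self.B})"

u = QuadElem(1, 0)
x = QuadElem(3, 5)
print(u * u)            # u^2 is replaced by the base field element 2
print(x * x.inverse())  # the product of x and its inverse is 1
```

The condition that $d$ is not a square in the base field is what guarantees that the denominator $B^{2} - A^{2}d$ in `inverse` is nonzero, which mirrors the requirement in the definition that $a$ is not a $p^{\text{th}}$ power in $F$.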

Using induction we now define a radical extension of arbitrary height $h$ where $h$ is a positive integer. A field $R$ is said to be a radical extension of height $h$ of a field $F$, if there is another field $R_{1}$ such that $R$ is a radical extension of height $1$ of $R_{1}$ and $R_{1}$ is a radical extension of height $h - 1$ of base field $F$. Thus a radical extension $R$ of height $h$ of base field $F$ can be viewed as a tower of radical extensions $F = R_{0}, R_{1}, R_{2}, \ldots, R_{h} = R$ such that each $R_{i}$ is a radical extension of height $1$ of $R_{i - 1}$.

Since each radical extension of height $1$ involves an element $a$ in base field and a prime $p$ such that $a$ is not a $p^{\text{th}}$ power in base field and a quantity $u$ in the extension field such that $u^{p} = a$, it follows that the existence of a radical extension $R = R_{h}$ of height $h$ of base field $F = R_{0}$ implies the existence of $h$ prime numbers $p_{1}, p_{2}, \ldots, p_{h}$ and elements $a_{1} \in R_{0}, a_{2} \in R_{1}, \cdots, a_{h} \in R_{h - 1}$ and $b_{1} \in R_{1}, b_{2} \in R_{2}, \cdots, b_{h} \in R_{h}$ such that $a_{i}$ is not a $p_{i}^{\text{th}}$ power in $R_{i - 1}$ and $b_{i}^{p_{i}} = a_{i}$. This is the way we express the concept of a nested radical in a formal fashion. Each level of nesting increases height of radical extension by $1$. From the definition above it is obvious that the property of "being a radical extension" is a transitive one in the sense that if a field $R$ is a radical extension of field $F$ and field $L$ is a radical extension of field $R$, then $L$ is also a radical extension of $F$.

If we observe the solution of cubic equation $x^{3} + ax + b = 0$ given above by Cardano, then we can see that it involves a square root $\sqrt{(b^{2}/4) + (a^{3}/27)} = u$ which can be said to lie in a radical extension of $\mathbb{C}(a, b)$ of height $1$ as we have $u^{2} \in \mathbb{C}(a, b)$ and for sure the expression $(b^{2}/4) + (a^{3}/27)$ is not a perfect square of any rational expression in $\mathbb{C}(a, b)$. If we call this radical extension $R_{1} = \mathbb{C}(u, a, b)$ then we can see that we need another radical extension of $R_{1}$, say $R_{2}$, to handle the cube roots involved in the formula for $x$. In fact we need two radical extensions $R_{2}$ and $R_{3}$ such that $u_{1} \in R_{2}$ and $u_{2} \in R_{3}$ such that $u_{1}^{3} = -(b/2) + u \in R_{1}$ and $ u_{2}^{3} = -(b/2) - u \in R_{1}$. We will later see that we can find a single radical extension $R'$ of $R_{1}$ which contains $x = u_{1} + u_{2}$.

Let us now define formally the concept of solvability of a polynomial equation by radicals. Let $P(x)$ be a polynomial with coefficients in a field $F$. Then $P(x)$ is said to be solvable by radicals over $F$ if there is a radical extension of $F$ which contains a root of $P(x)$. Using the quadratic formula we can say that the polynomial $x^{2} + ax + b$ is solvable by radicals over $\mathbb{Q}(a, b)$. Similarly from Cardano's formulas it is obvious that polynomial $x^{3} + ax + b$ is solvable by radicals over $\mathbb{Q}(a, b)$.

Abel did not have this terminology or notation of radical extensions but he used expressions like $R^{1/p}$ to denote radical quantities and was able to put forth his long argument without the use of a sufficiently general notation. Parallel to the concept of radical extensions of height $h$, Abel defined the concept of algebraic functions of order $k$.

An algebraic function of order $0$ is a rational expression of the coefficients of the given polynomial equation. The coefficients used to make this rational expression are complex numbers. Suppose we are dealing with a cubic equation $x^{3} + ax^{2} + bx + c = 0$. Then an algebraic function of order $0$ is a rational function $f_{0}(a, b, c)$ with complex coefficients. Next an algebraic function of order $1$ is a rational function of the form $f_{1}(f_{0}^{1/m}, a, b, c)$ where $f_{0}$ is an algebraic function of order $0$ and $m$ is a positive integer which can be assumed to be prime. This way Abel introduced radicals of type $f_{0}^{1/m}$. This process can be carried on inductively to define an algebraic function of order $k$. This shows the sheer brilliance of a young genius who solved a famous long standing problem which daunted the likes of Gauss and Lagrange.

The General Polynomial Equation

In what follows we will focus on the solution of the general polynomial equation of degree $n$ given by $$x^{n} - s_{1}x^{n - 1} + \cdots + (-1)^{n - 1}s_{n - 1}x + (-1)^{n}s_{n} = (x - x_{1})(x - x_{2})\cdots (x - x_{n})$$ where the literals (aka indeterminates) $x_{1}, x_{2}, \ldots, x_{n}$ are the $n$ distinct roots and $s_{1}, s_{2}, \ldots s_{n}$ are the elementary symmetric polynomials in roots $x_{i}$.

The field of rational expressions in $s_{1}, s_{2}, \ldots, s_{n}$ (i.e. coefficients of the polynomial equation) will be denoted by $\mathbb{C}(s_{1}, s_{2}, \ldots, s_{n})$ and the field of rational expressions in the roots $x_{i}$ will be denoted by $\mathbb{C}(x_{1}, x_{2}, \ldots, x_{n})$. It is then clear that $\mathbb{C}(s_{1}, s_{2}, \ldots, s_{n})$ represents the field of symmetric rational expressions in literals $x_{1}, x_{2}, \ldots, x_{n}$. It thus follows that $\mathbb{C}(s_{1}, s_{2}, \ldots, s_{n})$ is a proper subfield of $\mathbb{C}(x_{1}, x_{2}, \ldots, x_{n})$.

In this notation the problem of "solving the general polynomial equation by radicals" is equivalent to finding whether there exists a radical extension $R$ of $\mathbb{C}(s_{1}, s_{2}, \ldots, s_{n})$ which contains $x_{1}, x_{2}, \ldots, x_{n}$ and thus contains $\mathbb{C}(x_{1}, x_{2}, \ldots, x_{n})$. When $n = 1, 2, 3, 4$ it is known that such a radical extension exists. Abel showed that when $n \geq 5$ no such radical extension $R$ exists.

Ruffini's Work on Solvability of Equations

When the problem is formulated in this manner it becomes almost obvious that before we can solve this problem it is absolutely necessary to study the properties of radical extensions of field $\mathbb{C}(s_{1}, s_{2}, \ldots, s_{n})$. Since this base field contains all symmetric rational functions of $x_{1}, x_{2}, \ldots, x_{n}$ but the field of rational expressions in roots $x_{i}$ contains many asymmetric rational functions of the roots too we need to study how far the properties of symmetry can be carried off through a radical extension. Thus the problem is intimately related to the invariance of the members of a radical extension under a permutation of roots $x_{i}$.

Towards the end of the 18th century, the Italian mathematician Paolo Ruffini studied the problem of solvability of general polynomial equations along the lines of considerations of invariance of expressions under permutations of their variables and published the first, though controversial, proof of insolvability of the quintic by radicals. Most of his peers did not understand his long (more than 500 pages) proof but Cauchy was able to notice the gems contained in his paper. There were some gaps in Ruffini's proof but his approach was essentially correct. A few years later Abel filled in these gaps and provided a correct proof.

Abel's fundamental idea was that if the root of an equation could be expressed as a radical expression in the coefficients of the polynomial then each such radical expression itself must be a rational function of the roots. Thus in case of a quadratic equation $x^{2} + ax + b = 0$ the radical expression $\sqrt{a^{2} - 4b}$ is either $x_{1} - x_{2}$ or $x_{2} - x_{1}$. This is a significant step in Abel's and Ruffini's proof and Ruffini assumed this without any proof. Abel provided a rigorous and detailed proof of this key step and generalized some of the results obtained by Ruffini and Cauchy on the number of values taken by a function under permutation of its variables. Using these results and assuming the most general radical expression for a root of the equation he was able to arrive at a contradiction. Abel's argument to obtain the desired contradiction is however very complicated and we will follow a simplified approach by Ruffini in these posts.
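The quadratic case of Abel's idea is easy to check directly: if $x^{2} + ax + b = (x - x_{1})(x - x_{2})$ then $a^{2} - 4b = (x_{1} - x_{2})^{2}$, so the radical is $\pm(x_{1} - x_{2})$. The following small sketch verifies this with exact rational arithmetic on a few sample roots; the function name `discriminant_from_roots` is chosen here for illustration.

```python
from fractions import Fraction

def discriminant_from_roots(x1, x2):
    """a^2 - 4b for the quadratic x^2 + a*x + b = (x - x1)(x - x2)."""
    a = -(x1 + x2)  # coefficient of x
    b = x1 * x2     # constant term
    return a * a - 4 * b

samples = [(Fraction(1), Fraction(2)),
           (Fraction(-3, 2), Fraction(5, 7)),
           (Fraction(4), Fraction(4))]
for x1, x2 in samples:
    # the radicand is the square of a rational function of the roots
    print(discriminant_from_roots(x1, x2), (x1 - x2) ** 2)
```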

In the next post we will study various properties of radical extensions which are necessary to understand the arguments put forth by both Abel and Ruffini to solve the historically famous problem of solvability of algebraic equation by radicals.


Teach Yourself Limits in 8 Hours: Part 4

2013-11-09T13:39:00.002+05:30

After dealing with various techniques to evaluate limits we now provide proofs of the results on which these techniques are founded. This material is not difficult but definitely somewhat abstract and may not be suitable for beginners who are more interested in learning techniques and solving limit problems. But those who are interested in the justification of these techniques must pay great attention to what follows.

Proofs of Rules of Limits

We provide proofs for some of the rules and let the reader provide proofs for remaining rules based on similar line of argument. First we start with rule dealing with inequalities:
If $f(x) \leq g(x)$ in the neighborhood of $a$ then $\lim_{x \to a}f(x) \leq \lim_{x \to a}g(x)$ provided both these limits exist.

Let $A = \lim_{x \to a}f(x), B = \lim_{x \to a}g(x)$ and we have to prove that $A \leq B$. Suppose that $A > B$ and let $\epsilon = (A - B)/2$. Then we know that there is a $\delta > 0$ such that $|f(x) - A| < \epsilon$ and $|g(x) - B| < \epsilon$ when $0 < |x - a| < \delta$. Then we have $$A - \epsilon < f(x) < A + \epsilon,\,\, B - \epsilon < g(x) < B + \epsilon$$ and noting that $A - \epsilon = B + \epsilon$ we see that $$g(x) < B + \epsilon = A - \epsilon < f(x)$$ which is contrary to the hypothesis $f(x) \leq g(x)$. Hence we must have $A \leq B$. Note that the same argument works even if the hypothesis is modified to $f(x) < g(x)$; the conclusion is still only $A \leq B$.

Translating this formal proof into common language we see that values of $f(x)$ are close to $A$ and those of $g(x)$ are near to $B$ and if $A > B$ there will be values of $f(x)$ which exceed values of $g(x)$ contrary to the hypotheses.

Next we establish the following rule of division:
$\displaystyle \lim_{x \to a}\frac{f(x)}{g(x)} = \dfrac{{\displaystyle \lim_{x \to a}f(x)}}{{\displaystyle \lim_{x \to a}g(x)}}$ provided that both the limits $\lim_{x \to a}f(x)$ and $\lim_{x \to a}g(x)$ exist and $\lim_{x \to a}g(x) \neq 0$.

Clearly since $\lim_{x \to a}g(x) = B \neq 0$ there exists a number $\delta_{1} > 0$ such that $$|g(x) - B| < \frac{|B|}{2}$$ whenever $0 < |x - a| < \delta_{1}$ so that $|g(x)| > |B|/2$ for $0 < |x - a| < \delta_{1}$. Next let $\lim_{x \to a}f(x) = A$. Then for any $\epsilon > 0$ there exists a $\delta_{2} > 0$ such that $$|f(x) - A| < \frac{|B|\epsilon}{4}$$ for $0 < |x - a| < \delta_{2}$. Further there is a $\delta_{3} > 0$ such that $$|g(x) - B| < \frac{|B|^{2}\epsilon}{4|A| + 1}$$ whenever $0 < |x - a| < \delta_{3}$. Let $\delta = \min(\delta_{1}, \delta_{2}, \delta_{3})$ and then for $0 < |x - a| < \delta$ we have $1/|g(x)| < 2/|B|$ so that \begin{align} \left|\frac{f(x)}{g(x)} - \frac{A}{B}\right| &= \left|\frac{Bf(x) - AB + AB - Ag(x)}{Bg(x)}\right|\notag\\ &= \left|\frac{f(x) - A}{g(x)} + \frac{A}{B}\frac{B - g(x)}{g(x)}\right|\notag\\ &\leq \frac{|f(x) - A|}{|g(x)|} + \frac{|A|}{|B|}\frac{|B - g(x)|}{|g(x)|}\notag\\ &< \frac{|B|\epsilon}{4}\cdot\frac{2}{|B|} + \frac{|A|}{|B|}\cdot\frac{|B|^{2}\epsilon}{4|A| + 1}\cdot\frac{2}{|B|}\notag\\ &< \frac{\epsilon}{2} + \frac{\epsilon}{2}\notag\\ &= \epsilon\notag \end{align} and therefore $f(x)/g(x) \to A/B$ as $x \to a$.

Going further we establish the Sandwich Theorem (or Squeeze Theorem):
If $f(x) \leq g(x) \leq h(x)$ in a neighborhood of $a$ and $\lim_{x \to a}f(x) = \lim_{x \to a}h(x) = L$ then $\lim_{x \to a}g(x) = L$.

Clearly for any $\epsilon > 0$ we have a $\delta > 0$ such that $$|f(x) - L| < \epsilon, |h(x) - L| < \epsilon$$ whenever $0 < |x - a| < \delta$. This implies that $$L - \epsilon < f(x) < L + \epsilon\text{ and }L - \epsilon < h(x) < L + \epsilon$$ Therefore $$L - \epsilon < f(x) \leq g(x) \leq h(x) < L + \epsilon$$ and hence $$L - \epsilon < g(x) < L + \epsilon$$ i.e. $|g(x) - L| < \epsilon$ so we get $\lim_{x \to a}g(x) = L$.

Finally we establish the rule of substitution in limits:
If $\lim_{x \to a}g(x) = b$ and $\lim_{x \to b}f(x) = L$ and $g(x) \neq b$ in a certain neighborhood of $a$ (except possibly at $a$) then $\lim_{x \to a}f\{g(x)\} = L$.

This I will present in the informal language itself. For $\lim_{x \to a}f\{g(x)\}$ to be equal to $L$ we must be able to get the values of $f(g(x))$ arbitrarily close to $L$ by choosing $x$ sufficiently close to $a$. Now we know that values of function $f$ can be made arbitrarily close to $L$ by choosing its argument sufficiently close to $b$ (because $\lim_{x \to b}f(x) = L$). Here in $f(g(x))$ the argument of $f$ is $g(x)$ and hence it follows that $f(g(x))$ can be made arbitrarily close to $L$ by making $g(x)$ sufficiently close to $b$ and since $\lim_{x \to a}g(x) = b$ this is possible by choosing $x$ sufficiently close to $a$. Note that the condition $g(x) \neq b$ is needed because the limit $\lim_{x \to b}f(x) = L$ assumes that we are dealing with values of $f$ when its argument is not equal to $b$. Hence while handling $f(g(x))$ it is essential that argument of $f$ namely $g(x)$ should not be equal to $b$.

Proof of Standard Limits

We first start with the trigonometric limit $$\lim_{x \to 0}\frac{\sin x}{x} = 1$$ We note that $(\sin (-x)) / (-x) = (\sin x) / x$ and hence it is sufficient to consider the case when $x \to 0+$. Let us then consider the case when $0 < x < \pi / 2$.

In the above figure we have a circle with center $O$ and radius $r$. $A, B$ are two points on the circle such that $\angle AOB = x$ and $AT$ is a tangent to the circle at $A$. Radius $OB$ extended meets tangent $AT$ at $T$ so that $OAT$ is a triangle. We can clearly see that $AT = r\tan x$ so that area of triangle $OAT$ is $\dfrac{1}{2}r^{2}\tan x$. We now see that \begin{align} &\text{area of triangle }OAB < \text{area of sector }OAB < \text{area of triangle }OAT\notag\\ &\Rightarrow \frac{1}{2}r^{2} \sin x < \frac{1}{2}r^{2}x < \frac{1}{2}r^{2}\tan x\notag\\ &\Rightarrow \sin x < x < \tan x = \frac{\sin x}{\cos x}\notag\\ &\Rightarrow \cos x < \frac{\sin x}{x} < 1\notag \end{align} Taking limits as $x \to 0+$ and noting $\lim_{x \to 0+}\cos x = 1$ we see that we have (by Sandwich Theorem) $\lim_{x \to 0+}\dfrac{\sin x}{x} = 1$
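The chain of inequalities $\cos x < (\sin x)/x < 1$ can be observed numerically for a few values of $x$ in $(0, \pi/2)$. This is only an illustration of the squeeze, using the standard `math` module; the helper name `squeeze_bounds` is chosen here.

```python
import math

def squeeze_bounds(x):
    """Return (cos x, sin x / x, 1) for 0 < x < pi/2."""
    return math.cos(x), math.sin(x) / x, 1.0

for x in (1.0, 0.5, 0.1, 0.01):
    lower, middle, upper = squeeze_bounds(x)
    print(f"x = {x}: {lower:.8f} < {middle:.8f} < {upper:.8f}")
```

As $x$ decreases the lower bound $\cos x$ approaches $1$ and the middle term is trapped between the two bounds.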

Next we focus on the algebraic limit $$\lim_{x \to a}\frac{x^{n} - a^{n}}{x - a} = na^{n - 1}$$ In case $n$ is a positive integer the result follows easily by the Binomial theorem as follows: $$\lim_{x \to a}\frac{x^{n} - a^{n}}{x - a} = \lim_{h \to 0}\frac{(a + h)^{n} - a^{n}}{h} = \lim_{h \to 0}\frac{(a^{n} + na^{n - 1}h + \cdots) - a^{n}}{h} = na^{n - 1}$$ If $n = 0$ then clearly the numerator vanishes and the limit is $0$ so the formula is true in this case also. If $n$ is a negative integer, say $n = -m$ so that $m$ is a positive integer, then $$\lim_{x \to a}\frac{x^{n} - a^{n}}{x - a} = \lim_{x \to a}\frac{a^{m} - x^{m}}{(x - a)x^{m}a^{m}} = -\frac{ma^{m - 1}}{a^{2m}} = -ma^{-m - 1} = na^{n - 1}$$ To handle the case when $n = p/q$ is a fraction we restrict to the case when $a > 0$ and $p, q$ are positive integers. First we note that $$\lim_{x \to a}\frac{x^{q} - a^{q}}{x - a} = qa^{q - 1}$$ so that the ratio $\dfrac{x^{q} - a^{q}}{x - a}$ is bounded and also bounded away from zero when $x$ is in a certain neighborhood of $a$ (and since $\lim\limits_{x \to a}x^{q} = a^{q}$ it follows that $x^{q}$ is in a certain neighborhood of $a^{q}$). Replacing $x$ by $x^{1/q}$ and $a$ by $a^{1/q}$ we note that $\dfrac{x - a}{x^{1/q} - a^{1/q}}$ is bounded and also bounded away from zero if $x$ is in a certain neighborhood of $a$. 
It follows that $\dfrac{x^{1/q} - a^{1/q}}{x - a}$ is also bounded and bounded away from zero when $x$ is near $a$ (with $x \neq a$) so that there is a constant $K$ such that $$0 < \frac{x^{1/q} - a^{1/q}}{x - a} < K$$ Thus we have $$0 < |x^{1/q} - a^{1/q}| < K|x - a|$$ Taking limits when $x \to a$ and using Sandwich theorem we get $$\lim_{x \to a}x^{1/q} = a^{1/q}$$ We can now see that \begin{align} \lim_{x \to a}\frac{x^{n} - a^{n}}{x - a} &= \lim_{x \to a}\frac{x^{p/q} - a^{p/q}}{x^{1/q} - a^{1/q}}\cdot \frac{x^{1/q} - a^{1/q}}{x - a}\notag\\ &= \lim_{y \to b}\frac{y^{p} - b^{p}}{y - b}\cdot\frac{y - b}{y^{q} - b^{q}}\text{ (putting }y = x^{1/q}, b = a^{1/q})\notag\\ &= \frac{pb^{p - 1}}{qb^{q - 1}} = \frac{p}{q}b^{p - q} = na^{n - 1}\notag \end{align} If $n$ is negative rational then the proof follows exactly along the same lines as the case for $n$ being a negative integer. In the above proof notice that it is absolutely essential that we establish that $\lim_{x \to a}x^{1/q} = a^{1/q}$ first and achieving this requires a little bit of ingenuity. Another simpler approach to prove this is to consider various possibilities for the limit $\lim_{x\to a} x^{1/q}$ and show that if this limit does not exist then $\lim_{x\to a} x$ does not exist. This is clearly a contradiction and hence the limit $\lim_{x\to a} x^{1/q}=b$ exists and is non-negative. Using the product rule of limits we can easily see that $b^{q} =a$ and hence $b=a^{1/q}$.
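As a sanity check of the formula for a rational exponent, here is a small Python sketch. The helper `difference_quotient` and the sample values $a = 2$, $n = 3/5$ are assumptions made only for this illustration; floating point arithmetic can of course only suggest the limit, not prove it.

```python
def difference_quotient(n, a, h):
    """Evaluate (x^n - a^n) / (x - a) at x = a + h, assuming a > 0 and a + h > 0."""
    x = a + h
    return (x**n - a**n) / (x - a)

a, p, q = 2.0, 3, 5          # illustrative sample: n = p/q = 3/5
n = p / q
target = n * a ** (n - 1)    # the claimed limit n * a^(n - 1)

for h in (1e-2, 1e-4, 1e-6):
    value = difference_quotient(n, a, h)
    print(f"h={h:<8} quotient={value:.10f}  n*a^(n-1)={target:.10f}")
```

The same function can be called with a negative rational $n$ (for example `difference_quotient(-1.5, 2.0, 1e-6)`) to watch the negative case behave in the same way.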

When $n$ is irrational then the definition of $x^{n}$ is dependent on the definitions of exponential and logarithm functions and hence we defer this part of the proof after the logarithmic and exponential limits are established.

To establish the logarithmic and exponential limits it is absolutely essential that we define them first. A typical easy route to their definition starts by defining $\log x$ as an integral. If we define $$\log x = \int_{1}^{x}\frac{dt}{t}$$ then we can see immediately that $\log 1 = 0$ and the derivative $(\log x)' = 1/x$ so that its derivative at $x = 1$ is $1$ and therefore we get $$\lim_{x \to 1}\frac{\log x - \log 1}{x - 1} = 1$$ or $$\lim_{h \to 0}\frac{\log(1 + h)}{h} = 1$$ Exponential function $\exp(x)$ or $e^{x}$ is defined as inverse of the logarithm function by relation $y = \exp(x)$ if $\log y = x$. So if we put $\log(1 + h) = x$ we get $h = e^{x} - 1$ and as $h \to 0$ we also have $x \to 0$. Thus $\lim_{h \to 0}(\log(1 + h))/h = 1$ implies that $$\lim_{x \to 0}\frac{x}{e^{x} - 1} = 1$$ so that the exponential limit is also established.
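To make the integral definition concrete, the sketch below approximates $\int_{1}^{x}dt/t$ by the midpoint rule and then looks at the ratio $\log(1 + h)/h$ for small $h$. The function name `log_by_integral` and the number of subintervals are illustrative assumptions, and the midpoint rule is just one convenient way to approximate the integral.

```python
import math

def log_by_integral(x, steps=100_000):
    """Approximate the integral of dt/t from 1 to x (x > 0) by the midpoint rule."""
    width = (x - 1.0) / steps
    # Each term is (width of subinterval) * (1 / midpoint of subinterval).
    return sum(width / (1.0 + (i + 0.5) * width) for i in range(steps))

# Compare with the library logarithm at one point.
print(log_by_integral(2.0), math.log(2.0))

# The ratio log(1 + h)/h should approach 1 as h -> 0.
for h in (0.1, 0.01, 0.001):
    print(f"h={h:<6} log(1+h)/h = {log_by_integral(1.0 + h) / h:.8f}")
```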

We return to the case of algebraic limit when $n$ is irrational. In this case $a$ must be positive otherwise $a^{n}$ is not defined. We have $x^{n} = \exp(n\log x)$ and $a^{n} = \exp(n\log a)$. Let $\log x = y$ and $\log a = b$ so that $x \to a$ implies $y \to b$. We now have \begin{align} \lim_{x \to a}\frac{x^{n} - a^{n}}{x - a} &= \lim_{y \to b}\frac{e^{ny} - e^{nb}}{e^{y} - e^{b}}\notag\\ &= \lim_{y \to b}\frac{e^{ny} - e^{nb}}{n(y - b)}\cdot\frac{n(y - b)}{e^{y} - e^{b}}\notag\\ &= n\lim_{h \to 0}\frac{e^{n(b + h)} - e^{nb}}{nh}\cdot\frac{h}{e^{b + h} - e^{b}}\notag\\ &= n \lim_{h \to 0}\frac{e^{nb}(e^{nh} - 1)}{nh}\cdot\frac{h}{e^{b}(e^{h} - 1)}\notag\\ &= ne^{nb}/e^{b} = ne^{b(n - 1)} = n(e^{b})^{n - 1} = na^{n - 1}\notag \end{align} Next we establish the limit $\lim_{x \to \infty}\dfrac{\log x}{x^{a}} = 0$ for $a > 0$. This is based on the definition of $\log x$ as an integral. Let $0 < b < a$ and $x > 1$ then we can see that $t^{-1} < t^{b - 1}$ for $t \in (1, x]$ so that \begin{align} 0 &< \int_{1}^{x}\frac{1}{t}\,dt < \int_{1}^{x}t^{b - 1}\,dt\notag\\ \Rightarrow 0 &< \log x < \frac{x^{b} - 1}{b} < \frac{x^{b}}{b}\notag\\ \Rightarrow 0 &< \frac{\log x}{x^{a}} < \frac{1}{bx^{a - b}}\notag \end{align} Taking limits as $x \to \infty$ and noting that $a - b > 0$ we see that $$\lim_{x \to \infty}\dfrac{\log x}{x^{a}} = 0$$ The above proofs based on definitions of exponential and logarithm functions are not suitable for a beginner learning limits for the first time because they depend upon the higher level concepts of integral and derivative. Also these proofs make use of properties of exponential and logarithmic in an implicit fashion (these can not be proved here because of the constraint to keep the post to a reasonable length). However when one has got an understanding of derivative and integral then it is better to revisit these proofs which justify the use of fundamental logarithmic and exponential limits in solving various limit problems.
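The inequality $0 < \dfrac{\log x}{x^{a}} < \dfrac{1}{bx^{a - b}}$ used above for $\lim_{x \to \infty}\log x/x^{a}$ can be watched in action with a few lines of Python. The exponents $a = 1/2$, $b = 1/4$ and the helper name `log_ratio_and_bound` are arbitrary choices for this sketch.

```python
import math

def log_ratio_and_bound(x, a, b):
    """Return (log x / x^a, 1 / (b * x^(a - b))) for x > 1 and 0 < b < a."""
    return math.log(x) / x**a, 1.0 / (b * x ** (a - b))

a, b = 0.5, 0.25   # illustrative exponents satisfying 0 < b < a
for x in (1e2, 1e4, 1e8, 1e16):
    ratio, bound = log_ratio_and_bound(x, a, b)
    print(f"x={x:.0e}  log x / x^a = {ratio:.3e}  bound = {bound:.3e}")
```

Both columns shrink to $0$, with the ratio always below the bound.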

Before we proceed to discuss the justification of L'Hospital's Rule and power series expansions we say a few words on the justification of limits dealing with $\infty$. Most of the thumb rules given earlier for dealing with $\infty$ can be easily proved if one uses the definitions of limits properly. For example consider the rule which says that if $f(x) \to \infty, g(x) \to \infty$ as $x \to a$ then $f(x) + g(x) \to \infty$ as $x \to a$. This can be easily understood if we follow definitions. Clearly for any given $N > 0$ we have a neighborhood of $a$ in which $f(x) > N, g(x) > N$ so that $f(x) + g(x) > 2N > N$ and hence $f(x) + g(x) \to \infty$. Readers should try to establish other thumb rules dealing with $\infty$. In this connection it is important to note that $f(x) > N$ and $g(x) > N$ do not give any idea about the values of $f(x) - g(x)$ and hence there is no thumb rule provided for dealing with $f(x) - g(x)$ (or $f(x)/g(x)$).

Proof of L'Hospital's Rule

To establish this rule we need to prove a fundamental result called Cauchy's Mean Value Theorem:
If $f(x), g(x)$ are continuous on $[a, b]$ and differentiable on $(a, b)$ and $f'(x), g'(x)$ never vanish for the same value of $x$ and $g(b) \neq g(a)$ then there is a $c \in (a, b)$ such that $$\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}$$ This can be proved by using Rolle's theorem. Let $$h(x) = f(b) - f(x) - \frac{f(b) - f(a)}{g(b) - g(a)}\{g(b) - g(x)\}$$ then we have $h(a) = h(b)$ and hence by Rolle's theorem there is a $c \in (a, b)$ for which $h'(c) = 0$ i.e. $$f'(c) = \frac{f(b) - f(a)}{g(b) - g(a)}g'(c)$$ Now $g'(c) \neq 0$ otherwise $f'(c)$ would also be zero and simultaneous vanishing of $f'(x), g'(x)$ is not allowed. Hence dividing by $g'(c)$ our result follows.
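A concrete instance may help. Take $f(x) = x^{2}$, $g(x) = x^{3}$ on $[1, 2]$ (an example chosen here for illustration, not taken from the discussion above). Then $f'(c)/g'(c) = 2/(3c)$, so the point $c$ promised by the theorem can be written down explicitly, and the following sketch verifies it:

```python
def cauchy_mvt_point(a, b):
    """For f(x) = x^2, g(x) = x^3 on [a, b] with 0 < a < b, solve
    f'(c)/g'(c) = (f(b) - f(a)) / (g(b) - g(a)) for c.
    Since f'(c)/g'(c) = 2/(3c), the solution is c = 2 / (3 * ratio)."""
    ratio = (b**2 - a**2) / (b**3 - a**3)
    return 2.0 / (3.0 * ratio)

c = cauchy_mvt_point(1.0, 2.0)
print(c)                                # equals 14/9, which lies in (1, 2)
print((2 * c) / (3 * c**2), 3.0 / 7.0)  # both sides of the theorem agree
```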

Next we come to L'Hospital's Rule:
If $f(x), g(x)$ are differentiable in a certain neighborhood of $a$ (but not necessarily at $a$), $\lim_{x \to a}f(x) = \lim_{x \to a}g(x) = 0$ and $\lim_{x \to a}\dfrac{f'(x)}{g'(x)} = L$ then $\lim_{x \to a}\dfrac{f(x)}{g(x)} = L$.

We will consider the case of $x \to a+$ (case $x \to a-$ can be handled similarly). Let us define $f(a) = g(a) = 0$ so that $f(x), g(x)$ are continuous at $a$. Since $f'(x)/g'(x) \to L$ as $x \to a+$ there is a neighborhood of $a$ of type $(a, b]$ where $g'(x) \neq 0$. Now $f(x), g(x)$ are continuous on $[a, b]$ and differentiable in $(a, b)$. Also $g(a) \neq g(b)$ otherwise by Rolle's theorem $g'(x)$ would vanish somewhere in $(a, b)$. Hence by Cauchy's Mean Value theorem we have $$\frac{f(b)}{g(b)} = \frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}$$ for some $c \in (a, b)$. If $b \to a+$ then $c \to a+$ and hence $$\lim_{b \to a+}\frac{f(b)}{g(b)} = \lim_{c \to a+}\frac{f'(c)}{g'(c)} = L$$ and L'Hospital's Rule is established.

There is another version of L'Hospital's Rule which is not so widely known and we state (and prove) it below:
If $f(x), g(x)$ are differentiable in a certain neighborhood of $a$ (but not necessarily at $a$), $\lim_{x \to a}\dfrac{1}{g(x)} = 0$ (equivalently $|g(x)| \to \infty$ as $x \to a$) and $\lim_{x \to a}\dfrac{f'(x)}{g'(x)} = L$ then $\lim_{x \to a}\dfrac{f(x)}{g(x)} = L$.

Thus in order to apply this version of L'Hospital's Rule we need to check that $|g(x)| \to \infty$ as $x \to a$. No check apart from differentiability is needed for the function $f(x)$. We prove this rule using the $\epsilon, \delta$ definition of limit. Since $f'(x)/g'(x) \to L$ as $x \to a$, it follows that $f'(x)/g'(x)$ is bounded in a certain deleted neighborhood of $a$. Therefore there is a number $A > 0$ and a number $\delta_{1} > 0$ such that $$\left|\frac{f'(x)}{g'(x)}\right| < A$$ for all $x$ with $0 < |x - a| < \delta_{1}$. Let $\epsilon > 0$ be arbitrary. Then we know that there is a $\delta_{2} > 0$ such that $$\left|\frac{f'(x)}{g'(x)} - L\right| < \frac{\epsilon}{3}$$ for all $x$ with $0 < |x - a| < \delta_{2}$.

Let's consider the ratio $$\frac{f(x) - f(y)}{g(x) - g(y)}$$ where both $x, y$ are distinct points lying in deleted neighborhood $(a - \delta_{3}, a + \delta_{3}) - \{a\}$ of $a$ and $\delta_{3} = \min(\delta_{1}, \delta_{2})$. We can express this ratio as $$\frac{f(x) - f(y)}{g(x) - g(y)} = \dfrac{\dfrac{f(x)}{g(x)} - \dfrac{f(y)}{g(x)}}{1 - \dfrac{g(y)}{g(x)}}$$ and from this equation we obtain $$\frac{f(x)}{g(x)} = \frac{f(x) - f(y)}{g(x) - g(y)}\left(1 - \frac{g(y)}{g(x)}\right) + \frac{f(y)}{g(x)} = \frac{f'(c)}{g'(c)}\left(1 - \frac{g(y)}{g(x)}\right) + \frac{f(y)}{g(x)}$$ where $c$ is some number between $x$ and $y$. Let $y$ have a fixed value in the deleted neighborhood $(a - \delta_{3}, a + \delta_{3}) - \{a\}$ of $a$. Then we know that $g(y)/g(x) \to 0, f(y)/g(x) \to 0$ as $x \to a$. Hence there are positive numbers $\delta_{4}, \delta_{5}$ such that $$\left|\frac{g(y)}{g(x)}\right| < \frac{\epsilon}{3A}$$ for all $x$ with $0 < |x - a| < \delta_{4}$ and $$\left|\frac{f(y)}{g(x)}\right| < \frac{\epsilon}{3}$$ for all $x$ with $0 < |x - a| < \delta_{5}$. Let $\delta = \min(\delta_{3}, \delta_{4}, \delta_{5})$ and let $0 < |x - a| < \delta$ then we have \begin{align} \left|\frac{f(x)}{g(x)} - L\right| &= \left|\frac{f'(c)}{g'(c)}\left(1 - \frac{g(y)}{g(x)}\right) + \frac{f(y)}{g(x)} - L\right|\notag\\ &= \left|\frac{f'(c)}{g'(c)} - L - \frac{f'(c)}{g'(c)}\cdot\frac{g(y)}{g(x)} + \frac{f(y)}{g(x)}\right|\notag\\ &\leq \left|\frac{f'(c)}{g'(c)} - L\right| + \left|\frac{f'(c)}{g'(c)}\right|\left|\frac{g(y)}{g(x)}\right| + \left|\frac{f(y)}{g(x)}\right|\notag\\ &< \frac{\epsilon}{3} + A\cdot\frac{\epsilon}{3A} + \frac{\epsilon}{3}\notag\\ &= \epsilon\notag \end{align} It now follows that $f(x)/g(x) \to L$ as $x \to a$ and the proof of the second version of L'Hospital's Rule is complete. In both the versions of L'Hospital's Rule it is easy to prove that if $f'(x)/g'(x)$ tends to $\infty$ (or to $-\infty$) then so does $f(x)/g(x)$. 
Note however that if $f'(x)/g'(x)$ does not tend to a limit (meaning that it oscillates finitely or infinitely) then we can't conclude anything about the limit of $f(x)/g(x)$ and in this case the L'Hospital's Rule does not apply.

Proof of Taylor's Theorem

We will use the L'Hospital's rule to prove a version of Taylor's theorem which forms the basis of the technique of using series expansions for evaluating certain limits:
If $f^{(n)}(a)$ exists then we have $$f(a + h) = f(a) + hf'(a) + \cdots + \frac{h^{n - 1}}{(n - 1)!}f^{(n - 1)}(a) + \frac{h^{n}}{n!}\{f^{(n)}(a) + \rho\}$$ where $\rho$ tends to $0$ with $h$.

Clearly this will be established if we show that $$\lim_{h \to 0}\dfrac{{\displaystyle f(a + h) - \sum_{k = 0}^{n - 1}\dfrac{h^{k}}{k!}f^{(k)}(a)}}{h^{n}} = \frac{f^{(n)}(a)}{n!}$$ Using L'Hospital's rule repeatedly we see that \begin{align} \lim_{h \to 0}\dfrac{{\displaystyle f(a + h) - \sum_{k = 0}^{n - 1}\dfrac{h^{k}}{k!}f^{(k)}(a)}}{h^{n}} &= \lim_{h \to 0}\dfrac{{\displaystyle f'(a + h) - \sum_{k = 1}^{n - 1}\dfrac{h^{k - 1}}{(k - 1)!}f^{(k)}(a)}}{nh^{n - 1}}\notag\\ &= \lim_{h \to 0}\dfrac{{\displaystyle f''(a + h) - \sum_{k = 2}^{n - 1}\dfrac{h^{k - 2}}{(k - 2)!}f^{(k)}(a)}}{n(n - 1)h^{n - 2}}\notag\\ &= \lim_{h \to 0}\dfrac{{\displaystyle f^{(n - 1)}(a + h) - \sum_{k = n - 1}^{n - 1}\dfrac{h^{k - n + 1}}{(k - n + 1)!}f^{(k)}(a)}}{n(n - 1)\cdots 3\cdot 2\cdot h}\notag\\ &= \lim_{h \to 0}\dfrac{f^{(n - 1)}(a + h) - f^{(n - 1)}(a)}{n! h}\notag\\ &= \frac{f^{(n)}(a)}{n!}\notag \end{align} It is important to note that in each application of the L'Hospital's rule the conditions for applicability of the rule are satisfied.
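To see the quantity $\rho$ of the theorem shrink, one can solve the stated formula for $\rho$ and evaluate it numerically. The sketch below does this for the sample case $f(x) = e^{x}$, $a = 0$, $n = 3$; the function name `taylor_rho` and the choice of $f$ are assumptions made for illustration only.

```python
import math

def taylor_rho(h, n=3):
    """rho for f = exp and a = 0, obtained by solving
    f(h) = sum_{k < n} h^k / k! + (h^n / n!) * (f^(n)(0) + rho)
    with f^(n)(0) = 1."""
    partial = sum(h**k / math.factorial(k) for k in range(n))
    return (math.exp(h) - partial) * math.factorial(n) / h**n - 1.0

for h in (0.1, 0.01, 0.001, -0.01):
    print(f"h={h:<7} rho={taylor_rho(h):+.6f}")
```

For this particular $f$ one finds $\rho \approx h/4$ for small $h$, which is consistent with $\rho \to 0$.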

To conclude this series of posts on the techniques of evaluation of limits and their justification I just want to give a few remarks. Most of the time evaluation of a limit requires manipulation of the expression under consideration using various algebraic, trigonometric, exponential or logarithmic identities. The goal of these manipulations is to reduce the expression to a form which can take advantage of the standard limit formulas and rules. Sometimes one has to take the route of inequalities and Sandwich theorem to establish a limit (as we have done for the trigonometric limit).

If the manipulations don't lead to a form which allows us to take advantage of standard limits then we try to check if the expression meets the conditions of L'Hospital's rule and if so then this rule is applied in the hope that the new expression after the application of L'Hospital's rule will be simpler than the original one and will be amenable to evaluation by standard limits. As a last resort we may have to use the series expansions which are highly powerful but require some skill in the manipulation of power series. Once the reader has gained familiarity with enough limit problems it is instructive to approach theoretical problems (such as this one from MSE) which require the use of definition of limits in innovative ways.

In this series of posts I have not dealt with the limit of sequences deliberately as they are not part of introductory calculus and require theoretical tools of different nature. However some limits of this type can be handled via the following obvious result: if $\lim_{x \to \infty}f(x) = L$ then $\lim_{n \to \infty}f(n) = L$ where $n$ takes integral values only.


Teach Yourself Limits in 8 Hours: Part 3

2013-11-08T15:12:00.005+05:30

In last two posts we have developed basic concepts and rules of limits. Continuing our journey further we now introduce certain powerful tools which help us in evaluation of limits of complicated expressions. We start with the simplest technique first.

Limits using Logarithms

In case we need to evaluate the limit of an expression of type $\{f(x)\}^{g(x)}$ then we can take logarithm and then the evaluation of limits becomes simpler. We will first illustrate the technique through an example and then provide the justification.

Let us suppose we wish to evaluate the limit $$\lim_{x \to 0+} \left(2 - e^{\arcsin^{2}\sqrt{x}}\right)^{3/x}$$ Let us write $$f(x) = \left(2 - e^{\arcsin^{2}\sqrt{x}}\right)^{3/x}$$ and then we have $$\log f(x) = \frac{3}{x}\log \left(2 - e^{\arcsin^{2}\sqrt{x}}\right)$$ We will first calculate the limit of $\log f(x)$ as $x \to 0+$. \begin{align} \lim_{x \to 0+}\log f(x) &= \lim_{x \to 0+}\frac{3}{x}\log \left(2 - e^{\arcsin^{2}\sqrt{x}}\right)\notag\\ &= 3\lim_{x \to 0+}\frac{\log(1 + 1 - e^{\arcsin^{2}\sqrt{x}})}{1 - e^{\arcsin^{2}\sqrt{x}}}\cdot\frac{1 - e^{\arcsin^{2}\sqrt{x}}}{x}\notag\\ &= 3\lim_{x \to 0+} 1\cdot \frac{1 - e^{\arcsin^{2}\sqrt{x}}}{x}\notag\\ &= -3 \lim_{x \to 0+} \frac{e^{\arcsin^{2}\sqrt{x}} - 1}{x}\notag\\ &= -3\lim_{x \to 0+} \frac{e^{\arcsin^{2}\sqrt{x}} - 1}{{\arcsin^{2}\sqrt{x}}}\cdot\frac{{\arcsin^{2}\sqrt{x}}}{x}\notag\\ &= -3\lim_{x \to 0+}1\cdot\left(\frac{\arcsin\sqrt{x}}{\sqrt{x}}\right)^{2}\notag\\ &= -3\lim_{y \to 0+}\left(\frac{y}{\sin y}\right)^{2}\text{ (putting }y = \arcsin\sqrt{x})\notag\\ &= -3\cdot 1^{2} = -3\notag \end{align} Then we have $\lim_{x \to 0+}f(x) = \exp\left(\lim_{x \to 0+}\log f(x)\right) = e^{-3}$. The justification of this last step is provided by the rule of substitution of limits (given in the last post), namely:

If $\lim_{x \to a}g(x) = b$ and $\lim_{x \to b}f(x) = L$ and further $g(x) \neq b$ in a deleted neighborhood of $a$, then $\lim_{x \to a}f\{g(x)\} = L$.

Replacing $g(x)$ by $\log g(x)$ and setting $f(x) = e^{x}$ in the above rule we see that $\lim_{x \to a}\log g(x) = b$ and $\lim_{x \to b}e^{x} = e^{b}$ and then $\lim_{x \to a}e^{\log g(x)} = e^{b}$ or $$\lim_{x \to a}g(x) = \exp\left(\lim_{x \to a}\log g(x)\right)$$ The example limit problem is taken from MSE.

To summarize, in order to evaluate limit of an expression of type $\{f(x)\}^{g(x)}$ we take logarithm of this expression and evaluate the limit of resulting expression. If the limit of this resulting expression is $L$ then the limit of original expression is $e^{L}$.
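The value $e^{-3}$ obtained in the example above can be cross-checked numerically. This is only a rough floating point sketch (the function name `f` mirrors the notation of the example, and the sample points are arbitrary), but it shows the values settling near $e^{-3} \approx 0.0498$:

```python
import math

def f(x):
    """(2 - exp(arcsin(sqrt(x))^2))^(3/x) for small x > 0."""
    inner = math.asin(math.sqrt(x)) ** 2
    return (2.0 - math.exp(inner)) ** (3.0 / x)

for x in (1e-2, 1e-4, 1e-6):
    print(f"x={x:.0e}  f(x)={f(x):.8f}  log f(x)={math.log(f(x)):.6f}")

print("e^-3 =", math.exp(-3.0))
```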

Next we study the most overused and highly powerful technique which involves concept of differentiation.

L'Hôpital's Rule

This rule is also written with simplified spelling as L'Hospital's Rule and we give its precise statement below:

Version 1: If $f(x), g(x)$ are functions differentiable in a certain neighborhood of $a$ (except possibly at $a$) and $\lim_{x \to a}f(x) = \lim_{x \to a}g(x) = 0$ and $\lim\limits_{x \to a}\dfrac{f'(x)}{g'(x)} = L$ then we have $\lim\limits_{x \to a}\dfrac{f(x)}{g(x)} = L$.
Version 2: If $f(x), g(x)$ are functions differentiable in a certain neighborhood of $a$ (except possibly at $a$) and $|g(x)| \to \infty$ as $x \to a$ and $\lim\limits_{x \to a}\dfrac{f'(x)}{g'(x)} = L$ then we have $\lim\limits_{x \to a}\dfrac{f(x)}{g(x)} = L$.

For both the versions of the rule it is also true that if $f'(x)/g'(x)$ tends to $\infty$ (or to $-\infty$) as $x \to a$ then so does $f(x)/g(x)$. From the statement of the rule above we can see that this rule is applicable to specially troublesome cases when substitution leads to zero numerators and zero denominators (or when the denominator tends to $\pm\infty$). In the informal / crude language we say that rule can be applied in case of indeterminate forms $0/0$ and $\text{(anything)}/(\pm\infty)$. I have mentioned the word "indeterminate forms" because this is prevalent in most calculus texts although I find this term very confusing and a source of many troubles for the beginners. I prefer to state that one should try to apply L'Hospital's rule if one is supposed to evaluate the limit of an expression of type $f(x)/g(x)$ where both numerator and denominator tend to $0$ or the denominator tends to $\infty$ in absolute value as $x \to a$. Moreover the rule will work only when $\lim_{x \to a}f'(x)/g'(x)$ exists (or is $\pm\infty$).

The rule is not fool-proof because under the same conditions it may happen that $\lim_{x \to a}f(x)/g(x)$ exists but $\lim_{x \to a}f'(x)/g'(x)$ does not exist. For example the rule fails when we try to use it to evaluate the limit $$\lim_{x \to 0}\frac{x^{2}\sin(1/x)}{x}$$ This is the case when both numerator and denominator tend to $0$. And if we apply L'Hospital's Rule we get the ratio $$\frac{2x\sin(1/x) - \cos(1/x)}{1}$$ which does not tend to a limit as $x \to 0$. On the other hand the original function gets simplified to $x\sin(1/x)$ and this tends to $0$ as $x \to 0$. Another example of the failure of the rule is $$\lim_{x \to \infty}\frac{x}{x + \sin x}$$ Here both numerator and denominator tend to $\infty$ and the limit is easily seen to be $1$. But if we apply L'Hospital's Rule we get the ratio $$\frac{1}{1 + \cos x}$$ and this does not tend to a limit as $x \to \infty$ simply because of the fact that the denominator vanishes for infinitely many large values of $x$. The proofs of both versions of the rule will be provided in the next post.
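The second failure example is easy to observe numerically. In the sketch below (function names `ratio` and `derivative_ratio` are illustrative) the original ratio settles near $1$, while the ratio of derivatives takes quite different values at arbitrarily large $x$. The sample points deliberately avoid odd multiples of $\pi$, where $1 + \cos x$ vanishes.

```python
import math

def ratio(x):
    """The original expression x / (x + sin x)."""
    return x / (x + math.sin(x))

def derivative_ratio(x):
    """Ratio of derivatives 1 / (1 + cos x); undefined where cos x = -1."""
    return 1.0 / (1.0 + math.cos(x))

for x in (1e2, 1e4, 1e6):
    print(f"x={x:.0e}  x/(x + sin x) = {ratio(x):.8f}")

# Along different sequences tending to infinity the derivative ratio
# approaches different values, so it has no limit.
base = 2.0 * math.pi * 1000
for offset in (0.0, math.pi / 2, 2 * math.pi / 3):
    print(f"offset={offset:.4f}  1/(1 + cos x) = {derivative_ratio(base + offset):.6f}")
```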

This technique, although powerful, has some shortcomings. Sometimes differentiation can generate complicated expression and depending upon the problem multiple applications of this rule may be needed which may lead to very complicated expressions as a result of multiple differentiation. Another problem might be that differentiation itself can lead to an expression where the evaluation of limit does not seem possible. In my opinion the rule should be used only when other techniques described so far have failed. Jumping to L'Hospital's Rule for any and every limit problem is not a good idea.

I will start with the classic example where other methods don't seem to work namely $$\lim_{x \to 0}\frac{\sin x - x}{x^{3}}$$ Clearly if we use L'Hospital's rule we get $$\lim_{x \to 0}\frac{\sin x - x}{x^{3}} = \lim_{x \to 0}\frac{\cos x - 1}{3x^{2}} = -\frac{1}{3}\lim_{x \to 0}\frac{1 - \cos x}{x^{2}} = -\frac{1}{3}\cdot\frac{1}{2} = -\frac{1}{6}$$ Another example of the application of L'Hospital's rule is presented in an earlier post. Next example is from MSE: $$\lim_{x \to 1}\left(\frac{x}{x - 1} - \frac{1}{\log x}\right)$$ Combining the two fractions and then applying L'Hospital's rule once (the numerator and denominator of the combined fraction both tend to $0$ as $x \to 1$) we have \begin{align} \lim_{x \to 1}\left(\frac{x}{x - 1} - \frac{1}{\log x}\right) &= \lim_{x \to 1}\frac{x\log x - x + 1}{(x - 1)\log x}\notag\\ &= \lim_{x \to 1}\dfrac{\log x}{\log x + \dfrac{x - 1}{x}}\notag\\ &= \lim_{x \to 1}\dfrac{1}{1 + \dfrac{1}{x}\cdot\dfrac{x - 1}{\log x}}\notag\\ &= \dfrac{1}{1 + \dfrac{1}{1}\cdot 1} = \frac{1}{2}\notag \end{align} In the last step we have used $\lim_{x \to 1}(x - 1)/\log x = \lim_{h \to 0}h/\log(1 + h) = 1$.
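Both answers above can be checked with a quick numerical experiment. The following Python sketch uses illustrative function names and sample points; it is a plausibility check in floating point and not a substitute for the argument.

```python
import math

def first_example(x):
    """(sin x - x) / x^3, expected to approach -1/6 as x -> 0."""
    return (math.sin(x) - x) / x**3

def second_example(x):
    """x/(x - 1) - 1/log x, expected to approach 1/2 as x -> 1."""
    return x / (x - 1.0) - 1.0 / math.log(x)

for h in (1e-1, 1e-2, 1e-3):
    print(f"h={h:<6} first={first_example(h):+.8f}  second={second_example(1.0 + h):.8f}")

print("targets:", -1.0 / 6.0, 0.5)
```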

Next example is of a rather different kind. Suppose that $f''(a)$ exists. We will show that in this case $$\lim_{h \to 0}\frac{f(a + h) - 2f(a) + f(a - h)}{h^{2}} = f''(a)$$ Note that the limit on LHS may exist even though $f''(a)$ may not exist hence this should not be taken as a definition of $f''(a)$. To establish this result we first note that existence of $f''(a)$ implies the existence of $f'(x)$ in a neighbourhood of $a$ and we can see that in this limit both numerator and denominator tend to zero as $h \to 0$. Hence we can apply L'Hospital's rule to get $$\lim_{h \to 0}\frac{f(a + h) - 2f(a) + f(a - h)}{h^{2}} = \lim_{h \to 0}\frac{f'( a + h) - f'(a - h)}{2h}$$ Note that we can't apply L'Hospital's Rule once more as we don't know whether $f''(x)$ exists in a neighbourhood of $a$ or not. We only have the existence of $f''(a)$. The way to proceed now is that we have to use the definition of $f''(a)$: $$f''(a) = \lim_{h \to 0}\frac{f'(a + h) - f'(a)}{h}$$ so that $$f'(a + h) = f'(a) + h\{f''(a) + \rho\}$$ where $\rho$ tends to zero with $h$. Similarly $$f'(a - h) = f'(a) - h\{f''(a) + \rho'\}$$ where $\rho' \to 0$ as $h \to 0$. Thus we have \begin{align} f'(a + h) - f'(a - h) &= 2hf''(a) + h(\rho + \rho')\notag\\ \Rightarrow \frac{f'(a + h) - f'(a - h)}{2h} &= f''(a) + \frac{\rho + \rho'}{2}\notag \end{align} Letting $h \to 0$ we get $$\lim_{h \to 0}\frac{f'(a + h) - f'(a - h)}{2h} = f''(a)$$ A related problem is available at MSE.
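The symmetric second difference is easy to experiment with. In the sketch below (helper names are illustrative) the first function is $e^{x}$ at $a = 0$, where $f''(0) = 1$. The second function, $x|x|$, is an example chosen here to illustrate the remark that the limit may exist without $f''(a)$ existing: its derivative is $2|x|$, which is not differentiable at $0$, yet the symmetric quotient is identically $0$.

```python
import math

def second_difference(f, a, h):
    """Symmetric second difference quotient (f(a+h) - 2 f(a) + f(a-h)) / h^2."""
    return (f(a + h) - 2.0 * f(a) + f(a - h)) / h**2

def x_abs_x(x):
    """f(x) = x|x|, with f'(x) = 2|x|, so f''(0) does not exist."""
    return x * abs(x)

for h in (1e-1, 1e-2, 1e-3):
    print(f"h={h:<6} exp: {second_difference(math.exp, 0.0, h):.8f}"
          f"   x|x|: {second_difference(x_abs_x, 0.0, h):.8f}")
```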

Series Expansions

We finally describe the technique of using series expansions. This technique is most easily applied for problems where the limit variable tends to zero. The case $x \to a$ can be replaced by $h \to 0$ by putting $x = a + h$ so effectively the technique applies to this scenario also. The technique is based on the following result (which is more popularly known as Taylor's Theorem):

If $f^{(n)}(a)$ exists then $$f(a + h) = f(a) + hf'(a) + \frac{h^{2}}{2!}f''(a) + \cdots + \frac{h^{n - 1}}{(n - 1)!}f^{(n - 1)}(a) + \frac{h^{n}}{n!}\{f^{(n)}(a) + \rho\}$$ where $\rho$ is an expression in $a, h$ which tends to $0$ with $h$.

If we put $a = 0$ and replace $h$ by $x$ we see that if $f^{(n)}(0)$ exists then $$f(x) = f(0) + xf'(0) + \frac{x^{2}}{2!}f''(0) + \cdots + \frac{x^{n - 1}}{(n - 1)!}f^{(n - 1)}(0) + \frac{x^{n}}{n!}\{f^{(n)}(0) + \rho\}$$ Using this formula (which can and will be proved using L'Hospital's rule in next post) we have the following series expansions: \begin{align} e^{x} &= 1 + x + \frac{x^{2}}{2!} + \frac{x^{3}}{3!} + \cdots + \frac{x^{n}}{n!}\{1 + \rho\}\notag\\ \sin x &= x - \frac{x^{3}}{3!} + \frac{x^{5}}{5!} - \cdots + (-1)^{n}\frac{x^{2n + 1}}{(2n + 1)!}\{1 + \rho\}\notag\\ \cos x &= 1 - \frac{x^{2}}{2!} + \frac{x^{4}}{4!} - \cdots + (-1)^{n}\frac{x^{2n}}{(2n)!}\{1 + \rho\}\notag\\ \log(1 + x) &= x - \frac{x^{2}}{2} + \frac{x^{3}}{3} - \frac{x^{4}}{4} + \cdots + (-1)^{n - 1}\frac{x^{n}}{n}\{1 + \rho\}\notag\\ \tan^{-1}(x) &= x - \frac{x^{3}}{3} + \frac{x^{5}}{5} - \frac{x^{7}}{7} + \cdots + (-1)^{n}\frac{x^{2n + 1}}{2n + 1}\{1 + \rho\}\notag \end{align} In all these expressions $\rho \to 0$ as $x \to 0$. For any other specific function one may have to derive its series expansion by calculating successive derivatives. Depending upon a specific problem we decide how many terms of the expansion are needed.

Let us apply this technique to evaluate the following limit $$\lim_{x \to 0}\frac{\tan x\tan^{-1}x - x^{2}}{x^{6}}$$ We first need to get the expansion of $\tan x$. After some labor we can see that all the even derivatives of $\tan x$ at $x = 0$ are $0$ and on calculating the first few odd derivatives at $x = 0$ we get $$\tan x = x + \frac{x^{3}}{3} + \frac{2}{15}x^{5} + \rho x^{6}$$ and we already have $$\tan^{-1}x = x - \frac{x^{3}}{3} + \frac{x^{5}}{5} + \rho' x^{6}$$ so that $$\tan x \tan^{-1}x = x^{2} + \left(\frac{2}{15} + \frac{1}{5} - \frac{1}{9}\right)x^{6} + \text{ terms with higher powers of }x\text{ with }\rho, \rho'$$ It now follows that $$\tan x \tan^{-1}x - x^{2} = \frac{2}{9}x^{6} + \text{ terms with higher powers of }x\text{ with }\rho, \rho'$$ and therefore we can see that $$\lim_{x \to 0}\frac{\tan x\tan^{-1}x - x^{2}}{x^{6}} = \frac{2}{9}$$ In practice we don't write the terms containing $\rho$ and manipulate the power series using various algebraical rules of addition, multiplication and division and assume that there will be terms containing expressions composed of $\rho$ and higher powers of $x$ at the end. I have shown that taking limits via the series expansion is justified because in reality the number of terms in the series is finite and the last term contains a higher power of $x$ and also contains expressions like $\rho$ which tend to $0$ with $x$.
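The bookkeeping in the product $\tan x\tan^{-1}x$ can be delegated to a few lines of Python using exact rational arithmetic. The sketch below multiplies the two truncated expansions quoted above and reads off the coefficients up to $x^{6}$; the names `TAN`, `ARCTAN` and `multiply_truncated` are made up for this illustration, and the remainder terms involving $\rho, \rho'$ are simply dropped, as described above.

```python
from fractions import Fraction

# Coefficient lists, lowest power first: index k holds the coefficient of x^k.
TAN = [0, 1, 0, Fraction(1, 3), 0, Fraction(2, 15)]       # x + x^3/3 + 2x^5/15
ARCTAN = [0, 1, 0, Fraction(-1, 3), 0, Fraction(1, 5)]    # x - x^3/3 + x^5/5

def multiply_truncated(p, q, max_power):
    """Multiply two coefficient lists, keeping only powers up to max_power."""
    product = [Fraction(0)] * (max_power + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= max_power:
                product[i + j] += pi * qj
    return product

coeffs = multiply_truncated(TAN, ARCTAN, 6)
print(coeffs[2], coeffs[4], coeffs[6])   # coefficients of x^2, x^4, x^6: 1 0 2/9
```

Since both expansions start with $x$, terms up to $x^{5}$ in each factor are enough to determine the coefficient of $x^{6}$ in the product.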

Most calculus texts try to treat this technique in a very non-rigorous way and prefer to use infinite series and their manipulations, which need some justification using high level concepts (like uniform convergence), whereas in the above I have described a finite series expansion based on a version of Taylor's Theorem.

This technique of power series expansions should be used only when all the other techniques (rules of limits, L'Hospital's Rule) fail.


Teach Yourself Limits in 8 Hours: Part 2

2013-11-07T11:57:00.002+05:30

After the definitions and basic examples in Part 1, we now focus on the rules of evaluation of limits which will be highly useful in solving various limit problems. We will postpone the proofs of these rules to the last post in the series to avoid any distraction.

Rules of Limits

In the following rules we assume that the functions described are defined in a certain neighborhood of $a$ except possibly at point $a$. All the relations between the functions (if any) also hold in this neighborhood of point $a$ (except possibly at point $a$).

1) If $f(x) \leq g(x)$ and both $\lim_{x \to a}f(x)$ and $\lim_{x \to a}g(x)$ exist then we have $$\lim_{x \to a}f(x) \leq \lim_{x \to a}g(x)$$ Even if $f(x) < g(x)$ then also we get $$\lim_{x \to a}f(x) \leq \lim_{x \to a}g(x)$$ so that taking limits always weakens the inequalities.

2) $\displaystyle \lim_{x \to a}k = k$ where $k$ is a constant (i.e. not dependent on $x$).

3) $\displaystyle \lim_{x \to a}\{f(x) \pm g(x)\} = \lim_{x \to a}f(x) \pm \lim_{x \to a}g(x)$ provided both the limits on the right side exist.

4) $\displaystyle \lim_{x \to a}\{f(x) \cdot g(x) \} = \lim_{x \to a}f(x) \cdot \lim_{x \to a}g(x)$ provided both the limits on the right side exist.

5) $\displaystyle \lim_{x \to a}\frac{f(x)}{g(x)} = \dfrac{{\displaystyle \lim_{x \to a}f(x)}}{{\displaystyle \lim_{x \to a}g(x)}}$ provided both the limits on the right side exist and $\lim_{x \to a}g(x) \neq 0$

6) If $f(x) \leq g(x) \leq h(x)$ and $\displaystyle \lim_{x \to a}f(x) = \lim_{x \to a}h(x) = L$ then $\displaystyle \lim_{x \to a}g(x) = L$. The inequalities can be strict without any change in conclusion. This result is also known as Sandwich Theorem or Squeeze Theorem.

7) If $\displaystyle \lim_{x \to a}g(x) = b$ and $\lim_{x \to b}f(x) = L$ and $g(x) \neq b$ in a certain neighborhood of $a$ (except possibly at $a$) then $\displaystyle \lim_{x \to a}f\{g(x)\} = L$. This is known as the rule of substitution.

There are similar rules for the scenarios when $x \to \infty$ and $x \to -\infty$ which the reader can easily formulate for himself. These simple rules are almost always provided in all introductory calculus texts but are never used with explicit knowledge by beginners. Most students underestimate the power of these rules and prefer to use high level techniques (to be discussed in next post).

We now provide some simple applications of these rules. Since $\lim_{x \to a}x = a$ it follows that if $$f(x) = a_{0}x^{n} + a_{1}x^{n - 1} + \cdots + a_{n - 1}x + a_{n}$$ is a polynomial then $\lim_{x \to a}f(x) = f(a)$ so that in case of polynomials we can simply put $x = a$ and evaluate the limit. This technique works only because of the rules above and may not work for functions other than polynomials.

Going further using rule 5) we can see that the technique of substituting $x = a$ will work for rational functions (ratio of two polynomials) also provided that after the substitution the denominator does not vanish.

Again we can easily establish (directly using the definition) that $\lim_{x \to 0}|x| = 0$. And from basic trigonometry we have the inequality $|\sin x| \leq |x|$ for $|x| \leq \pi/2$. Thus we see that $-|x| \leq \sin x \leq |x|$ and by Sandwich theorem we get the result $$\lim_{x \to 0}\sin x = 0$$ Next we can proceed as follows: \begin{align} \lim_{x \to 0}\cos x &= \lim_{x \to 0} 1 - 2\sin^{2}(x/2)\notag\\ &= 1 - 2\lim_{x \to 0}\sin^{2}(x/2)\notag\\ &= 1 - 2\lim_{y \to 0}\sin^{2}y\text{ (putting }y = x/2\text{)}\notag\\ &= 1 - 2\lim_{y \to 0}\sin y\lim_{y \to 0}\sin y\notag\\ &= 1 - 2\cdot 0\cdot 0 = 1\notag \end{align} Using these limits we can see that \begin{align} \lim_{x \to a}\sin x &= \lim_{h \to 0}\sin(a + h)\text{ (putting }x = a + h)\notag\\ &= \lim_{h \to 0}\sin a\cos h + \cos a\sin h\notag\\ &= \sin a\lim_{h \to 0}\cos h + \cos a\lim_{h \to 0}\sin h\notag\\ &= \sin a\cdot 1 + \cos a\cdot 0 = \sin a\notag \end{align} and in the same way we can prove that $\lim_{x \to a}\cos x = \cos a$.

It now follows that if we have to evaluate the limit of a rational function in $\sin x, \cos x$ we can use the technique of substituting $x = a$ provided this substitution does not lead to a zero denominator.

Standard Limits

We have seen that for certain functions evaluation of a limit is equivalent to substituting $x = a$ in the given expression for the function provided such a substitution does not lead to a zero denominator. But most of the limit problems one encounters in calculus often lead to zero denominators and for such problems we make use of the following standard limits:

1) If $x^{n}$ is defined in a neighborhood of $a$ (and also at point $a$) then $$\lim_{x \to a}\frac{x^{n} - a^{n}}{x - a} = na^{n - 1}$$ provided $a^{n - 1}$ is also defined.

The above rule helps in calculating various limits related to algebraic functions. In some simpler cases it might be easier to do algebraic manipulations (like rationalization to remove radicals) rather than using this formula. From the above formula it also follows that \begin{align} \lim_{x \to a}x^{n} &= \lim_{x \to a}(x - a) \cdot\frac{x^{n} - a^{n}}{x - a} + a^{n}\notag\\ &= (a - a)\cdot na^{n - 1} + a^{n} = a^{n}\notag \end{align} provided that $a^{n}$ is defined.

2) $\displaystyle \lim_{x \to 0}\frac{\sin x}{x} = 1$

This is the fundamental limit used to evaluate limits involving trigonometric functions. Normally this is used in conjunction with various trigonometric identities and manipulations based on them. For example \begin{align} \lim_{x \to 0}\frac{1 - \cos x}{x^{2}} &= \lim_{x \to 0}\dfrac{2\sin^{2}(x/2)}{x^{2}}\notag\\ &= 2\lim_{x \to 0}\dfrac{\sin^{2}(x/2)}{(x/2)^{2}}\frac{(x/2)^{2}}{x^{2}}\notag\\ &= 2\cdot 1\cdot \frac{1}{4} = \frac{1}{2}\notag \end{align}

3) $\displaystyle \lim_{x \to 0}\frac{e^{x} - 1}{x} = 1$

This result handles limits involving exponential functions and has to be used in conjunction with the properties of exponential function. Using this formula we can derive another important result: \begin{align} \lim_{x \to 0}\frac{a^{x} - 1}{x} &= \lim_{x \to 0}\frac{e^{x\log a} - 1}{x}\notag\\ &= \lim_{x \to 0}\frac{e^{x\log a} - 1}{x\log a}\frac{x\log a}{x}\notag\\ &= 1\cdot \log a = \log a\notag \end{align}

4) $\displaystyle \lim_{x \to 0}\frac{\log (1 + x)}{x} = 1$

5) $\displaystyle \lim_{x \to \infty}\frac{\log x}{x^{a}} = 0$ for all $a > 0$. It has a counterpart namely that
$\displaystyle \lim_{x \to \infty}\frac{x^{a}}{e^{x}} = 0$

6) $e^{x} \to \infty$ as $x \to \infty$ and $\log x \to \infty$ as $x \to \infty$
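A quick numerical sanity check of the standard limits listed above can be reassuring before using them. The following is only a sketch and proves nothing: the function names, the sample points and the choices $n = 3$, $a = 2$ are picked here purely for illustration.

```python
import math

def difference_quotient(n, a, x):
    """(x**n - a**n) / (x - a), the quotient in standard limit 1."""
    return (x**n - a**n) / (x - a)

def standard_limit_table(h):
    """Values of the standard-limit expressions at a small positive h."""
    return {
        "power": difference_quotient(3, 2.0, 2.0 + h),  # limit is 3 * 2**2 = 12
        "sine": math.sin(h) / h,                          # limit is 1
        "exp": math.expm1(h) / h,                         # expm1(h) = e**h - 1, limit is 1
        "log": math.log1p(h) / h,                         # log1p(h) = log(1 + h), limit is 1
        "a^x": (2.0**h - 1.0) / h,                        # limit is log 2
    }

def log_over_power(x, a):
    """log(x) / x**a, which tends to 0 as x grows (standard limit 5)."""
    return math.log(x) / x**a

for h in (1e-1, 1e-3, 1e-5):
    print(h, standard_limit_table(h))
print(log_over_power(1e12, 0.5))
```

The printed values drift towards $12, 1, 1, 1, \log 2$ and $0$ respectively as $h$ shrinks (or $x$ grows), in agreement with the formulas.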

Using the above results we can see that \begin{align} \lim_{x \to a}e^{x} &= \lim_{h \to 0}e^{a + h}\notag\\ &= \lim_{h \to 0}e^{a}\cdot e^{h}\notag\\ &= e^{a}\lim_{h \to 0}\left\{h\cdot\left(\frac{e^{h} - 1}{h}\right) + 1\right\}\notag\\ &= e^{a}(0\cdot 1 + 1) = e^{a}\notag \end{align} and if $a > 0$ then \begin{align} \lim_{x \to a}\log x &= \lim_{h \to 0}\log(a + h)\notag\\ &= \lim_{h \to 0}\log\left\{a\left(1 + \frac{h}{a}\right)\right\}\notag\\ &= \lim_{h \to 0}\left\{\log a + \log\left(1 + \frac{h}{a}\right)\right\}\notag\\ &= \log a + \lim_{h \to 0}\frac{h}{a}\cdot\dfrac{\log\left(1 + \dfrac{h}{a}\right)}{\dfrac{h}{a}}\notag\\ &= \log a + 0\cdot 1 = \log a\notag \end{align} From the above we can see that substitution $x = a$ is allowed in case of logarithmic and exponential functions also, provided that we take care of the fact that logarithm is defined for positive numbers only. Using the previous results of this kind we formulate our thumb rule as follows:

To evaluate the limit of an expression (consisting of trigonometric, logarithmic, exponential and algebraical functions and arithmetical operations) when $x \to a$, it is OK to substitute $x = a$ provided that such substitution does not lead to an undefined expression (like zero denominator, square root of negative number, $\log$ of zero or a negative number, etc). Also in case of exponential expression it is important that either the base or the exponent must be a constant. To handle expressions like $\{f(x)\}^{g(x)}$ it is important to recast them in the form $\exp\{g(x)\log(f(x))\}$ and then perform substitution.
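As a small illustration of the last remark (the particular function is chosen here only as an example), both the base and the exponent vary in $(1 + x)^{x}$, so we recast it before substituting $x = 1$: $$\lim_{x \to 1}(1 + x)^{x} = \lim_{x \to 1}\exp\{x\log(1 + x)\} = \exp\{1\cdot\log 2\} = 2$$ The substitution is allowed because $1 + x$ stays positive near $x = 1$ so that $\log(1 + x)$ is defined there.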

However as will be seen in the examples to follow, most usual limit problems will involve expressions where substitution leads to undefined expressions. Hence we have to do certain manipulations to transform the expression into a form which allows us to take advantage of the above standard limits (note that all standard limits lead to undefined expression if we use direct substitution). This manipulation requires a kind of skill which comes with practice.

Handling Infinity ($\infty$)

Before we proceed to demonstrate the effectiveness of standard limit formulas it is necessary to clarify some misconceptions about the use of $\infty$ in limit problems. This particular aspect of limit problem dealing with $\infty$ is the most confusing to beginners and I have seen ample examples of confusion on MSE. Here are some simple thumb rules to follow and thereby steer clear of any confusion:

$\infty$ is not a number and hence can't be used with common operations like $+, -, \times, /, = $ like real numbers.
$\infty$ has a meaning in certain contexts only because of special definitions which give it a meaning in that context.
Rules for substituting $x = a$ mentioned in last subsection apply only for $x \to a$ scenario and not for $x \to \pm \infty$ scenarios
If $f(x) \to \infty$ (or $f(x) \to -\infty$) as $x \to a$ then $\displaystyle \lim_{x \to a}\dfrac{1}{f(x)} = 0$
However, if $\displaystyle \lim_{x \to a}f(x) = 0$ then it does not necessarily follow that $1/f(x) \to \infty$ (or $1/f(x) \to -\infty$) as $x \to a$. If $f(x)$ maintains a constant sign, say positive (or negative) as $x \to a$ then $1/f(x) \to \infty$ (or $1/f(x) \to -\infty$) as $x \to a$. If $f(x)$ does not maintain a constant sign as $x \to a$, then $1/f(x)$ oscillates infinitely as $x \to a$.
In many cases the algebraical manipulation becomes simpler if the limit $x \to \infty$ (or $x \to -\infty$) is replaced by $\lim_{h \to 0+}$ (or $\lim_{h \to 0-}$) by putting $x = 1/h$.
If $f(x) \to \infty$ as $x \to a$ and $k > 0$ then $kf(x) \to \infty$ as $x \to a$ and if $k < 0$ then $kf(x) \to -\infty$ as $x \to a$.
If $f(x) \to \infty$ and $g(x) \to \infty$ as $x \to a$ then $f(x) + g(x) \to \infty$ as $x \to a$.
If $f(x) \to \infty$ and $g(x) \to \infty$ as $x \to a$ then $f(x) \cdot g(x) \to \infty$ as $x \to a$.

The last three rules of thumb are important and they have counterparts for $-\infty$ version which the reader can formulate himself. One may ask why we don't have similar rules for $f(x) - g(x)$ and $f(x)/g(x)$. Just to clarify, each of the above thumb rules we have mentioned can be proved rigorously (and will be done in the last post of this series) using definition of limit and these proofs are not applicable to these cases. To provide an example let $f(x) = 2/x, g(x) = 1/x$ then $f(x) - g(x) = 1/x \to \infty$ as $x \to 0+$. But if $f(x) = 1/x, g(x) = 1/x$ then $f(x) - g(x) = 0$ and this tends to $0$ when $x \to 0+$.

Another question which might come to mind about handling of $kf(x)$ is what happens when $k = 0$. Clearly when $k = 0$ then $kf(x) = 0$ and thus $\lim_{x \to a}kf(x) = 0$ irrespective of the nature and behavior of $f(x)$. On the other hand if we have $f(x) \to 0$ as $x \to a$ and $g(x) \to \infty$ as $x \to a$ then we can't say anything about limit of $f(x)g(x)$ as $x \to a$. We will see that in such cases often the manipulation of the expression allows us to use some standard limits.
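To see concretely why nothing can be said in general about $f(x)g(x)$ when $f(x) \to 0$ and $g(x) \to \infty$, here is a small numerical sketch. The three pairs of functions are hypothetical examples chosen for illustration and the function name is mine.

```python
def products_near_zero(x):
    """Three products f(x)*g(x) with f(x) -> 0 and g(x) -> infinity as x -> 0+."""
    return {
        "x * (1/x)": x * (1.0 / x),        # stays near 1
        "x**2 * (1/x)": x**2 * (1.0 / x),  # tends to 0
        "x * (1/x**2)": x * (1.0 / x**2),  # tends to infinity
    }

for x in (1e-1, 1e-3, 1e-6):
    print(x, products_near_zero(x))
```

The same "type" of product thus shows three different behaviors, so each such problem has to be handled by manipulating the actual expression.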

Examples

I will provide some examples which may seem tough but can be handled easily with the above rules and standard limits.

1) $\displaystyle \lim_{x \to 0}\frac{1 - \cos(1 - \cos x)}{x^{4}}$
We can proceed as follows: \begin{align} \lim_{x \to 0}\frac{1 - \cos(1 - \cos x)}{x^{4}} &= \lim_{x \to 0}\frac{1 - \cos(2\sin^{2}(x/2))}{x^{4}}\notag\\ &= \lim_{x \to 0}\frac{2\sin^{2}(\sin^{2}(x/2))}{x^{4}}\notag\\ &= 2\lim_{x \to 0}\dfrac{\sin^{2}(\sin^{2}(x/2))}{\sin^{4}(x/2)}\frac{\sin^{4}(x/2)}{x^{4}}\notag\\ &= 2\lim_{x \to 0}\left(\dfrac{\sin(\sin^{2}(x/2))}{\sin^{2}(x/2)}\right)^{2}\left(\dfrac{\sin(x/2)}{(x/2)}\right)^{4}\frac{(x/2)^{4}}{x^{4}}\notag\\ &= 2\cdot 1^{2}\cdot 1^{4}\cdot\frac{1}{16} = \frac{1}{8}\notag\end{align} This seemingly tough example is taken from G. H. Hardy's "A Course of Pure Mathematics".
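The value $1/8$ can be cross-checked numerically. The sketch below (function name chosen here for illustration) uses the rewritten form $2\sin^{2}(\sin^{2}(x/2))/x^{4}$ from the second step of the calculation, because evaluating $1 - \cos(1 - \cos x)$ directly in floating point loses accuracy through cancellation when $x$ is small.

```python
import math

def nested_cosine_ratio(x):
    """(1 - cos(1 - cos x)) / x**4, computed as 2*sin^2(sin^2(x/2)) / x**4."""
    inner = math.sin(x / 2.0) ** 2  # equals (1 - cos x) / 2
    return 2.0 * math.sin(inner) ** 2 / x**4

for x in (0.5, 1e-1, 1e-2, 1e-3):
    print(x, nested_cosine_ratio(x))
```

The printed values approach $0.125$ as $x$ decreases.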

2) $\displaystyle \lim_{x \to a+}\frac{\log(x - a)}{\log(e^{x} - e^{a})}$
Clearly we have \begin{align} L &= \lim_{x \to a+}\frac{\log(x - a)}{\log(e^{x} - e^{a})}\notag\\ &= \lim_{h \to 0+}\frac{\log h}{\log(e^{a + h} - e^{a})}\notag\\ &= \lim_{h \to 0+}\frac{\log h}{\log\{e^{a}(e^{h} - 1)\}}\notag\\ &= \lim_{h \to 0+}\frac{\log h}{a + \log(e^{h} - 1)}\notag\\ &= \lim_{h \to 0+}\dfrac{\log h}{a + \log\left(h\cdot\dfrac{e^{h} - 1}{h}\right)}\notag\\ &= \lim_{h \to 0+}\dfrac{\log h}{a + \log h + \log\left(\dfrac{e^{h} - 1}{h}\right)}\notag\\ &= \lim_{h \to 0+}\dfrac{1}{\dfrac{a}{\log h} + 1 + \dfrac{1}{\log h}\cdot\log\left(\dfrac{e^{h} - 1}{h}\right)}\notag\\ &= \frac{1}{0 + 1 + 0\cdot 0} = 1\notag \end{align} Here if $h \to 0+$ and $y = 1/h$ then $\log h = \log (1/y) = -\log y$ and $y \to \infty$ so that $\log y \to \infty$. It follows that $\log h \to -\infty$ as $h \to 0+$ and hence $1/\log h \to 0$ as $h \to 0+$.

This problem is taken from MSE.
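A numerical experiment on this example is instructive mainly because it is unconvincing: the terms $a/\log h$ tend to $0$ very slowly, so the ratio creeps towards $1$ only for extremely small $h$. The sketch below assumes $a = 1$ for illustration and uses the form $a + \log(e^{h} - 1)$ of the denominator obtained in the solution above; the function name is mine.

```python
import math

def log_ratio(a, h):
    """log(h) / log(e**(a + h) - e**a), written as log(h) / (a + log(e**h - 1))."""
    return math.log(h) / (a + math.log(math.expm1(h)))

for h in (1e-3, 1e-10, 1e-100, 1e-300):
    print(h, log_ratio(1.0, h))
```

Even at $h = 10^{-300}$ the value still differs from $1$ in the third decimal place, which is why the algebraic argument is preferable to numerical guesswork here.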
3) $\displaystyle \lim_{x \to 0}\left(\frac{1}{\log(x + \sqrt{1 + x^{2}})} - \frac{1}{\log(1 + x)}\right)$
I presented the following solution on MSE: \begin{align} L &= \lim_{x \to 0}\left(\frac{1}{\log(x + \sqrt{1 + x^{2}})} - \frac{1}{\log(1 + x)}\right)\notag\\ &= \lim_{x \to 0}\frac{\log(1 + x) - \log(x + \sqrt{1 + x^{2}})}{\log(x + \sqrt{1 + x^{2}})\log(1 + x)}\notag\\ &= \lim_{x \to 0}\frac{\log(1 + x) - \log(x + \sqrt{1 + x^{2}})}{\log(1 - 1 + x + \sqrt{1 + x^{2}})\log(1 + x)}\notag\\ &= \lim_{x \to 0}\dfrac{\log(1 + x) - \log(x + \sqrt{1 + x^{2}})}{{\displaystyle \left(x + \sqrt{1 + x^{2}} - 1\right)\dfrac{\log\left\{1 + \left(x + \sqrt{1 + x^{2}} - 1\right)\right\}}{x + \sqrt{1 + x^{2}} - 1}\cdot x\cdot\dfrac{\log(1 + x)}{x}}}\notag\\ &= \lim_{x \to 0}\dfrac{\log(1 + x) - \log(x + \sqrt{1 + x^{2}})}{{\displaystyle \left(x + \sqrt{1 + x^{2}} - 1\right)1\cdot x\cdot 1}}\notag\\ &= \lim_{x \to 0}\dfrac{\log(1 + x) - \log(x + \sqrt{1 + x^{2}})}{x \left(x + \sqrt{1 + x^{2}} - 1\right)}\notag\\ &= \lim_{x \to 0}\dfrac{\log(1 + x) - \log\left(\dfrac{\sqrt{1 + x^{2}} - x}{\sqrt{1 + x^{2}} - x}\cdot\left(x + \sqrt{1 + x^{2}}\right)\right)}{x \left(x + \sqrt{1 + x^{2}} - 1\right)}\notag\\ &= \lim_{x \to 0}\dfrac{\log(1 + x) + \log\left(\sqrt{1 + x^{2}} - x\right)}{x \left(x + \sqrt{1 + x^{2}} - 1\right)}\notag\\ &= \lim_{x \to 0}\dfrac{\log\left\{(1 + x)\left(\sqrt{1 + x^{2}} - x\right)\right\}}{x \left(x + \sqrt{1 + x^{2}} - 1\right)}\notag\\ &= \lim_{x \to 0}\dfrac{\log\left\{\sqrt{1 + x^{2}} - x + x\sqrt{1 + x^{2}} - x^{2}\right\}}{x \left(x + \sqrt{1 + x^{2}} - 1\right)}\notag\\ &= \lim_{x \to 0}\dfrac{\log\left\{1 - 1 + \sqrt{1 + x^{2}} - x + x\sqrt{1 + x^{2}} - x^{2}\right\}}{x \left(x + \sqrt{1 + x^{2}} - 1\right)}\notag\\ &= \lim_{x \to 0}\dfrac{\log\left\{1 + \left(\sqrt{1 + x^{2}} - x + x\sqrt{1 + x^{2}} - x^{2} - 1\right)\right\}}{\left(\sqrt{1 + x^{2}} - x + x\sqrt{1 + x^{2}} - x^{2} - 1\right)}\notag\\ &\,\,\,\,\,\,\,\,\,\,\,\,\,\times\dfrac{\left(\sqrt{1 + x^{2}} - x + x\sqrt{1 + x^{2}} - x^{2} - 1\right)}{x \left(x + \sqrt{1 + x^{2}} - 1\right)}\notag\\ &= 
\lim_{x \to 0}\dfrac{\left(\sqrt{1 + x^{2}} - x + x\sqrt{1 + x^{2}} - x^{2} - 1\right)}{x \left(x + \sqrt{1 + x^{2}} - 1\right)}\notag\\ &= \lim_{x \to 0}\dfrac{\left(x^{2} + x\sqrt{1 + x^{2}} - x + \sqrt{1 + x^{2}} - 2x^{2} - 1\right)}{x \left(x + \sqrt{1 + x^{2}} - 1\right)}\notag\\ &= \lim_{x \to 0} 1 + \dfrac{\sqrt{1 + x^{2}} - 2x^{2} - 1}{x \left(x + \sqrt{1 + x^{2}} - 1\right)}\notag\\ &= 1 + \lim_{x \to 0}\dfrac{1 + x^{2} - (2x^{2} + 1)^{2}}{x \left(1 + x^{2} - (1 - x)^{2}\right)}\frac{\sqrt{1 + x^{2}} + 1 - x}{\sqrt{1 + x^{2}} + 2x^{2} + 1}\notag\\ &= 1 + \lim_{x \to 0}\dfrac{-3x^{2} - 4x^{4}}{2x^{2}}\cdot\frac{2}{2} = 1 - \frac{3}{2} = -\frac{1}{2}\notag \end{align} If we observe the steps above carefully we will find that the logarithmic limit has been used smartly to get rid of the logarithm function itself and the expression is transformed into an algebraic one. The calculation of algebraic limits is far simpler than those containing logarithms and exponentials and in this particular example we have used rationalization to get rid of radicals.
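The answer $-1/2$ can be checked numerically as well. This is a sketch under the observation that $\log(x + \sqrt{1 + x^{2}})$ is the inverse hyperbolic sine, available as `math.asinh`, and that `math.log1p` computes $\log(1 + x)$ accurately for small $x$; the function name is chosen here for illustration.

```python
import math

def log_difference(x):
    """1/log(x + sqrt(1 + x^2)) - 1/log(1 + x) for small nonzero x > -1."""
    # asinh(x) equals log(x + sqrt(1 + x^2)); log1p(x) equals log(1 + x).
    return 1.0 / math.asinh(x) - 1.0 / math.log1p(x)

for x in (1e-1, 1e-3, -1e-3, 1e-5):
    print(x, log_difference(x))
```

The values approach $-0.5$ from both sides of $0$.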

Misuse of Rules of Limits

Before concluding this post I would like to warn against the misuse of rules of limits which deal with operations of $+, -, \times, /$. Each of these rules involves two functions and limits of both the functions must exist for the rule to be applicable. A common misuse is to apply the rules even when limit of one function does not exist.

Another very popular misuse of the rules is the following (and there are many variations on it):
Let $\lim_{x \to a}h(x) = A$ and consider the step $$\lim_{x \to a}\frac{f(x)}{g(x) + h(x)} = \lim_{x \to a}\frac{f(x)}{g(x) + A}\textbf{ (wrong)}$$ This step is wrong!! Rules of limits allow us to split one limit operation on a complex expression into multiple limit operations applied to each smaller part of the expression connected by $+, -, \times, /$. Unless the split of the limit operation is carried out completely, down to the smallest parts of the expression, one can't replace the smaller parts by their limits. In the above example the valid step is like this: $$\lim_{x \to a}\frac{f(x)}{g(x) + h(x)} = \dfrac{{\displaystyle \lim_{x \to a}f(x)}}{{\displaystyle \lim_{x \to a}g(x) + A}}\textbf{ (correct)}$$ provided $\lim_{x \to a}f(x)$ and $\lim_{x \to a}g(x)$ exist and $\lim_{x \to a}g(x)$ is not equal to $-A$.

Let us consider the following two examples: $$\lim_{x \to 0}\frac{x}{x + \log(1 + x)} = \lim_{x \to 0}\frac{x}{0 + \log(1 + x)} = 1\textbf{ (wrong)}$$ $$\lim_{x \to 0}\frac{x}{x^{2} + \log(1 + x)} = \lim_{x \to 0}\frac{x}{0 + \log(1 + x)} = 1\textbf{ (wrong)}$$ Both the examples use faulty manipulation as they replace $x$ and $x^{2}$ by the limit $0$ without splitting the limit operation till smaller parts. Out of these two examples first one has the wrong answer whereas (surprise!!) the second one has correct answer. People who are so fond of such replacements say that the replacement of $x^{2}$ by $0$ is valid because $x^{2}$ is of smaller order compared to $\log(1 + x)$ and hence can be safely put as $0$ whereas the same can't be done for $x$ (in first example) because it is of same order as that of $\log(1 + x)$.

I prefer to avoid such vague talk of orders and follow the right approach to both examples:
$$\lim_{x \to 0}\frac{x}{x + \log(1 + x)} = \lim_{x \to 0}\dfrac{1}{1 + \dfrac{\log(1 + x)}{x}} = \frac{1}{1 + 1} = \frac{1}{2}\textbf{ (correct)}$$ $$\lim_{x \to 0}\frac{x}{x^{2} + \log(1 + x)} = \lim_{x \to 0}\dfrac{1}{x + \dfrac{\log(1 + x)}{x}} = \frac{1}{0 + 1} = 1\textbf{ (correct)}$$ In both the examples the direct split of limits for division operation is not possible as the limit of denominator is zero and hence an algebraic manipulation is needed (division of both numerator and denominator by $x$) and then the limit operation is split across all parts (not shown above) and each part replaced by their limit to get the right answer.
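The two correct answers $1/2$ and $1$ are easy to confirm numerically. The following is only an illustrative sketch with function names chosen here.

```python
import math

def first_example(x):
    """x / (x + log(1 + x)); the limit as x -> 0 is 1/2."""
    return x / (x + math.log1p(x))

def second_example(x):
    """x / (x**2 + log(1 + x)); the limit as x -> 0 is 1."""
    return x / (x**2 + math.log1p(x))

for x in (1e-1, 1e-2, 1e-4, -1e-4):
    print(x, first_example(x), second_example(x))
```

The faulty replacement of $x$ by $0$ predicted $1$ for the first expression, whereas the computed values clearly settle near $0.5$.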

However there are two occasions where it is possible to substitute an expression by its limit without affecting the final result:

1) If $h(x) = f(x) \pm g(x)$ and $\lim_{x \to a}g(x) = L$ exists then the behavior of $h(x)$ as $x \to a$ is same as the behavior of $f(x)$ as $x \to a$.

This means that if $f(x)$ tends to a limit (or diverges to $\pm\infty$, or oscillates finitely or infinitely) as $x \to a$, then so does $h(x)$ and hence while trying to find the limit of $h(x)$ we can substitute $g(x)$ by its limit $L$ without any problem.

2) If $h(x) = f(x) \cdot g(x)$ or $h(x) = f(x)/g(x)$ and $\lim_{x \to a}g(x) = L \neq 0$ exists then the behavior of $h(x)$ as $x \to a$ is same as the behavior of $f(x)$ as $x \to a$.

Note that the condition $L \neq 0$ is necessary here. Hence if $h(x) = f(x)\cdot g(x)$ or $h(x) = f(x)/g(x)$ and $\lim_{x \to a}g(x) = L \neq 0$ then we can replace $g(x)$ by its limit $L$ without affecting the calculation for limit of $h(x)$ as $x \to a$.

In the next post we will study certain advanced techniques for solving limit problems.


Teach Yourself Limits in 8 Hours: Part 1

2013-11-06T14:08:00.000+05:30

Introduction

While looking at certain limit problems posed on math.stackexchange.com (henceforth to be called MSE) I found that most beginners studying limits are living in a fantasy world consisting of vague notions, infinities and what not. I too had my share of such experiences during my time as a student learning calculus but I was lucky enough to get over this phase very quickly through the help of "A Course of Pure Mathematics".

Regarding the answers posted on MSE I found that most of the answers although correct were not suitable for beginners studying limits. Some answers suggested that their authors themselves had the same vague notions but they somehow managed to avoid their pitfalls. Some other answers were using sophisticated techniques which involved deeper concepts than the concept of limits itself. And there were some heated arguments favoring one approach over another.

Therefore I decided to write a series of posts providing a step by step approach to solve limit problems encountered in an introductory calculus course. I have tried to split the whole topic into $4$ posts and I believe that the gist of each post can be assimilated in not more than $2$ hours and that's the logic behind the title of this series.

Contrary to my policy on this blog, I will prefer to avoid rigorous/formal proofs of the various results which I present here. This is mainly because a beginner in calculus may not be that interested in proofs and presenting these proofs at the start might serve as a distraction from learning the basic techniques. However some of the proofs will be provided in the last post in the series for the sake of completeness.

Concept of a Limit

Limit is a very simple idea which can be used to study the behavior of a function when its argument takes values around a given value. Roughly speaking there are occasions when it makes sense to study the behavior of a function $f$ defined in a neighborhood of a certain point $a$ (but not necessarily at that point $a$, for example the function $f(x) = 1/x$ around point $x = 0$). For study of such situations the concept of limit was introduced. We try to figure out if the values of a function $f(x)$ tend to lie around a certain particular value when the value of $x$ lies around a certain value. The formal definition of a limit is a bit clumsy but still needs to be provided:

Limit as $x \to a$

Let $f$ be a function defined in a certain neighborhood of point $a$ (but not necessarily at $a$). Then a number $L$ is said to be the limit of $f(x)$ as $x$ tends to $a$ (written symbolically as $\lim\limits_{x \to a}f(x) = L$ or $f(x) \to L$ as $x \to a$) if for any given number $\epsilon > 0$ it is possible to choose a corresponding number $\delta > 0$ such that $|f(x) - L| < \epsilon$ whenever $0 < | x - a| < \delta$.

The above definition is a bit complicated and hence some detailed explanation and remarks are necessary. First of all it should be noted that the limit of the function $f(x)$ as $x \to a$ (sometimes also called limit of $f$ at $a$) has nothing to do with the value of $f$ at $a$ but has everything to do with the values of $f$ near $a$. The part $0 < |x - a| < \delta$ ensures that $x \neq a$ and also covers all values of $x$ which are near $a$ (the distance between $x$ and $a$ being less than $\delta$). We can thus say that a limit of a function is not necessarily a value of the function, but it is defined using the values of the function. At the same time the limit $L$ may be one of the values of the function $f$ (say for example when the function $f(x) = L$ is a constant function).

The primary objective of the concept of limit is to study the behavior of a function $f$ near some specific point $a$ and we can loosely say that it tries to find out a pattern/trend in the values of $f$. The pattern sought after is to check whether all the values of the function are near some specific number $L$ when we consider values of $x$ near specific point $a$. If we can ensure that all values of $f$ can be made to lie as near to $L$ as we please for all values of $x$ sufficiently near to $a$ then only we say that the limit of $f$ is $L$ as $x \to a$. Thus in the definition above the inequality $|f(x) - L| < \epsilon$ serves as a goal (and this in reality is a tough goal which has to be achieved for every arbitrary $\epsilon > 0$) and our means to achieve the goal is to take $x$ sufficiently close to $a$ (or better say that we take $x$ as close to $a$ as is needed to achieve the goal) and this closeness is measured in terms of $\delta$.

As one can easily observe the definition is formulated in terms of a check. It allows us to check whether a given number $L$ is or is not the limit of a function $f(x)$ when $x \to a$. Prima facie, it does not allow us to figure out (or calculate) the limit of a function. However the above definition has turned out to be very fruitful and it allows us to derive various useful results regarding limits which can then be effectively used to calculate the limit of a function. Also one should note that the basic prerequisite for $\lim_{x \to a}f(x)$ to exist is that $f(x)$ must be defined in a certain neighborhood of $a$ (except possibly at $a$).

As a simple example we can show that $$\lim_{x \to a} x = a$$ This is more or less obvious if we use the definition and note that here $\delta = \epsilon$ is sufficient.

Another point to note is that if $\lim_{x \to a}f(x) = L$ exists then the function $f(x)$ is bounded in a certain neighborhood of $a$. This is easy to follow if we put $\epsilon = 1$ in the definition and note that there will be a $\delta > 0$ such that $0 < |x - a| < \delta$ implies that $|f(x) - L| < 1$ i.e. $L - 1 < f(x) < L + 1$ and clearly this shows that $f(x)$ is bounded in the interval $(a - \delta, a + \delta)$ (leaving aside the point $a$ itself, where $f$ may not even be defined).

Also note that when $x \to a$ we may have the case that $x$ takes values greater than $a$. This is denoted by $x \to a+$. If $x$ takes values less than $a$ then we write $x \to a-$. These give rise to right-hand and left-hand limits of $f(x)$ as $x \to a$ respectively. Thus we say that $\lim\limits_{x \to a+}f(x) = L$ if for any $\epsilon > 0$ it is possible to find a $\delta > 0$ such that $|f(x) - L| < \epsilon$ whenever $ 0 < x - a < \delta$. And we write $\lim\limits_{x \to a-}f(x) = L$ if for any $\epsilon > 0$ it is possible to find a $\delta > 0$ such that $|f(x) - L| < \epsilon$ whenever $ 0 < a - x < \delta$.

Looking at the definitions of limits given above we see that $\lim_{x \to a}f(x)$ exists if and only if both the right-hand limit $\lim_{x \to a+}f(x)$ and the left-hand limit $\lim_{x \to a-}f(x)$ exist and are equal. These concepts of left-hand and right-hand limits are useful when the definition of a function is different for values of $x < a$ and $x > a$.

Limit as $x \to \infty$ (or as $x \to -\infty$)

It is also important to discuss another limit to study the behavior of a function for large values of its argument. Then we define as follows:

Let $f$ be a function defined on an interval of type $(a, \infty)$ i.e. $f$ be defined for all values of $x$ satisfying $x > a$. Then a number $L$ is said to be the limit of $f(x)$ as $x \to \infty$ (and written symbolically as $\lim\limits_{x \to \infty}f(x) = L$) if for any given number $\epsilon > 0$ it is possible to find a corresponding number $N > 0$ such that $|f(x) - L| < \epsilon$ whenever $x > N$.

Let $f$ be a function defined on an interval of type $(-\infty, a)$ i.e. $f$ be defined for all values of $x$ satisfying $x < a$. Then a number $L$ is said to be the limit of $f(x)$ as $x \to -\infty$ (and written symbolically as $\lim\limits_{x \to -\infty}f(x) = L$) if for any given number $\epsilon > 0$ it is possible to find a corresponding number $N < 0$ such that $|f(x) - L| < \epsilon$ whenever $x < N$.

It must be clearly understood that the symbol $\infty$ used in above definitions has a meaning only in the context of these definitions and is not a number which can be operated upon via $+, -, \times, /$ (or many other operations applicable to numbers). Ignoring this fact is the source of all confusion prevailing in introductory calculus. In general the symbol $\infty$ is given a meaning via special definition such as above and its use is valid only within the context of any such definition.

We now establish the fundamental limits $$\lim_{x \to \infty} \frac{1}{x} = 0,\, \lim_{x \to -\infty}\frac{1}{x} = 0$$ Clearly in order that the first limit is $0$ we must be able to find a number $N > 0$ corresponding to any given number $\epsilon > 0$ such that $|1/x - 0| < \epsilon$ whenever $x > N$. Now we can see that if $x > N > 0$ then $|1/x - 0| = 1/x$ and this can be made smaller than $\epsilon$ if $x > 1/\epsilon$. Thus we can set $N = 1/\epsilon$ and the definition above allows us to say that $\lim_{x \to \infty} 1/x = 0$. In a similar manner we can handle $\lim_{x \to -\infty}1/x = 0$.
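The choice $N = 1/\epsilon$ made above can be tried out on a few sample points. This is only a sketch: checking finitely many values of $x$ is not a proof, and the function names and sample points are chosen here for illustration.

```python
def threshold_for_reciprocal(eps):
    """Return N = 1/eps, the threshold used in the text for 1/x -> 0."""
    return 1.0 / eps

def satisfies_goal(x, eps):
    """Check the goal |1/x - 0| < eps for one sample value x."""
    return abs(1.0 / x - 0.0) < eps

for eps in (0.5, 1e-2, 1e-6):
    N = threshold_for_reciprocal(eps)
    samples = [N * t for t in (1.001, 2.0, 10.0, 1e3)]  # a few points beyond N
    print(eps, N, all(satisfies_goal(x, eps) for x in samples))
```

For each $\epsilon$ the goal is met at every sampled $x > N$, while a point such as $x = N/2$ fails it, which shows that the threshold really matters.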

Non-existence of a Limit

It is important to understand that there may be scenarios where we are not able to find any number $L$ satisfying the definition of limit given earlier. In this case we say that the limit of a function $f(x)$ does not exist as $x \to a$ (or $x \to \infty$ or $x \to -\infty$ as the case may be). We also need to study various ways (via examples) in which a function may fail to have a limit. We have the following possibilities (these are given for the case $ x \to a$ but the reader may formulate the corresponding scenarios for $x \to a+, x \to a-, x \to \infty$ and $x \to -\infty$):

It may happen that for any given number $M > 0$ it is possible to find a number $\delta > 0$ such that $f(x) > M$ whenever $0 < |x - a|< \delta$. In this case we write $f(x) \to \infty$ as $x \to a$ (note that this is another use of symbol $\infty$).
It may happen that for any given number $M < 0$ it is possible to find a number $\delta > 0$ such that $f(x) < M$ whenever $0 < |x - a| < \delta$. In this case we write $f(x) \to -\infty$ as $x \to a$.
None of the above happens. In this case we say that the function $f(x)$ oscillates as $x \to a$. If $f(x)$ is bounded in neighborhood of $a$ then we say that $f(x)$ oscillates finitely otherwise we say that $f(x)$ oscillates infinitely.

We offer certain simple examples. It is not difficult to show that $1/x \to \infty$ as $x \to 0+$ and $1/x \to -\infty$ as $x \to 0-$. The examples for oscillating functions are a bit tricky to understand.

Let us observe the function $f(x) = \sin(1/x)$ in the neighborhood of $x = 0$. Let us first analyze the case $x \to 0+$. If we put $x = 2/\{(4n + 3)\pi\}$ then $f(x) = -1$ and if $x = 2/\{(4n + 1)\pi\}$ then $f(x) = 1$ (here $n$ is any positive integer). Hence we can see further that given any $\delta > 0$ there exist some values of $x$ with $0 < x < \delta$ for which $f(x) = 1$ and some other values of $x$ say $x'$ again satisfying $0 < x' < \delta$ for which $f(x') = -1$. Thus we don't have any number $L$ which can satisfy the limit condition of $|f(x) - L| < \epsilon$ for all $x$ with $0 < x < \delta$ (take $\epsilon = 1$ and note that no number $L$ lies within distance $1$ of both $1$ and $-1$). Same is the case when $x \to 0-$. Since $\sin(1/x)$ is bounded the function oscillates finitely.

The same example can be modified to give a function which oscillates infinitely as $x \to 0$. Clearly we see that $1/x \to \infty$ as $x \to 0+$ hence $f(x) = (1/x)\sin(1/x)$ is unbounded and we can show (using arguments similar to those given in last paragraph) that $f(x)$ oscillates infinitely as $x \to 0$.

It is instructive to study a similar function $f(x) = x \sin(1/x)$. This function tends to limit $L = 0$ as $x \to 0$. This is because $|f(x) - L| = |x\sin(1/x)| = |x||\sin(1/x)| \leq |x|$ and hence the expression $|f(x) - L|$ can be made less than $\epsilon$ by making $|x| < \epsilon$ and thus we can choose $\delta = \epsilon$ and satisfy the definition of limit.
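The behavior of $\sin(1/x)$ and $x\sin(1/x)$ near $0$ can be observed numerically. The sketch below evaluates $\sin(1/x)$ at the points $x = 2/\{(4n + 1)\pi\}$ and $x = 2/\{(4n + 3)\pi\}$ used above and also checks the bound $|x\sin(1/x)| \leq |x|$; the function names are chosen here for illustration.

```python
import math

def peak_point(n):
    """x = 2/((4n + 1)*pi), where sin(1/x) = 1."""
    return 2.0 / ((4 * n + 1) * math.pi)

def trough_point(n):
    """x = 2/((4n + 3)*pi), where sin(1/x) = -1."""
    return 2.0 / ((4 * n + 3) * math.pi)

def damped(x):
    """x * sin(1/x), which is squeezed between -|x| and |x|."""
    return x * math.sin(1.0 / x)

for n in (1, 10, 1000):
    xp, xt = peak_point(n), trough_point(n)
    # Both points shrink towards 0 while sin(1/x) keeps hitting +1 and -1.
    print(n, xp, math.sin(1.0 / xp), xt, math.sin(1.0 / xt))
    print(n, damped(xp), damped(xt))
```

As $n$ grows the sample points move towards $0$, the values of $\sin(1/x)$ keep jumping between $1$ and $-1$, and the values of $x\sin(1/x)$ shrink to $0$.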

We don't offer a plethora of examples here as it would unnecessarily increase the length of the post. Whatever basic concepts are required have been presented and we have provided very simple examples in which a beginner won't face any difficulty. Before concluding this post we summarize the important results and examples we have developed in this post:

For $\lim_{x \to a}f(x)$ to exist it is absolutely essential that

$f(x)$ is defined in a certain neighborhood of $a$ (except possibly at $a$)
$\lim_{x \to a-}f(x) = \lim_{x \to a+}f(x)$

If $\lim_{x \to a}f(x)$ exists then $f(x)$ is bounded in a certain neighborhood of $a$
$\displaystyle \lim_{x \to a}x = a$
$\displaystyle \lim_{x \to \infty}\frac{1}{x} = 0,\, \lim_{x \to -\infty}\frac{1}{x} = 0$
$1/x \to \infty$ as $x \to 0+$ and $1/x \to -\infty$ as $x \to 0-$
Function $f(x) = \sin(1/x)$ oscillates finitely as $x \to 0$
Function $f(x) = (1/x)\sin(1/x)$ oscillates infinitely as $x \to 0$.
$\lim_{x \to 0}x\sin(1/x) = 0$

In the next post we will study certain limit formulas and their applications.


Cavalieri's Principle and its Applications

2013-10-26T13:01:00.000+05:30

Introduction

In this post we will discuss something which is very elementary and fascinating, yet not available in a high school curriculum. More precisely we will study a part of solid geometry related to calculation of volume of solids. In so doing we will need the famous Cavalieri's Principle which relates volumes of two solids under certain conditions.

Cavalieri's Principle

Cavalieri's Principle states that:
If two solids lie between two parallel planes and any plane parallel to these planes intersects both the solids in cross sections of equal area then the two solids have the same volume.

Intuitively the principle is easy to understand once one recognizes that the volume of a solid can be found by slicing the solid into multiple parts through a set of parallel planes and then adding the volume of each small part. The idea is that if parts are sufficiently thin (distance between the parallel planes is very small) then the volume of each part can be approximated by "area of cross section of each part" times "height of each part". The approximation becomes better and better as the parts become thinner and thinner. The formal justification of this principle lies in Integral Calculus but we won't go into that and assume that reader is content with the intuitive idea of cutting solids into thin parts.

As a first application we will establish the formulas for volume of a pyramid and a cone. The basic idea is to first establish the formula for the volume of a prism. A prism is a simple solid consisting of two congruent triangular bases parallel to each other, and the remaining three surfaces are obtained as planes passing through corresponding sides of the triangular bases. A prism $ABCA'B'C'$ is shown in the figure below.

The volume of a prism is easy to find. We just need to construct a square of area equal to the triangular base of the prism and develop a right cuboid on top of it with height same as that of the prism. Clearly the cuboid and the prism lie between same parallel planes (planes passing through their bases) and each plane parallel to the base cuts cross sections of equal areas from the prism and cuboid. Therefore by Cavalieri's principle the volume of the prism and the cuboid are equal. Thus the volume of a prism is given as $$\displaystyle V_{\text{prism}} = A_{\text{prism}} \times h_{\text{prism}}$$ where $V, A, h$ denote the volume, area of base and height of the prism respectively. Next we derive the formula for the volume of a pyramid.

Volume of a Pyramid and a Cone

We first start with the simplest case where the base of pyramid is a triangle. Such a solid is more properly called a tetrahedron which consists of four vertices, six edges and four triangular faces. We first establish the following property of a tetrahedron.

Two tetrahedrons with same base area and same height are of equal volume.

In the figure below we have two tetrahedrons $ABCD$ and $PQRS$ such that base triangles $ABC$ and $PQR$ have same area. Also the height of both tetrahedrons is same i.e. the distance of point $D$ from plane of $ABC$ is same as that of point $S$ from plane of $PQR$.

We have to notice that if a plane parallel to the base $ABC$ cuts the tetrahedron $ABCD$ in a triangular cross section $A'B'C'$ then the triangle $A'B'C'$ is similar to triangle $ABC$. Also the ratio of the sides of $A'B'C'$ to those of $ABC$ depends upon the distance of this cutting plane from the plane containing base $ABC$. This implies that the area of triangle $A'B'C'$ bears a constant ratio to that of triangle $ABC$ which depends only on the distance between planes containing $A'B'C'$ and $ABC$. Similar is the case with areas of triangles $P'Q'R'$ and $PQR$ coming from tetrahedron $PQRS$.

Since the height of both the tetrahedrons is same both of them can be brought to lie between two parallel planes (one plane containing their bases $ABC$ and $PQR$ and the other containing vertices $D$ and $S$). If an intermediate plane parallel to the base plane cuts these tetrahedrons in triangular cross sections $A'B'C'$ and $P'Q'R'$ then we have $$\frac{\text{area }A'B'C'}{\text{area }ABC} = \frac{\text{area }P'Q'R'}{\text{area }PQR}$$ and this constant ratio is only dependent on the distance of the cutting plane from the plane containing the bases $ABC$ and $PQR$. Since the areas of bases $ABC$ and $PQR$ are equal it follows that the areas of triangles $A'B'C'$ and $P'Q'R'$ are equal. It now follows from Cavalieri's principle that both the tetrahedrons $ABCD$ and $PQRS$ have the same volume.

We now establish the following formula for the volume of a tetrahedron $$V_{\text{tetra}} = \frac{1}{3}\times A_{\text{tetra}}\times h_{\text{tetra}}$$ where $A$ represents the area of a base of tetrahedron and $h$ represents the height of the tetrahedron relative to the chosen base. If we compare this formula with the formula for volume of a prism, it is obvious that the volume of a tetrahedron is one-third that of a prism with the same base area and same height. This suggests that we should be able to cut a solid prism into three tetrahedrons of equal volume (similar to the way we can cut a parallelogram into two triangles of equal area through the diagonal which leads to formula of area of triangle as half of product of base and height). It turns out that this is possible and simple enough but may not be obvious (because of the difficulty of imagining in 3-D).

In the above figure we have a prism $ABCA'B'C'$ and we show how to cut it into three tetrahedrons of equal volume. The first cut is simple: it is made by the plane passing through the two blue diagonals $A'B$ and $C'B$. The result of this cut is a tetrahedron $A'B'C'B$ and another figure (which is a pyramid $A'C'CAB$ with base $A'C'CA$ and vertex $B$; this is not shown in the figure above). Next we cut the remaining pyramid $A'C'CAB$ by the plane passing through the red diagonal $A'C$ and the vertex $B$ (that is, the plane of triangle $A'BC$). This cut results in two tetrahedrons $ABCA'$ and $A'C'CB$. Thus we have cut the original prism $ABCA'B'C'$ into three tetrahedrons $A'B'C'B$, $ABCA'$ and $A'C'CB$.

What remains to be shown is that the three tetrahedrons obtained by this process are of equal volume. First we can easily see that the tetrahedrons $A'B'C'B$ and $ABCA'$ have bases $A'B'C'$ and $ABC$ of same area and clearly their height is same as it is the distance between the planes containing $ABC$ and $A'B'C'$ in the original prism. Hence both these tetrahedrons have the same volume. Next we can see that the tetrahedrons $ABCA'$ and $A'C'CB$ have bases $A'CA$ and $A'C'C$ which are of equal area as these are obtained by cutting the parallelogram $A'C'CA$ through the diagonal $A'C$. Also the height of these tetrahedrons is same as it is the distance of vertex $B$ from the plane containing $A'C'CA$ in the original prism. It follows that the tetrahedrons $ABCA'$ and $A'C'CB$ have the same volume.

It now follows that the volume of the tetrahedron $A'B'C'B$ is one-third that of the prism $ABCA'B'C'$. Since both of them have bases (namely $A'B'C'$) of same area and same height, it follows that the volume of a tetrahedron is one-third the product of area of its base and its height relative to the chosen base.
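The dissection above can also be checked with coordinates. The following is only a sanity check under stated assumptions: the oblique prism (base $ABC$ and translation vector `t`) is a hypothetical example, and the volumes are computed with the determinant formula for a tetrahedron rather than with Cavalieri's principle.

```python
from fractions import Fraction

def sub(p, q):
    """Component-wise difference p - q of two points in 3-D."""
    return tuple(a - b for a, b in zip(p, q))

def det3(u, v, w):
    """Determinant of the 3x3 matrix with rows u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def tetra_volume(p, q, r, s):
    """Volume of tetrahedron pqrs, namely |det(q - p, r - p, s - p)| / 6."""
    return Fraction(abs(det3(sub(q, p), sub(r, p), sub(s, p))), 6)

# Hypothetical prism: base ABC in the plane z = 0, top face A'B'C' = base + t.
A, B, C = (0, 0, 0), (1, 0, 0), (0, 1, 0)
t = (1, 2, 3)
A1, B1, C1 = (tuple(a + b for a, b in zip(P, t)) for P in (A, B, C))

def dissection_volumes():
    """Volumes of the three pieces A'B'C'B, ABCA' and A'C'CB."""
    return [tetra_volume(A1, B1, C1, B),
            tetra_volume(A, B, C, A1),
            tetra_volume(A1, C1, C, B)]

def prism_volume():
    """Half the volume of the parallelepiped spanned by AB, AC and t."""
    return Fraction(abs(det3(sub(B, A), sub(C, A), t)), 2)

if __name__ == "__main__":
    print(dissection_volumes(), prism_volume())
```

For this particular prism each of the three tetrahedrons has volume $1/2$ and the prism has volume $3/2$, in agreement with the argument above.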

For a general pyramid we can cut its polygonal base into a number of triangles via diagonals, and joining each of these triangles to the vertex of the pyramid we get a number of tetrahedrons of the same height. The volume of the pyramid is now the sum of the volumes of these tetrahedrons. Since each of these tetrahedrons is of the same height, while adding the volumes we effectively add the areas of their bases and multiply by one-third of the common height. Thus we obtain the following formula $$V_{\text{pyr}} = \frac{1}{3}\times A_{\text{pyr}}\times h_{\text{pyr}}$$ where $A$ represents the area of the base of the pyramid and $h$ represents the height of the pyramid.
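The summation over tetrahedrons can be sketched in a few lines. This is a minimal illustration, assuming a convex base (so that the fan of diagonals from one vertex stays inside the polygon); the pentagon `BASE` and the value `HEIGHT` are hypothetical.

```python
from fractions import Fraction

def triangle_area(p, q, r):
    """Area of the plane triangle pqr (vertices given as (x, y) pairs)."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return Fraction(abs(cross), 2)

def polygon_area(base):
    """Shoelace formula for a simple polygon with vertices listed in order."""
    n = len(base)
    twice = sum(base[i][0] * base[(i + 1) % n][1]
                - base[(i + 1) % n][0] * base[i][1] for i in range(n))
    return Fraction(abs(twice), 2)

def pyramid_volume_by_tetrahedra(base, height):
    """Add (1/3) * area * height over the triangles base[0], base[i], base[i+1]."""
    return sum((Fraction(1, 3) * triangle_area(base[0], base[i], base[i + 1]) * height
                for i in range(1, len(base) - 1)), Fraction(0))

# Hypothetical convex pentagonal base and height, chosen only for illustration.
BASE = [(0, 0), (4, 0), (5, 3), (2, 5), (-1, 3)]
HEIGHT = 6

if __name__ == "__main__":
    print(pyramid_volume_by_tetrahedra(BASE, HEIGHT),
          Fraction(1, 3) * polygon_area(BASE) * HEIGHT)
```

Both expressions give the same value because the triangle areas add up to the area of the base.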

In the case of the cone we can think of the cone as the limiting case of a pyramid whose polygonal base tends to the curved base of the cone as the number of sides of the polygonal base approaches infinity (imagine $n$ points on the periphery of the curved base of the cone and join them to make a polygonal base and join this base with the vertex of the cone to generate a pyramid). Therefore the same formula applies in the case of a cone too. Thus we obtain $$V_{\text{cone}} = \frac{1}{3}\times A_{\text{cone}}\times h_{\text{cone}}$$ In the case of a right circular cone of radius $r$ and height $h$ we know that $A_{\text{cone}} = \pi r^{2}$ and we obtain the familiar formula $$V_{\text{right circular cone}} = \frac{1}{3}\pi r^{2}h$$ When I studied this formula in $8$th grade in my school days I found the factor $1/3$ totally mysterious and no explanation was given apart from the experimental verification of pouring liquid from a conical flask into a corresponding cylindrical flask. Later when I studied calculus and "solids of revolution" I got the formula by a simple integration. But somehow the mystery of $1/3$ remained in my thoughts (compare the case of a triangle where the factor $1/2$ in the formula for the area of a triangle can be explained both by integral calculus as well as from elementary geometry by dissecting a parallelogram through a diagonal into two equal triangles).

Sometime later (but luckily in the pre-college years of $12$th grade) I got hold of a book on solid geometry which taught solid geometry via the old and golden approach of Euclid's axioms (compared to the usual and boring coordinate geometry). In this book I found the first mention of Cavalieri's principle and the proofs which I have presented above. It's rather unfortunate that I can't recall the name of that book for the benefit of readers.

Volume of a Sphere

Next we will use Cavalieri's principle to establish the following formula for the volume of a sphere $$V_{\text{sphere}} = \frac{4}{3}\pi r^{3}$$ where $r$ is the radius of the sphere. We start with a non-obvious technique. Let us take a right circular cylinder of radius $r$ and height $2r$ so that its volume is $V_{\text{cyl}} = \pi r^{2}(2r) = 2\pi r^{3}$ and remove from it two cones having the same base as the cylinder and height $r$. The volume of each such cone is $V_{\text{cone}} = (1/3)\pi r^{2} r = \pi r^{3}/3$. The volume of the remaining solid is therefore $2\pi r^{3} - 2(\pi r^{3}/3) = (4/3)\pi r^{3}$.

We will show that the volume of a sphere of radius $r$ is equal to the volume of a solid obtained by removing two right circular cones of radius $r$ and height $r$ from a right circular cylinder of radius $r$ and height $2r$. This we do via Cavalieri's principle.

In the above figure a cylinder of radius $r$ and height $2r$ and a sphere of radius $r$ are placed side by side. Also two cones of radius $r$ and height $r$ are removed from the cylinder (one from above and one from below). Clearly both solids are of same height $2r$ and hence are contained between two parallel planes (each passing through the circular base of the cylinder). The center of the cylinder and that of the sphere are at the same height. Let a plane parallel to the base of the cylinder cut both the solids and let this plane be above the center of cylinder at a height $x$. The argument for the case when cutting plane is below the center is same due to symmetry.

Clearly in the case of the sphere the cross section is a circle of radius $y$ with $y^{2} = r^{2} - x^{2}$, so its area is $\pi y^{2} = \pi(r^{2} - x^{2})$. In the case of the solid derived from the cylinder by removing two cones, the cross section is not a circle but a ring (as the central circular portion is removed during removal of the cone). The external radius of the ring is $r$ and the internal radius is $x$ (each cone has radius $r$ and height $r$ with its vertex at the center of the cylinder, so at distance $x$ from the vertex its radius is $x$), so that the total area of the ring shaped cross section is $\pi r^{2} - \pi x^{2} = \pi(r^{2} - x^{2})$. Thus the areas of the cross sections of the two solids are the same. By Cavalieri's principle the two solids have the same volume.
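The equality of cross sections can be illustrated numerically. The sketch below is not part of the proof: it approximates the volume by adding up thin slices (a midpoint-rule sum, which is my choice of approximation), and the function names are hypothetical.

```python
import math

def sphere_section(r, x):
    """Area of the circular cross section of the sphere at height x from the center."""
    return math.pi * (r * r - x * x)

def ring_section(r, x):
    """Area of the ring cut from the cylinder-minus-two-cones solid at height x."""
    return math.pi * r * r - math.pi * x * x

def volume_by_slices(section, r, slices=10000):
    """Midpoint-rule sum of cross-sectional areas for heights x in [-r, r]."""
    dx = 2 * r / slices
    return sum(section(r, -r + (i + 0.5) * dx) for i in range(slices)) * dx

if __name__ == "__main__":
    r = 2.0
    print(volume_by_slices(sphere_section, r),
          volume_by_slices(ring_section, r),
          4 / 3 * math.pi * r ** 3)
```

The three printed numbers should agree to several decimal places, the first two because the slice areas coincide at every height.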

The above proof is straight from the $8$th grade NCERT mathematics textbook (obviously the old edition which I studied in $1993$) except that the NCERT textbook does not mention the name Cavalieri's principle, nor does it provide an intuitive argument as to why two such solids having the same cross-sectional areas and the same height should have equal volume. But I think this is the best a non-extraordinary student of $8$th grade can handle at that tender age. This is one of the reasons why I hold NCERT textbooks in high esteem.


Irrationality of ζ(2) and ζ(3): Part 2

2013-10-10T16:11:00.000+05:30

In the last post we proved that $\zeta(2)$ is irrational. Now we shall prove in a similar manner that $\zeta(3)$ is irrational. Note that this proof is based on Beukers' paper "A Note on the Irrationality of $\zeta(2)$ and $\zeta(3)$."

Irrationality of $\zeta(3)$

Like the case of $\zeta(2)$ we first establish certain formulas concerning some double integrals which are related to $\zeta(3)$. The derivation of these formulas is based on the integral formulas established in last post.

Preliminary Results

Let $r, s$ be non-negative integers with $r > s$. Then we have
\begin{align} \int_{0}^{1}\int_{0}^{1}\frac{-\log xy}{1 - xy}x^{r}y^{r}\,dx\,dy &= 2\sum_{n = 1}^{\infty}\frac{1}{(n + r)^{3}}\tag{1}\\ \int_{0}^{1}\int_{0}^{1}\frac{-\log xy}{1 - xy}x^{r}y^{s}\,dx\,dy &= \frac{1}{r - s}\left\{\frac{1}{(s + 1)^{2}} + \frac{1}{(s + 2)^{2}} + \cdots + \frac{1}{r^{2}}\right\}\tag{2} \end{align} Using equation $(2)$ from the last post we get $$\int_{0}^{1}\int_{0}^{1}\frac{x^{r + \alpha}y^{r + \alpha}}{1 - xy}\,dx\,dy = \sum_{n = 1}^{\infty}\frac{1}{(n + r + \alpha)^{2}}$$ Differentiating the above relation with respect to $\alpha$ we get $$ \int_{0}^{1}\int_{0}^{1}\frac{x^{r + \alpha}y^{r + \alpha}\log xy}{1 - xy}\,dx\,dy = \sum_{n = 1}^{\infty}\frac{-2}{(n + r + \alpha)^{3}}$$
Now putting $\alpha = 0$ the first result is established. This means that $$\int_{0}^{1}\int_{0}^{1}\frac{-\log xy}{1 - xy}\,dx\,dy = 2\zeta(3)$$ and if $r$ is a positive integer then $$\int_{0}^{1}\int_{0}^{1}\frac{-\log xy}{1 - xy}x^{r}y^{r}\,dx\,dy = 2\left\{\zeta(3) - \left(\frac{1}{1^{3}} + \frac{1}{2^{3}} + \cdots + \frac{1}{r^{3}}\right)\right\}$$
Next from equation $(3)$ of last post we have $$\int_{0}^{1}\int_{0}^{1}\frac{x^{r + \alpha}y^{s + \alpha}}{1 - xy}\,dx\,dy = \frac{1}{r - s}\left\{\frac{1}{s + \alpha + 1} + \frac{1}{s + \alpha + 2} + \cdots + \frac{1}{r + \alpha}\right\}$$ Differentiating the above relation with respect to $\alpha$ we get \begin{align} &\int_{0}^{1}\int_{0}^{1}\frac{x^{r + \alpha}y^{s + \alpha}\log xy}{1 - xy}\,dx\,dy\notag\\ &\,\,\,\,\,\,\,\,= \frac{1}{r - s}\left\{\frac{-1}{(s + \alpha + 1)^{2}} + \frac{-1}{(s + \alpha + 2)^{2}} + \cdots + \frac{-1}{(r + \alpha)^{2}}\right\}\notag \end{align} Putting $\alpha = 0$ in the above equation we obtain equation $(2)$.
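Formulas $(1)$ and $(2)$ can be sanity-checked numerically. The sketch below uses a route different from the one in the text: it expands $1/(1 - xy)$ as $\sum_{k \geq 0}(xy)^{k}$ and integrates term by term, using $\int_{0}^{1}\int_{0}^{1}(-\log xy)\,x^{a}y^{b}\,dx\,dy = \frac{1}{(a + 1)^{2}(b + 1)} + \frac{1}{(a + 1)(b + 1)^{2}}$. The function names and the truncation at `terms` are my assumptions.

```python
ZETA3 = 1.2020569031595942  # numerical value of zeta(3)

def monomial_integral(a, b):
    """Double integral of -log(xy) * x^a * y^b over the unit square."""
    return 1.0 / ((a + 1) ** 2 * (b + 1)) + 1.0 / ((a + 1) * (b + 1) ** 2)

def series_value(r, s, terms=20000):
    """Partial sum from expanding 1/(1 - xy) as a geometric series in xy."""
    return sum(monomial_integral(r + k, s + k) for k in range(terms))

def formula_1(r):
    """Right-hand side of (1), written as 2 * (zeta(3) - (1/1^3 + ... + 1/r^3))."""
    return 2 * (ZETA3 - sum(1.0 / k ** 3 for k in range(1, r + 1)))

def formula_2(r, s):
    """Right-hand side of (2), valid for r > s."""
    return sum(1.0 / (j * j) for j in range(s + 1, r + 1)) / (r - s)

if __name__ == "__main__":
    print(series_value(2, 2), formula_1(2))
    print(series_value(3, 1), formula_2(3, 1))
```

The truncated series differs from the exact value by roughly $1/\text{terms}^{2}$, so agreement to about eight decimal places is what one should expect here.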

From the above results it is now clear that if $P(x), Q(x)$ are polynomials of degree $n$ with integer coefficients then $$\int_{0}^{1}\int_{0}^{1}\frac{-\log xy}{1 - xy}P(x)Q(y)\,dx\,dy = \frac{a\zeta(3) + b}{d_{n}^{3}}$$ where $a, b$ are some integers dependent on polynomials $P(x), Q(x)$ and $d_{n}$ denotes the LCM of numbers $1, 2, \ldots, n$ (also for completeness we can assume $d_{0} = 1$).

Strategy of the Proof

Now we choose a specific polynomial $P_{n}(x)$ defined by $$P_{n}(x) = \frac{1}{n!}\frac{d^{n}}{dx^{n}}\{x^{n}(1 - x)^{n}\}$$ Since $P_{n}(x)$ is a polynomial of degree $n$ with integer coefficients it follows that the integral defined by $$I_{n} = \int_{0}^{1}\int_{0}^{1}\frac{-\log xy}{1 - xy}P_{n}(x)P_{n}(y)\,dx\,dy$$ can be expressed in the form $$I_{n} = \frac{a_{n}\zeta(3) + b_{n}}{d_{n}^{3}}$$ where $a_{n}, b_{n}$ are integers dependent on $n$.
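The integers $a_{n}, b_{n}$ can be computed exactly for small $n$. The sketch below expands $P_{n}(x) = \sum_{k = 0}^{n}(-1)^{k}\binom{n}{k}\binom{n + k}{k}x^{k}$ (obtained by differentiating $x^{n}(1 - x)^{n}$ term by term) and applies formulas $(1)$ and $(2)$ to each pair of monomials. The helper names are hypothetical and the code is meant only as an illustration of the bookkeeping.

```python
from fractions import Fraction
from math import comb, gcd

def legendre_coeffs(n):
    """Coefficients c_0..c_n of P_n(x) = (1/n!) d^n/dx^n [x^n (1 - x)^n]."""
    return [(-1) ** k * comb(n, k) * comb(n + k, k) for k in range(n + 1)]

def lcm_upto(n):
    """d_n = lcm(1, 2, ..., n), with d_0 = 1."""
    d = 1
    for k in range(2, n + 1):
        d = d * k // gcd(d, k)
    return d

def basic_integral(r, s):
    """(u, v) such that the integral with x^r y^s equals u * zeta(3) + v."""
    if r == s:  # formula (1)
        return Fraction(2), -2 * sum((Fraction(1, k ** 3) for k in range(1, r + 1)), Fraction(0))
    lo, hi = min(r, s), max(r, s)  # formula (2), symmetric in r and s
    return Fraction(0), sum((Fraction(1, j * j) for j in range(lo + 1, hi + 1)), Fraction(0)) / (hi - lo)

def a_b(n):
    """Integers (a_n, b_n) with d_n^3 * I_n = a_n * zeta(3) + b_n."""
    c = legendre_coeffs(n)
    u_total = v_total = Fraction(0)
    for r in range(n + 1):
        for s in range(n + 1):
            u, v = basic_integral(r, s)
            u_total += c[r] * c[s] * u
            v_total += c[r] * c[s] * v
    d3 = lcm_upto(n) ** 3
    a, b = u_total * d3, v_total * d3
    assert a.denominator == 1 and b.denominator == 1  # integrality claim
    return int(a), int(b)

if __name__ == "__main__":
    for n in range(1, 5):
        print(n, lcm_upto(n), a_b(n))
```

For $n = 1$ we have $P_{1}(x) = 1 - 2x$, $d_{1} = 1$ and the code gives $(a_{1}, b_{1}) = (10, -12)$, so that $I_{1} = 10\zeta(3) - 12 \approx 0.0206$.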

We will establish that

$I_{n} \neq 0$ for all positive integers $n$.
$d_{n}^{3}I_{n} \to 0$ as $n \to \infty$.

This will imply that the expression $a_{n}\zeta(3) + b_{n} = d_{n}^{3}I_{n}$ tends to $0$ as $n \to \infty$ and is never zero for any value of $n$. If $\zeta(3)$ were rational, say $p/q$ with $q > 0$, then $a_{n}\zeta(3) + b_{n} = (a_{n}p + b_{n}q)/q$ would be a non-zero integer divided by $q$, so we would have $|a_{n}\zeta(3) + b_{n}|\geq 1/q$ and hence $a_{n}\zeta(3) + b_{n}$ would not tend to zero. This contradiction proves that $\zeta(3)$ is irrational.

Estimation of $I_{n}$

Now we come to the proof of the two claims mentioned above which are vital to obtain a contradiction needed to prove the irrationality of $\zeta(3)$.

First we need to observe that \begin{align} \int_{0}^{1}\frac{dz}{1 - az} &= \left[\frac{-1}{a}\log(1 - az)\right]_{z = 0}^{z = 1}\notag\\ &= -\frac{\log(1 - a)}{a}\notag \end{align} hence on putting $a = 1 - xy$ we get $$\frac{-\log xy}{1 - xy} = \int_{0}^{1}\frac{dz}{1 - (1 - xy)z}$$ Using the above equation we can write the integral $I_{n}$ as a triple integral $$I_{n} = \int_{0}^{1}\int_{0}^{1}\int_{0}^{1}\frac{P_{n}(x)P_{n}(y)}{1 - (1 - xy)z}\,dx\,dy\,dz\tag{3}$$ Using integration by parts $n$ times with respect to $x$ we get $$I_{n} = \int_{0}^{1}\int_{0}^{1}\int_{0}^{1}\frac{(xyz)^{n}(1 - x)^{n}P_{n}(y)}{\{1 - (1 - xy)z\}^{n + 1}}\,dx\,dy\,dz$$ Following Beukers, we apply the substitution $$w = \frac{1 - z}{1 - (1 - xy)z}$$ so that $$z = \frac{1 - w}{1 - (1 - xy)w},\, 1 - z = \frac{xyw}{1 - (1 - xy)w}$$ Hence $$dz = \frac{-xy}{\{1 - (1 - xy)w\}^{2}}dw$$ and \begin{align} \frac{z^{n}}{\{1 - (1 - xy)z\}^{n + 1}} &= \frac{(1 - w)^{n}}{\{1 - (1 - xy)w\}^{n}}\frac{w^{n + 1}}{(1 - z)^{n + 1}}\notag\\ &= \frac{(1 - w)^{n}(1 - (1 - xy)w)}{(xy)^{n + 1}}\notag \end{align} Also note that as $z$ moves from $0$ to $1$, $w$ moves from $1$ to $0$.
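The identities used in this step are easy to get wrong by a sign or an exponent, so a quick numerical check may be reassuring. The sketch below tests the integral representation of $-\log xy/(1 - xy)$ with a midpoint-rule sum and the substitution formulas at sample points; the function names and sample values are my own choices.

```python
import math

def log_kernel(x, y):
    """The factor -log(xy) / (1 - xy)."""
    return -math.log(x * y) / (1 - x * y)

def kernel_as_integral(x, y, steps=20000):
    """Midpoint-rule value of the integral of dz / (1 - (1 - xy) z) over [0, 1]."""
    c = 1 - x * y
    h = 1.0 / steps
    return sum(h / (1 - c * (i + 0.5) * h) for i in range(steps))

def w_of_z(x, y, z):
    """The substitution w = (1 - z) / (1 - (1 - xy) z)."""
    return (1 - z) / (1 - (1 - x * y) * z)

def z_of_w(x, y, w):
    """The inverse map z = (1 - w) / (1 - (1 - xy) w)."""
    return (1 - w) / (1 - (1 - x * y) * w)

def lhs(x, y, z, n):
    """z^n / {1 - (1 - xy) z}^(n + 1), in terms of z."""
    return z ** n / (1 - (1 - x * y) * z) ** (n + 1)

def rhs(x, y, w, n):
    """(1 - w)^n (1 - (1 - xy) w) / (xy)^(n + 1), in terms of w."""
    return (1 - w) ** n * (1 - (1 - x * y) * w) / (x * y) ** (n + 1)

if __name__ == "__main__":
    x, y, z, n = 0.3, 0.8, 0.4, 3
    w = w_of_z(x, y, z)
    print(log_kernel(0.5, 0.6), kernel_as_integral(0.5, 0.6))
    print(z, z_of_w(x, y, w))
    print(lhs(x, y, z, n), rhs(x, y, w, n))
```

Each printed pair should agree up to rounding (and up to the discretization error of the midpoint rule in the first pair).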

After substituting these expressions we get $$I_{n} = \int_{0}^{1}\int_{0}^{1}\int_{0}^{1}\frac{(1 - x)^{n}(1 - w)^{n}P_{n}(y)}{1 - (1 - xy)w}\,dx\,dy\,dw$$ Using integration by parts $n$ times with respect to $y$ we get $$I_{n} = \int_{0}^{1}\int_{0}^{1}\int_{0}^{1}\frac{x^{n}(1 - x)^{n}y^{n}(1 - y)^{n}w^{n}(1 - w)^{n}}{\{1 - (1 - xy)w\}^{n + 1}}\,dx\,dy\,dw\tag{4}$$ and from this expression it is clear that $I_{n} > 0$ as the integrand is positive for all $x, y, w \in (0, 1)$.

We next need to find an estimate for the function $f(x, y, w)$ defined by $$f(x, y, w) = \frac{x(1 - x)y(1 - y)w(1 - w)}{1 - (1 - xy)w}$$ for $x, y, w \in (0, 1)$. Finding the maximum value using first and second partial derivatives seems a bit complicated hence it is better to go for a simpler approach based on inequalities.

The denominator of $f(x, y, w)$ is $1 - w + xyw$ and clearly we have $$1 - w + xyw \geq 2\sqrt{(1 - w)xyw}$$ and hence we have $$f(x, y, w) \leq \frac{1}{2}\sqrt{x}(1 - x)\sqrt{y}(1 - y)\sqrt{w(1 - w)}$$ If we put $x = t^{2}$ then $\sqrt{x}(1 - x) = t(1 - t^{2}) = t - t^{3}$ which is maximum when $t = 1/\sqrt{3}$ and the maximum value is $2/(3\sqrt{3})$. Similar is the case for $\sqrt{y}(1 - y)$. The maximum value of $\sqrt{w(1 - w)}$ is clearly $(w + 1 - w)/2 = 1/2$. Hence we have $$f(x, y, w) \leq \frac{1}{2}\frac{2}{3\sqrt{3}}\frac{2}{3\sqrt{3}}\frac{1}{2} = \frac{1}{27}$$ Therefore by equation $(4)$ we get \begin{align} I_{n} &= \int_{0}^{1}\int_{0}^{1}\int_{0}^{1}\frac{\{f(x, y, w)\}^{n}}{1 - (1 - xy)w}\,dx\,dy\,dw\notag\\ &\leq \left(\frac{1}{27}\right)^{n} \int_{0}^{1}\int_{0}^{1}\int_{0}^{1}\frac{1}{1 - (1 - xy)w}\,dx\,dy\,dw\notag\\ &= \left(\frac{1}{27}\right)^{n} \int_{0}^{1}\int_{0}^{1}\frac{-\log xy}{1 - xy}\,dx\,dy\notag\\ &= 2\zeta(3)\left(\frac{1}{27}\right)^{n}\notag \end{align} From the last post we know that if $K > e$ is a fixed number then $d_{n} < K^{n}$ for all sufficiently large values of $n$. Hence it follows that $$d_{n}^{3}I_{n} < 2\zeta(3)\left(\frac{K^{3}}{27}\right)^{n}\tag{5}$$ for all sufficiently large values of $n$. If we choose $K$ such that $e < K < 3$ then we can see that the right hand side of equation $(5)$ above tends to zero as $n \to \infty$. Therefore $d_{n}^{3}I_{n} \to 0$ as $n \to \infty$. We have thus completed the proof of irrationality of $\zeta(3)$.
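The two ingredients of the final estimate can be explored numerically. The sketch below samples $f$ on a grid to see that it stays below $1/27$, and computes $\log(d_{n})/n$ for a range of $n$ to compare it with $\log 3$. The grid size, the range of $n$ and the sample value $K = 2.9$ are illustrative assumptions, and a finite check of this kind does not replace the estimate $d_{n} < K^{n}$ quoted from the last post.

```python
import math
from math import gcd

def f(x, y, w):
    """f(x, y, w) = x(1 - x) y(1 - y) w(1 - w) / (1 - (1 - xy) w)."""
    return x * (1 - x) * y * (1 - y) * w * (1 - w) / (1 - (1 - x * y) * w)

def sampled_max(steps=40):
    """Largest value of f on a grid of midpoints in the open unit cube."""
    pts = [(i + 0.5) / steps for i in range(steps)]
    return max(f(x, y, w) for x in pts for y in pts for w in pts)

def lcm_upto(n):
    """d_n = lcm(1, 2, ..., n)."""
    d = 1
    for k in range(2, n + 1):
        d = d * k // gcd(d, k)
    return d

def log_d_over_n(n):
    """log(d_n) / n, to be compared with log K for e < K < 3."""
    return math.log(lcm_upto(n)) / n

if __name__ == "__main__":
    print(sampled_max(), 1 / 27)
    for n in (10, 50, 100, 200):
        print(n, log_d_over_n(n), math.log(2.9), math.log(3))
```

With $K = 2.9$ the ratio in $(5)$ is $K^{3}/27 \approx 0.903 < 1$, which is all that the argument needs.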
