The Mysterious Rank (of a Matrix): Elementary Row Operations

In the previous post we formed the notions of matrices, determinants, the rank of a matrix, and systems of linear equations. We also mentioned the fundamental theorem on solutions of systems of linear equations, which tells us the importance of rank in deciding the nature of the solutions of such systems. It is now time to establish the theorem, and to do that we start by defining a systematic method of solving such systems of linear equations.

Elementary Row Operations on a Matrix

Let's start with the simple system of equations: \begin{align} x + y &= 8\notag\\ x - y &= 2\notag \end{align} If we add these equations we get $ 2x = 10$, so that $ y$ is eliminated and we get $ x = 5$. Substituting this back into the first equation we see that $ y = 8 - x = 3$. The basic idea is to eliminate one of the variables (thereby reducing the complexity of the system) by adding or subtracting the given equations.
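This elimination is easy to verify in code. Here is a minimal sketch using NumPy (the arrays simply encode this example's coefficients); `np.linalg.solve` is used only as an independent check of the hand computation.

```python
import numpy as np

# Coefficient matrix and right-hand side of the system
#   x + y = 8
#   x - y = 2
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])
b = np.array([8.0, 2.0])

# Adding the two equations eliminates y: 2x = 10
print(A[0] + A[1], b[0] + b[1])   # [2. 0.] 10.0

# The full solution agrees with the hand computation x = 5, y = 3
print(np.linalg.solve(A, b))      # [5. 3.]
```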

Sometimes we might need to multiply an equation by a suitable number before adding or subtracting. For example, consider: \begin{align} x + 2y &= 7\notag\\ 2x - y &= 4\notag \end{align} Here we need to multiply the first equation by $ 2$ and then subtract the resulting equation from the second equation to get $ -5y = -10$, so that $ y = 2$ and then $ x = 3$. From these simple examples it is clear that the solution of such systems of equations consists of two parts (sketched in code below):
  • Elimination of the unknowns one by one until only one variable remains
  • Back substitution to get the values of the variables one after another
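The following minimal sketch carries out both parts for the second system above, working directly on the table of its coefficients and right-hand sides; the variable names are ours.

```python
import numpy as np

# Coefficients and right-hand sides of  x + 2y = 7,  2x - y = 4
M = np.array([[1.0, 2.0, 7.0],
              [2.0, -1.0, 4.0]])

# Elimination: subtract 2 times the first row from the second,
# which removes x from the second equation: -5y = -10
M[1] = M[1] - 2 * M[0]
print(M[1])                # [  0.  -5. -10.]

# Back substitution: solve for y, then for x
y = M[1, 2] / M[1, 1]      # y = 2
x = M[0, 2] - M[0, 1] * y  # x = 7 - 2*2 = 3 (the coefficient of x is 1)
print(x, y)                # 3.0 2.0
```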
The first step involves multiplying equations by some numbers and adding or subtracting them. While doing so we multiply only by non-zero numbers (multiplying an equation by zero simply removes it from the combination). In effect we are forming a linear combination of the given equations of the system. More formally, if the equations of the given system are: \begin{align} a_{11}x_{1} + a_{12}x_{2} + \cdots + a_{1n}x_{n} &= b_{1}\notag\\ a_{21}x_{1} + a_{22}x_{2} + \cdots + a_{2n}x_{n} &= b_{2}\notag\\ \cdots\cdots\cdots\cdots &= \cdots \notag\\ a_{m1}x_{1} + a_{m2}x_{2} + \cdots + a_{mn}x_{n} &= b_{m}\notag \end{align} then a linear combination of the above equations is of the form: \begin{align} &c_{1}(a_{11}x_{1} + a_{12}x_{2} + \cdots + a_{1n}x_{n}) +\notag\\ &c_{2}(a_{21}x_{1} + a_{22}x_{2} + \cdots + a_{2n}x_{n}) +\notag\\ &\cdots +\notag\\ &c_{m}(a_{m1}x_{1} + a_{m2}x_{2} + \cdots + a_{mn}x_{n})\notag\\ &= c_{1}b_{1} + c_{2}b_{2} + \cdots + c_{m}b_{m}\notag \end{align} or, collecting the coefficients of each variable: \begin{align} &(c_{1}a_{11} + c_{2}a_{21} + \cdots + c_{m}a_{m1})x_{1} +\notag\\ &(c_{1}a_{12} + c_{2}a_{22} + \cdots + c_{m}a_{m2})x_{2} +\notag\\ &\cdots +\notag\\ &(c_{1}a_{1n} + c_{2}a_{2n} + \cdots + c_{m}a_{mn})x_{n}\notag\\ &= c_{1}b_{1} + c_{2}b_{2} + \cdots + c_{m}b_{m}\notag \end{align} where $ c_{1}, c_{2},\ldots, c_{m}$ are not all zero.
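In matrix terms the collected form says that the coefficients of the combined equation are the entries of the row vector $c^{T}A$, and its right-hand side is $c^{T}b$. A small sketch (with made-up numbers for $A$, $b$ and $c$):

```python
import numpy as np

# Coefficients a_ij and right-hand sides b_i of a small 3-equation system
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0]])
b = np.array([7.0, 4.0, 5.0])

# Weights c_1, c_2, c_3 of the linear combination (not all zero)
c = np.array([2.0, -1.0, 3.0])

# The collected form: the coefficient of x_j in the combined equation
# is c_1*a_1j + c_2*a_2j + ... + c_m*a_mj, i.e. the row vector c @ A
print(c @ A)   # combined coefficients of x_1, x_2, x_3
print(c @ b)   # combined right-hand side
```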

Clearly any solution of the original system also satisfies this new equation, but there may be other values of the variables which satisfy the new equation without satisfying the original system. By forming further linear combinations in this fashion we obtain a new system of linear equations. Any solution of the original system is then also a solution of the new system, but the new system might have some solutions which are not solutions of the original system. However, if the equations of the original system can in turn be expressed as linear combinations of the equations of the new system, then it follows that both systems of linear equations have the same solutions.

If two systems of linear equations have the same set of solutions, we say that they are equivalent. What we have shown above is that:
If two systems of linear equations are such that any equation of one system is a linear combination of the equations of the other system and vice-versa, then both the systems are equivalent.

The idea behind solving these systems of equations is then to find simpler but equivalent systems by forming linear combinations of the existing equations. Moreover, the goal is to eliminate variables one by one, which means the linear combinations should be chosen so that the coefficients of some variables in the resulting equations turn out to be zero. We further note that while forming linear combinations of equations we are just manipulating the coefficients of the variables, and hence it is better to drop the variables altogether and work with the coefficient/augmented matrix of the system. Each linear combination of the equations thus corresponds to an operation on the rows of the augmented matrix.
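As a concrete sketch, here is the augmented matrix of the second example system above, built with NumPy (the names `A`, `b` and `aug` are ours):

```python
import numpy as np

# The system  x + 2y = 7,  2x - y = 4  with its variables dropped:
A = np.array([[1.0, 2.0],
              [2.0, -1.0]])   # coefficient matrix
b = np.array([7.0, 4.0])      # right-hand sides

# The augmented matrix glues the right-hand side on as an extra column;
# every linear combination of equations is now just a row operation here.
aug = np.column_stack([A, b])
print(aug)
```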

It turns out that we need only those kinds of operations on the rows of the matrix which lead to equivalent systems. Thus we should be able to get back the original matrix by applying a suitable operation of the same kind. With this requirement we define the elementary row operations on a matrix.

Elementary Row Operations

Let $ A$ be a matrix and let $ R_{i}$ denote its $ i^{\text{th}}$ row. Then we define three types of elementary row operations on $ A$:
  1. Multiplying a row by some non-zero number: $ cR_{i} \rightarrow R_{i}$
  2. Exchange of two rows: $ R_{i} \leftrightarrow R_{j}$
  3. Adding a multiple of one row to another row: $ R_{i} + cR_{j} \rightarrow R_{i}$
We note that each of these operations can be reverted, to get back the old system, by a row operation of the same type. For example we can revert $ R_{i} + cR_{j} \rightarrow R_{i}$ with $ R_{i} - cR_{j} \rightarrow R_{i}$, revert $ cR_{i} \rightarrow R_{i}$ with $ (1/c)R_{i} \rightarrow R_{i}$, and a row exchange reverts itself. Hence these operations always lead to an equivalent system of equations. If $ e$ is any elementary row operation on a matrix $ A$ then $ e(A)$ denotes the new matrix obtained by applying operation $ e$ on $ A$.
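The three operations and their inverses are easy to express in code. A minimal NumPy sketch (the function names are ours, chosen to mirror the notation above):

```python
import numpy as np

# The three elementary row operations, each acting on a copy of the matrix.
def scale(A, i, c):           # cR_i -> R_i, with c != 0
    B = A.copy(); B[i] = c * B[i]; return B

def swap(A, i, j):            # R_i <-> R_j
    B = A.copy(); B[[i, j]] = B[[j, i]]; return B

def add_multiple(A, i, j, c): # R_i + cR_j -> R_i
    B = A.copy(); B[i] = B[i] + c * B[j]; return B

A = np.array([[1.0, 2.0], [3.0, 4.0]])

# Each operation is undone by an operation of the same type:
assert np.allclose(scale(scale(A, 0, 5.0), 0, 1 / 5.0), A)
assert np.allclose(swap(swap(A, 0, 1), 0, 1), A)
assert np.allclose(add_multiple(add_multiple(A, 0, 1, 2.0), 0, 1, -2.0), A)
```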

We can now apply these row operations to the augmented matrix $ \tilde{A}$ of the system of equations and try to simplify it into a form which mirrors the process of elimination of variables. Thus we want to reduce the augmented matrix to a form which contains many zero entries. A systematic description of such a form is given below:

Row Reduced Echelon Form: A matrix is said to be in row-reduced echelon form if it satisfies the following properties:
  1. any row which consists of all zero entries cannot lie above a row which has some non-zero entry; thus all the zero rows lie below the non-zero rows
  2. in a non-zero row the first non-zero entry must be $ 1$
  3. if the first non-zero entry of a row lies in a column $ j$, then all the other entries of this column must be $ 0$
  4. if the first non-zero entries of all the non-zero rows lie in columns $ k_{1}, k_{2}, \ldots, k_{r}$, then $ k_{1} < k_{2} < \cdots < k_{r}$.
To explain these properties it is better to exhibit an example. The following matrix
$$ A = \begin{bmatrix} 1 & 2 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$ is in row-reduced echelon form because:
  1. the zero rows are below the non-zero rows
  2. the first non-zero entry in each of the first, second and third rows is $ 1$
  3. whichever column contains the first non-zero entry of a row, all the other entries of that column are zero (check columns 1, 3 and 4)
  4. the columns of these first non-zero entries satisfy 1 < 3 < 4 (i.e. the first non-zero entry of a row lies to the right of the first non-zero entry of the previous row)
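These four conditions can also be checked mechanically. The following sketch (our own helper, not a library routine) transcribes them one by one and confirms that the matrix $ A$ above passes:

```python
import numpy as np

def is_rref(A, tol=1e-12):
    """Check the four conditions of row-reduced echelon form directly."""
    A = np.asarray(A, dtype=float)
    pivots = []
    seen_zero_row = False
    for row in A:
        nonzero = np.flatnonzero(np.abs(row) > tol)
        if len(nonzero) == 0:
            seen_zero_row = True    # a zero row; all later rows must be zero too
            continue
        if seen_zero_row:
            return False            # condition 1: non-zero row below a zero row
        j = nonzero[0]
        if abs(row[j] - 1.0) > tol:
            return False            # condition 2: leading entry must be 1
        pivots.append(j)
    if any(a >= b for a, b in zip(pivots, pivots[1:])):
        return False                # condition 4: pivot columns must increase
    # condition 3: each pivot column contains only its leading 1
    return all(np.count_nonzero(np.abs(A[:, j]) > tol) == 1 for j in pivots)

A = np.array([[1, 2, 0, 0, 3],
              [0, 0, 1, 0, 2],
              [0, 0, 0, 1, 2],
              [0, 0, 0, 0, 0]])
print(is_rref(A))   # True
```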
To understand the importance of the row-reduced echelon form we first prove that any matrix can be reduced to row-reduced echelon form by applying a series of elementary row operations. If there are no non-zero rows in the matrix then it is already in the desired form. If there is at least one non-zero row then we can put all the zero rows below the non-zero rows by a finite number of row exchanges. Next we start with the first row: if its first non-zero entry is $ c$, we apply $ (1/c)R_{1} \rightarrow R_{1}$ to make this first non-zero entry $ 1$. Then the other entries in the column containing this leading $ 1$ can be made $ 0$ by suitable operations of the type $ R_{j} + cR_{1} \rightarrow R_{j}$. The job is now done as far as the first row is concerned. We next do the same for the second row. While doing so we observe that the operations required may change the entries of the first row, but they cannot disturb its leading $ 1$ or the column containing it (the second row now has a zero entry in that column). Thus the job for the second row is done. Similarly we continue for all the other non-zero rows. After all the non-zero rows have been handled we obtain a matrix which is almost in row-reduced echelon form, except that the 4th condition for echelon form may not be satisfied. It is then a simple matter of row exchanges to make the matrix satisfy the last condition. The reader should try to apply the technique to an actual matrix; a programmatic sketch of the whole procedure is given below.
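Here is such a sketch in NumPy. It follows the standard Gauss-Jordan order, choosing pivot columns from left to right, so the 4th condition holds automatically instead of being fixed by row exchanges at the end; it is meant to illustrate the argument, not to be numerically robust.

```python
import numpy as np

def rref(A, tol=1e-12):
    """Reduce A to row-reduced echelon form by elementary row operations."""
    A = np.asarray(A, dtype=float).copy()
    m, n = A.shape
    r = 0                                   # index of the next pivot row
    for j in range(n):                      # scan columns left to right
        # find a row at or below r with a non-zero entry in column j
        rows = [i for i in range(r, m) if abs(A[i, j]) > tol]
        if not rows:
            continue
        A[[r, rows[0]]] = A[[rows[0], r]]   # R_r <-> R_i
        A[r] = A[r] / A[r, j]               # (1/c)R_r -> R_r: leading entry 1
        for i in range(m):                  # clear the rest of column j
            if i != r and abs(A[i, j]) > tol:
                A[i] = A[i] - A[i, j] * A[r]    # R_i - cR_r -> R_i
        r += 1
        if r == m:
            break
    return A

M = np.array([[2.0, 4.0, 1.0],
              [1.0, 2.0, 0.0],
              [3.0, 6.0, 2.0]])
print(rref(M))   # [[1. 2. 0.], [0. 0. 1.], [0. 0. 0.]]
```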

We now have the machinery to formulate a new definition of the rank of a matrix and thereby put this concept in its proper context. To that end we observe that if $ A$ is a square matrix and $ e$ is an elementary row operation then $ \text{det}(A) = 0 \Rightarrow \text{det}(e(A)) = 0$ and $ \text{det}(A) \neq 0 \Rightarrow \text{det}(e(A)) \neq 0$. Thus the rank (as defined using determinants of square submatrices) of any matrix does not change when we apply an elementary row operation to it. Hence the rank of a matrix is the same as the rank of the corresponding matrix in row-reduced echelon form. Clearly the rank of a matrix in row-reduced echelon form is the number of non-zero rows in it (because we can choose these rows, together with the columns which contain their first non-zero entries, to give a non-zero determinant, while choosing any more rows necessarily includes a full row of zeroes and thereby leads to a zero determinant). We thus have the following result:

The rank of a matrix is the number of non-zero rows in the corresponding row-reduced echelon form.

Since the rank is a unique number, it follows that the number of non-zero rows in the row-reduced echelon form is also unique and depends only on the original matrix, not on the sequence of elementary row operations used to reduce it to echelon form. In fact the row-reduced echelon form itself is unique for a given matrix and does not depend on the sequence of elementary row operations used (but we don't prove this fact here). The above result can also be taken as an alternative and proper definition of the rank of a matrix (indeed this is a common approach in undergraduate texts).
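This characterisation is easy to check in code. SymPy's `Matrix.rref` returns the row-reduced echelon form together with the pivot columns, and NumPy's `numpy.linalg.matrix_rank` computes the rank by an entirely different route (the singular value decomposition); on a sample matrix the counts agree:

```python
import numpy as np
import sympy

M = sympy.Matrix([[2, 4, 1],
                  [1, 2, 0],
                  [3, 6, 2]])

# Exact row-reduced echelon form and its pivot columns
R, pivot_cols = M.rref()
nonzero_rows = sum(1 for i in range(R.rows) if any(x != 0 for x in R.row(i)))
print(nonzero_rows)      # 2: one non-zero row per pivot
print(len(pivot_cols))   # 2

# The same rank via the SVD
A = np.array([[2.0, 4.0, 1.0],
              [1.0, 2.0, 0.0],
              [3.0, 6.0, 2.0]])
print(np.linalg.matrix_rank(A))   # 2
```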

We are now ready to prove the fundamental theorem on solutions of systems of linear equations and thereby show the importance of the concept of rank in these matters. However, this post has already grown rather long, so we postpone these developments to the next and final post in this series.
