
Section 1.1 The extended echelon form of a matrix

A basic algorithm in linear algebra finds the inverse of a matrix \(A\) by row reducing the augmented matrix \([A \mid I]\text{.}\) Let us investigate what happens when we apply the same procedure to a matrix \(A\) that is singular, or even non-square.

Example 1.1.1.

Let's start with the \(2 \times 2\) case with the matrix

\begin{equation*} A = \begin{bmatrix} 1 \amp -2 \\ 3 \amp -6 \end{bmatrix} \end{equation*}

If we augment \(A\) by the identity matrix and row reduce, we obtain

\begin{equation*} \rref [A \mid I] = \left[\begin{array}{cc|cc} 1 \amp -2 \amp 0 \amp 1/3 \\ 0 \amp 0 \amp 1 \amp -1/3 \end{array} \right] \end{equation*}

On the left, we get the reduced row echelon form of \(A\text{,}\) as expected. On the right, we get the matrix

\begin{equation*} A^{-} = \begin{bmatrix} 0 \amp 1/3 \\ 1 \amp -1/3 \end{bmatrix} \end{equation*}

for which \(A^{-}A\) is in reduced row echelon form; indeed, \(A^{-}A = \rref(A)\text{.}\)

In the Sage cell below, there is also a horizontal subdivision separating the row of zeros in \(\rref(A)\text{.}\) More on that later.
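Here is a minimal Sage sketch of this computation (the variable names are our own): it row reduces \([A \mid I]\text{,}\) re-draws the subdivisions, and reads off \(A^{-}\) as the right-hand block.

A = matrix(QQ, [[1, -2], [3, -6]])
# Row reduce [A | I]; copy() since the cached rref may be immutable.
E = copy(A.augment(identity_matrix(QQ, 2), subdivide=True).rref())
E.subdivide(1, 2)  # horizontal rule before row 1, vertical rule before column 2
show(E)
# The right-hand block is a left quasi-inverse of A.
Aminus = E.matrix_from_columns([2, 3])
print(Aminus * A == A.rref())  # True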

Definition 1.1.2.

Let us call a matrix \(A^{-}\) satisfying \(A^{-} A = \rref(A)\) a left quasi-inverse of \(A\text{.}\) These are not unique when \(A\) is singular, as the next example illustrates.

Example 1.1.3. Symbolic row reduction.

Suppose we take our matrix \(A\) as before, but this time we augment with the vector \(v = (x, y) \in \Q[x,y]^2\text{.}\) In this case, we get

\begin{equation*} \rref [A \mid v] = \left[\begin{array}{cc|c} 1 \amp -2 \amp x \\ 0 \amp 0 \amp -3x + y \end{array} \right] \end{equation*}

The coefficient matrix for the right hand side,

\begin{equation*} \begin{bmatrix} 1 \amp 0 \\ -3 \amp 1 \end{bmatrix} \end{equation*}

is again a left quasi-inverse. It differs from our previous \(A^{-}\) by a couple of row operations that do not affect the identity \(A^-A = \rref A\text{.}\)
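As a quick check, the following Sage sketch (names ours) verifies that this matrix, call it \(B\text{,}\) is also a left quasi-inverse, and computes its action on the symbolic vector \(v\text{:}\)

A = matrix(QQ, [[1, -2], [3, -6]])
B = matrix(QQ, [[1, 0], [-3, 1]])
print(B * A == A.rref())  # True: B is a second left quasi-inverse
# Apply B to the symbolic vector v = (x, y).
R.<x, y> = PolynomialRing(QQ)
print(B.change_ring(R) * vector(R, [x, y]))  # (x, -3*x + y)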

We know that a vector \(y = (y_1, y_2, \dots, y_m)\) is in the column space of \(A\) if and only if, whenever a row on the left side of \(\rref [A \mid y]\) is zero, the corresponding entry on the right side is also zero. If we now treat \(y_1, \dots, y_m\) as indeterminates, this gives us a list of equations that must be satisfied in order for \(y \in \col(A)\text{.}\) Let us now summarize.

Theorem 1.1.4.

Let \(A\) be an \(m \times n\) matrix and write

\begin{equation*} \rref [A \mid I_m] = \left[ \begin{array}{c|c} A_{0,0} \amp A_{0,1} \\ \hline 0_{r \times n} \amp A_{1,1} \end{array} \right], \end{equation*}

where \(A_{0,0}\) consists of the nonzero rows of \(\rref(A)\) and \(r\) is the number of zero rows. Then a vector \(y \in \R^m\) is in \(\col(A)\) if and only if \(A_{1,1} y = 0\text{.}\)

Let \(A^{-} = \begin{bmatrix} A_{0,1} \\ A_{1,1} \end{bmatrix}\) be the left quasi-inverse associated with this echelon form.

As we saw in Example 1.1.3, if \(y = (y_1,\dots,y_m)\) is a vector in \(\R^m\) then

\begin{equation*} \rref [A \mid y] = A^{-} [A \mid y] = \left[ \begin{array}{c|c} A_{0,0} \amp A_{0,1}y \\ \hline 0_{r \times n} \amp A_{1,1}y \end{array} \right]. \end{equation*}

Since every row of \(A_{0,0}\) contains a pivot, this linear system is consistent if and only if \(A_{1,1}y = 0\text{.}\) This proves Theorem 1.1.4.
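For the matrix of Example 1.1.1, \(A_{1,1} = \begin{bmatrix} 1 \amp -1/3 \end{bmatrix}\text{,}\) so \(y \in \col(A)\) exactly when \(y_1 - \frac{1}{3} y_2 = 0\text{.}\) A short Sage sketch of this test (the slicing is ours):

A = matrix(QQ, [[1, -2], [3, -6]])
E = A.augment(identity_matrix(QQ, 2)).rref()
A11 = E.matrix_from_rows_and_columns([1], [2, 3])  # the block below the bar
print(A11 * vector(QQ, [1, 3]))  # (0), so (1, 3) is in col(A)
print(A11 * vector(QQ, [1, 0]))  # (1), so (1, 0) is not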

Recall that for any matrix \(X\text{,}\) the spaces \(\ker(X)\) and \(\row(X)\) are orthogonal complements. We will use this in the next theorem.

Theorem 1.1.5.

Write \(\rref[A^\top \mid I_n]\) in the same block form, with blocks \(B_{0,0}\text{,}\) \(B_{0,1}\text{,}\) and \(B_{1,1}\text{.}\) Then each of the four fundamental subspaces of \(A\) has two descriptions:

\begin{equation*} \begin{array}{ll} \row A = \row A_{0,0} = \ker B_{1,1}, \amp \col A = \row B_{0,0} = \ker A_{1,1}, \\ \ker A = \ker A_{0,0} = \row B_{1,1}, \amp \ker A^\top = \ker B_{0,0} = \row A_{1,1}. \end{array} \end{equation*}

It is a basic property of \(A_{0,0}\) that \(\row A_{0,0} = \row A\) and \(\ker A_{0,0} = \ker A\text{.}\) Likewise, \(\row B_{0,0} = \col A\) and \(\ker B_{0,0} = \ker A^\top\text{.}\)

By Theorem 1.1.4, we also have \(\ker A_{1,1} = \col A\) and \(\ker B_{1,1} = \row A\text{.}\)

Lastly, by orthogonality,

\begin{equation*} \row B_{1,1} = \left( \ker B_{1,1} \right)^\perp = \left( \row A \right)^\perp = \ker A \end{equation*}

So \(\row B_{1,1} = \ker A\) and similarly, \(\row A_{1,1} = \ker A^\top\text{.}\)

This theorem gives us two ways to write each of the four fundamental subspaces of \(A\text{.}\) First, we can write each subspace as the row space (span) of a matrix in echelon form. Second, we can write each subspace as the kernel (null space) of a matrix in echelon form.

Example 1.1.6.

Consider the matrix

\begin{equation*} A = \begin{bmatrix} 1 \amp -1 \amp 1 \amp -2 \\ 3 \amp 1 \amp 7 \amp 2 \\ 2 \amp 0 \amp 4 \amp 0 \end{bmatrix}. \end{equation*}

We have

\begin{equation*} \rref [A \mid I_3] = \left[\begin{array}{rrrr|rrr} 1 \amp 0 \amp 2 \amp 0 \amp 0 \amp 0 \amp \frac{1}{2} \\ 0 \amp 1 \amp 1 \amp 2 \amp 0 \amp 1 \amp -\frac{3}{2} \\ \hline 0 \amp 0 \amp 0 \amp 0 \amp 1 \amp 1 \amp -2 \end{array}\right] \end{equation*}

and

\begin{equation*} \rref[A^\top | I_4] = \left[\begin{array}{rrr|rrrr} 1 \amp 0 \amp \frac{1}{2} \amp 0 \amp 0 \amp \frac{1}{8} \amp -\frac{7}{16} \\ 0 \amp 1 \amp \frac{1}{2} \amp 0 \amp 0 \amp \frac{1}{8} \amp \frac{1}{16} \\ \hline 0 \amp 0 \amp 0 \amp 1 \amp 0 \amp -\frac{1}{2} \amp \frac{1}{4} \\ 0 \amp 0 \amp 0 \amp 0 \amp 1 \amp 0 \amp -\frac{1}{2} \end{array}\right] \end{equation*}

Using Theorem 1.1.5, we have

\begin{equation*} \col A = \spn\left\{ \begin{bmatrix} 1 \\ 0 \\ 1/2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1/2 \end{bmatrix} \right\} = \ker \begin{bmatrix} 1 \amp 1 \amp -2 \end{bmatrix} \end{equation*}

and

\begin{equation*} \ker A = \spn\left\{ \begin{bmatrix} 1 \\ 0 \\ -1/2 \\ 1/4 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ -1/2 \end{bmatrix} \right\} = \ker \begin{bmatrix} 1 \amp 0 \amp 2 \amp 0 \\ 0 \amp 1 \amp 1 \amp 2 \end{bmatrix} \end{equation*}
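These identities are easy to verify in Sage. Here is a minimal sketch (the block names follow the theorem; the row and column indices are read off from the echelon forms above):

# Verify Theorem 1.1.5 for the matrix of Example 1.1.6.
A  = matrix(QQ, [[1, -1, 1, -2], [3, 1, 7, 2], [2, 0, 4, 0]])
E  = A.augment(identity_matrix(QQ, 3)).rref()
Et = A.T.augment(identity_matrix(QQ, 4)).rref()
# Blocks of the two extended echelon forms (rank A = 2).
A00 = E.matrix_from_rows_and_columns([0, 1], [0, 1, 2, 3])
A11 = E.matrix_from_rows_and_columns([2], [4, 5, 6])
B00 = Et.matrix_from_rows_and_columns([0, 1], [0, 1, 2])
B11 = Et.matrix_from_rows_and_columns([2, 3], [3, 4, 5, 6])
print(A.column_space() == B00.row_space() == A11.right_kernel())  # True
print(A.right_kernel() == B11.row_space() == A00.right_kernel())  # True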

A mnemonic of sorts is that \(\rref[A \mid I]\) gives parametric representations of \(\row A\) and \(\ker A^\top = (\col A)^\perp\text{,}\) and implicit equations for \(\ker A\) and \(\col A\text{.}\)

Of course, one only needs to remember half of that in order to piece together the other half. From that, one can also piece together the same thing for \(\rref [A^\top \mid I]\text{.}\) So one only needs to remember two things to be able to recall all eight statements of Theorem 1.1.5.