4.3 A Primer on Matrices
"If you had done something twice, you are likely to do it again."
- Brian Kernighan and Rob Pike (The Unix Programming Environment, p. 97)
A matrix is a rectangular array of elements arranged in rows and columns.
An example of a matrix is:
$$
\begin{align*}
\left[\begin{array}{ccc}
8.3 & 70 & 10.3\\
8.6 & 65 & 10.3\\
8.8 & 63 & 10.2\\
10.5 & 72 & 16.4\\
\end{array}\right]
\end{align*}
$$
This matrix represents some of the data from the trees dataset. The values in the first column represent Girth, those in the second column Height, and those in the third column Volume.
Each row corresponds to a tree. The first row represents the values for the first tree. It has 8.3 for Girth, 70 for Height, and 10.3 for Volume.
So this matrix gives the values of three variables for four trees.
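Since trees is one of R's built-in datasets, this matrix can be constructed directly; a minimal R sketch:
```r
# First four rows of R's built-in trees dataset, stored as a 4 x 3 matrix
A <- as.matrix(head(trees, 4))
A  # columns: Girth, Height, Volume; rows: trees 1 through 4
```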
Each value of the matrix is called an element of that matrix. We denote the elements as $a_{ij}$ for the element in the $i$th row and the $j$th column. Note that the first subscript identifies the row number and the second the column number.
So for the matrix above, the elements can be denoted as $$ \begin{align*} \left[\begin{array}{ccc} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\\ a_{41} & a_{42} & a_{43} \end{array}\right] \end{align*} $$ A matrix may be denoted by a symbol such as $\bf{A}$, $\bf{X}$, or $\bf{Z}$. The symbol could also be a Greek letter such as $\bf{\Omega}$. The symbol is in boldface to identify that it refers to a matrix.
Thus, for the above matrix, we might define: $$ \begin{align*} \bf{A} =\left[\begin{array}{ccc} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\\ a_{41} & a_{42} & a_{43} \end{array}\right] \end{align*} $$ Another notation we could use is: $$ \textbf{A}=\left[a_{ij}\right]\qquad i=1,\ldots,4; j=1,2,3 $$ This notation avoids the need for writing out all elements of the matrix by stating only the general element.
Sometimes we will specify the matrix with the dimension below the matrix symbol. For example, an $r \times c$ matrix can be expressed as \begin{align*} \underset{r\times c}{{\bf A}}=\left[a_{ij}\right]\qquad i=1,\ldots,r; j=1,\ldots,c \end{align*}
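Continuing the R sketch above, elements are indexed with the row subscript first and the column subscript second:
```r
# a_ij: the first subscript picks the row, the second picks the column
A[1, 1]  # 8.3  (Girth of the first tree)
A[4, 3]  # 16.4 (Volume of the fourth tree)
```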
The dimension of the matrix above is 4 x 3, since there are four rows and three columns.
Recall that the trees dataset has 31 observations. So a matrix representing the full dataset would be 31 x 3.
Note that in giving the dimension of a matrix, we always specify the number of rows first and then the number of columns.
So an $r \times c$ matrix can be expressed as \begin{align*} \underset{r\times c}{{\bf A}} & =\left[\begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1c}\\ a_{21} & a_{22} & \cdots & a_{2c}\\ \vdots & \vdots & \ddots & \vdots\\ a_{r1} & a_{r2} & \cdots & a_{rc} \end{array}\right] \end{align*} or in the compact form \begin{align*} \underset{r\times c}{{\bf A}} & =\left[a_{ij}\right]\qquad i=1,\ldots,r;j=1,\ldots,c \end{align*} Again, the dimensions may or may not be given under the matrix symbol.
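In R, dim() reports the dimension in the same order, rows first; continuing the sketch above:
```r
dim(A)                 # 4 3: four rows, three columns
dim(as.matrix(trees))  # 31 3 for the full trees dataset
```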
A matrix is said to be square if the number of rows equals the number of columns. For example, the matrices
$$
\begin{align*}
\left[\begin{array}{cc}
a_{11} & a_{12}\\
a_{21} & a_{22}
\end{array}\right]
\end{align*}
$$
and
$$
\begin{align*}
\left[\begin{array}{ccc}
a_{11} & a_{12} & a_{13}\\
a_{21} & a_{22} & a_{23}\\
a_{31} & a_{32} & a_{33}\\
\end{array}\right]
\end{align*}
$$
are both square matrices.
A matrix containing only one column is called a column vector or simply a vector.
Two examples are:
$$
\begin{align*}
\textbf{A}=\left[\begin{array}{c}
1\\
20\\
7
\end{array}\right] & \qquad\textbf{B}=\left[\begin{array}{c}
b_{1}\\
b_{2}\\
b_{3}\\
b_{4}\\
b_{5}
\end{array}\right]
\end{align*}
$$
Note that the elements only have one subscript in $\bf{B}$ since there is only one column. The subscript indicates only the row.
A matrix containing only one row is called a row vector.
Two examples are: $$ \begin{align*} \textbf{B}^{\prime}=\left[\begin{array}{ccc} 15 & 25 & 50\end{array}\right] & \qquad\boldsymbol{\delta}^{\prime}=\left[\begin{array}{cc} \delta_{1} & \delta_{2}\end{array}\right] \end{align*} $$ We use the prime (${}^\prime$) symbol for row vectors for reasons to be seen next.
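In R, a column vector and a row vector can each be stored as a one-column or one-row matrix; a minimal sketch with the values above:
```r
a  <- matrix(c(1, 20, 7), ncol = 1)    # 3 x 1 column vector
bp <- matrix(c(15, 25, 50), nrow = 1)  # 1 x 3 row vector
dim(a)   # 3 1
dim(bp)  # 1 3
```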
The transpose of a matrix $\bf{A}$ is another matrix, denoted by $\textbf{A}^{\prime}$, that is obtained by interchanging
corresponding columns and rows of the matrix $\bf{A}$.
For example, if: $$ \begin{align*} \underset{3\times2}{\textbf{A}}=\left[\begin{array}{cc} 1 & 7\\ 12 & 4\\ 5 & 9 \end{array}\right] \end{align*} $$ then the transpose $\bf{A}^\prime$ is: $$ \begin{align*} \underset{2\times3}{\textbf{A}^{\prime}}=\left[\begin{array}{ccc} 1 & 12 & 5\\ 7 & 4 & 9 \end{array}\right] \end{align*} $$ Note that the first column of $\bf{A}$ is the first row of $\bf{A}^\prime$, and similarly the second column of $\bf{A}$ is the second row of $\bf{A}^\prime$.
Note that the dimension of $\bf{A}^\prime$ is the dimension of $\bf{A}$ reversed.
Note that the transpose of a column vector is a row vector, and vice versa.
This is the reason why we used the symbol $\bf{B}^\prime$ earlier to identify a row vector, since it may be thought of as the transpose of a column vector $\bf{B}$.
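In R, t() returns the transpose; a sketch using the matrix above:
```r
A <- matrix(c(1, 12, 5, 7, 4, 9), nrow = 3)  # the 3 x 2 matrix above, filled column by column
t(A)       # its 2 x 3 transpose
dim(t(A))  # 2 3: the dimension is reversed
```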
A matrix is said to be symmetric if $\bf{A}=\bf{A}^\prime$.
A symmetric matrix $\bf{A}$ has elements $a_{ij}=a_{ji}$. Clearly, a symmetric matrix must be a square matrix.
A square matrix is said to be diagonal if all of the off-diagonal elements are zero.
For example \begin{align*} {\bf A} & =\left[\begin{array}{cccc} a_{11} & 0 & 0 & 0\\ 0 & a_{22} & 0 & 0\\ 0 & 0 & a_{33} & 0\\ 0 & 0 & 0 & a_{44} \end{array}\right] \end{align*} is a diagonal matrix.
The identity matrix is a diagonal matrix with ones for all the diagonal elements. The identity matrix is denoted by $\bf{I}$.
For example, \begin{align*} {\bf I} & =\left[\begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \end{align*} is a 4 x 4 identity matrix.
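In R, diag() constructs both kinds of matrices; the diagonal values below are just illustrative:
```r
diag(c(2, 5, 9))  # 3 x 3 diagonal matrix with 2, 5, 9 on the diagonal
diag(4)           # the 4 x 4 identity matrix
```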
A matrix with ones for all its elements is denoted as
\begin{align*}
{\bf J} & =\left[\begin{array}{cccc}
1 & 1 & \cdots & 1\\
1 & 1 & \cdots & 1\\
\vdots & \vdots & \ddots & \vdots\\
1 & 1 & \cdots & 1
\end{array}\right]
\end{align*}
A vector with ones for all the elements is denoted as
\begin{align*}
{\bf 1} & =\left[\begin{array}{c}
1\\
1\\
\vdots\\
1
\end{array}\right]
\end{align*}
Likewise a vector of zeros is denoted as
\begin{align*}
{\bf 0} & =\left[\begin{array}{c}
0\\
0\\
\vdots\\
0
\end{array}\right]
\end{align*}
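In R, all three can be constructed with matrix(); the dimension 4 below is just illustrative:
```r
J    <- matrix(1, nrow = 4, ncol = 4)  # 4 x 4 matrix of ones
one  <- matrix(1, nrow = 4, ncol = 1)  # 4 x 1 vector of ones
zero <- matrix(0, nrow = 4, ncol = 1)  # 4 x 1 vector of zeros
```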
Adding or subtracting two matrices requires that they have the same dimension.
The sum, or difference, of two matrices is another matrix whose elements each consist of the sum, or difference, of the corresponding elements of the two matrices.
Suppose: $$ \begin{align*} \underset{3\times2}{\textbf{A}}=\left[\begin{array}{cc} 1 & 4\\ 2 & 5\\ 3 & 6 \end{array}\right] & \qquad\underset{3\times2}{\textbf{B}}=\left[\begin{array}{cc} 1 & 2\\ 2 & 3\\ 3 & 4 \end{array}\right] \end{align*} $$ then: $$ \begin{align*} \underset{3\times2}{\textbf{A}+\textbf{B}=} & \left[\begin{array}{cc} 1+1 & 4+2\\ 2+2 & 5+3\\ 3+3 & 6+4 \end{array}\right]=\left[\begin{array}{cc} 2 & 6\\ 4 & 8\\ 6 & 10 \end{array}\right] \end{align*} $$ Similarly: $$ \begin{align*} \underset{3\times2}{\textbf{A}-\textbf{B}=} & \left[\begin{array}{cc} 1-1 & 4-2\\ 2-2 & 5-3\\ 3-3 & 6-4 \end{array}\right]=\left[\begin{array}{cc} 0 & 2\\ 0 & 2\\ 0 & 2 \end{array}\right] \end{align*} $$
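A minimal R sketch of the example above:
```r
A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)  # the 3 x 2 matrix A above
B <- matrix(c(1, 2, 3, 2, 3, 4), nrow = 3)  # the 3 x 2 matrix B above
A + B  # elementwise sums
A - B  # elementwise differences
```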
The addition and subtraction rules discussed above are fairly straightforward and similar to addition and subtraction of (non-matrix) numbers.
Multiplication of matrices is not as straightforward as multiplication of (non-matrix) numbers.
A scalar is an ordinary number or a symbol representing a number.
In multiplication of a matrix by a scalar, every element of the matrix is multiplied by the scalar.
For example, suppose the matrix $\textbf{A}$ is given by: $$ \begin{align*} \textbf{A}=\left[\begin{array}{cc} 1 & 3\\ 5 & 7 \end{array}\right] \end{align*} $$ Then $2\textbf{A}$, where 2 is the scalar, equals: $$ \begin{align*} 2\textbf{A}=2\left[\begin{array}{cc} 1 & 3\\ 5 & 7 \end{array}\right] & =\left[\begin{array}{cc} 2 & 6\\ 10 & 14 \end{array}\right] \end{align*} $$
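A minimal R sketch of the example above:
```r
A <- matrix(c(1, 5, 3, 7), nrow = 2)  # rows: (1, 3) and (5, 7)
2 * A  # every element is multiplied by the scalar 2
```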
Consider the two matrices:
$$
\begin{align*}
\underset{2\times2}{\textbf{A}}=\left[\begin{array}{cc}
1 & 2\\
3 & 4
\end{array}\right] & \qquad\underset{2\times2}{\textbf{B}}=\left[\begin{array}{cc}
5 & 6\\
7 & 8
\end{array}\right]
\end{align*}
$$
The product of $\bf{A}$ and $\bf{B}$ is found by multiplying the elements of each row vector of $\bf{A}$ by the elements of each column vector of $\bf{B}$ and then summing the products.
For example, to find the element in the first row and the first column of the product $\textbf{AB}$, we work with the first row of $\textbf{A}$ and the first column of $\textbf{B}$: $$ \begin{align*} \begin{array}{cc} & \textbf{A}\\ & \left[\begin{array}{cc} {\color{red}1} & {\color{red}2}\\ 3 & 4 \end{array}\right]\\ \\ \end{array}\begin{array}{c} \textbf{B}\\ \left[\begin{array}{cc} {\color{red}5} & 6\\ {\color{red}7} & 8 \end{array}\right]\\ \begin{array}{cc} & \end{array} \end{array} & =\begin{array}{cc} & \textbf{AB}\\ & \left[\begin{array}{cc} \color{red}{\left(1\right)\left(5\right)+\left(2\right)\left(7\right)} &\\ \\ \end{array}\right]\\ \\ \end{array}\\ & = \begin{array}{cc} & \textbf{AB}\\ & \left[\begin{array}{cc} \color{red}{19} &\\ \\ \end{array}\right]\\ \\ \end{array} \end{align*} $$ To find the element in the first row and second column of $\textbf{AB}$: $$ \begin{align*} \begin{array}{cc} & \textbf{A}\\ & \left[\begin{array}{cc} {\color{red}1} & {\color{red}2}\\ 3 & 4 \end{array}\right]\\ \\ \end{array}\begin{array}{c} \textbf{B}\\ \left[\begin{array}{cc} 5 & \color{red}{6}\\ 7 & \color{red}{8} \end{array}\right]\\ \begin{array}{cc} & \end{array} \end{array} & =\begin{array}{cc} & \textbf{AB}\\ & \left[\begin{array}{cc} 19& \color{red}{\left(1\right)\left(6\right)+\left(2\right)\left(8\right)} \\ \\ \end{array}\right]\\ \\ \end{array}\\ & = \begin{array}{cc} & \textbf{AB}\\ & \left[\begin{array}{cc} 19 & \color{red}{22}\\ \\ \end{array}\right]\\ \\ \end{array} \end{align*} $$ Continuing this process we get $$ \begin{align*} \underset{2\times2}{\textbf{AB}} & =\left[\begin{array}{cc} \left(1\right)\left(5\right)+\left(2\right)\left(7\right) & \left(1\right)\left(6\right)+\left(2\right)\left(8\right)\\ \left(3\right)\left(5\right)+\left(4\right)\left(7\right) & \left(3\right)\left(6\right)+\left(4\right)\left(8\right) \end{array}\right]=\left[\begin{array}{cc} 19 & 22\\ 43 & 50 \end{array}\right] \end{align*} $$ Note that the order in matrix multiplication is important. In general, $\textbf{AB} \ne \textbf{BA}$. In fact, even though the product $\textbf{AB}$ may be defined, the product $\textbf{BA}$ may not be defined at all.
In general, the product $\textbf{AB}$ is defined only when the number of columns in $\textbf{A}$ equals the number of rows in $\textbf{B}$.
For example: $$ \begin{align*} \underset{{\color{red}2}\times{\color{blue}3}}{\textbf{A}} & \quad\underset{{\color{blue}3}\times{\color{red}1}}{\textbf{B}}=\underset{{\color{red}2}\times{\color{red}1}}{\textbf{AB}} \end{align*} $$ is defined since the number of columns of $\textbf{A}$ (3) is equal to the number of rows of $\textbf{B}$ (3).
However, note that $$ \begin{align*} \underset{{\color{blue}3}\times{\color{red}1}}{\textbf{B}}\quad\underset{{\color{red}2}\times{\color{blue}3}}{\textbf{A}} \end{align*} $$ is not defined since the number of columns of $\textbf{B}$ (1) is not equal to the number of rows of $\textbf{A}$ (2).
When obtaining the product $\textbf{AB}$, we say that $\textbf{A}$ is postmultiplied by $\textbf{B}$ or that $\textbf{B}$ is premultiplied by $\textbf{A}$.
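In R, matrix multiplication uses the %*% operator (a plain * would multiply elementwise instead); a sketch of the example above:
```r
A <- matrix(c(1, 3, 2, 4), nrow = 2)  # rows: (1, 2) and (3, 4)
B <- matrix(c(5, 7, 6, 8), nrow = 2)  # rows: (5, 6) and (7, 8)
A %*% B  # rows: (19, 22) and (43, 50)
B %*% A  # a different matrix: order matters
```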
For ordinary (non-matrix) numbers, the inverse of a number is its reciprocal. Thus, the inverse of 2 is $\frac{1}{2}$.
A number multiplied by its inverse always equals 1: $$ \begin{align*} &2\cdot\frac{1}{2}=\frac{1}{2}\cdot2=1 \end{align*} $$
In matrix algebra, the inverse of a matrix $\textbf{A}$ is another matrix, denoted by $\textbf{A}^{-1}$, such that: $$ \textbf{A}^{-1}\textbf{A}=\textbf{A}\textbf{A}^{-1}=\textbf{I} $$ where $\textbf{I}$ is the identity matrix.
Thus, the identity matrix $\textbf{I}$ plays the same role as the number 1 in ordinary algebra.
An inverse of a matrix is defined only for square matrices.
Even so, many square matrices do not have inverses.
If a square matrix does have an inverse, the inverse is unique.
If the inverse of a matrix does not exist, then we say the matrix is singular. If the inverse does exist, then we say the matrix is nonsingular.
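In R, solve() computes the inverse of a nonsingular matrix; a minimal sketch (the 2 x 2 matrix below is an arbitrary nonsingular example):
```r
A <- matrix(c(1, 3, 2, 4), nrow = 2)
Ainv <- solve(A)  # the inverse of A
A %*% Ainv        # the 2 x 2 identity, up to floating-point rounding
# solve(matrix(1, 2, 2))  # would fail: a matrix of all ones is singular
```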
Below are some basic results for matrices presented without proof. They will be useful as we use matrices in regression.
$$
\begin{align*}
\textbf{A}+\textbf{B} & =\textbf{B}+\textbf{A} & (4.7)\\
\left(\textbf{A}+\textbf{B}\right)+\textbf{C} & =\textbf{A}+\left(\textbf{B}+\textbf{C}\right) &(4.8)\\
\left(\textbf{A}\textbf{B}\right)\textbf{C} & =\textbf{A}\left(\textbf{B}\textbf{C}\right)&(4.9)\\
\textbf{C}\left(\textbf{A}+\textbf{B}\right) & =\textbf{C}\textbf{A}+\textbf{C}\textbf{B}&(4.10)\\
k\left(\textbf{A}+\textbf{B}\right) & =k\textbf{A}+k\textbf{B}&(4.11)\\
\left(\textbf{A}^{\prime}\right)^{\prime} & =\textbf{A}&(4.12)\\
\left(\textbf{A}+\textbf{B}\right)^{\prime} & =\textbf{A}^{\prime}+\textbf{B}^{\prime}&(4.13)\\
\left(\textbf{A}\textbf{B}\right)^{\prime} & =\textbf{B}^{\prime}\textbf{A}^{\prime}&(4.14)\\
\left(\textbf{A}\textbf{B}\textbf{C}\right)^{\prime} & =\textbf{C}^{\prime}\textbf{B}^{\prime}\textbf{A}^{\prime}&(4.15)\\
\left(\textbf{A}^{-1}\right)^{-1} & =\textbf{A}&(4.16)\\
\left(\textbf{A}^{\prime}\right)^{-1} & =\left(\textbf{A}^{-1}\right)^{\prime}&(4.17)
\end{align*}
$$
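Several of these results are easy to check numerically; a minimal R sketch using arbitrary 2 x 2 matrices:
```r
A <- matrix(c(1, 3, 2, 4), nrow = 2)
B <- matrix(c(5, 7, 6, 8), nrow = 2)
all.equal(t(A %*% B), t(B) %*% t(A))  # (AB)' = B'A', result (4.14)
all.equal(solve(t(A)), t(solve(A)))   # (A')^{-1} = (A^{-1})', result (4.17)
```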
There are many results in matrix calculus that are beyond the scope of this course. We will present a few results for matrix differentiation that will be useful in multiple regression.
It is important to note that matrix calculus can be confusing due to the notational conventions used in various fields. There are two main conventions (although the two are sometimes mixed by some authors), which differ in how a derivative is taken with respect to a vector. One convention is the numerator layout and the other is the denominator layout. Below, we present the results using the numerator layout.
In all the results that follow, let $d$ be a scalar, ${\bf A}$ be an $n\times1$ vector with elements $[a_{i}]$, ${\bf B}$ be an $m\times1$ vector with elements $[b_{i}]$, and ${\bf C}$ be a $p\times q$ matrix with elements $[c_{ij}]$.
\begin{align*}
\frac{\partial{\bf A}}{\partial d} & =\left[\begin{array}{c}
\frac{\partial a_{1}}{\partial d}\\
\frac{\partial a_{2}}{\partial d}\\
\vdots\\
\frac{\partial a_{n}}{\partial d}
\end{array}\right]
\end{align*}
\begin{align*}
\frac{\partial d}{\partial{\bf A}} =\left[\begin{array}{cccc}
\frac{\partial d}{\partial a_{1}} & \frac{\partial d}{\partial a_{2}} & \cdots & \frac{\partial d}{\partial a_{n}}\end{array}\right]
\end{align*}
\begin{align*}
\frac{\partial{\bf A}}{\partial{\bf B}} & =\left[\begin{array}{cccc}
\frac{\partial a_{1}}{\partial b_{1}} & \frac{\partial a_{1}}{\partial b_{2}} & \cdots & \frac{\partial a_{1}}{\partial b_{m}}\\
\frac{\partial a_{2}}{\partial b_{1}} & \frac{\partial a_{2}}{\partial b_{2}} & \cdots & \frac{\partial a_{2}}{\partial b_{m}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial a_{n}}{\partial b_{1}} & \frac{\partial a_{n}}{\partial b_{2}} & \cdots & \frac{\partial a_{n}}{\partial b_{m}}
\end{array}\right]
\end{align*}
\begin{align*}
\frac{\partial{\bf C}}{\partial d} & =\left[\begin{array}{cccc}
\frac{\partial c_{11}}{\partial d} & \frac{\partial c_{12}}{\partial d} & \cdots & \frac{\partial c_{1q}}{\partial d}\\
\frac{\partial c_{21}}{\partial d} & \frac{\partial c_{22}}{\partial d} & \cdots & \frac{\partial c_{2q}}{\partial d}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial c_{p1}}{\partial d} & \frac{\partial c_{p2}}{\partial d} & \cdots & \frac{\partial c_{pq}}{\partial d}
\end{array}\right]
\end{align*}
\begin{align*}
\frac{\partial d}{\partial{\bf C}} & =\left[\begin{array}{cccc}
\frac{\partial d}{\partial c_{11}} & \frac{\partial d}{\partial c_{21}} & \cdots & \frac{\partial d}{\partial c_{p1}}\\
\frac{\partial d}{\partial c_{12}} & \frac{\partial d}{\partial c_{22}} & \cdots & \frac{\partial d}{\partial c_{p2}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial d}{\partial c_{1q}} & \frac{\partial d}{\partial c_{2q}} & \cdots & \frac{\partial d}{\partial c_{pq}}
\end{array}\right]
\end{align*}
\begin{align*}
& \frac{\partial{\bf A}^{\prime}{\bf A}}{\partial{\bf A}}=2{\bf A}^{\prime}\\
& \frac{\partial{\bf A}^{\prime}{\bf B}}{\partial{\bf B}}=\frac{\partial{\bf B}^{\prime}{\bf A}}{\partial{\bf B}}={\bf A}^{\prime} & & \text{(provided }m=n)\\
& \frac{\partial{\bf \left({\bf A}^{\prime}{\bf B}\right)^{2}}}{\partial{\bf A}}=2{\bf A}^{\prime}{\bf B}{\bf B}^{\prime} & & \text{(provided }m=n)\\
& \frac{\partial{\bf C}{\bf A}}{\partial{\bf A}}={\bf C} & & \text{(provided }q=n)\\
& \frac{\partial{\bf A}^{\prime}{\bf C}}{\partial{\bf A}}={\bf C}^{\prime} & & \text{(provided }p=n)\\
& \frac{\partial{\bf A}^{\prime}{\bf C}{\bf A}}{\partial{\bf A}}={\bf A}^{\prime}\left({\bf C}+{\bf C}^{\prime}\right) & & \text{(provided }n=p=q)
\end{align*}
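As a sanity check on the first result, a central-difference approximation recovers $\frac{\partial{\bf A}^{\prime}{\bf A}}{\partial{\bf A}}=2{\bf A}^{\prime}$ at a particular point; the vector $(1, 2, 3)$ below is an arbitrary choice:
```r
# Numerical check of d(A'A)/dA = 2A' at a = (1, 2, 3)
a <- c(1, 2, 3)
f <- function(a) sum(a * a)  # the scalar A'A
eps <- 1e-6
grad <- sapply(seq_along(a), function(i) {
  e <- replace(numeric(length(a)), i, eps)
  (f(a + e) - f(a - e)) / (2 * eps)  # central difference in coordinate i
})
grad   # approximately 2 4 6
2 * a  # the elements of 2A'
```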
A random matrix contains elements that are random variables.
Thus, the response vector \begin{align*} {\bf Y} & =\left[\begin{array}{c} Y_{1}\\ Y_{2}\\ \vdots\\ Y_{n} \end{array}\right] \end{align*} is a random vector since the $Y_i$ elements are random variables.
The expected value of ${\bf Y}$ is a matrix (or vector) that has
elements that are the expected values of the elements of ${\bf Y}$. Thus,
\begin{align*}
{\bf E}\left[{\bf Y}\right] & =\left[\begin{array}{c}
E\left[Y_{1}\right]\\
E\left[Y_{2}\right]\\
\vdots\\
E\left[Y_{n}\right]
\end{array}\right]
\end{align*}
When working with random vectors, we will be interested in the variance
of the individual elements
\begin{align*}
Var\left[Y_{i}\right]
\end{align*}
along with the covariance between pairs of elements
\begin{align*}
Cov\left[Y_{i},Y_{j}\right] & \text{ }i\ne j.
\end{align*}
All of these variances and covariances are given in the variance-covariance matrix, or simply the covariance matrix:
\begin{align*}
{\bf Cov}\left[{\bf Y}\right] & =\left[\begin{array}{cccc}
Var\left[Y_{1}\right] & Cov\left[Y_{1},Y_{2}\right] & \cdots & Cov\left[Y_{1},Y_{n}\right]\\
Cov\left[Y_{2},Y_{1}\right] & Var\left[Y_{2}\right] & \cdots & Cov\left[Y_{2},Y_{n}\right]\\
\vdots & \vdots & \ddots & \vdots\\
Cov\left[Y_{n},Y_{1}\right] & Cov\left[Y_{n},Y_{2}\right] & \cdots & Var\left[Y_{n}\right]
\end{array}\right]
\end{align*}
Note that ${\bf Cov}\left[{\bf Y}\right]$ is a symmetric matrix since
$Cov\left[Y_{i},Y_{j}\right]=Cov\left[Y_{j},Y_{i}\right]$.
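A minimal R sketch: simulate many draws of a random vector (independent standard normals, an arbitrary choice) and estimate its expected value and covariance matrix:
```r
set.seed(1)
Y <- matrix(rnorm(1000 * 3), ncol = 3)  # each row is one draw of a 3 x 1 random vector
colMeans(Y)     # estimates E[Y], here approximately (0, 0, 0)
S <- cov(Y)     # estimated variance-covariance matrix
isSymmetric(S)  # TRUE: Cov[Y_i, Y_j] = Cov[Y_j, Y_i]
```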