Matrices

I give up. I'm forced into a matrix review page.

Grammarian note: It is one matrix, two matrices.

So... Here is a quick review.

I have made a collection of Matrix Identities that I use. Reading them could be helpful in following the otherwise short derivations scattered around this paper.

Numbers

A single number in matrix notation is called a scalar. It can be looked at as a number, as a 1 x 1 matrix, or as a one-element row or column.

Rows

A row (also called a row vector) is just an ordered collection of elements. For example,

[ a  b  c ]

is a row.

If you have two rows of the same length, you can add them by adding the corresponding elements in each row. For example:

[ d e f ] + [ g h i ] = [ d+g  e+h  f+i ]

One can multiply a row by a scalar (number). For example,

2 * [ a b c ] = [ 2a 2b 2c ]

A row may have any number of elements, from one on up.

If Z is a row, Z(i) means the i'th element of that row.
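
If you want to play along at home, here is a minimal sketch of these row operations in Python, assuming the numpy library is available. (numpy is my illustration, not part of this page's notation, and numpy counts elements from 0 where this page counts from 1.)

import numpy as np

Z = np.array([4.0, 5.0, 6.0])   # the row [ 4 5 6 ]
W = np.array([7.0, 8.0, 9.0])   # the row [ 7 8 9 ]

print(Z + W)    # [11. 13. 15.]  element-by-element addition
print(2 * Z)    # [ 8. 10. 12.]  scalar multiplication
print(Z[0])     # 4.0  this is Z(1) in this page's notation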

Columns

A column (also called a column vector) is just like a row, except it is arranged vertically. For example:

[ a ]
[ b ]
[ c ]

Columns can be added, or multiplied by a scalar (number) the same way that rows can:

[ a ]     [ g ]     [ a+g ]
[ b ]  +  [ h ]  =  [ b+h ]
[ c ]     [ i ]     [ c+i ]

    [ a ]    [ 2a ]
2 * [ b ] =  [ 2b ]
    [ c ]    [ 2c ]

A column may have any number of elements, from one on up.

If Y is a column, then Y(i) means the i'th element of the column, counting from the top.
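
The same sketch works for columns (again assuming numpy); a column is just an N x 1 array:

import numpy as np

Y = np.array([[4.0], [5.0], [6.0]])   # a 3 x 1 column

print(Y + Y)      # element-by-element addition
print(2 * Y)      # scalar multiplication
print(Y[1, 0])    # 5.0  this is Y(2) in this page's notation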

Operations on rows and columns

If you have a row and a column with the same number of elements, you can form what's called an "inner product" (also called a "vector product" or a "dot product"). The inner product is the product of the first element of the row with the first element of the column, plus the product of the second elements, and so on. For example:

[ a b c ] [ d ] =   [ a*d + b*e + c*f ]  (or just the scalar)  a*d + b*e + c*f
          [ e ]
          [ f ]

A row and column that have a dot product of zero are called orthogonal vectors.
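
A quick numpy sketch of the inner product (my illustration, assuming numpy):

import numpy as np

row = np.array([1.0, 2.0, 3.0])
col = np.array([4.0, 5.0, 6.0])

print(np.dot(row, col))   # 1*4 + 2*5 + 3*6 = 32.0

# A dot product of zero means the vectors are orthogonal:
print(np.dot(np.array([1.0, 0.0]), np.array([0.0, 1.0])))   # 0.0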

There is also another kind of product, called the "outer product":

[ a ] [ d e f ] =  [ a*d  a*e  a*f ]
[ b ]              [ b*d  b*e  b*f ]
[ c ]              [ c*d  c*e  c*f ]
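
And the outer product in the same assumed numpy terms:

import numpy as np

col = np.array([1.0, 2.0, 3.0])
row = np.array([4.0, 5.0, 6.0])

print(np.outer(col, row))
# [[ 4.  5.  6.]
#  [ 8. 10. 12.]
#  [12. 15. 18.]]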

Matrices

A matrix can be looked at as a column of rows, or as a row of columns.

Examples of matrices (with their sizes):

[ a b c ]   This is a 3 x 3 matrix.
[ d e f ]
[ g h i ]

[ a b c ]   This is a 2 x 3 matrix.
[ d e f ]

[ a b ]     This is a 3 x 2 matrix.
[ c d ]
[ e f ]

[ a ]      This is a 1 x 1 matrix.  Or it is a number, your choice.
           (Or it is a scalar)

[ a b c ]  This is a 1 x 3 matrix, or it is a row, also your choice.

[ a ]      This is a 3 x 1 matrix, or it is a column, whichever way you
[ b ]      want; they are the same thing.
[ c ]

If X is a matrix, X(i,j) means the element at the i'th row and j'th column of the matrix. So, in the matrix X where X =

[ a b c ]         X(2, 3) = f
[ d e f ]         X(3, 2) = h
[ g h i ]
[ j k l ]
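
In numpy terms (my illustration; remember numpy counts rows and columns from 0 where this page counts from 1):

import numpy as np

X = np.array([[ 1,  2,  3],
              [ 4,  5,  6],
              [ 7,  8,  9],
              [10, 11, 12]])

print(X[1, 2])   # 6   this is X(2,3) in this page's notation
print(X[2, 1])   # 8   this is X(3,2) in this page's notation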

A matrix is called a square matrix if it has the same number of rows as columns.

A zero matrix is a matrix where all the elements are zeros. For example:

   [ 0 0 0 ] is a zero matrix.
   [ 0 0 0 ]
   [ 0 0 0 ]

A zero matrix serves many of the same functions in matrix arithmetic that 0 does in regular arithmetic, and is the additive identity. It is often written as just "0" in matrix equations.

The identity matrix is the multiplicative identity, and is a square matrix that has ones on the diagonal, and zeros everywhere else. For example:

[ 1 0 0 ]  is an identity matrix.
[ 0 1 0 ]
[ 0 0 1 ]

The identity matrix is usually written "I".

An identity matrix serves many of the same functions in matrix arithmetic that 1 does in regular arithmetic.
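
A sketch of both identities in numpy (my illustration; np.zeros and np.eye build zero and identity matrices):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
Z = np.zeros((2, 2))   # a zero matrix
I = np.eye(2)          # an identity matrix

print(np.array_equal(A + Z, A))   # True: 0 is the additive identity
print(np.array_equal(I @ A, A))   # True: I is the multiplicative identity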

The diagonal of a matrix consists of the elements that have identical row and column numbers (the elements X(i,i)). For example:

The diagonal of [ a b c ] is a, e, and i.
                [ d e f ]
                [ g h i ]

A diagonal matrix is one that has non-zero elements only on the diagonal.

[ * 0 0 0 ]  Diagonal matrix.
[ 0 * 0 0 ]
[ 0 0 * 0 ]
[ 0 0 0 * ]
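
In numpy, np.diag goes both ways (again, just an illustration):

import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(np.diag(X))           # [1 5 9]  extracts the diagonal
print(np.diag([1, 2, 3]))   # builds a 3 x 3 diagonal matrix from those values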

A block diagonal matrix is like a diagonal matrix, except that the non-zero elements are arranged in square blocks along the diagonal. Example: (Where * means a non-zero element.)

[ * 0 0 0 0 0 0 0 ]
[ 0 * * 0 0 0 0 0 ]
[ 0 * * 0 0 0 0 0 ]
[ 0 0 0 * * * 0 0 ]  This is a block diagonal matrix.
[ 0 0 0 * * * 0 0 ]
[ 0 0 0 * * * 0 0 ]
[ 0 0 0 0 0 0 * * ]
[ 0 0 0 0 0 0 * * ]

A band matrix has non-zero elements only near the diagonal of the matrix, and nowhere else. The width of the band is called the band width of the matrix. For example:

[ * * * 0 0 0 0 0 ]
[ * * * * 0 0 0 0 ]
[ * * * * * 0 0 0 ]
[ 0 * * * * * 0 0 ]  This is a band matrix.
[ 0 0 * * * * * 0 ]
[ 0 0 0 * * * * * ]
[ 0 0 0 0 * * * * ]
[ 0 0 0 0 0 * * * ]

A matrix is called sparse if most of the elements in it are zero.

[ 0 0 0 * 0 0 0 0 ]
[ 0 * 0 0 0 0 * 0 ]
[ 0 0 0 0 * 0 0 0 ]
[ 0 0 * 0 0 0 0 0 ]  This is a sparse matrix.
[ * 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 * ]
[ 0 0 * 0 0 * 0 0 ]
[ 0 0 0 * 0 0 0 0 ]

A matrix is called dense if it is not sparse.

The transpose of a matrix "N" (written N') is just a matrix "P" such that N(i,j) = P(j,i). For example:

transpose ( [ a b c ] ) =  [ a  d ]   (The matrix has just been
            [ d e f ]      [ b  e ]   reflected across the
                           [ c  f ]   diagonal.)
Or with ' notation:
if A = [ a b c ]  then A' = [ a d ]
       [ d e f ]            [ b e ]
                            [ c f ]

If a matrix M has the property that

M'M = I

then the matrix is called an orthogonal matrix.

The transpose of a product (AB)' is the product of the individual transposes in reverse order. (B'A')

The transpose of a row is a column, and the transpose of a column is a row. Example:

[ a  b  c]' = [ a ]
              [ b ]
              [ c ]
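
A numpy sketch of transposes, including the reverse-order rule (my illustration, assuming numpy):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # 2 x 3
B = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])        # 3 x 2

print(A.T)                                 # the 3 x 2 transpose of A
print(np.allclose((A @ B).T, B.T @ A.T))   # True: (AB)' = B'A'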

It is usually easier to write column matrices using transposes as

[a b c]'

than as

[a]
[b]
[c]

If a matrix is equal to its own transpose, it is called a symmetric matrix. (See below)

A number is called a scalar when talking about matrices, and can often be thought of as a 1 x 1 matrix.

A scalar multiplied by a matrix is a conventional way of indicating that EVERY element of the matrix is multiplied by that number. In more formal terms, the matrix is multiplied by a diagonal matrix where every diagonal element has been set to that scalar.

The transpose of a scalar is just that number.

A symmetric matrix is a square matrix equal to its transpose. For example:

[ a b c d ]  is a symmetric matrix.
[ b e f g ]     (Note that it looks like it was reflected
[ c f h i ]     across the diagonal.)
[ d g i j ]

A 1x1 matrix (just a number) or a scalar is symmetric.

A symmetric matrix has the property that A = A'.

A matrix with non-zero entries only on and below the diagonal, or only on and above the diagonal, is called a (lower, upper) triangular matrix. If the diagonal in those cases consists only of 1's, then the matrix is unit triangular. Examples:

upper triangular            lower triangular
 [ * * * * * ]               [ * 0 0 0 0 ]
 [ 0 * * * * ]               [ * * 0 0 0 ]
 [ 0 0 * * * ]               [ * * * 0 0 ]
 [ 0 0 0 * * ]               [ * * * * 0 ]
 [ 0 0 0 0 * ]               [ * * * * * ]

upper unit triangular     lower unit triangular
 [ 1 * * * * ]               [ 1 0 0 0 0 ]
 [ 0 1 * * * ]               [ * 1 0 0 0 ]
 [ 0 0 1 * * ]               [ * * 1 0 0 ]
 [ 0 0 0 1 * ]               [ * * * 1 0 ]
 [ 0 0 0 0 1 ]               [ * * * * 1 ]

Matrices of the same size may be added by making a new matrix of the same size, whose elements are the sums of the corresponding elements from the matrices being added. For example:

[ a b c ] + [ h i j ] = [ a+h b+i c+j ]
[ d e f ]   [ k l m ]   [ d+k e+l f+m ]

A matrix may be multiplied by a scalar. The result is the same as multiplying each element by that scalar. For example:

3 * [ a b c ] = [ 3 0 ] * [ a b c ] = [ 3*a 3*b 3*c ]
    [ d e f ]   [ 0 3 ]   [ d e f ]   [ 3*d 3*e 3*f ]
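
A quick check of the scalar-as-diagonal-matrix view, in the same assumed numpy terms:

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(np.allclose(3 * A, (3 * np.eye(2)) @ A))   # True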

Matrix Multiplication

Multiplication of matrices is not that simple. The number of columns of the first matrix must be equal to the number of rows of the second. The element (I, J) of the new matrix is the dot product of the I'th row of the first matrix and the J'th column of the second. The result of the multiplication has as many rows as the first matrix, and as many columns as the second. Example:

[ a b c ]  *  [ g h ] = [ a*g+b*i+c*k   a*h+b*j+c*l ]
[ d e f ]     [ i j ]   [ d*g+e*i+f*k   d*h+e*j+f*l ]
              [ k l ]

This can be looked at in several ways. Element (I,J) can be looked at as the dot product of the I'th row of the first matrix with the J'th column of the second:

[a b c] * [g h] = [ [a b c]*[g] | [a b c]*[h] ] = [ a*g+b*i+c*k   a*h+b*j+c*l ]
[d e f]   [i j]   [         [i] |         [j] ]   [ d*g+e*i+f*k   d*h+e*j+f*l ]
          [k l]   [         [k] |         [l] ]
                  [ ____________+____________ ]
                  [ [d e f]*[g] | [d e f]*[h] ]
                  [         [i] |         [j] ]
                  [         [k] |         [l] ]

Or it can be looked at as a sum of outer products: the product is the sum, over I, of the outer product of the I'th column of the first matrix with the I'th row of the second. (This is not a typographical error... it really is column I of the first matrix forming an outer product with row I of the second.) For example:

[a b c] * [g h]
[d e f]   [i j]
          [k l]
=
[a]*[g h]   + [b]*[i j]   + [c]*[k l]
[d]           [e]           [f]
 =
[ a*g a*h ] + [ b*i b*j ] + [ c*k c*l ]  =  [ a*g+b*i+c*k   a*h+b*j+c*l ]
[ d*g d*h ]   [ e*i e*j ]   [ f*k f*l ]     [ d*g+e*i+f*k   d*h+e*j+f*l ]

As you can see in this example, the two definitions give the same result.
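
Here is a sketch, assuming numpy, that checks the two views against each other:

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # 2 x 3
B = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])        # 3 x 2

dot_view   = A @ B   # the usual row-dot-column definition

# The sum-of-outer-products definition:
outer_view = sum(np.outer(A[:, i], B[i, :]) for i in range(3))

print(np.allclose(dot_view, outer_view))   # True: the two views agree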

MATRIX MULTIPLICATION DOES NOT COMMUTE except for very special cases. Unlike regular arithmetic, A*B with matrices is not generally the same as B*A. Because matrix multiplication does not commute, we have to distinguish between premultiplying by M (as in M*P) and postmultiplying by M (as in P*M).
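
For example (a numpy sketch, my illustration):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

print(A @ B)   # [[2. 1.] [4. 3.]]
print(B @ A)   # [[3. 4.] [1. 2.]]  not the same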

Magnitudes

Just as numbers can have magnitudes, matrices can have magnitudes. The magnitude of a matrix is called the "determinant". If the determinant of a matrix is 0, then the matrix is singular and has no multiplicative inverse. I won't even get into how determinants are computed... it is ugly, and any good book on matrices goes into it in great detail.

What you need to know is that a determinant is positive, negative, or zero, and if it is zero the matrix is singular and has no inverse.

Geek note: That doesn't actually mean that if the determinant is near zero the matrix is near singular. [See Golub and Van Loan for example.]

Side note: Any matrix with two identical rows is singular. Any matrix that has one row that is a multiple of another is singular. Any matrix with a zero row or zero column is singular.
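
A numpy sketch of determinants and singularity (np.linalg.det; my illustration):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.linalg.det(A))   # about -2.0: non-zero, so A has an inverse

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])  # the second row is twice the first
print(np.linalg.det(S))   # 0.0: S is singular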

Inverses

If a matrix called "A" is not singular, there is another matrix [ inverse (A) ] such that A*inverse(A) is an identity matrix. This is very much like regular multiplication, where a*inverse(a) = 1 unless a = 0. I won't go into where we get the inverse from. Any good book on matrices will go into that in great detail, and it is rather ugly.

Geek note: Often the inverse of matrix A is represented by A', although this can be confused with the use of the single quote for transpose.

What you need to know is that they exist, and they can undo a matrix multiplication. With the matrices we will be dealing with there are special fast ways of forming them.

inverse(A*B) = inverse(B)*inverse(A). (The inverse of a product is the product of the inverses in reverse order.)

In general, matrices can be factored into matrices of special forms that are easy to work with. For example, if a matrix factors to ABCD, and A, B, C, and D are easy to invert, then the inverse of the product is just the product of those inverses in reverse order.
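
A numpy sketch of inverses and the reverse-order rule (np.linalg.inv; my illustration):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[2.0, 0.0],
              [1.0, 2.0]])

print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))   # True: A*inverse(A) = I
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))   # True: reverse order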

Quadratic Forms

A quadratic form is a very special multiplication of matrices. For two matrices A, B, the quadratic form is

A * B * A'

or

A' * B * A

This form shows up all over the descriptions of least squares problems. It is a very important part of the discussion of weights and covariance matrices.

This multiplication always yields a symmetric matrix if B is symmetric.

Geek note:

A' * A

is always symmetric, since it can be looked at as

A' * I * A

and the identity matrix is symmetric.
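
A sketch of the quadratic form in numpy (my illustration, with a made-up symmetric B):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # 2 x 3
B = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])   # a symmetric 3 x 3 matrix

Q = A @ B @ A.T              # the quadratic form A * B * A'
print(np.allclose(Q, Q.T))   # True: the result is symmetric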

Positive Definite

There is a very special property of some matrices. If v is a (column) vector, then the quadratic form v'Bv is a single number. (The matrix multiplications are of sizes (1xN)(NxN)(Nx1), so the result is 1x1.)

If, for any non-zero vector v, v'Bv > 0, then the B matrix is called positive definite.

Geek note: If B is positive definite, it must also be square. (Otherwise the v'Bv multiplication wouldn't have conforming matrices.)

Geek note: A 1 x 1 matrix is positive definite if the only element is positive.

There is a lot of very special mathematics that goes with this, but the short answer is that positive definite matrices are very easy to deal with, and show up all over the place when doing least squares methods.

The most important properties of positive definite matrices are that they have inverses and are easy to deal with numerically.
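
A numpy sketch (my illustration). For a symmetric matrix, positive definite is equivalent to all eigenvalues being positive, which np.linalg.eigvalsh can check:

import numpy as np

B = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric

print(np.all(np.linalg.eigvalsh(B) > 0))   # True: B is positive definite

v = np.array([1.0, -3.0])    # spot check one non-zero vector
print(v @ B @ v)             # 14.0, a positive scalar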

Commentary

Unfamiliarity

A goodly number of forms that one is familiar with from algebra on real numbers look a lot different in matrix form, because of the limits imposed by not being able to commute multiplication.

Putting something over a common denominator:
With real numbers: a/b + c/d = (ad+cb)/bd
This has hidden assumptions of multiplication commuting:
   (a/b)(d/d) equaling (ad/bd) assumes you can rearrange things to your heart's content.

You do have rules like
/(a/b) = b/a
and
a/b*b/c = a/c
but not many of the other familiar rules. (In this notation, /x means the inverse of x, so a/b is shorthand for a*/b.)

Because of this some things look unfamiliar when we play with them in matrix form.

For example, with real numbers:

a/b + c/d =        a/b   +     c/d
          = a/b( /(a/b)  +   /(c/d))c/d   [ An identity above.]
          = a/b(   b/a   +     d/c )c/d   [ /(x/y) = //y*/x = y/x ]
          = a  (/b*b/a   + /b*d/c  )c/d   [ Left distribution ]
          = a  (/b*b/a*c + /b*d/c*c) /d   [ Right distribution ]
          = a  (    /a*c + /b*d    ) /d   [ Canceling ]
          = a  (    c/a +   d/b    ) /d   [ ab = ba ]

But compare that with the matrix equivalent:

a/b + c/d  = a/b ( /(a/b)    +  /(c/d)   )  c/d     [ Identity listed above ]
           = a/b ( //b/a     +  //d/c    )  c/d     [ /(x/y) = //y*/x ]
           = a/b (   b/a     +    d/c    )  c/d     [ //x = x ]
           = a   (/b*b*/a    + /b*d/c    )  c/d     [ Left distribution ]
           = a   (/b*b*/a*c  + /b*d/c*c  )   /d     [ Right distribution ]
           = a   (     /a*c  + /b*d      )   /d     [ Cancellation ]

Which doesn't look particularly familiar to most folk...
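
For the skeptical, here is a numpy sketch (my illustration, with hand-picked invertible matrices) checking the matrix version numerically:

import numpy as np
inv = np.linalg.inv

a = np.array([[2.0, 1.0], [0.0, 1.0]])
b = np.array([[1.0, 1.0], [0.0, 2.0]])
c = np.array([[3.0, 0.0], [1.0, 1.0]])
d = np.array([[1.0, 0.0], [1.0, 3.0]])

lhs = a @ inv(b) + c @ inv(d)                   # a/b + c/d
rhs = a @ (inv(a) @ c + inv(b) @ d) @ inv(d)    # a ( /a*c + /b*d ) /d

print(np.allclose(lhs, rhs))                    # True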


© Copyright 2003-2010 by John Halleck, All Rights Reserved.