## Matrix objects in R

#### A Matrix is a two dimensional array

Here is an example with 3 rows and 3 columns. I’ve assigned the values 1-9 to the cell for future reference.

       col_1 col_2 col_3
row_1   1     4     7
row_2   2     5     8
row_3   3     6     9


### Customary terminology

In math books, the rows are indexed by $$i = {1, 2, \ldots, m}$$ and the columns are indexed by $$j = {1, 2, \ldots, n}$$. That is to say, the matrix is $$m \times n$$, meaning $$m$$ rows and $$n$$ columns.

In Data Science examples, that might be confusing. If we have, say 3 predictors in a matrix, we are used to referring to the rows as observations for individual cases. I’ve numbered the rows $$1$$ through $$n$$. For the sample size, from a very young age, I was taught to use $$N$$ (and if I’m not allowed capital letters, $$n$$).

       pred_1 pred_2   pred_3
1      1       n+1     2n+1
2      2       n+2     2n+2
3      3       n+3     2n+3
...
n      n       2n      3n

Hence, although in math the author would want to refer to $$m$$ rows and $$n$$ columns, we often avoid that in data science and refer to $$n$$ rows and $$p$$ columns. This is just a matter of habit.

### More terminology

I don’t have the habit of using special letters and names for matrices. I know some people who insist that all matrices must be named with CAPITAL LETTERS. I know one R user who names every matrix object with suffix “.mat” just to remember what she has.

I think naming everything as “*.mat" makes code somewhat verbose.

I do think that using capital letters for matrices is nice and I wish I remembered to do that more regularly.

## R methods for creating matrices

#### Re-dimension an array

x <- seq(1L, 27L, 1L)
dim(x) <- c(9, 3)
x
      [,1] [,2] [,3]
[1,]    1   10   19
[2,]    2   11   20
[3,]    3   12   21
[4,]    4   13   22
[5,]    5   14   23
[6,]    6   15   24
[7,]    7   16   25
[8,]    8   17   26
[9,]    9   18   27

Hm. Why did that happen?

Now that x is a matrix, we can add names to the dimensions, either as

rownames(x) <- letters[1:9]
colnames(x) <- paste0("col", 1:3)
x
  col1 col2 col3
a    1   10   19
b    2   11   20
c    3   12   21
d    4   13   22
e    5   14   23
f    6   15   24
g    7   16   25
h    8   17   26
i    9   18   27

or the equivalent

x <- seq(1L, 27L, 1L)
dim(x) <- c(9, 3)
dimnames(x) <- list(letters[1:9], paste0("col", 1:3))
x
  col1 col2 col3
a    1   10   19
b    2   11   20
c    3   12   21
d    4   13   22
e    5   14   23
f    6   15   24
g    7   16   25
h    8   17   26
i    9   18   27

#### Use cbind() or rbind() to fit vectors together

cbind and rbind are discussed the data_structures-vectors note.

#### The matrix() function

The matrix() function will fill the columns of a matrix from top to bottom, left to right. In otherwords, it is exactly the same as re-dimensioning the vector as demonstrated above.

y <- seq(1L, 27L, 1L)
matrix(y, ncol = 3)
      [,1] [,2] [,3]
[1,]    1   10   19
[2,]    2   11   20
[3,]    3   12   21
[4,]    4   13   22
[5,]    5   14   23
[6,]    6   15   24
[7,]    7   16   25
[8,]    8   17   26
[9,]    9   18   27
y
   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
 23 24 25 26 27

There are other benefits to using the matrix function, however.

1. Wrap together the assignment of names with a single command
y <- seq(1L, 27L, 1L)
matrix(y, ncol = 3, dimnames = list(letters[1:9], paste0("col", 1:3)))
  col1 col2 col3
a    1   10   19
b    2   11   20
c    3   12   21
d    4   13   22
e    5   14   23
f    6   15   24
g    7   16   25
h    8   17   26
i    9   18   27
y
   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
 23 24 25 26 27
1. Can alter the order in which values are inserted into the matrix with byrow = TRUE.
y <- seq(1L, 27L, 1L)
matrix(y, nrow = 3, byrow = TRUE,
dimnames = list(paste0("row", 1:3), letters[1:9]))
      a  b  c  d  e  f  g  h  i
row1  1  2  3  4  5  6  7  8  9
row2 10 11 12 13 14 15 16 17 18
row3 19 20 21 22 23 24 25 26 27
y
   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
 23 24 25 26 27

Note that y is the transpose (t) of x, except that the names were not corrected similarly:

t(x)
      a  b  c  d  e  f  g  h  i
col1  1  2  3  4  5  6  7  8  9
col2 10 11 12 13 14 15 16 17 18
col3 19 20 21 22 23 24 25 26 27

## Accessing elements

1. Take out the element in the i’th row, j’th column by itself
x[2 , 3]
 20
1. Take out the i’th row(leave column index blank: keep all columns)
x[2, ]
col1 col2 col3
2   11   20 
1. Take out the j’th column (keep all rows)
x[ , 3]
 a  b  c  d  e  f  g  h  i
19 20 21 22 23 24 25 26 27 
1. Create new row that is columns 2 and 3 from row 4.
x[4, 2:3]
col2 col3
13   22 
1. If dimnames exist, then we can also choose by name, as in
x["b", "col3"]
 20
x["b", ]
col1 col2 col3
2   11   20 
x[   , "col3"]
 a  b  c  d  e  f  g  h  i
19 20 21 22 23 24 25 26 27 

Comment I believe it is much safer to access matrix elements by name, since we are less likely to make a mistake typing “col3” than we are if we use integer indexes to extract elements.

## Common functions to use with matrices

### + and - work as expected.

y <- matrix(2, nrow = 9, ncol = 3)
x + y
  col1 col2 col3
a    3   12   21
b    4   13   22
c    5   14   23
d    6   15   24
e    7   16   25
f    8   17   26
g    9   18   27
h   10   19   28
i   11   20   29
x -y
  col1 col2 col3
a   -1    8   17
b    0    9   18
c    1   10   19
d    2   11   20
e    3   12   21
f    4   13   22
g    5   14   23
h    6   15   24
i    7   16   25

### * and / conduct “term-wise” calculations

y * x
  col1 col2 col3
a    2   20   38
b    4   22   40
c    6   24   42
d    8   26   44
e   10   28   46
f   12   30   48
g   14   32   50
h   16   34   52
i   18   36   54
y / x
       col1      col2       col3
a 2.0000000 0.2000000 0.10526316
b 1.0000000 0.1818182 0.10000000
c 0.6666667 0.1666667 0.09523810
d 0.5000000 0.1538462 0.09090909
e 0.4000000 0.1428571 0.08695652
f 0.3333333 0.1333333 0.08333333
g 0.2857143 0.1250000 0.08000000
h 0.2500000 0.1176471 0.07692308
i 0.2222222 0.1111111 0.07407407

In math books, * would be called a “Hadamard” product. In basic statistics, this kind of product would not come up very often. It is symbolized as $$x \circ y$$. So far as I know, there is no math name for the / used here. Most linear algebra books will say “there is no division of matrices”. But there is in R!

### Transpose

t(x)
      a  b  c  d  e  f  g  h  i
col1  1  2  3  4  5  6  7  8  9
col2 10 11 12 13 14 15 16 17 18
col3 19 20 21 22 23 24 25 26 27

### Matrix multiplication: %*%

X %*% Y

### Diagonal access

For square matrices, diag(x) will take the main diagonal and create a new vector with it.

set.seed(234234)
Z <- matrix(rnorm(9), ncol = 3)
Z
           [,1]       [,2]      [,3]
[1,] -0.1308295 -0.4879708 0.1348567
[2,] -0.6777994 -0.1845969 0.1350134
[3,]  0.1435791  0.5976032 0.8004994
Zdiag <- diag(Z)
Zdiag
 -0.1308295 -0.1845969  0.8004994

Surprisingly, if x is a vector, then diag() will create a matrix with that vector on the main diagonal.

diag(Zdiag)
           [,1]       [,2]      [,3]
[1,] -0.1308295  0.0000000 0.0000000
[2,]  0.0000000 -0.1845969 0.0000000
[3,]  0.0000000  0.0000000 0.8004994

I say surprisingly because I’ve had very inconvenient errors caused by the flexibility of these functions.

### Access the triangles

The lower.tri() function returns a TRUE/FALSE matrix

lower.tri(Z)
      [,1]  [,2]  [,3]
[1,] FALSE FALSE FALSE
[2,]  TRUE FALSE FALSE
[3,]  TRUE  TRUE FALSE
lower.tri(Z, diag = TRUE)
     [,1]  [,2]  [,3]
[1,] TRUE FALSE FALSE
[2,] TRUE  TRUE FALSE
[3,] TRUE  TRUE  TRUE

and we can use the returned matrix to select a sub-matrix as follows.

Z[lower.tri(Z, diag = TRUE)]
 -0.1308295 -0.6777994  0.1435791 -0.1845969  0.5976032  0.8004994

Note that the return from this is a vector, which is called a “vech”. It is a vector that could re-produce the matrix, if we filled in a matrix with the vector.

### crossprod and tcrossprod

In many statistical procedures, it is necessary to calculate a product formed as the transpose of one matrix times another matrix. Often, this is

$X^T X$

Although there is a good argument (with good reasons) why it is wrong to make that calculation most of the time (numerical rounding error), it is frequently done and sometimes unavoidable.

Note that the result from $$X^T X$$ is symmetric:

t(x) %*% x
     col1 col2 col3
col1  285  690 1095
col2  690 1824 2958
col3 1095 2958 4821

Creating the product that way wastes a lot of CPU effort.

1. It is never actually necessary to create t(x) (the computer can find the pieces it needs without allocating memory and copying x in that way

2. The result is symmetric, which means we only need to calculate the upper or lower triangle and then copy up.

The crossprod() functions are included within R because the manual calculation of t(x) %*% x is the single-most-inefficient thing that most R novices do.

and tcrossprod

The syntax to calculate $$X^T X$$ is

crossprod(x)
     col1 col2 col3
col1  285  690 1095
col2  690 1824 2958
col3 1095 2958 4821

When product is $$X^T$$ times $$X$$, we only need the argument X, but if there is a transpose of $$X$$ times $$Z$$, for example, the syntax would be

crossprod(X, Z)

Similarly, if one needed to calculate $$X X^T$$ or $$X Y^T$$ the tcrossprod function should be used.

tcrossprod(X)
## or
tcrossprod(X, T)

### Matrix decompositions

R functions send the “real calculations” to general purpose matrix algebra libraries that are written in C and Fortran. These libraries are, such as “LAPACK”, are part of an internationally promulgated standard for numerical accuracy.

R has some functions that were written in the older style with less accurate matrix algebra, such as princomp(), while the documentation for that function mentiones that it is not using the currently recommended method: “.

The calculation is done using ‘eigen’ on the correlation or covariance matrix, as determined by ‘cor’. This is done for compatibility with the S-PLUS result. A preferred method of calculation is to use ‘svd’ on ‘x’, as is done in ‘prcomp’.

## The “Drop Gotcha”

R has a general policy of “demoting” objects to the simplest storage format possible.

#### Demotion does not matter if $$\ldots$$

the extracted material has more than 2 rows and more than 2 columns. Observe that x[ , 2:3] is still a matrix, for example.

x2 <- x[ , 2:3]
is.matrix(x2)
 TRUE

#### Demotion does matter if $$\ldots$$

the extracted material has only 1 row or 1 column. In that case, R’s demotion policy will convert the result into a vector, rather than a matrix.

x2 <- x[ , 2]
is.matrix(x2)
 FALSE
is.vector(x2)
 TRUE

#### I call that the “drop gotcha”.

In all of the R programming I’ve done, the worst, most frustrating problem is that R demotes a one column (or row) matrix to a vector, and some calculations behave very differently. It is not always a problem, but in that 5% of cases, it is very frustrating. Please see “R’s drop “gotcha” and the diag “curse” http://pj.freefaculty.org/blog/?p=274.

This usually happens to me if I’m constructing a vector of column names. Say I’m aiming for a result like indx <- c("col1", "col2", "col3"). Then I select vectors as

x[ , indx]

As long as indx has 2 or more elements, no problem. But if indx has just one element, then the result becomes a vector, and then all hell can break loose (see the blog post).

** drop = FALSE **

The way to avoid this is somewhat awkward. Rather than writing simply x[ , indx], insert a 3rd argument drop = FALSE, as in x[ , indx, drop = FALSE].

x3 <- x[ , 2, drop = FALSE]
x3
  col2
a   10
b   11
c   12
d   13
e   14
f   15
g   16
h   17
i   18
dim(x3)
 9 1
is.matrix(x3)
 TRUE

#### Example 1. Take a single row out of x

xb <- x["b", ]

Note that although x had dimensions (9,3), the new thing xb has no dimensions.

dim(x)
 9 3
dim(xb)
NULL

xb is an R vector now.

Mathematically, a $$3 \times 1$$ column times a $$1 \times 3$$ row ought to be a $$3 \times 3$$ matrix, but it is not what we get if we do the obvious thing:

c(1, 2, 3) %*% xb
     [,1]
[1,]   84

We are getting back the equivalent of the inner product

xb %*% c(1, 2, 3)
     [,1]
[1,]   84

If you did want the product, and you want to make sure that $$xb$$ is treated as a row vector, you’d need to explicity ask for an “outer” product, as in

outer(c(1,2,3), xb)
     col1 col2 col3
[1,]    2   11   20
[2,]    4   22   40
[3,]    6   33   60
## same as
c(1,2,3) %o% xb
     col1 col2 col3
[1,]    2   11   20
[2,]    4   22   40
[3,]    6   33   60

We could also “transpose” xb back from a column into a row, I suppose.

c(1,2,3) %*% t(xb)
     col1 col2 col3
[1,]    2   11   20
[2,]    4   22   40
[3,]    6   33   60

#### Example 2. And then there’s this frustrating thing in diag

Compare the following

x23 <- x[ , c(2, 3)]
x23
  col2 col3
a   10   19
b   11   20
c   12   21
d   13   22
e   14   23
f   15   24
g   16   25
h   17   26
i   18   27
diag(x23)
 10 20
x3 <- x[ , 3]
x3
 a  b  c  d  e  f  g  h  i
19 20 21 22 23 24 25 26 27 
diag(x3)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]   19    0    0    0    0    0    0    0    0
[2,]    0   20    0    0    0    0    0    0    0
[3,]    0    0   21    0    0    0    0    0    0
[4,]    0    0    0   22    0    0    0    0    0
[5,]    0    0    0    0   23    0    0    0    0
[6,]    0    0    0    0    0   24    0    0    0
[7,]    0    0    0    0    0    0   25    0    0
[8,]    0    0    0    0    0    0    0   26    0
[9,]    0    0    0    0    0    0    0    0   27

## Packages and matrix convenience

### the rockchalk package

In the rockchalk package, I have some “convenience” functions that make it easier to create matrices that represent correlation and covariance. These are called lazyCor() and lazyCov().

#### Examples

1. Convert a vech to fill in the lower and upper triangles
library(rockchalk)
rho <- lazyCor(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6))
rho
     [,1] [,2] [,3] [,4]
[1,]  1.0  0.1  0.2  0.3
[2,]  0.1  1.0  0.4  0.5
[3,]  0.2  0.4  1.0  0.6
[4,]  0.3  0.5  0.6  1.0

(A “vech” is the “strictly lower triangle” vector, while “vec” would include the diagonal elements as well).

1. Create a covariance matrix.

Covariance is difficult to conceptualize and create because it has no human-understandable scale. We generally do better if we look at standard deviation and correlation matrices, from which covariance can be created. The lazyCov function will do this. I create a standard deviation vector for 4 variables, then use the previously constructed correlation matrix rho.

mysd <- c(10, 20, 30, 100)
Sigma <- lazyCov(Rho = rho, Sd = mysd)
Sigma
     [,1] [,2] [,3]  [,4]
[1,]  100   20   60   300
[2,]   20  400  240  1000
[3,]   60  240  900  1800
[4,]  300 1000 1800 10000
1. Draw observations from a multivariate-normal distribution.
mymu <- c(0, 0, 5, 20)
mvrnorm(n = 10, mu = mymu, Sigma = Sigma)
             [,1]       [,2]        [,3]      [,4]
[1,]   5.4362782  -2.516899  -5.2785903  82.76452
[2,]  18.6922374   8.807844  24.2502457  53.01847
[3,]  14.9030693  -4.908145  31.2041249 162.72298
[4,] -20.4555036 -17.764004  24.5528482 -73.22643
[5,] -11.1775987 -30.606475   0.3725076 -98.00372
[6,]  -0.8261201  -9.353697   2.8778364 -37.06540
[7,]  10.2874879   5.379217  23.9626715  41.74143
[8,]   7.4450918  20.293924  12.1095751 165.67970
[9,]   4.3242368 -23.882448 -15.6870554 -73.89816
[10,]  -7.9442281  46.442709  16.0627272 119.29198

In that resulting matrix, each row is 1 draw from an MVN process in which the mean vector is mymu, $$(0, 0, 5, 20)$$ and the covariance matrix is given by Sigma`

## Afterthought: Calculating regression coefficients (or, Does R calculate $$(X^TX)^{-1} X^Ty$$? And Why Not?)

The stats book says the slope estimates are calculated as $\hat{\beta} = (X^{T}X)^{-1} X^Ty$

Similarly, textbooks usually discuss the problem that $$X^{T}X$$ might be “difficult (or impossible) to invert” if $$X^{T}X$$ is nearly (or exactly) singular. That concern is usually manifested by the observation that the calculation of the inverse pre-supposes the calculation of the determinant, $$det(X^TX)$$ (same as \$|X^TX|) and the formation of a quotient

$\frac{1}{det(X^TX)}$

About 10 years ago, it felt like I got hit in the face by a cold, dead fish when I found out that, since the 1970s, calculations have not been done in that way.

In R, run “lm” (with no parentheses) and “lm.fit” (again, no parentheses). Look through there for calculations the textbooks lead you to expece. All statistics students are surprised that they never see

1. determinants being calculated
2. $$X^T X$$ being formed
3. $$X^T X$$ being inverted

Similarly, going by the textbook, the “variance-covariance” matrix of the $$\hat{\beta}$$ coefficients ought to be

$\widehat{\sigma_\varepsilon}^2 (X^T X)^{-1}$

Under the hood, calculations are never done that way in R. (Neither are they done in most statistics programs in that way).

Rather, R defaults to use the QR decomposition of the X matrix, which means X is numerically reduced into 2 pieces,

$X = Q R$

From those decomposed pieces, we can calculate estimates $$\hat{\beta}$$ and their variance without ever forming $$X^T X$$ explicitly and the product is never inverted.

Why? Numerical accuracy. The tendency toward “round off error” in a matrix is summarized by a coefficient $$\kappa$$. If we form the prodcut $$X^T X$$, the tendency toward round off error is $$\kappa^2$$. If we were then to “solve” that by calculating an inverse matrix, we introduce yet another layer of numerical rounding error.

I learned about this by reading Simon Wood’s Generalized Additive Models and a famous numerical linear algebra textbook by Golub and Van Loan named Matrix Computations.

I started some notes and worked out the regression estimator, if you want details please see http://pj.freefaculty.org/guides/stat/Math/Matrix-Decompositions/matrix-decompositions.pdf