Paul E. Johnson, CRMDA <pauljohn@ku.edu>

**R User Guide**. Please visit http://pj.freefaculty.org/guides/Rcourse for updates.

Keywords: R, vectors, data structures

Sep. 14 2017

Abstract

All about matrices in R!

Here is an example with 3 rows and 3 columns. I’ve assigned the values 1–9 to the cells for future reference.

```
col_1 col_2 col_3
row_1 1 4 7
row_2 2 5 8
row_3 3 6 9
```

In math books, the rows are indexed by \(i = {1, 2, \ldots, m}\) and the columns are indexed by \(j = {1, 2, \ldots, n}\). That is to say, the matrix is \(m \times n\), meaning \(m\) rows and \(n\) columns.

In Data Science examples, that might be confusing. If we have, say, 3 predictors in a matrix, we are used to referring to the rows as observations for individual cases. I’ve numbered the rows \(1\) through \(n\). For the sample size, from a very young age, I was taught to use \(N\) (and if I’m not allowed capital letters, \(n\)).

```
pred_1 pred_2 pred_3
1 1 n+1 2n+1
2 2 n+2 2n+2
3 3 n+3 2n+3
...
n n 2n 3n
```

Hence, although in math the author would want to refer to \(m\) rows and \(n\) columns, we often avoid that in data science and refer to \(n\) rows and \(p\) columns. This is just a matter of habit.

I don’t have the habit of using special letters and names for matrices. I know some people who insist that all matrices must be named with CAPITAL LETTERS. I know one R user who names every matrix object with suffix “.mat” just to remember what she has.

I think naming everything as “*.mat” makes code somewhat verbose.

I do think that using capital letters for matrices is nice and I wish I remembered to do that more regularly.

```
x <- seq(1L, 27L, 1L)
dim(x) <- c(9, 3)
x
```

```
[,1] [,2] [,3]
[1,] 1 10 19
[2,] 2 11 20
[3,] 3 12 21
[4,] 4 13 22
[5,] 5 14 23
[6,] 6 15 24
[7,] 7 16 25
[8,] 8 17 26
[9,] 9 18 27
```

Hm. Why did that happen? Assigning a `dim` attribute converted the vector into a matrix, and the 27 values filled the 9 × 3 grid column by column.

Now that x is a matrix, we can add names to the dimensions, either as

```
rownames(x) <- letters[1:9]
colnames(x) <- paste0("col", 1:3)
x
```

```
col1 col2 col3
a 1 10 19
b 2 11 20
c 3 12 21
d 4 13 22
e 5 14 23
f 6 15 24
g 7 16 25
h 8 17 26
i 9 18 27
```

or the equivalent

```
x <- seq(1L, 27L, 1L)
dim(x) <- c(9, 3)
dimnames(x) <- list(letters[1:9], paste0("col", 1:3))
x
```

```
col1 col2 col3
a 1 10 19
b 2 11 20
c 3 12 21
d 4 13 22
e 5 14 23
f 6 15 24
g 7 16 25
h 8 17 26
i 9 18 27
```

`cbind` and `rbind` are discussed in the data_structures-vectors note.

**The `matrix()` function**

The `matrix()` function will fill the columns of a matrix from top to bottom, left to right. In other words, it is exactly the same as re-dimensioning the vector as demonstrated above.

```
y <- seq(1L, 27L, 1L)
matrix(y, ncol = 3)
```

```
[,1] [,2] [,3]
[1,] 1 10 19
[2,] 2 11 20
[3,] 3 12 21
[4,] 4 13 22
[5,] 5 14 23
[6,] 6 15 24
[7,] 7 16 25
[8,] 8 17 26
[9,] 9 18 27
```

`y`

```
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26 27
```

There are other benefits to using the matrix function, however.

- Wrap together the assignment of names with a single command

```
y <- seq(1L, 27L, 1L)
matrix(y, ncol = 3, dimnames = list(letters[1:9], paste0("col", 1:3)))
```

```
col1 col2 col3
a 1 10 19
b 2 11 20
c 3 12 21
d 4 13 22
e 5 14 23
f 6 15 24
g 7 16 25
h 8 17 26
i 9 18 27
```

`y`

```
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26 27
```

- Can alter the order in which values are inserted into the matrix with `byrow = TRUE`.

```
y <- seq(1L, 27L, 1L)
matrix(y, nrow = 3, byrow = TRUE,
dimnames = list(paste0("row", 1:3), letters[1:9]))
```

```
a b c d e f g h i
row1 1 2 3 4 5 6 7 8 9
row2 10 11 12 13 14 15 16 17 18
row3 19 20 21 22 23 24 25 26 27
```

`y`

```
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26 27
```

Note that the resulting matrix is the transpose (`t`) of x, except that the names were not transposed similarly:

`t(x)`

```
a b c d e f g h i
col1 1 2 3 4 5 6 7 8 9
col2 10 11 12 13 14 15 16 17 18
col3 19 20 21 22 23 24 25 26 27
```

- Take out the element in the i’th row, j’th column by itself

`x[2 , 3]`

`[1] 20`

- Take out the i’th row (leave the column index blank: keep all columns)

`x[2, ]`

```
col1 col2 col3
2 11 20
```

- Take out the j’th column (keep all rows)

`x[ , 3]`

```
a b c d e f g h i
19 20 21 22 23 24 25 26 27
```

- Create a new vector that is columns 2 and 3 from row 4.

`x[4, 2:3]`

```
col2 col3
13 22
```

- If dimnames exist, then we can also choose by name, as in

`x["b", "col3"]`

`[1] 20`

`x["b", ]`

```
col1 col2 col3
2 11 20
```

`x[ , "col3"]`

```
a b c d e f g h i
19 20 21 22 23 24 25 26 27
```

**Comment** I believe it is much safer to access matrix elements by name, since we are less likely to make a mistake typing “col3” than we are if we use integer indexes to extract elements.
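To make that concrete, here is a small illustrative sketch (the matrix `m` is hypothetical): name-based access keeps its meaning when the column order changes, while integer indexes silently pick up a different column.

```r
## Hypothetical example: name-based indexing survives column reordering.
m <- matrix(1:6, ncol = 3, dimnames = list(NULL, c("col1", "col2", "col3")))
m2 <- m[, c(3, 1, 2)]   # same data, columns shuffled
m[, 2]                  # integer index: this is col2 here ...
m2[, 2]                 # ... but it is col1 after the shuffle
m[, "col2"]             # name-based: always col2
m2[, "col2"]
```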

`+` and `-` work as expected.

```
y <- matrix(2, nrow = 9, ncol = 3)
x + y
```

```
col1 col2 col3
a 3 12 21
b 4 13 22
c 5 14 23
d 6 15 24
e 7 16 25
f 8 17 26
g 9 18 27
h 10 19 28
i 11 20 29
```

`x - y`

```
col1 col2 col3
a -1 8 17
b 0 9 18
c 1 10 19
d 2 11 20
e 3 12 21
f 4 13 22
g 5 14 23
h 6 15 24
i 7 16 25
```

`*` and `/` conduct “term-wise” calculations:

`y * x`

```
col1 col2 col3
a 2 20 38
b 4 22 40
c 6 24 42
d 8 26 44
e 10 28 46
f 12 30 48
g 14 32 50
h 16 34 52
i 18 36 54
```

`y / x`

```
col1 col2 col3
a 2.0000000 0.2000000 0.10526316
b 1.0000000 0.1818182 0.10000000
c 0.6666667 0.1666667 0.09523810
d 0.5000000 0.1538462 0.09090909
e 0.4000000 0.1428571 0.08695652
f 0.3333333 0.1333333 0.08333333
g 0.2857143 0.1250000 0.08000000
h 0.2500000 0.1176471 0.07692308
i 0.2222222 0.1111111 0.07407407
```

In math books, `*` would be called a “Hadamard” product. In basic statistics, this kind of product does not come up very often. It is symbolized as \(x \circ y\). So far as I know, there is no mathematical name for the `/` used here. Most linear algebra books will say “there is no division of matrices”. But there is in R!
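To be clear, elementwise `/` is not the matrix inverse that the textbooks have in mind; R computes true inverses with `solve()`. A minimal sketch (hypothetical matrix `A`) contrasting the two:

```r
A <- matrix(c(2, 0, 0, 4), ncol = 2)   # a small invertible matrix
Ainv <- solve(A)                        # matrix inverse, not elementwise 1/A
A %*% Ainv                              # recovers the identity matrix
1 / A                                   # elementwise: divides each cell, Inf where A is 0
```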

`t(x)`

```
a b c d e f g h i
col1 1 2 3 4 5 6 7 8 9
col2 10 11 12 13 14 15 16 17 18
col3 19 20 21 22 23 24 25 26 27
```

**Matrix multiplication: `%*%`**

Matrix multiplication is performed with the `%*%` operator, as in `X %*% Y`. The matrices must be conformable: the number of columns of `X` must equal the number of rows of `Y`.
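A minimal sketch (with small hypothetical matrices) of `%*%` in action: a 2 × 3 times a 3 × 2 yields a 2 × 2, and the inner dimensions (3 and 3) must agree.

```r
A <- matrix(1:6, nrow = 2)   # 2 x 3, filled column by column
B <- matrix(1:6, nrow = 3)   # 3 x 2
A %*% B                      # a 2 x 2 result
```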

For square matrices, `diag(x)` will take the main diagonal and create a new vector with it.

```
set.seed(234234)
Z <- matrix(rnorm(9), ncol = 3)
Z
```

```
[,1] [,2] [,3]
[1,] -0.1308295 -0.4879708 0.1348567
[2,] -0.6777994 -0.1845969 0.1350134
[3,] 0.1435791 0.5976032 0.8004994
```

```
Zdiag <- diag(Z)
Zdiag
```

`[1] -0.1308295 -0.1845969 0.8004994`

Surprisingly, if `x` is a vector, then `diag()` will create a matrix with that vector on the main diagonal.

`diag(Zdiag)`

```
[,1] [,2] [,3]
[1,] -0.1308295 0.0000000 0.0000000
[2,] 0.0000000 -0.1845969 0.0000000
[3,] 0.0000000 0.0000000 0.8004994
```

I say surprisingly because I’ve had very inconvenient errors caused by the flexibility of these functions.

The `lower.tri()` function returns a TRUE/FALSE matrix:

`lower.tri(Z)`

```
[,1] [,2] [,3]
[1,] FALSE FALSE FALSE
[2,] TRUE FALSE FALSE
[3,] TRUE TRUE FALSE
```

`lower.tri(Z, diag = TRUE)`

```
[,1] [,2] [,3]
[1,] TRUE FALSE FALSE
[2,] TRUE TRUE FALSE
[3,] TRUE TRUE TRUE
```

and we can use the returned matrix to select a sub-matrix as follows.

`Z[lower.tri(Z, diag = TRUE)]`

`[1] -0.1308295 -0.6777994 0.1435791 -0.1845969 0.5976032 0.8004994`

Note that the return from this is a vector, which is called a “vech”. It is a vector that could reproduce the matrix, if we filled in a matrix with the vector.
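As a sketch of that claim, here is one way (assuming a symmetric matrix, which is the usual setting for a vech) to rebuild the full matrix from the vech:

```r
## Rebuild a symmetric 3 x 3 matrix from its vech (lower triangle + diagonal).
S <- matrix(c(1, 2, 3, 2, 4, 5, 3, 5, 6), ncol = 3)   # a symmetric example
v <- S[lower.tri(S, diag = TRUE)]                      # the vech: 1 2 3 4 5 6
S2 <- matrix(0, 3, 3)
S2[lower.tri(S2, diag = TRUE)] <- v                    # refill the lower triangle
S2[upper.tri(S2)] <- t(S2)[upper.tri(S2)]              # mirror into the upper triangle
identical(S2, S)                                       # TRUE
```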

In many statistical procedures, it is necessary to calculate a product formed as the transpose of one matrix times another matrix. Often, this is

\[ X^T X \]

Although there are good reasons why it is wrong to make that calculation most of the time (numerical rounding error), it is frequently done and sometimes unavoidable.

Note that the result from \(X^T X\) is symmetric:

`t(x) %*% x`

```
col1 col2 col3
col1 285 690 1095
col2 690 1824 2958
col3 1095 2958 4821
```

Creating the product that way wastes a lot of CPU effort.

It is never actually necessary to create `t(x)`; the computer can find the pieces it needs without allocating memory and copying `x` in that way. The result is symmetric, which means we only need to calculate the upper or lower triangle and then copy it across.

The `crossprod()` and `tcrossprod()` functions are included within R because the manual calculation of `t(x) %*% x` is the single most inefficient thing that most R novices do.

The syntax to calculate \(X^T X\) is

`crossprod(x)`

```
col1 col2 col3
col1 285 690 1095
col2 690 1824 2958
col3 1095 2958 4821
```

When the product is \(X^T\) times \(X\), we only need the argument `X`, but for the transpose of \(X\) times \(Z\), for example, the syntax would be

`crossprod(X, Z)`

Similarly, if one needed to calculate \(X X^T\) or \(X Y^T\), the `tcrossprod()` function should be used.

```
tcrossprod(X)
## or
tcrossprod(X, Y)
```

R functions send the “real calculations” to general-purpose matrix algebra libraries written in C and Fortran. These libraries, such as “LAPACK”, are part of an internationally promulgated standard for numerical accuracy.

R has some functions that were written in the older style with less accurate matrix algebra, such as `princomp()`. The documentation for that function mentions that it is not using the currently recommended method:

“The calculation is done using ‘eigen’ on the correlation or covariance matrix, as determined by ‘cor’. This is done for compatibility with the S-PLUS result. A preferred method of calculation is to use ‘svd’ on ‘x’, as is done in ‘prcomp’.”
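A quick, illustrative comparison of the two functions (hypothetical random data): the component standard deviations differ only by the divisor convention, since `princomp()` divides by \(n\) while `prcomp()` divides by \(n - 1\).

```r
set.seed(123)
dat <- matrix(rnorm(200), ncol = 4)   # 50 observations, 4 variables
p1 <- princomp(dat)   # eigen on the covariance matrix (older method)
p2 <- prcomp(dat)     # svd on the centered data (preferred)
## the ratio of standard deviations is sqrt((n - 1) / n) for every component
p1$sdev / p2$sdev
```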

R has a general policy of “demoting” objects to the simplest storage format possible.

If the extracted material has more than one row and more than one column, the result remains a matrix. Observe that `x[ , 2:3]` is still a matrix, for example.

```
x2 <- x[ , 2:3]
is.matrix(x2)
```

`[1] TRUE`

If the extracted material has only 1 row or 1 column, R’s demotion policy will convert the result into a vector, rather than a matrix.

```
x2 <- x[ , 2]
is.matrix(x2)
```

`[1] FALSE`

`is.vector(x2)`

`[1] TRUE`

In all of the R programming I’ve done, the worst, most frustrating problem is that R demotes a one-column (or one-row) matrix to a vector, and some calculations behave very differently. It is not always a problem, but in that 5% of cases, it is very frustrating. Please see “R’s drop ‘gotcha’ and the diag ‘curse’”, http://pj.freefaculty.org/blog/?p=274.

This usually happens to me if I’m constructing a vector of column names. Say I’m aiming for a result like `indx <- c("col1", "col2", "col3")`. Then I select vectors as

`x[ , indx]`

As long as `indx` has 2 or more elements, no problem. But if `indx` has just one element, then the result becomes a vector, and then all hell can break loose (see the blog post).
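A compact sketch of the gotcha, using a matrix built like the `x` above: the same indexing code returns a matrix or a vector depending only on how many names `indx` happens to hold.

```r
x <- matrix(1:27, ncol = 3, dimnames = list(letters[1:9], paste0("col", 1:3)))
indx <- c("col1", "col2")
is.matrix(x[, indx])   # TRUE: two columns survive as a matrix
indx <- "col2"
is.matrix(x[, indx])   # FALSE: one column is demoted to a vector
```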

**drop = FALSE**

The way to avoid this is somewhat awkward. Rather than writing simply `x[ , indx]`, insert a 3rd argument, **drop = FALSE**, as in `x[ , indx, drop = FALSE]`.

```
x3 <- x[ , 2, drop = FALSE]
x3
```

```
col2
a 10
b 11
c 12
d 13
e 14
f 15
g 16
h 17
i 18
```

`dim(x3)`

`[1] 9 1`

`is.matrix(x3)`

`[1] TRUE`

`xb <- x["b", ]`

Note that although `x` had dimensions (9, 3), the new object `xb` has no dimensions.

`dim(x)`

`[1] 9 3`

`dim(xb)`

`NULL`

xb is an R vector now.

Mathematically, a \(3 \times 1\) column times a \(1 \times 3\) row ought to be a \(3 \times 3\) matrix, but it is not what we get if we do the obvious thing:

`c(1, 2, 3) %*% xb`

```
[,1]
[1,] 84
```

We are getting back the equivalent of the inner product

`xb %*% c(1, 2, 3)`

```
[,1]
[1,] 84
```

If you did want the product, and you want to make sure that \(xb\) is treated as a row vector, you’d need to explicitly ask for an “outer” product, as in

`outer(c(1,2,3), xb)`

```
col1 col2 col3
[1,] 2 11 20
[2,] 4 22 40
[3,] 6 33 60
```

```
## same as
c(1,2,3) %o% xb
```

```
col1 col2 col3
[1,] 2 11 20
[2,] 4 22 40
[3,] 6 33 60
```

We could also “transpose” xb back from a column into a row, I suppose.

`c(1,2,3) %*% t(xb)`

```
col1 col2 col3
[1,] 2 11 20
[2,] 4 22 40
[3,] 6 33 60
```

Compare the following

```
x23 <- x[ , c(2, 3)]
x23
```

```
col2 col3
a 10 19
b 11 20
c 12 21
d 13 22
e 14 23
f 15 24
g 16 25
h 17 26
i 18 27
```

`diag(x23)`

`[1] 10 20`

```
x3 <- x[ , 3]
x3
```

```
a b c d e f g h i
19 20 21 22 23 24 25 26 27
```

`diag(x3)`

```
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 19 0 0 0 0 0 0 0 0
[2,] 0 20 0 0 0 0 0 0 0
[3,] 0 0 21 0 0 0 0 0 0
[4,] 0 0 0 22 0 0 0 0 0
[5,] 0 0 0 0 23 0 0 0 0
[6,] 0 0 0 0 0 24 0 0 0
[7,] 0 0 0 0 0 0 25 0 0
[8,] 0 0 0 0 0 0 0 26 0
[9,] 0 0 0 0 0 0 0 0 27
```

In the rockchalk package, I have some “convenience” functions that make it easier to create matrices that represent correlation and covariance. These are called `lazyCor()` and `lazyCov()`.

- Convert a vech to fill in the lower and upper triangles

```
library(rockchalk)
rho <- lazyCor(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6))
rho
```

```
[,1] [,2] [,3] [,4]
[1,] 1.0 0.1 0.2 0.3
[2,] 0.1 1.0 0.4 0.5
[3,] 0.2 0.4 1.0 0.6
[4,] 0.3 0.5 0.6 1.0
```

(A “**vech**” here is the strictly-lower-triangle vector. Be aware that many textbooks define the vech to include the diagonal elements as well, while “**vec**” stacks the entire matrix into one long vector.)

- Create a covariance matrix.

Covariance is difficult to conceptualize and create because it has no human-understandable scale. We generally do better if we look at standard deviation and correlation matrices, from which covariance can be created. The `lazyCov()` function will do this. I create a standard deviation vector for 4 variables, then use the previously constructed correlation matrix `rho`.

```
mysd <- c(10, 20, 30, 100)
Sigma <- lazyCov(Rho = rho, Sd = mysd)
Sigma
```

```
[,1] [,2] [,3] [,4]
[1,] 100 20 60 300
[2,] 20 400 240 1000
[3,] 60 240 900 1800
[4,] 300 1000 1800 10000
```

- Draw observations from a multivariate-normal distribution.

```
mymu <- c(0, 0, 5, 20)
mvrnorm(n = 10, mu = mymu, Sigma = Sigma)
```

```
[,1] [,2] [,3] [,4]
[1,] 5.4362782 -2.516899 -5.2785903 82.76452
[2,] 18.6922374 8.807844 24.2502457 53.01847
[3,] 14.9030693 -4.908145 31.2041249 162.72298
[4,] -20.4555036 -17.764004 24.5528482 -73.22643
[5,] -11.1775987 -30.606475 0.3725076 -98.00372
[6,] -0.8261201 -9.353697 2.8778364 -37.06540
[7,] 10.2874879 5.379217 23.9626715 41.74143
[8,] 7.4450918 20.293924 12.1095751 165.67970
[9,] 4.3242368 -23.882448 -15.6870554 -73.89816
[10,] -7.9442281 46.442709 16.0627272 119.29198
```

In that resulting matrix, each row is 1 draw from an MVN process in which the mean vector is `mymu`, \((0, 0, 5, 20)\), and the covariance matrix is given by `Sigma`.
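As an illustrative sanity check (this sketch assumes `mvrnorm` from the MASS package; rockchalk provides a similar function of the same name), a large sample should approximately reproduce `mymu` and `Sigma`:

```r
library(MASS)   # for mvrnorm
set.seed(12345)
mymu  <- c(0, 0, 5, 20)
Sigma <- matrix(c(100,   20,   60,   300,
                   20,  400,  240,  1000,
                   60,  240,  900,  1800,
                  300, 1000, 1800, 10000), ncol = 4)
X <- mvrnorm(n = 100000, mu = mymu, Sigma = Sigma)
round(colMeans(X), 1)   # close to mymu
round(cov(X))           # close to Sigma
```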

The stats book says the slope estimates are calculated as \[ \hat{\beta} = (X^{T}X)^{-1} X^Ty \]

Similarly, textbooks usually discuss the problem that \(X^{T}X\) might be “difficult (or impossible) to invert” if \(X^{T}X\) is nearly (or exactly) singular. That concern is usually manifested by the observation that the calculation of the inverse pre-supposes the calculation of the determinant, \(det(X^TX)\) (same as \(|X^TX|\)), and the formation of a quotient

\[ \frac{1}{det(X^TX)} \]

About 10 years ago, it felt like I got hit in the face by a cold, dead fish when I found out that, since the 1970s, calculations have not been done in that way.

In R, run “lm” (with no parentheses) and “lm.fit” (again, no parentheses) to display the source code. Look through there for the calculations the textbooks lead you to expect. All statistics students are surprised that they never see

- determinants being calculated
- \(X^T X\) being formed
- \(X^T X\) being inverted

Similarly, going by the textbook, the “variance-covariance” matrix of the \(\hat{\beta}\) coefficients ought to be

\[ \widehat{\sigma_\varepsilon}^2 (X^T X)^{-1} \]

Under the hood, calculations are never done that way in R. (Neither are they done in most statistics programs in that way).

Rather, R defaults to use the QR decomposition of the X matrix, which means X is numerically reduced into 2 pieces,

\[ X = Q R \]

From those decomposed pieces, we can calculate estimates \(\hat{\beta}\) and their variance without ever forming \(X^T X\) explicitly and the product is never inverted.
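A small sketch (hypothetical data, not the author’s code) showing that the QR route reaches the same \(\hat{\beta}\) as the textbook normal equations, without ever forming \(X^T X\):

```r
set.seed(42)
X <- cbind(1, matrix(rnorm(300), ncol = 3))          # 100 x 4 design matrix
y <- X %*% c(1, 2, 3, 4) + rnorm(100)
beta_qr     <- qr.coef(qr(X), y)                     # solves R b = Q'y by back-substitution
beta_normal <- solve(crossprod(X), crossprod(X, y))  # (X'X)^{-1} X'y, the textbook way
all.equal(as.numeric(beta_qr), as.numeric(beta_normal))
```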

Why? Numerical accuracy. The tendency toward “round off error” in a matrix is summarized by a condition number, \(\kappa\). If we form the product \(X^T X\), the condition number is squared, \(\kappa^2\). If we were then to “solve” that by calculating an inverse matrix, we introduce yet another layer of numerical rounding error.

I learned about this by reading Simon Wood’s *Generalized Additive Models* and a famous numerical linear algebra textbook by Golub and Van Loan named *Matrix Computations*.

I started some notes and worked out the regression estimator; if you want details, please see http://pj.freefaculty.org/guides/stat/Math/Matrix-Decompositions/matrix-decompositions.pdf

```
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.04
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.7.0
LAPACK: /usr/lib/lapack/liblapack.so.3.7.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets base
other attached packages:
[1] rockchalk_1.8.108 crmda_0.43
loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 knitr_1.16 magrittr_1.5
[4] kutils_1.19 splines_3.4.1 MASS_7.3-47
[7] xtable_1.8-2 lattice_0.20-35 minqa_1.2.4
[10] stringr_1.2.0 car_2.1-4 plyr_1.8.4
[13] tools_3.4.1 parallel_3.4.1 nnet_7.3-12
[16] pbkrtest_0.4-7 grid_3.4.1 nlme_3.1-131
[19] mgcv_1.8-18 quantreg_5.33 MatrixModels_0.4-1
[22] htmltools_0.3.6 yaml_2.1.14 lme4_1.1-13
[25] rprojroot_1.2 digest_0.6.12 Matrix_1.2-10
[28] nloptr_1.0.4 evaluate_0.10 rmarkdown_1.6
[31] openxlsx_4.0.17 stringi_1.1.5 compiler_3.4.1
[34] methods_3.4.1 backports_1.1.0 SparseM_1.77
```

Available under Creative Commons license 3.0