---
title: "R variable types: matrices"
subtitle: "nothing"
author:
- name: Paul E. Johnson
affiliation: CRMDA
email: pauljohn@ku.edu
advertise: >
**R User Guide**. Please visit
[http://pj.freefaculty.org/guides/Rcourse](http://pj.freefaculty.org/guides/Rcourse) for updates.
keywords: R, vectors, data structures
abstract: >
All about matrices in R!
date: "`r format(Sys.time(), '%b. %e %Y')`"
output:
crmda::crmda_html_document:
toc: true
toc_depth: 3
highlight: haddock
css: theme/kutils.css
template: theme/guide-boilerplate.html
logoleft: theme/jayhawk.png
logoright: theme/CRMDAlogo-vert.png
---
```{r setup, include=FALSE}
##This Invisible Chunk is required in all CRMDA documents
outdir <- paste0("figures-", "template")
if (!file.exists(outdir)) dir.create(outdir, recursive = TRUE)
knitr::opts_chunk$set(echo=TRUE, comment=NA, fig.path=paste0(outdir, "/p-"))
options(width = 70)
```
## Matrix objects in R
#### A Matrix is a two dimensional array{.bs-callout .bs-callout-blue}
Here is an example with 3 rows and 3 columns. I've assigned
the values 1-9 to the cell for future reference.
```
col_1 col_2 col_3
row_1 1 4 7
row_2 2 5 8
row_3 3 6 9
```
### Customary terminology
In math books, the rows are indexed by $i = {1, 2, \ldots, m}$ and
the columns are indexed by $j = {1, 2, \ldots, n}$. That is to say,
the matrix is $m \times n$, meaning $m$ rows and $n$ columns.
In Data Science examples, that might be confusing. If we have, say 3
predictors in a matrix, we are used to referring to the rows as
observations for individual cases. I've numbered the rows $1$ through
$n$. For the sample size, from a very young age, I was taught to use
$N$ (and if I'm not allowed capital letters, $n$).
```
pred_1 pred_2 pred_3
1 1 n+1 2n+1
2 2 n+2 2n+2
3 3 n+3 2n+3
...
n n 2n 3n
```
Hence, although in math the author would want to refer to $m$ rows and
$n$ columns, we often avoid that in data science and refer to $n$ rows
and $p$ columns. This is just a matter of habit.
### More terminology
I don't have the habit of using special letters and names for
matrices. I know some people who insist that all matrices must
be named with CAPITAL LETTERS. I know one R user who names every
matrix object with suffix ".mat" just to remember what she has.
I think naming everything as "*.mat" makes code somewhat verbose.
I do think that using capital letters for matrices is nice and I
wish I remembered to do that more regularly.
## R methods for creating matrices
#### Re-dimension an array {.bs-callout .bs-callout-green}
```{r}
x <- seq(1L, 27L, 1L)
dim(x) <- c(9, 3)
x
```
Hm. Why did that happen?
Now that x is a matrix, we can add names to the dimensions, either as
```{r}
rownames(x) <- letters[1:9]
colnames(x) <- paste0("col", 1:3)
x
```
or the equivalent
```{r}
x <- seq(1L, 27L, 1L)
dim(x) <- c(9, 3)
dimnames(x) <- list(letters[1:9], paste0("col", 1:3))
x
```
#### Use cbind() or rbind() to fit vectors together{.bs-callout .bs-callout-red}
cbind and rbind are discussed the data_structures-vectors note.
#### The ```matrix()``` function{.bs-callout .bs-callout-gray}
The ```matrix()``` function will fill the columns of a matrix from
top to bottom, left to right. In otherwords, it is exactly the same as
re-dimensioning the vector as demonstrated above.
```{r}
y <- seq(1L, 27L, 1L)
matrix(y, ncol = 3)
y
```
There are other benefits to using the matrix function, however.
1. Wrap together the assignment of names with a single command
```{r}
y <- seq(1L, 27L, 1L)
matrix(y, ncol = 3, dimnames = list(letters[1:9], paste0("col", 1:3)))
y
```
2. Can alter the order in which values are inserted into the matrix with
byrow = TRUE.
```{r}
y <- seq(1L, 27L, 1L)
matrix(y, nrow = 3, byrow = TRUE,
dimnames = list(paste0("row", 1:3), letters[1:9]))
y
```
Note that y is the transpose (t) of x, except that the names were not
corrected similarly:
```{r}
t(x)
```
## Accessing elements
1) Take out the element in the i'th row, j'th column by itself
```{r}
x[2 , 3]
```
2) Take out the i'th row(leave column index blank:
keep all columns)
```{r}
x[2, ]
```
3) Take out the j'th column (keep all rows)
```{r}
x[ , 3]
```
4) Create new row that is columns 2 and 3 from row 4.
```{r}
x[4, 2:3]
```
5) If dimnames exist, then we can also choose by name, as in
```{r}
x["b", "col3"]
x["b", ]
x[ , "col3"]
```
**Comment** I believe it is much safer to access matrix elements by
name, since we are less likely to make a mistake typing "col3" than
we are if we use integer indexes to extract elements.
## Common functions to use with matrices
### ```+``` and ```-``` work as expected.
```{r}
y <- matrix(2, nrow = 9, ncol = 3)
x + y
x -y
```
### ```*``` and ```/``` conduct "term-wise" calculations
```{r}
y * x
y / x
```
In math books, ```*``` would be called a "Hadamard" product. In
basic statistics, this kind of product would not come up very
often. It is symbolized as $x \circ y$. So far as I know, there is
no math name for the ```/``` used here. Most linear algebra books
will say "there is no division of matrices". But there is in R!
### Transpose
```{r}
t(x)
```
### Matrix multiplication: ```%*%```
```{r eval=FALSE}
X %*% Y
```
### Diagonal access
For square matrices, ```diag(x)``` will take the main diagonal
and create a new vector with it.
```{r}
set.seed(234234)
Z <- matrix(rnorm(9), ncol = 3)
Z
Zdiag <- diag(Z)
Zdiag
```
Surprisingly, if x is a vector, then ```diag()``` will create a matrix
with that vector on the main diagonal.
```{r}
diag(Zdiag)
```
I say surprisingly because I've had very inconvenient errors caused
by the flexibility of these functions.
### Access the triangles
The ```lower.tri()``` function returns a TRUE/FALSE matrix
```{r}
lower.tri(Z)
lower.tri(Z, diag = TRUE)
```
and we can use the returned matrix to select a sub-matrix as follows.
```{r}
Z[lower.tri(Z, diag = TRUE)]
```
Note that the return from this is a vector, which is called a
"vech". It is a vector that could re-produce the matrix, if we
filled in a matrix with the vector.
### crossprod and tcrossprod
In many statistical procedures, it is necessary to calculate a product
formed as the transpose of one matrix times another matrix. Often, this
is
$$
X^T X
$$
Although there is a good argument (with good reasons) why it is
wrong to make that calculation most of the time (numerical rounding
error), it is frequently done and sometimes unavoidable.
Note that the result from $X^T X$ is symmetric:
```{r}
t(x) %*% x
```
Creating the product that way wastes a lot of CPU effort.
1. It is
never actually necessary to create ```t(x)``` (the computer
can find the pieces it needs without allocating memory and
copying ```x``` in that way
2. The result is symmetric, which means we only need to calculate
the upper or lower triangle and then copy up.
The ```crossprod()``` functions are included
within R because the manual calculation of ```t(x) %*% x``` is the
single-most-inefficient thing that most R novices do.
and ```tcrossprod```
The syntax to calculate $X^T X$ is
```{r}
crossprod(x)
```
When product is $X^T$ times $X$, we only need the argument ```X```,
but if there is a transpose of $X$ times $Z$, for example, the syntax
would be
```{r eval=FALSE}
crossprod(X, Z)
```
Similarly, if one needed to calculate $X X^T$ or $X Y^T$ the
```tcrossprod``` function should be used.
```{r eval=FALSE}
tcrossprod(X)
## or
tcrossprod(X, T)
```
### Matrix decompositions
R functions send the "real calculations" to general purpose matrix
algebra libraries that are written in C and Fortran. These libraries
are, such as "LAPACK", are part of an internationally promulgated
standard for numerical accuracy.
R has some functions that were written in the older style with
less accurate matrix algebra, such as ```princomp()```, while the
documentation for that function mentiones that it is not
using the currently recommended method: ".
> The calculation is done using ‘eigen’ on the correlation or
> covariance matrix, as determined by ‘cor’. This is done for
> compatibility with the S-PLUS result. A preferred method of
> calculation is to use ‘svd’ on ‘x’, as is done in ‘prcomp’.
## The "Drop Gotcha"
R has a general policy of "demoting" objects to the simplest storage
format possible.
#### Demotion does not matter if $\ldots$ {.bs-callout .bs-callout-green}
the extracted material has more than 2 rows and more than 2 columns.
Observe that ```x[ , 2:3]``` is still a matrix, for example.
```{r}
x2 <- x[ , 2:3]
is.matrix(x2)
```
#### Demotion does matter if $\ldots${.bs-callout .bs-callout-red}
the extracted material has only 1 row or 1 column. In that case, R's
demotion policy will convert the result into a vector, rather than a
matrix.
```{r}
x2 <- x[ , 2]
is.matrix(x2)
is.vector(x2)
```
#### I call that the "drop gotcha".{.bs-callout .bs-callout-blue}
In all of the R programming I've done, the worst, most frustrating
problem is that R demotes a one column (or row) matrix to a vector,
and some calculations behave very differently. It is not always a
problem, but in that 5\% of cases, it is very frustrating. Please see
"R’s drop “gotcha” and the diag “curse”
.
This usually happens to me if I'm constructing a vector of column
names. Say I'm aiming for a result like ```indx <-
c("col1", "col2", "col3")``. Then I select vectors as
```
x[ , indx]
```
As long as indx has 2 or more elements, no problem. But if indx has
just one element, then the result becomes a vector, and then all hell
can break loose (see the blog post).
** drop = FALSE **
The way to avoid this is somewhat awkward. Rather than writing simply
```x[ , indx]```, insert a 3rd argument **drop = FALSE**, as in
```x[ , indx, drop = FALSE]```.
```{r}
x3 <- x[ , 2, drop = FALSE]
x3
dim(x3)
is.matrix(x3)
```
#### Example 1. Take a single row out of x{.bs-callout .bs-callout-gray}
```{r}
xb <- x["b", ]
```
Note that although x had dimensions (9,3), the new thing
xb has no dimensions.
```{r}
dim(x)
dim(xb)
```
xb is an R vector now.
Mathematically, a $3 \times 1$ column times a $1 \times 3$ row ought
to be a $3 \times 3$ matrix, but it is not what we get if we do the
obvious thing:
```{r}
c(1, 2, 3) %*% xb
```
We are getting back the equivalent of the inner product
```{r}
xb %*% c(1, 2, 3)
```
If you did want the product, and you want to make sure that
$xb$ is treated as a row vector, you'd need to explicity ask
for an "outer" product, as in
```{r}
outer(c(1,2,3), xb)
## same as
c(1,2,3) %o% xb
```
We could also "transpose" xb back from a column into a row, I suppose.
```{r}
c(1,2,3) %*% t(xb)
```
#### Example 2. And then there's this frustrating thing in diag {.bs-callout .bs-callout-orange}
Compare the following
```{r}
x23 <- x[ , c(2, 3)]
x23
diag(x23)
```
```{r}
x3 <- x[ , 3]
x3
diag(x3)
```
## Packages and matrix convenience
### the rockchalk package
In the rockchalk package, I have some "convenience" functions
that make it easier to create matrices that represent correlation and
covariance. These are called ```lazyCor()``` and ```lazyCov()```.
#### Examples
1. Convert a vech to fill in the lower and upper triangles
```{r}
library(rockchalk)
rho <- lazyCor(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6))
rho
```
(A "**vech**" is the "strictly lower triangle" vector, while "**vec**" would
include the diagonal elements as well).
2. Create a covariance matrix.
Covariance is difficult to conceptualize and create because it has
no human-understandable scale. We generally do better if we look
at standard deviation and correlation matrices, from which covariance
can be created. The lazyCov function will do this. I create a
standard deviation vector for 4 variables, then use the previously
constructed correlation matrix ```rho```.
```{r}
mysd <- c(10, 20, 30, 100)
Sigma <- lazyCov(Rho = rho, Sd = mysd)
Sigma
```
3. Draw observations from a multivariate-normal distribution.
```{r}
mymu <- c(0, 0, 5, 20)
mvrnorm(n = 10, mu = mymu, Sigma = Sigma)
```
In that resulting matrix, each row is 1 draw from an MVN process in
which the mean vector is ```mymu```, $(0, 0, 5, 20)$ and the covariance matrix is
given by ```Sigma```
## Afterthought: Calculating regression coefficients (or, Does R calculate $(X^TX)^{-1} X^Ty$? And Why Not?)
The stats book says the slope estimates are calculated as
$$
\hat{\beta} = (X^{T}X)^{-1} X^Ty
$$
Similarly, textbooks usually discuss the problem that $X^{T}X$ might
be "difficult (or impossible) to invert" if $X^{T}X$ is nearly
(or exactly) singular. That concern is usually manifested by the
observation that the calculation of the inverse pre-supposes the
calculation of the determinant, $det(X^TX)$ (same as $|X^TX|) and the
formation of a quotient
$$
\frac{1}{det(X^TX)}
$$
About 10 years ago, it felt like I got hit in the face by a cold, dead
fish when I found out that, since the 1970s, calculations have not
been done in that way.
In R, run "lm" (with no parentheses) and "lm.fit" (again, no
parentheses). Look through there for calculations the textbooks lead
you to expece. All statistics students are surprised that they never
see
1. determinants being calculated
2. $X^T X$ being formed
3. $X^T X$ being inverted
Similarly, going by the textbook, the "variance-covariance" matrix of
the $\hat{\beta}$ coefficients ought to be
$$ \widehat{\sigma_\varepsilon}^2
(X^T X)^{-1}
$$
Under the hood, calculations are never done that way in R. (Neither
are they done in most statistics programs in that way).
Rather, R defaults to use the QR decomposition of the X matrix, which
means X is numerically reduced into 2 pieces,
$$
X = Q R
$$
From those decomposed pieces, we can calculate estimates $\hat{\beta}$
and their variance without ever forming $X^T X$ explicitly and the
product is never inverted.
Why? Numerical accuracy. The tendency toward "round off error" in a
matrix is summarized by a coefficient $\kappa$. If we form the prodcut
$X^T X$, the tendency toward round off error is $\kappa^2$. If we were
then to "solve" that by calculating an inverse matrix, we introduce
yet another layer of numerical rounding error.
I learned about this by reading Simon Wood's *Generalized Additive
Models* and a famous numerical linear algebra textbook by Golub and
Van Loan named *Matrix Computations*.
I started some notes and worked out the regression estimator, if you
want details please see
# Session Info
```{r sessionInfo, echo = FALSE}
sessionInfo()
```
Available under
[Created Commons license 3.0 ](http://creativecommons.org/licenses/by/3.0/)