# List introduction

A list is a collection of R objects that need not be of the same type. Any R object can be inserted in a list.

A list is highly flexible. In versatility, a list is the complete opposite of an R vector or a matrix.

Recall that a vector or matrix must be made up of homogeneous elements. If we add an element to a vector (or matrix), the entire vector (or matrix) can change type as a result. (Recall inserting a character into a numeric vector?)
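As a quick reminder of that coercion, here is a minimal sketch. The object names (num_vec, mixed_vec, mixed_list) are made up for this illustration and are not used elsewhere in these notes.

```r
# A vector must be homogeneous, so mixing types forces coercion.
num_vec <- c(1, 2, 3)
mixed_vec <- c(num_vec, "a")   # every element is converted to character
class(num_vec)
class(mixed_vec)

# A list stores each element as it is, so nothing is coerced.
mixed_list <- list(1, 2, 3, "a")
element_classes <- vapply(mixed_list, class, FUN.VALUE = character(1))
element_classes
```

The numeric values survive untouched inside the list, while the vector has silently become a character vector.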

Several methods of inserting elements into lists and extracting them are discussed below.

# Named and Unnamed lists

First, we create a small example list for inspection. This is a named list because I insert a name with each element.

mylist <- list("x" = c(1, 2, 3), "y" = matrix(rnorm(16), 4), "z" = "Paul")
names(mylist)
[1] "x" "y" "z"
length(mylist)
[1] 3

This is an unnamed list:

nonamelist <- list(c(1, 2, 3),  matrix(rnorm(16), 4), "Paul")
length(nonamelist)
[1] 3
nonamelist
[[1]]
[1] 1 2 3

[[2]]
            [,1]       [,2]       [,3]       [,4]
[1,]  0.60657960  0.8595099  0.8719513  0.4199838
[2,]  0.06135987 -0.9039587 -0.9960967 -0.9790216
[3,] -0.41282739 -0.1912054  1.3025675 -1.0063148
[4,]  0.80153090 -0.9225425 -0.5622973 -0.6288113

[[3]]
[1] "Paul"

You agree it has no names, right?

names(nonamelist)
NULL

The elements of a named list can be accessed either by their name or their index number, while an unnamed list allows access only by the index number.
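A small sketch of the access rules, using two throwaway lists (named and unnamed are hypothetical names chosen for this example). Asking an unnamed list for an element by name does not raise an error; it quietly returns NULL.

```r
named <- list(x = c(1, 2, 3), z = "Paul")
unnamed <- list(c(1, 2, 3), "Paul")

by_name   <- named[["z"]]   # access by name
by_dollar <- named$z        # the $ shortcut also uses the name
by_index  <- named[[2]]     # access by position

# There is no element called "z" in the unnamed list, so we get NULL
missing_name <- unnamed[["z"]]
is.null(missing_name)
```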

One will find comments here and there in the literature to suggest that lists will be processed more quickly in R if they do not have named elements.

If you want to remove the names from an object, there are two ways.

unname(mylist)
[[1]]
[1] 1 2 3

[[2]]
           [,1]       [,2]        [,3]       [,4]
[1,]  1.3458888 0.03575494  0.04551809  1.6742535
[2,] -1.4894749 0.88974889  0.43397985  0.2414254
[3,]  1.3384985 1.37851213 -0.67408865  1.3984339
[4,] -0.2171958 0.14372388 -0.46434974 -0.1506012

[[3]]
[1] "Paul"

or, equivalently,

names(mylist) <- NULL
mylist
[[1]]
[1] 1 2 3

[[2]]
           [,1]       [,2]        [,3]       [,4]
[1,]  1.3458888 0.03575494  0.04551809  1.6742535
[2,] -1.4894749 0.88974889  0.43397985  0.2414254
[3,]  1.3384985 1.37851213 -0.67408865  1.3984339
[4,] -0.2171958 0.14372388 -0.46434974 -0.1506012

[[3]]
[1] "Paul"

But the gosh darned names are needed for the rest of the presentation, so

names(mylist) <- c("x", "y", "z")

# List element access: [[ or [ ?

#### The single [

A single bracket extracts a subset of the list and returns the result as a new list.

mylist2 <- mylist[c(1,3)]
mylist2
$x
[1] 1 2 3

$z
[1] "Paul"
class(mylist2)
[1] "list"
length(mylist2)
[1] 2

#### The double [[

The double bracket [[ is used to copy a single object out of the list. The result is not a list anymore; it has the type of the object itself.

I’ll access that element by name first:

mymat1 <- mylist[["y"]]
mymat1
           [,1]       [,2]        [,3]       [,4]
[1,]  1.3458888 0.03575494  0.04551809  1.6742535
[2,] -1.4894749 0.88974889  0.43397985  0.2414254
[3,]  1.3384985 1.37851213 -0.67408865  1.3984339
[4,] -0.2171958 0.14372388 -0.46434974 -0.1506012
class(mymat1)
[1] "matrix"

Then I access that by list position with an integer index:

mymat2 <- mylist[[2]]
mymat2
           [,1]       [,2]        [,3]       [,4]
[1,]  1.3458888 0.03575494  0.04551809  1.6742535
[2,] -1.4894749 0.88974889  0.43397985  0.2414254
[3,]  1.3384985 1.37851213 -0.67408865  1.3984339
[4,] -0.2171958 0.14372388 -0.46434974 -0.1506012
class(mymat2)
[1] "matrix"
identical(mymat1, mymat2)
[1] TRUE
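The practical consequence of the two operators can be sketched with a tiny list (demo is a made-up name for this illustration). A function that expects a numeric vector works on the result of [[ but fails on the result of [, because the latter is still a list.

```r
demo <- list(x = c(1, 2, 3), z = "Paul")

sub_list <- demo["x"]     # single bracket: a list of length 1
element  <- demo[["x"]]   # double bracket: the numeric vector itself

is.list(sub_list)
is.list(element)

# sum() works on the extracted vector
total <- sum(element)

# sum() refuses a list; tryCatch captures the error instead of stopping
sum_attempt <- tryCatch(sum(sub_list), error = function(e) e)
inherits(sum_attempt, "error")
```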

# Ways to allocate list storage

There are two ways to do this. The first is the common, easy way. The second is the faster, more structured way.

#### 1 The Easy way

1. Initiate an empty list that will grow as items are added to it.
mylist1 <- list()

#### 2 The More Disciplined & Better Way

1. Initialize a list of a given size (for example, 6).
mylist2 <- vector(mode = "list", length = 6)

## What’s the difference? Speed, possibly

The major difference between the two types arises when we want to put the lists to use. In the case of mylist1, we are allowed to add items one by one, either by name or position in the list:

x1 <- c(1, 2, 3)
x2 <- matrix(rnorm(9), ncol = 3)
mylist1[[1]] <- x1
mylist1[["x1"]] <- x1
mylist1[[3]] <- x1

Note that, as far as mylist1 is concerned, the first item is [[1]], the second item can be found either as [[2]] or [["x1"]], and the third item is [[3]]:

mylist1
[[1]]
[1] 1 2 3

$x1
[1] 1 2 3

[[3]]
[1] 1 2 3

mylist1[["x1"]]
[1] 1 2 3
mylist1[[2]]
[1] 1 2 3

The list only had 3 elements, but if we insert a 6th element, then R creates NULL elements 4 through 5:

mylist1[[6]] <- x2
mylist1
[[1]]
[1] 1 2 3

$x1
[1] 1 2 3

[[3]]
[1] 1 2 3

[[4]]
NULL

[[5]]
NULL

[[6]]
          [,1]        [,2]       [,3]
[1,] -1.200697  1.37605055  1.1203431
[2,]  1.716917 -0.29286273  0.2108199
[3,] -1.341655  0.07065126 -0.3953875

Remember that the absence of an element in a list is referred to by the symbol NULL, not NA (as for vectors and matrices).
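To find which positions of a list are still empty, one option is to test each element with is.null(). This is only a sketch; sparse is a hypothetical list built for the example.

```r
sparse <- list()
sparse[[3]] <- "a"          # positions 1 and 2 are created as NULL

n_slots <- length(sparse)   # the NULL placeholders still count as positions
empty_slots <- vapply(sparse, is.null, FUN.VALUE = logical(1))
empty_slots

# NULL has length zero, whereas NA is a value of length one
length(NULL)
length(NA)
```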

The difference with mylist2 is that a named element cannot be inserted into the middle of the list in the same way. Because the list was allocated with elements 1 through 6 as NULL, inserting a named element “x1” appends a 7th element to the list:

mylist2[[1]] <- x1
mylist2[["x1"]] <- x1
mylist2[[3]] <- x1
mylist2
[[1]]
[1] 1 2 3

[[2]]
NULL

[[3]]
[1] 1 2 3

[[4]]
NULL

[[5]]
NULL

[[6]]
NULL

$x1
[1] 1 2 3

If we want to insert the matrix in the 6th element we can, of course:

mylist2[[6]] <- x2
mylist2
[[1]]
[1] 1 2 3

[[2]]
NULL

[[3]]
[1] 1 2 3

[[4]]
NULL

[[5]]
NULL

[[6]]
          [,1]        [,2]       [,3]
[1,] -1.200697  1.37605055  1.1203431
[2,]  1.716917 -0.29286273  0.2108199
[3,] -1.341655  0.07065126 -0.3953875

$x1
[1] 1 2 3

If we decide we want the elements to be named, we can do so with the names function:

# only insert names for 6th and 7th items:
names(mylist2)[6:7] <- c("x1", "x2")
mylist2
[[1]]
[1] 1 2 3

[[2]]
NULL

[[3]]
[1] 1 2 3

[[4]]
NULL

[[5]]
NULL

$x1
          [,1]        [,2]       [,3]
[1,] -1.200697  1.37605055  1.1203431
[2,]  1.716917 -0.29286273  0.2108199
[3,] -1.341655  0.07065126 -0.3953875

$x2
[1] 1 2 3
names(mylist2)
[1] ""   ""   ""   ""   ""   "x1" "x2"

Conclusion: If you are going to generate a lot of objects for a list, it is best to allocate the whole list first and fill in the elements with [[index_number]] <- ....

If you want a more flexible list, in which you can insert things by name as you go, initiate the list with list(). The price is that insertion of items may be slower.

## Is that really faster? Maybe!

Allocation of memory is slow, so one argument in favor of the second strategy is that we allocate storage in one step. This is more efficient.

I wondered if it really is more efficient. The right thing would be to formalize this as a microbenchmark experiment, but the system.time function gives a quick snapshot:

alist <- list()
system.time(
    for(i in 1:10000){
        alist[[i]] <- matrix(rnorm(9), ncol = 3)
    })
   user  system elapsed
0.048   0.000   0.048 
alist2 <- vector("list", 10000)
system.time(
    for(i in 1:10000){
        alist2[[i]] <- matrix(rnorm(9), ncol = 3)
    })
   user  system elapsed
0.048   0.000   0.048 
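A single system.time call is noisy, so a slightly more careful (but still informal) check is to repeat each strategy a few times and compare medians. The sketch below uses only base R; time_fill is a hypothetical helper written for this illustration, and the numbers it produces will differ by machine and R version. A package such as microbenchmark would be the more formal route.

```r
# Hypothetical helper: time how long it takes to fill a list with n
# small matrices, either growing from list() or using a preallocated list.
time_fill <- function(n, preallocate) {
    out <- if (preallocate) vector("list", n) else list()
    timing <- system.time(
        for (i in seq_len(n)) {
            out[[i]] <- matrix(rnorm(9), ncol = 3)
        }
    )
    timing[["elapsed"]]
}

reps <- 5
grow_times  <- replicate(reps, time_fill(10000, preallocate = FALSE))
alloc_times <- replicate(reps, time_fill(10000, preallocate = TRUE))

# Compare the median elapsed seconds of the two strategies
c(grow = median(grow_times), preallocated = median(alloc_times))
```

If the two medians are close, as they were in the snapshot above, then for lists of this size the choice can be made on readability rather than speed.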

# Brief Simulation Example

There is a middle ground with the second style. We can create a list with 10 elements and then name them. If we do that, then we can insert things by name. For example, create a list with 10 named things for 10 models:

mylist3 <- vector(mode = "list", length = 10)
names(mylist3) <- paste0("mod", 1:10)
mylist3
$mod1
NULL

$mod2
NULL

$mod3
NULL

$mod4
NULL

$mod5
NULL

$mod6
NULL

$mod7
NULL

$mod8
NULL

$mod9
NULL

$mod10
NULL

Now let’s run a data-generator 10 times and fill those in:

set.seed(234234)
mdg <- function(N = 100, beta = c(0.1, 0.3, 0.1), stde = 7)
{
    e <- rnorm(N, mean = 0, sd = stde)
    ## oops, don't know parm for predictors
    x1 <- rnorm(N, mean = 40, sd = 10)
    x2 <- rnorm(N, mean = 20, sd = 40)
    y <- beta[1] + beta[2] * x1 + beta[3] * x2 + e
    invisible(data.frame(x1, x2, y))
}

for (i in 1:10){
    ## generate a fresh data set for each model
    adf <- mdg()
    amodel <- lm(y ~ x1 + x2, data = adf)
    mylist3[[paste0("mod", i)]] <- summary(amodel)
}

It is pretty easy to verify that each element in this list is a summary object from the fitted regression.

mylist3[[7]]

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
Min       1Q   Median       3Q      Max
-20.1051  -5.7792  -0.0997   3.9366  17.7399

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.84084    2.95784  -0.284    0.777
x1           0.31784    0.07355   4.322 3.75e-05 ***
x2           0.11070    0.01814   6.103 2.13e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.424 on 97 degrees of freedom
Multiple R-squared:  0.3831,    Adjusted R-squared:  0.3704
F-statistic: 30.12 on 2 and 97 DF,  p-value: 6.666e-11
class(mylist3[[7]])
[1] "summary.lm"
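Rather than inspecting elements one at a time, the whole list can be checked in one step. The sketch below is self-contained, so it builds a small stand-in list from the built-in cars data set instead of the simulated data; fits and all_summaries are names invented for this example.

```r
# Build a small named list of regression summaries (stand-in for mylist3)
fits <- vector(mode = "list", length = 3)
names(fits) <- paste0("mod", 1:3)
for (nm in names(fits)) {
    fits[[nm]] <- summary(lm(dist ~ speed, data = cars))
}

# inherits() is applied to every element; what = "summary.lm" is passed along
is_summary <- vapply(fits, inherits, FUN.VALUE = logical(1), what = "summary.lm")
all_summaries <- all(is_summary)
all_summaries
```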

# lapply: Iterate on lists

## 1. Basic lapply with one-argument function

A function, such as “class” or “print”, can be applied to each element in the list in this way.

lapply(mylist3, class)
$mod1
[1] "summary.lm"

$mod2
[1] "summary.lm"

$mod3
[1] "summary.lm"

$mod4
[1] "summary.lm"

$mod5
[1] "summary.lm"

$mod6
[1] "summary.lm"

$mod7
[1] "summary.lm"

$mod8
[1] "summary.lm"

$mod9
[1] "summary.lm"

$mod10
[1] "summary.lm"

For practical purposes, that is the same as “looping” over the elements like this:

for(i in seq_along(mylist3)){
    print(class(mylist3[[i]]))
}
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"

(The “print()” is needed because, without it, the for loop does not display the output from commands).
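If the goal is to keep the results rather than print them, the for loop needs a place to store them. The following sketch shows a loop that reproduces what lapply returns, using a small made-up list (items) so that it runs on its own.

```r
items <- list(a = 1:3, b = letters[1:2], c = TRUE)

# lapply version: the result list is created for us
classes_lapply <- lapply(items, class)

# for-loop version: preallocate, copy the names, then fill by position
classes_loop <- vector(mode = "list", length = length(items))
names(classes_loop) <- names(items)
for (i in seq_along(items)) {
    classes_loop[[i]] <- class(items[[i]])
}

identical(classes_lapply, classes_loop)
```

The loop takes more typing, but the two results are the same object as far as R is concerned.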

#### Watch out about using for loops.

There is social stigma! If you go to StackExchange or the “r-help” list with example code that uses a for loop, you will often be shouted at because for loops are slow in R.

While this is a slight exaggeration, there are cases where clever use of the lapply() iteration structure is faster. Generally, the reason is that R can look at the request and plan ahead for its calculations, while the for loop hides the long-run details from R, so chores like memory allocation cannot be managed as efficiently. Another factor is that “[” and “[[” are decidedly slow operators. Each use forces R to go back and forth between the R runtime, which is written in C, and the user workspace, which is slowed down by the fact that it is interactive.

## 2. Additional arguments can be named

lapply(mylist3, print, digits = 10)

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
Min             1Q         Median             3Q
-12.5207964322  -4.3545461801   0.1230453526   3.7729012944
Max
16.5242736595

Coefficients:
Estimate     Std. Error  t value   Pr(>|t|)
(Intercept) -2.54020229551  2.24498674920 -1.13150    0.26063
x1           0.34347579963  0.05184432045  6.62514 1.9447e-09 ***
x2           0.10611372768  0.01379967990  7.68958 1.2183e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.847931 on 97 degrees of freedom
Multiple R-squared:  0.5416601613,  Adjusted R-squared:  0.5322098554
F-statistic: 57.31667991 on 2 and 97 DF,  p-value: < 2.220446e-16

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
Min             1Q         Median             3Q
-17.7285432215  -4.7209716896   0.0023519827   5.7563917722
Max
17.1113795685

Coefficients:
Estimate    Std. Error t value   Pr(>|t|)
(Intercept) 0.32389337806 3.04772533267 0.10627    0.91558
x1          0.31019989021 0.07297288819 4.25089 4.8996e-05 ***
x2          0.10328758440 0.02125322870 4.85985 4.5181e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.430645 on 97 degrees of freedom
Multiple R-squared:  0.2731372661,  Adjusted R-squared:  0.2581504056
F-statistic: 18.22511568 on 2 and 97 DF,  p-value: 1.907399024e-07

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
Min             1Q         Median             3Q
-22.8096528641  -4.7922920227   0.0900718621   4.4053471276
Max
12.9871034518

Coefficients:
Estimate     Std. Error  t value   Pr(>|t|)
(Intercept) -5.76945875885  2.93551549170 -1.96540   0.052228 .
x1           0.48123950081  0.07103720942  6.77447 9.6731e-10 ***
x2           0.07685267720  0.01726573985  4.45117 2.2829e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.577699 on 97 degrees of freedom
Multiple R-squared:  0.399031682,   Adjusted R-squared:  0.3866405827
F-statistic: 32.20308958 on 2 and 97 DF,  p-value: 1.88062195e-11

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
Min             1Q         Median             3Q
-21.1627423403  -5.0391173329  -0.5532557879   6.1804420259
Max
20.3814234919

Coefficients:
Estimate     Std. Error  t value   Pr(>|t|)
(Intercept) -1.02870232843  3.40006163247 -0.30255    0.76288
x1           0.33704875107  0.08098420772  4.16191 6.8327e-05 ***
x2           0.10473331942  0.01993670695  5.25329 8.8469e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.129571 on 97 degrees of freedom
Multiple R-squared:  0.3123826193,  Adjusted R-squared:  0.2982049414
F-statistic: 22.03341198 on 2 and 97 DF,  p-value: 1.292177026e-08

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
Min             1Q         Median             3Q
-13.0677467265  -4.0955268627  -0.1315745805   3.3604074166
Max
16.8711932169

Coefficients:
Estimate    Std. Error t value   Pr(>|t|)
(Intercept) 0.17783502535 2.61778484206 0.06793    0.94598
x1          0.29867931513 0.06409512781 4.65994 1.0079e-05 ***
x2          0.10085730097 0.01594194217 6.32654 7.7310e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.370148 on 97 degrees of freedom
Multiple R-squared:  0.4085839063,  Adjusted R-squared:  0.3963897601
F-statistic: 33.50656106 on 2 and 97 DF,  p-value: 8.646039455e-12

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
Min             1Q         Median             3Q
-17.5779512368  -4.8335654643  -0.7930015582   5.1781659117
Max
15.7758303981

Coefficients:
Estimate    Std. Error t value   Pr(>|t|)
(Intercept) 1.79694698649 2.88730731950 0.62236 0.53516491
x1          0.27616306559 0.07225450057 3.82209 0.00023372 ***
x2          0.08253350365 0.01768284080 4.66743 9.7835e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.884836 on 97 degrees of freedom
Multiple R-squared:  0.3177255272,  Adjusted R-squared:  0.3036580123
F-statistic: 22.58576084 on 2 and 97 DF,  p-value: 8.851511609e-09

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
Min             1Q         Median             3Q
-20.1051445948  -5.7791625639  -0.0997409435   3.9366461291
Max
17.7399387787

Coefficients:
Estimate     Std. Error  t value   Pr(>|t|)
(Intercept) -0.84084271381  2.95783720623 -0.28428     0.7768
x1           0.31784154261  0.07354669839  4.32163 3.7501e-05 ***
x2           0.11069634381  0.01813677286  6.10342 2.1349e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.424475 on 97 degrees of freedom
Multiple R-squared:  0.3831450917,  Adjusted R-squared:  0.3704264338
F-statistic: 30.12464795 on 2 and 97 DF,  p-value: 6.666174853e-11

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
Min             1Q         Median             3Q
-20.3242313289  -4.8456073587   0.6412547465   4.8697169217
Max
17.1791398939

Coefficients:
Estimate     Std. Error  t value   Pr(>|t|)
(Intercept) -3.62973559274  3.61312879174 -1.00460    0.31759
x1           0.36803966757  0.08512798336  4.32337 3.7254e-05 ***
x2           0.09960903877  0.02175250260  4.57920 1.3862e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.838168 on 97 degrees of freedom
Multiple R-squared:  0.2881965449,  Adjusted R-squared:  0.273520185
F-statistic: 19.63678643 on 2 and 97 DF,  p-value: 6.909783663e-08

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
Min             1Q         Median             3Q
-14.3711632770  -4.9939855809  -0.1881222648   4.5687920600
Max
17.3394915591

Coefficients:
Estimate     Std. Error  t value   Pr(>|t|)
(Intercept) -5.92328380760  2.53171722266 -2.33963   0.021353 *
x1           0.39824272060  0.06069001052  6.56192 2.6095e-09 ***
x2           0.13581236923  0.01589113029  8.54643 1.8273e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.57516 on 97 degrees of freedom
Multiple R-squared:  0.5535587265,  Adjusted R-squared:  0.5443537518
F-statistic: 60.13690899 on 2 and 97 DF,  p-value: < 2.220446e-16

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
Min            1Q        Median            3Q           Max
-18.868301749  -4.121793320   1.102904359   4.086905736  14.506413612

Coefficients:
Estimate    Std. Error t value   Pr(>|t|)
(Intercept) 4.10980533787 2.80416447540 1.46561  0.1459893
x1          0.18211781664 0.06569888406 2.77201  0.0066799 **
x2          0.10008376147 0.01698862019 5.89122 5.5335e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.73159 on 97 degrees of freedom
Multiple R-squared:  0.3094511839,  Adjusted R-squared:  0.295213064
F-statistic: 21.73399196 on 2 and 97 DF,  p-value: 1.58828222e-08
$mod1

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
    Min      1Q  Median      3Q     Max
-12.521  -4.354   0.123   3.773  16.524

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.54020    2.24499  -1.131    0.261
x1           0.34348    0.05184   6.625 1.94e-09 ***
x2           0.10611    0.01380   7.690 1.22e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.848 on 97 degrees of freedom
Multiple R-squared:  0.5417,    Adjusted R-squared:  0.5322
F-statistic: 57.32 on 2 and 97 DF,  p-value: < 2.2e-16

$mod2

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
Min       1Q   Median       3Q      Max
-17.7285  -4.7210   0.0024   5.7564  17.1114

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.32389    3.04773   0.106    0.916
x1           0.31020    0.07297   4.251 4.90e-05 ***
x2           0.10329    0.02125   4.860 4.52e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.431 on 97 degrees of freedom
Multiple R-squared:  0.2731,    Adjusted R-squared:  0.2582
F-statistic: 18.23 on 2 and 97 DF,  p-value: 1.907e-07

$mod3

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
     Min       1Q   Median       3Q      Max
-22.8097  -4.7923   0.0901   4.4053  12.9871

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.76946    2.93552  -1.965   0.0522 .
x1           0.48124    0.07104   6.774 9.67e-10 ***
x2           0.07685    0.01727   4.451 2.28e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.578 on 97 degrees of freedom
Multiple R-squared:  0.399,     Adjusted R-squared:  0.3866
F-statistic: 32.2 on 2 and 97 DF,  p-value: 1.881e-11

$mod4

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
Min       1Q   Median       3Q      Max
-21.1627  -5.0391  -0.5533   6.1804  20.3814

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.02870    3.40006  -0.303    0.763
x1           0.33705    0.08098   4.162 6.83e-05 ***
x2           0.10473    0.01994   5.253 8.85e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.13 on 97 degrees of freedom
Multiple R-squared:  0.3124,    Adjusted R-squared:  0.2982
F-statistic: 22.03 on 2 and 97 DF,  p-value: 1.292e-08

$mod5

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
     Min       1Q   Median       3Q      Max
-13.0677  -4.0955  -0.1316   3.3604  16.8712

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.17784    2.61778   0.068    0.946
x1           0.29868    0.06410   4.660 1.01e-05 ***
x2           0.10086    0.01594   6.327 7.73e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.37 on 97 degrees of freedom
Multiple R-squared:  0.4086,    Adjusted R-squared:  0.3964
F-statistic: 33.51 on 2 and 97 DF,  p-value: 8.646e-12

$mod6

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
Min      1Q  Median      3Q     Max
-17.578  -4.834  -0.793   5.178  15.776

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.79695    2.88731   0.622 0.535165
x1           0.27616    0.07225   3.822 0.000234 ***
x2           0.08253    0.01768   4.667 9.78e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.885 on 97 degrees of freedom
Multiple R-squared:  0.3177,    Adjusted R-squared:  0.3037
F-statistic: 22.59 on 2 and 97 DF,  p-value: 8.852e-09

$mod7

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
     Min       1Q   Median       3Q      Max
-20.1051  -5.7792  -0.0997   3.9366  17.7399

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.84084    2.95784  -0.284    0.777
x1           0.31784    0.07355   4.322 3.75e-05 ***
x2           0.11070    0.01814   6.103 2.13e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.424 on 97 degrees of freedom
Multiple R-squared:  0.3831,    Adjusted R-squared:  0.3704
F-statistic: 30.12 on 2 and 97 DF,  p-value: 6.666e-11

$mod8

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
Min       1Q   Median       3Q      Max
-20.3242  -4.8456   0.6413   4.8697  17.1791

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.62974    3.61313  -1.005    0.318
x1           0.36804    0.08513   4.323 3.73e-05 ***
x2           0.09961    0.02175   4.579 1.39e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.838 on 97 degrees of freedom
Multiple R-squared:  0.2882,    Adjusted R-squared:  0.2735
F-statistic: 19.64 on 2 and 97 DF,  p-value: 6.91e-08

$mod9

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
     Min       1Q   Median       3Q      Max
-14.3712  -4.9940  -0.1881   4.5688  17.3395

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.92328    2.53172  -2.340   0.0214 *
x1           0.39824    0.06069   6.562 2.61e-09 ***
x2           0.13581    0.01589   8.546 1.83e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.575 on 97 degrees of freedom
Multiple R-squared:  0.5536,    Adjusted R-squared:  0.5444
F-statistic: 60.14 on 2 and 97 DF,  p-value: < 2.2e-16

$mod10

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
Min      1Q  Median      3Q     Max
-18.868  -4.122   1.103   4.087  14.506

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.10981    2.80416   1.466  0.14599
x1           0.18212    0.06570   2.772  0.00668 **
x2           0.10008    0.01699   5.891 5.53e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.732 on 97 degrees of freedom
Multiple R-squared:  0.3095,    Adjusted R-squared:  0.2952
F-statistic: 21.73 on 2 and 97 DF,  p-value: 1.588e-08

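The first argument to print MUST BE the element pulled from mylist3; any additional arguments to print are supplied by name after the function. The summaries appear twice in the output above because print displays each one as it is called, and lapply then returns the list of printed objects, which R displays again with the default number of digits.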

The digits argument is COMMON across the calls to print. It is not “vectorized”.
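A smaller illustration of the same point, with round() in place of print() so that the results can be stored and inspected. The list values is made up for this sketch. When a second argument really does need to vary from element to element, one base R option is Map(), which walks over several inputs in parallel.

```r
values <- list(a = 3.14159, b = 2.71828)

# digits = 2 is shared by every call to round()
rounded_same <- lapply(values, round, digits = 2)

# Map() pairs each value with its own digits setting:
# round(3.14159, 1) and round(2.71828, 3)
rounded_each <- Map(round, values, c(1, 3))

rounded_same
rounded_each
```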

## 3. The return value is a list

One reason we use lapply is not simply to print things, but to create a new list that has the result of calculations, with each list element treated one-by-one.

coeflist <- lapply(mylist3, coef)
coeflist[1:3]
$mod1
              Estimate Std. Error   t value     Pr(>|t|)
(Intercept) -2.5402023 2.24498675 -1.131500 2.606344e-01
x1           0.3434758 0.05184432  6.625138 1.944652e-09
x2           0.1061137 0.01379968  7.689579 1.218261e-11

$mod2
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 0.3238934 3.04772533 0.1062738 9.155846e-01
x1          0.3101999 0.07297289 4.2508923 4.899589e-05
x2          0.1032876 0.02125323 4.8598538 4.518067e-06

$mod3
               Estimate Std. Error   t value     Pr(>|t|)
(Intercept) -5.76945876 2.93551549 -1.965399 5.222797e-02
x1           0.48123950 0.07103721  6.774471 9.673084e-10
x2           0.07685268 0.01726574  4.451166 2.282910e-05

## 4. Example using “inline” (anonymous) function.

Somebody said they only want to keep the P values.

pvallist <- lapply(mylist3, function(x){
    mycoefs <- coef(x)
    pvals <- mycoefs[ , "Pr(>|t|)"]
    pvals
})
pvallist
$mod1
(Intercept)           x1           x2
2.606344e-01 1.944652e-09 1.218261e-11

$mod2
 (Intercept)           x1           x2
9.155846e-01 4.899589e-05 4.518067e-06

$mod3
(Intercept)           x1           x2
5.222797e-02 9.673084e-10 2.282910e-05

$mod4
 (Intercept)           x1           x2
7.628776e-01 6.832719e-05 8.846921e-07

$mod5
(Intercept)           x1           x2
9.459785e-01 1.007897e-05 7.730988e-09

$mod6
 (Intercept)           x1           x2
5.351649e-01 2.337192e-04 9.783522e-06

$mod7
(Intercept)           x1           x2
7.768047e-01 3.750110e-05 2.134851e-08

$mod8
 (Intercept)           x1           x2
3.175908e-01 3.725406e-05 1.386217e-05

$mod9
(Intercept)           x1           x2
2.135304e-02 2.609510e-09 1.827298e-13

$mod10
 (Intercept)           x1           x2
1.459893e-01 6.679944e-03 5.533486e-08

## 5. sapply and vapply

The return from lapply is always a list. Sometimes it can be

- a list in which each item is a vector with just one element. We’d like to just have a vector.
- a list of vectors, which we might like to have stacked as columns or rows in a matrix.

Many authors suggest the use of R’s “sapply” for that:

sapply(mylist3, function(x){
    mycoefs <- coef(x)
    pvals <- mycoefs[ , "Pr(>|t|)"]
    pvals
})
                    mod1         mod2         mod3         mod4
(Intercept) 2.606344e-01 9.155846e-01 5.222797e-02 7.628776e-01
x1          1.944652e-09 4.899589e-05 9.673084e-10 6.832719e-05
x2          1.218261e-11 4.518067e-06 2.282910e-05 8.846921e-07
                    mod5         mod6         mod7         mod8
(Intercept) 9.459785e-01 5.351649e-01 7.768047e-01 3.175908e-01
x1          1.007897e-05 2.337192e-04 3.750110e-05 3.725406e-05
x2          7.730988e-09 9.783522e-06 2.134851e-08 1.386217e-05
                    mod9        mod10
(Intercept) 2.135304e-02 1.459893e-01
x1          2.609510e-09 6.679944e-03
x2          1.827298e-13 5.533486e-08

IMPORTANT: Note the return is a 3 x 10 matrix, one column for each element. Did you expect that? I expected the transpose.

Although sapply() is widely used, Hadley Wickham suggests in Advanced R that we instead focus on learning to use vapply():

vapply(mylist3, function(x){
    mycoefs <- coef(x)
    pvals <- mycoefs[ , "Pr(>|t|)"]
    pvals
}, FUN.VALUE = numeric(3))
                    mod1         mod2         mod3         mod4
(Intercept) 2.606344e-01 9.155846e-01 5.222797e-02 7.628776e-01
x1          1.944652e-09 4.899589e-05 9.673084e-10 6.832719e-05
x2          1.218261e-11 4.518067e-06 2.282910e-05 8.846921e-07
                    mod5         mod6         mod7         mod8
(Intercept) 9.459785e-01 5.351649e-01 7.768047e-01 3.175908e-01
x1          1.007897e-05 2.337192e-04 3.750110e-05 3.725406e-05
x2          7.730988e-09 9.783522e-06 2.134851e-08 1.386217e-05
                    mod9        mod10
(Intercept) 2.135304e-02 1.459893e-01
x1          2.609510e-09 6.679944e-03
x2          1.827298e-13 5.533486e-08

Note the difference is the argument FUN.VALUE, where we specify the structure of an individual returned element.
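To see what FUN.VALUE buys us, here is a small sketch on a made-up list (scores is hypothetical and unrelated to the simulation). It shows the column-per-element layout, the transpose that gives one row per element, and two situations where sapply and vapply part ways: empty input and a return value of the wrong length.

```r
scores <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# range() returns 2 numbers, so we get a 2 x 3 matrix: one column per element
by_col <- sapply(scores, range)
# transpose if one row per element is wanted
by_row <- t(by_col)

# With empty input, sapply falls back to an empty list,
# while vapply still returns the declared type
empty_s <- sapply(list(), length)
empty_v <- vapply(list(), length, FUN.VALUE = integer(1))

# vapply stops if a result does not match FUN.VALUE;
# here range() returns 2 values but we promised 1
bad <- tryCatch(
    vapply(scores, range, FUN.VALUE = numeric(1)),
    error = function(e) conditionMessage(e)
)
bad
```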
vapply() is preferred because it is less likely to give us a result we don’t expect. We told it we think each iteration should return a numeric vector with 3 elements, so R knew what to watch for. If the return did not match that criterion, we would have received an error. Admittedly, the documentation for vapply is poor and I would never have understood the point of this function without reading Advanced R.

## 6. vapply Example: Let’s Get the R-squares!

rsq <- vapply(mylist3, function(x){
    x$r.squared
}, FUN.VALUE = numeric(1))
rsq
     mod1      mod2      mod3      mod4      mod5      mod6      mod7
0.5416602 0.2731373 0.3990317 0.3123826 0.4085839 0.3177255 0.3831451
mod8      mod9     mod10
0.2881965 0.5535587 0.3094512 
hist(rsq, main = "R Square is the only thing I care about",
xlab = expression(R^2), prob = TRUE)

# List miscellaneous

#### The unlist() function

If a list is a collection of vectors, unlist will take them apart:

alist <- list(1:4, 32:44, rnorm(10))
avec <- unlist(alist)
avec
 [1]  1.00000000  2.00000000  3.00000000  4.00000000 32.00000000
[6] 33.00000000 34.00000000 35.00000000 36.00000000 37.00000000
[11] 38.00000000 39.00000000 40.00000000 41.00000000 42.00000000
[16] 43.00000000 44.00000000  0.26628675  1.64484304 -0.91627126
[21]  0.41936098 -0.23667887 -1.88187556 -1.57610338 -0.19895519
[26]  1.17037463 -0.07369298
class(avec)
[1] "numeric"
alist <- list(1:4, 32:44, c("Paul", "Joe"))
avec <- unlist(alist)
avec
 [1] "1"    "2"    "3"    "4"    "32"   "33"   "34"   "35"   "36"
[10] "37"   "38"   "39"   "40"   "41"   "42"   "43"   "44"   "Paul"
[19] "Joe" 
class(avec)
[1] "character"

Sometimes unlisting is more aggressive than we expect. Run unlist(mylist3) and you’ll see what 10 regressions look like when all of their numbers are flattened into a single vector.
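A smaller example of that aggressiveness, using a hypothetical nested list: unlist() digs into the inner list as well, coerces everything to a common type, and builds names by pasting the outer and inner names together.

```r
nested <- list(a = 1:2, b = list(c = 3, d = "four"))

flat <- unlist(nested)
flat
names(flat)    # a1, a2 for the unnamed vector elements; b.c, b.d for the inner list
class(flat)    # character, because of the single string "four"
```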

#### Delete a list element

To remove an element from a list, assign NULL to it:

nonamelist[[3]] <- NULL
nonamelist
[[1]]
[1] 1 2 3

[[2]]
            [,1]       [,2]       [,3]       [,4]
[1,]  0.60657960  0.8595099  0.8719513  0.4199838
[2,]  0.06135987 -0.9039587 -0.9960967 -0.9790216
[3,] -0.41282739 -0.1912054  1.3025675 -1.0063148
[4,]  0.80153090 -0.9225425 -0.5622973 -0.6288113
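One consequence worth knowing: because assigning NULL with [[ deletes, it cannot be used to store a NULL placeholder. A sketch of both cases, on a throwaway list called items:

```r
items <- list("a", "b", "c")

# Assigning NULL with [[ ]] deletes the element; later elements shift down
items[[2]] <- NULL
length_after_delete <- length(items)

# To keep the position and store NULL in it, assign list(NULL) with single brackets
items[2] <- list(NULL)
length_after_placeholder <- length(items)
is.null(items[[2]])
```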

# Session Info

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.04

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.7.0
LAPACK: /usr/lib/lapack/liblapack.so.3.7.0

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  base

other attached packages:
[1] crmda_0.45

loaded via a namespace (and not attached):
[1] Rcpp_0.12.12    digest_0.6.12   rprojroot_1.2   plyr_1.8.4
[5] xtable_1.8-2    backports_1.1.0 magrittr_1.5    evaluate_0.10.1
[9] stringi_1.1.5   openxlsx_4.0.17 rmarkdown_1.6   tools_3.4.1
[13] stringr_1.2.0   kutils_1.21     yaml_2.1.14     compiler_3.4.1
[17] htmltools_0.3.6 knitr_1.17      methods_3.4.1  

Available under Creative Commons license 3.0