List introduction

A list is a diverse collection of R objects. Any R object can be inserted into a list.

A list is highly flexible. In versatility, a list is the complete opposite of an R vector or a matrix.

Recall that a vector or matrix must be made up of elements of the same type. If we add an element of a different type to a vector (or matrix), R converts the entire vector (or matrix) to a common type. (Recall inserting a character into a numeric vector?)
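As a quick reminder of that coercion, here is a minimal sketch. The object names (nums, mixed, alist_demo) are made up for this illustration only.

```r
# A numeric vector stays numeric as long as every element is a number
nums <- c(1, 2, 3)
class(nums)

# Appending a character value forces every element to become character
mixed <- c(nums, "a")
class(mixed)

# A list keeps each element's own type
alist_demo <- list(1, 2, 3, "a")
vapply(alist_demo, class, FUN.VALUE = character(1))
```

The vector had to change all of its elements to accommodate the newcomer, while the list did not.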

Below, several methods of inserting elements into lists and extracting them are discussed.

Named and Unnamed lists

First, we create a small example list for inspection. This is a named list because I insert a name with each element.

mylist <- list("x" = c(1, 2, 3), "y" = matrix(rnorm(16), 4), "z" = "Paul")
names(mylist)
[1] "x" "y" "z"
length(mylist)
[1] 3

This is an unnamed list:

nonamelist <- list(c(1, 2, 3),  matrix(rnorm(16), 4), "Paul")
length(nonamelist)
[1] 3
nonamelist
[[1]]
[1] 1 2 3

[[2]]
            [,1]       [,2]       [,3]       [,4]
[1,]  0.60657960  0.8595099  0.8719513  0.4199838
[2,]  0.06135987 -0.9039587 -0.9960967 -0.9790216
[3,] -0.41282739 -0.1912054  1.3025675 -1.0063148
[4,]  0.80153090 -0.9225425 -0.5622973 -0.6288113

[[3]]
[1] "Paul"

You agree it has no names, right?

names(nonamelist)
NULL

The elements of a named list can be accessed either by their name or their index number, while an unnamed list allows access only by the index number.
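A small sketch of both kinds of access may help. The lists named and unnamed below are hypothetical stand-ins, not the objects created above.

```r
# Hypothetical small lists, used only for this illustration
named <- list(x = c(1, 2, 3), z = "Paul")
unnamed <- list(c(1, 2, 3), "Paul")

named[["z"]]    # access by name
named$z         # the $ operator is a shorthand for access by name
named[[2]]      # access by position also works on a named list

unnamed[[2]]    # position is the only handle we have
unnamed[["z"]]  # NULL: there is no element with that name
```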

One will find comments here and there in the literature suggesting that lists are processed more quickly in R if they do not have named elements.

If you want to remove the names from an object, there are two ways.

unname(mylist)
[[1]]
[1] 1 2 3

[[2]]
           [,1]       [,2]        [,3]       [,4]
[1,]  1.3458888 0.03575494  0.04551809  1.6742535
[2,] -1.4894749 0.88974889  0.43397985  0.2414254
[3,]  1.3384985 1.37851213 -0.67408865  1.3984339
[4,] -0.2171958 0.14372388 -0.46434974 -0.1506012

[[3]]
[1] "Paul"

or, equivalently,

names(mylist) <- NULL
mylist
[[1]]
[1] 1 2 3

[[2]]
           [,1]       [,2]        [,3]       [,4]
[1,]  1.3458888 0.03575494  0.04551809  1.6742535
[2,] -1.4894749 0.88974889  0.43397985  0.2414254
[3,]  1.3384985 1.37851213 -0.67408865  1.3984339
[4,] -0.2171958 0.14372388 -0.46434974 -0.1506012

[[3]]
[1] "Paul"

But the gosh darned names are needed for the rest of the presentation, so

names(mylist) <- c("x", "y", "z")
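The reason the names had to be restored is that the two approaches are not quite the same in their side effects. Here is a minimal sketch with a made-up list (lst and stripped are names I chose for illustration):

```r
# Hypothetical list for this illustration
lst <- list(a = 1, b = "two")

# unname() returns a stripped copy; lst itself is unchanged
stripped <- unname(lst)
names(stripped)
names(lst)

# Assigning NULL to names() changes lst itself
names(lst) <- NULL
names(lst)
```

So unname() is the safer choice when we only want a nameless copy for one calculation.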

List element access: [[ or [ ?

The single [

A single bracket is used to extract a subset of the list and keep the result as a new list.

mylist2 <- mylist[c(1,3)]
mylist2
$x
[1] 1 2 3

$z
[1] "Paul"
class(mylist2)
[1] "list"
length(mylist2)
[1] 2

The double [[

The double bracket [[ is used to copy an object out of the list. The result is not a list anymore; it has the object’s own type.

I’ll access that element by name first:

mymat1 <- mylist[["y"]]
mymat1
           [,1]       [,2]        [,3]       [,4]
[1,]  1.3458888 0.03575494  0.04551809  1.6742535
[2,] -1.4894749 0.88974889  0.43397985  0.2414254
[3,]  1.3384985 1.37851213 -0.67408865  1.3984339
[4,] -0.2171958 0.14372388 -0.46434974 -0.1506012
class(mymat1)
[1] "matrix"

Then I access that by list position with an integer index:

mymat2 <- mylist[[2]]
mymat2
           [,1]       [,2]        [,3]       [,4]
[1,]  1.3458888 0.03575494  0.04551809  1.6742535
[2,] -1.4894749 0.88974889  0.43397985  0.2414254
[3,]  1.3384985 1.37851213 -0.67408865  1.3984339
[4,] -0.2171958 0.14372388 -0.46434974 -0.1506012
class(mymat2)
[1] "matrix"
identical(mymat1, mymat2)
[1] TRUE
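To see the two operators side by side on the same element, here is a short sketch. The list lst is a hypothetical example, not one of the objects above.

```r
# Hypothetical list for this illustration
lst <- list(x = c(1, 2, 3), z = "Paul")

one_bracket <- lst["x"]     # a list of length 1 that contains the vector
two_bracket <- lst[["x"]]   # the numeric vector itself

class(one_bracket)
class(two_bracket)

# Arithmetic works on the extracted vector, not on the one-element list
mean(two_bracket)
```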

Ways to allocate list storage

There are two ways to do this. The first is the common, easy way. The second is the faster, more structured way.

1 The Easy way

  1. Initiate an empty list that will grow as items are added to it.
mylist1 <- list()

2 The More Disciplined & Better Way

  1. Initialize a list of a given size (for example, 6).
mylist2 <- vector(mode = "list", length = 6)

What’s the difference? Speed, possibly

The major difference between the two types arises when we want to put the lists to use. In the case of mylist1, we are allowed to add items one by one, either by name or position in the list:

x1 <- c(1, 2, 3)
x2 <- matrix(rnorm(9), ncol = 3)
mylist1[[1]] <- x1
mylist1[["x1"]] <- x1
mylist1[[3]] <- x1

Note that, as far as “mylist1” is concerned, the first item is [[1]], the second item can be found either as [[2]] or [[“x1”]], and the third item is [[3]]:

mylist1
[[1]]
[1] 1 2 3

$x1
[1] 1 2 3

[[3]]
[1] 1 2 3
mylist1[["x1"]]
[1] 1 2 3
mylist1[[2]]
[1] 1 2 3

The list has only 3 elements, but if we insert a 6th element, R fills in elements 4 and 5 with NULL:

mylist1[[6]] <- x2
mylist1
[[1]]
[1] 1 2 3

$x1
[1] 1 2 3

[[3]]
[1] 1 2 3

[[4]]
NULL

[[5]]
NULL

[[6]]
          [,1]        [,2]       [,3]
[1,] -1.200697  1.37605055  1.1203431
[2,]  1.716917 -0.29286273  0.2108199
[3,] -1.341655  0.07065126 -0.3953875

Remember that the absence of an element in a list is referred to by the symbol NULL, not NA (as for vectors and matrices).
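One practical consequence is that we check list slots with is.null() rather than is.na(). A minimal sketch, using made-up objects (slots, empty, v):

```r
# A preallocated list: every slot is NULL until something is stored
slots <- vector(mode = "list", length = 3)
slots[[2]] <- c(1, 2, 3)

# is.null() applied to each slot shows which ones are still empty
empty <- vapply(slots, is.null, FUN.VALUE = logical(1))
empty

# In an atomic vector the placeholder for a missing value is NA instead
v <- c(1, NA, 3)
is.na(v)
```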

The difference with mylist2 is that we cannot insert named elements into the middle of the list in the same way. Because the list was allocated with elements 1 through 6 as NULL, inserting a named thing “x1” adds a 7th element to the list:

mylist2[[1]] <- x1
mylist2[["x1"]] <- x1
mylist2[[3]] <- x1
mylist2
[[1]]
[1] 1 2 3

[[2]]
NULL

[[3]]
[1] 1 2 3

[[4]]
NULL

[[5]]
NULL

[[6]]
NULL

$x1
[1] 1 2 3

If we want to insert the matrix in the 6th element we can, of course:

mylist2[[6]] <- x2
mylist2
[[1]]
[1] 1 2 3

[[2]]
NULL

[[3]]
[1] 1 2 3

[[4]]
NULL

[[5]]
NULL

[[6]]
          [,1]        [,2]       [,3]
[1,] -1.200697  1.37605055  1.1203431
[2,]  1.716917 -0.29286273  0.2108199
[3,] -1.341655  0.07065126 -0.3953875

$x1
[1] 1 2 3

If we decide we want the elements to be named, we can do so with the names function:

# only insert names for 6th and 7th items:
names(mylist2)[6:7] <- c("x1", "x2")
mylist2
[[1]]
[1] 1 2 3

[[2]]
NULL

[[3]]
[1] 1 2 3

[[4]]
NULL

[[5]]
NULL

$x1
          [,1]        [,2]       [,3]
[1,] -1.200697  1.37605055  1.1203431
[2,]  1.716917 -0.29286273  0.2108199
[3,] -1.341655  0.07065126 -0.3953875

$x2
[1] 1 2 3
names(mylist2)
[1] ""   ""   ""   ""   ""   "x1" "x2"

Conclusion: If you are going to generate a lot of objects for a list, it is best to allocate the whole list first and fill in the elements with [[index_number]] <- ....

If you want a more flexible list, in which you can insert things with names as you go, you can initiate the list with list(), but insertion of items may be slower.

Is that really faster? Maybe!

Allocation of memory is slow, so one argument in favor of the second strategy is that we allocate storage in one step. This is more efficient.

I wondered if it really is more efficient. The right thing would be to formalize this as a microbenchmark experiment, but the system.time function gives a quick snapshot:

alist <- list()
system.time(
for(i in 1:10000){
    alist[[i]] <- matrix(rnorm(9), ncol = 3)
})
   user  system elapsed 
  0.048   0.000   0.048 
alist2 <- vector("list", 10000)
system.time(
for(i in 1:10000){
    alist2[[i]] <- matrix(rnorm(9), ncol = 3)
})
   user  system elapsed 
  0.048   0.000   0.048 
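In the run shown above the two timings are the same, so with 10,000 small matrices the choice does not seem to matter much. A rough way to probe further is to wrap each strategy in a function and try a larger n. This is only a sketch: the function names grow_list and fill_list and the value of n are my own choices, and the timings will vary by machine and R version.

```r
# Strategy 1: start empty and let the list grow
grow_list <- function(n) {
    out <- list()
    for (i in seq_len(n)) {
        out[[i]] <- matrix(rnorm(9), ncol = 3)
    }
    out
}

# Strategy 2: allocate all n slots first, then fill them
fill_list <- function(n) {
    out <- vector(mode = "list", length = n)
    for (i in seq_len(n)) {
        out[[i]] <- matrix(rnorm(9), ncol = 3)
    }
    out
}

n <- 20000
t_grow <- system.time(res_grow <- grow_list(n))
t_fill <- system.time(res_fill <- fill_list(n))

# Elapsed seconds for each strategy
c(grow = t_grow[["elapsed"]], fill = t_fill[["elapsed"]])
```

A single system.time call is noisy, so repeating the comparison several times gives a fairer picture.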

Brief Simulation Example

There is a middle ground with the second style. We can create a list with 10 elements and then name them. If we do that, then we can insert things by name. For example, we can create a list with 10 named slots for 10 models:

mylist3 <- vector(mode = "list", length = 10)
names(mylist3) <- paste0("mod", 1:10)
mylist3
$mod1
NULL

$mod2
NULL

$mod3
NULL

$mod4
NULL

$mod5
NULL

$mod6
NULL

$mod7
NULL

$mod8
NULL

$mod9
NULL

$mod10
NULL

Now let’s run a data-generator 10 times and fill those in:

set.seed(234234)
mdg <- function(N = 100, beta = c(0.1, 0.3, 0.1), stde = 7)
{
    e <- rnorm(N, mean = 0, sd = stde)
    ## oops, don't know parm for predictors
    x1 <- rnorm(N, mean = 40, sd = 10)
    x2 <- rnorm(N, mean = 20, sd = 40)
    y <- beta[1] + beta[2] * x1 + beta[3] * x2 + e
    invisible(data.frame(x1, x2, y))
}

for (i in 1:10){
    adf <- mdg()
    amodel <- lm(y ~ x1 + x2, data = adf)
    mylist3[[paste0("mod", i)]] <- summary(amodel)
}

It is pretty easy to verify that each element in this list is a summary object from the fitted regression.

mylist3[[7]]

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
     Min       1Q   Median       3Q      Max 
-20.1051  -5.7792  -0.0997   3.9366  17.7399 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.84084    2.95784  -0.284    0.777    
x1           0.31784    0.07355   4.322 3.75e-05 ***
x2           0.11070    0.01814   6.103 2.13e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.424 on 97 degrees of freedom
Multiple R-squared:  0.3831,    Adjusted R-squared:  0.3704 
F-statistic: 30.12 on 2 and 97 DF,  p-value: 6.666e-11
class(mylist3[[7]])
[1] "summary.lm"

lapply: Iterate on lists

1. Basic lapply with one-argument function

A function, such as “class” or “print”, can be applied to each element in the list in this way.

lapply(mylist3, class)
$mod1
[1] "summary.lm"

$mod2
[1] "summary.lm"

$mod3
[1] "summary.lm"

$mod4
[1] "summary.lm"

$mod5
[1] "summary.lm"

$mod6
[1] "summary.lm"

$mod7
[1] "summary.lm"

$mod8
[1] "summary.lm"

$mod9
[1] "summary.lm"

$mod10
[1] "summary.lm"

For practical purposes, that is the same as “looping” over the elements like this:

for(i in seq_along(mylist3)){
    print(class(mylist3[[i]]))
}
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"

(The “print()” is needed because, without it, the for loop does not display the output from commands).

Watch out about using for loops.

There is social stigma! If you go to StackExchange or the “r-help” list with example code that uses a for loop, you will often be shouted at because for loops are slow in R.

While this is a slight exaggeration, there are cases where clever use of the lapply() iteration structure is faster. Generally, the reason is that R can look at the request and plan ahead for its calculations, while the for loop hides the long-run details from R, so chores like memory allocation cannot be managed as efficiently. Another fact is that “[” and “[[” are decidedly slow operators. We are forcing R to talk back and forth between the R runtime, which is written in C, and the user workspace, which is slowed down by the fact that it is interactive.
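Whatever the speed difference turns out to be on a given problem, the two styles should produce the same result. Here is a small sketch to check that. The list samples and the use of mean are placeholders I made up.

```r
set.seed(1234)
samples <- list(a = rnorm(10), b = rnorm(20), c = rnorm(30))

# for loop: allocate the result first, then fill it by position
res_for <- vector(mode = "list", length = length(samples))
names(res_for) <- names(samples)
for (i in seq_along(samples)) {
    res_for[[i]] <- mean(samples[[i]])
}

# lapply: allocation, iteration, and naming are handled for us
res_lapply <- lapply(samples, mean)

identical(res_for, res_lapply)
```

Notice how much bookkeeping the for loop needs that lapply does on its own.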

2. Additional arguments can be named

lapply(mylist3, print, digits = 10)

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
           Min             1Q         Median             3Q 
-12.5207964322  -4.3545461801   0.1230453526   3.7729012944 
           Max 
 16.5242736595 

Coefficients:
                  Estimate     Std. Error  t value   Pr(>|t|)    
(Intercept) -2.54020229551  2.24498674920 -1.13150    0.26063    
x1           0.34347579963  0.05184432045  6.62514 1.9447e-09 ***
x2           0.10611372768  0.01379967990  7.68958 1.2183e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.847931 on 97 degrees of freedom
Multiple R-squared:  0.5416601613,  Adjusted R-squared:  0.5322098554 
F-statistic: 57.31667991 on 2 and 97 DF,  p-value: < 2.220446e-16


Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
           Min             1Q         Median             3Q 
-17.7285432215  -4.7209716896   0.0023519827   5.7563917722 
           Max 
 17.1113795685 

Coefficients:
                 Estimate    Std. Error t value   Pr(>|t|)    
(Intercept) 0.32389337806 3.04772533267 0.10627    0.91558    
x1          0.31019989021 0.07297288819 4.25089 4.8996e-05 ***
x2          0.10328758440 0.02125322870 4.85985 4.5181e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.430645 on 97 degrees of freedom
Multiple R-squared:  0.2731372661,  Adjusted R-squared:  0.2581504056 
F-statistic: 18.22511568 on 2 and 97 DF,  p-value: 1.907399024e-07


Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
           Min             1Q         Median             3Q 
-22.8096528641  -4.7922920227   0.0900718621   4.4053471276 
           Max 
 12.9871034518 

Coefficients:
                  Estimate     Std. Error  t value   Pr(>|t|)    
(Intercept) -5.76945875885  2.93551549170 -1.96540   0.052228 .  
x1           0.48123950081  0.07103720942  6.77447 9.6731e-10 ***
x2           0.07685267720  0.01726573985  4.45117 2.2829e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.577699 on 97 degrees of freedom
Multiple R-squared:  0.399031682,   Adjusted R-squared:  0.3866405827 
F-statistic: 32.20308958 on 2 and 97 DF,  p-value: 1.88062195e-11


Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
           Min             1Q         Median             3Q 
-21.1627423403  -5.0391173329  -0.5532557879   6.1804420259 
           Max 
 20.3814234919 

Coefficients:
                  Estimate     Std. Error  t value   Pr(>|t|)    
(Intercept) -1.02870232843  3.40006163247 -0.30255    0.76288    
x1           0.33704875107  0.08098420772  4.16191 6.8327e-05 ***
x2           0.10473331942  0.01993670695  5.25329 8.8469e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.129571 on 97 degrees of freedom
Multiple R-squared:  0.3123826193,  Adjusted R-squared:  0.2982049414 
F-statistic: 22.03341198 on 2 and 97 DF,  p-value: 1.292177026e-08


Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
           Min             1Q         Median             3Q 
-13.0677467265  -4.0955268627  -0.1315745805   3.3604074166 
           Max 
 16.8711932169 

Coefficients:
                 Estimate    Std. Error t value   Pr(>|t|)    
(Intercept) 0.17783502535 2.61778484206 0.06793    0.94598    
x1          0.29867931513 0.06409512781 4.65994 1.0079e-05 ***
x2          0.10085730097 0.01594194217 6.32654 7.7310e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.370148 on 97 degrees of freedom
Multiple R-squared:  0.4085839063,  Adjusted R-squared:  0.3963897601 
F-statistic: 33.50656106 on 2 and 97 DF,  p-value: 8.646039455e-12


Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
           Min             1Q         Median             3Q 
-17.5779512368  -4.8335654643  -0.7930015582   5.1781659117 
           Max 
 15.7758303981 

Coefficients:
                 Estimate    Std. Error t value   Pr(>|t|)    
(Intercept) 1.79694698649 2.88730731950 0.62236 0.53516491    
x1          0.27616306559 0.07225450057 3.82209 0.00023372 ***
x2          0.08253350365 0.01768284080 4.66743 9.7835e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.884836 on 97 degrees of freedom
Multiple R-squared:  0.3177255272,  Adjusted R-squared:  0.3036580123 
F-statistic: 22.58576084 on 2 and 97 DF,  p-value: 8.851511609e-09


Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
           Min             1Q         Median             3Q 
-20.1051445948  -5.7791625639  -0.0997409435   3.9366461291 
           Max 
 17.7399387787 

Coefficients:
                  Estimate     Std. Error  t value   Pr(>|t|)    
(Intercept) -0.84084271381  2.95783720623 -0.28428     0.7768    
x1           0.31784154261  0.07354669839  4.32163 3.7501e-05 ***
x2           0.11069634381  0.01813677286  6.10342 2.1349e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.424475 on 97 degrees of freedom
Multiple R-squared:  0.3831450917,  Adjusted R-squared:  0.3704264338 
F-statistic: 30.12464795 on 2 and 97 DF,  p-value: 6.666174853e-11


Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
           Min             1Q         Median             3Q 
-20.3242313289  -4.8456073587   0.6412547465   4.8697169217 
           Max 
 17.1791398939 

Coefficients:
                  Estimate     Std. Error  t value   Pr(>|t|)    
(Intercept) -3.62973559274  3.61312879174 -1.00460    0.31759    
x1           0.36803966757  0.08512798336  4.32337 3.7254e-05 ***
x2           0.09960903877  0.02175250260  4.57920 1.3862e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.838168 on 97 degrees of freedom
Multiple R-squared:  0.2881965449,  Adjusted R-squared:  0.273520185 
F-statistic: 19.63678643 on 2 and 97 DF,  p-value: 6.909783663e-08


Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
           Min             1Q         Median             3Q 
-14.3711632770  -4.9939855809  -0.1881222648   4.5687920600 
           Max 
 17.3394915591 

Coefficients:
                  Estimate     Std. Error  t value   Pr(>|t|)    
(Intercept) -5.92328380760  2.53171722266 -2.33963   0.021353 *  
x1           0.39824272060  0.06069001052  6.56192 2.6095e-09 ***
x2           0.13581236923  0.01589113029  8.54643 1.8273e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.57516 on 97 degrees of freedom
Multiple R-squared:  0.5535587265,  Adjusted R-squared:  0.5443537518 
F-statistic: 60.13690899 on 2 and 97 DF,  p-value: < 2.220446e-16


Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
          Min            1Q        Median            3Q           Max 
-18.868301749  -4.121793320   1.102904359   4.086905736  14.506413612 

Coefficients:
                 Estimate    Std. Error t value   Pr(>|t|)    
(Intercept) 4.10980533787 2.80416447540 1.46561  0.1459893    
x1          0.18211781664 0.06569888406 2.77201  0.0066799 ** 
x2          0.10008376147 0.01698862019 5.89122 5.5335e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.73159 on 97 degrees of freedom
Multiple R-squared:  0.3094511839,  Adjusted R-squared:  0.295213064 
F-statistic: 21.73399196 on 2 and 97 DF,  p-value: 1.58828222e-08
$mod1

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
    Min      1Q  Median      3Q     Max 
-12.521  -4.354   0.123   3.773  16.524 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.54020    2.24499  -1.131    0.261    
x1           0.34348    0.05184   6.625 1.94e-09 ***
x2           0.10611    0.01380   7.690 1.22e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.848 on 97 degrees of freedom
Multiple R-squared:  0.5417,    Adjusted R-squared:  0.5322 
F-statistic: 57.32 on 2 and 97 DF,  p-value: < 2.2e-16


$mod2

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
     Min       1Q   Median       3Q      Max 
-17.7285  -4.7210   0.0024   5.7564  17.1114 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.32389    3.04773   0.106    0.916    
x1           0.31020    0.07297   4.251 4.90e-05 ***
x2           0.10329    0.02125   4.860 4.52e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.431 on 97 degrees of freedom
Multiple R-squared:  0.2731,    Adjusted R-squared:  0.2582 
F-statistic: 18.23 on 2 and 97 DF,  p-value: 1.907e-07


$mod3

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
     Min       1Q   Median       3Q      Max 
-22.8097  -4.7923   0.0901   4.4053  12.9871 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -5.76946    2.93552  -1.965   0.0522 .  
x1           0.48124    0.07104   6.774 9.67e-10 ***
x2           0.07685    0.01727   4.451 2.28e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.578 on 97 degrees of freedom
Multiple R-squared:  0.399, Adjusted R-squared:  0.3866 
F-statistic:  32.2 on 2 and 97 DF,  p-value: 1.881e-11


$mod4

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
     Min       1Q   Median       3Q      Max 
-21.1627  -5.0391  -0.5533   6.1804  20.3814 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.02870    3.40006  -0.303    0.763    
x1           0.33705    0.08098   4.162 6.83e-05 ***
x2           0.10473    0.01994   5.253 8.85e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.13 on 97 degrees of freedom
Multiple R-squared:  0.3124,    Adjusted R-squared:  0.2982 
F-statistic: 22.03 on 2 and 97 DF,  p-value: 1.292e-08


$mod5

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
     Min       1Q   Median       3Q      Max 
-13.0677  -4.0955  -0.1316   3.3604  16.8712 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.17784    2.61778   0.068    0.946    
x1           0.29868    0.06410   4.660 1.01e-05 ***
x2           0.10086    0.01594   6.327 7.73e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.37 on 97 degrees of freedom
Multiple R-squared:  0.4086,    Adjusted R-squared:  0.3964 
F-statistic: 33.51 on 2 and 97 DF,  p-value: 8.646e-12


$mod6

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
    Min      1Q  Median      3Q     Max 
-17.578  -4.834  -0.793   5.178  15.776 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.79695    2.88731   0.622 0.535165    
x1           0.27616    0.07225   3.822 0.000234 ***
x2           0.08253    0.01768   4.667 9.78e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.885 on 97 degrees of freedom
Multiple R-squared:  0.3177,    Adjusted R-squared:  0.3037 
F-statistic: 22.59 on 2 and 97 DF,  p-value: 8.852e-09


$mod7

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
     Min       1Q   Median       3Q      Max 
-20.1051  -5.7792  -0.0997   3.9366  17.7399 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.84084    2.95784  -0.284    0.777    
x1           0.31784    0.07355   4.322 3.75e-05 ***
x2           0.11070    0.01814   6.103 2.13e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.424 on 97 degrees of freedom
Multiple R-squared:  0.3831,    Adjusted R-squared:  0.3704 
F-statistic: 30.12 on 2 and 97 DF,  p-value: 6.666e-11


$mod8

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
     Min       1Q   Median       3Q      Max 
-20.3242  -4.8456   0.6413   4.8697  17.1791 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.62974    3.61313  -1.005    0.318    
x1           0.36804    0.08513   4.323 3.73e-05 ***
x2           0.09961    0.02175   4.579 1.39e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.838 on 97 degrees of freedom
Multiple R-squared:  0.2882,    Adjusted R-squared:  0.2735 
F-statistic: 19.64 on 2 and 97 DF,  p-value: 6.91e-08


$mod9

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.3712  -4.9940  -0.1881   4.5688  17.3395 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -5.92328    2.53172  -2.340   0.0214 *  
x1           0.39824    0.06069   6.562 2.61e-09 ***
x2           0.13581    0.01589   8.546 1.83e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.575 on 97 degrees of freedom
Multiple R-squared:  0.5536,    Adjusted R-squared:  0.5444 
F-statistic: 60.14 on 2 and 97 DF,  p-value: < 2.2e-16


$mod10

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
    Min      1Q  Median      3Q     Max 
-18.868  -4.122   1.103   4.087  14.506 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.10981    2.80416   1.466  0.14599    
x1           0.18212    0.06570   2.772  0.00668 ** 
x2           0.10008    0.01699   5.891 5.53e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.732 on 97 degrees of freedom
Multiple R-squared:  0.3095,    Adjusted R-squared:  0.2952 
F-statistic: 21.73 on 2 and 97 DF,  p-value: 1.588e-08

lapply passes each element pulled from mylist3 as the FIRST argument to print. Any additional arguments to print are supplied by name after the function.

The digits argument is COMMON across the calls to print. It is not “vectorized”.
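If we do want the extra argument to differ from element to element, lapply is not the tool. One base R option is Map(), which walks along several inputs in parallel. A minimal sketch with made-up values (vals, same_digits, and vary_digits are illustrative names):

```r
vals <- list(3.14159, 2.71828)

# lapply: digits = 2 is reused for every element
same_digits <- lapply(vals, round, digits = 2)

# Map: the i-th element of vals is paired with the i-th element of digits
vary_digits <- Map(round, vals, digits = c(1, 3))

unlist(same_digits)
unlist(vary_digits)
```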

3. The return is a list

We use lapply not simply to print things, but to create a new list that holds the results of calculations, with each list element treated one by one.

coeflist <- lapply(mylist3, coef)
coeflist[1:3]
$mod1
              Estimate Std. Error   t value     Pr(>|t|)
(Intercept) -2.5402023 2.24498675 -1.131500 2.606344e-01
x1           0.3434758 0.05184432  6.625138 1.944652e-09
x2           0.1061137 0.01379968  7.689579 1.218261e-11

$mod2
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 0.3238934 3.04772533 0.1062738 9.155846e-01
x1          0.3101999 0.07297289 4.2508923 4.899589e-05
x2          0.1032876 0.02125323 4.8598538 4.518067e-06

$mod3
               Estimate Std. Error   t value     Pr(>|t|)
(Intercept) -5.76945876 2.93551549 -1.965399 5.222797e-02
x1           0.48123950 0.07103721  6.774471 9.673084e-10
x2           0.07685268 0.01726574  4.451166 2.282910e-05

4. Example using “inline” (anonymous) function.

Somebody said they only want to keep the P values.

pvallist <- lapply(mylist3, function(x){
    mycoefs <- coef(x)
    pvals <- mycoefs[ , "Pr(>|t|)"]
    pvals
})
pvallist
$mod1
 (Intercept)           x1           x2 
2.606344e-01 1.944652e-09 1.218261e-11 

$mod2
 (Intercept)           x1           x2 
9.155846e-01 4.899589e-05 4.518067e-06 

$mod3
 (Intercept)           x1           x2 
5.222797e-02 9.673084e-10 2.282910e-05 

$mod4
 (Intercept)           x1           x2 
7.628776e-01 6.832719e-05 8.846921e-07 

$mod5
 (Intercept)           x1           x2 
9.459785e-01 1.007897e-05 7.730988e-09 

$mod6
 (Intercept)           x1           x2 
5.351649e-01 2.337192e-04 9.783522e-06 

$mod7
 (Intercept)           x1           x2 
7.768047e-01 3.750110e-05 2.134851e-08 

$mod8
 (Intercept)           x1           x2 
3.175908e-01 3.725406e-05 1.386217e-05 

$mod9
 (Intercept)           x1           x2 
2.135304e-02 2.609510e-09 1.827298e-13 

$mod10
 (Intercept)           x1           x2 
1.459893e-01 6.679944e-03 5.533486e-08 

5. sapply and vapply

The return from lapply is always a list. Sometimes it can be simplified into a vector or matrix.

Many authors suggest the use of R’s “sapply” for that:

sapply(mylist3, function(x){
    mycoefs <- coef(x)
    pvals <- mycoefs[ , "Pr(>|t|)"]
    pvals
    })
                    mod1         mod2         mod3         mod4
(Intercept) 2.606344e-01 9.155846e-01 5.222797e-02 7.628776e-01
x1          1.944652e-09 4.899589e-05 9.673084e-10 6.832719e-05
x2          1.218261e-11 4.518067e-06 2.282910e-05 8.846921e-07
                    mod5         mod6         mod7         mod8
(Intercept) 9.459785e-01 5.351649e-01 7.768047e-01 3.175908e-01
x1          1.007897e-05 2.337192e-04 3.750110e-05 3.725406e-05
x2          7.730988e-09 9.783522e-06 2.134851e-08 1.386217e-05
                    mod9        mod10
(Intercept) 2.135304e-02 1.459893e-01
x1          2.609510e-09 6.679944e-03
x2          1.827298e-13 5.533486e-08

IMPORTANT: Note that the return is a 3 x 10 matrix, with one column for each list element. Did you expect that? I expected the transpose.
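If one row per list element is what we want, t() will flip the result. A minimal sketch, where plist is a hypothetical list of small named vectors standing in for the p values:

```r
# Hypothetical list of named numeric vectors
plist <- list(mod1 = c(a = 0.1, b = 0.2, c = 0.3),
              mod2 = c(a = 0.4, b = 0.5, c = 0.6))

bycol <- sapply(plist, function(x) x)   # 3 x 2: one column per list element
byrow <- t(bycol)                       # 2 x 3: one row per list element

dim(bycol)
dim(byrow)
rownames(byrow)
```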

Although sapply() is widely used, Hadley Wickham suggests in Advanced R that we focus instead on learning to use vapply():

vapply(mylist3, function(x){
    mycoefs <- coef(x)
    pvals <- mycoefs[ , "Pr(>|t|)"]
    pvals
    }, FUN.VALUE = numeric(3))
                    mod1         mod2         mod3         mod4
(Intercept) 2.606344e-01 9.155846e-01 5.222797e-02 7.628776e-01
x1          1.944652e-09 4.899589e-05 9.673084e-10 6.832719e-05
x2          1.218261e-11 4.518067e-06 2.282910e-05 8.846921e-07
                    mod5         mod6         mod7         mod8
(Intercept) 9.459785e-01 5.351649e-01 7.768047e-01 3.175908e-01
x1          1.007897e-05 2.337192e-04 3.750110e-05 3.725406e-05
x2          7.730988e-09 9.783522e-06 2.134851e-08 1.386217e-05
                    mod9        mod10
(Intercept) 2.135304e-02 1.459893e-01
x1          2.609510e-09 6.679944e-03
x2          1.827298e-13 5.533486e-08

Note the difference is the argument FUN.VALUE, where we specify the structure of an individual returned element.

vapply() is preferred because it is less likely to give us a result we don’t expect. We told it we think each iteration should return a numeric vector with 3 elements, so R knew what to watch for. If the return did not match that criterion, we would have received an error.
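To see that protection at work, here is a small sketch on deliberately uneven input. The list uneven is hypothetical, and I wrap the vapply call in tryCatch only so the message can be inspected without stopping the script.

```r
# Hypothetical input: the second element is shorter than the first
uneven <- list(c(1, 2, 3), c(1, 2))

# sapply cannot build a matrix, so it quietly hands back a list
s_out <- sapply(uneven, function(x) x)
class(s_out)

# vapply stops with an error because one result is not length 3;
# tryCatch captures the message so the script can continue
v_out <- tryCatch(
    vapply(uneven, function(x) x, FUN.VALUE = numeric(3)),
    error = function(e) conditionMessage(e)
)
v_out
```

With sapply, the surprise would only surface later, when some other code expects a matrix.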

Admittedly, the documentation for vapply is poor and I would never have understood the point of this function without reading Advanced R.

6. vapply Example: Let’s Get the R-squares!

rsq <- vapply(mylist3, function(x){
    x$r.squared
}, FUN.VALUE = numeric(1))
rsq
     mod1      mod2      mod3      mod4      mod5      mod6      mod7 
0.5416602 0.2731373 0.3990317 0.3123826 0.4085839 0.3177255 0.3831451 
     mod8      mod9     mod10 
0.2881965 0.5535587 0.3094512 
hist(rsq, main = "R Square is the only thing I care about",
     xlab = expression(R^2), prob = TRUE)

List miscellaneous

The unlist() function

If a list is a collection of vectors, unlist will take them apart:

alist <- list(1:4, 32:44, rnorm(10))
avec <- unlist(alist)
avec
 [1]  1.00000000  2.00000000  3.00000000  4.00000000 32.00000000
 [6] 33.00000000 34.00000000 35.00000000 36.00000000 37.00000000
[11] 38.00000000 39.00000000 40.00000000 41.00000000 42.00000000
[16] 43.00000000 44.00000000  0.26628675  1.64484304 -0.91627126
[21]  0.41936098 -0.23667887 -1.88187556 -1.57610338 -0.19895519
[26]  1.17037463 -0.07369298
class(avec)
[1] "numeric"
alist <- list(1:4, 32:44, c("Paul", "Joe"))
avec <- unlist(alist)
avec
 [1] "1"    "2"    "3"    "4"    "32"   "33"   "34"   "35"   "36"  
[10] "37"   "38"   "39"   "40"   "41"   "42"   "43"   "44"   "Paul"
[19] "Joe" 
class(avec)
[1] "character"

Sometimes unlisting is more aggressive than we expect. Run unlist(mylist3) and you’ll see what 10 regressions look like when all of their numbers are flattened into a single vector.
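One detail worth knowing before trying that: unlist also builds names for the flattened vector out of the list names. A small sketch with a made-up list (scores and flat are illustrative names):

```r
# Hypothetical named list
scores <- list(a = c(10, 20), b = 30)

flat <- unlist(scores)
flat
names(flat)

# use.names = FALSE drops the constructed names
unlist(scores, use.names = FALSE)
```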

Delete a list element

To remove an element from a list, assign NULL to it:

nonamelist[[3]] <- NULL
nonamelist
[[1]]
[1] 1 2 3

[[2]]
            [,1]       [,2]       [,3]       [,4]
[1,]  0.60657960  0.8595099  0.8719513  0.4199838
[2,]  0.06135987 -0.9039587 -0.9960967 -0.9790216
[3,] -0.41282739 -0.1912054  1.3025675 -1.0063148
[4,]  0.80153090 -0.9225425 -0.5622973 -0.6288113
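This can be confusing, since we saw earlier that a list can hold NULL elements. The sketch below, on a made-up list lst, contrasts removing an element with keeping the slot and emptying it.

```r
# Hypothetical list for this illustration
lst <- list("a", "b", "c")

# Assigning NULL with [[ removes the element; the list shrinks
lst[[2]] <- NULL
length(lst)

# To keep a slot but make it empty, assign list(NULL) with single [
lst[2] <- list(NULL)
length(lst)
is.null(lst[[2]])
```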

Session Info

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.04

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.7.0
LAPACK: /usr/lib/lapack/liblapack.so.3.7.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  base     

other attached packages:
[1] crmda_0.45

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12    digest_0.6.12   rprojroot_1.2   plyr_1.8.4     
 [5] xtable_1.8-2    backports_1.1.0 magrittr_1.5    evaluate_0.10.1
 [9] stringi_1.1.5   openxlsx_4.0.17 rmarkdown_1.6   tools_3.4.1    
[13] stringr_1.2.0   kutils_1.21     yaml_2.1.14     compiler_3.4.1 
[17] htmltools_0.3.6 knitr_1.17      methods_3.4.1  

Available under Creative Commons license 3.0 CC BY