List introduction

A list is a collection of R objects that may be of different types. Any R object can be inserted in a list.

A list is highly flexible. In versatility, a list is the complete opposite of an R vector or a matrix.

Recall a vector or matrix must be made up of homogeneous elements. If we add an element in a vector (or matrix), it can happen that the entire vector (or matrix) changes as a result. (Recall inserting a character into a numeric vector?)
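
As a reminder of what that coercion looks like, here is a minimal sketch. The object names (v, v2, l) are made up for this illustration and are not used elsewhere in these notes.

```r
# A numeric vector: every element has the same type.
v <- c(1, 2, 3)
class(v)

# Appending a character value coerces the whole vector to character.
v2 <- c(v, "Paul")
class(v2)
v2

# A list keeps each element's own type when a character is added.
l <- list(1, 2, 3)
l[[4]] <- "Paul"
sapply(l, class)
```

The vector ends up holding the strings "1", "2", "3" and "Paul", while the list still holds three numbers and one character string.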

Below several methods of inserting elements in lists and extracting them will be discussed.

Named and Unnamed lists

First, we create a small example list for inspection. This is a named list because I insert a name with each element.

mylist <- list("x" = c(1, 2, 3), "y" = matrix(rnorm(16), 4), "z" = "Paul")
names(mylist)
[1] "x" "y" "z"
length(mylist)
[1] 3

This is an unnamed list:

nonamelist <- list(c(1, 2, 3),  matrix(rnorm(16), 4), "Paul")
length(nonamelist)
[1] 3
nonamelist
[[1]]
[1] 1 2 3

[[2]]
           [,1]        [,2]       [,3]       [,4]
[1,]  0.9078962  0.88526266  0.6956525  1.0344293
[2,]  1.0670807 -0.93670030  1.5539388 -0.9817459
[3,] -0.4885171 -0.01102196  1.7898020  0.2659287
[4,] -1.0072824  0.12531943 -0.7290195 -1.6771190

[[3]]
[1] "Paul"

You agree it has no names, right?

names(nonamelist)
NULL

The elements of a named list can be accessed either by their name or their index number, while an unnamed list allows access only by the index number.
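
A small sketch of the two access styles, using two throwaway lists (named and unnamed are hypothetical names, separate from mylist and nonamelist above):

```r
# Hypothetical example lists for illustration only.
named <- list(x = c(1, 2, 3), z = "Paul")
unnamed <- list(c(1, 2, 3), "Paul")

named[["z"]]      # access by name
named$z           # $ is a shorthand for access by name
named[[2]]        # access by position also works on a named list

unnamed[[2]]      # position is the only handle on an unnamed list
unnamed[["z"]]    # NULL: there is no element with that name
```

Note that asking a list for a name it does not have returns NULL rather than an error, so a misspelled name can go unnoticed.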

One will find comments here and there in the literature suggesting that lists are processed more quickly in R if they do not have named elements.

If you want to remove the names from an object, there are two ways.

unname(mylist)
[[1]]
[1] 1 2 3

[[2]]
           [,1]       [,2]      [,3]        [,4]
[1,]  1.0247571  0.2222761 0.5916056  0.17307361
[2,] -0.9177907 -0.8432722 0.4910802 -0.09515716
[3,] -0.1960174  1.0017147 0.9647557 -0.16671735
[4,]  1.0909467 -0.7307711 0.4965228  0.33657169

[[3]]
[1] "Paul"

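That returns an unnamed copy and leaves mylist itself unchanged. To strip the names from mylist in place, assign NULL to them: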

names(mylist) <- NULL
mylist
[[1]]
[1] 1 2 3

[[2]]
           [,1]       [,2]      [,3]        [,4]
[1,]  1.0247571  0.2222761 0.5916056  0.17307361
[2,] -0.9177907 -0.8432722 0.4910802 -0.09515716
[3,] -0.1960174  1.0017147 0.9647557 -0.16671735
[4,]  1.0909467 -0.7307711 0.4965228  0.33657169

[[3]]
[1] "Paul"

But the gosh darned names are needed for the rest of the presentation, so

names(mylist) <- c("x", "y", "z")

List element access: [[ or [ ?

The single [

The single bracket [ extracts a subset of the list and returns the result as a new list.

mylist2 <- mylist[c(1,3)]
mylist2
$x
[1] 1 2 3

$z
[1] "Paul"
class(mylist2)
[1] "list"
length(mylist2)
[1] 2
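
The single bracket accepts the same kinds of index as an ordinary vector. The following sketch uses a stand-in list called demo (an assumption for this example, not an object from the text):

```r
# Hypothetical list for illustration.
demo <- list(x = c(1, 2, 3), y = matrix(0, 2, 2), z = "Paul")

by_name  <- demo[c("x", "z")]            # select by name
by_drop  <- demo[-2]                     # negative index drops element 2
by_logic <- demo[c(TRUE, FALSE, TRUE)]   # logical index

# Even a single selected element comes back wrapped in a list.
one <- demo["z"]
class(one)
length(one)
```

All three selections give the same two-element list, and demo["z"] is a list of length 1, not the character string itself.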

The double [[

The double bracket [[ extracts a single object from the list. The result is no longer a list; it has the type of the object itself.

I’ll access that element by name first:

mymat1 <- mylist[["y"]]
mymat1
           [,1]       [,2]      [,3]        [,4]
[1,]  1.0247571  0.2222761 0.5916056  0.17307361
[2,] -0.9177907 -0.8432722 0.4910802 -0.09515716
[3,] -0.1960174  1.0017147 0.9647557 -0.16671735
[4,]  1.0909467 -0.7307711 0.4965228  0.33657169
class(mymat1)
[1] "matrix"

Then I access that by list position with an integer index:

mymat2 <- mylist[[2]]
mymat2
           [,1]       [,2]      [,3]        [,4]
[1,]  1.0247571  0.2222761 0.5916056  0.17307361
[2,] -0.9177907 -0.8432722 0.4910802 -0.09515716
[3,] -0.1960174  1.0017147 0.9647557 -0.16671735
[4,]  1.0909467 -0.7307711 0.4965228  0.33657169
class(mymat2)
[1] "matrix"
identical(mymat1, mymat2)
[1] TRUE
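
To see the contrast between the two operators side by side, here is a short sketch with a hypothetical list named demo. One caveat: the class() output shown above comes from R 3.3.1; in R 4.0.0 and later, class() of a matrix returns both "matrix" and "array", so is.matrix() is the safer check across versions.

```r
# Hypothetical list for illustration.
demo <- list(x = c(1, 2, 3), y = matrix(1:4, 2, 2), z = "Paul")

is.list(demo["y"])       # TRUE: [ returns a list of length 1
is.list(demo[["y"]])     # FALSE: [[ returns the matrix itself
is.matrix(demo[["y"]])   # TRUE, regardless of R version

# [[ can be followed by ordinary indexing on the extracted object.
demo[["y"]][2, 1]        # row 2, column 1 of the matrix
demo[["x"]][3]           # third value of the numeric vector
```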

Ways to allocate list storage

There are two ways to do this. The first is the common, easy way. The second is the faster, more structured way.

  1. Initiate an empty list that will grow as items are added to it.
mylist1 <- list()
  2. Initialize a list of a given size (for example, 6).
mylist2 <- vector(mode = "list", length = 6)

The major difference between the two types arises when we want to put the lists to use. In the case of mylist1, we are allowed to add items one by one, either by name or position in the list:

x1 <- c(1, 2, 3)
x2 <- matrix(rnorm(9), ncol = 3)
mylist1[[1]] <- x1
mylist1[["x1"]] <- x1
mylist1[[3]] <- x1

Note that, as far as “mylist1” is concerned, the first item is [[1]], the second item can be found either as [[2]] or [["x1"]], and the third item is [[3]]:

mylist1
[[1]]
[1] 1 2 3

$x1
[1] 1 2 3

[[3]]
[1] 1 2 3
mylist1[["x1"]]
[1] 1 2 3
mylist1[[2]]
[1] 1 2 3

The list only had 3 elements, but if we insert a 6th element, then R fills elements 4 and 5 with NULL:

mylist1[[6]] <- x2
mylist1
[[1]]
[1] 1 2 3

$x1
[1] 1 2 3

[[3]]
[1] 1 2 3

[[4]]
NULL

[[5]]
NULL

[[6]]
           [,1]       [,2]       [,3]
[1,] -0.2335816  0.6475242 -1.0499809
[2,]  1.4948868 -0.6139656 -0.9746328
[3,] -0.6158786  0.3947409 -1.0132724

Remember that the absence of an element in a list is referred to by the symbol NULL, not NA (as for vectors and matrices).
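
If you need to find the empty slots, one approach is to test each element with is.null(). This is only a sketch; sparse, empty and v are names invented for the example.

```r
# Hypothetical list with a gap: elements 2 and 3 are created as NULL.
sparse <- list()
sparse[[1]] <- "a"
sparse[[4]] <- "d"

# Test every element; vapply() returns one TRUE/FALSE per element.
empty <- vapply(sparse, is.null, logical(1))
empty
which(empty)

# In an atomic vector the same gap is filled with NA instead.
v <- character(0)
v[1] <- "a"
v[4] <- "d"
is.na(v)
```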

The difference with mylist2 is that a named element cannot be inserted into the middle of the list in the same way. Because the list was allocated with elements 1 through 6 as NULL, inserting a named item "x1" appends a 7th element to the list:

mylist2[[1]] <- x1
mylist2[["x1"]] <- x1
mylist2[[3]] <- x1
mylist2
[[1]]
[1] 1 2 3

[[2]]
NULL

[[3]]
[1] 1 2 3

[[4]]
NULL

[[5]]
NULL

[[6]]
NULL

$x1
[1] 1 2 3

If we want to insert the matrix in the 6th element we can, of course:

mylist2[[6]] <- x2
mylist2
[[1]]
[1] 1 2 3

[[2]]
NULL

[[3]]
[1] 1 2 3

[[4]]
NULL

[[5]]
NULL

[[6]]
           [,1]       [,2]       [,3]
[1,] -0.2335816  0.6475242 -1.0499809
[2,]  1.4948868 -0.6139656 -0.9746328
[3,] -0.6158786  0.3947409 -1.0132724

$x1
[1] 1 2 3

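If we decide we want the elements to be named, we can do so with the names function. Note that the assignment below renames the 7th element from "x1" to "x2", so the name "x1" now refers to the matrix in position 6: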

## only insert names for 6th and 7th items:
names(mylist2)[6:7] <- c("x1", "x2")
mylist2
[[1]]
[1] 1 2 3

[[2]]
NULL

[[3]]
[1] 1 2 3

[[4]]
NULL

[[5]]
NULL

$x1
           [,1]       [,2]       [,3]
[1,] -0.2335816  0.6475242 -1.0499809
[2,]  1.4948868 -0.6139656 -0.9746328
[3,] -0.6158786  0.3947409 -1.0132724

$x2
[1] 1 2 3
names(mylist2)
[1] ""   ""   ""   ""   ""   "x1" "x2"

Conclusion: If you are going to generate a lot of objects for a list, it is best to allocate the whole list first and fill in the elements with [[index_number]] <- ....

If you want a more flexible list, in which you can insert things with names as you go, initiate the list with list(), but expect insertion of items to be slower.

Is that really faster? Maybe!

Allocation of memory is slow, so one argument in favor of the second strategy is that we allocate storage in one step. This is more efficient.

I wondered if it really is more efficient. The right thing would be to formalize this as a microbenchmark experiment, but the system.time function gives a quick snapshot:

alist <- list()
system.time(
for(i in 1:10000){
    alist[[i]] <- matrix(rnorm(9), ncol = 3)
})
   user  system elapsed 
  0.284   0.008   0.293 
alist2 <- vector("list", 10000)
system.time(
for(i in 1:10000){
    alist2[[i]] <- matrix(rnorm(9), ncol = 3)
})
   user  system elapsed 
  0.056   0.000   0.053 
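
A single system.time() call is noisy, so one cheap improvement is to repeat each timing a few times. The sketch below wraps the two strategies in functions; grow_list, prealloc_list and time_it are hypothetical helper names, and the timings will differ by machine and R version (newer versions of R may narrow the gap).

```r
# Strategy 1: start empty and let the list grow.
grow_list <- function(n) {
    out <- list()
    for (i in seq_len(n)) out[[i]] <- matrix(rnorm(9), ncol = 3)
    out
}

# Strategy 2: allocate all n slots first, then fill them.
prealloc_list <- function(n) {
    out <- vector("list", n)
    for (i in seq_len(n)) out[[i]] <- matrix(rnorm(9), ncol = 3)
    out
}

# Run f(n) several times and collect the elapsed seconds of each run.
time_it <- function(f, n, reps = 5) {
    vapply(seq_len(reps),
           function(r) system.time(f(n))[["elapsed"]],
           numeric(1))
}

n <- 10000
summary(time_it(grow_list, n))
summary(time_it(prealloc_list, n))
```

This is still not a substitute for a proper microbenchmark, but it shows whether the difference is stable from run to run.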

Brief Simulation Example

There is a middle ground with the second style. We can create a list with 10 elements and then name them. If we do that, then we can insert things by name. For example, we create a list with 10 named slots for 10 models:

mylist3 <- vector(mode = "list", length = 10)
names(mylist3) <- paste0("mod", 1:10)
mylist3
$mod1
NULL

$mod2
NULL

$mod3
NULL

$mod4
NULL

$mod5
NULL

$mod6
NULL

$mod7
NULL

$mod8
NULL

$mod9
NULL

$mod10
NULL

Now let's run a data-generator 10 times and fill those in:

set.seed(234234)
mdg <- function(N = 100, beta = c(0.1, 0.3, 0.1), stde = 7)
{
    e <- rnorm(N, mean = 0, sd = stde)
    ## oops, don't know parm for predictors
    x1 <- rnorm(N, mean = 40, sd = 10)
    x2 <- rnorm(N, mean = 20, sd = 40)
    y <- beta[1] + beta[2] * x1 + beta[3] * x2 + e
    invisible(data.frame(x1, x2, y))
}

for (i in 1:10){
    adf <- mdg()
    amodel <- lm(y ~ x1 + x2, data = adf)
    mylist3[[paste0("mod", i)]] <- summary(amodel)
}

It is pretty easy to verify that each element in this list is a summary object from the fitted regression.

mylist3[[7]]

Call:
lm(formula = y ~ x1 + x2, data = adf)

Residuals:
     Min       1Q   Median       3Q      Max 
-20.1051  -5.7792  -0.0997   3.9366  17.7399 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.84084    2.95784  -0.284    0.777    
x1           0.31784    0.07355   4.322 3.75e-05 ***
x2           0.11070    0.01814   6.103 2.13e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.424 on 97 degrees of freedom
Multiple R-squared:  0.3831,    Adjusted R-squared:  0.3704 
F-statistic: 30.12 on 2 and 97 DF,  p-value: 6.666e-11
class(mylist3[[7]])
[1] "summary.lm"

Example uses of lapply:

A function, such as “class” or “print”, can be applied to each element in the list in this way.

lapply(mylist3, class)
$mod1
[1] "summary.lm"

$mod2
[1] "summary.lm"

$mod3
[1] "summary.lm"

$mod4
[1] "summary.lm"

$mod5
[1] "summary.lm"

$mod6
[1] "summary.lm"

$mod7
[1] "summary.lm"

$mod8
[1] "summary.lm"

$mod9
[1] "summary.lm"

$mod10
[1] "summary.lm"

For practical purposes, that is the same as “looping” over the elements like this:

for(i in seq_along(mylist3)){
    print(class(mylist3[[i]]))
}
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"

(The “print()” is needed because, without it, commands inside a for loop do not display their output.)

Watch out about using for loops. There is social stigma! If you go to StackExchange or the “r-help” list with example code that uses a for loop, you will often be shouted at because for loops are slow in R.

While this is a slight exaggeration, there are cases where clever use of the lapply() iteration structure is faster. Generally, the reason is that R can look at the request and plan ahead for its calculations, while the for loop hides the long-run details from R, so chores like memory allocation cannot be managed as efficiently. Another fact is that “[” and “[[” are decidedly slow operators. We are forcing R to talk back and forth between the R runtime, which is written in C, and the user workspace, which is slowed down by the fact that it is interactive.
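
If you do write a for loop, allocating the result list first takes care of the memory chore yourself. Here is a minimal sketch (squares_for and squares_lapply are made-up names) showing that a preallocated loop and lapply() build the same list:

```r
# for loop with storage allocated up front
squares_for <- vector("list", 5)
for (i in seq_along(squares_for)) {
    squares_for[[i]] <- i^2
}

# the same calculation with lapply(), which builds the list for us
squares_lapply <- lapply(1:5, function(i) i^2)

identical(squares_for, squares_lapply)
```

The two results are identical, so the choice between them is about speed and readability, not about what you get back.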

One reason we use lapply is not simply to print things, but to create a new list that has the result of calculations, with each list element treated one-by-one.

coeflist <- lapply(mylist3, coef)
coeflist[1:3]
$mod1
              Estimate Std. Error   t value     Pr(>|t|)
(Intercept) -2.5402023 2.24498675 -1.131500 2.606344e-01
x1           0.3434758 0.05184432  6.625138 1.944652e-09
x2           0.1061137 0.01379968  7.689579 1.218261e-11

$mod2
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 0.3238934 3.04772533 0.1062738 9.155846e-01
x1          0.3101999 0.07297289 4.2508923 4.899589e-05
x2          0.1032876 0.02125323 4.8598538 4.518067e-06

$mod3
               Estimate Std. Error   t value     Pr(>|t|)
(Intercept) -5.76945876 2.93551549 -1.965399 5.222797e-02
x1           0.48123950 0.07103721  6.774471 9.673084e-10
x2           0.07685268 0.01726574  4.451166 2.282910e-05

  1. Example using an “inline” (anonymous) function.

Somebody said they only want to keep the P values.

pvallist <- lapply(mylist3, function(x){
    mycoefs <- coef(x)
    pvals <- mycoefs[ , "Pr(>|t|)"]
    pvals
})
pvallist
$mod1
 (Intercept)           x1           x2 
2.606344e-01 1.944652e-09 1.218261e-11 

$mod2
 (Intercept)           x1           x2 
9.155846e-01 4.899589e-05 4.518067e-06 

$mod3
 (Intercept)           x1           x2 
5.222797e-02 9.673084e-10 2.282910e-05 

$mod4
 (Intercept)           x1           x2 
7.628776e-01 6.832719e-05 8.846921e-07 

$mod5
 (Intercept)           x1           x2 
9.459785e-01 1.007897e-05 7.730988e-09 

$mod6
 (Intercept)           x1           x2 
5.351649e-01 2.337192e-04 9.783522e-06 

$mod7
 (Intercept)           x1           x2 
7.768047e-01 3.750110e-05 2.134851e-08 

$mod8
 (Intercept)           x1           x2 
3.175908e-01 3.725406e-05 1.386217e-05 

$mod9
 (Intercept)           x1           x2 
2.135304e-02 2.609510e-09 1.827298e-13 

$mod10
 (Intercept)           x1           x2 
1.459893e-01 6.679944e-03 5.533486e-08 

  1. sapply and vapply

The return from that is a series of vectors. We might like to have it as a matrix instead. Many authors suggest the use of R’s “sapply” for that:

sapply(mylist3, function(x){
    mycoefs <- coef(x)
    pvals <- mycoefs[ , "Pr(>|t|)"]
    pvals
    })
                    mod1         mod2         mod3         mod4
(Intercept) 2.606344e-01 9.155846e-01 5.222797e-02 7.628776e-01
x1          1.944652e-09 4.899589e-05 9.673084e-10 6.832719e-05
x2          1.218261e-11 4.518067e-06 2.282910e-05 8.846921e-07
                    mod5         mod6         mod7         mod8
(Intercept) 9.459785e-01 5.351649e-01 7.768047e-01 3.175908e-01
x1          1.007897e-05 2.337192e-04 3.750110e-05 3.725406e-05
x2          7.730988e-09 9.783522e-06 2.134851e-08 1.386217e-05
                    mod9        mod10
(Intercept) 2.135304e-02 1.459893e-01
x1          2.609510e-09 6.679944e-03
x2          1.827298e-13 5.533486e-08

IMPORTANT: Note that the return is a 3 x 10 matrix, with one column for each list element. Did you expect that? I expected the transpose.
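
If you want one row per list element, two common options are to transpose the result or to bind the vectors by row. The sketch below uses a tiny list, pv, whose numbers are invented purely for illustration (they are not results from the regressions above).

```r
# Hypothetical list of named numeric vectors (made-up values).
pv <- list(mod1 = c(a = 0.1, b = 0.2, c = 0.3),
           mod2 = c(a = 0.4, b = 0.5, c = 0.6))

by_col  <- sapply(pv, identity)   # 3 x 2: one column per list element
by_row  <- t(by_col)              # 2 x 3: one row per list element
by_row2 <- do.call(rbind, pv)     # same layout, built directly from the list

dim(by_col)
dim(by_row)
by_row2
```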

Although sapply() is widely used, in Advanced R Hadley Wickham suggests that we instead focus on learning to use vapply():

vapply(mylist3, function(x){
    mycoefs <- coef(x)
    pvals <- mycoefs[ , "Pr(>|t|)"]
    pvals
    }, FUN.VALUE = numeric(3))
                    mod1         mod2         mod3         mod4
(Intercept) 2.606344e-01 9.155846e-01 5.222797e-02 7.628776e-01
x1          1.944652e-09 4.899589e-05 9.673084e-10 6.832719e-05
x2          1.218261e-11 4.518067e-06 2.282910e-05 8.846921e-07
                    mod5         mod6         mod7         mod8
(Intercept) 9.459785e-01 5.351649e-01 7.768047e-01 3.175908e-01
x1          1.007897e-05 2.337192e-04 3.750110e-05 3.725406e-05
x2          7.730988e-09 9.783522e-06 2.134851e-08 1.386217e-05
                    mod9        mod10
(Intercept) 2.135304e-02 1.459893e-01
x1          2.609510e-09 6.679944e-03
x2          1.827298e-13 5.533486e-08

Note the difference is the argument FUN.VALUE, where we specify the structure of an individual returned element.

vapply() is preferred because it is less likely to give us a result we don’t expect. We told it we think each iteration should return a numeric vector with 3 elements, so R knew what to watch for. If the return did not match that criterion, we would have received an error.
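
To see that safety check in action, here is a small sketch with a deliberately ragged input. The list ragged and the doubling function are assumptions made up for the example.

```r
# Hypothetical input whose elements have unequal lengths.
ragged <- list(a = 1:3, b = 1:2)

# sapply() cannot build a matrix here, so it quietly returns a list.
res_s <- sapply(ragged, function(x) x * 2)
class(res_s)

# vapply() stops with an error because element "b" does not have length 3.
# tryCatch() captures the error message so the script keeps running.
res_v <- tryCatch(
    vapply(ragged, function(x) x * 2, FUN.VALUE = numeric(3)),
    error = function(e) conditionMessage(e)
)
res_v
```

With sapply() the surprise shows up later, when some other code expects a matrix. With vapply() the problem is reported at the point where it happens.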

Admittedly, the documentation for vapply is poor and I would never have understood the point of this function without reading Advanced R.

  1. Let's get the R-squares!
rsq <- vapply(mylist3, function(x){
    x$r.squared
}, FUN.VALUE = numeric(1))
rsq
     mod1      mod2      mod3      mod4      mod5      mod6      mod7 
0.5416602 0.2731373 0.3990317 0.3123826 0.4085839 0.3177255 0.3831451 
     mod8      mod9     mod10 
0.2881965 0.5535587 0.3094512 
hist(rsq, main = "R Square is the only thing I care about",
     xlab = expression(R^2), prob = TRUE)

List miscellaneous

  1. The unlist() function

If a list is a collection of vectors, unlist() will flatten them into a single vector, coercing the elements to a common type:

alist <- list(1:4, 32:44, rnorm(10))
avec <- unlist(alist)
avec
 [1]  1.00000000  2.00000000  3.00000000  4.00000000 32.00000000
 [6] 33.00000000 34.00000000 35.00000000 36.00000000 37.00000000
[11] 38.00000000 39.00000000 40.00000000 41.00000000 42.00000000
[16] 43.00000000 44.00000000  0.26628675  1.64484304 -0.91627126
[21]  0.41936098 -0.23667887 -1.88187556 -1.57610338 -0.19895519
[26]  1.17037463 -0.07369298
class(avec)
[1] "numeric"
alist <- list(1:4, 32:44, c("Paul", "Joe"))
avec <- unlist(alist)
avec
 [1] "1"    "2"    "3"    "4"    "32"   "33"   "34"   "35"   "36"  
[10] "37"   "38"   "39"   "40"   "41"   "42"   "43"   "44"   "Paul"
[19] "Joe" 
class(avec)
[1] "character"

Sometimes unlisting is more aggressive than we expect. Run unlist(mylist3) and you’ll see what 10 regressions look like when all of their numbers are flattened into a single vector.
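
The same thing can be seen on a much smaller scale. In this sketch, nested is a hypothetical list that contains another list; unlist() flattens every level and builds element names from the names it finds along the way.

```r
# Hypothetical nested list for illustration.
nested <- list(a = 1:2, b = list(c = 3, d = 4:5))

flat <- unlist(nested)
flat            # one flat vector with five values
names(flat)     # names are built from the list and sub-list names

# Drop the generated names if they are not wanted.
unlist(nested, use.names = FALSE)
```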

  1. Delete a list element

To remove an element from a list, assign NULL to it:

nonamelist[[3]] <- NULL
nonamelist
[[1]]
[1] 1 2 3

[[2]]
           [,1]        [,2]       [,3]       [,4]
[1,]  0.9078962  0.88526266  0.6956525  1.0344293
[2,]  1.0670807 -0.93670030  1.5539388 -0.9817459
[3,] -0.4885171 -0.01102196  1.7898020  0.2659287
[4,] -1.0072824  0.12531943 -0.7290195 -1.6771190
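
Because assigning NULL with [[ deletes the element, storing an actual NULL in a slot takes a slightly different form. A minimal sketch, with x and y as made-up example lists:

```r
# Hypothetical list for illustration.
x <- list(a = 1, b = "two", c = 3)

x[["b"]] <- NULL        # removes element b; the list shrinks to 2 elements
length(x)
names(x)

x["a"] <- list(NULL)    # keeps the slot and stores NULL in it
length(x)
is.null(x[["a"]])

# Another way to drop an element: a negative index with single brackets.
y <- list(a = 1, b = "two", c = 3)
y <- y[-2]
names(y)
```

After these steps x still has two elements (a, now NULL, and c), and y has lost its second element.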

Session Info

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] knitr_1.14    rmarkdown_1.0

loaded via a namespace (and not attached):
 [1] magrittr_1.5    formatR_1.4     tools_3.3.1     htmltools_0.3.5
 [5] yaml_2.1.13     Rcpp_0.12.6     stringi_1.1.1   stringr_1.1.0  
 [9] digest_0.6.10   evaluate_0.9   

Available under Creative Commons license 3.0 CC BY