Paul Johnson, CRMDA <pauljohn@ku.edu>
Please visit http://pj.freefaculty.org/guides
Keywords: R,vectors
September 20 2017
Abstract
A list is an information container. It can hold, literally, any kind of R object. With freedom, you also have responsibility.
A list is a diverse collection of R objects. Any R object can be inserted in a list.
A list is highly flexible. In versatility, a list is the complete opposite of an R vector or a matrix.
Recall a vector or matrix must be made up of homogeneous elements. If we add an element in a vector (or matrix), it can happen that the entire vector (or matrix) changes as a result. (Recall inserting a character into a numeric vector?)
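Here is a small sketch of that contrast. The object names (v, v2, l, l2) are made up for this illustration only. Adding one character string converts a whole numeric vector, while a list leaves the existing elements alone.

```r
# A numeric vector silently becomes character when a string is added.
v <- c(1, 2, 3)
v2 <- c(v, "Paul")
class(v)    # "numeric"
class(v2)   # "character": every element was converted

# A list keeps each element as it was.
l <- list(1, 2, 3)
l2 <- c(l, "Paul")
types <- vapply(l2, class, FUN.VALUE = character(1))
types       # "numeric" "numeric" "numeric" "character"
```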
Below, several methods of inserting elements in lists and extracting them are discussed.
First, we create a small example list for inspection. This is a named list because I insert a name with each element.
mylist <- list("x" = c(1, 2, 3), "y" = matrix(rnorm(16), 4), "z" = "Paul")
names(mylist)
[1] "x" "y" "z"
length(mylist)
[1] 3
This is an unnamed list:
nonamelist <- list(c(1, 2, 3), matrix(rnorm(16), 4), "Paul")
length(nonamelist)
[1] 3
nonamelist
[[1]]
[1] 1 2 3
[[2]]
[,1] [,2] [,3] [,4]
[1,] 0.60657960 0.8595099 0.8719513 0.4199838
[2,] 0.06135987 -0.9039587 -0.9960967 -0.9790216
[3,] -0.41282739 -0.1912054 1.3025675 -1.0063148
[4,] 0.80153090 -0.9225425 -0.5622973 -0.6288113
[[3]]
[1] "Paul"
You agree it has no names, right?
names(nonamelist)
NULL
The elements of a named list can be accessed either by their name or their index number, while an unnamed list allows access only by the index number.
One will find comments here and there in the literature to suggest that lists will be processed more quickly in R if they do not have named elements.
If you want to remove the names from an object, there are two ways.
unname(mylist)
[[1]]
[1] 1 2 3
[[2]]
[,1] [,2] [,3] [,4]
[1,] 1.3458888 0.03575494 0.04551809 1.6742535
[2,] -1.4894749 0.88974889 0.43397985 0.2414254
[3,] 1.3384985 1.37851213 -0.67408865 1.3984339
[4,] -0.2171958 0.14372388 -0.46434974 -0.1506012
[[3]]
[1] "Paul"
or, to change mylist itself rather than print a copy without names,
names(mylist) <- NULL
mylist
[[1]]
[1] 1 2 3
[[2]]
[,1] [,2] [,3] [,4]
[1,] 1.3458888 0.03575494 0.04551809 1.6742535
[2,] -1.4894749 0.88974889 0.43397985 0.2414254
[3,] 1.3384985 1.37851213 -0.67408865 1.3984339
[4,] -0.2171958 0.14372388 -0.46434974 -0.1506012
[[3]]
[1] "Paul"
But the gosh darned names are needed for the rest of the presentation, so
names(mylist) <- c("x", "y", "z")
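The two ways of removing names differ in one respect that is easy to miss. A minimal sketch, using a throwaway list called demo (a name invented for this example): unname() returns a copy and leaves the original alone, while assigning NULL to names() changes the object itself.

```r
demo <- list(x = 1:3, z = "Paul")

stripped <- unname(demo)       # a copy without names
names_after_unname <- names(demo)
names_after_unname             # "x" "z": demo is unchanged
names(stripped)                # NULL

names(demo) <- NULL            # this changes demo itself
names(demo)                    # NULL
```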
A single bracket [ is used to extract a subset of the list. The result is itself a new list.
mylist2 <- mylist[c(1,3)]
mylist2
$x
[1] 1 2 3
$z
[1] "Paul"
class(mylist2)
[1] "list"
length(mylist2)
[1] 2
The double bracket [[ is used to extract a single object from the list. The result is not a list anymore; it has the class of the object itself.
I’ll access that element by name first:
mymat1 <- mylist[["y"]]
mymat1
[,1] [,2] [,3] [,4]
[1,] 1.3458888 0.03575494 0.04551809 1.6742535
[2,] -1.4894749 0.88974889 0.43397985 0.2414254
[3,] 1.3384985 1.37851213 -0.67408865 1.3984339
[4,] -0.2171958 0.14372388 -0.46434974 -0.1506012
class(mymat1)
[1] "matrix"
Then I access that by list position with an integer index:
mymat2 <- mylist[[2]]
mymat2
[,1] [,2] [,3] [,4]
[1,] 1.3458888 0.03575494 0.04551809 1.6742535
[2,] -1.4894749 0.88974889 0.43397985 0.2414254
[3,] 1.3384985 1.37851213 -0.67408865 1.3984339
[4,] -0.2171958 0.14372388 -0.46434974 -0.1506012
class(mymat2)
[1] "matrix"
identical(mymat1, mymat2)
[1] TRUE
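The difference between the two kinds of bracket is easiest to see when only one element is requested. This is a sketch with a small list named demo (invented here so the example runs on its own).

```r
demo <- list(x = c(1, 2, 3), y = matrix(1:4, 2), z = "Paul")

one_bracket <- demo["y"]      # a list of length 1 that contains the matrix
two_bracket <- demo[["y"]]    # the matrix itself

class(one_bracket)            # "list"
is.matrix(two_bracket)        # TRUE
dim(one_bracket)              # NULL: a list has no dimensions
dim(two_bracket)              # 2 2

# The dollar sign is shorthand for [[ with a name
same <- identical(demo$y, demo[["y"]])
same                          # TRUE
```

If a later function expects a matrix, passing demo["y"] will usually fail, because it receives a list instead.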
There are two ways to create an empty list that will be filled in later. The first is the common, easy way. The second is the faster, more structured way.
mylist1 <- list()
mylist2 <- vector(mode = "list", length = 6)
The major difference between the two types arises when we want to put the lists to use. In the case of mylist1, we are allowed to add items one by one, either by name or position in the list:
x1 <- c(1, 2, 3)
x2 <- matrix(rnorm(9), ncol = 3)
mylist1[[1]] <- x1
mylist1[["x1"]] <- x1
mylist1[[3]] <- x1
Note that, as far as mylist1 is concerned, the first item is [[1]], the second item can be found either as [[2]] or [["x1"]], and the third item is [[3]]:
mylist1
[[1]]
[1] 1 2 3
$x1
[1] 1 2 3
[[3]]
[1] 1 2 3
mylist1[["x1"]]
[1] 1 2 3
mylist1[[2]]
[1] 1 2 3
The list only had 3 elements, but if we insert a 6th element, then R creates NULL elements 4 through 5:
mylist1[[6]] <- x2
mylist1
[[1]]
[1] 1 2 3
$x1
[1] 1 2 3
[[3]]
[1] 1 2 3
[[4]]
NULL
[[5]]
NULL
[[6]]
[,1] [,2] [,3]
[1,] -1.200697 1.37605055 1.1203431
[2,] 1.716917 -0.29286273 0.2108199
[3,] -1.341655 0.07065126 -0.3953875
Remember that the absence of an element in a list is referred to by the symbol NULL, not NA (as for vectors and matrices).
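One practical consequence is that empty slots are found with is.null(), not is.na(). A minimal sketch, with a list named lst invented for the example:

```r
lst <- list()
lst[[1]] <- "a"
lst[[4]] <- "d"              # R fills slots 2 and 3 with NULL

empty <- vapply(lst, is.null, FUN.VALUE = logical(1))
empty                        # FALSE TRUE TRUE FALSE
which(empty)                 # 2 3

# NA is a value, so a slot holding NA is not empty
lst[[2]] <- NA
still_empty <- vapply(lst, is.null, FUN.VALUE = logical(1))
which(still_empty)           # 3
```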
We find the difference in mylist2 is that we are not allowed to insert named elements into the middle of the list in the same way. Observe that because the list was allocated with elements 1 through 6 as NULL, then inserting a named thing “x1” adds a 7th element in the list:
mylist2[[1]] <- x1
mylist2[["x1"]] <- x1
mylist2[[3]] <- x1
mylist2
[[1]]
[1] 1 2 3
[[2]]
NULL
[[3]]
[1] 1 2 3
[[4]]
NULL
[[5]]
NULL
[[6]]
NULL
$x1
[1] 1 2 3
If we want to insert the matrix in the 6th element we can, of course:
mylist2[[6]] <- x2
mylist2
[[1]]
[1] 1 2 3
[[2]]
NULL
[[3]]
[1] 1 2 3
[[4]]
NULL
[[5]]
NULL
[[6]]
[,1] [,2] [,3]
[1,] -1.200697 1.37605055 1.1203431
[2,] 1.716917 -0.29286273 0.2108199
[3,] -1.341655 0.07065126 -0.3953875
$x1
[1] 1 2 3
If we decide we want the elements to be named, we can do so with the names function:
# only insert names for 6th and 7th items:
names(mylist2)[6:7] <- c("x1", "x2")
mylist2
[[1]]
[1] 1 2 3
[[2]]
NULL
[[3]]
[1] 1 2 3
[[4]]
NULL
[[5]]
NULL
$x1
[,1] [,2] [,3]
[1,] -1.200697 1.37605055 1.1203431
[2,] 1.716917 -0.29286273 0.2108199
[3,] -1.341655 0.07065126 -0.3953875
$x2
[1] 1 2 3
names(mylist2)
[1] "" "" "" "" "" "x1" "x2"
Conclusion: If you are going to generate a lot of objects for a list, it is best to allocate the whole list first and fill in the elements with [[index_number]] <- ...
If you want a more flexible list, in which you can insert things with names as you go, it is necessary to initiate the list with list(), but insertion of items is slower.
Allocation of memory is slow, so one argument in favor of the second strategy is that we allocate storage in one step. This is more efficient.
I wondered if it really is more efficient. The right thing would be to formalize this as a microbenchmark experiment, but the system.time function gives a quick snapshot. In this run, with 10,000 small matrices, the two approaches report the same elapsed time:
alist <- list()
system.time(
    for(i in 1:10000){
        alist[[i]] <- matrix(rnorm(9), ncol = 3)
    })
user system elapsed
0.048 0.000 0.048
alist2 <- vector("list", 10000)
system.time(
    for(i in 1:10000){
        alist2[[i]] <- matrix(rnorm(9), ncol = 3)
    })
user system elapsed
0.048 0.000 0.048
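A single call to system.time is noisy, so a slightly more careful check repeats each strategy a few times and compares the medians. The following is only a sketch in base R, not a substitute for a proper microbenchmark experiment. The function names fill_grow, fill_prealloc and time_it are invented for this illustration, and the timings will differ from one machine and R version to the next.

```r
# Strategy 1: start with list() and let it grow
fill_grow <- function(n) {
    out <- list()
    for (i in seq_len(n)) out[[i]] <- matrix(rnorm(9), ncol = 3)
    out
}

# Strategy 2: allocate all n slots first
fill_prealloc <- function(n) {
    out <- vector("list", n)
    for (i in seq_len(n)) out[[i]] <- matrix(rnorm(9), ncol = 3)
    out
}

# Run f(n) several times, return the median elapsed seconds
time_it <- function(f, n, reps = 5) {
    elapsed <- vapply(seq_len(reps), function(r) {
        system.time(f(n))[["elapsed"]]
    }, FUN.VALUE = numeric(1))
    median(elapsed)
}

timings <- c(grow = time_it(fill_grow, 10000),
             prealloc = time_it(fill_prealloc, 10000))
timings
```

Increasing n or reps makes any difference between the two strategies easier to see, at the cost of a longer wait.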
There is a middle ground with the second style. We can create a list with 10 elements and then name them. If we do that, then we can insert things by name. For example, create a list with 10 named elements for 10 models:
mylist3 <- vector(mode = "list", length = 10)
names(mylist3) <- paste0("mod", 1:10)
mylist3
$mod1
NULL
$mod2
NULL
$mod3
NULL
$mod4
NULL
$mod5
NULL
$mod6
NULL
$mod7
NULL
$mod8
NULL
$mod9
NULL
$mod10
NULL
Now let's run a data-generator 10 times and fill those in:
set.seed(234234)
mdg <- function(N = 100, beta = c(0.1, 0.3, 0.1), stde = 7)
{
    ## error term, spelled out as mean = rather than the abbreviation m =
    e <- rnorm(N, mean = 0, sd = stde)
    ## oops, don't know parm for predictors
    x1 <- rnorm(N, mean = 40, sd = 10)
    x2 <- rnorm(N, mean = 20, sd = 40)
    y <- beta[1] + beta[2] * x1 + beta[3] * x2 + e
    invisible(data.frame(x1, x2, y))
}
for (i in 1:10){
    adf <- mdg()
    amodel <- lm(y ~ x1 + x2, data = adf)
    ## store the summary under the name mod1, mod2, ...
    mylist3[[paste0("mod", i)]] <- summary(amodel)
}
It is pretty easy to verify that each element in this list is a summary object from the fitted regression.
mylist3[[7]]
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q Max
-20.1051 -5.7792 -0.0997 3.9366 17.7399
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.84084 2.95784 -0.284 0.777
x1 0.31784 0.07355 4.322 3.75e-05 ***
x2 0.11070 0.01814 6.103 2.13e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.424 on 97 degrees of freedom
Multiple R-squared: 0.3831, Adjusted R-squared: 0.3704
F-statistic: 30.12 on 2 and 97 DF, p-value: 6.666e-11
class(mylist3[[7]])
[1] "summary.lm"
A function, such as “class” or “print”, can be applied to each element in the list in this way.
lapply(mylist3, class)
$mod1
[1] "summary.lm"
$mod2
[1] "summary.lm"
$mod3
[1] "summary.lm"
$mod4
[1] "summary.lm"
$mod5
[1] "summary.lm"
$mod6
[1] "summary.lm"
$mod7
[1] "summary.lm"
$mod8
[1] "summary.lm"
$mod9
[1] "summary.lm"
$mod10
[1] "summary.lm"
For practical purposes, that is the same as “looping” over the elements like this:
for(i in seq_along(mylist3)){
    print(class(mylist3[[i]]))
}
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
[1] "summary.lm"
(The “print()” is needed because, without it, the for loop does not display the output from commands).
There is social stigma! If you go to StackExchange or the “r-help” list with example code that uses a for loop, you will often be shouted at because for loops are slow in R.
While this is a slight exaggeration, there are cases where clever use of the lapply() iteration structure is faster. Generally, the reason is that R can look at the request and plan ahead for its calculations, while the for loop hides the long-run details from R, so chores like memory allocation cannot be managed as efficiently. Another fact is that “[” and “[[” are decidedly slow operators. We are forcing R to talk back and forth between the R runtime, which is written in C, and the user workspace, which is slowed down by the fact that it is interactive.
lapply(mylist3, print, digits = 10)
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q
-12.5207964322 -4.3545461801 0.1230453526 3.7729012944
Max
16.5242736595
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.54020229551 2.24498674920 -1.13150 0.26063
x1 0.34347579963 0.05184432045 6.62514 1.9447e-09 ***
x2 0.10611372768 0.01379967990 7.68958 1.2183e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.847931 on 97 degrees of freedom
Multiple R-squared: 0.5416601613, Adjusted R-squared: 0.5322098554
F-statistic: 57.31667991 on 2 and 97 DF, p-value: < 2.220446e-16
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q
-17.7285432215 -4.7209716896 0.0023519827 5.7563917722
Max
17.1113795685
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.32389337806 3.04772533267 0.10627 0.91558
x1 0.31019989021 0.07297288819 4.25089 4.8996e-05 ***
x2 0.10328758440 0.02125322870 4.85985 4.5181e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.430645 on 97 degrees of freedom
Multiple R-squared: 0.2731372661, Adjusted R-squared: 0.2581504056
F-statistic: 18.22511568 on 2 and 97 DF, p-value: 1.907399024e-07
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q
-22.8096528641 -4.7922920227 0.0900718621 4.4053471276
Max
12.9871034518
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.76945875885 2.93551549170 -1.96540 0.052228 .
x1 0.48123950081 0.07103720942 6.77447 9.6731e-10 ***
x2 0.07685267720 0.01726573985 4.45117 2.2829e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.577699 on 97 degrees of freedom
Multiple R-squared: 0.399031682, Adjusted R-squared: 0.3866405827
F-statistic: 32.20308958 on 2 and 97 DF, p-value: 1.88062195e-11
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q
-21.1627423403 -5.0391173329 -0.5532557879 6.1804420259
Max
20.3814234919
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.02870232843 3.40006163247 -0.30255 0.76288
x1 0.33704875107 0.08098420772 4.16191 6.8327e-05 ***
x2 0.10473331942 0.01993670695 5.25329 8.8469e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.129571 on 97 degrees of freedom
Multiple R-squared: 0.3123826193, Adjusted R-squared: 0.2982049414
F-statistic: 22.03341198 on 2 and 97 DF, p-value: 1.292177026e-08
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q
-13.0677467265 -4.0955268627 -0.1315745805 3.3604074166
Max
16.8711932169
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.17783502535 2.61778484206 0.06793 0.94598
x1 0.29867931513 0.06409512781 4.65994 1.0079e-05 ***
x2 0.10085730097 0.01594194217 6.32654 7.7310e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.370148 on 97 degrees of freedom
Multiple R-squared: 0.4085839063, Adjusted R-squared: 0.3963897601
F-statistic: 33.50656106 on 2 and 97 DF, p-value: 8.646039455e-12
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q
-17.5779512368 -4.8335654643 -0.7930015582 5.1781659117
Max
15.7758303981
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.79694698649 2.88730731950 0.62236 0.53516491
x1 0.27616306559 0.07225450057 3.82209 0.00023372 ***
x2 0.08253350365 0.01768284080 4.66743 9.7835e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.884836 on 97 degrees of freedom
Multiple R-squared: 0.3177255272, Adjusted R-squared: 0.3036580123
F-statistic: 22.58576084 on 2 and 97 DF, p-value: 8.851511609e-09
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q
-20.1051445948 -5.7791625639 -0.0997409435 3.9366461291
Max
17.7399387787
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.84084271381 2.95783720623 -0.28428 0.7768
x1 0.31784154261 0.07354669839 4.32163 3.7501e-05 ***
x2 0.11069634381 0.01813677286 6.10342 2.1349e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.424475 on 97 degrees of freedom
Multiple R-squared: 0.3831450917, Adjusted R-squared: 0.3704264338
F-statistic: 30.12464795 on 2 and 97 DF, p-value: 6.666174853e-11
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q
-20.3242313289 -4.8456073587 0.6412547465 4.8697169217
Max
17.1791398939
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.62973559274 3.61312879174 -1.00460 0.31759
x1 0.36803966757 0.08512798336 4.32337 3.7254e-05 ***
x2 0.09960903877 0.02175250260 4.57920 1.3862e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.838168 on 97 degrees of freedom
Multiple R-squared: 0.2881965449, Adjusted R-squared: 0.273520185
F-statistic: 19.63678643 on 2 and 97 DF, p-value: 6.909783663e-08
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q
-14.3711632770 -4.9939855809 -0.1881222648 4.5687920600
Max
17.3394915591
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.92328380760 2.53171722266 -2.33963 0.021353 *
x1 0.39824272060 0.06069001052 6.56192 2.6095e-09 ***
x2 0.13581236923 0.01589113029 8.54643 1.8273e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.57516 on 97 degrees of freedom
Multiple R-squared: 0.5535587265, Adjusted R-squared: 0.5443537518
F-statistic: 60.13690899 on 2 and 97 DF, p-value: < 2.220446e-16
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q Max
-18.868301749 -4.121793320 1.102904359 4.086905736 14.506413612
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.10980533787 2.80416447540 1.46561 0.1459893
x1 0.18211781664 0.06569888406 2.77201 0.0066799 **
x2 0.10008376147 0.01698862019 5.89122 5.5335e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.73159 on 97 degrees of freedom
Multiple R-squared: 0.3094511839, Adjusted R-squared: 0.295213064
F-statistic: 21.73399196 on 2 and 97 DF, p-value: 1.58828222e-08
$mod1
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q Max
-12.521 -4.354 0.123 3.773 16.524
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.54020 2.24499 -1.131 0.261
x1 0.34348 0.05184 6.625 1.94e-09 ***
x2 0.10611 0.01380 7.690 1.22e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.848 on 97 degrees of freedom
Multiple R-squared: 0.5417, Adjusted R-squared: 0.5322
F-statistic: 57.32 on 2 and 97 DF, p-value: < 2.2e-16
$mod2
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q Max
-17.7285 -4.7210 0.0024 5.7564 17.1114
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.32389 3.04773 0.106 0.916
x1 0.31020 0.07297 4.251 4.90e-05 ***
x2 0.10329 0.02125 4.860 4.52e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.431 on 97 degrees of freedom
Multiple R-squared: 0.2731, Adjusted R-squared: 0.2582
F-statistic: 18.23 on 2 and 97 DF, p-value: 1.907e-07
$mod3
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q Max
-22.8097 -4.7923 0.0901 4.4053 12.9871
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.76946 2.93552 -1.965 0.0522 .
x1 0.48124 0.07104 6.774 9.67e-10 ***
x2 0.07685 0.01727 4.451 2.28e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.578 on 97 degrees of freedom
Multiple R-squared: 0.399, Adjusted R-squared: 0.3866
F-statistic: 32.2 on 2 and 97 DF, p-value: 1.881e-11
$mod4
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q Max
-21.1627 -5.0391 -0.5533 6.1804 20.3814
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.02870 3.40006 -0.303 0.763
x1 0.33705 0.08098 4.162 6.83e-05 ***
x2 0.10473 0.01994 5.253 8.85e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.13 on 97 degrees of freedom
Multiple R-squared: 0.3124, Adjusted R-squared: 0.2982
F-statistic: 22.03 on 2 and 97 DF, p-value: 1.292e-08
$mod5
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q Max
-13.0677 -4.0955 -0.1316 3.3604 16.8712
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.17784 2.61778 0.068 0.946
x1 0.29868 0.06410 4.660 1.01e-05 ***
x2 0.10086 0.01594 6.327 7.73e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.37 on 97 degrees of freedom
Multiple R-squared: 0.4086, Adjusted R-squared: 0.3964
F-statistic: 33.51 on 2 and 97 DF, p-value: 8.646e-12
$mod6
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q Max
-17.578 -4.834 -0.793 5.178 15.776
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.79695 2.88731 0.622 0.535165
x1 0.27616 0.07225 3.822 0.000234 ***
x2 0.08253 0.01768 4.667 9.78e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.885 on 97 degrees of freedom
Multiple R-squared: 0.3177, Adjusted R-squared: 0.3037
F-statistic: 22.59 on 2 and 97 DF, p-value: 8.852e-09
$mod7
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q Max
-20.1051 -5.7792 -0.0997 3.9366 17.7399
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.84084 2.95784 -0.284 0.777
x1 0.31784 0.07355 4.322 3.75e-05 ***
x2 0.11070 0.01814 6.103 2.13e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.424 on 97 degrees of freedom
Multiple R-squared: 0.3831, Adjusted R-squared: 0.3704
F-statistic: 30.12 on 2 and 97 DF, p-value: 6.666e-11
$mod8
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q Max
-20.3242 -4.8456 0.6413 4.8697 17.1791
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.62974 3.61313 -1.005 0.318
x1 0.36804 0.08513 4.323 3.73e-05 ***
x2 0.09961 0.02175 4.579 1.39e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.838 on 97 degrees of freedom
Multiple R-squared: 0.2882, Adjusted R-squared: 0.2735
F-statistic: 19.64 on 2 and 97 DF, p-value: 6.91e-08
$mod9
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q Max
-14.3712 -4.9940 -0.1881 4.5688 17.3395
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.92328 2.53172 -2.340 0.0214 *
x1 0.39824 0.06069 6.562 2.61e-09 ***
x2 0.13581 0.01589 8.546 1.83e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.575 on 97 degrees of freedom
Multiple R-squared: 0.5536, Adjusted R-squared: 0.5444
F-statistic: 60.14 on 2 and 97 DF, p-value: < 2.2e-16
$mod10
Call:
lm(formula = y ~ x1 + x2, data = adf)
Residuals:
Min 1Q Median 3Q Max
-18.868 -4.122 1.103 4.087 14.506
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.10981 2.80416 1.466 0.14599
x1 0.18212 0.06570 2.772 0.00668 **
x2 0.10008 0.01699 5.891 5.53e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.732 on 97 degrees of freedom
Multiple R-squared: 0.3095, Adjusted R-squared: 0.2952
F-statistic: 21.73 on 2 and 97 DF, p-value: 1.588e-08
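The first argument to print MUST BE the element pulled from mylist3; any additional arguments for print, such as digits, are supplied by name after the function. The output appears twice because each call to print displays a summary with 10 digits, and then R prints the list returned by lapply with the default number of digits.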
The digits argument is COMMON across the calls to print. It is not “vectorized”.
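If a different value is wanted for each element, one option in base R is Map(), which walks along several inputs in parallel. This is a sketch, not something the print example above requires. The list values and the anonymous wrapper function are invented for the illustration.

```r
values <- list(a = pi, b = exp(1))

# lapply: the same digits value is used for every element
same <- lapply(values, round, digits = 2)

# Map: one digits value per element, matched by position
varied <- Map(function(x, d) round(x, digits = d), values, c(2, 4))

same$b      # 2.72
varied$a    # 3.14
varied$b    # 2.7183
```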
We use lapply not simply to print things, but also to create a new list that holds the results of calculations, with each list element treated one by one.
coeflist <- lapply(mylist3, coef)
coeflist[1:3]
$mod1
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.5402023 2.24498675 -1.131500 2.606344e-01
x1 0.3434758 0.05184432 6.625138 1.944652e-09
x2 0.1061137 0.01379968 7.689579 1.218261e-11
$mod2
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3238934 3.04772533 0.1062738 9.155846e-01
x1 0.3101999 0.07297289 4.2508923 4.899589e-05
x2 0.1032876 0.02125323 4.8598538 4.518067e-06
$mod3
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.76945876 2.93551549 -1.965399 5.222797e-02
x1 0.48123950 0.07103721 6.774471 9.673084e-10
x2 0.07685268 0.01726574 4.451166 2.282910e-05
Somebody said they only want to keep the P values.
pvallist <- lapply(mylist3, function(x){
    mycoefs <- coef(x)
    pvals <- mycoefs[ , "Pr(>|t|)"]
    pvals
})
pvallist
$mod1
(Intercept) x1 x2
2.606344e-01 1.944652e-09 1.218261e-11
$mod2
(Intercept) x1 x2
9.155846e-01 4.899589e-05 4.518067e-06
$mod3
(Intercept) x1 x2
5.222797e-02 9.673084e-10 2.282910e-05
$mod4
(Intercept) x1 x2
7.628776e-01 6.832719e-05 8.846921e-07
$mod5
(Intercept) x1 x2
9.459785e-01 1.007897e-05 7.730988e-09
$mod6
(Intercept) x1 x2
5.351649e-01 2.337192e-04 9.783522e-06
$mod7
(Intercept) x1 x2
7.768047e-01 3.750110e-05 2.134851e-08
$mod8
(Intercept) x1 x2
3.175908e-01 3.725406e-05 1.386217e-05
$mod9
(Intercept) x1 x2
2.135304e-02 2.609510e-09 1.827298e-13
$mod10
(Intercept) x1 x2
1.459893e-01 6.679944e-03 5.533486e-08
sapply and vapply
The return from lapply is always a list. Sometimes it can be simplified into a vector or a matrix. Many authors suggest the use of R’s “sapply” for that:
sapply(mylist3, function(x){
    mycoefs <- coef(x)
    pvals <- mycoefs[ , "Pr(>|t|)"]
    pvals
})
mod1 mod2 mod3 mod4
(Intercept) 2.606344e-01 9.155846e-01 5.222797e-02 7.628776e-01
x1 1.944652e-09 4.899589e-05 9.673084e-10 6.832719e-05
x2 1.218261e-11 4.518067e-06 2.282910e-05 8.846921e-07
mod5 mod6 mod7 mod8
(Intercept) 9.459785e-01 5.351649e-01 7.768047e-01 3.175908e-01
x1 1.007897e-05 2.337192e-04 3.750110e-05 3.725406e-05
x2 7.730988e-09 9.783522e-06 2.134851e-08 1.386217e-05
mod9 mod10
(Intercept) 2.135304e-02 1.459893e-01
x1 2.609510e-09 6.679944e-03
x2 1.827298e-13 5.533486e-08
IMPORTANT: Note that the return is a 3 x 10 matrix, with one column for each list element. Did you expect that? I expected the transpose.
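If one row per list element is what you wanted, the matrix can be transposed with t(), or the vectors can be bound together as rows. A small sketch with made-up numbers (the list res and its values are invented for this example):

```r
res <- list(mod1 = c(a = 0.1, b = 0.2, c = 0.3),
            mod2 = c(a = 0.4, b = 0.5, c = 0.6))

by_col <- sapply(res, identity)    # 3 x 2: one column per list element
by_row <- t(by_col)                # 2 x 3: one row per list element
dim(by_col)                        # 3 2
dim(by_row)                        # 2 3

# The same layout without sapply: bind the vectors as rows
by_row2 <- do.call(rbind, res)
all.equal(by_row, by_row2)
```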
Although sapply() is widely used, Hadley Wickham suggests in Advanced R that we focus instead on learning to use vapply():
vapply(mylist3, function(x){
    mycoefs <- coef(x)
    pvals <- mycoefs[ , "Pr(>|t|)"]
    pvals
}, FUN.VALUE = numeric(3))
mod1 mod2 mod3 mod4
(Intercept) 2.606344e-01 9.155846e-01 5.222797e-02 7.628776e-01
x1 1.944652e-09 4.899589e-05 9.673084e-10 6.832719e-05
x2 1.218261e-11 4.518067e-06 2.282910e-05 8.846921e-07
mod5 mod6 mod7 mod8
(Intercept) 9.459785e-01 5.351649e-01 7.768047e-01 3.175908e-01
x1 1.007897e-05 2.337192e-04 3.750110e-05 3.725406e-05
x2 7.730988e-09 9.783522e-06 2.134851e-08 1.386217e-05
mod9 mod10
(Intercept) 2.135304e-02 1.459893e-01
x1 2.609510e-09 6.679944e-03
x2 1.827298e-13 5.533486e-08
Note the difference is the argument FUN.VALUE, where we specify the structure of an individual returned element.
vapply() is preferred because it is less likely to give us a result we don’t expect. We told it we think each iteration should return a numeric vector with 3 elements, so R knew what to watch for. If the return did not match that criterion, we would have received an error.
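To see that safety check at work, here is a sketch with a deliberately awkward list (res, invented for this example) whose elements have different lengths. sapply() quietly hands back a list, while vapply() stops with an error, which is caught here with tryCatch() so the script can continue.

```r
res <- list(a = 1:3, b = 1:2)    # elements of unequal length

# sapply cannot build a matrix, so it falls back to a list without warning
s <- sapply(res, function(x) x * 2)
class(s)                          # "list"

# vapply stops because element b does not yield 3 numbers
v <- tryCatch(
    vapply(res, function(x) x * 2, FUN.VALUE = numeric(3)),
    error = function(e) "vapply refused")
v                                 # "vapply refused"
```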
Admittedly, the documentation for vapply is poor and I would never have understood the point of this function without reading Advanced R.
rsq <- vapply(mylist3, function(x){
    ## full element name; r.square only worked through partial matching
    x$r.squared
}, FUN.VALUE = numeric(1))
rsq
mod1 mod2 mod3 mod4 mod5 mod6 mod7
0.5416602 0.2731373 0.3990317 0.3123826 0.4085839 0.3177255 0.3831451
mod8 mod9 mod10
0.2881965 0.5535587 0.3094512
hist(rsq, main = "R Square is the only thing I care about",
xlab = expression(R^2), prob = TRUE)
If a list is a collection of vectors, unlist will take them apart:
alist <- list(1:4, 32:44, rnorm(10))
avec <- unlist(alist)
avec
[1] 1.00000000 2.00000000 3.00000000 4.00000000 32.00000000
[6] 33.00000000 34.00000000 35.00000000 36.00000000 37.00000000
[11] 38.00000000 39.00000000 40.00000000 41.00000000 42.00000000
[16] 43.00000000 44.00000000 0.26628675 1.64484304 -0.91627126
[21] 0.41936098 -0.23667887 -1.88187556 -1.57610338 -0.19895519
[26] 1.17037463 -0.07369298
class(avec)
[1] "numeric"
alist <- list(1:4, 32:44, c("Paul", "Joe"))
avec <- unlist(alist)
avec
[1] "1" "2" "3" "4" "32" "33" "34" "35" "36"
[10] "37" "38" "39" "40" "41" "42" "43" "44" "Paul"
[19] "Joe"
class(avec)
[1] "character"
Sometimes unlisting is more aggressive than we expect. Run unlist(mylist3) and you’ll see what 10 regressions look like when all of their numbers are flattened into a single vector.
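A smaller case shows the same behavior without filling the screen. In this sketch, the list nested is invented for the example: unlist() digs into the inner list, converts everything to one type, and builds names from the path to each value.

```r
nested <- list(a = 1:2, b = list(c = 3, d = "x"))

flat <- unlist(nested)
flat                 # all values are now character strings
names(flat)          # "a1" "a2" "b.c" "b.d"

# use.names = FALSE drops the constructed names
bare <- unlist(nested, use.names = FALSE)
bare                 # "1" "2" "3" "x"
```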
To remove an element from a list, it must be assigned the NULL value:
nonamelist[[3]] <- NULL
nonamelist
[[1]]
[1] 1 2 3
[[2]]
[,1] [,2] [,3] [,4]
[1,] 0.60657960 0.8595099 0.8719513 0.4199838
[2,] 0.06135987 -0.9039587 -0.9960967 -0.9790216
[3,] -0.41282739 -0.1912054 1.3025675 -1.0063148
[4,] 0.80153090 -0.9225425 -0.5622973 -0.6288113
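Because assigning NULL with [[ removes an element, a different idiom is needed to store an actual NULL in a slot. The sketch below uses a throwaway list named lst; it also shows a negative index, which returns a shortened copy and leaves the original alone.

```r
lst <- list("a", "b", "c")

lst[[2]] <- NULL           # removes the element; the list shrinks
len_removed <- length(lst)
len_removed                # 2

lst[2] <- list(NULL)       # single bracket with list(NULL) stores a NULL
len_kept <- length(lst)
len_kept                   # still 2
is.null(lst[[2]])          # TRUE

shorter <- lst[-1]         # a copy without element 1
length(shorter)            # 1
length(lst)                # 2: lst itself is unchanged
```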
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.04
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.7.0
LAPACK: /usr/lib/lapack/liblapack.so.3.7.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets base
other attached packages:
[1] crmda_0.45
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12 digest_0.6.12 rprojroot_1.2 plyr_1.8.4
[5] xtable_1.8-2 backports_1.1.0 magrittr_1.5 evaluate_0.10.1
[9] stringi_1.1.5 openxlsx_4.0.17 rmarkdown_1.6 tools_3.4.1
[13] stringr_1.2.0 kutils_1.21 yaml_2.1.14 compiler_3.4.1
[17] htmltools_0.3.6 knitr_1.17 methods_3.4.1
Available under Creative Commons license 3.0