Paul Johnson, CRMDA <pauljohn@ku.edu>

Please visit http://pj.freefaculty.org/guides

Keywords: R,vectors

September 14 2017

Abstract

Vectors are a vital data storage tool in R. They can be created, extracted, and re-organized in a number of ways.

- integer
- double (floating point numeric)
- character
- logical (TRUE or FALSE represent 1 and 0) AKA Boolean

- complex
- raw

**Atomic**: a single “column” of information

**Factors**: are not vectors in R; they are more complicated variables. For the same reason, ordered factors are not vectors.

I guessed that Date objects are vectors, but not according to the help page `?vector`

- matrix
- list
- data.frame

A matrix is still “atomic” because we can conceptualize it as a vector that is broken into columns. Not true of lists or data frames.
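The distinction can be checked directly. This is a minimal sketch using base R's `is.atomic()` and `is.vector()`; the object names `m`, `l`, and `f` are made up for illustration.

```r
m <- matrix(1:6, nrow = 2)          # a vector with a dim attribute
l <- list(1L, "a", TRUE)            # elements may differ in type
f <- factor(c("lo", "hi", "lo"))    # carries levels and class attributes

is.atomic(m)   # TRUE: one storage type throughout
is.atomic(l)   # FALSE: a list is not atomic
is.vector(m)   # FALSE: the dim attribute disqualifies it
is.vector(l)   # TRUE: is.vector() accepts lists under the default mode = "any"
is.vector(f)   # FALSE: attributes other than names are present
```

Note that `is.vector()` is a strict test: it returns FALSE whenever an object has attributes other than names, which is why factors and Date objects fail it.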

Authors often introduce vectors by the c() function. Here’s a vector, for example

`x <- c(13, 2, 33, 4, 35)`

x is a column vector, mathematically speaking, even though it prints out horizontally to “save space”.

`x`

`[1] 13 2 33 4 35`

```
y <- 5
is.vector(y)
```

`[1] TRUE`

It is not necessary to type y[1] to obtain the value, although it is allowed:

`identical(y[1], y)`

`[1] TRUE`

Retrieve elements one at a time.

`x[1]`

`[1] 13`

`x[5]`

`[1] 35`

Use an index vector

`x[c(3, 4, 5)]`

`[1] 33 4 35`

Can separate calculation of the index (make 2 steps)

```
indx <- c(2, 4)
x[indx]
```

`[1] 2 4`

`x[-4]`

`[1] 13 2 33 35`

`x[c(-3,-4)]`

`[1] 13 2 35`
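A few edge cases of numeric indexing are worth trying once. The sketch below reuses the same five values of `x`; the comments describe what base R returns.

```r
x <- c(13, 2, 33, 4, 35)

x[10]           # NA: asking past the end gives a missing value, not an error
x[0]            # numeric(0): index zero selects nothing
x[-(1:2)]       # 33 4 35: drops the first two elements
# x[c(-1, 2)]   # error: negative and positive indices cannot be mixed
```

The silent NA from an out-of-range index is a common source of bugs, so it can help to compare an index against `length(x)` before using it.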

Pull out items 1 and 4 by setting them as true

```
indx <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
x[indx]
```

`[1] 13 4`

I’ll use those “TRUE” values to filter the values from x which are greater than 0

```
xgt0 <- x > 0
x2 <- x[xgt0]
x2
```

`[1] 13 2 33 4 35`

Often, we’d do that selection in one step, but you don’t understand what’s happening unless you do the 2 separate steps (good both for novices and bug-checkers).

```
x2 <- x[x>0]
x2
```

`[1] 13 2 33 4 35`

Could use `which()` to achieve the same

```
xwh <- which(x > 0)
x[xwh]
```

`[1] 13 2 33 4 35`

```
#or
x[which(x > 0)]
```

`[1] 13 2 33 4 35`
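The two approaches differ when missing values are present. A small sketch, where `xm` is a made-up vector with one missing score:

```r
xm <- c(13, NA, -2, 4)   # hypothetical data, one value missing

xm[xm > 0]          # 13 NA 4: the comparison with NA yields NA, which is kept
xm[which(xm > 0)]   # 13 4: which() reports only the positions that are TRUE
```

Which behavior is preferable depends on whether the missing case should stay visible in the result.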

`c()` is a friend and an enemy. Reasons why newcomers like `c()`:

- convenient
- hyper-flexible: can throw together anything
- often does what we want
- can create named vector easily

`c()` is brief, easy to remember

`c()` might stand for “combine”, “collect”, “concatenate”

Often works as expected, saves work that might be boring/repetitive.

When I said `c()` is flexible, I had in mind that

- it asks for additional memory and combines vectors gracefully.

```
x1 <- c(33, 22)
x2 <- c(55.1, 55, 58, 11, 12)
x3 <- c(x1, x2)
x3
```

`[1] 33.0 22.0 55.1 55.0 58.0 11.0 12.0`

- The number of elements in x1 and x2 must be counted
- Memory must be requested for a vector equal to the requirement.
- The individual elements must be copied into the newly allocated values.
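Because those steps happen on every call, building a long vector by repeated `c()` calls repeats the allocation and copying each time. The following sketch contrasts that with allocating once; the names `n`, `grown`, and `filled` are illustrative only.

```r
n <- 1000

# Growing: each call to c() allocates a new vector and copies the old one
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i^2)
}

# Preallocating: memory is requested once, then filled in place
filled <- numeric(n)
for (i in seq_len(n)) {
  filled[i] <- i^2
}

identical(grown, filled)   # TRUE: same result either way
```

Both loops give the same answer. The preallocated version tends to be preferred for large `n` because it avoids the repeated copying.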

`c()` is very helpful because it can, literally, combine completely different kinds of things and give a sensible result. (That’s *pleasant* and *dangerous*.)

`z <- c("beta0" = 0.1, "beta1" = 1.1, "beta2" = 0.04)`

Note the quotation marks are not necessary on the names; I am just accustomed to typing them. The previous command is equivalent to running one command to create the vector and then using the assignment version of `names(z2)` to attach the names.

```
z2 <- c(0.1, 1.1, 0.04)
names(z2) <- c("beta0", "beta1", "beta2")
z2
```

```
beta0 beta1 beta2
0.10 1.10 0.04
```

In real life, I’d avoid so much typing by pasting the names together with a statement like

```
z2 <- c(0.1, 1.1, 0.04)
names(z2) <- paste0("beta", 0:2)
z2
```

```
beta0 beta1 beta2
0.10 1.10 0.04
```

Named vectors cause some calculations to go slower in R, so we would not make a huge structure with named elements. However, for small to medium vectors, named vectors are often very convenient. Naming the elements reduces the danger of accessing the wrong value by a numeric index. We also benefit by keeping a cleaner workspace. Instead of creating separate values for \(\beta_0\), \(\beta_1\) and so forth, we just retrieve them by name if we need them:

`z["beta0"]`

```
beta0
0.1
```

If the names get in your way, use the `unname` function

`unname(z)`

`[1] 0.10 1.10 0.04`

The `c()` function also has a superpower feature, the `recursive` argument. If `recursive` is TRUE, then `c()` will dig through lists (not discussed here) and pull out their individual elements.
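A minimal sketch of the `recursive` argument, using a made-up nested list called `nested`:

```r
nested <- list(1, list(2, 3))   # hypothetical nested list

length(c(nested))               # 2: by default the inner list stays intact
c(nested, recursive = TRUE)     # 1 2 3: flattened into an atomic vector
```

With `recursive = TRUE` the result is an ordinary numeric vector rather than a list.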

There’s a difference between an integer and a floating point number, right? The difference is much bigger in computer math than in pencil and paper math.

Why the difference? Computers use 0’s and 1’s to record numbers. R stores an integer in 32 bits, so the integer \(1\) is \(31\) \(0\)’s followed by a \(1\). The integer \(3\) is \(30\) \(0\)’s followed by \(11\). Integers are exact!

Floating point numbers are approximations built on, say, 64 bit values. A number which appears as 3 on the screen might in fact be 2.999999999234 because of *rounding error*.

- Integer comparisons are OK, can use “==” and “!=” for equal and not equal.

```
x <- c(5L, 10L, 15L, 20L, 25L, 30L)
y <- seq(5L, 30L, 5L)
x == y
```

`[1] TRUE TRUE TRUE TRUE TRUE TRUE`

The “L” suffix is borrowed from C’s “long integer” type. In R, every integer is stored in 32 bits; there is no separate “long” integer type.
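The range of R's integer type can be inspected through `.Machine`. A short sketch (the name `big` is illustrative) showing what happens at the upper limit of the 32-bit range:

```r
.Machine$integer.max                  # 2147483647, the largest integer R can store

# Adding 1L overflows; R returns NA and issues a warning (suppressed here)
big <- suppressWarnings(.Machine$integer.max + 1L)
big                                   # NA

.Machine$integer.max + 1              # 2147483648: adding a double gives a double
```

This is one reason large counts are often kept as doubles, which represent whole numbers exactly up to 2^53.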

The `identical()` function can be used to compare entire vectors.

`identical(x, y)`

`[1] TRUE`

- declare x as an integer before assigning values.

```
x <- integer(5)
## same as
## x <- vector(mode = "integer", length = 5)
```

Then we have a somewhat stupid chore of putting values into x

```
x[1] <- 13L
x[2] <- 2L
x[3] <- 33L
x[4] <- 4L
x[5] <- 35L
```

`is.integer(x)`

`[1] TRUE`

That is tedious.

Are the “L”’s needed? Apparently yes. Observe:

```
x <- integer(5)
x[1] <- 13
x[2] <- 2
x[3] <- 33
x[4] <- 4
x[5] <- 35
is.integer(x)
```

`[1] FALSE`

- In the usual situation, people might use “*coercion*” after creating x.

```
x <- c(13, 2, 33, 4, 35)
x <- as.integer(x)
```

In this case, the *coercion* appears to be harmless.

Sometimes, the coercion is not so harmless. In effect, it truncates: the fractional part is discarded, which “rounds down” positive values.

```
x <- c(13, 2, 33, 4, 35.0001)
x <- as.integer(x)
x
```

`[1] 13 2 33 4 35`

R has functions `floor()`, `trunc()`, and `round()` if you really do intend to drop or round the fractional part.
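These functions disagree on negative values and on halves, so it is worth comparing them side by side. The vector `v` below is made up for the comparison.

```r
v <- c(-2.7, -0.5, 0.5, 2.5, 2.7)   # made-up values

as.integer(v)   # -2  0  0  2  2   drops the fraction (toward zero)
trunc(v)        # same values, but still stored as doubles
floor(v)        # -3 -1  0  2  2   always down
round(v)        # -3  0  0  2  3   halves go to the even neighbour
```

Note that `round(2.5)` is 2, not 3, because R follows the IEC 60559 convention of rounding halves to the even digit.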

The computer treats math with integers in a different way than with floating point values. If values truly are integers, OK! If one is a float, watch out!

- Floating point number problems

We can’t feel too terrifically confident that a number which appears as 1.0 (a floating point) is equal to 1L (an integer).

This example seems not too worrisome

```
x <- 5
y <- c(4L, 5L, 6L)
x == y
```

`[1] FALSE TRUE FALSE`

```
z <- c(4, 5, 6)
y == z
```

`[1] TRUE TRUE TRUE`

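z is seen as equal to y because `==` converts the integers in y to floating point before comparing the values. `identical()` is stricter: it also checks the storage type, so it reports that y and z are not the same, as we see from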

`identical(y, z)`

`[1] FALSE`
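The gap between `==` and `identical()` comes down to storage type. A short sketch with the same `y` and `z`:

```r
y <- c(4L, 5L, 6L)
z <- c(4, 5, 6)

typeof(y)                    # "integer"
typeof(z)                    # "double"
all(y == z)                  # TRUE: values are compared after conversion
identical(y, z)              # FALSE: the storage types differ
identical(as.double(y), z)   # TRUE once the types match
```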

But look at this horrifying example from the help page `?all.equal`

```
x <- pi*(1/4 + 1:10)
xtan <- tan(x)
## Looks like integers
xtan
```

` [1] 1 1 1 1 1 1 1 1 1 1`

`is.integer(xtan)`

`[1] FALSE`

`xtan == 1L`

` [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE`

As a result, we conclude **exact equality comparisons between floating point numbers are strongly discouraged**. R’s `all.equal()` and `zapsmall()` functions are intended to help with comparison of floating point values.

`zapsmall(xtan) == 1L`

` [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE`
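For comparison, here is a sketch of the same check with `all.equal()`, plus a hand-written tolerance test. The tolerance `1e-8` is an arbitrary choice for illustration.

```r
xtan <- tan(pi * (1/4 + 1:10))

# all.equal() returns TRUE or a character description of the difference,
# so it is wrapped in isTRUE() when a plain logical is needed
isTRUE(all.equal(xtan, rep(1, 10)))   # TRUE: equal within a small tolerance

all(abs(xtan - 1) < 1e-8)             # TRUE: explicit tolerance
```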

If we use it unthinkingly, c() will destroy data (or, well, alter it unexpectedly).

Suppose we have some values and there is a missing score, which we accidentally represent as “NA”.

`x3 <- c(1, 2, 3, "NA", 5)`

What is x3 now?

`is.integer(x3)`

`[1] FALSE`

`is.double(x3)`

`[1] FALSE`

`is.character(x3)`

`[1] TRUE`

`x3`

`[1] "1" "2" "3" "NA" "5" `
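The conversion follows a fixed ordering of types: logical, then integer, then double, then character. When `c()` receives a mixture, everything is promoted to the highest type present. A quick sketch:

```r
typeof(c(TRUE, 2L))      # "integer": logical is promoted to integer
typeof(c(2L, 3.5))       # "double": integer is promoted to double
typeof(c(3.5, "a"))      # "character": everything becomes text
c(TRUE, 2L, 3.5, "a")    # "TRUE" "2" "3.5" "a"
```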

What did I mean to do? Use the R symbol NA, without quotes, to indicate that the fourth score was missing.

```
x4 <- c(1, 2, 3, NA, 5)
is.character(x4)
```

`[1] FALSE`

`x4`

`[1] 1 2 3 NA 5`

`is.na(x4)`

`[1] FALSE FALSE FALSE TRUE FALSE`

The return value from `is.na()` is an example of a logical vector; the values are either TRUE or FALSE. Those are symbolic equivalents of 1 and 0. See?

```
x4missing <- is.na(x4)
x4missing == 1
```

`[1] FALSE FALSE FALSE TRUE FALSE`
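Because TRUE counts as 1 and FALSE as 0, arithmetic on a logical vector gives counts and proportions. A small sketch with the same `x4`:

```r
x4 <- c(1, 2, 3, NA, 5)

sum(is.na(x4))           # 1: number of missing values
mean(is.na(x4))          # 0.2: proportion missing
mean(x4)                 # NA: missing values propagate through the calculation
mean(x4, na.rm = TRUE)   # 2.75: computed from the four observed values
```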

Many (not all) functions in R are vectorized. It is not necessary to apply a function individually to the elements (say, in a “for loop”). Instead, we handle a whole vector in one step.

```
x1 <- 1:10
3 * x1
```

` [1] 3 6 9 12 15 18 21 24 27 30`

`log(x1)`

```
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595
[7] 1.9459101 2.0794415 2.1972246 2.3025851
```

`sqrt(x1)`

```
[1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751
[8] 2.828427 3.000000 3.162278
```

`exp(x1)`

```
[1] 2.718282 7.389056 20.085537 54.598150 148.413159
[6] 403.428793 1096.633158 2980.957987 8103.083928 22026.465795
```

Similarly, addition, subtraction, and multiplication are vectorized

```
x2 <- 55:64
x1 + x2
```

` [1] 56 58 60 62 64 66 68 70 72 74`

`x2 - x1`

` [1] 54 54 54 54 54 54 54 54 54 54`

`0.1 * x2 - x1`

` [1] 4.5 3.6 2.7 1.8 0.9 0.0 -0.9 -1.8 -2.7 -3.6`

The symbol “*” indicates ‘term wise’ multiplication. It is not an “inner product” or “dot product” as in linear algebra.
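To see the contrast, here is a sketch with two short made-up vectors `a` and `b`. The inner product can be obtained by summing the term-wise products, or with the matrix multiplication operator `%*%`.

```r
a <- 1:3
b <- 4:6

a * b            # 4 10 18: element by element
sum(a * b)       # 32: the inner product as a plain number
a %*% b          # the same value, returned as a 1 x 1 matrix
drop(a %*% b)    # 32: drop() removes the matrix dimensions
```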

- Random number generators

```
set.seed(234234)
x <- rnorm(10)
head(x)
```

`[1] -0.1308295 -0.6777994 0.1435791 -0.4879708 -0.1845969 0.5976032`

`is.vector(x)`

`[1] TRUE`

`is.double(x)`

`[1] TRUE`

`head(x)` is a shortcut for `x[1:6]`, see `?head`

- Sequence: `seq()`
- Replicate: `rep()`
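A few typical calls, as a sketch of the most commonly used arguments (`by`, `length.out`, `times`, `each`):

```r
seq(0, 1, by = 0.25)            # 0.00 0.25 0.50 0.75 1.00
seq(2, 20, length.out = 4)      # 2 8 14 20
seq_len(5)                      # 1 2 3 4 5
rep(c("a", "b"), times = 3)     # "a" "b" "a" "b" "a" "b"
rep(c("a", "b"), each = 2)      # "a" "a" "b" "b"
```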

Logical comparisons create logical vectors.

```
xgt0 <- x > 0
head(xgt0)
```

`[1] FALSE FALSE TRUE FALSE FALSE TRUE`

`is.logical(xgt0)`

`[1] TRUE`

The `cbind` and `rbind` functions are the vector-wise equivalents of `c()`. These are both 1) handy and 2) dangerous.

A vector is, by definition, a column structure. Let’s make 2 columns and bind them together.

```
x1 <- 1:5
x2 <- seq(100, 180, by = 20)
X <- cbind(x1, x2)
X
```

```
x1 x2
[1,] 1 100
[2,] 2 120
[3,] 3 140
[4,] 4 160
[5,] 5 180
```

The object `X` is a matrix, which we will discuss in a separate set of notes.

`class(X)`

`[1] "matrix"`

We don’t want to get bogged down here about what a matrix is, or what a “class” is in R, or how a matrix is different from a vector. We will get bogged down in that later.

- The unintended “demotion” or “promotion” of variable types occurs, as in `c()`. All of the columns may be altered by a single character value in one of them.

```
x1 <- c(1, 2, 3, "NA", 5)
x2 <- seq(100, 180, by = 20)
X <- cbind(x1, x2)
X
```

```
x1 x2
[1,] "1" "100"
[2,] "2" "120"
[3,] "3" "140"
[4,] "NA" "160"
[5,] "5" "180"
```

`mode(X)`

`[1] "character"`

- “Recycling” will re-use values in a sometimes unexpected way:

```
x1 <- c(1, 2, NA)
x2 <- seq(100, 180, by = 20)
X <- cbind(x1, x2)
```

```
Warning in cbind(x1, x2): number of rows of result is not a multiple
of vector length (arg 1)
```

`X`

```
x1 x2
[1,] 1 100
[2,] 2 120
[3,] NA 140
[4,] 1 160
[5,] 2 180
```

We do see the warning there, but this is very dangerous behavior. It is an example of why it is not recommended to turn off warnings (or develop the habit of ignoring them).
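Recycling also happens in ordinary arithmetic, and when the longer length is an exact multiple of the shorter one there is no warning at all. The sketch below shows that, followed by one possible guard before binding. The length check is a convention assumed here, not something `cbind` does for us.

```r
1:6 + c(10, 20)        # 11 22 13 24 15 26: the short vector is reused 3 times

x1 <- c(1, 2, NA)
x2 <- seq(100, 180, by = 20)

# Guard: bind only when the lengths agree, otherwise report the mismatch
if (length(x1) == length(x2)) {
  X <- cbind(x1, x2)
} else {
  message("lengths differ: ", length(x1), " vs ", length(x2))
}
```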

`rbind` stands for “row” bind.

When I first applied `rbind` to two (column) vectors,

```
x <- c(1, 2, 3)
y <- c(4, 5, 6)
```

I expected the result would be a column `(1, 2, 3, 4, 5, 6)`. I was (mistakenly) expecting that, since both x and y are (column) vectors, R would treat them that way.

However, the behavior of rbind is different, entirely!

`rbind(x, y)`

```
[,1] [,2] [,3]
x 1 2 3
y 4 5 6
```

That was a surprise to me. What happened? When we gave the two vectors to `rbind()`, R was thinking to itself “Ah, they must want me to treat those two vectors as rows!”.

And why would R have a right to think so? If I want to “stack together” two column vectors, I ought to use the `c()` function. That’s what `c()` is actually intended for, after all!

`c(x, y)`

`[1] 1 2 3 4 5 6`

The other lesson in this is that although vectors in R are generally thought of as column vectors, *you can’t take that to the bank*. Simply put, always do your best to double-check calculations to make sure you are getting what you expect.

Vectors are columns. In R, they are a separate type of storage. Remember they are columns.

**Question** What is the transpose of a column?

**Answer** A row.

But in R there is no such thing as a “row vector”. So what do we receive if we use the “transpose” operator on a column vector?

```
x <- c(10, 11, 12, 13, 14, 15, 16)
x
```

`[1] 10 11 12 13 14 15 16`

```
xt <- t(x)
xt
```

```
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 10 11 12 13 14 15 16
```

`class(xt)`

`[1] "matrix"`

In R, the only way to represent a “row” is to talk about a matrix with just one row. That’s an important technical difference. R has vectors as columns but, as we shall see, it also has matrices with only one column in them, and those one-column matrices are not equivalent to an R vector in many ways.
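The difference shows up in the `dim` attribute. A short sketch with a three-element vector:

```r
x <- c(10, 11, 12)

dim(x)               # NULL: a plain vector has no dimensions
dim(t(x))            # 1 3: a one-row matrix
dim(t(t(x)))         # 3 1: a one-column matrix
is.vector(t(t(x)))   # FALSE: same values as x, but no longer a vector
identical(as.vector(t(t(x))), x)   # TRUE: as.vector() strips the dimensions
```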

The actual storage work is handled in C, where the term “type” is used for variables. The types are “int” (integer), “long” (integer that can hold more values), “float” (floating point real number), “double” (a double-precision floating point number), and so forth. In the R documentation, these are referred to as “types” (or the very closely related “storage modes”).

The reason for inserting this section is the ambiguity between the terms “numeric” and “double” (or double width floating point value) in various contexts.

Many R users will never concern themselves with type or storage mode, but they will be interested in the “**class**” of an R object. The idea of object “class” frames almost all of the R user’s day-to-day interaction with R.

R marks each object with an attribute called “class” and that attribute is used by the R runtime system to make good guesses about what users need when they make requests. The term “class” embraces a much wider range of data structures than the “integer”, “double”, “character” storage mode family. These classes are the structures that have made S and R famous, including factors, Dates, lists, data frames, and matrices. These things, of course, have to exist in memory with a certain structure, but since there are no built-in C equivalents of lists or dates, there is no danger of confusion.

There is no confusion about the meaning of storage mode or class in the cases of “character” and “logical” variables. The R classes “character” and “logical” are exactly what you expect. They are vectors for which the storage mode is “character” or “logical”. There’s no trouble.

Consider a logical vector. I believe the output from coercion into other types is mostly understandable.

```
x <- c(TRUE, FALSE, FALSE, TRUE)
is.logical(x)
```

`[1] TRUE`

`as.character(x)`

`[1] "TRUE" "FALSE" "FALSE" "TRUE" `

`as.integer(x)`

`[1] 1 0 0 1`

See the “Note on names” in the help page “?numeric”. The confusion is that the name “numeric” sometimes means “floating point double precision numbers” while sometimes it includes both integers and floating-point numbers. The treatment is different in the older family of S3 functions. In the S4 family, numeric means double-precision floating point values.

We will demonstrate the difference by starting with that logical vector.

```
x <- c(TRUE, FALSE, FALSE, TRUE)
z <- as.numeric(x)
z
```

`[1] 1 0 0 1`

`is.integer(z)`

`[1] FALSE`

`is.double(z)`

`[1] TRUE`

The 0’s and 1’s in z represent floating point values, not integer 0L and 1L. The `as.numeric` function always generates a floating point value, even though we might wish we could have integer 0L and 1L.

Now let’s try the same exercise from another direction. The ambiguity of “numeric” will reveal itself.

```
x <- c(TRUE, FALSE, FALSE, TRUE)
z <- as.integer(x)
z
```

`[1] 1 0 0 1`

`is.numeric(z)`

`[1] TRUE`

`is.double(z)`

`[1] FALSE`

The difference between “is.numeric” and “as.numeric” flows from the fact that as.numeric always creates floating point numbers, while “is.numeric” returns TRUE if the storage mode of the vector is integer or floating point. Those are all “numbers”, especially when we need to differentiate them from character or logical variables.
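The three vocabularies can be lined up side by side. In this sketch, `vals` is a made-up list holding one example of each storage type:

```r
vals <- list(1L, 1, TRUE, "a")   # one example of each storage type

sapply(vals, typeof)       # "integer" "double"  "logical" "character"
sapply(vals, class)        # "integer" "numeric" "logical" "character"
sapply(vals, is.numeric)   # TRUE TRUE FALSE FALSE
```

The only place the columns disagree is the double: its type is “double” but its class is “numeric”, while `is.numeric()` accepts both the integer and the double.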

- `c`: general purpose concatenator, often used to allocate vectors
- `vector()`: allocates space for a vector of a given type. Same as the functions `double()`, `integer()`, and so forth.
- `is.___` functions are for checking a thing’s type.
- The `as.___` family is for coercing a variable of one type into another class: `as.integer()`, `as.double()`, `as.logical()`. A general purpose `as()` function can be used instead, with arguments.
- `1:10` is shorthand for `seq(1L, 10L, 1L)`

```
x1 <- 1:10
is.integer(x1)
```

`[1] TRUE`

```
x2 <- seq(1L, 10L, 1L)
identical(x1, x2)
```

`[1] TRUE`

```
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.04
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.7.0
LAPACK: /usr/lib/lapack/liblapack.so.3.7.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets base
other attached packages:
[1] crmda_0.44
loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 digest_0.6.12 rprojroot_1.2 plyr_1.8.4
[5] xtable_1.8-2 backports_1.1.0 magrittr_1.5 evaluate_0.10
[9] stringi_1.1.5 openxlsx_4.0.17 rmarkdown_1.6 tools_3.4.1
[13] stringr_1.2.0 kutils_1.19 yaml_2.1.14 compiler_3.4.1
[17] htmltools_0.3.6 knitr_1.16 methods_3.4.1
```

Available under Creative Commons license 3.0