6 Logical Vectors

6.1 Warm-Up

Run each line of code. What does each line do?

x <- runif(5)
mean(x)
x > mean(x)
mean(x > mean(x))

6.2 Outcomes

Objective: To compare vectors with logical operators and use logical-to-numeric coercion for summarizing data.

Why it matters: Logical vectors are used in common data wrangling applications, such as creating new variables, recoding existing variables, and subsetting datasets.

Learning outcomes:

Fundamental Skills Extended Skills
  • Compare two vectors.

  • Combine multiple comparisons with Boolean operators.

  • Coerce logical values to numeric values to summarize data.

  • Index a vector with a logical vector.

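Key values and operators:

  • Logical values: TRUE, FALSE, NA
  • Comparison operators: >, >=, <, <=, ==, !=
  • Boolean operators: &, |, !
  • Value matching: %in%
  • Recoding: ifelse()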

6.3 Logical Values

R has three logical values:

  • TRUE
  • FALSE
  • NA (missing)

TRUE and FALSE can also be written as T and F. You will often see or write this shorthand within function arguments:

sample(1:10, 100, replace = T)
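One caveat about the shorthand: in base R, T and F are ordinary variables preset to TRUE and FALSE, while TRUE and FALSE themselves are reserved words. The sketch below overwrites T purely for illustration, to show why spelling out TRUE and FALSE is the safer habit in scripts you share or reuse.

```r
# T starts out holding TRUE
identical(T, TRUE)   # TRUE

# Because T is just a variable, it can be overwritten (illustration only)
T <- 0
T == TRUE            # FALSE: T now holds 0, and TRUE is coerced to 1

# Remove our copy so that T again refers to base R's TRUE
rm(T)
identical(T, TRUE)   # TRUE

# TRUE itself is reserved: running `TRUE <- 0` would throw an error
```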

In modeling, logical vectors correspond to indicator or dummy variables. They dichotomize continuous or categorical vectors into binary vectors that record where a condition holds (TRUE) and where it does not (FALSE).

6.4 Comparison Operators

Comparison operators apply a test to each element of a vector and return a logical vector of the results.

Create a vector with the numbers 1-5 and compare it to 3.

x <- 1:5
x
[1] 1 2 3 4 5
x > 3 # greater than
[1] FALSE FALSE FALSE  TRUE  TRUE
x >= 3 # greater than or equal to
[1] FALSE FALSE  TRUE  TRUE  TRUE
x < 3 # less than
[1]  TRUE  TRUE FALSE FALSE FALSE
x <= 3 # less than or equal to
[1]  TRUE  TRUE  TRUE FALSE FALSE
x == 3 # equal to
[1] FALSE FALSE  TRUE FALSE FALSE
x != 3 # not equal to
[1]  TRUE  TRUE FALSE  TRUE  TRUE

The operators for “greater/less than or equal to” are simply the greater/less than symbol (> or <) followed by an equals sign (=): >= and <=.

To test equality, we need to use two equals signs (==). This is because assignment can be done not only with the assignment operator (<-) but also the single equals sign (=). If we run x = 3, we would assign the value of 3 to x!
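The examples above compare x to a single number, which R recycles to the length of x. Comparing two vectors of the same length works element by element, position by position. In this minimal sketch, the second vector w is a made-up example:

```r
x <- 1:5
w <- c(5, 2, 1, 4, 3)  # hypothetical comparison vector, same length as x

x == w   # FALSE  TRUE FALSE  TRUE FALSE
x > w    # FALSE FALSE  TRUE FALSE  TRUE
x != w   # TRUE FALSE  TRUE FALSE  TRUE
```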

6.5 Boolean Operators

Boolean operators are used to combine multiple logical vectors.

The three main Boolean operators are AND (&), OR (|), and NOT (!).

AND checks whether all values are TRUE.

T & T
[1] TRUE
T & F
[1] FALSE
F & F
[1] FALSE

OR checks whether at least one is TRUE.

T | T
[1] TRUE
T | F
[1] TRUE
F | F
[1] FALSE

In these examples, AND and OR agree when the values are all TRUE or all FALSE, but they disagree when there is a mix. For T & F, AND returns FALSE because not all of the values are TRUE. For T | F, OR returns TRUE because at least one value is TRUE.

NOT switches TRUE and FALSE.

!T
[1] FALSE
!F
[1] TRUE
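The single-value examples extend to vectors: &, |, and ! work element by element, so two logical vectors of the same length can be combined in one step. Lining up every combination of two made-up vectors, a and b, reproduces the truth tables above:

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

a & b   # TRUE FALSE FALSE FALSE: only the first pair is all TRUE
a | b   # TRUE  TRUE  TRUE FALSE: only the last pair has no TRUE
!a      # FALSE FALSE  TRUE  TRUE
```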

We can combine multiple comparison operators with Boolean operators. For example, if we want to find values of x in the range 2-4, we would combine two statements about the minimum and maximum of the range: values greater than or equal to 2 AND less than or equal to 4:

x >= 2 & x <= 4
[1] FALSE  TRUE  TRUE  TRUE FALSE

Or we can find where x is equal to 1 or 3:

x == 1 | x == 3
[1]  TRUE FALSE  TRUE FALSE FALSE

Note that we have to repeat the entire condition for each value: x == 1 | 3 does not test whether x equals 1 or 3. If we want to check whether the values in x are in some set of values, our code can start to become very long:

x == 1 | x == 2 | x == 3 | x == 4
[1]  TRUE  TRUE  TRUE  TRUE FALSE
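Running the shorthand x == 1 | 3 shows why the full condition must be repeated. It raises no error; it quietly returns TRUE for every element. R evaluates == before |, so it reads the expression as (x == 1) | 3, and the nonzero 3 counts as TRUE in a Boolean operation:

```r
x <- 1:5

x == 1 | 3        # all TRUE: the same as (x == 1) | 3
(x == 1) | 3      # the grouping R actually uses
x == 1 | x == 3   # the intended test: TRUE FALSE TRUE FALSE FALSE
```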

6.6 %in%

Instead, use the value matching operator, %in%. It takes one vector on the left and one on the right, and for each element on the left, it checks whether that element is found anywhere in the vector on the right. That is, the right-hand vector is not recycled position by position as it would be with ==: each element on the left is checked against every element on the right, and the result always has the same length as the left-hand vector.

Rewrite the last example with %in%:

x %in% 1:4
[1]  TRUE  TRUE  TRUE  TRUE FALSE

The right-hand side can be any vector, created with c(), seq(), or anything else:

x %in% c(1, 3, 5)
[1]  TRUE FALSE  TRUE FALSE  TRUE
x %in% seq(1, 5, 2)
[1]  TRUE FALSE  TRUE FALSE  TRUE
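The difference from == is easiest to see side by side. With a right-hand vector of length 2, == recycles it along x and compares position by position (1 with 1, 2 with 3, 3 with 1, and so on), and R warns because 5 is not a multiple of 2. %in% instead checks each element of x against the whole set. The warning is suppressed here only to keep the output short:

```r
x <- 1:5

x %in% c(1, 3)                  # TRUE FALSE  TRUE FALSE FALSE
suppressWarnings(x == c(1, 3))  # TRUE FALSE FALSE FALSE FALSE (1==1, 2==3, 3==1, 4==3, 5==1)

# The result of %in% always has the length of the left-hand side
length(x %in% c(1, 3))          # 5
```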

Logical operators work with more than numbers. The built-in vector LETTERS has the uppercase letters A-Z (see ?Constants). Find those that are vowels:

LETTERS %in% c("A", "E", "I", "O", "U")
 [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
[13] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
[25] FALSE FALSE

This returns a logical vector, but we may instead want the values associated with each of those TRUEs. We can index a vector with a logical vector with the format vector_to_index[logical_vector]:

LETTERS[LETTERS %in% c("A", "E", "I", "O", "U")]
[1] "A" "E" "I" "O" "U"

Where the condition is TRUE, the value is returned. Where it is FALSE, nothing is returned.

We can also index a different vector with that logical vector, provided it is the same length. Find elements of the lowercase letters where LETTERS has a vowel:

letters[LETTERS %in% c("A", "E", "I", "O", "U")]
[1] "a" "e" "i" "o" "u"
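The same pattern works with numbers, and it is how a condition on one variable selects values of another. A minimal sketch with x and two short, hypothetical parallel vectors, students and scores, standing in for columns of a dataset:

```r
x <- 1:5
x[x > 3]              # 4 5
x[x >= 2 & x <= 4]    # 2 3 4

# Hypothetical parallel vectors: element i of each describes the same student
students <- c("Ana", "Ben", "Cy", "Dee")
scores   <- c(72, 95, 88, 60)

students[scores > 80]  # "Ben" "Cy"
```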

6.7 ifelse()

We can use logical operators not only to identify elements where a condition is TRUE but also to recode them. This allows us to dichotomize continuous and categorical variables, top-code and/or bottom-code variables, and change certain values to other values or to missing.

The ifelse() function takes three arguments:

  • test: a logical test that returns TRUE or FALSE
  • yes: the value to return if test is TRUE
  • no: the value to return if test is FALSE

Continue to use x, which has the numbers 1-5.

Test whether x is equal to 5:

x == 5
[1] FALSE FALSE FALSE FALSE  TRUE

Now insert that as the test argument in ifelse(). Where there is a TRUE, R will return the yes value. Where there is a FALSE, R will return the no value. Return 1 for TRUE and 0 for FALSE:

ifelse(x == 5, 1, 0)
[1] 0 0 0 0 1
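Because ifelse()'s arguments are named test, yes, and no, the call above can also be written with the names spelled out. The yes and no values need not be numbers; returning character labels is one way to dichotomize a variable into categories. The labels "low" and "high" below are arbitrary examples:

```r
x <- 1:5

ifelse(test = x == 5, yes = 1, no = 0)  # 0 0 0 0 1, same as the positional call
ifelse(x > 3, "high", "low")            # "low" "low" "low" "high" "high"
```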

Top-code x so that where x is greater than 4, 4 is returned. Otherwise, return the current value. To do that, set the no argument to be x:

ifelse(x > 4, 4, x)
[1] 1 2 3 4 4

Similarly, bottom-code the data to impose a minimum value of 3:

ifelse(x < 3, 3, x)
[1] 3 3 3 4 5
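Top- and bottom-coding can be combined by nesting one ifelse() inside the no argument of another. Base R's pmin() and pmax(), which take element-wise minimums and maximums, are a common alternative for this particular task. Both lines in this sketch apply a floor of 2 and a ceiling of 4 (values chosen for illustration) and give the same result:

```r
x <- 1:5

# Values above 4 become 4; of the rest, values below 2 become 2
ifelse(x > 4, 4, ifelse(x < 2, 2, x))   # 2 2 3 4 4

# Same result: cap at 4, then raise anything below 2 up to 2
pmax(pmin(x, 4), 2)                     # 2 2 3 4 4
```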

Combine multiple conditions with Boolean operators to selectively recode some values. If x is outside the range 2-4, return NA:

ifelse(x >= 2 & x <= 4, x, NA)
[1] NA  2  3  4 NA

Instead of recoding known values as missing, we may want to recode missing values as something else:

y <- c(1, 2, NA)
ifelse(y == NA, 0, y)
[1] NA NA NA

Uh-oh.

6.8 Missing Values

In R, we can think of NA as “I don’t know.”

Imagine I hold up two fingers on each hand and show them to you. Are the two numbers equal?

2 == 2
[1] TRUE

Yes.

I hold up two fingers on one hand, and I put my other hand behind my back with some unknown number of fingers held up (NA, missing). Are the two numbers equal?

2 == NA
[1] NA

“I don’t know.”

I put both hands behind my back. Am I holding up the same number of fingers on each hand?

NA == NA
[1] NA

“I don’t know.”

NA represents some real but unknown value, a hidden hand with some unknown number of fingers raised. NA is nowhere on the number line (in contrast to SAS and Stata, which treat missing values as lower or higher than every number, respectively), so comparisons, arithmetic, and summary functions such as mean() that involve NA generally return NA:

mean(y)
[1] NA
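A few related cases follow the same "I don't know" logic. Arithmetic with NA is also unknown, and mean() can be told to drop missing values before averaging with its na.rm argument. The Boolean operators can sometimes still give a definite answer: FALSE & NA is FALSE and TRUE | NA is TRUE, because the result is the same whatever the hidden value turns out to be.

```r
y <- c(1, 2, NA)

NA + 1                  # NA: unknown plus one is still unknown
NA > 1                  # NA
mean(y, na.rm = TRUE)   # 1.5: the mean of the observed values 1 and 2

FALSE & NA              # FALSE, whatever the missing value is
TRUE | NA               # TRUE, whatever the missing value is
TRUE & NA               # NA: the answer depends on the missing value
```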

To check if a value is missing, instead of y == NA, use the is.na() function to return TRUE where a value is NA and FALSE where a value is observed:

is.na(y)
[1] FALSE FALSE  TRUE
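With is.na() as the test, the recode that failed at the end of the ifelse() section now works, replacing the missing value with 0:

```r
y <- c(1, 2, NA)

ifelse(is.na(y), 0, y)   # 1 2 0
```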

Use the NOT operator to find where values are not missing, where they were observed:

!is.na(y)
[1]  TRUE  TRUE FALSE

Use that vector to index y to find the observed values:

y[!is.na(y)]
[1] 1 2

6.9 Logical to Numeric Coercion

Coercing logical vectors to numeric vectors turns TRUE into 1 and FALSE into 0:

as.numeric(c(T, F, T, F))
[1] 1 0 1 0

The sum() function returns the count of TRUEs because it implicitly coerces the logical vector to a numeric vector before computing:

sum(c(T, F, T, F))
[1] 2

And mean() will return the proportion of TRUEs:

mean(c(T, F, T, F))
[1] 0.5
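These functions are most useful applied directly to a comparison: sum() counts the elements that meet a condition, and mean() gives their proportion. Combined with is.na(), the same trick counts missing values:

```r
x <- 1:5
sum(x > 3)       # 2 values are greater than 3
mean(x > 3)      # 0.4 of the values are greater than 3

y <- c(1, 2, NA)
sum(is.na(y))    # 1 missing value
```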

We can use this property of logical values to summarize variable distributions.

Take a sample of 10,000 values from a normal distribution:

z <- rnorm(1e4)

For a standard normal distribution, about 5% of values lie more than 1.96 standard deviations (here sd = 1, the default) from the mean (here mean = 0, the default).

Test whether z is less than -1.96 or greater than 1.96, then take the mean to find the proportion of TRUEs. Because z is a random sample, your result will differ slightly from the one shown:

mean(z < -1.96 | z > 1.96)
[1] 0.0523
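The same test can be written more compactly with abs(), since a value is more than 1.96 away from 0 exactly when its absolute value exceeds 1.96. For comparison, pnorm() gives the theoretical proportion under a standard normal distribution. The set.seed() call is an addition in this sketch so the simulated result is reproducible; the seed value is arbitrary:

```r
set.seed(42)  # arbitrary seed, added only for reproducibility
z <- rnorm(1e4)

mean(abs(z) > 1.96)                              # simulated proportion
identical(abs(z) > 1.96, z < -1.96 | z > 1.96)  # TRUE: the two tests agree
2 * pnorm(-1.96)                                 # theoretical proportion, about 0.05
```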

6.10 Exercises

6.10.1 Fundamental

  1. Create a vector, x, with 15 random numbers between 0 and 1 (see runif()). Create a logical vector that indicates whether a value is greater than 0.5.

  2. Test whether elements of x are greater than 0.3 and less than 0.6, or are greater than 0.9. Compare your logical vector to the actual values to ensure your test was specified correctly.

  3. Create a new vector, y, with 1000 random numbers between 0 and 1. About 50% of values should be less than 0.5. Verify this.

6.10.2 Extended

  1. A typical use of logical comparison is to create an indicator variable. Create an object called high_mpg that indicates whether each value of mtcars$mpg is greater than the mean of mtcars$mpg. Start by creating a vector with mpg <- mtcars$mpg.

  2. What proportion of values in mtcars$mpg are greater than the mean?

  3. Using high_mpg, create a vector of the weights (mtcars$wt) of cars with high MPG. Start by creating a vector with wt <- mtcars$wt.