6 Logical Vectors
6.1 Warm-Up
Run each line of code. What does each line do?
x <- runif(5)
mean(x)
x > mean(x)
mean(x > mean(x))
6.2 Outcomes
Objective: To compare vectors with logical operators and use logical-to-numeric coercion for summarizing data.
Why it matters: Logical vectors are used in common data wrangling applications, such as creating new variables, recoding existing variables, and subsetting datasets.
Learning outcomes:
Fundamental Skills | Extended Skills
Key values and operators:
TRUE
FALSE
NA
>
>=
<
<=
==
!=
&
|
!
%in%
ifelse()
6.3 Logical Values
R has three logical values:
TRUE
FALSE
NA (missing)
TRUE and FALSE can also be written as T and F. You will often see or write this shorthand within function arguments:
sample(1:10, 100, replace = T)
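One caution about the shorthand, based on standard base R behavior rather than anything stated above: T and F are ordinary variables that base R sets to TRUE and FALSE, so they can be overwritten. TRUE and FALSE are reserved words and cannot be. The sketch below masks T on purpose and then removes the mask:

```r
# T is an ordinary variable that base R binds to TRUE, so it can be masked
T <- 0
isTRUE(T)     # FALSE: T now holds the number 0

# TRUE is a reserved word; running `TRUE <- 0` would be an error
isTRUE(TRUE)

# Remove the masking variable so that T again finds base R's TRUE
rm(T)
isTRUE(T)
```

This is one reason to prefer spelling out TRUE and FALSE in scripts that other people will run.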
In modeling, logical vectors serve as indicator (or dummy) variables. They dichotomize continuous or categorical vectors into binary vectors that mark where a condition holds (TRUE) and where it does not (FALSE).
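The short sketch below dichotomizes one continuous vector and one categorical vector. The variables age and region and their values are hypothetical and are made up only for illustration. The comparison operators used here are covered in the next section.

```r
# Hypothetical continuous variable: ages of five people
age <- c(15, 22, 17, 40, 18)
# Dichotomize: TRUE where the condition holds, FALSE where it does not
adult <- age >= 18
adult

# Hypothetical categorical variable: region of four observations
region <- c("north", "south", "south", "east")
is_south <- region == "south"
is_south
```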
6.4 Comparison Operators
Comparison operators help us apply a test to a vector to produce a logical vector.
Create a vector with the numbers 1-5 and compare it to 3.
x <- 1:5
x
[1] 1 2 3 4 5
x > 3 # greater than
[1] FALSE FALSE FALSE TRUE TRUE
x >= 3 # greater than or equal to
[1] FALSE FALSE TRUE TRUE TRUE
x < 3 # less than
[1] TRUE TRUE FALSE FALSE FALSE
x <= 3 # less than or equal to
[1] TRUE TRUE TRUE FALSE FALSE
x == 3 # equal to
[1] FALSE FALSE TRUE FALSE FALSE
x != 3 # not equal to
[1] TRUE TRUE FALSE TRUE TRUE
Our operators for “greater/less than or equal to” are literally the greater/less than symbol (< or >) followed by an equals sign (=).
To test equality, we need to use two equals signs (==). This is because assignment can be done not only with the assignment operator (<-) but also with the single equals sign (=). If we run x = 3, we would assign the value of 3 to x!
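A minimal sketch of the difference between == and =. It uses the same x as above and restores x at the end so that later examples still work:

```r
x <- 1:5
x == 3          # comparison: returns a logical vector and leaves x unchanged
x               # still 1 2 3 4 5
x = 3           # assignment: x is overwritten with the single number 3
x
x <- 1:5        # restore x so the examples that follow still work
```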
6.5 Boolean Operators
Boolean operators are used to combine multiple logical vectors.
The three main Boolean operators are AND (&), OR (|), and NOT (!).
AND checks whether all values are TRUE.
T & T
[1] TRUE
T & F
[1] FALSE
F & F
[1] FALSE
OR checks whether at least one value is TRUE.
T | T
[1] TRUE
T | F
[1] TRUE
F | F
[1] FALSE
In these examples, AND and OR agree when all values are T or F, but they disagree when there is a mix. For T & F, AND returns FALSE because not all of them are TRUE. For T | F, OR returns TRUE because there is at least one TRUE.
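Because & and | work element by element, you can see all four combinations at once by comparing two logical vectors of the same length. The names a and b are just illustrative:

```r
# Every combination of two logical values, one pair per position
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)
a & b   # TRUE only where both are TRUE:   TRUE FALSE FALSE FALSE
a | b   # TRUE where at least one is TRUE: TRUE TRUE TRUE FALSE
```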
NOT switches TRUE and FALSE.
!T
[1] FALSE
!F
[1] TRUE
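NOT also works element by element on a whole logical vector. As a sketch, negating "greater than 3" gives the same result as "less than or equal to 3":

```r
x <- 1:5
x > 3        # FALSE FALSE FALSE TRUE TRUE
!(x > 3)     # flips every element (parentheses added for readability)
x <= 3       # the same result written as a single comparison
```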
We can combine multiple comparison operators with Boolean operators. For example, if we want to find values of x in the range 2-4, we would combine two statements about the minimum and maximum of the range: values greater than or equal to 2 AND less than or equal to 4:
x >= 2 & x <= 4
[1] FALSE TRUE TRUE TRUE FALSE
Or we can find where x is equal to 1 or 3:
x == 1 | x == 3
[1] TRUE FALSE TRUE FALSE FALSE
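Note that we have to repeat the entire condition. Something like x == 1 | 3 would not work as intended: R evaluates it as (x == 1) | 3, and because any nonzero number is treated as TRUE, every element comes back TRUE. If we want to check whether the values in x are in some set of values, our code can start to become very long:
x == 1 | x == 2 | x == 3 | x == 4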
[1] TRUE TRUE TRUE TRUE FALSE
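The x == 1 | 3 pitfall mentioned above can be checked directly. Comparison operators bind more tightly than |, so the 3 stands alone and is coerced to TRUE:

```r
x <- 1:5
# Intended as "x equals 1 or 3", but parsed as (x == 1) | 3;
# the nonzero number 3 is treated as TRUE, so every element is TRUE
x == 1 | 3
# Repeating the full condition gives the intended result
x == 1 | x == 3
```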
6.6 %in%
Instead, use the value matching operator, %in%. This operator takes one argument on the left and one on the right. For each value on the left, it checks whether that value is found anywhere in the set on the right. That is, the argument on the right is not recycled: each element on the left is checked against every element on the right.
Rewrite the last example with %in%:
x %in% 1:4
[1] TRUE TRUE TRUE TRUE FALSE
On the right, create a vector with c() or seq() (or anything else):
x %in% c(1, 3, 5)
[1] TRUE FALSE TRUE FALSE TRUE
x %in% seq(1, 5, 2)
[1] TRUE FALSE TRUE FALSE TRUE
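To see why "no recycling" matters, compare == and %in% against the same two-element set. This sketch suppresses the length-mismatch warning that R gives for == so the output stays short:

```r
x <- 1:5
# == recycles c(1, 3) along x and compares position by position:
# 1 == 1, 2 == 3, 3 == 1, 4 == 3, 5 == 1
# (R also warns because 5 is not a multiple of 2; suppressWarnings() hides that)
pairwise <- suppressWarnings(x == c(1, 3))
pairwise
# %in% checks each element of x against the whole set c(1, 3)
membership <- x %in% c(1, 3)
membership
```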
Logical operators work with more than numbers. The built-in vector LETTERS has the uppercase letters A-Z (see ?Constants). Find those that are vowels:
LETTERS %in% c("A", "E", "I", "O", "U")
[1] TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
[13] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
[25] FALSE FALSE
This returns a logical vector, but we may instead want the values associated with each of those TRUEs. We can index a vector with a logical vector using the format vector_to_index[logical_vector]:
LETTERS[LETTERS %in% c("A", "E", "I", "O", "U")]
[1] "A" "E" "I" "O" "U"
Where the condition is TRUE, the value is returned. Where it is FALSE, nothing is returned.
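The same indexing works with numeric conditions. The vector scores below is hypothetical. The sketch also shows which(), a base R function that returns the positions of the TRUEs rather than the values:

```r
# Hypothetical numeric vector
scores <- c(10, 25, 7, 40)
scores > 20            # FALSE TRUE FALSE TRUE
scores[scores > 20]    # values where the condition is TRUE: 25 40
which(scores > 20)     # positions of the TRUEs: 2 4
```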
We can also index a different vector with that logical vector, provided it is the same length. Find elements of the lowercase letters where LETTERS has a vowel:
letters[LETTERS %in% c("A", "E", "I", "O", "U")]
[1] "a" "e" "i" "o" "u"
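A more data-like sketch of the same idea: build the condition from one vector and use it to index a parallel vector of the same length. The vectors person and years are hypothetical:

```r
# Hypothetical parallel vectors describing the same three people
person <- c("Ana", "Ben", "Cy")
years <- c(34, 17, 52)
# The condition is built from years but used to index person
person[years >= 18]
```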
6.7 ifelse()
We can use logical operators not just to identify elements where a condition is TRUE, but also to recode them. This allows us to dichotomize continuous and categorical variables, top-code and/or bottom-code variables, and change certain values to other values or to missing.
The ifelse() function takes three arguments:
test: a logical test that returns TRUE or FALSE
yes: the value to return where test is TRUE
no: the value to return where test is FALSE
Continue to use x, which has the numbers 1-5.
Test whether x is equal to 5:
x == 5
[1] FALSE FALSE FALSE FALSE TRUE
Now insert that as the test argument in ifelse(). Where there is a TRUE, R will return the yes value. Where there is a FALSE, R will return the no value. Return 1 for TRUE and 0 for FALSE:
ifelse(x == 5, 1, 0)
[1] 0 0 0 0 1
Top-code x so that where x is greater than 4, 4 is returned. Otherwise, return the current value. To do that, set the no argument to be x:
ifelse(x > 4, 4, x)
[1] 1 2 3 4 4
Similarly, bottom-code the data to impose a minimum value of 3:
ifelse(x < 3, 3, x)
[1] 3 3 3 4 5
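Top- and bottom-coding can be combined by nesting one ifelse() inside the no argument of another. The bounds 2 and 4 are chosen only for illustration. Base R's pmin() and pmax() (parallel minimum and maximum) give the same values and are shown here as an alternative:

```r
x <- 1:5
# Top-code at 4; otherwise bottom-code at 2; otherwise keep x
clamped <- ifelse(x > 4, 4, ifelse(x < 2, 2, x))
clamped
# Equivalent result with pmax() (impose the minimum) then pmin() (impose the maximum)
clamped2 <- pmin(pmax(x, 2), 4)
clamped2
```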
Combine multiple conditions with Boolean operators to selectively recode some values. If x is outside the range 2-4, return NA:
ifelse(x >= 2 & x <= 4, x, NA)
[1] NA 2 3 4 NA
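The same pattern covers the other recoding tasks listed at the start of this section. In the sketch below, income uses -99 as a missing-value code and color is collapsed into two groups. Both vectors and the -99 code are hypothetical:

```r
# Hypothetical survey variable where -99 was used as a missing-value code
income <- c(50, -99, 72, -99)
income_clean <- ifelse(income == -99, NA, income)
income_clean

# Hypothetical categorical variable collapsed into two groups
color <- c("red", "blue", "green", "blue")
color_group <- ifelse(color == "blue", "blue", "other")
color_group
```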
Instead of recoding known values as missing, we may want to recode missing values as something else:
y <- c(1, 2, NA)
ifelse(y == NA, 0, y)
[1] NA NA NA
Uh-oh.
6.8 Missing Values
In R, we can think of NA as “I don’t know.”
Imagine I hold up two fingers on each hand and show them to you. Are the two numbers equal?
2 == 2
[1] TRUE
Yes.
I hold up two fingers on one hand, and I put my other hand behind my back with some unknown number of fingers held up (NA, missing). Are the two numbers equal?
2 == NA
[1] NA
“I don’t know.”
I put both hands behind my back. Am I holding up the same number of fingers on each hand?
NA == NA
[1] NA
“I don’t know.”
NA represents some real but unknown value, a hidden hand with some unknown number of fingers raised. NA is nowhere on the number line (in contrast to languages like SAS and Stata, which code missing as the minimum or maximum value, respectively), so using NA in a logical comparison, or in arithmetic and most summary functions, results in NA:
mean(y)
[1] NA
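Many summary functions, including mean() and sum(), have an na.rm argument that drops missing values before computing. Whether dropping them is appropriate depends on why the values are missing. This sketch only shows the mechanics with the same y:

```r
y <- c(1, 2, NA)
mean(y)                # NA: the unknown value could be anything
mean(y, na.rm = TRUE)  # 1.5: drop the NA, then average 1 and 2
sum(y, na.rm = TRUE)   # 3
```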
To check whether a value is missing, instead of y == NA, use the is.na() function, which returns TRUE where a value is NA and FALSE where a value is observed:
is.na(y)
[1] FALSE FALSE TRUE
Use the NOT operator to find where values are not missing, that is, where they were observed:
!is.na(y)
[1] TRUE TRUE FALSE
Use that vector to index y to find the observed values:
y[!is.na(y)]
[1] 1 2
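This also fixes the recoding that failed at the end of the ifelse() section. Replacing y == NA with is.na(y) gives ifelse() a test with no missing values, so it can choose yes or no for every element. Recoding to 0 is only an example; whether 0 is a sensible replacement depends on the data.

```r
y <- c(1, 2, NA)
# is.na() gives a definite TRUE/FALSE for every element
y_filled <- ifelse(is.na(y), 0, y)
y_filled
```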
6.9 Logical to Numeric Coercion
Coercing logical vectors to numeric vectors turns TRUE into 1 and FALSE into 0:
as.numeric(c(T, F, T, F))
[1] 1 0 1 0
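This coercion means that the ifelse(x == 5, 1, 0) call from the ifelse() section can be written more directly. A short sketch with the same x:

```r
x <- 1:5
as.numeric(x == 5)        # 0 0 0 0 1
ifelse(x == 5, 1, 0)      # the same values
```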
The sum() function returns the count of TRUEs, since it implicitly coerces the logical vector to a numeric vector before computing:
sum(c(T, F, T, F))
[1] 2
And mean() returns the proportion of TRUEs:
mean(c(T, F, T, F))
[1] 0.5
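Combined with is.na() from the previous section, sum() and mean() count and summarize missing data. A sketch with the same y:

```r
y <- c(1, 2, NA)
sum(is.na(y))    # number of missing values: 1
mean(is.na(y))   # proportion missing: 1/3
sum(!is.na(y))   # number of observed values: 2
```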
We can use this property of logical values to summarize variable distributions.
Take a sample of 10,000 values from a normal distribution:
z <- rnorm(1e4)
For a normal distribution, about 5% of values lie more than 1.96 standard deviations away from the mean. Here, rnorm() uses its defaults of mean = 0 and sd = 1.
Test whether z is less than -1.96 or greater than 1.96. Then, take the mean to find the proportion of TRUEs (because z is a random sample, your result will differ slightly from the one shown):
mean(z < -1.96 | z > 1.96)
[1] 0.0523
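To make the simulation reproducible and compare it with the exact value, you can fix the random seed with set.seed() and use the normal CDF, pnorm(). The seed 42 is arbitrary. The abs() version is an equivalent way to write the two-sided test.

```r
set.seed(42)                    # arbitrary seed, chosen only for reproducibility
z <- rnorm(1e4)
simulated <- mean(z < -1.96 | z > 1.96)
simulated
mean(abs(z) > 1.96)             # the same test written with abs()
# Exact two-sided probability for a standard normal
theoretical <- 2 * pnorm(-1.96)
theoretical                     # about 0.05
```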
6.10 Exercises
6.10.1 Fundamental
Create a vector, x, with 15 random numbers between 0 and 1 (see runif()). Create a logical vector that indicates whether a value is greater than 0.5.
Test whether elements of x are greater than 0.3 and less than 0.6, or are greater than 0.9. Compare your logical vector to the actual values to ensure your test was specified correctly.
Create a new vector, y, with 1000 random numbers between 0 and 1. About 50% of values should be less than 0.5. Verify this.
6.10.2 Extended
A typical use of logical comparison is to create an indicator variable. Create an object called high_mpg that indicates whether a given value of mtcars$mpg is greater than the mean of mtcars$mpg. Start by creating a vector with mpg <- mtcars$mpg.
What proportion of values in mtcars$mpg are greater than the mean?
Using high_mpg, create a vector of the weights (mtcars$wt) of cars with high MPG. Start by creating a vector with wt <- mtcars$wt.