
5.2 Character variables
These exercises use the mtcars.csv data set.
Import the mtcars.csv data set.

library(tidyverse)

mtcars_path <- file.path("..", "datasets", "mtcars.csv")
mtcars_in <- read_csv(mtcars_path, col_types = cols())
Warning: Missing column names filled in: 'X1' [1]
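# The unnamed first column holds the car names; readr named it X1
# (readr 2.0 and later names an unnamed column ...1 instead)
mtcars_in <- rename(mtcars_in, make_model = X1)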
mtcars <- mtcars_in
glimpse(mtcars)
Observations: 32
Variables: 12
$ make_model <chr> "Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet...
$ mpg        <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22....
$ cyl        <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, ...
$ disp       <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 14...
$ hp         <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123,...
$ drat       <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.9...
$ wt         <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3....
$ qsec       <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20...
$ vs         <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, ...
$ am         <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ gear       <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, ...
$ carb       <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, ...
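The glimpse() output matches R's built-in mtcars data set, where the car names are stored as row names rather than as a column. If mtcars.csv is not at hand, a comparable tibble can probably be built directly from the built-in data. This is a sketch assuming the tibble package is installed; mtcars_alt is a hypothetical name chosen so it does not overwrite the mtcars tibble used in the rest of this section.

```r
# Assumption: the tibble package is installed; datasets::mtcars ships with base R
library(tibble)

# Move the row names (car names) into a regular column called make_model
mtcars_alt <- rownames_to_column(datasets::mtcars, var = "make_model")
mtcars_alt <- as_tibble(mtcars_alt)

mtcars_alt
```

The result has the same 32 rows and 12 columns, with make_model first, so the remaining steps should apply to it unchanged.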
Divide the column that has the car name into columns that contain the make and model of the car.
# Split on the first space; extra = "merge" keeps the rest of the name in model
mtcars <- mtcars %>%
  separate(make_model, c("make", "model"),
           sep = " ", extra = "merge", remove = FALSE)
Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [6].
mtcars %>% select(make_model, make, model) %>% head()
# A tibble: 6 x 3
  make_model        make    model
  <chr>             <chr>   <chr>
1 Mazda RX4         Mazda   RX4
2 Mazda RX4 Wag     Mazda   RX4 Wag
3 Datsun 710        Datsun  710
4 Hornet 4 Drive    Hornet  4 Drive
5 Hornet Sportabout Hornet  Sportabout
6 Valiant           Valiant <NA>
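To see what the extra argument of separate() controls, here is a small sketch on a two-row toy tibble (the names cars, merged, and dropped are hypothetical). It also adds fill = "right", which pads a single-word name with NA on the right without the warning seen above. Note that in tidyr 1.3.0 and later, separate() is marked superseded in favour of separate_wider_delim(), though it still works.

```r
library(tibble)
library(tidyr)

cars <- tibble(make_model = c("Mazda RX4 Wag", "Valiant"))

# extra = "merge": everything after the first space stays together in model
# fill = "right": a one-word name gets NA for model, with no warning
merged <- separate(cars, make_model, c("make", "model"),
                   sep = " ", extra = "merge", fill = "right")

# extra = "drop": words beyond the second piece are silently discarded
dropped <- separate(cars, make_model, c("make", "model"),
                    sep = " ", extra = "drop", fill = "right")

merged
dropped
```

With extra = "merge" the model is "RX4 Wag"; with extra = "drop" it is truncated to "RX4", which is why merging is the better choice for multi-word model names.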
Do all observations have a make and model value? If there are missing values, can you fix them? (Hint: use Google to help you.)
# Keep rows where either make or model is missing
filter(mtcars, is.na(make) | is.na(model))
# A tibble: 1 x 14
  make_model make  model   mpg   cyl  disp    hp  drat    wt  qsec    vs
  <chr>      <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Valiant    Vali~ <NA>   18.1     6   225   105  2.76  3.46  20.2     1
# ... with 3 more variables: am <dbl>, gear <dbl>, carb <dbl>
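The same check can be run on a small toy tibble, together with a per-column count of missing values, which is a quick way to confirm that a fix worked. This sketch assumes dplyr 1.0.0 or later (for across()); the names cars, missing_rows, and na_counts are hypothetical.

```r
library(dplyr)

cars <- tibble(make = c("Mazda", "Valiant"), model = c("RX4", NA))

# Rows where either column is missing
missing_rows <- filter(cars, is.na(make) | is.na(model))

# Number of missing values in every column at once
na_counts <- summarise(cars, across(everything(), ~ sum(is.na(.x))))

missing_rows
na_counts
```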
The Valiant observation is missing a model value: its name is a single word, so separate() put Valiant in make and filled model with NA.

Googling shows that the Valiant was produced by Plymouth. We can correct the Valiant observation(s) to include Plymouth before splitting the name.

mtcars <- mtcars_in
mtcars <- mtcars %>%
  # Prepend the missing make so the name splits into make and model
  mutate(make_model = recode(make_model, "Valiant" = "Plymouth Valiant")) %>%
  separate(make_model, c("make", "model"),
           sep = " ", extra = "merge", remove = FALSE)
mtcars %>% select(make_model, make, model) %>% head()
# A tibble: 6 x 3
  make_model        make     model
  <chr>             <chr>    <chr>
1 Mazda RX4         Mazda    RX4
2 Mazda RX4 Wag     Mazda    RX4 Wag
3 Datsun 710        Datsun   710
4 Hornet 4 Drive    Hornet   4 Drive
5 Hornet Sportabout Hornet   Sportabout
6 Plymouth Valiant  Plymouth Valiant
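The fix relies on recode() leaving any value it is not told about unchanged, so only the Valiant entry is rewritten. A minimal sketch on a plain character vector (the names make_model and fixed are hypothetical here). In dplyr 1.1.0 and later, recode() is marked superseded in favour of case_match(), but it still works as shown.

```r
library(dplyr)

make_model <- c("Valiant", "Mazda RX4", "Datsun 710")

# Only "Valiant" matches a replacement; the other values pass through as-is
fixed <- recode(make_model, "Valiant" = "Plymouth Valiant")

fixed
```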
Some car companies have more than one make. In this data, Chrysler, Plymouth, and Dodge were all made by Chrysler. Likewise, Cadillac and Pontiac are made by GM, and Lincoln and Ford are both made by Ford. Create a company variable based on the data in the make variable.

mtcars <- mtcars %>%
  mutate(
    # Start from the make, then map makes owned by a parent company
    company = make,
    company = recode(company,
      "Plymouth" = "Chrysler",
      "Dodge" = "Chrysler",
      "Lincoln" = "Ford",
      "Cadillac" = "GM",
      "Pontiac" = "GM"
    )
  )
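An alternative design keeps the make-to-company mapping in a small lookup table and joins it on, falling back to the make when no parent company is listed. This is a sketch of that swapped-in technique, not the approach used above; the names company_lookup, cars, and cars_co are hypothetical. A lookup table can be easier to extend or store in a file than a long recode() call.

```r
library(dplyr)

# Hypothetical lookup table: only makes owned by a different parent company
company_lookup <- tibble(
  make    = c("Plymouth", "Dodge", "Lincoln", "Cadillac", "Pontiac"),
  company = c("Chrysler", "Chrysler", "Ford", "GM", "GM")
)

cars <- tibble(make = c("Mazda", "Dodge", "Cadillac", "Ford"))

cars_co <- cars %>%
  # Unmatched makes get NA for company
  left_join(company_lookup, by = "make") %>%
  # Replace those NAs with the make itself
  mutate(company = coalesce(company, make))

cars_co
```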
Putting together all the code to create the company, make, and model variables gives the following.
mtcars <- mtcars_in
mtcars <- mtcars %>%
  mutate(
    make_model = recode(make_model, "Valiant" = "Plymouth Valiant")
  ) %>%
  separate(
    make_model, c("make", "model"),
    sep = " ", extra = "merge", remove = FALSE
  ) %>%
  mutate(
    company = make,
    company = recode(company,
      "Plymouth" = "Chrysler",
      "Dodge" = "Chrysler",
      "Lincoln" = "Ford",
      "Cadillac" = "GM",
      "Pontiac" = "GM"
    )
  )
mtcars %>% select(make_model, make, model, company) %>% head()
# A tibble: 6 x 4
  make_model        make     model      company
  <chr>             <chr>    <chr>      <chr>
1 Mazda RX4         Mazda    RX4        Mazda
2 Mazda RX4 Wag     Mazda    RX4 Wag    Mazda
3 Datsun 710        Datsun   710        Datsun
4 Hornet 4 Drive    Hornet   4 Drive    Hornet
5 Hornet Sportabout Hornet   Sportabout Hornet
6 Plymouth Valiant  Plymouth Valiant    Chrysler
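If the same cleanup is needed for another copy of the data, the combined steps could be wrapped in a function. This is a sketch only: split_car_names() and the toy tibble are hypothetical, and the function hard-codes the Valiant fix and the company mapping used in this section.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: adds make, model, and company columns from make_model
split_car_names <- function(data) {
  data %>%
    # Supply the missing make for the one-word name
    mutate(make_model = recode(make_model, "Valiant" = "Plymouth Valiant")) %>%
    separate(make_model, c("make", "model"),
             sep = " ", extra = "merge", remove = FALSE) %>%
    # Map makes to their parent company; other makes keep their own name
    mutate(company = recode(make,
                            "Plymouth" = "Chrysler", "Dodge" = "Chrysler",
                            "Lincoln" = "Ford",
                            "Cadillac" = "GM", "Pontiac" = "GM"))
}

toy <- tibble(make_model = c("Valiant", "Cadillac Fleetwood", "Mazda RX4 Wag"))
toy_split <- split_car_names(toy)

toy_split
```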
Create a name for displaying results: a character string composed of make, then the company in parentheses (only if the company is not the same as the make), then model, with a single space between the parts.

mtcars <- mtcars %>%
  mutate(
    # " (Company) " when the company differs from the make, otherwise a single space
    comp_parn = if_else(company != make, str_c(" (", company, ") "), " ")
  ) %>%
  unite(name, make, comp_parn, model, sep = "", remove = FALSE) %>%
  select(-comp_parn)
mtcars %>% select(name, make, model, company) %>% head()
# A tibble: 6 x 4
  name                        make     model      company
  <chr>                       <chr>    <chr>      <chr>
1 Mazda RX4                   Mazda    RX4        Mazda
2 Mazda RX4 Wag               Mazda    RX4 Wag    Mazda
3 Datsun 710                  Datsun   710        Datsun
4 Hornet 4 Drive              Hornet   4 Drive    Hornet
5 Hornet Sportabout           Hornet   Sportabout Hornet
6 Plymouth (Chrysler) Valiant Plymouth Valiant    Chrysler
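The display name can also be built in a single mutate() with str_c(), without the temporary comp_parn column. This sketch uses a hypothetical two-row tibble. One behavioural difference worth checking: as far as I know, unite() writes a missing piece as the text "NA" unless na.rm = TRUE, whereas str_c() returns NA for the whole string when any piece is missing.

```r
library(dplyr)
library(stringr)

cars <- tibble(
  make    = c("Mazda", "Plymouth"),
  model   = c("RX4", "Valiant"),
  company = c("Mazda", "Chrysler")
)

cars_named <- cars %>%
  mutate(name = if_else(
    company != make,
    str_c(make, " (", company, ") ", model),  # company differs: add it in parentheses
    str_c(make, " ", model)                   # same company: just make and model
  ))

cars_named
```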