Supporting Statistical Analysis for Research

## 5.2 Character variables

These exercises use the `mtcars.csv` data set.

1. Import the `mtcars.csv` data set.

``````library(tidyverse)

mtcars_path <- file.path("..", "datasets", "mtcars.csv")
mtcars_in <- read_csv(mtcars_path, col_types = cols())``````
``Warning: Missing column names filled in: 'X1' [1]``
``mtcars_in <- rename(mtcars_in, make_model = X1)``
``````mtcars <- mtcars_in

glimpse(mtcars)``````
``````Observations: 32
Variables: 12
$ make_model <chr> "Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet...
$ mpg        <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22....
$ cyl        <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, ...
$ disp       <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 14...
$ hp         <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123,...
$ drat       <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.9...
$ wt         <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3....
$ qsec       <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20...
$ vs         <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, ...
$ am         <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ gear       <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, ...
$ carb       <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, ...``````
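The name that `read_csv()` gives to a blank header cell depends on the readr version: older releases use `X1`, as in the warning above, while readr 2.0 and later use `...1`. A minimal sketch of a rename that does not depend on that name is shown below. It assumes a dplyr version that accepts column positions in `rename()`, and the file `demo_path` and its two rows are made up for illustration.

```r
library(readr)
library(dplyr)

# Write a tiny CSV whose first header cell is empty, as in mtcars.csv
# (illustrative data, not the real file)
demo_path <- tempfile(fileext = ".csv")
writeLines(c(",mpg,cyl", "Mazda RX4,21,6", "Datsun 710,22.8,4"), demo_path)

# readr fills in a name for the blank header (X1 or ...1, depending on version)
demo_in <- read_csv(demo_path, col_types = cols())

# Rename by position so the code does not rely on the filled-in name
demo_in <- rename(demo_in, make_model = 1)

names(demo_in)
```

Renaming by position is only safe when the column order of the file is known, which is the case here because the car name is always the first column.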
2. Divide the column that has the car name into columns that contain the make and model of the car.

``````mtcars <-
  mtcars %>%
  separate(make_model, c("make", "model"), sep = " ", extra = "merge", remove = FALSE)``````
``Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [6].``
``````mtcars %>%
  select(make_model, make, model) %>%
  head()``````
``````# A tibble: 6 x 3
make_model        make    model
<chr>             <chr>   <chr>
1 Mazda RX4         Mazda   RX4
2 Mazda RX4 Wag     Mazda   RX4 Wag
3 Datsun 710        Datsun  710
4 Hornet 4 Drive    Hornet  4 Drive
5 Hornet Sportabout Hornet  Sportabout
6 Valiant           Valiant <NA>      ``````
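To see what `extra = "merge"` does in isolation, here is a minimal sketch on a made-up three-row tibble (`demo` and its values are illustrative, not read from `mtcars.csv`). It also passes `fill = "right"`, an argument the exercise code does not use, to state explicitly that a missing second piece should become `NA`; this avoids the warning for one-word names.

```r
library(tidyr)
library(tibble)

# Illustrative names: three words, two words, and one word
demo <- tibble(make_model = c("Mazda RX4 Wag", "Datsun 710", "Valiant"))

split_demo <- separate(
  demo, make_model, c("make", "model"),
  sep = " ",
  extra = "merge",   # everything after the first space goes into model
  fill = "right",    # a missing model becomes NA without a warning
  remove = FALSE     # keep the original make_model column
)

split_demo
```

With these settings only the first space is used as a split point, so a multi-word model such as `RX4 Wag` stays in one piece.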
3. Do all observations have a make and model value? If there are missing values, can you fix them? (Hint: use Google to help you.)

``filter(mtcars, rlang::are_na(make) | rlang::are_na(model))``
``````# A tibble: 1 x 14
make_model make  model   mpg   cyl  disp    hp  drat    wt  qsec    vs
<chr>      <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Valiant    Vali~ <NA>   18.1     6   225   105  2.76  3.46  20.2     1
# ... with 3 more variables: am <dbl>, gear <dbl>, carb <dbl>``````

The `Valiant` observation has a `make` but no `model`. Its name is a single word, so `separate()` had nothing to put in the second column.

Googling shows that the `Valiant` was produced by `Plymouth`. We can correct the `Valiant` observation(s) to include `Plymouth`.

``````mtcars <- mtcars_in

mtcars <-
  mtcars %>%
  mutate(make_model = recode(make_model, "Valiant" = "Plymouth Valiant")) %>%
  separate(make_model, c("make", "model"), sep = " ", extra = "merge", remove = FALSE)

mtcars %>%
  select(make_model, make, model) %>%
  head()``````
``````# A tibble: 6 x 3
make_model        make     model
<chr>             <chr>    <chr>
1 Mazda RX4         Mazda    RX4
2 Mazda RX4 Wag     Mazda    RX4 Wag
3 Datsun 710        Datsun   710
4 Hornet 4 Drive    Hornet   4 Drive
5 Hornet Sportabout Hornet   Sportabout
6 Plymouth Valiant  Plymouth Valiant   ``````
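Looking at the first six rows does not show whether any other row still has a missing piece. One way to check the whole column is to count the `NA` values after the fix. The sketch below does this on a made-up two-row tibble (`demo` is illustrative) and assumes dplyr 1.0 or later for `across()`.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Illustrative data: one two-word name and the one-word name that caused the NA
demo <- tibble(make_model = c("Datsun 710", "Valiant"))

fixed <- demo %>%
  mutate(make_model = recode(make_model, "Valiant" = "Plymouth Valiant")) %>%
  separate(make_model, c("make", "model"), sep = " ", extra = "merge", remove = FALSE)

# Count missing values in each of the two new columns
na_counts <- fixed %>%
  summarise(across(c(make, model), ~ sum(is.na(.x))))

na_counts
```

A count of zero in both columns means every observation now has a make and a model.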
4. Some car companies have more than one make. In this data set, `Chrysler`, `Plymouth`, and `Dodge` were all made by `Chrysler`. Likewise, `Cadillac` and `Pontiac` were made by `GM`, and `Lincoln` and `Ford` were both made by `Ford`. Create a `company` variable based on the data in the `make` variable.

``````mtcars <-
  mtcars %>%
  mutate(
    company = make,
    company =
      recode(company,
        "Plymouth" = "Chrysler",
        "Dodge" = "Chrysler",
        "Lincoln" = "Ford",
        "Cadillac" = "GM", # the exercise lists Cadillac as a GM make
        "Pontiac" = "GM"
      )
  )``````
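The same mapping can also be kept as data rather than as arguments to `recode()`. The sketch below is one possible alternative, not the approach used in these solutions: `company_lookup` is a hypothetical named vector built from the makes listed in the exercise, and `demo` is a made-up tibble.

```r
library(dplyr)
library(tibble)

# Hypothetical lookup: names are makes, values are parent companies
company_lookup <- c(
  Plymouth = "Chrysler",
  Dodge    = "Chrysler",
  Lincoln  = "Ford",
  Cadillac = "GM",
  Pontiac  = "GM"
)

# Illustrative makes: two that need recoding and one that does not
demo <- tibble(make = c("Plymouth", "Cadillac", "Mazda"))

demo <- demo %>%
  mutate(
    # Makes absent from the lookup give NA, which coalesce() replaces with make
    company = coalesce(unname(company_lookup[make]), make)
  )

demo
```

Keeping the lookup in one vector means a new make is added in a single place, and the same vector can be reused wherever the company is needed.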

Putting together all the code that creates `company`, `make`, and `model` gives the following.

``````mtcars <- mtcars_in

mtcars <-
  mtcars %>%
  mutate(
    make_model = recode(make_model, "Valiant" = "Plymouth Valiant")
  ) %>%
  separate(
    make_model, c("make", "model"), sep = " ", extra = "merge", remove = FALSE
  ) %>%
  mutate(
    company = make,
    company =
      recode(company,
        "Plymouth" = "Chrysler",
        "Dodge" = "Chrysler",
        "Lincoln" = "Ford",
        "Cadillac" = "GM", # the exercise lists Cadillac as a GM make
        "Pontiac" = "GM"
      )
  )

mtcars %>%
  select(make_model, make, model, company) %>%
  head()``````
``````# A tibble: 6 x 4
make_model        make     model      company
<chr>             <chr>    <chr>      <chr>
1 Mazda RX4         Mazda    RX4        Mazda
2 Mazda RX4 Wag     Mazda    RX4 Wag    Mazda
3 Datsun 710        Datsun   710        Datsun
4 Hornet 4 Drive    Hornet   4 Drive    Hornet
5 Hornet Sportabout Hornet   Sportabout Hornet
6 Plymouth Valiant  Plymouth Valiant    Chrysler``````
5. Create a `name` variable for use in displaying results. It is a character string composed of `make`, a space, the company in parentheses `()` followed by a space if the company name differs from the make, and then `model`. For example, `Plymouth (Chrysler) Valiant`, but `Mazda RX4`.
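As one reading of this specification, the sketch below builds the string directly with `str_c()` inside `if_else()`. The two-row `demo` tibble is made up for illustration, and this is only one of several ways to assemble the name.

```r
library(dplyr)
library(stringr)
library(tibble)

# Illustrative rows: one where company differs from make, one where it matches
demo <- tibble(
  make    = c("Plymouth", "Mazda"),
  model   = c("Valiant", "RX4"),
  company = c("Chrysler", "Mazda")
)

demo <- demo %>%
  mutate(
    name = if_else(
      company != make,
      str_c(make, " (", company, ") ", model),  # company shown in parentheses
      str_c(make, " ", model)                   # company omitted when it equals make
    )
  )

demo$name
```

Both branches put exactly one space between neighbouring pieces, so the result has no doubled or trailing spaces.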

``````mtcars <-
  mtcars %>%
  mutate(
    comp_parn = if_else(company != make, str_c(" (", company, ") "), " ")
  ) %>%
  unite(name, make, comp_parn, model, sep = "", remove = FALSE) %>%
  select(-comp_parn)

mtcars %>%
  select(name, make, model, company) %>%
  head()``````
``````# A tibble: 6 x 4