SSCC - Social Science Computing Cooperative Supporting Statistical Analysis for Research

5.2 Character variables

These exercises use the mtcars.csv data set.

  1. Import the mtcars.csv data set.

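    library(tidyverse)
    
    mtcars_path <- file.path("..", "datasets", "mtcars.csv")
    # col_types = cols() lets readr guess the column types without printing them
    mtcars_in <- read_csv(mtcars_path, col_types = cols())
    Warning: Missing column names filled in: 'X1' [1]
    # The car names are in an unnamed first column. Rename it by position,
    # because newer versions of readr call it `...1` rather than `X1`.
    mtcars_in <- rename(mtcars_in, make_model = 1)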
    mtcars <- mtcars_in
    
    glimpse(mtcars)
    Observations: 32
    Variables: 12
    $ make_model <chr> "Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet...
    $ mpg        <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22....
    $ cyl        <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, ...
    $ disp       <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 14...
    $ hp         <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123,...
    $ drat       <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.9...
    $ wt         <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3....
    $ qsec       <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20...
    $ vs         <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, ...
    $ am         <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
    $ gear       <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, ...
    $ carb       <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, ...
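    If you do not have the data file at hand, the same import pattern can be tried in base R on a small inline CSV. This is a sketch, assuming a three-row file laid out like mtcars.csv (an empty first header field above the car names); the values are copied from the glimpse() output above, and `csv_text` and `cars` are illustrative names.

    ```r
    # Three rows in the same layout as mtcars.csv: the first header field is empty
    csv_text <- '"",mpg,cyl
    "Mazda RX4",21,6
    "Datsun 710",22.8,4
    "Valiant",18.1,6'

    # read.csv() invents a name for the empty header field; rather than rely
    # on that name, rename the first column by position
    cars <- read.csv(text = csv_text, stringsAsFactors = FALSE)
    names(cars)[1] <- "make_model"
    str(cars)
    ```

    Renaming by position keeps the code working no matter what placeholder name the reader assigns to the unnamed column.
    
    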
  2. Divide the column that has the car name into columns that contain the make and model of the car.

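    # Split at the first space. extra = "merge" keeps any further words
    # together in model, and remove = FALSE keeps the make_model column.
    mtcars <-
      mtcars %>%
      separate(make_model, c("make", "model"), sep = " ", extra = "merge", remove = FALSE)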
    Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [6].
    mtcars %>%
      select(make_model, make, model) %>%
      head()
    # A tibble: 6 x 3
      make_model        make    model     
      <chr>             <chr>   <chr>     
    1 Mazda RX4         Mazda   RX4       
    2 Mazda RX4 Wag     Mazda   RX4 Wag   
    3 Datsun 710        Datsun  710       
    4 Hornet 4 Drive    Hornet  4 Drive   
    5 Hornet Sportabout Hornet  Sportabout
    6 Valiant           Valiant <NA>      
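    To see what `sep = " "` with `extra = "merge"` does, here is a base R sketch of the same rule, assuming the names are plain strings: the make is everything before the first space, the model is everything after it, and a one-word name gets a missing model, as Valiant does above. The sample names come from the output above.

    ```r
    make_model <- c("Mazda RX4", "Mazda RX4 Wag", "Hornet 4 Drive", "Valiant")

    # make: everything before the first space (the whole string if there is none)
    make <- sub(" .*$", "", make_model)

    # model: everything after the first space, so later spaces stay in the model
    # (like extra = "merge"); names without a space get NA, as separate() does
    has_space <- grepl(" ", make_model, fixed = TRUE)
    model <- ifelse(has_space, sub("^[^ ]* ", "", make_model), NA_character_)

    data.frame(make_model, make, model)
    ```
    
    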
  3. Do all observations have a make and model value? If there are missing values, can you fix them? (Hint: use Google to help you.)

    filter(mtcars, is.na(make) | is.na(model))
    # A tibble: 1 x 14
      make_model make  model   mpg   cyl  disp    hp  drat    wt  qsec    vs
      <chr>      <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 Valiant    Vali~ <NA>   18.1     6   225   105  2.76  3.46  20.2     1
    # ... with 3 more variables: am <dbl>, gear <dbl>, carb <dbl>

    The Valiant row has no model. Its name is a single word, so separate() put "Valiant" in make and filled model with NA.

    A web search shows that the Valiant was made by Plymouth. We can fix this observation by recoding its name to "Plymouth Valiant" before splitting it.

    # Start again from the imported data
    mtcars <- mtcars_in
    
    # Add the missing make to Valiant, then split the name
    mtcars <-
      mtcars %>%
      mutate(make_model = recode(make_model, "Valiant" = "Plymouth Valiant")) %>%
      separate(make_model, c("make", "model"), sep = " ", extra = "merge", remove = FALSE)
    
    mtcars %>%
      select(make_model, make, model) %>%
      head()
    # A tibble: 6 x 3
      make_model        make     model     
      <chr>             <chr>    <chr>     
    1 Mazda RX4         Mazda    RX4       
    2 Mazda RX4 Wag     Mazda    RX4 Wag   
    3 Datsun 710        Datsun   710       
    4 Hornet 4 Drive    Hornet   4 Drive   
    5 Hornet Sportabout Hornet   Sportabout
    6 Plymouth Valiant  Plymouth Valiant   
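    The find-and-fix cycle can also be written as a base R sketch: split the names, list the rows whose model is missing, recode the offending name, and check again. `split_name()` is a hypothetical helper written for this illustration, and the three sample names are taken from the data.

    ```r
    # Hypothetical helper (not part of the exercise): split at the first space,
    # giving NA for the model when the name is a single word
    split_name <- function(x) {
      has_space <- grepl(" ", x, fixed = TRUE)
      data.frame(
        make  = sub(" .*$", "", x),
        model = ifelse(has_space, sub("^[^ ]* ", "", x), NA_character_),
        stringsAsFactors = FALSE
      )
    }

    make_model <- c("Mazda RX4", "Valiant", "Datsun 710")

    before <- split_name(make_model)
    which(is.na(before$model))  # returns 2: the Valiant row has no model

    # Recode the one-word name, then split again
    make_model[make_model == "Valiant"] <- "Plymouth Valiant"
    after <- split_name(make_model)
    anyNA(after)                # FALSE once every row has a make and a model
    ```
    
    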
  4. Some car companies sell more than one make. In these data, Chrysler, Plymouth, and Dodge were all made by Chrysler; Cadillac and Pontiac were made by GM; and Lincoln and Ford were both made by Ford. Create a company variable based on the make variable.

    mtcars <-
      mtcars %>%
      mutate(
        company = make,
        company = 
          recode(company,
                 "Plymouth" = "Chrysler",
                 "Dodge" = "Chrysler",
                 "Lincoln" = "Ford",
                 "Cadillac" = "GM",
                 "Pontiac" = "GM"
                 )
        )

    Putting all of the steps together, the following code creates the make, model, and company variables in one pipeline.

    mtcars <- mtcars_in
    
    mtcars <-
      mtcars %>%
      mutate(
        make_model = recode(make_model, "Valiant" = "Plymouth Valiant")
        ) %>%
      separate(
        make_model, c("make", "model"), sep = " ", extra = "merge", remove = FALSE
        ) %>%
      mutate(
        company = make,
        company = 
          recode(company,
                 "Plymouth" = "Chrysler",
                 "Dodge" = "Chrysler",
                 "Lincoln" = "Ford",
                 "Cadillac" = "GM",
                 "Pontiac" = "GM"
                 )
        )
    
    mtcars %>%
      select(make_model, make, model, company) %>%
      head()
    # A tibble: 6 x 4
      make_model        make     model      company 
      <chr>             <chr>    <chr>      <chr>   
    1 Mazda RX4         Mazda    RX4        Mazda   
    2 Mazda RX4 Wag     Mazda    RX4 Wag    Mazda   
    3 Datsun 710        Datsun   710        Datsun  
    4 Hornet 4 Drive    Hornet   4 Drive    Hornet  
    5 Hornet Sportabout Hornet   Sportabout Hornet  
    6 Plymouth Valiant  Plymouth Valiant    Chrysler
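    The recode() call works as a lookup from make to company. The same idea can be sketched in base R with a named character vector, assuming, as the exercise does, that any make not in the lookup is its own company. `company_of` is an illustrative name, and the sample makes are taken from the data.

    ```r
    make <- c("Mazda", "Plymouth", "Dodge", "Cadillac", "Lincoln", "Ford")

    # Makes sold by a different parent company; unlisted makes keep their own name
    company_of <- c(Plymouth = "Chrysler", Dodge = "Chrysler",
                    Lincoln = "Ford", Cadillac = "GM", Pontiac = "GM")

    company <- make
    in_lookup <- make %in% names(company_of)
    company[in_lookup] <- company_of[make[in_lookup]]
    company
    ```

    Keeping the mapping in one vector means a new make-to-company pair needs only one extra entry.
    
    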
  5. Create a name variable for displaying results. It should be a character string made up of the make, followed by the company in parentheses if the company differs from the make, followed by the model, with a single space between each part.

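    mtcars <-
      mtcars %>%
      mutate(
        # " (company) " when the company differs from the make, otherwise " "
        comp_parn = if_else(company != make, str_c(" (", company, ") "), " ")
        ) %>%
      # sep = "" because comp_parn already supplies the spaces
      unite(name, make, comp_parn, model, sep = "", remove = FALSE) %>%
      select(-comp_parn)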
    
    mtcars %>%
      select(name, make, model, company) %>%
      head()
    # A tibble: 6 x 4
      name                        make     model      company 
      <chr>                       <chr>    <chr>      <chr>   
    1 Mazda RX4                   Mazda    RX4        Mazda   
    2 Mazda RX4 Wag               Mazda    RX4 Wag    Mazda   
    3 Datsun 710                  Datsun   710        Datsun  
    4 Hornet 4 Drive              Hornet   4 Drive    Hornet  
    5 Hornet Sportabout           Hornet   Sportabout Hornet  
    6 Plymouth (Chrysler) Valiant Plymouth Valiant    Chrysler
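    The same display rule can be sketched in base R with ifelse() and paste0(), assuming make, model, and company are character vectors of equal length. The three sample cars are taken from the data, with companies assigned as in exercise 4.

    ```r
    make    <- c("Mazda", "Plymouth", "Cadillac")
    model   <- c("RX4", "Valiant", "Fleetwood")
    company <- c("Mazda", "Chrysler", "GM")

    # Middle piece: " (company) " when the company differs from the make, else " "
    middle <- ifelse(company != make, paste0(" (", company, ") "), " ")
    name <- paste0(make, middle, model)
    name
    ```
    
    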