SSCC - Social Science Computing Cooperative Supporting Statistical Analysis for Research

2.6 Column operations

The two most common column operations are renaming columns with rename() and selecting columns with select(). The select() function has a number of helper functions that make it easier to select a set of columns, such as starts_with(), ends_with(), contains(), and everything(), as well as the slice operator, :, which selects a run of adjacent columns.
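
As a minimal sketch of the helpers named above, the code below builds a small hypothetical tibble, cps_demo, that borrows a few column names from the renamed cps data. The tibble and its values are made up for illustration and are not part of the cps data set.

```r
library(dplyr)

# Hypothetical tibble; the values are made up for illustration only
cps_demo <- tibble(
  id = c(1, 2, 3),
  trt = c(0, 0, 0),
  age = c(45, 21, 38),
  educ = c(11, 14, 12),
  no_deg = c(1, 0, 0),
  real_earn_74 = c(100, 200, 300),
  real_earn_75 = c(110, 210, 310),
  real_earn_78 = c(120, 220, 320)
)

# Columns whose names begin with "real_earn"
earn <- cps_demo %>% select(starts_with("real_earn"))

# Columns whose names end with "_78"
earn_78 <- cps_demo %>% select(ends_with("_78"))

# Columns whose names contain "deg"
deg <- cps_demo %>% select(contains("deg"))

# A run of adjacent columns, from trt through educ, using the : operator
demog <- cps_demo %>% select(trt:educ)

names(earn)
names(demog)
```

The helpers match on column names, so they are most useful when related variables share a prefix or suffix, as the real_earn_ variables do here.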

2.6.1 Examples

  1. Renaming the variables of the cps data.

    cps <-
      cps_in %>%
      rename(
        id = X1,
        no_deg = nodeg,
        real_earn_74 = re74,
        real_earn_75 = re75,
        real_earn_78 = re78
      )
    head(cps, 3)
    # A tibble: 3 x 11
         id   trt   age  educ black  hisp  marr no_deg real_earn_74
      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>        <dbl>
    1     1     0    45    11     0     0     1      1       21517.
    2     2     0    21    14     0     0     0      0        3176.
    3     3     0    38    12     0     0     1      0       23039.
    # ... with 2 more variables: real_earn_75 <dbl>, real_earn_78 <dbl>

    The variable names above are written in snake case: the words in each name are separated by an underscore, _.

  2. Reordering the variables of the cps data.

    We will make the first two columns of the tibble the id and age variables.

    cps <-
      cps %>%
      select(id, age, everything())
    head(cps, 3)    
    # A tibble: 3 x 11
         id   age   trt  educ black  hisp  marr no_deg real_earn_74
      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>        <dbl>
    1     1    45     0    11     0     0     1      1       21517.
    2     2    21     0    14     0     0     0      0        3176.
    3     3    38     0    12     0     0     1      0       23039.
    # ... with 2 more variables: real_earn_75 <dbl>, real_earn_78 <dbl>

    The everything() function fills in the names of the other variables that were not listed.

  3. Selecting variables (subsetting).

    We will select all the variables except the real_earn_78 variable using inclusion.

    cps_part1 <-
      cps %>%
      select(id, age, trt, educ, black, hisp, marr, no_deg, real_earn_74, real_earn_75)
    head(cps_part1, 3)
    # A tibble: 3 x 10
         id   age   trt  educ black  hisp  marr no_deg real_earn_74
      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>        <dbl>
    1     1    45     0    11     0     0     1      1       21517.
    2     2    21     0    14     0     0     0      0        3176.
    3     3    38     0    12     0     0     1      0       23039.
    # ... with 1 more variable: real_earn_75 <dbl>

    We will select all the variables except the real_earn_74 and real_earn_75 variables using exclusion.

    cps_78 <-
      cps %>%
      select(-real_earn_74, -real_earn_75)
    head(cps_78, 3)    
    # A tibble: 3 x 9
         id   age   trt  educ black  hisp  marr no_deg real_earn_78
      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>        <dbl>
    1     1    45     0    11     0     0     1      1       25565.
    2     2    21     0    14     0     0     0      0       13496.
    3     3    38     0    12     0     0     1      0       25565.

  4. Extracting a variable from a tibble.

    Subsetting a tibble to a single column with select() results in a one-column tibble.

    cps %>%
      select(age) %>%
      str()
    Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame':    15992 obs. of  1 variable:
     $ age: num  45 21 38 48 18 22 48 18 48 45 ...
     - attr(*, "spec")=
      .. cols(
      ..   X1 = col_double(),
      ..   trt = col_double(),
      ..   age = col_double(),
      ..   educ = col_double(),
      ..   black = col_double(),
      ..   hisp = col_double(),
      ..   marr = col_double(),
      ..   nodeg = col_double(),
      ..   re74 = col_double(),
      ..   re75 = col_double(),
      ..   re78 = col_double()
      .. )

    The pull() function is used to get a column from a tibble as a vector.

    cps %>%
      pull(age) %>%
      str()
     num [1:15992] 45 21 38 48 18 22 48 18 48 45 ...
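
The difference between select() and pull() can also be checked directly. The sketch below uses a small hypothetical tibble, ages, rather than the full cps data; its age values are the first three shown in the output above.

```r
library(dplyr)

# Hypothetical three-row tibble used only for this illustration
ages <- tibble(id = c(1, 2, 3), age = c(45, 21, 38))

age_tbl <- ages %>% select(age)  # still a tibble, with one column
age_vec <- ages %>% pull(age)    # a plain numeric vector

is.data.frame(age_tbl)  # TRUE: select() keeps the tibble structure
is.numeric(age_vec)     # TRUE: pull() returns the column itself
mean(age_vec)           # a vector can be passed directly to mean()
```

Use pull() when the next step expects a vector, and select() when the result should remain a tibble for further dplyr verbs.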