Supporting Statistical Analysis for Research

## 2.6 Column operations

The two most common column operations are renaming columns with rename() and selecting columns with select(). The select() function has a number of helper functions that make it easier to select a set of columns, such as starts_with(), ends_with(), contains(), and everything(), as well as the slice operator, :, which selects a range of adjacent columns.
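
The helpers listed above can be sketched on a small tibble. This is a minimal illustration, not part of the cps examples: the tibble toy and its values are made up, and only the column names are borrowed from the renamed cps variables used later in this section.

```r
library(dplyr)

# Hypothetical toy data (made-up values, not the cps data).
toy <- tibble::tibble(
  id = 1:3,
  age = c(45, 21, 38),
  educ = c(11, 14, 12),
  real_earn_74 = c(100, 200, 300),
  real_earn_78 = c(150, 250, 350)
)

# Columns whose names begin with "real_earn".
earn_cols <- toy %>% select(starts_with("real_earn"))

# Columns whose names end with "_78".
year_78 <- toy %>% select(ends_with("_78"))

# Columns whose names contain "earn" anywhere.
with_earn <- toy %>% select(contains("earn"))

# The slice operator: every column from id through educ.
first_cols <- toy %>% select(id:educ)

names(earn_cols)   # "real_earn_74" "real_earn_78"
names(year_78)     # "real_earn_78"
names(first_cols)  # "id" "age" "educ"
```

The helpers match on column names only, so they behave the same way regardless of the values stored in the columns.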

### 2.6.1 Examples

1. Renaming the variables of the cps data.

cps_in <-
cps_in %>%
rename(
id = X1,
no_deg = nodeg,
real_earn_74 = re74,
real_earn_75 = re75,
real_earn_78 = re78
)
cps <-
cps_in

head(cps, 3)    
# A tibble: 3 x 11
id   trt   age  educ black  hisp  marr no_deg real_earn_74
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>        <dbl>
1     1     0    45    11     0     0     1      1       21517.
2     2     0    21    14     0     0     0      0        3176.
3     3     0    38    12     0     0     1      0       23039.
# ... with 2 more variables: real_earn_75 <dbl>, real_earn_78 <dbl>

The variable names above use snake case: the words in each name are separated by an underscore, _.
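
The rename() call above lists each pair as new_name = old_name. A minimal sketch of that argument order follows, using a made-up tibble raw (hypothetical values, with a few of the original cps column names) to show that columns not mentioned keep their names and that the column order does not change.

```r
library(dplyr)

# Hypothetical toy data with some of the original (pre-rename) column names.
raw <- tibble::tibble(
  X1 = 1:2,
  age = c(45, 21),
  nodeg = c(1, 0),
  re74 = c(100, 200)
)

# Each argument is new_name = old_name; age is not listed, so it is kept as is.
renamed <-
  raw %>%
  rename(
    id = X1,
    no_deg = nodeg,
    real_earn_74 = re74
  )

names(raw)      # "X1" "age" "nodeg" "re74"
names(renamed)  # "id" "age" "no_deg" "real_earn_74"
```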

2. Reordering the variables of the cps data.

We will make the first two columns of the tibble the id and age variables.

cps <-
cps %>%
select(id, age, everything())

head(cps, 3)    
# A tibble: 3 x 11
id   age   trt  educ black  hisp  marr no_deg real_earn_74
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>        <dbl>
1     1    45     0    11     0     0     1      1       21517.
2     2    21     0    14     0     0     0      0        3176.
3     3    38     0    12     0     0     1      0       23039.
# ... with 2 more variables: real_earn_75 <dbl>, real_earn_78 <dbl>

The everything() function fills in the names of the other variables that were not listed.
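
The everything() helper can also follow another helper instead of a list of names. The sketch below is an illustration on a made-up tibble toy (hypothetical values): it moves all of the earnings columns to the front and lets everything() supply the remaining columns in their original order.

```r
library(dplyr)

# Hypothetical toy data (made-up values, not the cps data).
toy <- tibble::tibble(
  id = 1:3,
  trt = c(0, 0, 1),
  age = c(45, 21, 38),
  real_earn_74 = c(100, 200, 300),
  real_earn_78 = c(150, 250, 350)
)

# Earnings columns first, then every column not already selected.
earn_first <-
  toy %>%
  select(starts_with("real_earn"), everything())

names(earn_first)
# "real_earn_74" "real_earn_78" "id" "trt" "age"
```

Columns that were already selected are not repeated, so the result has the same number of columns as the input.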

3. Selecting variables (subsetting)

We will drop the real_earn_78 variable using inclusion, that is, by listing every variable we want to keep.

cps_part1 <-
cps %>%
select(id, age, trt, educ, black, hisp, marr, no_deg, real_earn_74, real_earn_75)

head(cps_part1, 3)
# A tibble: 3 x 10
id   age   trt  educ black  hisp  marr no_deg real_earn_74
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>        <dbl>
1     1    45     0    11     0     0     1      1       21517.
2     2    21     0    14     0     0     0      0        3176.
3     3    38     0    12     0     0     1      0       23039.
# ... with 1 more variable: real_earn_75 <dbl>

We will drop the real_earn_74 and real_earn_75 variables using exclusion, that is, by prefixing each variable we want to drop with a minus sign.

cps_78 <-
cps %>%
select(-real_earn_74, -real_earn_75)

head(cps_78, 3)    
# A tibble: 3 x 9
id   age   trt  educ black  hisp  marr no_deg real_earn_78
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>        <dbl>
1     1    45     0    11     0     0     1      1       25565.
2     2    21     0    14     0     0     0      0       13496.
3     3    38     0    12     0     0     1      0       25565.
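
Inclusion and exclusion are two ways of describing the same subset. The sketch below, on a made-up tibble toy (hypothetical values), checks that both approaches return the same columns and shows that exclusion can also be combined with c() or with a helper such as starts_with().

```r
library(dplyr)

# Hypothetical toy data (made-up values, not the cps data).
toy <- tibble::tibble(
  id = 1:3,
  age = c(45, 21, 38),
  real_earn_74 = c(100, 200, 300),
  real_earn_75 = c(110, 210, 310),
  real_earn_78 = c(150, 250, 350)
)

# Inclusion: list the columns to keep.
by_inclusion <- toy %>% select(id, age, real_earn_78)

# Exclusion: list the columns to drop, each with a minus sign.
by_exclusion <- toy %>% select(-real_earn_74, -real_earn_75)

# Exclusion of several columns at once using c().
by_vector <- toy %>% select(-c(real_earn_74, real_earn_75))

# Exclusion using a helper: drop every earnings column.
no_earn <- toy %>% select(-starts_with("real_earn"))

names(by_inclusion)  # "id" "age" "real_earn_78"
names(by_exclusion)  # "id" "age" "real_earn_78"
names(no_earn)       # "id" "age"
```

Exclusion is usually the shorter choice when only a few columns are dropped from a wide tibble.
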

4. Extracting a single variable from a tibble.

Subsetting a tibble to a single column with select() results in a one-column tibble, not a vector.

cps %>%
select(age) %>%
str()
Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame':    15992 obs. of  1 variable:
$ age: num  45 21 38 48 18 22 48 18 48 45 ...
- attr(*, "spec")=
.. cols(
..   X1 = col_double(),
..   trt = col_double(),
..   age = col_double(),
..   educ = col_double(),
..   black = col_double(),
..   hisp = col_double(),
..   marr = col_double(),
..   nodeg = col_double(),
..   re74 = col_double(),
..   re75 = col_double(),
..   re78 = col_double()
.. )

The pull() function extracts a single column from a tibble as a vector.

cps %>%
pull(age) %>%
str()
 num [1:15992] 45 21 38 48 18 22 48 18 48 45 ...
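
The difference between select() and pull() matters when the result is passed to a function that expects a vector. The sketch below uses a made-up tibble toy (hypothetical values) to compare the two and to check that pull() gives the same result as the base R extractors $ and [[ ]].

```r
library(dplyr)

# Hypothetical toy data (made-up values, not the cps data).
toy <- tibble::tibble(
  id = 1:3,
  age = c(45, 21, 38)
)

# select() keeps the tibble structure: 3 rows, 1 column.
one_col <- toy %>% select(age)

# pull() returns the column itself as a plain numeric vector.
age_vec <- toy %>% pull(age)

is.data.frame(one_col)                # TRUE
is.data.frame(age_vec)                # FALSE
identical(age_vec, toy$age)           # TRUE
identical(age_vec, toy[["age"]])      # TRUE

# A vector can be passed directly to functions such as mean().
mean(age_vec)
```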