Supporting Statistical Analysis for Research
2.6 Column operations
The two common column operations are
renaming columns, rename(), and
selecting columns, select().
The select() function has a number of helper
functions that make it easier to
select a set of columns,
such as starts_with(), ends_with(), contains(),
everything(), and the slice operator.
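
As a minimal sketch of these helpers, the block below applies each one to a small made-up tibble. The name `toy` and its columns are hypothetical and are not part of the cps data. The range form `age:real_earn_75` is assumed to be what the text calls the slice operator. The sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data; the column names only exist to exercise the helpers.
toy <- tibble(
  id = 1:3,
  age = c(45, 21, 38),
  real_earn_74 = c(10, 20, 30),
  real_earn_75 = c(11, 21, 31),
  no_deg = c(1, 0, 0)
)

# Columns whose names begin with "real_"
starts <- toy %>% select(starts_with("real_"))

# Columns whose names end with "_75"
ends <- toy %>% select(ends_with("_75"))

# Columns whose names contain "earn"
has_earn <- toy %>% select(contains("earn"))

# A range of adjacent columns, from age through real_earn_75
ranged <- toy %>% select(age:real_earn_75)

# Move no_deg to the front; everything() supplies the remaining columns
reordered <- toy %>% select(no_deg, everything())

names(reordered)
```

Each helper matches on column names, not on column contents, so the results depend only on how the variables are named.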
2.6.1 Examples
Renaming the variables of the cps data.

    # rename() takes new_name = old_name pairs
    cps_in <- cps_in %>%
      rename(
        id = X1,
        no_deg = nodeg,
        real_earn_74 = re74,
        real_earn_75 = re75,
        real_earn_78 = re78
      )
    cps <- cps_in
    head(cps, 3)

    # A tibble: 3 x 11
         id   trt   age  educ black  hisp  marr no_deg real_earn_74
      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>        <dbl>
    1     1     0    45    11     0     0     1      1       21517.
    2     2     0    21    14     0     0     0      0        3176.
    3     3     0    38    12     0     0     1      0       23039.
    # ... with 2 more variables: real_earn_75 <dbl>, real_earn_78 <dbl>

The above variable names are in snake case: each word is separated by an underscore, _.

Reordering the variables of the cps data.
We will make the first two columns of the tibble the id and age variables.

    cps <- cps %>%
      select(id, age, everything())
    head(cps, 3)

    # A tibble: 3 x 11
         id   age   trt  educ black  hisp  marr no_deg real_earn_74
      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>        <dbl>
    1     1    45     0    11     0     0     1      1       21517.
    2     2    21     0    14     0     0     0      0        3176.
    3     3    38     0    12     0     0     1      0       23039.
    # ... with 2 more variables: real_earn_75 <dbl>, real_earn_78 <dbl>

The everything() function fills in the names of the other variables that were not listed.

Selecting variables (subsetting)
We will select all the variables except the real_earn_78 variable using inclusion.

    cps_part1 <- cps %>%
      select(id, age, trt, educ, black, hisp, marr, no_deg,
             real_earn_74, real_earn_75)
    head(cps_part1, 3)

    # A tibble: 3 x 10
         id   age   trt  educ black  hisp  marr no_deg real_earn_74
      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>        <dbl>
    1     1    45     0    11     0     0     1      1       21517.
    2     2    21     0    14     0     0     0      0        3176.
    3     3    38     0    12     0     0     1      0       23039.
    # ... with 1 more variable: real_earn_75 <dbl>

We will select all the variables except the real_earn_74 and real_earn_75 variables using exclusion.

    cps_78 <- cps %>%
      select(-real_earn_74, -real_earn_75)
    head(cps_78, 3)

    # A tibble: 3 x 9
         id   age   trt  educ black  hisp  marr no_deg real_earn_78
      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>        <dbl>
    1     1    45     0    11     0     0     1      1       25565.
    2     2    21     0    14     0     0     0      0       13496.
    3     3    38     0    12     0     0     1      0       25565.

Removing a variable from a tibble.

Subsetting to a single column of a tibble results in a one-column tibble.

    cps %>%
      select(age) %>%
      str()

    Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 15992 obs. of 1 variable:
     $ age: num 45 21 38 48 18 22 48 18 48 45 ...
     - attr(*, "spec")=
      .. cols(
      ..   X1 = col_double(),
      ..   trt = col_double(),
      ..   age = col_double(),
      ..   educ = col_double(),
      ..   black = col_double(),
      ..   hisp = col_double(),
      ..   marr = col_double(),
      ..   nodeg = col_double(),
      ..   re74 = col_double(),
      ..   re75 = col_double(),
      ..   re78 = col_double()
      .. )

The pull() function is used to get a column from a tibble as a vector.

    cps %>%
      pull(age) %>%
      str()

     num [1:15992] 45 21 38 48 18 22 48 18 48 45 ...
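
The difference between select() and pull() can be checked directly. The following is a small sketch on a made-up tibble (the name `toy` and its values are hypothetical stand-ins for cps), assuming the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data standing in for cps.
toy <- tibble(id = 1:3, age = c(45, 21, 38))

age_tbl <- toy %>% select(age)  # still a tibble, with one column
age_vec <- toy %>% pull(age)    # a plain numeric vector

is.data.frame(age_tbl)  # TRUE
is.data.frame(age_vec)  # FALSE

# Functions that expect a vector work directly on the pulled column
mean(age_vec)
```

A one-column tibble is still a data frame, so pull() is usually the more convenient choice when the next step needs a bare vector.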