2.6 Column operations
The two common column operations are renaming columns, rename(), and selecting columns, select(). The select() function has a number of helper functions that make it easier to select a set of columns, such as starts_with(), ends_with(), contains(), everything(), and the slice operator.
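As a minimal sketch of how these helpers behave, the example below builds a small made-up tibble and selects columns with each helper. The tibble toy and its column names are hypothetical and are not part of the cps data, and the slice operator is taken here to mean the : range operator. The sketch assumes the dplyr and tibble packages are installed.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; the column names are made up for illustration.
toy <- tibble(
  id      = 1:3,
  earn_74 = c(10, 20, 30),
  earn_75 = c(11, 21, 31),
  age_yrs = c(45, 21, 38),
  educ    = c(11, 14, 12)
)

# Columns whose names begin with "earn".
names_starts <- toy %>% select(starts_with("earn")) %>% names()

# Columns whose names end with "_75".
names_ends <- toy %>% select(ends_with("_75")) %>% names()

# Columns whose names contain the literal string "_".
names_contains <- toy %>% select(contains("_")) %>% names()

# A contiguous range of columns, from earn_74 through age_yrs.
names_range <- toy %>% select(earn_74:age_yrs) %>% names()

# everything() fills in all remaining columns after those listed.
names_reordered <- toy %>% select(educ, everything()) %>% names()
```

Each helper is evaluated against the column names only, so the same calls work regardless of the values stored in the columns.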
2.6.1 Examples
Renaming the variables of the cps data.
cps_in <-
  cps_in %>%
  rename(
    id = X1,
    no_deg = nodeg,
    real_earn_74 = re74,
    real_earn_75 = re75,
    real_earn_78 = re78
  )

cps <- cps_in

head(cps, 3)
# A tibble: 3 x 11
     id   trt   age  educ black  hisp  marr no_deg real_earn_74
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>        <dbl>
1     1     0    45    11     0     0     1      1       21517.
2     2     0    21    14     0     0     0      0        3176.
3     3     0    38    12     0     0     1      0       23039.
# ... with 2 more variables: real_earn_75 <dbl>, real_earn_78 <dbl>
The variable names above are in snake case: each word is separated by an underscore, _.
Reordering the variables of the cps data.
We will make the id and age variables the first two columns of the tibble.
cps <-
  cps %>%
  select(id, age, everything())

head(cps, 3)
# A tibble: 3 x 11
     id   age   trt  educ black  hisp  marr no_deg real_earn_74
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>        <dbl>
1     1    45     0    11     0     0     1      1       21517.
2     2    21     0    14     0     0     0      0        3176.
3     3    38     0    12     0     0     1      0       23039.
# ... with 2 more variables: real_earn_75 <dbl>, real_earn_78 <dbl>
The everything() function fills in the names of the other variables that were not listed.
Selecting variables (subsetting)
We will select all the variables except the real_earn_78 variable using inclusion, that is, by listing every column to keep.
cps_part1 <-
  cps %>%
  select(id, age, trt, educ, black, hisp, marr, no_deg, real_earn_74, real_earn_75)

head(cps_part1, 3)
# A tibble: 3 x 10
     id   age   trt  educ black  hisp  marr no_deg real_earn_74
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>        <dbl>
1     1    45     0    11     0     0     1      1       21517.
2     2    21     0    14     0     0     0      0        3176.
3     3    38     0    12     0     0     1      0       23039.
# ... with 1 more variable: real_earn_75 <dbl>
We will select all the variables except the real_earn_74 and real_earn_75 variables using exclusion, that is, by prefixing each column to drop with a minus sign.
cps_78 <-
  cps %>%
  select(-real_earn_74, -real_earn_75)

head(cps_78, 3)
# A tibble: 3 x 9
     id   age   trt  educ black  hisp  marr no_deg real_earn_78
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>        <dbl>
1     1    45     0    11     0     0     1      1       25565.
2     2    21     0    14     0     0     0      0       13496.
3     3    38     0    12     0     0     1      0       25565.
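Exclusion can also be written without repeating the minus sign for each column. The sketch below is one possible way to do this, using a small hypothetical tibble (cps_toy) that borrows a few of the cps column names. The values are made up. It assumes the dplyr and tibble packages are installed.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the cps data; the values are made up.
cps_toy <- tibble(
  id           = 1:2,
  age          = c(45, 21),
  real_earn_74 = c(100, 200),
  real_earn_75 = c(110, 210),
  real_earn_78 = c(120, 220)
)

# Drop both columns by name, grouped with c().
by_name <- cps_toy %>% select(-c(real_earn_74, real_earn_75))

# Drop the same columns as a range; this works because they are adjacent.
by_range <- cps_toy %>% select(-(real_earn_74:real_earn_75))

names(by_name)
```

Both forms keep id, age, and real_earn_78. The range form depends on the column order, so the by-name form is the safer choice when the layout of the tibble may change.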
Extracting a variable from a tibble.
Subsetting to a single column of a tibble with select() results in a one-column tibble, not a vector.
cps %>% select(age) %>% str()
Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 15992 obs. of 1 variable:
 $ age: num 45 21 38 48 18 22 48 18 48 45 ...
 - attr(*, "spec")=
  .. cols(
  ..   X1 = col_double(),
  ..   trt = col_double(),
  ..   age = col_double(),
  ..   educ = col_double(),
  ..   black = col_double(),
  ..   hisp = col_double(),
  ..   marr = col_double(),
  ..   nodeg = col_double(),
  ..   re74 = col_double(),
  ..   re75 = col_double(),
  ..   re78 = col_double()
  .. )
The pull() function is used to get a column from a tibble as a vector.
cps %>% pull(age) %>% str()
num [1:15992] 45 21 38 48 18 22 48 18 48 45 ...
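The difference between select() and pull() can be checked directly. The following is a minimal sketch on a hypothetical three-row tibble (toy, with made-up values), assuming the dplyr and tibble packages are installed.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; the values are made up for illustration.
toy <- tibble(id = 1:3, age = c(45, 21, 38))

# select() keeps the tibble structure, even for a single column.
age_tbl <- toy %>% select(age)

# pull() drops the tibble structure and returns the column as a vector.
age_vec <- toy %>% pull(age)

# Base R's [[ ]] extracts the same vector.
age_base <- toy[["age"]]

is.data.frame(age_tbl)
is.data.frame(age_vec)
```

Functions that expect a plain vector, such as mean(), need the pull() form, while functions that expect a data frame need the select() form.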