 Supporting Statistical Analysis for Research
 2.8 Subsetting rows
The tidyverse provides several functions to select
rows from a tibble.
- filter() selects rows using a Boolean condition.
- sample_n() and sample_frac() take a random sample of the rows.
- slice() selects rows by numeric position.
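As a quick illustration of the four functions listed above, here is a minimal sketch on a small made-up tibble. The tibble toy and its columns id and score are hypothetical and are not part of the cps data used in this chapter.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data, used only for illustration.
toy <- tibble(
  id    = 1:10,
  score = c(5, 8, 2, 9, 4, 7, 1, 6, 3, 10)
)

set.seed(1)  # make the random samples reproducible

high   <- toy %>% filter(score > 5)   # rows where the condition is TRUE
three  <- toy %>% sample_n(3)         # 3 rows chosen at random
half   <- toy %>% sample_frac(0.5)    # 50% of the rows, chosen at random
first3 <- toy %>% slice(1:3)          # rows 1 through 3, by position
```

Which rows sample_n() and sample_frac() return depends on the seed, but the number of rows does not. In dplyr 1.0.0 and later these two functions are superseded by slice_sample(), although they still work.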
2.8.1 Examples
- Create test and training data frames using filter().

      set.seed(145705)
      cps <- cps %>%
        mutate(
          split = ifelse(runif(n()) > .75, "test", "train")
        )
      cps_train <- cps %>% filter(split == "train")
      cps_test <- cps %>% filter(split == "test")
      dim(cps)
      [1] 15992 11
      dim(cps_train)
      [1] 11902 11
      dim(cps_test)
      [1] 4090 11
- Create test and training data frames using slice().

      set.seed(145705)
      test_indx <- which(runif(nrow(cps)) > .75)
      train_ind <- setdiff(1:nrow(cps), test_indx)
      cps_train <- cps %>% slice(train_ind)
      cps_test <- cps %>% slice(test_indx)
      dim(cps)
      [1] 15992 11
      dim(cps_train)
      [1] 11902 11
      dim(cps_test)
      [1] 4090 11
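Both examples above assign each row to the test set with probability 0.25, so the test set is only approximately a quarter of the data (4090 of 15992 rows here). If an exact split size is wanted, one possible alternative is to draw the test rows with sample_frac() and take the remaining rows with anti_join(). The sketch below assumes a hypothetical tibble dat with an added key column row_id. It is not the cps data, and it is not the approach used in the examples above.

```r
library(dplyr)
library(tibble)

set.seed(145705)

# Hypothetical stand-in data; row_id is an added key that identifies each row.
dat <- tibble(row_id = 1:100, x = runif(100))

# Draw exactly 25% of the rows for the test set.
dat_test <- dat %>% sample_frac(0.25)

# The training set is every row whose row_id is not in the test set.
dat_train <- dat %>% anti_join(dat_test, by = "row_id")

dim(dat_test)
dim(dat_train)
```

The key column matters here: anti_join() matches rows by value, so without a unique identifier, duplicated rows in the data would all be dropped from the training set together.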