
3.8 Selecting rows
Rows are typically selected based on some condition in the data, as opposed to by name, as columns typically are.
The filter() function takes a Boolean (logical) condition that is evaluated for each row. It removes the rows (across all columns) for which the condition is FALSE, keeping only the rows for which it is TRUE.
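To make the row-by-row behaviour concrete, here is a minimal sketch on a small made-up tibble (the `toy` data and its values are hypothetical, not part of the forbes data). It also shows a detail worth knowing: rows where the condition evaluates to NA are dropped along with the FALSE rows.

```r
library(dplyr)

# Hypothetical toy data, not the forbes data set used in this section.
toy <- tibble(
  name    = c("A", "B", "C", "D"),
  profits = c(1.2, -0.4, NA, 3.1)
)

# The condition is evaluated once per row: TRUE, FALSE, NA, TRUE.
toy$profits > 0

# filter() keeps the rows where the condition is TRUE and drops the
# rows where it is FALSE or NA.
kept <- toy %>%
  filter(profits > 0)

kept$name
```

Only rows A and D remain. If the NA rows should be kept, the condition has to say so explicitly, for example `filter(profits > 0 | is.na(profits))`.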
Examples
Dropping observations (rows).
This example uses filter() to create a subset data frame containing the companies ranked 1000 or lower. (Rank 1 is the largest company, so keeping the 1000 largest companies would instead use rank <= 1000.)

    forbes_1000 <- forbes %>%
      filter(rank >= 1000)

    forbes_1000 %>%
      head() %>%
      print(n = 10)

    # A tibble: 6 x 14
      name  market_value country  rank category sales profits assets     pe
      <chr>        <dbl> <fct>   <dbl> <chr>    <dbl>   <dbl>  <dbl>  <dbl>
    1 Nort~         2.48 United~  1000 Utiliti~  6.73   0.13   11.0   19.1
    2 Kore~         1.76 South ~  1001 Utiliti~  6.17   0.25    7.87   7.04
    3 MOL           3.15 Hungary  1002 Oil & g~  5.16   0.290   4.2   10.9
    4 Firs~         2.36 United~  1003 Insuran~  5.98   0.44    4.28   5.36
    5 Sumi~         4.42 Japan    1004 Diversi~  4.52   0.04   16.9  110.
    6 Hibe~         3.58 United~  1005 Banking   1.33   0.25   17.6   14.3
    # ... with 5 more variables: nafta <lgl>, profit_lev <fct>,
    #   industry <chr>, profits_std <dbl>, outlier <lgl>
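The direction of the comparison decides which end of a ranking survives the filter. The following is a minimal sketch on a hypothetical five-row ranking (the `toy_ranked` tibble is made up for illustration), assuming, as in the forbes data, that rank 1 is the largest company.

```r
library(dplyr)

# Hypothetical ranking of five companies; rank 1 is the largest.
toy_ranked <- tibble(
  name = c("A", "B", "C", "D", "E"),
  rank = 1:5
)

# rank <= 3 keeps the three largest companies.
top_3 <- toy_ranked %>%
  filter(rank <= 3)

# rank >= 3 keeps the company ranked 3 and every company below it.
from_3 <- toy_ranked %>%
  filter(rank >= 3)

top_3$name
from_3$name
```

The first filter keeps A, B, and C. The second keeps C, D, and E.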
Conditional examination of the data.
    forbes %>%
      select(name, country, rank, market_value, nafta) %>%
      filter(nafta)

    # A tibble: 824 x 5
       name                country        rank market_value nafta
       <chr>               <fct>         <dbl>        <dbl> <lgl>
     1 Citigroup           United States     1        255.  TRUE
     2 General Electric    United States     2        329.  TRUE
     3 American Intl Group United States     3        195.  TRUE
     4 ExxonMobil          United States     4        277.  TRUE
     5 Bank of America     United States     6        118.  TRUE
     6 Fannie Mae          United States     9         76.8 TRUE
     7 Wal-Mart Stores     United States    10        244.  TRUE
     8 Berkshire Hathaway  United States    14        141.  TRUE
     9 JP Morgan Chase     United States    15         81.9 TRUE
    10 IBM                 United States    16        172.  TRUE
    # ... with 814 more rows
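The pipeline above filters on nafta after select(), which works because nafta is one of the selected columns. When the condition column is not wanted in the result, one option is to filter first and select afterwards. A minimal sketch, using a hypothetical `toy` tibble rather than the forbes data:

```r
library(dplyr)

# Hypothetical toy data with a logical column to filter on.
toy <- tibble(
  name    = c("A", "B", "C"),
  country = c("United States", "Japan", "Canada"),
  nafta   = c(TRUE, FALSE, TRUE)
)

# Filter first, then select: nafta is used for the condition but is
# not kept in the result.
nafta_names <- toy %>%
  filter(nafta) %>%
  select(name, country)

nafta_names
```

Reversing the two steps here would fail, because filter() could no longer find the nafta column once select() had dropped it.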
Conditional proportion
The filter() function can be used to calculate a proportion conditional on some state of the data. Here we will recalculate the proportion of outlier profits conditional on being based in a NAFTA country.
    forbes %>%
      filter(nafta) %>%
      summarise(outlier_proportion = mean(outlier, na.rm = TRUE))

    # A tibble: 1 x 1
      outlier_proportion
                   <dbl>
    1              0.229
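The calculation works because mean() treats a logical vector as 1 for TRUE and 0 for FALSE, so the mean is the proportion of TRUE values among the rows that remain after filtering. A small sketch with hypothetical values (the `toy` tibble below is invented for illustration) makes the arithmetic easy to check by hand:

```r
library(dplyr)

# Hypothetical toy data: four NAFTA rows and two non-NAFTA rows.
toy <- tibble(
  nafta   = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  outlier = c(TRUE, FALSE, FALSE, NA, TRUE, TRUE)
)

result <- toy %>%
  filter(nafta) %>%
  summarise(outlier_proportion = mean(outlier, na.rm = TRUE))

# Among the NAFTA rows, outlier is TRUE, FALSE, FALSE, NA.
# na.rm = TRUE drops the NA, leaving 1 TRUE out of 3 values.
result$outlier_proportion
```

The result is 1/3. The two non-NAFTA rows, both outliers, do not enter the calculation at all, which is what makes the proportion conditional.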