Supporting Statistical Analysis for Research

## 2.10 Summarizing data

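The summarise() function collapses a tibble into summary rows by applying functions, such as mean() or sd(), that each reduce a variable to a single statistic. An ungrouped tibble yields one row; a grouped tibble yields one row per group.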
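
As a minimal sketch of how grouping changes the shape of the result, assuming dplyr is installed: the tibble toy and its columns are invented for illustration and are not part of the cps data.

```r
library(dplyr)

# Hypothetical toy data, not the cps tibble used in the examples
toy <- tibble(
  group = c("a", "a", "b", "b", "b"),
  score = c(1, 3, 2, 4, 6)
)

# Ungrouped: the whole tibble collapses to a single row
toy_overall <- toy %>%
  summarise(
    n = n(),
    mean_score = mean(score)
  )

# Grouped: one row for each distinct value of group
toy_by_group <- toy %>%
  group_by(group) %>%
  summarise(
    n = n(),
    mean_score = mean(score)
  )
```

Here toy_overall has one row and toy_by_group has two, one for each value of group.
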

### 2.10.1 Examples

1. Summarizing one variable of the cps tibble

We will find the mean and standard deviation of age.

cps %>%
  summarise(
    mean_age = mean(age),
    std_dev_age = sd(age)
  )

# A tibble: 1 x 2
  mean_age std_dev_age
     <dbl>       <dbl>
1     33.2        11.0

2. Summarizing multiple columns

We will find the mean earnings in years 74, 75, and 78 for each ethnicity.

cps_eth_earn <-
  cps %>%
  group_by(ethnicity) %>%
  summarise_at(
    vars(real_earn_74:real_earn_78),
    mean
  )

cps_eth_earn

# A tibble: 3 x 4
  ethnicity      real_earn_74 real_earn_75 real_earn_78
  <fct>                 <dbl>        <dbl>        <dbl>
1 white_non_hisp       14376.       13999.       15213.
2 black                11427.       10941.       12007.
3 hisp                 12402.       12290.       13397.

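
In dplyr 1.0.0 and later, summarise_at() is superseded by across() used inside summarise(). The sketch below shows what an equivalent call might look like. It assumes dplyr 1.0.0 or later; toy_cps is a hypothetical stand-in that only borrows the cps column names, and its values are invented.

```r
library(dplyr)

# Hypothetical stand-in for cps: same column names, invented values
toy_cps <- tibble(
  ethnicity    = factor(c("black", "black", "hisp", "hisp")),
  real_earn_74 = c(100, 200, 300, 500),
  real_earn_75 = c(110, 210, 310, 510),
  real_earn_78 = c(120, 220, 320, 520)
)

# across() selects the same column range that vars() selected
# in summarise_at() and applies mean to each selected column
toy_eth_earn <-
  toy_cps %>%
  group_by(ethnicity) %>%
  summarise(
    across(real_earn_74:real_earn_78, mean)
  )

toy_eth_earn
```

The summarised columns keep their original names, as they do with summarise_at() when a single function is supplied.
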
3. Summarizing with multiple grouping variables

We will find the mean earnings in year 78 for each combination of ethnicity and marital status.

cps_eth_marr_earn <-
  cps %>%
  group_by(ethnicity, marr) %>%
  summarise(
    mean_earn_78 = mean(real_earn_78)
  )

cps_eth_marr_earn

# A tibble: 6 x 3
# Groups:   ethnicity [3]
  ethnicity       marr mean_earn_78
  <fct>          <dbl>        <dbl>
1 white_non_hisp     0       11319.
2 white_non_hisp     1       16742.
3 black              0        9199.
4 black              1       13728.
5 hisp               0       10138.
6 hisp               1       14607.
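
The "# Groups: ethnicity [3]" line in the output shows that the result is still grouped by ethnicity: by default, summarise() removes only the last grouping variable. The sketch below shows one way to inspect the remaining grouping and to drop it. It assumes dplyr 1.0.0 or later for the .groups argument; toy_cps is a hypothetical stand-in with invented values.

```r
library(dplyr)

# Hypothetical stand-in for cps: same column names, invented values
toy_cps <- tibble(
  ethnicity    = c("black", "black", "hisp", "hisp"),
  marr         = c(0, 1, 0, 1),
  real_earn_78 = c(100, 200, 300, 500)
)

# Default behaviour: only the last grouping variable (marr) is removed,
# so the result is still grouped by ethnicity
still_grouped <-
  toy_cps %>%
  group_by(ethnicity, marr) %>%
  summarise(mean_earn_78 = mean(real_earn_78))

group_vars(still_grouped)

# .groups = "drop" returns an ungrouped tibble instead
ungrouped <-
  toy_cps %>%
  group_by(ethnicity, marr) %>%
  summarise(mean_earn_78 = mean(real_earn_78), .groups = "drop")

group_vars(ungrouped)
```

Calling ungroup() on the summarised tibble would have the same effect as .groups = "drop". Dropping the leftover grouping matters when later steps, such as a further summarise() or mutate(), are meant to operate on the whole table.
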