SSCC - Social Science Computing Cooperative Supporting Statistical Analysis for Research

2.10 Summarizing data

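The summarise() function reduces a tibble to summary statistics. Each argument applies a function to one or more variables and returns a single value, so the result has one row for an ungrouped tibble or one row per group for a grouped tibble.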

2.10.1 Examples

  1. Summarize one variable of the cps tibble.

    We will find the mean and standard deviation of age.

    cps %>%
      summarise(
        mean_age = mean(age),
        std_dev_age = sd(age)
      )
    # A tibble: 1 x 2
      mean_age std_dev_age
         <dbl>       <dbl>
    1     33.2        11.0
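
The cps tibble is loaded earlier in the course, so the call above cannot be run on its own. As a minimal sketch of the same pattern, the block below builds a small hypothetical tibble (toy, with made-up ages that are not part of the cps data) and collapses it to a single row of statistics.

```r
library(dplyr)

# Hypothetical toy data standing in for the cps tibble (values are made up)
toy <- tibble(age = c(25, 35, 45))

# summarise() returns one row because the tibble is not grouped
toy_summary <-
  toy %>%
  summarise(
    mean_age = mean(age),
    std_dev_age = sd(age)
  )

toy_summary
```

Each named argument becomes one column of the result, which is why the output has two columns and one row.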
  2. Summarizing multiple columns

    We will find the mean earnings in years 74, 75, and 78 for each ethnicity.

    cps_eth_earn <-
      cps %>%
      group_by(ethnicity) %>%
      summarise_at(
        vars(real_earn_74:real_earn_78),
        mean
      )
    
    cps_eth_earn
    # A tibble: 3 x 4
      ethnicity      real_earn_74 real_earn_75 real_earn_78
      <fct>                 <dbl>        <dbl>        <dbl>
    1 white_non_hisp       14376.       13999.       15213.
    2 black                11427.       10941.       12007.
    3 hisp                 12402.       12290.       13397.
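
In dplyr 1.0.0 and later, the scoped verbs such as summarise_at() are superseded by across(), although summarise_at() still works. The sketch below shows what an across() version of the same grouped summary might look like. It assumes dplyr 1.0.0 or newer and uses a small hypothetical tibble (toy, with made-up earnings) in place of cps.

```r
library(dplyr)

# Hypothetical toy data with the same column names as cps (values are made up)
toy <- tibble(
  ethnicity = factor(c("black", "black", "hisp", "hisp")),
  real_earn_74 = c(100, 200, 300, 500),
  real_earn_75 = c(110, 210, 310, 510),
  real_earn_78 = c(120, 220, 320, 520)
)

# across() selects the range of columns and applies mean() to each one
toy_eth_earn <-
  toy %>%
  group_by(ethnicity) %>%
  summarise(across(real_earn_74:real_earn_78, mean))

toy_eth_earn
```

The column range real_earn_74:real_earn_78 is selected by position, so it picks up every column between the two names, just as vars() does in the summarise_at() call.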
  3. Summarizing with multiple grouping variables

    We will find the mean earnings in year 78 for each combination of ethnicity and marital status.

    cps_eth_marr_earn <-
      cps %>%
      group_by(ethnicity, marr) %>%
      summarise(
        mean_earn_78 = mean(real_earn_78)
      )
    
    cps_eth_marr_earn
    # A tibble: 6 x 3
    # Groups:   ethnicity [3]
      ethnicity       marr mean_earn_78
      <fct>          <dbl>        <dbl>
    1 white_non_hisp     0       11319.
    2 white_non_hisp     1       16742.
    3 black              0        9199.
    4 black              1       13728.
    5 hisp               0       10138.
    6 hisp               1       14607.
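
The "# Groups:   ethnicity [3]" line in the output above shows that summarise() removes only the last grouping variable (marr), so the result is still grouped by ethnicity. If an ungrouped result is wanted, one option is the .groups argument, which is available in dplyr 1.0.0 and later. The sketch below assumes that version and uses a small hypothetical tibble (toy, with made-up values) in place of cps.

```r
library(dplyr)

# Hypothetical toy data with the same column names as cps (values are made up)
toy <- tibble(
  ethnicity = factor(c("black", "black", "hisp", "hisp")),
  marr = c(0, 1, 0, 1),
  real_earn_78 = c(100, 200, 300, 400)
)

# Default behavior: the last grouping variable is dropped, ethnicity remains
still_grouped <-
  toy %>%
  group_by(ethnicity, marr) %>%
  summarise(mean_earn_78 = mean(real_earn_78))

group_vars(still_grouped)

# .groups = "drop" returns a tibble with no grouping at all
no_groups <-
  toy %>%
  group_by(ethnicity, marr) %>%
  summarise(mean_earn_78 = mean(real_earn_78), .groups = "drop")

group_vars(no_groups)
```

Piping the default result through ungroup() should give the same ungrouped tibble. Leftover grouping matters because later verbs such as mutate() or another summarise() would otherwise operate within each ethnicity.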