SSCC - Social Science Computing Cooperative Supporting Statistical Analysis for Research

3.5 Column index

The two most common column operations are renaming columns with rename() and selecting columns with select(). The select() function has a number of helper functions that make it easier to select a set of columns, such as starts_with(), ends_with(), contains(), and everything(). It also accepts the slice operator (:), which selects a range of adjacent columns.
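
The name-matching helpers can be sketched on a small data frame. The toy tibble and its column names below are hypothetical and are not part of the forbes data; they only illustrate how starts_with(), ends_with(), and contains() choose columns by name.

```r
library(dplyr)

# Hypothetical toy data; the column names are made up for illustration.
toy <- tibble(
  id         = 1:3,
  sales_2019 = c(10, 20, 30),
  sales_2020 = c(11, 21, 31),
  cost_2019  = c(5, 6, 7),
  cost_2020  = c(6, 7, 8)
)

# Columns whose names begin with "sales"
sales_cols <- toy %>% select(starts_with("sales"))

# Columns whose names end with "2020"
cols_2020 <- toy %>% select(ends_with("2020"))

# Columns whose names contain "cost" anywhere in the name
cost_cols <- toy %>% select(contains("cost"))

names(sales_cols)  # "sales_2019" "sales_2020"
names(cols_2020)   # "sales_2020" "cost_2020"
names(cost_cols)   # "cost_2019" "cost_2020"
```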

Examples

  1. Renaming the variables of the forbes data.

    In rename(), the new name goes on the left of the equals sign and the existing name on the right (new_name = old_name). Neither name needs to be quoted as long as it contains no spaces; a name that does contain spaces must be wrapped in backticks. Several renames can be listed, separated by commas, in a single call to rename().

    forbes_in <-
      forbes_in %>%
      rename(
        market_value = marketvalue
        )
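
    As a minimal sketch of renaming several variables in one call, the example below uses a hypothetical toy tibble rather than the forbes data. The column `market value` is invented here to show a name containing a space, which has to be wrapped in backticks.

```r
library(dplyr)

# Hypothetical toy data; `market value` contains a space, so it needs backticks.
toy <- tibble(
  X1             = 1:2,
  `market value` = c(255.30, 328.54),
  name           = c("Citigroup", "General Electric")
)

# Each rename is written as new_name = old_name, separated by commas.
renamed <-
  toy %>%
  rename(
    row_id       = X1,
    market_value = `market value`
  )

names(renamed)  # "row_id" "market_value" "name"
```

    Renaming changes only the names; the columns stay in their original positions.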
  2. Dropping variables.

    Variables are removed from a data frame by placing a minus sign (-) before the variable name in select().

    forbes <-
      forbes_in %>%
      select(-X1)
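
    The same approach extends to dropping more than one variable. The sketch below uses a hypothetical toy tibble (the X2 column is invented for illustration) and shows two ways to do it: negating each name, or negating a helper function.

```r
library(dplyr)

# Hypothetical toy data with two unwanted columns, X1 and X2.
toy <- tibble(
  X1    = 1:2,
  X2    = c("a", "b"),
  name  = c("Citigroup", "General Electric"),
  sales = c(94.71, 134.19)
)

# Drop two named columns at once.
dropped <- toy %>% select(-X1, -X2)

# Drop every column whose name starts with "X".
dropped_by_prefix <- toy %>% select(-starts_with("X"))

names(dropped)            # "name" "sales"
names(dropped_by_prefix)  # "name" "sales"
```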
  3. Selecting variables.

    The slice operator (:) is used in this example to select every column from the company name through the sales variable, in the order they appear in the data frame.

    forbes %>%
      select(
        name:sales
        ) %>%
      glimpse()
    Observations: 2,000
    Variables: 4
    $ name     <chr> "Citigroup", "General Electric", "American Intl Group...
    $ country  <chr> "United States", "United States", "United States", "U...
    $ category <chr> "Banking", "Conglomerates", "Insurance", "Oil & gas o...
    $ sales    <dbl> 94.71, 134.19, 76.66, 222.88, 232.57, 49.01, 44.33, 1...

    Here the forbes data frame was not changed. The modified data frame was only displayed.
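
    The difference between displaying and saving a selection can be sketched as follows. The toy tibble and the name toy_small are hypothetical; the point is only that a pipeline changes nothing unless its result is assigned.

```r
library(dplyr)

# Hypothetical toy data.
toy <- tibble(
  name    = c("Citigroup", "General Electric"),
  country = c("United States", "United States"),
  sales   = c(94.71, 134.19)
)

# Not assigned: the selection is only displayed, and toy keeps all its columns.
toy %>%
  select(name:country) %>%
  names()

ncol(toy)  # 3

# Assigned: the selection is saved under a new name.
toy_small <-
  toy %>%
  select(name:country)

ncol(toy_small)  # 2
```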

  4. Reordering variables.

    The order of the variables of a data frame will match the order of the variables in select().

    This example reorders the variables to put the name, market_value, and country variables first in the data frame. Rather than listing all the remaining variables, the everything() helper function is used. It selects all of the variables that have not already been named, in their existing order.

    forbes <-
      forbes %>%
      select(
        name,
        market_value,
        country,
        everything()
        )
    
    glimpse(forbes)
    Observations: 2,000
    Variables: 8
    $ name         <chr> "Citigroup", "General Electric", "American Intl G...
    $ market_value <dbl> 255.30, 328.54, 194.87, 277.02, 173.54, 117.55, 1...
    $ country      <chr> "United States", "United States", "United States"...
    $ rank         <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15...
    $ category     <chr> "Banking", "Conglomerates", "Insurance", "Oil & g...
    $ sales        <dbl> 94.71, 134.19, 76.66, 222.88, 232.57, 49.01, 44.3...
    $ profits      <dbl> 17.85, 15.59, 6.46, 20.96, 10.27, 10.81, 6.66, 7....
    $ assets       <dbl> 1264.03, 626.93, 647.66, 166.99, 177.57, 736.45, ...
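
    The behavior of everything() can be checked on a smaller case. This sketch uses a hypothetical toy tibble; it shows that columns already named in select() are not repeated and that the remaining columns keep their original relative order.

```r
library(dplyr)

# Hypothetical toy data with columns in the order rank, name, country, sales.
toy <- tibble(
  rank    = 1:2,
  name    = c("Citigroup", "General Electric"),
  country = c("United States", "United States"),
  sales   = c(94.71, 134.19)
)

# Move name and sales to the front; everything() supplies the rest.
reordered <-
  toy %>%
  select(
    name,
    sales,
    everything()
  )

names(reordered)  # "name" "sales" "rank" "country"
```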