SSCC - Social Science Computing Cooperative Supporting Statistical Analysis for Research

12 Subsetting

  • Extractors
  • Selecting Columns
    • base R
      • Exercises
    • dplyr
      • Exercises
  • Selecting Rows
    • base R
    • dplyr

When working with the data in a data frame, you will routinely find that you only want to work with a portion of the data. Very often it will be convenient to create a new data frame that is a subset of your original data frame, especially where a function uses a data frame as it’s primary input argument, or where you will be using the same subset repeatedly in several subsequent statements in your script.

We can think of working with a subset of the columns of a data frame, or a subset of the rows. Two common examples are:

  • Subset by selecting variables to create a scatterplot matrix.
  • Subset by selecting observations for a data frame with no missing values.

12.1 Extractors

We have previously seen that we can index vectors by position, name, or condition to extract and replace elements. Now we extend this to data frames. In addition to the square bracket extractors, [ ], we also have two “list” extractors, the dollar sign , $, and the doubled square brackets, [[ ]]. Here it is relevant that a data frame is a special kind of list.

  • The square brackets, [ ], allow us to index both the rows and the columns of a data frame. We can index by position, name, or condition.

    When used with data frames (or matrices), we index by two positions, rows and columns. The square brackets should contain two objects, a vector indexing the rows and a vector indexing the columns. To select all the elements along one dimension, omit that vector but include the comma.

    Try these examples:

    mtcars["Valiant", "mpg"]  # a single element
    mtcars[ , c("mpg", "wt")] # all rows, two columns
    mtcars[1:6, ]             # rows 1-6, all columns
    mtcars[ , ]               # everything

    The next example illustrates indexing rows by position and columns by name.

    mtcars[1:6, c("mpg", "wt")] # [rows, columns]
                       mpg    wt
    Mazda RX4         21.0 2.620
    Mazda RX4 Wag     21.0 2.875
    Datsun 710        22.8 2.320
    Hornet 4 Drive    21.4 3.215
    Hornet Sportabout 18.7 3.440
    Valiant           18.1 3.460
  • The dollar sign extractor, $, allows us to index a single named element of a list. In a data frame, that would be a column, a single variable. (We’ll use the head function to show just the first few values.)

    head( mtcars$mpg )
    [1] 21.0 21.0 22.8 21.4 18.7 18.1
  • The double brackets, [[ ]], allow us to index a single element of a list by either name or position. This can be useful for programming repeated tasks.

    head( mtcars[[3]] )      # by position, displacement
    [1] 160 160 108 258 360 225
    head( mtcars[["disp"]] ) # by name
    [1] 160 160 108 258 360 225
  • In addition to these extractors, we also have a subset function that provides a very useful variation on the square bracket extractor.

    subset(mtcars, 
           subset=carb >=6,       # rows
           select=c("mpg", "wt")) # columns
                   mpg   wt
    Ferrari Dino  19.7 2.77
    Maserati Bora 15.0 3.57

Data for further examples:

cars <- mtcars
# create two character variables
cars$name <- row.names(cars)
library(tidyr) # separate is easier to use than strsplit
cars <- separate(cars, name, c("make","model"), extra="merge") 

12.2 Selecting Columns

  • Single column
  • Multiple columns
    • arbitrary names
    • by name prefix
    • by name range
    • by column type (move this exercises)
    • by column position range (move this exercises)

Perhaps the most common subset you will use will be pulling a single column from a data frame by name using the dollar sign extractor. This is so common and innocuous you might not even think of it as “subsetting!”

mtcars$mpg

12.2.1 Single name

This can be done with any extractor. The array and list extractors return a vector, while the subset function returns a data frame (we’ll pipe to the head function to show just a few rows of the data frame).

library(magrittr)

Attaching package: 'magrittr'
The following object is masked from 'package:tidyr':

    extract
cars$mpg      %>%  # a solitary vector
    head
[1] 21.0 21.0 22.8 21.4 18.7 18.1
cars[, "mpg"] %>%
    head
[1] 21.0 21.0 22.8 21.4 18.7 18.1
cars[["mpg"]] %>%
    head
[1] 21.0 21.0 22.8 21.4 18.7 18.1
subset(cars, select="mpg") %>% # base R, returns a data.frame
    head
                   mpg
Mazda RX4         21.0
Mazda RX4 Wag     21.0
Datsun 710        22.8
Hornet 4 Drive    21.4
Hornet Sportabout 18.7
Valiant           18.1

12.2.2 Arbitrary names

Very commonly you will want to put all of the variables you are using for a project into a single data frame, selecting a subset of columns using an arbitrary vector of names.

Note that the column names are specified as a character vector.

head( cars[, c("mpg", "disp")] )
                   mpg disp
Mazda RX4         21.0  160
Mazda RX4 Wag     21.0  160
Datsun 710        22.8  108
Hornet 4 Drive    21.4  258
Hornet Sportabout 18.7  360
Valiant           18.1  225
head( subset(cars, select=c("mpg", "disp")) )
                   mpg disp
Mazda RX4         21.0  160
Mazda RX4 Wag     21.0  160
Datsun 710        22.8  108
Hornet 4 Drive    21.4  258
Hornet Sportabout 18.7  360
Valiant           18.1  225

12.2.3 Name prefix, other name wildcards

It is fairly common for groups of related variables to have similar names with a common prefix or a common suffix, and often we want to extract these as a group. Perhaps the data have repeated observations of household income across multiple years, recorded as hhinc2005, hhinc2010, hhinc2015. In base R we use regular expressions to find multiple matching names, and convert those to column positions.

head( cars[ , grep("^m"   , names(cars))] ) # starts with "m"
                   mpg    make      model
Mazda RX4         21.0   Mazda        RX4
Mazda RX4 Wag     21.0   Mazda    RX4 Wag
Datsun 710        22.8  Datsun        710
Hornet 4 Drive    21.4  Hornet    4 Drive
Hornet Sportabout 18.7  Hornet Sportabout
Valiant           18.1 Valiant       <NA>
head( cars[ , grep("^[mc]", names(cars))] ) # starts with "m" or "c"
                   mpg cyl carb    make      model
Mazda RX4         21.0   6    4   Mazda        RX4
Mazda RX4 Wag     21.0   6    4   Mazda    RX4 Wag
Datsun 710        22.8   4    1  Datsun        710
Hornet 4 Drive    21.4   6    1  Hornet    4 Drive
Hornet Sportabout 18.7   8    2  Hornet Sportabout
Valiant           18.1   6    1 Valiant       <NA>

12.2.4 Position range

Less commonly you may extract columns from a data frame by position. This is easy to use when there are not too many columns in the original data frame, harder to use where there are a lot of columns, or where the data have been restructured as with a merge or reshape.

head(cars[, 1:5])
                   mpg cyl disp  hp drat
Mazda RX4         21.0   6  160 110 3.90
Mazda RX4 Wag     21.0   6  160 110 3.90
Datsun 710        22.8   4  108  93 3.85
Hornet 4 Drive    21.4   6  258 110 3.08
Hornet Sportabout 18.7   8  360 175 3.15
Valiant           18.1   6  225 105 2.76

12.2.5 Column Type

Occasionally you will want to work with all of the variables of a certain type - perhaps to create a scatterplot matrix of all numeric variables, to create a table of means and standard deviations of continuous variables, or to create frequency tables for all factor variables.

To accomplish this task we specify the variables to use by condition. In order to check the type of each variable, we use the sapply function (one of several apply functions used for repeated tasks).

sapply(cars, is.character)                      # a variation of lapply
  mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear 
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
 carb  make model 
FALSE  TRUE  TRUE 
head(cars[,sapply(cars, is.character)])         # base R, []
                     make      model
Mazda RX4           Mazda        RX4
Mazda RX4 Wag       Mazda    RX4 Wag
Datsun 710         Datsun        710
Hornet 4 Drive     Hornet    4 Drive
Hornet Sportabout  Hornet Sportabout
Valiant           Valiant       <NA>
head(subset(cars, 
            select=sapply(cars, is.character))) # base R
                     make      model
Mazda RX4           Mazda        RX4
Mazda RX4 Wag       Mazda    RX4 Wag
Datsun 710         Datsun        710
Hornet 4 Drive     Hornet    4 Drive
Hornet Sportabout  Hornet Sportabout
Valiant           Valiant       <NA>
# select(cars, where(is.character))               # dplyr

#sapply(cars, is.numeric) 

12.2.6 Name range

The idea here is that we want all of the variables that are positioned together in the data frame, for example from mpg to disp inclusive. To accomplish this in base R, we need to figure out what position mpg and disp occupy in the data frame. Given these positions, we convert a name range into a position range.

varnames <- names(cars)
grep("mpg", varnames)  # an "exact" or "fixed" regular expression
[1] 1
grep("disp", varnames)
[1] 3
# as one statement
cars[, grep("mpg", names(cars)):grep("disp", names(cars))]
                     mpg cyl  disp
Mazda RX4           21.0   6 160.0
Mazda RX4 Wag       21.0   6 160.0
Datsun 710          22.8   4 108.0
Hornet 4 Drive      21.4   6 258.0
Hornet Sportabout   18.7   8 360.0
Valiant             18.1   6 225.0
Duster 360          14.3   8 360.0
Merc 240D           24.4   4 146.7
Merc 230            22.8   4 140.8
Merc 280            19.2   6 167.6
Merc 280C           17.8   6 167.6
Merc 450SE          16.4   8 275.8
Merc 450SL          17.3   8 275.8
Merc 450SLC         15.2   8 275.8
Cadillac Fleetwood  10.4   8 472.0
Lincoln Continental 10.4   8 460.0
Chrysler Imperial   14.7   8 440.0
Fiat 128            32.4   4  78.7
Honda Civic         30.4   4  75.7
Toyota Corolla      33.9   4  71.1
Toyota Corona       21.5   4 120.1
Dodge Challenger    15.5   8 318.0
AMC Javelin         15.2   8 304.0
Camaro Z28          13.3   8 350.0
Pontiac Firebird    19.2   8 400.0
Fiat X1-9           27.3   4  79.0
Porsche 914-2       26.0   4 120.3
Lotus Europa        30.4   4  95.1
Ford Pantera L      15.8   8 351.0
Ferrari Dino        19.7   6 145.0
Maserati Bora       15.0   8 301.0
Volvo 142E          21.4   4 121.0

12.3 Subsetting Rows

12.3.1 most often by condition

cars[cars$am==0 & cars$mpg > 20, ]  # base R, note the repeated "cars$..."
                mpg cyl  disp  hp drat    wt  qsec vs am gear
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4
Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4
Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3
               carb   make   model
Hornet 4 Drive    1 Hornet 4 Drive
Merc 240D         2   Merc    240D
Merc 230          2   Merc     230
Toyota Corona     1 Toyota  Corona
subset(cars, am==0 & mpg > 20)      # base R, don't need to use "cars$..."
                mpg cyl  disp  hp drat    wt  qsec vs am gear
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4
Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4
Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3
               carb   make   model
Hornet 4 Drive    1 Hornet 4 Drive
Merc 240D         2   Merc    240D
Merc 230          2   Merc     230
Toyota Corona     1 Toyota  Corona
# this is how the subset option alongside a formula works, as well.

# filter(cars, am==0 & mpg > 20 )  # dplyr
# filter(cars, am==0, mpg > 20 )

# subset(data, subset) and filter(data,...) work only with conditions, not names or positions?

12.3.2 by name (sometimes people put IDs here)

cars["Merc 230",]  # base R, tidy R objects to row names.
          mpg cyl  disp hp drat   wt qsec vs am gear carb make
Merc 230 22.8   4 140.8 95 3.92 3.15 22.9  1  0    4    2 Merc
         model
Merc 230   230

12.3.3 by position - occasionally useful

cars[1:5,]
                   mpg cyl disp  hp drat    wt  qsec vs am gear
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3
                  carb   make      model
Mazda RX4            4  Mazda        RX4
Mazda RX4 Wag        4  Mazda    RX4 Wag
Datsun 710           1 Datsun        710
Hornet 4 Drive       1 Hornet    4 Drive
Hornet Sportabout    2 Hornet Sportabout

12.3.4 random observations

# approximately 25% of our data
cars[runif(nrow(cars)) < 0.25,] # approximately 8 observations
                     mpg cyl  disp  hp drat    wt  qsec vs am
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0
                    gear carb     make       model
Mazda RX4              4    4    Mazda         RX4
Hornet Sportabout      3    2   Hornet  Sportabout
Duster 360             3    4   Duster         360
Merc 240D              4    2     Merc        240D
Merc 280C              4    4     Merc        280C
Merc 450SL             3    3     Merc       450SL
Merc 450SLC            3    3     Merc      450SLC
Cadillac Fleetwood     3    4 Cadillac   Fleetwood
Lincoln Continental    3    4  Lincoln Continental
Dodge Challenger       3    2    Dodge  Challenger
AMC Javelin            3    2      AMC     Javelin
# exactly 8 observations
rcars <- cars[order(runif(nrow(cars))),]
rcars[1:8,]
                     mpg cyl  disp  hp drat    wt  qsec vs am
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0
                    gear carb     make       model
Lotus Europa           5    2    Lotus      Europa
Lincoln Continental    3    4  Lincoln Continental
Camaro Z28             3    4   Camaro         Z28
Valiant                3    1  Valiant        <NA>
Merc 450SE             3    3     Merc       450SE
Hornet Sportabout      3    2   Hornet  Sportabout
Maserati Bora          5    8 Maserati        Bora
AMC Javelin            3    2      AMC     Javelin
cars[order(runif(nrow(cars))),][1:8,] # two mind-bending variations!
                    mpg cyl  disp  hp drat    wt  qsec vs am
Ferrari Dino       19.7   6 145.0 175 3.62 2.770 15.50  0  1
Maserati Bora      15.0   8 301.0 335 3.54 3.570 14.60  0  1
Fiat X1-9          27.3   4  79.0  66 4.08 1.935 18.90  1  1
Mazda RX4 Wag      21.0   6 160.0 110 3.90 2.875 17.02  0  1
Merc 450SLC        15.2   8 275.8 180 3.07 3.780 18.00  0  0
Merc 230           22.8   4 140.8  95 3.92 3.150 22.90  1  0
Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0
Ford Pantera L     15.8   8 351.0 264 4.22 3.170 14.50  0  1
                   gear carb     make     model
Ferrari Dino          5    6  Ferrari      Dino
Maserati Bora         5    8 Maserati      Bora
Fiat X1-9             4    1     Fiat      X1-9
Mazda RX4 Wag         4    4    Mazda   RX4 Wag
Merc 450SLC           3    3     Merc    450SLC
Merc 230              4    2     Merc       230
Cadillac Fleetwood    3    4 Cadillac Fleetwood
Ford Pantera L        5    4     Ford Pantera L
cars[order(runif(nrow(cars)))[1:8],]
                   mpg cyl  disp  hp drat    wt  qsec vs am gear
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4
Honda Civic       30.4   4  75.7  52 4.93 1.615 18.52  1  1    4
Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90  1  0    4
Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5
Porsche 914-2     26.0   4 120.3  91 4.43 2.140 16.70  0  1    5
Lotus Europa      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5
Merc 450SL        17.3   8 275.8 180 3.07 3.730 17.60  0  0    3
Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3
                  carb     make     model
Mazda RX4            4    Mazda       RX4
Honda Civic          2    Honda     Civic
Merc 280C            4     Merc      280C
Ford Pantera L       4     Ford Pantera L
Porsche 914-2        2  Porsche     914-2
Lotus Europa         2    Lotus    Europa
Merc 450SL           3     Merc     450SL
Chrysler Imperial    4 Chrysler  Imperial
# or more simply
cars[sample(1:nrow(cars), 8),]
                     mpg cyl  disp  hp drat    wt  qsec vs am
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0
                    gear carb     make       model
Lincoln Continental    3    4  Lincoln Continental
Dodge Challenger       3    2    Dodge  Challenger
Porsche 914-2          5    2  Porsche       914-2
Fiat 128               4    1     Fiat         128
AMC Javelin            3    2      AMC     Javelin
Toyota Corolla         4    1   Toyota     Corolla
Maserati Bora          5    8 Maserati        Bora
Merc 280               4    4     Merc         280

12.4 Exercises

  • subset columns out of order

  • subset the same column twice

  • What happens if you forget the comma when indexing a data frame with square brackets? Find an example that returns data with no error message, and another example that returns an error and no data.

  • subset rows out of order - this is how order() is used to sort a data frame

  • subset each row twice - you might use this when simulating data from an experiment or repeated measures study

12.5 Selecting Columns - Tidyverse

Tidyverse functions for subsetting rows or columns from a data frame (either a data.frame or a tibble) are found in the dplyr package. As we have seen previously, tidyverse functions take a data frame as their first argument, simplifying their use with pipes.

library(dplyr)

Most of the work of subsetting columns is accomplished with the select function. An additional function, pull can be used to extract a single column.

12.5.1 Single column

The pull function indexes by both positions and names.

mtcars %>%
    pull(1) %>%
    head
[1] 21.0 21.0 22.8 21.4 18.7 18.1
mtcars %>%
    pull(mpg) %>%
    head
[1] 21.0 21.0 22.8 21.4 18.7 18.1

This could also be accomplished with the select function. However we no longer have the option of specifying numeric positions. And selecting a single column returns a data frame rather than a vector.

mtcars %>%
    select(mpg) %>%
    head
                   mpg
Mazda RX4         21.0
Mazda RX4 Wag     21.0
Datsun 710        22.8
Hornet 4 Drive    21.4
Hornet Sportabout 18.7
Valiant           18.1

12.5.2 Multiple columns

The select function is the tidy workhorse here. It primarily indexes columns by name or by logical condition. It does not index with numerical positions. However, it introduces a variation on that concept in the form of named positions. And like the dollar sign extractor, it uses unquoted variable names - look again at the previous two examples!

12.5.3 Arbitrary vectors

Each variable name is listed as a separate argument.

cars %>%
    select(mpg, disp)  
                     mpg  disp
Mazda RX4           21.0 160.0
Mazda RX4 Wag       21.0 160.0
Datsun 710          22.8 108.0
Hornet 4 Drive      21.4 258.0
Hornet Sportabout   18.7 360.0
Valiant             18.1 225.0
Duster 360          14.3 360.0
Merc 240D           24.4 146.7
Merc 230            22.8 140.8
Merc 280            19.2 167.6
Merc 280C           17.8 167.6
Merc 450SE          16.4 275.8
Merc 450SL          17.3 275.8
Merc 450SLC         15.2 275.8
Cadillac Fleetwood  10.4 472.0
Lincoln Continental 10.4 460.0
Chrysler Imperial   14.7 440.0
Fiat 128            32.4  78.7
Honda Civic         30.4  75.7
Toyota Corolla      33.9  71.1
Toyota Corona       21.5 120.1
Dodge Challenger    15.5 318.0
AMC Javelin         15.2 304.0
Camaro Z28          13.3 350.0
Pontiac Firebird    19.2 400.0
Fiat X1-9           27.3  79.0
Porsche 914-2       26.0 120.3
Lotus Europa        30.4  95.1
Ford Pantera L      15.8 351.0
Ferrari Dino        19.7 145.0
Maserati Bora       15.0 301.0
Volvo 142E          21.4 121.0

12.5.4 Name prefixes and other regular expressions

The “helper” functions, such as starts_with and matches make this very flexible

cars %>%
    select(starts_with("m")) %>% # literal
    head
                   mpg    make      model
Mazda RX4         21.0   Mazda        RX4
Mazda RX4 Wag     21.0   Mazda    RX4 Wag
Datsun 710        22.8  Datsun        710
Hornet 4 Drive    21.4  Hornet    4 Drive
Hornet Sportabout 18.7  Hornet Sportabout
Valiant           18.1 Valiant       <NA>
cars %>%
    select(matches("^[mc]")) %>% # regular expression
    head
                   mpg cyl carb    make      model
Mazda RX4         21.0   6    4   Mazda        RX4
Mazda RX4 Wag     21.0   6    4   Mazda    RX4 Wag
Datsun 710        22.8   4    1  Datsun        710
Hornet 4 Drive    21.4   6    1  Hornet    4 Drive
Hornet Sportabout 18.7   8    2  Hornet Sportabout
Valiant           18.1   6    1 Valiant       <NA>

12.5.5 name range

The idea here is that we want all of the variables that are positioned together in the data frame, from mpg to disp inclusive.

cars %>%
    select(mpg:disp) %>%
    head
                   mpg cyl disp
Mazda RX4         21.0   6  160
Mazda RX4 Wag     21.0   6  160
Datsun 710        22.8   4  108
Hornet 4 Drive    21.4   6  258
Hornet Sportabout 18.7   8  360
Valiant           18.1   6  225

12.5.6 Position range

Less commonly you may extract columns from a data frame by position. This is easy to use when there are not too many columns in the original data frame, harder to use where there are a lot of columns.

cars %>%
    select(1:5) %>%  # dplyr
    head
                   mpg cyl disp  hp drat
Mazda RX4         21.0   6  160 110 3.90
Mazda RX4 Wag     21.0   6  160 110 3.90
Datsun 710        22.8   4  108  93 3.85
Hornet 4 Drive    21.4   6  258 110 3.08
Hornet Sportabout 18.7   8  360 175 3.15
Valiant           18.1   6  225 105 2.76

12.5.7 column type