SSCC - Social Science Computing Cooperative Supporting Statistical Analysis for Research

4.4 Dropping unneeded variables

These exercises use the PSID.csv data set that was imported in the prior section.

  1. Import the PSID.csv data set.

    library(tidyverse)
    psid_path <- file.path("..", "datasets", "PSID.csv")
    # Supplying col_types = cols() lets readr guess the column types
    # without printing the column specification
    psid_in <- read_csv(psid_path, col_types = cols())
    ## Warning: Missing column names filled in: 'X1' [1]

    # The first column of the CSV has no header. readr names it X1
    # (readr < 2.0) or ...1 (readr >= 2.0), so rename it by position (1)
    # to work with either version.
    psid_in <-
      rename(
        psid_in,
        obs_num = 1,
        intvw_num = intnum,
        person_id = persnum,
        marital_status = married
        )
    
    # Work on a copy so that psid_in keeps the data as imported
    psid <- psid_in
    glimpse(psid)
    ## Observations: 4,856
    ## Variables: 9
    ## $ obs_num        <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, ...
    ## $ intvw_num      <dbl> 4, 4, 4, 4, 5, 6, 6, 7, 7, 7, 10, 10, 10, 11, 1...
    ## $ person_id      <dbl> 4, 6, 7, 173, 2, 4, 172, 4, 170, 171, 3, 171, 1...
    ## $ age            <dbl> 39, 35, 33, 39, 47, 44, 38, 38, 39, 37, 48, 47,...
    ## $ educatn        <dbl> 12, 12, 12, 10, 9, 12, 16, 9, 12, 11, 13, 12, 1...
    ## $ earnings       <dbl> 77250, 12000, 8000, 15000, 6500, 6500, 7000, 50...
    ## $ hours          <dbl> 2940, 2040, 693, 1904, 1683, 2024, 1144, 2080, ...
    ## $ kids           <dbl> 2, 2, 1, 2, 5, 2, 3, 4, 3, 5, 98, 3, 0, 0, 2, 0...
    ## $ marital_status <chr> "married", "divorced", "married", "married", "m...
  2. Drop the first variable in the data frame. If you renamed it after it was loaded, refer to it by its new name (obs_num here).

    psid <- select(psid, -obs_num)
  3. Make the age variable the first variable in the data frame.

    psid <- select(psid, age, everything())
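The three steps can be tried without the PSID file. The sketch below builds a small made-up tibble (the values are hypothetical, and the column names only mimic the raw PSID columns) and shows: rename() taking new_name = old_name pairs, or a position, while keeping every column; dropping a column by name or by position with select(); and moving a column to the front with select(..., everything()) or, assuming dplyr 1.0.0 or later, relocate().

```r
library(dplyr)

# Hypothetical stand-in for the raw PSID columns; the values are made up
toy <- tibble(
  X1 = 1:3,
  intnum = c(4, 4, 5),
  persnum = c(4, 6, 7),
  age = c(39, 35, 33),
  married = c("married", "divorced", "married")
)

# Step 1: rename() takes new_name = old_name (or a column position)
# and keeps all columns in their original order
toy <- rename(toy, obs_num = 1, intvw_num = intnum,
              person_id = persnum, marital_status = married)
names(toy)

# Step 2: dropping by name and dropping by position give the same result here
dropped_by_name <- select(toy, -obs_num)
dropped_by_pos <- select(toy, -1)
identical(dropped_by_name, dropped_by_pos)

# Several columns can be dropped at once with -c(...)
names(select(toy, -c(obs_num, person_id)))

# Step 3: everything() matches the remaining columns;
# age is not repeated because it was already selected
moved <- select(dropped_by_name, age, everything())
names(moved)

# relocate() (dplyr 1.0.0 or later) moves the named columns to the front by default
relocated <- relocate(dropped_by_name, age)
identical(names(moved), names(relocated))
```

Dropping by position is convenient when the column name is uncertain, as with the unnamed first column here, but dropping by name is safer if the column order might change.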