SSCC - Social Science Computing Cooperative Supporting Statistical Analysis for Research

4.4 Dropping unneeded variables

These exercises use the PSID.csv data set that was imported in the prior section.

  1. Import the PSID.csv data set.

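    from pathlib import Path
    import pandas as pd

    # Read the raw file. pandas names the CSV's unlabeled first column 'Unnamed: 0'.
    psid_path = Path('..') / 'datasets' / 'PSID.csv'
    psid_in = pd.read_csv(psid_path)

    # Give the columns more descriptive names.
    psid_in = (
        psid_in
            .rename(columns={
                'Unnamed: 0': 'obs_num',
                'intnum': 'intvw_num',
                'persnum': 'person_id',
                'married': 'marital_status'}))

    # Work on a deep copy so psid_in keeps the data as imported.
    psid = psid_in.copy(deep=True)

    print(psid.dtypes)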
    obs_num             int64
    intvw_num           int64
    person_id           int64
    age                 int64
    educatn           float64
    earnings            int64
    hours               int64
    kids                int64
    marital_status     object
    dtype: object

  2. Drop the first variable in the data frame. It was renamed to obs_num when the data was loaded above.

    psid = psid.drop(columns='obs_num')

  3. Make the age variable the first variable in the data frame.

    psid = psid.loc[:, [
        'age', 'intvw_num', 'person_id', 'educatn',
        'earnings', 'hours', 'kids', 'marital_status']]
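
The solutions to exercises 2 and 3 name every column explicitly. As a minimal sketch of a position-based alternative, the example below builds a small stand-in data frame (its values are invented for illustration and are not taken from PSID.csv), drops the first column through `psid.columns[0]`, and moves `age` to the front with a list comprehension. The name `other_cols` is a hypothetical helper introduced only for this example.

```python
import pandas as pd

# Tiny stand-in for the PSID data frame; the values are made up for illustration.
psid = pd.DataFrame({
    'obs_num': [1, 2, 3],
    'intvw_num': [4, 4, 5],
    'person_id': [1, 2, 1],
    'age': [39, 41, 25],
    'educatn': [12.0, 16.0, 9.0],
    'earnings': [7700, 20000, 0],
    'hours': [2000, 2040, 0],
    'kids': [2, 2, 0],
    'marital_status': ['married', 'married', 'never married'],
})

# Drop the first column by position rather than by name.
psid = psid.drop(columns=psid.columns[0])

# Move 'age' to the front, keeping the other columns in their current order.
other_cols = [col for col in psid.columns if col != 'age']
psid = psid.loc[:, ['age'] + other_cols]

print(list(psid.columns))
```

This avoids retyping every column name, but it depends on the column order at the moment it runs. In a script that may be rerun on an already modified data frame, dropping by name is likely the safer choice.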