4.4 Dropping unneeded variables

SSCC - Social Science Computing Cooperative

Supporting Statistical Analysis for Research

These exercises use the PSID.csv data set that was imported in the prior section.

Import the PSID.csv data set.

from pathlib import Path
import pandas as pd

psid_path = Path('..') / 'datasets' / 'PSID.csv'
psid_in = pd.read_csv(psid_path)
psid_in = (
    psid_in
        .rename(columns={
            'Unnamed: 0': 'obs_num',
            'intnum': 'intvw_num',
            'persnum': 'person_id',
            'married': 'marital_status'}))
psid = psid_in.copy(deep=True)

print(psid.dtypes)

obs_num             int64
intvw_num           int64
person_id           int64
age                 int64
educatn           float64
earnings            int64
hours               int64
kids                int64
marital_status     object
dtype: object

Drop the first variable in the data frame. This is the column that was read in as Unnamed: 0; the import code above renamed it to obs_num.
```
psid = psid.drop(columns='obs_num')
```
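
As a minimal sketch of how drop behaves, the example below uses a small made-up data frame (the name toy and its values are illustrative, not PSID data). It shows that drop returns a new data frame rather than changing the original, and that passing errors='ignore' lets the same line be re-run after the column is already gone.

```python
import pandas as pd

# Toy data frame with made-up values; only the column names mirror the PSID example.
toy = pd.DataFrame({
    'obs_num': [1, 2, 3],
    'age': [30, 41, 52],
    'kids': [0, 2, 1]})

# drop() returns a new data frame; toy itself is unchanged.
dropped = toy.drop(columns='obs_num')

# errors='ignore' skips columns that are not present,
# so re-running the drop does not raise a KeyError.
dropped_again = dropped.drop(columns='obs_num', errors='ignore')

print(list(toy.columns))
print(list(dropped.columns))
print(list(dropped_again.columns))
```

Assigning the result back to the same name, as in the solution above, is what makes the drop take effect for later steps.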

Make the age variable the first variable in the data frame.

psid = psid.loc[:, [
    'age', 'intvw_num', 'person_id', 'educatn',
    'earnings', 'hours', 'kids', 'marital_status']]
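
Listing every column works well for a data frame this small. For wider data frames, one possible alternative is to build the column order programmatically. The sketch below assumes a made-up data frame (toy2 and its values are illustrative only) and moves age to the front while keeping the remaining columns in their existing order.

```python
import pandas as pd

# Toy data frame with made-up values; column names borrowed from the PSID example.
toy2 = pd.DataFrame({
    'intvw_num': [4, 4, 5],
    'person_id': [1, 2, 1],
    'age': [30, 41, 52],
    'kids': [0, 2, 1]})

# Columns to move to the front, followed by all others in their current order.
first = ['age']
rest = [col for col in toy2.columns if col not in first]

# Select the columns in the new order, as in the .loc solution above.
reordered = toy2.loc[:, first + rest]

print(list(reordered.columns))
```

This avoids retyping every name, at the cost of making the final column order less visible in the code.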