Supporting Statistical Analysis for Research
4.4 Dropping unneeded variables
These exercises use the PSID.csv data set that was imported in the prior section.
Import the PSID.csv data set.
from pathlib import Path
import pandas as pd

psid_path = Path('..') / 'datasets' / 'PSID.csv'
psid_in = pd.read_csv(psid_path)
psid_in = (
    psid_in
    .rename(
        columns={
            'Unnamed: 0': 'obs_num',
            'intnum': 'intvw_num',
            'persnum': 'person_id',
            'married': 'marital_status'}))
psid = psid_in.copy(deep=True)

print(psid.dtypes)
obs_num             int64
intvw_num           int64
person_id           int64
age                 int64
educatn           float64
earnings            int64
hours               int64
kids                int64
marital_status     object
dtype: object
Drop the first variable in the data frame. You may have renamed it after it was loaded.
psid = psid.drop(columns='obs_num')
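If you are not sure what the first column ended up being called, one option is to look its name up by position instead of typing it. The sketch below uses a small made-up data frame (`toy`, not the real PSID data) to compare dropping by name with dropping by position; the values are invented purely for illustration.

```python
import pandas as pd

# Toy stand-in for the PSID data; the values are made up for illustration.
toy = pd.DataFrame({
    'obs_num': [1, 2, 3],
    'intvw_num': [4, 4, 5],
    'age': [39, 41, 28],
})

# Drop by name, as in the solution above.
by_name = toy.drop(columns='obs_num')

# Drop by position: look up whatever the first column is called.
by_position = toy.drop(columns=toy.columns[0])

# Both approaches leave the same columns, and toy itself is unchanged.
print(list(by_name.columns))
print(list(by_position.columns))
```

Because `drop()` returns a new data frame rather than modifying the original, the result has to be assigned back (as the solution does with `psid = ...`) for the change to stick.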
Make the age variable the first variable in the data frame.
psid = psid.loc[:, [
    'age',
    'intvw_num',
    'person_id',
    'educatn',
    'earnings',
    'hours',
    'kids',
    'marital_status']]
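Listing every column by name works well for a small data frame, but it gets tedious as the number of columns grows. A possible alternative, sketched here on a made-up data frame (`toy` and its values are hypothetical), is to build the new order from the existing column names so that only the column being moved is typed out.

```python
import pandas as pd

# Toy stand-in for the PSID data; the values are made up for illustration.
toy = pd.DataFrame({
    'intvw_num': [4, 4, 5],
    'person_id': [1, 2, 1],
    'age': [39, 41, 28],
    'kids': [2, 0, 1],
})

# Put the chosen column first, then every other column in its current order.
first = 'age'
new_order = [first] + [col for col in toy.columns if col != first]

reordered = toy.loc[:, new_order]
print(list(reordered.columns))
```

The explicit list used in the solution has the advantage of documenting the full column order in the code, so which approach is preferable depends on how many columns there are and how stable their names are.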