Supporting Statistical Analysis for Research
4.3 Copying data sets
These exercises use the PSID.csv data set that was imported in the prior section.
Import the PSID.csv data set. Set the variable names to something useful, if they are not already. Change at least one name.

from pathlib import Path
import pandas as pd

psid_path = Path('..') / 'datasets' / 'PSID.csv'
psid_in = pd.read_csv(psid_path)
psid_in = (
    psid_in
        .rename(
            columns={
                'Unnamed: 0': 'obs_num',
                'intnum': 'intvw_num',
                'persnum': 'person_id',
                'married': 'marital_status'}))

print(psid_in.dtypes)
obs_num             int64
intvw_num           int64
person_id           int64
age                 int64
educatn           float64
earnings            int64
hours               int64
kids                int64
marital_status     object
dtype: object
Create a copy of the imported data frame that will be used for data cleaning.
psid = psid_in.copy(deep=True)
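To see what the deep copy buys you, here is a minimal sketch on a made-up data frame. The names `demo_in`, `demo`, and `alias` and the values are hypothetical and are not taken from PSID.csv. A change made to the copy should leave the original untouched, while a plain assignment only gives the same object a second name.

```python
import pandas as pd

# Hypothetical toy data standing in for the imported data frame.
demo_in = pd.DataFrame({'age': [30, 41], 'kids': [0, 2]})

demo = demo_in.copy(deep=True)   # independent copy of data and index
alias = demo_in                  # no copy: a second name for the same object

demo.loc[0, 'age'] = 99          # modify the copy only

print(demo_in.loc[0, 'age'])     # the original value is unchanged
print(demo.loc[0, 'age'])        # the copy holds the new value
print(alias is demo_in)          # assignment did not create a new object
```

Keeping the imported data frame unmodified this way means the cleaning steps can be rerun from the original without reading the file again.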
Save the data frame to a CSV file. Make sure to give the file a new name.
psid.to_csv(Path('..') / 'datasets' / 'PSID_copy.csv')
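One thing to be aware of when saving: by default `to_csv` also writes the row index as an unnamed first column, which pandas labels `Unnamed: 0` when the file is read back in. The sketch below illustrates this with a made-up data frame written to a temporary directory. The file names and values are hypothetical. Passing `index=False` is one way to leave the index out, if that is what you want.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical toy data; not the PSID data set.
demo = pd.DataFrame({'obs_num': [1, 2], 'age': [30, 41]})

with tempfile.TemporaryDirectory() as tmp:
    default_path = Path(tmp) / 'demo_default.csv'
    no_index_path = Path(tmp) / 'demo_no_index.csv'

    demo.to_csv(default_path)                # index written as an extra column
    demo.to_csv(no_index_path, index=False)  # index left out of the file

    # Read both files back and compare the column names.
    cols_default = list(pd.read_csv(default_path).columns)
    cols_no_index = list(pd.read_csv(no_index_path).columns)

print(cols_default)    # ['Unnamed: 0', 'obs_num', 'age']
print(cols_no_index)   # ['obs_num', 'age']
```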