Supporting Statistical Analysis for Research
4.2 Naming variables
These exercises use the PSID.csv data set
that was imported in the prior section.
Import the PSID.csv data set.

The following is used at the RStudio prompt to enter Python mode.
    library(reticulate)
    repl_python()

The remainder is Python code.
    from pathlib import Path
    import pandas as pd

    psid_path = Path('..') / 'datasets' / 'PSID.csv'
    psid = pd.read_csv(psid_path)
    print(psid.dtypes)

    Unnamed: 0      int64
    intnum          int64
    persnum         int64
    age             int64
    educatn       float64
    earnings        int64
    hours           int64
    kids            int64
    married        object
    dtype: object

Set the variable names to something useful, if they are not already. Change at least one name.
    psid = psid.rename(
        columns={
            'Unnamed: 0': 'obs_num',
            'intnum': 'intvw_num',
            'persnum': 'person_id',
            'married': 'marital_status'})
    print(psid.dtypes)

    obs_num             int64
    intvw_num           int64
    person_id           int64
    age                 int64
    educatn           float64
    earnings            int64
    hours               int64
    kids                int64
    marital_status     object
    dtype: object
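
One thing to watch for when renaming: by default, rename() silently ignores any key in the columns dictionary that does not match an existing column, so a misspelled old name leaves the data frame unchanged. The sketch below is not part of the exercise solution. It uses a small hypothetical data frame (toy) with a few of the PSID column names and made-up values to show how the errors='raise' argument of rename() can surface such a typo.

```python
import pandas as pd

# Hypothetical two-row frame that mimics a few PSID column names.
# The values are made up for illustration only.
toy = pd.DataFrame({
    'Unnamed: 0': [1, 2],
    'intnum': [4, 4],
    'married': ['married', 'divorced'],
})

# By default, keys that match no column are ignored without warning,
# so the misspelled 'maried' leaves the column names unchanged.
unchanged = toy.rename(columns={'maried': 'marital_status'})
print(list(unchanged.columns))

# With errors='raise', the same typo produces a KeyError instead.
try:
    toy.rename(columns={'maried': 'marital_status'}, errors='raise')
    typo_caught = False
except KeyError:
    typo_caught = True
print(typo_caught)

# Correctly spelled names are renamed as usual.
renamed = toy.rename(
    columns={'Unnamed: 0': 'obs_num', 'married': 'marital_status'},
    errors='raise')
print(list(renamed.columns))
```

Printing the dtypes after the rename, as done above, serves the same purpose by eye. Passing errors='raise' may be preferable in scripts that run without anyone inspecting the output.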