Supporting Statistical Analysis for Research
4.2 Naming variables
These exercises use the PSID.csv data set that was imported in the prior section.
Import the PSID.csv data set.

The following is used at the RStudio prompt to enter Python mode.
library(reticulate)
repl_python()
The remainder is Python code.
from pathlib import Path
import pandas as pd

psid_path = Path('..') / 'datasets' / 'PSID.csv'
psid = pd.read_csv(psid_path)
print(psid.dtypes)
Unnamed: 0      int64
intnum          int64
persnum         int64
age             int64
educatn       float64
earnings        int64
hours           int64
kids            int64
married        object
dtype: object
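The Unnamed: 0 column name in this output is one that pandas generates when a header cell in the CSV file is empty. The sketch below reproduces that behaviour on a small in-memory CSV. It assumes, without confirming it, that PSID.csv has a blank first header cell; the csv_text string and its values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical two-row CSV whose first header cell is empty (values are made up).
csv_text = ",intnum,persnum\n1,4,4\n2,4,6\n"

# pandas fills in a placeholder name for the column with the blank header.
demo = pd.read_csv(io.StringIO(csv_text))
print(list(demo.columns))
```

A placeholder name like this carries no meaning, which is one reason to rename it.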
Set the variable names to something useful, if they are not already. Change at least one name.
psid = psid.rename(
    columns={
        'Unnamed: 0': 'obs_num',
        'intnum': 'intvw_num',
        'persnum': 'person_id',
        'married': 'marital_status'})
print(psid.dtypes)
obs_num             int64
intvw_num           int64
person_id           int64
age                 int64
educatn           float64
earnings            int64
hours               int64
kids                int64
marital_status     object
dtype: object
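One thing to watch with rename is that, by default, a key in the mapping that does not match any column is silently skipped. The following is a minimal sketch of how the errors='raise' argument can guard against a misspelled old name. The toy data frame and its values are hypothetical stand-ins for PSID.csv, and the misspelling 'intnumm' is deliberate.

```python
import pandas as pd

# Hypothetical stand-in for a few PSID columns; the values are made up.
toy = pd.DataFrame({
    'Unnamed: 0': [1, 2],
    'intnum': [4, 4],
    'persnum': [4, 6],
    'married': ['married', 'divorced'],
})

name_map = {
    'Unnamed: 0': 'obs_num',
    'intnum': 'intvw_num',
    'persnum': 'person_id',
    'married': 'marital_status',
}

# With errors='raise', rename fails if any key in name_map is not a column.
renamed = toy.rename(columns=name_map, errors='raise')
print(list(renamed.columns))

# By default a misspelled old name is ignored and nothing changes.
unchanged = toy.rename(columns={'intnumm': 'intvw_num'})
print(list(unchanged.columns))

# The same misspelling raises KeyError when errors='raise' is set.
try:
    toy.rename(columns={'intnumm': 'intvw_num'}, errors='raise')
    typo_caught = False
except KeyError:
    typo_caught = True
print(typo_caught)
```

Printing the column names or dtypes after renaming, as the solution above does, is a simpler check that serves the same purpose.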