SSCC - Social Science Computing Cooperative Supporting Statistical Analysis for Research

4.2 Naming variables

These exercises use the PSID.csv data set that was imported in the prior section.

  1. Import the PSID.csv data set.

    Run the following at the RStudio console prompt to start a Python session (REPL) through reticulate.

    library(reticulate)
    repl_python()

    The remaining code is Python.

    from pathlib import Path
    import pandas as pd
    psid_path = Path('..') / 'datasets' / 'PSID.csv'
    psid = pd.read_csv(psid_path)
    
    print(psid.dtypes)
    Unnamed: 0      int64
    intnum          int64
    persnum         int64
    age             int64
    educatn       float64
    earnings        int64
    hours           int64
    kids            int64
    married        object
    dtype: object
  2. Give the variables useful names if they do not already have them. Change at least one name.

    psid = psid.rename(
        columns={
            'Unnamed: 0': 'obs_num',
            'intnum': 'intvw_num',
            'persnum': 'person_id',
            'married': 'marital_status'})
    
    print(psid.dtypes)
    obs_num             int64
    intvw_num           int64
    person_id           int64
    age                 int64
    educatn           float64
    earnings            int64
    hours               int64
    kids                int64
    marital_status     object
    dtype: object
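By default, pandas' rename() silently ignores any old name in the mapping that does not match a column, so a typo such as 'marital' for 'married' leaves the column unchanged without any warning. The sketch below is one way to guard against that, assuming pandas 0.25 or later: it passes errors='raise' so that an unmatched old name raises a KeyError. Because it should run without PSID.csv on hand, it builds a made-up two-row stand-in with the same column layout (the values are hypothetical, not PSID data). The blank first header field reproduces the 'Unnamed: 0' column seen above.

```python
import io

import pandas as pd

# Hypothetical stand-in for PSID.csv: same columns, made-up values.
# The empty first header field is read by pandas as 'Unnamed: 0'.
csv_text = """,intnum,persnum,age,educatn,earnings,hours,kids,married
1,4,4,39,12.0,77250,2940,2,married
2,4,6,35,12.0,12000,2040,2,married
"""
psid = pd.read_csv(io.StringIO(csv_text))

# Old name -> new name, matching the renames in step 2.
new_names = {
    'Unnamed: 0': 'obs_num',
    'intnum': 'intvw_num',
    'persnum': 'person_id',
    'married': 'marital_status',
}

# errors='raise' makes rename() raise KeyError if any old name is missing.
psid = psid.rename(columns=new_names, errors='raise')
print(list(psid.columns))
# ['obs_num', 'intvw_num', 'person_id', 'age', 'educatn',
#  'earnings', 'hours', 'kids', 'marital_status']

# A misspelled old name now fails loudly instead of being ignored.
typo_caught = False
try:
    psid.rename(columns={'marital': 'status'}, errors='raise')
except KeyError as err:
    typo_caught = True
    print('rename failed:', err)
```

Leaving errors at its default is convenient when the same mapping is applied to several files that do not all share every column. For a single data set like this one, raising on a mismatch is the safer choice.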