SSCC - Social Science Computing Cooperative Supporting Statistical Analysis for Research

5.3 Numeric variables

These exercises use the mtcars.csv data set.

  1. Import the mtcars.csv data set.

    from pathlib import Path
    import pandas as pd
    import numpy as np
    mtcars_path = Path('..') / 'datasets' / 'mtcars.csv'
    mtcars_in = pd.read_csv(mtcars_path)
    mtcars_in = mtcars_in.rename(columns={'Unnamed: 0': 'make_model'})
    mtcars = mtcars_in.copy(deep=True)
    
    print(mtcars.dtypes)
    make_model     object
    mpg           float64
    cyl             int64
    disp          float64
    hp              int64
    drat          float64
    wt            float64
    qsec          float64
    vs              int64
    am              int64
    gear            int64
    carb            int64
    dtype: object
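
The rename step is needed because the first column of the file has no header, and pandas names such a column 'Unnamed: 0'. The sketch below reproduces this with an in-memory CSV, so it runs without the data file. The name csv_text and the two rows in it are illustrative stand-ins, not the full data set.

```python
import io

import pandas as pd

# Illustrative CSV text: the header row starts with an empty field,
# so the first column has no name.
csv_text = ',mpg,wt\nMazda RX4,21.0,2.62\nDatsun 710,22.8,2.32\n'

demo = pd.read_csv(io.StringIO(csv_text))
unnamed_columns = demo.columns.tolist()

# Give the unnamed column a descriptive name.
demo = demo.rename(columns={'Unnamed: 0': 'make_model'})
renamed_columns = demo.columns.tolist()

print(unnamed_columns)
print(renamed_columns)
```
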
  2. The wt variable is measured in thousands of pounds. Change this variable to a character variable that has a comma separating the thousands digit from the hundreds digit, e.g. 2.14 becomes 2,140.

    Hint: for one of the possible solutions, you may find it useful to look for a string function or method that pads. Padding adds characters to a string until it reaches a fixed width.

    mtcars = (
        mtcars
            .assign(
                wt_1000s=lambda df: df['wt'],
                wt=lambda df: df['wt'].round(decimals=3).astype(str),
                wt_d4=lambda df: df['wt'].str.extract('(\\d+)\\.', expand=False),
                wt_d123=lambda df: df['wt'].str.extract('\\.(\\d+)', expand=False))
            .assign(
                wt_d123=lambda df:
                    df['wt_d123'].str.pad(3, side='right', fillchar='0'),
                wt=lambda df: df['wt_d4'] + ',' + df['wt_d123'])
            .drop(columns=['wt_d4', 'wt_d123']))
    
    (mtcars
        .loc[:, ['make_model', 'wt', 'wt_1000s']]
        .head(5)
        .pipe(print))
              make_model     wt  wt_1000s
    0          Mazda RX4  2,620     2.620
    1      Mazda RX4 Wag  2,875     2.875
    2         Datsun 710  2,320     2.320
    3     Hornet 4 Drive  3,215     3.215
    4  Hornet Sportabout  3,440     3.440
    print(mtcars.dtypes)
    make_model     object
    mpg           float64
    cyl             int64
    disp          float64
    hp              int64
    drat          float64
    wt             object
    qsec          float64
    vs              int64
    am              int64
    gear            int64
    carb            int64
    wt_1000s      float64
    dtype: object
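
The padding step in this solution matters because converting a rounded number to a string drops trailing zeros, so 2.62 yields the decimal part '62' rather than '620'. The following minimal sketch shows the padding on its own. The Series decimals and its values are made up for illustration.

```python
import pandas as pd

# Made-up decimal parts as they look after str(): trailing zeros are gone.
decimals = pd.Series(['62', '875', '2', '215'])

# Pad on the right with zeros to a fixed width of three characters.
padded = decimals.str.pad(3, side='right', fillchar='0')

# str.ljust() is a shorthand for the same right-side padding.
padded_ljust = decimals.str.ljust(3, fillchar='0')

print(padded.tolist())
```

Both calls produce the same result, so which one to use is a matter of taste.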

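    This is another possible solution.

    This solution makes use of the str.replace() method. When regex=True, the pat parameter is treated as a regular expression; when regex=False, it is treated as a literal string. Older versions of pandas defaulted to regex=True, but since pandas 2.0 the default is regex=False, so the pattern is only interpreted as a regular expression if regex=True is passed explicitly. In the repl parameter, \1 refers to the first capture group. When repl is written as an ordinary (non-raw) Python string, the backslash needs to be escaped, as in ',\\1'.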
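
The effect of the regex parameter can be checked on a few strings before applying it to the data. This is a minimal sketch; the Series weights and its values are illustrative.

```python
import pandas as pd

# Illustrative weights in pounds, already converted to strings.
weights = pd.Series(['2620', '2875', '10320'])

# regex=True: the pattern matches the last three characters and
# \1 puts them back after the inserted comma.
with_regex = weights.str.replace(r'(...$)', r',\1', regex=True)

# regex=False: the pattern is searched for literally, character by
# character, so nothing matches and the strings are left unchanged.
literal = weights.str.replace('(...$)', ',\\1', regex=False)

print(with_regex.tolist())
print(literal.tolist())
```

Raw strings (the r prefix) are used in the first call so that the backslash does not need to be doubled.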

    mtcars = mtcars_in.copy(deep=True)
    
    mtcars = (
        mtcars
            .assign(
                wt_1000s=lambda df: df['wt'],
                wt=lambda df: df['wt'] * 1000)
            .assign(
                wt=lambda df: (df['wt']
                    .round(0)
                    .astype(int)
                    .astype(str)
                    # insert a comma before the last three digits
                    .str.replace('(...$)', ',\\1', regex=True))))
    
    (mtcars
        .loc[:, ['make_model', 'wt', 'wt_1000s']]
        .head(5)
        .pipe(print))
              make_model     wt  wt_1000s
    0          Mazda RX4  2,620     2.620
    1      Mazda RX4 Wag  2,875     2.875
    2         Datsun 710  2,320     2.320
    3     Hornet 4 Drive  3,215     3.215
    4  Hornet Sportabout  3,440     3.440
    print(mtcars.dtypes)
    make_model     object
    mpg           float64
    cyl             int64
    disp          float64
    hp              int64
    drat          float64
    wt             object
    qsec          float64
    vs              int64
    am              int64
    gear            int64
    carb            int64
    wt_1000s      float64
    dtype: object
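
A third possibility, which is not one of the solutions above, is to let Python's format specification do the grouping: the ',' option inserts a thousands separator when an integer is formatted. The sketch below assumes the weights are first converted to whole pounds, as in the second solution. The Series wt and its values are illustrative.

```python
import pandas as pd

# Illustrative weights in thousands of pounds.
wt = pd.Series([2.62, 2.875, 5.424])

# Convert to whole pounds, then format each integer with a
# comma as the thousands separator.
wt_text = (wt * 1000).round(0).astype(int).map('{:,}'.format)

print(wt_text.tolist())
```

Unlike the regular expression, this approach also places further commas correctly for values of a million or more.
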
  3. Convert the character variable you created in the prior exercise to a new numeric variable. Make the unit of measure for this new variable thousands of pounds.

    mtcars = (
        mtcars
            .assign(
                wt=lambda df: df['wt'].str.replace(',', ''))
            .assign(
                wt=lambda df: pd.to_numeric(df['wt']) / 1000))
    
    (mtcars
        .loc[:, ['make_model', 'wt', 'wt_1000s']]
        .head(5)
        .pipe(print))
              make_model     wt  wt_1000s
    0          Mazda RX4  2.620     2.620
    1      Mazda RX4 Wag  2.875     2.875
    2         Datsun 710  2.320     2.320
    3     Hornet 4 Drive  3.215     3.215
    4  Hornet Sportabout  3.440     3.440
    print(mtcars.dtypes)
    make_model     object
    mpg           float64
    cyl             int64
    disp          float64
    hp              int64
    drat          float64
    wt            float64
    qsec          float64
    vs              int64
    am              int64
    gear            int64
    carb            int64
    wt_1000s      float64
    dtype: object
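
The round trip can also be checked on a small example that does not depend on the data file. This is a minimal sketch; the Series names and values are illustrative, and the comparison uses numpy.allclose rather than exact equality as a precaution when comparing floating point values.

```python
import numpy as np
import pandas as pd

# Illustrative original weights in thousands of pounds and their
# comma-formatted text versions in pounds.
wt_original = pd.Series([2.62, 2.875, 3.215])
wt_text = pd.Series(['2,620', '2,875', '3,215'])

# Remove the comma, parse the digits, and rescale to thousands of pounds.
wt_numeric = pd.to_numeric(wt_text.str.replace(',', '', regex=False)) / 1000

# Confirm the converted values match the originals.
round_trip_ok = np.allclose(wt_numeric, wt_original)

print(wt_numeric.dtype)
print(round_trip_ok)
```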