5.3 Numeric variables

These exercises use the mtcars.csv data set.

  1. Import the mtcars.csv data set.

    from pathlib import Path
    import pandas as pd
    import numpy as np
    mtcars_path = Path('..') / 'datasets' / 'mtcars.csv'
    mtcars_in = pd.read_csv(mtcars_path)
    mtcars_in = mtcars_in.rename(columns={'Unnamed: 0': 'make_model'})
    mtcars = mtcars_in.copy(deep=True)

    print(mtcars.dtypes)

    make_model     object
    mpg           float64
    cyl             int64
    disp          float64
    hp              int64
    drat          float64
    wt            float64
    qsec          float64
    vs              int64
    am              int64
    gear            int64
    carb            int64
    dtype: object
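
    The rename of 'Unnamed: 0' in the import code above may look arbitrary. The sketch below shows where that name comes from, under the assumption that mtcars.csv stores the car names in a first column whose header cell is blank. The two-row csv_text is a hypothetical stand-in for the real file, not its actual contents.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for mtcars.csv: the first header cell is empty.
csv_text = ',wt\nMazda RX4,2.62\nDatsun 710,2.32\n'

sample = pd.read_csv(io.StringIO(csv_text))

# pandas fills in a placeholder name for the blank header cell
original_columns = list(sample.columns)

# give the column a descriptive name, as in the import step
sample = sample.rename(columns={'Unnamed: 0': 'make_model'})
renamed_columns = list(sample.columns)

print(original_columns)
print(renamed_columns)
```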
  2. The wt variable is measured in thousands of pounds. Change this variable to a character variable that has a comma separating the thousands digit from the hundreds digit, e.g. 2.14 becomes 2,140.

    Hint: for one possible solution, you may find it useful to look for a string function or method that pads. Padding adds characters to a string until it reaches a fixed width.

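    mtcars = (
        mtcars
            # keep the numeric weight, then convert wt to a string
            .assign(
                wt_1000s=lambda df: df['wt'],
                wt=lambda df: df['wt'].round(decimals=3).astype(str))
            # split into the digits before and after the decimal point
            .assign(
                wt_d4=lambda df: df['wt'].str.extract('(\\d+)\\.', expand=False),
                wt_d123=lambda df: df['wt'].str.extract('\\.(\\d+)', expand=False))
            # right-pad the decimal digits to width 3, then join with a comma
            .assign(
                wt_d123=lambda df:
                    df['wt_d123'].str.pad(3, side='right', fillchar='0'),
                wt=lambda df: df['wt_d4'] + ',' + df['wt_d123'])
            .drop(columns=['wt_d4', 'wt_d123']))

    print(mtcars.loc[:, ['make_model', 'wt', 'wt_1000s']].head())
    print(mtcars.dtypes)

              make_model     wt  wt_1000s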
    0          Mazda RX4  2,620     2.620
    1      Mazda RX4 Wag  2,875     2.875
    2         Datsun 710  2,320     2.320
    3     Hornet 4 Drive  3,215     3.215
    4  Hornet Sportabout  3,440     3.440
    make_model     object
    mpg           float64
    cyl             int64
    disp          float64
    hp              int64
    drat          float64
    wt             object
    qsec          float64
    vs              int64
    am              int64
    gear            int64
    carb            int64
    wt_1000s      float64
    dtype: object
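
    The padding step matters because converting a float to a string drops trailing zeros: 2.62 becomes '2.62', so the digits after the decimal point are '62' rather than '620'. A minimal sketch of the padding on its own, using a small illustrative Series (the name decimals is hypothetical), might look like this:

```python
import pandas as pd

# Illustrative decimal-digit strings, as left over from '2.62', '2.875', '3.0'
decimals = pd.Series(['62', '875', '0'])

# side='right' appends the fill character until each string is 3 wide
padded = decimals.str.pad(3, side='right', fillchar='0')

# str.ljust is another way to spell right-side padding
padded_ljust = decimals.str.ljust(3, fillchar='0')

print(padded.tolist())
```

    Strings that are already three characters wide are left unchanged, so the padding is safe to apply to the whole column.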

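    This is another possible solution.

    This solution uses the str.replace() method. When regex=True, the pat parameter is interpreted as a regular expression; when regex=False, it is matched as a literal string. The default was regex=True before pandas 2.0 and is regex=False from 2.0 onward, so passing regex explicitly keeps the behavior the same across versions. The repl parameter is an ordinary string, so the backslash in a back-reference such as \1 must be escaped as '\\1'.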

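    mtcars = mtcars_in.copy(deep=True)
    mtcars = (
        mtcars
            .assign(
                wt_1000s=lambda df: df['wt'],
                wt=lambda df: df['wt'] * 1000)
            .assign(
                # whole pounds as a string, then a comma before the last three digits
                wt=lambda df: (df['wt']
                    .round(decimals=0).astype(int).astype(str)
                    .str.replace('(...$)', ',\\1', regex=True))))

    print(mtcars.loc[:, ['make_model', 'wt', 'wt_1000s']].head())
    print(mtcars.dtypes)

              make_model     wt  wt_1000s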
    0          Mazda RX4  2,620     2.620
    1      Mazda RX4 Wag  2,875     2.875
    2         Datsun 710  2,320     2.320
    3     Hornet 4 Drive  3,215     3.215
    4  Hornet Sportabout  3,440     3.440
    make_model     object
    mpg           float64
    cyl             int64
    disp          float64
    hp              int64
    drat          float64
    wt             object
    qsec          float64
    vs              int64
    am              int64
    gear            int64
    carb            int64
    wt_1000s      float64
    dtype: object
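
    The effect of the regex parameter on the pattern used in this solution can be checked on a small Series. This is a sketch only; the Series name pounds and its values are illustrative.

```python
import pandas as pd

# Illustrative weights in whole pounds, already converted to strings
pounds = pd.Series(['2620', '2875', '3215'])

# regex=True: '(...$)' captures the last three characters,
# and '\\1' in repl puts them back after the inserted comma
with_comma = pounds.str.replace('(...$)', ',\\1', regex=True)

# regex=False: the pattern is searched for as literal text,
# which does not occur in these strings, so nothing changes
unchanged = pounds.str.replace('(...$)', ',\\1', regex=False)

print(with_comma.tolist())
print(unchanged.tolist())
```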
  3. Convert the character variable you created in the prior exercise to a new numeric variable. Make the units of measure for this new variable thousands of pounds.

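    mtcars = (
        mtcars
            # drop the comma, then convert to thousands of pounds
            .assign(wt=lambda df: df['wt'].str.replace(',', ''))
            .assign(wt=lambda df: pd.to_numeric(df['wt']) / 1000))

    print(mtcars.loc[:, ['make_model', 'wt', 'wt_1000s']].head())
    print(mtcars.dtypes)

              make_model     wt  wt_1000s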
    0          Mazda RX4  2.620     2.620
    1      Mazda RX4 Wag  2.875     2.875
    2         Datsun 710  2.320     2.320
    3     Hornet 4 Drive  3.215     3.215
    4  Hornet Sportabout  3.440     3.440
    make_model     object
    mpg           float64
    cyl             int64
    disp          float64
    hp              int64
    drat          float64
    wt            float64
    qsec          float64
    vs              int64
    am              int64
    gear            int64
    carb            int64
    wt_1000s      float64
    dtype: object
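
    One way to confirm that the conversion round-trips is to compare the result with the original numeric values. The sketch below does this for the first three weights shown in the output above; the variable names are hypothetical, and np.isclose is used instead of == because the division may introduce small floating-point differences.

```python
import numpy as np
import pandas as pd

# The first three comma-formatted weights from the output above
wt_text = pd.Series(['2,620', '2,875', '2,320'])

# remove the thousands separator, convert to a number, rescale to 1000s of pounds
wt_number = pd.to_numeric(wt_text.str.replace(',', '', regex=False)) / 1000

# the values kept in wt_1000s for the same rows
expected = pd.Series([2.620, 2.875, 2.320])

# compare with a tolerance rather than exact equality
matches = bool(np.isclose(wt_number, expected).all())
print(matches)
```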