5.3 Numeric variables
These exercises use the mtcars.csv data set.
Import the mtcars.csv data set.

```python
from pathlib import Path

import pandas as pd
import numpy as np

mtcars_path = Path('..') / 'datasets' / 'mtcars.csv'
mtcars_in = pd.read_csv(mtcars_path)
mtcars_in = mtcars_in.rename(columns={'Unnamed: 0': 'make_model'})
mtcars = mtcars_in.copy(deep=True)
print(mtcars.dtypes)
```

```
make_model     object
mpg           float64
cyl             int64
disp          float64
hp              int64
drat          float64
wt            float64
qsec          float64
vs              int64
am              int64
gear            int64
carb            int64
dtype: object
```
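The rename step is needed because the first column of mtcars.csv holds the car names under a blank header, and read_csv() labels a column with a blank header 'Unnamed: 0'. The sketch below reproduces this with a hypothetical two-row, in-memory excerpt. The file layout is an assumption inferred from the rename, not something shown in the output above.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt laid out like mtcars.csv (assumed layout):
# the first header field is empty because that column holds the car names
csv_text = ',mpg,cyl\nMazda RX4,21.0,6\nDatsun 710,22.8,4\n'

cars = pd.read_csv(io.StringIO(csv_text))
print(cars.columns.tolist())  # ['Unnamed: 0', 'mpg', 'cyl']

# Give the car-name column a meaningful label
cars = cars.rename(columns={'Unnamed: 0': 'make_model'})
print(cars.columns.tolist())  # ['make_model', 'mpg', 'cyl']
```

Passing index_col=0 to read_csv() would make that column the row index instead. The rename keeps the names as an ordinary column, so they can be selected with .loc like any other variable.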
The wt variable is measured in thousands of pounds. Change this variable to a character variable that has a comma separating the thousands digit from the hundreds digit. For example, 2.14 becomes 2,140.

Hint: for one of the possible solutions, you may find it useful to look for a string function or method that pads. Padding adds characters to a string until it reaches a fixed width.
```python
mtcars = (
    mtcars
    .assign(
        # Keep the original numeric weight for comparison
        wt_1000s=lambda df: df['wt'],
        wt=lambda df: df['wt'].round(decimals=3).astype(str),
        # Digits before the decimal point (thousands of pounds)
        wt_d4=lambda df: df['wt'].str.extract(r'(\d+)\.', expand=False),
        # Digits after the decimal point (hundreds, tens, ones)
        wt_d123=lambda df: df['wt'].str.extract(r'\.(\d+)', expand=False))
    .assign(
        # Right-pad with zeros so that, e.g., '62' becomes '620'
        wt_d123=lambda df: df['wt_d123'].str.pad(3, side='right', fillchar='0'),
        wt=lambda df: df['wt_d4'] + ',' + df['wt_d123'])
    .drop(columns=['wt_d4', 'wt_d123']))
(mtcars
    .loc[:, ['make_model', 'wt', 'wt_1000s']]
    .head(5)
    .pipe(print))
```
```
          make_model     wt  wt_1000s
0          Mazda RX4  2,620     2.620
1      Mazda RX4 Wag  2,875     2.875
2         Datsun 710  2,320     2.320
3     Hornet 4 Drive  3,215     3.215
4  Hornet Sportabout  3,440     3.440
```

```python
print(mtcars.dtypes)
```

```
make_model     object
mpg           float64
cyl             int64
disp          float64
hp              int64
drat          float64
wt             object
qsec          float64
vs              int64
am              int64
gear            int64
carb            int64
wt_1000s      float64
dtype: object
```
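The padding step in the first solution matters because a float's string form drops trailing zeros: 2.62 becomes '2.62', so the digits after the decimal point are '62' rather than '620'. This minimal sketch isolates that step on a small hand-built Series holding three of the weights shown above. It is an illustration only, not part of the exercise pipeline.

```python
import pandas as pd

# Three wt values from the output above (thousands of pounds)
wt = pd.Series([2.62, 2.875, 3.44])

# Converting to str drops trailing zeros: '2.62', '2.875', '3.44'
wt_str = wt.round(decimals=3).astype(str)

# Split into the whole-number part and the fractional digits
thousands = wt_str.str.extract(r'(\d+)\.', expand=False)
fraction = wt_str.str.extract(r'\.(\d+)', expand=False)

# Right-pad the fractional digits with zeros to a fixed width of 3
hundreds = fraction.str.pad(3, side='right', fillchar='0')

print(fraction.tolist())                      # ['62', '875', '44']
print((thousands + ',' + hundreds).tolist())  # ['2,620', '2,875', '3,440']
```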
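This is another possible solution. It uses the str.replace() method. The pat parameter is interpreted as a regular expression only when regex=True; with regex=False it is matched as a literal string. Older pandas versions defaulted to regex=True, but since pandas 2.0 the default is regex=False, so the code passes regex=True explicitly. The repl parameter is an ordinary Python string, so the backslash in the backreference \1 has to be escaped (',\\1') unless a raw string (r',\1') is used. The pattern (...$) captures the last three digits, which works here because every car in the data weighs at least 1,000 pounds.

```python
mtcars = mtcars_in.copy(deep=True)
mtcars = (
    mtcars
    .assign(
        wt_1000s=lambda df: df['wt'],
        wt=lambda df: df['wt'] * 1000)
    .assign(
        # Round to whole pounds, then insert a comma before the last three digits
        wt=lambda df: (df['wt']
                       .round(0)
                       .astype(int)
                       .astype(str)
                       .str.replace('(...$)', ',\\1', regex=True))))
(mtcars
    .loc[:, ['make_model', 'wt', 'wt_1000s']]
    .head(5)
    .pipe(print))
```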
```
          make_model     wt  wt_1000s
0          Mazda RX4  2,620     2.620
1      Mazda RX4 Wag  2,875     2.875
2         Datsun 710  2,320     2.320
3     Hornet 4 Drive  3,215     3.215
4  Hornet Sportabout  3,440     3.440
```

```python
print(mtcars.dtypes)
```

```
make_model     object
mpg           float64
cyl             int64
disp          float64
hp              int64
drat          float64
wt             object
qsec          float64
vs              int64
am              int64
gear            int64
carb            int64
wt_1000s      float64
dtype: object
```
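Because the default for the regex parameter changed in pandas 2.0, it is worth checking how str.replace() treats the pattern from the second solution. This minimal sketch applies it to a few hand-built strings (three of the weights above, already multiplied by 1,000), once with regex=True and once with regex=False.

```python
import pandas as pd

wt = pd.Series(['2620', '2875', '3440'])

# regex=True: '(...$)' captures the last three characters and
# the backreference \1 reinserts them after a comma
as_regex = wt.str.replace('(...$)', r',\1', regex=True)

# regex=False: the pattern is matched literally, so nothing changes
as_literal = wt.str.replace('(...$)', r',\1', regex=False)

print(as_regex.tolist())    # ['2,620', '2,875', '3,440']
print(as_literal.tolist())  # ['2620', '2875', '3440']
```

Relying on the default would silently produce the second result under pandas 2.x, so stating regex explicitly keeps the code correct across versions.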
Convert the character variable you created in the prior exercise to a new numeric variable. Make the unit of measure for this new variable thousands of pounds.
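```python
mtcars = (
    mtcars
    .assign(
        # Remove the thousands separator (matched literally)
        wt=lambda df: df['wt'].str.replace(',', '', regex=False))
    .assign(
        # Parse as numbers and rescale from pounds to thousands of pounds
        wt=lambda df: pd.to_numeric(df['wt']) / 1000))
(mtcars
    .loc[:, ['make_model', 'wt', 'wt_1000s']]
    .head(5)
    .pipe(print))
```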
```
          make_model     wt  wt_1000s
0          Mazda RX4  2.620     2.620
1      Mazda RX4 Wag  2.875     2.875
2         Datsun 710  2.320     2.320
3     Hornet 4 Drive  3.215     3.215
4  Hornet Sportabout  3.440     3.440
```

```python
print(mtcars.dtypes)
```

```
make_model     object
mpg           float64
cyl             int64
disp          float64
hp              int64
drat          float64
wt            float64
qsec          float64
vs              int64
am              int64
gear            int64
carb            int64
wt_1000s      float64
dtype: object
```
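The printed head only covers five rows. One way to confirm the round trip for every row is to compare the converted column against the saved wt_1000s column with numpy.allclose(). The sketch below uses a small stand-in DataFrame (the five rows shown above) so it runs on its own; with the real data the same check would be np.allclose(mtcars['wt'], mtcars['wt_1000s']).

```python
import numpy as np
import pandas as pd

# Stand-in for mtcars after the first exercise: wt as comma-formatted strings
df = pd.DataFrame({
    'wt': ['2,620', '2,875', '2,320', '3,215', '3,440'],
    'wt_1000s': [2.620, 2.875, 2.320, 3.215, 3.440],
})

# Strip the comma (literal match), parse, and rescale to thousands of pounds
df['wt'] = pd.to_numeric(df['wt'].str.replace(',', '', regex=False)) / 1000

# allclose tolerates tiny floating-point differences from the division
print(np.allclose(df['wt'], df['wt_1000s']))  # True
print(df['wt'].dtype)                         # float64
```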