
5.4 Factors and Indicators
These exercises use the mtcars.csv data set.
Import the mtcars.csv data set.

    from pathlib import Path
    import pandas as pd
    import numpy as np

    mtcars_path = Path('..') / 'datasets' / 'mtcars.csv'
    mtcars_in = pd.read_csv(mtcars_path)
    mtcars_in = mtcars_in.rename(columns={'Unnamed: 0': 'make_model'})
    mtcars = mtcars_in.copy(deep=True)

    print(mtcars.dtypes)

    make_model     object
    mpg           float64
    cyl             int64
    disp          float64
    hp              int64
    drat          float64
    wt            float64
    qsec          float64
    vs              int64
    am              int64
    gear            int64
    carb            int64
    dtype: object
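The rename step above is needed because the first header cell of the CSV is blank, and pandas labels such a column `Unnamed: 0`. A minimal sketch of that behavior follows; it reads a two-row in-memory CSV rather than the real file, and `csv_text`, `raw`, and `renamed` are illustrative names that do not appear in the exercise.

```python
from io import StringIO

import pandas as pd

# Illustrative excerpt (an assumption, not the full file):
# the first header cell is empty, as in mtcars.csv.
csv_text = ',mpg,cyl\nMazda RX4,21.0,6\nDatsun 710,22.8,4\n'

# pandas fills in a placeholder name for the unnamed first column.
raw = pd.read_csv(StringIO(csv_text))
print(raw.columns.tolist())

# Renaming gives the column a meaningful label.
renamed = raw.rename(columns={'Unnamed: 0': 'make_model'})
print(renamed.columns.tolist())
```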
Factor the cyl, gear, and carb variables.

    mtcars = (
        mtcars
            .apply(
                func=lambda x:
                    x.astype('category')
                    if x.name in ['cyl', 'gear', 'carb']
                    else x))

    print(mtcars.dtypes)

    make_model      object
    mpg            float64
    cyl           category
    disp           float64
    hp               int64
    drat           float64
    wt             float64
    qsec           float64
    vs               int64
    am               int64
    gear          category
    carb          category
    dtype: object
or
    mtcars = mtcars_in.copy(deep=True)

    cyl_lev = pd.Series(mtcars['cyl'].unique()).sort_values()
    gear_lev = pd.Series(mtcars['gear'].unique()).sort_values()
    carb_lev = pd.Series(mtcars['carb'].unique()).sort_values()

    mtcars = (
        mtcars
            .assign(
                cyl = lambda df: pd.Categorical(df['cyl'], categories=cyl_lev),
                gear = lambda df: pd.Categorical(df['gear'], categories=gear_lev),
                carb = lambda df: pd.Categorical(df['carb'], categories=carb_lev)))

    print(mtcars.dtypes)

    make_model      object
    mpg            float64
    cyl           category
    disp           float64
    hp               int64
    drat           float64
    wt             float64
    qsec           float64
    vs               int64
    am               int64
    gear          category
    carb          category
    dtype: object
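A more compact variant of the same idea is to pass `astype` a dictionary of column names. The sketch below uses a toy data frame; the `toy` values and the `factor_cols` name are illustrative assumptions, not part of the exercise. As in the first solution, the categories are inferred from the values present and sorted.

```python
import pandas as pd

# Toy data frame standing in for mtcars (illustrative values only).
toy = pd.DataFrame({
    'mpg': [21.0, 22.8, 18.7],
    'cyl': [6, 4, 8],
    'gear': [4, 4, 3],
})

# Convert only the listed columns; all other columns keep their dtype.
factor_cols = ['cyl', 'gear']
toy = toy.astype({col: 'category' for col in factor_cols})

print(toy.dtypes)
print(toy['cyl'].cat.categories.tolist())
```

The explicit `pd.Categorical(..., categories=...)` form in the second solution is still the one to reach for when the levels need a specific order or must include values that are absent from the data.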
Create a variable that identifies the observations that are in the top 25 percent of miles per gallon. Display a few of these vehicles.
Hint: you will need to find a function that identifies the percentage points (quantiles) of a variable.
Note that the quantile function returns a Series when it is given a list of quantiles, so the single value has to be extracted from it.
    mtcars = (
        mtcars
            .assign(
                efficient = lambda df:
                    np.where(
                        df['mpg'] >= df['mpg'].quantile([0.75]).at[0.75],
                        True,
                        False)))

    (mtcars
        .loc[:, ['make_model', 'mpg', 'efficient']]
        .head()
        .pipe(print))

              make_model   mpg  efficient
    0          Mazda RX4  21.0      False
    1      Mazda RX4 Wag  21.0      False
    2         Datsun 710  22.8       True
    3     Hornet 4 Drive  21.4      False
    4  Hornet Sportabout  18.7      False
or
    mtcars = (
        mtcars
            .assign(
                efficient = lambda df:
                    np.where(
                        df['mpg'] >= df['mpg'].quantile([0.75]).iloc[0],
                        True,
                        False)))

    (mtcars
        .loc[:, ['make_model', 'mpg', 'efficient']]
        .head()
        .pipe(print))

              make_model   mpg  efficient
    0          Mazda RX4  21.0      False
    1      Mazda RX4 Wag  21.0      False
    2         Datsun 710  22.8       True
    3     Hornet 4 Drive  21.4      False
    4  Hornet Sportabout  18.7      False
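Both solutions extract one value from the Series that `quantile` returns for a list argument. A small sketch of the difference between the list and scalar forms follows. It uses only the five mpg values displayed above, so the cutoff here differs from the one computed on the full data set; `as_series`, `as_scalar`, and `efficient` are illustrative names.

```python
import pandas as pd

# The five mpg values shown in the output above (not the full data set).
mpg = pd.Series([21.0, 21.0, 22.8, 21.4, 18.7])

# A list argument returns a Series indexed by the requested quantiles.
as_series = mpg.quantile([0.75])

# A single float returns a scalar, so no .at or .iloc lookup is needed.
as_scalar = mpg.quantile(0.75)

# The comparison already yields a boolean Series.
efficient = mpg >= as_scalar

print(type(as_series).__name__, as_series.index.tolist())
print(efficient.tolist())
```

Because the comparison is already boolean, wrapping it in `np.where(..., True, False)` is optional; it is kept in the solutions above as written.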
Create a variable that bins the values of hp using the following amounts of hp as cut points: 100, 170, 240, and 300.

    mtcars = (
        mtcars
            .assign(
                power = lambda df:
                    pd.cut(df['hp'],
                           bins=[-np.inf, 100, 170, 240, 300, np.inf],
                           labels=['gocart', 'slow', 'typical', 'fast', 'beast'])))

    (mtcars
        .loc[:, ['make_model', 'mpg', 'efficient', 'power']]
        .head()
        .pipe(print))

              make_model   mpg  efficient    power
    0          Mazda RX4  21.0      False     slow
    1      Mazda RX4 Wag  21.0      False     slow
    2         Datsun 710  22.8       True   gocart
    3     Hornet 4 Drive  21.4      False     slow
    4  Hornet Sportabout  18.7      False  typical
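By default `pd.cut` builds intervals that are closed on the right, so a value that sits exactly on a cut point falls into the lower bin. The sketch below checks this with a handful of made-up hp values chosen to land on and around the cut points; the `hp` Series is illustrative and is not taken from the data set.

```python
import numpy as np
import pandas as pd

# Made-up values on and around the cut points (illustrative only).
hp = pd.Series([93, 100, 101, 170, 300, 335])

# Same bins and labels as the solution above; intervals are (low, high].
power = pd.cut(
    hp,
    bins=[-np.inf, 100, 170, 240, 300, np.inf],
    labels=['gocart', 'slow', 'typical', 'fast', 'beast'])

print(pd.DataFrame({'hp': hp, 'power': power}))
```

Here 100 is labeled gocart and 300 is labeled fast. Passing `right=False` to `pd.cut` would instead close the intervals on the left and move those boundary values into the next bin up.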