5.6 Related observations
These exercises use the mtcars.csv
data set.
Import the
mtcars.csv
data set.from pathlib import Path import pandas as pd import numpy as np
mtcars_path = Path('..') / 'datasets' / 'mtcars.csv' mtcars_in = pd.read_csv(mtcars_path) mtcars_in = mtcars_in.rename(columns={'Unnamed: 0': 'make_model'}) mtcars = mtcars_in.copy(deep=True) print(mtcars.dtypes)
make_model object mpg float64 cyl int64 disp float64 hp int64 drat float64 wt float64 qsec float64 vs int64 am int64 gear int64 carb int64 dtype: object
Find the most efficient car (mpg) for each number of cylinders.
best_mpg_cyl = ( mtcars .assign(mpg_rank = lambda df: df .groupby('cyl') ['mpg'] .rank(method='min', ascending=False))) (best_mpg_cyl .query('mpg_rank == 1') .loc[:, ['make_model', 'cyl', 'mpg', 'disp']] .head() .pipe(print))
make_model cyl mpg disp 3 Hornet 4 Drive 6 21.4 258.0 19 Toyota Corolla 4 33.9 71.1 24 Pontiac Firebird 8 19.2 400.0
The weight of a car is a major contributor to how efficient it is. Create a variable that measures mpg per unit of weight. Plot this new variable against
hp
and thendisp
, These plots should consider the relationship with the number of cylinders. From these plots, doeshp
ordisp
seem to be more related to the new variable when considering the number of cylinders?mtcars = ( mtcars .assign(mpg_per_wt = lambda df: df['mpg'] / df['wt'])) (mtcars .loc[:, ['make_model', 'wt', 'mpg', 'mpg_per_wt']] .head() .pipe(print))
make_model wt mpg mpg_per_wt 0 Mazda RX4 2.620 21.0 8.015267 1 Mazda RX4 Wag 2.875 21.0 7.304348 2 Datsun 710 2.320 22.8 9.827586 3 Hornet 4 Drive 3.215 21.4 6.656299 4 Hornet Sportabout 3.440 18.7 5.436047
print( p9.ggplot(mtcars, p9.aes(x='disp', y='mpg_per_wt')) + p9.geom_point() + p9.facet_wrap('~cyl') + p9.theme_bw())
<ggplot: (-9223371893261961419)>
print( p9.ggplot(mtcars, p9.aes(x='hp', y='mpg_per_wt')) + p9.geom_point() + p9.facet_wrap('~cyl') + p9.theme_bw())
<ggplot: (-9223371893262604358)>
Both
hp
anddisp
seem to be related tompg_per_wt
. Thedisp
variable seems to have a stronger relationship withmpg_per_wt
.Find the least efficient car (using the new variable that considers both mpg and weight) for each number of cylinders and gear combination. Exclude any combination that does not have at least two observations.
eff_cyl_gear = ( mtcars .assign( num_group_obs = lambda df: df .groupby(['cyl', 'gear']) ['mpg_per_wt'] .transform('size'), efficiency_rank = lambda df: df .groupby(['cyl', 'gear']) ['mpg_per_wt'] .rank(method='min', ascending=True)) .query('num_group_obs >= 2 & efficiency_rank == 1')) (eff_cyl_gear .loc[:, ['make_model', 'cyl', 'gear', 'mpg_per_wt', 'mpg', 'disp']] .sort_values(['cyl', 'gear', 'mpg_per_wt']) .head(5) .pipe(print))
make_model cyl gear mpg_per_wt mpg disp 8 Merc 230 4 4 7.238095 22.8 140.8 26 Porsche 914-2 4 5 12.149533 26.0 120.3 5 Valiant 6 3 5.231214 18.1 225.0 10 Merc 280C 6 4 5.174419 17.8 167.6 15 Lincoln Continental 8 3 1.917404 10.4 460.0