5.4 Factors and Indicators

SSCC - Social Science Computing Cooperative

These exercises use the mtcars.csv data set.

Import the mtcars.csv data set.

from pathlib import Path
import pandas as pd
import numpy as np

mtcars_path = Path('..') / 'datasets' / 'mtcars.csv'
mtcars_in = pd.read_csv(mtcars_path)
mtcars_in = mtcars_in.rename(columns={'Unnamed: 0': 'make_model'})
mtcars = mtcars_in.copy(deep=True)

print(mtcars.dtypes)

make_model     object
mpg           float64
cyl             int64
disp          float64
hp              int64
drat          float64
wt            float64
qsec          float64
vs              int64
am              int64
gear            int64
carb            int64
dtype: object

Convert the cyl, gear, and carb variables to factors (pandas categoricals).

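# apply passes each column to the lambda as a Series; x.name is the
# column name, so only the three listed columns are converted.
mtcars = (
    mtcars
        .apply(
            func=lambda x: x.astype('category')
            if x.name in ['cyl', 'gear', 'carb'] else x))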

print(mtcars.dtypes)

make_model      object
mpg            float64
cyl           category
disp           float64
hp               int64
drat           float64
wt             float64
qsec           float64
vs               int64
am               int64
gear          category
carb          category
dtype: object
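As a hedged alternative to the apply-based solution, astype also accepts a mapping from column name to dtype, which converts only the listed columns. The sketch below uses a small made-up data frame named cars (an assumption for illustration, not the real mtcars data).

```python
import pandas as pd

# Toy stand-in for mtcars; the values are illustrative only.
cars = pd.DataFrame({
    'mpg': [21.0, 22.8, 18.7],
    'cyl': [6, 4, 8],
    'gear': [4, 4, 3],
    'carb': [4, 1, 2],
})

# A dict of column name -> dtype converts just those columns;
# columns not named in the dict keep their original dtype.
cars = cars.astype({'cyl': 'category',
                    'gear': 'category',
                    'carb': 'category'})

print(cars.dtypes)
```

The mapping form keeps the list of converted columns in one place, which may be easier to read than a conditional inside a lambda.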

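Alternatively, start again from a fresh copy of the imported data and set the category levels explicitly with pd.Categorical.

mtcars = mtcars_in.copy(deep=True)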

cyl_lev = pd.Series(mtcars['cyl'].unique()).sort_values()
gear_lev = pd.Series(mtcars['gear'].unique()).sort_values()
carb_lev = pd.Series(mtcars['carb'].unique()).sort_values()
mtcars = (
    mtcars
        .assign(
            cyl = lambda df:
                pd.Categorical(df['cyl'], categories=cyl_lev),
            gear = lambda df:
                pd.Categorical(df['gear'], categories=gear_lev),
            carb = lambda df:
                pd.Categorical(df['carb'], categories=carb_lev)))

print(mtcars.dtypes)

make_model      object
mpg            float64
cyl           category
disp           float64
hp               int64
drat           float64
wt             float64
qsec           float64
vs               int64
am               int64
gear          category
carb          category
dtype: object
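The dtypes output confirms the conversion but does not show the levels themselves. The following sketch, on a made-up Series rather than the mtcars data, shows one way to inspect the levels and their integer codes through the .cat accessor. It also shows that a value missing from the supplied categories becomes NaN, which is one reason to build the level list from the data.

```python
import pandas as pd

# Illustrative cylinder counts, not the real mtcars column.
cyl = pd.Series([6, 4, 8, 6, 4])

# Build the levels from the data, in sorted order.
levels = sorted(cyl.unique())
cyl_cat = pd.Series(pd.Categorical(cyl, categories=levels))

print(cyl_cat.cat.categories)      # the levels, in the order given
print(cyl_cat.cat.codes.tolist())  # position of each value among the levels

# A value that is not among the categories is stored as missing.
partial = pd.Series(pd.Categorical([4, 6, 8], categories=[4, 6]))
print(partial.isna().tolist())
```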

Create a variable that identifies the observations that are in the top 25 percent of miles per gallon. Display a few of these vehicles.

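Hint: you will need a function that finds the percentiles (quantiles) of a variable.

Note that the quantile function returns a Series when it is passed a list of quantiles, so the single value has to be extracted before it is compared with mpg.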
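A small sketch of the return types of quantile, using a made-up Series named mpg (an assumption for illustration): a scalar argument gives back a single number, while a list argument gives back a Series indexed by the requested quantiles.

```python
import pandas as pd

# Illustrative values, not the real mtcars mpg column.
mpg = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

q_scalar = mpg.quantile(0.75)    # a single number
q_series = mpg.quantile([0.75])  # a Series whose index label is 0.75

print(q_scalar)
print(q_series)

# Two ways to pull the value out of the Series form:
by_label = q_series.at[0.75]     # by index label
by_position = q_series.iloc[0]   # by position
```

Passing the scalar 0.75 avoids the extraction step entirely; the list form is mainly useful when several quantiles are requested at once.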

mtcars = (
    mtcars
        .assign(
            efficient = lambda df:
                np.where(
                    df['mpg'] >= df['mpg'].quantile([0.75]).at[0.75],
                    True,
                    False)))
(mtcars
    .loc[:, ['make_model', 'mpg', 'efficient']]
    .head()
    .pipe(print))

          make_model   mpg  efficient
0          Mazda RX4  21.0      False
1      Mazda RX4 Wag  21.0      False
2         Datsun 710  22.8       True
3     Hornet 4 Drive  21.4      False
4  Hornet Sportabout  18.7      False

Alternatively, extract the quantile by position with iloc instead of by label with at.

mtcars = (
    mtcars
        .assign(
            efficient = lambda df:
                np.where(
                    df['mpg'] >= df['mpg'].quantile([0.75]).iloc[0],
                    True,
                    False)))
(mtcars
    .loc[:, ['make_model', 'mpg', 'efficient']]
    .head()
    .pipe(print))

          make_model   mpg  efficient
0          Mazda RX4  21.0      False
1      Mazda RX4 Wag  21.0      False
2         Datsun 710  22.8       True
3     Hornet 4 Drive  21.4      False
4  Hornet Sportabout  18.7      False
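Both solutions wrap the comparison in np.where(..., True, False). As a hedged aside, the comparison already yields a Boolean Series, so the wrapper can be dropped without changing the result. The sketch below checks this on a made-up data frame named cars; the column name efficient_where is hypothetical and exists only for the comparison.

```python
import numpy as np
import pandas as pd

# Illustrative values, not the real mtcars data.
cars = pd.DataFrame({'mpg': [10.0, 20.0, 30.0, 40.0, 50.0]})

# Scalar cutoff for the top 25 percent.
cutoff = cars['mpg'].quantile(0.75)

cars = cars.assign(
    # The comparison itself is already a Boolean Series.
    efficient=lambda df: df['mpg'] >= cutoff,
    # Same indicator written with np.where, as in the solutions above.
    efficient_where=lambda df: np.where(df['mpg'] >= cutoff, True, False))

print(cars)
```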

Create a variable that bins the values of hp using the following cut points: 100, 170, 240, and 300.

mtcars = (
    mtcars
        .assign(
            power = lambda df:
                pd.cut(df['hp'],
                    bins=[-np.inf, 100, 170, 240, 300, np.inf],
                    labels=['gocart', 'slow', 'typical',
                            'fast', 'beast'])))

(mtcars
    .loc[:, ['make_model', 'mpg', 'efficient', 'power']]
    .head()
    .pipe(print))

          make_model   mpg  efficient    power
0          Mazda RX4  21.0      False     slow
1      Mazda RX4 Wag  21.0      False     slow
2         Datsun 710  22.8       True   gocart
3     Hornet 4 Drive  21.4      False     slow
4  Hornet Sportabout  18.7      False  typical
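One detail of the binning worth checking is what happens to a value that falls exactly on a cut point. By default pd.cut uses intervals that are closed on the right, so 100 lands in the first bin; passing right=False closes them on the left instead. The sketch below uses a made-up hp Series (an assumption for illustration) with the same bins and labels as the solution.

```python
import numpy as np
import pandas as pd

# Illustrative horsepower values, including two on the cut points.
hp = pd.Series([93, 100, 110, 170, 175, 245, 335])

bins = [-np.inf, 100, 170, 240, 300, np.inf]
labels = ['gocart', 'slow', 'typical', 'fast', 'beast']

# Default: intervals are (low, high], so 100 -> 'gocart', 170 -> 'slow'.
power = pd.cut(hp, bins=bins, labels=labels)

# right=False: intervals are [low, high), so 100 -> 'slow', 170 -> 'typical'.
power_left = pd.cut(hp, bins=bins, labels=labels, right=False)

print(pd.DataFrame({'hp': hp, 'power': power, 'power_left': power_left}))
```

Which side to close depends on how the cut points are meant to be read; the exercise does not say, so the default is a reasonable choice.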