
5.7 Relationships between columns
These exercises use the Chile.csv data set.
Import the Chile.csv file.

from pathlib import Path
import pandas as pd
import numpy as np

chile_path = Path('..') / 'datasets' / 'Chile.csv'
chile_in = pd.read_csv(chile_path)
chile_in = chile_in.rename(columns={'statusquo': 'status_quo'})
chile = (
    chile_in
    .copy(deep=True)
    # Drop the unnamed first column (row labels stored in the file)
    .drop('Unnamed: 0', axis='columns'))
print(chile.dtypes)
region         object
population      int64
sex            object
age           float64
education      object
income        float64
status_quo    float64
vote           object
dtype: object
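As an aside, the `Unnamed: 0` column usually holds row labels that were written out along with the data. A minimal sketch, assuming a small hypothetical stand-in for Chile.csv (the empty first header and the row labels 1 to 3 are made up; the other values are copied from rows shown later in this section), compares dropping that column after reading with passing `index_col=0` to `pd.read_csv` so the column becomes the index instead:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for Chile.csv: an unnamed first column of row labels
# (made up here), followed by the data columns. "NA" is read as missing.
csv_text = """\
,region,population,sex,age,education,income,statusquo,vote
1,N,175000,F,27,PS,NA,1.43448,Y
2,N,175000,M,36,PS,35000,1.49026,NA
3,N,125000,F,32,S,NA,-0.85035,N
"""

# Approach used in the exercise: read everything, then drop the unnamed column.
dropped = (
    pd.read_csv(StringIO(csv_text))
    .rename(columns={'statusquo': 'status_quo'})
    .drop('Unnamed: 0', axis='columns'))

# Alternative: treat the first column as the index while reading.
indexed = (
    pd.read_csv(StringIO(csv_text), index_col=0)
    .rename(columns={'statusquo': 'status_quo'}))

print(dropped.columns.tolist())
print(indexed.columns.tolist())
```

Both frames end up with the same data columns; the difference is whether the stored row labels are discarded or kept as the index.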
Find all rows that have a missing value in any column, using a method that works across the columns of each row.
# Flag rows with at least one missing value, keep only those rows,
# then drop the helper flag column
chile_na_rows = (
    chile
    .assign(missing=lambda df: df
            .isna()
            .any(axis='columns'))
    .query('missing')
    .drop('missing', axis='columns'))
print(chile_na_rows.head())
    region  population  sex   age  education   income  status_quo  vote
12       N      175000    F  27.0         PS      NaN     1.43448     Y
14       N      175000    M  36.0         PS  35000.0     1.49026   NaN
27       N      175000    F  43.0          P      NaN     0.15489     A
75       N      125000    F  32.0          S      NaN    -0.85035     N
97       N      125000    F  34.0          P   2500.0     0.10807   NaN
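The helper-column pattern above can also be written without `assign` and `query`. A minimal sketch, assuming a small hypothetical stand-in for `chile` (a few columns from rows 12, 14 and 97 of the output above, plus one made-up complete row labeled 0), passes a callable to `.loc` and shows two related across-columns reductions:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `chile`: rows 12, 14 and 97 reuse a few columns
# from the output above; row 0 is a made-up row with no missing values.
chile = pd.DataFrame(
    {'sex': ['F', 'F', 'M', 'F'],
     'age': [40.0, 27.0, 36.0, 34.0],
     'income': [15000.0, np.nan, 35000.0, 2500.0],
     'vote': ['Y', 'Y', np.nan, np.nan]},
    index=[0, 12, 14, 97])

# Same filter as the assign/query chain, with a callable passed to .loc:
# isna() marks each cell, any(axis='columns') reduces across each row.
chile_na_rows = chile.loc[lambda df: df.isna().any(axis='columns')]

# A related across-columns reduction: how many values each row is missing.
n_missing = chile.isna().sum(axis='columns')

# The complement: rows with no missing values at all.
complete_rows = chile.dropna()

print(chile_na_rows)
print(n_missing)
print(complete_rows)
```

Passing a callable to `.loc` keeps the method-chaining style used in this section while avoiding the temporary `missing` column.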