
5.5 Date and time variables
These exercises use the MplsStops.csv data set.

Import the MplsStops.csv file.

from pathlib import Path
import pandas as pd
import numpy as np

MplsStops_path = Path('..') / 'datasets' / 'MplsStops.csv'
MplsStops_in = pd.read_csv(MplsStops_path)
MplsStops_in = (
    MplsStops_in
    .rename(
        columns={
            'idNum': 'id_num',
            'citationIssued': 'citation_issued',
            'personSearch': 'person_search',
            'vehicleSearch': 'vehicle_search',
            'preRace': 'pre_race',
            'policePrecinct': 'police_precinct'}))
MplsStops = MplsStops_in.copy(deep=True)

print(MplsStops.dtypes)
Unnamed: 0           int64
id_num              object
date                object
problem             object
MDC                 object
citation_issued     object
person_search       object
vehicle_search      object
pre_race            object
race                object
gender              object
lat                float64
long               float64
police_precinct      int64
neighborhood        object
dtype: object
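The dtypes listing shows that date is read in as plain text rather than as a datetime. As a minimal sketch, assuming a tiny in-memory CSV that stands in for the real file (the name `sample_csv` and its three rows are illustrative, with values copied from the output shown in this section), the `parse_dates` argument of `pd.read_csv` can convert the column at import time:

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for MplsStops.csv (not the real file).
sample_csv = io.StringIO(
    "idNum,date\n"
    "17-000003,2017-01-01 00:00:42\n"
    "17-000007,2017-01-01 00:03:07\n"
    "17-000073,2017-01-01 00:23:15\n"
)

# parse_dates asks read_csv to convert the listed columns to datetime64;
# without it the date column would be read as strings.
sample = pd.read_csv(sample_csv, parse_dates=["date"])

print(sample.dtypes)
```

Whether to parse at import or afterwards with `pd.to_datetime` is largely a matter of taste; parsing afterwards keeps the raw strings available for inspection first.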
Create a day of the week variable.
MplsStops = (
    MplsStops
    # Parse the date strings into datetime64 values
    .assign(date=lambda df: pd.to_datetime(df['date']))
    # day_name() returns the weekday as a string, e.g. 'Sunday'
    .assign(day=lambda df: df['date'].dt.day_name()))

(MplsStops
    .loc[:, ['id_num', 'date', 'day']]
    .head()
    .pipe(print))
      id_num                date     day
0  17-000003 2017-01-01 00:00:42  Sunday
1  17-000007 2017-01-01 00:03:07  Sunday
2  17-000073 2017-01-01 00:23:15  Sunday
3  17-000092 2017-01-01 00:33:48  Sunday
4  17-000098 2017-01-01 00:37:58  Sunday
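The weekday can be stored either as a name or as a number. The sketch below, which assumes a two-row series (`dates` and `days` are illustrative names, and the second timestamp is made up for contrast), shows the `.dt.day_name()` and `.dt.dayofweek` accessors side by side:

```python
import pandas as pd

# First timestamp comes from the output above; the second is a
# hypothetical value added only to show a different weekday.
dates = pd.to_datetime(pd.Series([
    "2017-01-01 00:00:42",
    "2017-01-02 08:15:00",
]))

days = pd.DataFrame({
    "date": dates,
    "day": dates.dt.day_name(),      # weekday as a string
    "day_num": dates.dt.dayofweek,   # Monday=0 through Sunday=6
})

print(days)
```

The numeric form is convenient for sorting and filtering (for example, weekends are `day_num >= 5`), while the name is easier to read in tables.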
Create a variable that measures the amount of time that has passed between the prior stop and the current stop.
MplsStops = (
    MplsStops
    .sort_values(by=['date'])
    .assign(prior_date=lambda df: df['date'].shift(1))
    .assign(time_from_prior=lambda df: df['date'] - df['prior_date']))

(MplsStops
    .loc[:, ['date', 'prior_date', 'time_from_prior']]
    .head()
    .pipe(print))
                 date          prior_date time_from_prior
0 2017-01-01 00:00:42                 NaT             NaT
1 2017-01-01 00:03:07 2017-01-01 00:00:42        00:02:25
2 2017-01-01 00:23:15 2017-01-01 00:03:07        00:20:08
3 2017-01-01 00:33:48 2017-01-01 00:23:15        00:10:33
4 2017-01-01 00:37:58 2017-01-01 00:33:48        00:04:10
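Subtracting a shifted copy of a column is what `Series.diff()` does in one step. As a sketch under the assumption of a three-row frame (`stops` and `seconds_from_prior` are illustrative names; the timestamps are taken from the output above), the gap can also be converted to a plain number with `.dt.total_seconds()`:

```python
import pandas as pd

# Three timestamps copied from the output above.
stops = pd.DataFrame({"date": pd.to_datetime([
    "2017-01-01 00:00:42",
    "2017-01-01 00:03:07",
    "2017-01-01 00:23:15",
])})

stops = (
    stops
    .sort_values(by="date")
    # diff() subtracts the previous row; the first row has no prior stop (NaT)
    .assign(time_from_prior=lambda df: df["date"].diff())
    # Express the timedelta as a float number of seconds
    .assign(seconds_from_prior=lambda df: df["time_from_prior"].dt.total_seconds())
)

print(stops)
```

A numeric column such as this is often easier to summarize or plot than a timedelta.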
On September 8th, 2017, Minneapolis swore in a new police chief (story). Create an indicator variable that identifies the observations in the data frame that occurred on September 9th or later.
MplsStops = (
    MplsStops
    .assign(new_chief=lambda df: df['date'] >= '2017-09-09 00:00:00'))

(MplsStops
    .loc[:, ['id_num', 'date', 'day', 'new_chief']]
    .copy()
    .iloc[37583:37589, :]
    .pipe(print))
          id_num                date       day  new_chief
37583  17-344351 2017-09-08 23:57:21    Friday      False
37584  17-344353 2017-09-08 23:58:43    Friday      False
37585  17-344368 2017-09-09 00:09:59  Saturday       True
37586  17-344369 2017-09-09 00:10:34  Saturday       True
37587  17-344377 2017-09-09 00:15:30  Saturday       True
37588  17-344382 2017-09-09 00:19:43  Saturday       True
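The comparison against a date string works because pandas converts the string to a timestamp before comparing. A minimal sketch of the same idea with an explicit `pd.Timestamp` cutoff, assuming a three-row frame built from timestamps in the output above (`cutoff` and `stops` are illustrative names):

```python
import pandas as pd

# Midnight at the start of September 9th, 2017
cutoff = pd.Timestamp("2017-09-09")

# Three timestamps copied from the output above, straddling the cutoff.
stops = pd.DataFrame({"date": pd.to_datetime([
    "2017-09-08 23:58:43",
    "2017-09-09 00:09:59",
    "2017-09-09 00:10:34",
])})

# True for stops at or after the cutoff, False before it
stops = stops.assign(new_chief=lambda df: df["date"] >= cutoff)

print(stops)
# Booleans sum as 0/1, so this counts the stops flagged True
print(stops["new_chief"].sum())
```

Naming the cutoff makes the boundary easy to change and to reuse in later filters.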