Supporting Statistical Analysis for Research
5.5 Date and time variables
These exercises use the MplsStops.csv data set.
Import the MplsStops.csv file.

    from pathlib import Path
    import pandas as pd
    import numpy as np

    MplsStops_path = Path('..') / 'datasets' / 'MplsStops.csv'
    MplsStops_in = pd.read_csv(MplsStops_path)
    MplsStops_in = (
        MplsStops_in
            .rename(
                columns={
                    'idNum': 'id_num',
                    'citationIssued': 'citation_issued',
                    'personSearch': 'person_search',
                    'vehicleSearch': 'vehicle_search',
                    'preRace': 'pre_race',
                    'policePrecinct': 'police_precinct'}))
    MplsStops = MplsStops_in.copy(deep=True)

    print(MplsStops.dtypes)

    Unnamed: 0           int64
    id_num              object
    date                object
    problem             object
    MDC                 object
    citation_issued     object
    person_search       object
    vehicle_search      object
    pre_race            object
    race                object
    gender              object
    lat                float64
    long               float64
    police_precinct      int64
    neighborhood        object
    dtype: object

Create a day of the week variable.
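As a warm-up on made-up data, the sketch below parses a few timestamp strings and extracts the weekday name. The frame `toy` and its values are illustrative assumptions, not rows read from MplsStops.csv. It uses `Series.dt.day_name()`, the accessor method available in current pandas releases.

```python
import pandas as pd

# Hypothetical timestamps for illustration only (not read from MplsStops.csv).
toy = pd.DataFrame({
    'date': ['2017-01-01 00:00:42',
             '2017-01-02 13:15:00',
             '2017-09-09 00:09:59']})

toy = (
    toy
        # Convert the strings to datetime64 values.
        .assign(date=lambda df: pd.to_datetime(df['date']))
        # Extract the English weekday name from each timestamp.
        .assign(day=lambda df: df['date'].dt.day_name()))

print(toy)
```

The `date` column must be converted with `pd.to_datetime` first; the `.dt` accessor is not available on a column of plain strings.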
    MplsStops = (
        MplsStops
            # Parse the date strings into datetime values.
            .assign(date = lambda df: pd.to_datetime(df['date']))
            # Extract the weekday name from each datetime.
            .assign(day = lambda df: df['date'].dt.day_name()))

    (MplsStops
        .loc[:, ['id_num', 'date', 'day']]
        .head()
        .pipe(print))

          id_num                date     day
    0  17-000003 2017-01-01 00:00:42  Sunday
    1  17-000007 2017-01-01 00:03:07  Sunday
    2  17-000073 2017-01-01 00:23:15  Sunday
    3  17-000092 2017-01-01 00:33:48  Sunday
    4  17-000098 2017-01-01 00:37:58  Sunday

Create a variable that measures the amount of time that has passed between the prior stop and the current stop.
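The idea can be tried on a few made-up timestamps before applying it to the full data set. This is a minimal sketch, assuming a hypothetical frame `toy` with hypothetical column names `gap` and `gap_minutes`. It uses `Series.diff()`, which subtracts the previous row and so gives the same result as subtracting a column shifted by one row.

```python
import pandas as pd

# Hypothetical stop times for illustration only, deliberately out of order.
toy = pd.DataFrame({
    'date': pd.to_datetime([
        '2017-01-01 00:23:15',
        '2017-01-01 00:00:42',
        '2017-01-01 00:03:07'])})

toy = (
    toy
        # Sort first so that "previous row" means "previous stop in time".
        .sort_values(by=['date'])
        # diff() subtracts the previous row, like date - date.shift(1).
        .assign(gap=lambda df: df['date'].diff())
        # Express the gap in minutes as a plain float.
        .assign(gap_minutes=lambda df: df['gap'].dt.total_seconds() / 60))

print(toy)
```

The earliest row has no prior stop, so its gap is `NaT` and its gap in minutes is `NaN`.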
    MplsStops = (
        MplsStops
            .sort_values(by=['date'])
            .assign(prior_date = lambda df: df['date'].shift(1))
            .assign(time_from_prior = lambda df: df['date'] - df['prior_date']))

    (MplsStops
        .loc[:, ['date', 'prior_date', 'time_from_prior']]
        .head()
        .pipe(print))

                     date          prior_date time_from_prior
    0 2017-01-01 00:00:42                 NaT             NaT
    1 2017-01-01 00:03:07 2017-01-01 00:00:42        00:02:25
    2 2017-01-01 00:23:15 2017-01-01 00:03:07        00:20:08
    3 2017-01-01 00:33:48 2017-01-01 00:23:15        00:10:33
    4 2017-01-01 00:37:58 2017-01-01 00:33:48        00:04:10

On September 8th, 2017, Minneapolis swore in a new police chief (story). Create an indicator variable that identifies the observations in the data frame that occurred on September 9th or later.
    MplsStops = (
        MplsStops
            .assign(new_chief = lambda df: df['date'] >= '2017-09-09 00:00:00'))

    (MplsStops
        .loc[:, ['id_num', 'date', 'day', 'new_chief']]
        .copy()
        .iloc[37583:37589, :]
        .pipe(print))

              id_num                date       day  new_chief
    37583  17-344351 2017-09-08 23:57:21    Friday      False
    37584  17-344353 2017-09-08 23:58:43    Friday      False
    37585  17-344368 2017-09-09 00:09:59  Saturday       True
    37586  17-344369 2017-09-09 00:10:34  Saturday       True
    37587  17-344377 2017-09-09 00:15:30  Saturday       True
    37588  17-344382 2017-09-09 00:19:43  Saturday       True
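The comparison above relies on pandas interpreting the string as a timestamp. As a minimal sketch on made-up rows, the same indicator can be built with an explicit `pd.Timestamp` cutoff, which makes the boundary behavior easy to check. The frame `toy` and the names `cutoff` and `new_chief_int` are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical timestamps straddling the cutoff (illustration only).
toy = pd.DataFrame({
    'date': pd.to_datetime([
        '2017-09-08 23:58:43',
        '2017-09-09 00:00:00',
        '2017-09-09 00:09:59'])})

# Midnight at the start of September 9th, 2017.
cutoff = pd.Timestamp('2017-09-09')

toy = (
    toy
        # True for stops at or after the cutoff, False before it.
        .assign(new_chief=lambda df: df['date'] >= cutoff)
        # Cast to 0/1 when an integer indicator is preferred.
        .assign(new_chief_int=lambda df: df['new_chief'].astype(int)))

print(toy)
```

Because the comparison is `>=`, a stop recorded at exactly midnight on September 9th counts as occurring under the new chief.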