Supporting Statistical Analysis for Research
5.5 Date and time variables
These exercises use the MplsStops.csv data set.
Import the MplsStops.csv file.

```python
from pathlib import Path

import pandas as pd
import numpy as np

# Read the raw data from the datasets directory one level up.
MplsStops_path = Path('..') / 'datasets' / 'MplsStops.csv'
MplsStops_in = pd.read_csv(MplsStops_path)

# Convert camelCase column names to snake_case.
MplsStops_in = (
    MplsStops_in
    .rename(
        columns={
            'idNum': 'id_num',
            'citationIssued': 'citation_issued',
            'personSearch': 'person_search',
            'vehicleSearch': 'vehicle_search',
            'preRace': 'pre_race',
            'policePrecinct': 'police_precinct'}))

# Work on a copy so the imported data stays untouched.
MplsStops = MplsStops_in.copy(deep=True)
print(MplsStops.dtypes)
```
```
Unnamed: 0           int64
id_num              object
date                object
problem             object
MDC                 object
citation_issued     object
person_search       object
vehicle_search      object
pre_race            object
race                object
gender              object
lat                float64
long               float64
police_precinct      int64
neighborhood        object
dtype: object
```
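The output shows that `date` is read in as a plain `object` (string) column. If it will be treated as a date-time anyway, `pd.read_csv` can parse it during import through its `parse_dates` argument. The sketch below is only an illustration: it assumes a tiny in-memory stand-in for MplsStops.csv (built with `io.StringIO`) that keeps just two columns and a few rows copied from the output later in this section; it is not the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for MplsStops.csv: only two columns, with values
# copied from the printed output in this section (not the full data set).
csv_text = """idNum,date
17-000003,2017-01-01 00:00:42
17-000007,2017-01-01 00:03:07
17-000073,2017-01-01 00:23:15
"""

# parse_dates converts the listed columns to datetime64 while reading,
# so a separate pd.to_datetime() step is not needed afterwards.
stops = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])
stops = stops.rename(columns={'idNum': 'id_num'})

print(stops.dtypes)
```

With the real file, the same `parse_dates=['date']` argument could be passed to the `pd.read_csv(MplsStops_path)` call; the exercises below instead convert the column explicitly with `pd.to_datetime`.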
Create a day of the week variable.
```python
MplsStops = (
    MplsStops
    # Parse the date strings into datetime64 values.
    .assign(date=lambda df: pd.to_datetime(df['date']))
    # Extract the weekday name (e.g. 'Sunday') from each date.
    .assign(day=lambda df: df['date'].dt.day_name()))

(MplsStops
 .loc[:, ['id_num', 'date', 'day']]
 .head()
 .pipe(print))
```
```
      id_num                date     day
0  17-000003 2017-01-01 00:00:42  Sunday
1  17-000007 2017-01-01 00:03:07  Sunday
2  17-000073 2017-01-01 00:23:15  Sunday
3  17-000092 2017-01-01 00:33:48  Sunday
4  17-000098 2017-01-01 00:37:58  Sunday
```
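Weekday names sort alphabetically by default, which rarely matches calendar order in tables or plots. One possible approach, sketched below, is to also keep the integer weekday from `dt.dayofweek` (Monday is 0, Sunday is 6) and to store the names as an ordered categorical. The first timestamp comes from the output above; the other two are illustrative values, not rows from the data set.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime([
    '2017-01-01 00:00:42',  # from the output above (a Sunday)
    '2017-01-02 08:15:00',  # illustrative Monday
    '2017-01-07 17:30:00',  # illustrative Saturday
]))

weekday = pd.DataFrame({
    'date': dates,
    'day': dates.dt.day_name(),     # e.g. 'Sunday'
    'day_num': dates.dt.dayofweek,  # Monday=0 ... Sunday=6
})

# An ordered categorical sorts Monday..Sunday instead of alphabetically.
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']
weekday['day'] = pd.Categorical(weekday['day'],
                                categories=day_order, ordered=True)

print(weekday.sort_values('day'))
```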
Create a variable that measures the amount of time that has passed between the prior stop and the current stop.
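```python
MplsStops = (
    MplsStops
    # Order stops chronologically before comparing neighboring rows.
    .sort_values(by=['date'])
    # Timestamp of the immediately preceding stop (NaT for the first row).
    .assign(prior_date=lambda df: df['date'].shift(1))
    # Elapsed time since the preceding stop, as a timedelta.
    .assign(time_from_prior=lambda df: df['date'] - df['prior_date']))

(MplsStops
 .loc[:, ['date', 'prior_date', 'time_from_prior']]
 .head()
 .pipe(print))
```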
```
                 date          prior_date time_from_prior
0 2017-01-01 00:00:42                 NaT             NaT
1 2017-01-01 00:03:07 2017-01-01 00:00:42        00:02:25
2 2017-01-01 00:23:15 2017-01-01 00:03:07        00:20:08
3 2017-01-01 00:33:48 2017-01-01 00:23:15        00:10:33
4 2017-01-01 00:37:58 2017-01-01 00:33:48        00:04:10
```
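Once the rows are sorted by date, `Series.diff()` yields the same elapsed times as subtracting the shifted column, in a single step. A timedelta is also awkward to average or plot, so the sketch below converts it to minutes with `dt.total_seconds()`. It uses the five timestamps printed above; the variable names are illustrative.

```python
import pandas as pd

# The five timestamps from the output above.
dates = pd.Series(pd.to_datetime([
    '2017-01-01 00:00:42',
    '2017-01-01 00:03:07',
    '2017-01-01 00:23:15',
    '2017-01-01 00:33:48',
    '2017-01-01 00:37:58',
]))

# diff() subtracts the previous element; the first value is NaT.
# This is equivalent to dates - dates.shift(1) on sorted data.
time_from_prior = dates.sort_values().diff()

# Express the gaps in minutes for summaries (NaT becomes NaN).
minutes_from_prior = time_from_prior.dt.total_seconds() / 60
print(minutes_from_prior.round(2))
```

The same conversion applied to the full `time_from_prior` column would allow, for example, `.describe()` to summarize the gaps between stops numerically.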
On September 8, 2017, Minneapolis swore in a new police chief (story). Create an indicator variable that identifies observations that occurred on September 9 or later in the data frame.
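```python
MplsStops = (
    MplsStops
    # True for stops at or after midnight at the start of 2017-09-09.
    .assign(new_chief=lambda df: df['date'] >= pd.Timestamp('2017-09-09')))

(MplsStops
 .loc[:, ['id_num', 'date', 'day', 'new_chief']]
 .iloc[37583:37589, :]  # rows on either side of the cutoff
 .pipe(print))
```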
```
          id_num                date       day  new_chief
37583  17-344351 2017-09-08 23:57:21    Friday      False
37584  17-344353 2017-09-08 23:58:43    Friday      False
37585  17-344368 2017-09-09 00:09:59  Saturday       True
37586  17-344369 2017-09-09 00:10:34  Saturday       True
37587  17-344377 2017-09-09 00:15:30  Saturday       True
37588  17-344382 2017-09-09 00:19:43  Saturday       True
```
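Many modelling functions expect a 0/1 indicator rather than a Boolean one. The sketch below builds both from the six timestamps shown above, comparing against an explicit `pd.Timestamp` cutoff, and then counts stops on each side. The extra column name `new_chief_01` is hypothetical, introduced only for this illustration.

```python
import pandas as pd

# The six timestamps from the output above, spanning the cutoff.
stops = pd.DataFrame({
    'date': pd.to_datetime([
        '2017-09-08 23:57:21',
        '2017-09-08 23:58:43',
        '2017-09-09 00:09:59',
        '2017-09-09 00:10:34',
        '2017-09-09 00:15:30',
        '2017-09-09 00:19:43',
    ])
})

cutoff = pd.Timestamp('2017-09-09')  # midnight at the start of September 9

stops = stops.assign(
    # Boolean indicator, as in the exercise.
    new_chief=lambda df: df['date'] >= cutoff,
    # Hypothetical 0/1 version; assign() evaluates keywords in order,
    # so this lambda can use new_chief created just above.
    new_chief_01=lambda df: df['new_chief'].astype(int),
)

# Number of stops before (False) and after (True) the cutoff.
print(stops['new_chief'].value_counts())
```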