SSCC - Social Science Computing Cooperative Supporting Statistical Analysis for Research

5.5 Date and time variables

These exercises use the MplsStops.csv data set.

  1. Import the MplsStops.csv file.

    from pathlib import Path
    import pandas as pd
    import numpy as np
    MplsStops_path = Path('..') / 'datasets' / 'MplsStops.csv'
    MplsStops_in = pd.read_csv(MplsStops_path)
    MplsStops_in = (
        MplsStops_in
            .rename(
                columns={
                    'idNum': 'id_num',
                    'citationIssued': 'citation_issued',
                    'personSearch': 'person_search',
                    'vehicleSearch': 'vehicle_search',
                    'preRace': 'pre_race',
                    'policePrecinct': 'police_precinct'}))
    MplsStops = MplsStops_in.copy(deep=True)
    
    print(MplsStops.dtypes)
    Unnamed: 0           int64
    id_num              object
    date                object
    problem             object
    MDC                 object
    citation_issued     object
    person_search       object
    vehicle_search      object
    pre_race            object
    race                object
    gender              object
    lat                float64
    long               float64
    police_precinct      int64
    neighborhood        object
    dtype: object
  2. Create a day of the week variable.

    MplsStops = (
        MplsStops
            # parse the date strings into datetime64 values
            .assign(date = lambda df:
                pd.to_datetime(df['date']))
            # day_name() returns the weekday as a string, e.g. 'Sunday'
            .assign(day = lambda df:
                df['date'].dt.day_name()))
    
    (MplsStops
        .loc[:, ['id_num', 'date', 'day']]
        .head()
        .pipe(print))
          id_num                date     day
    0  17-000003 2017-01-01 00:00:42  Sunday
    1  17-000007 2017-01-01 00:03:07  Sunday
    2  17-000073 2017-01-01 00:23:15  Sunday
    3  17-000092 2017-01-01 00:33:48  Sunday
    4  17-000098 2017-01-01 00:37:58  Sunday
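
If the day of the week is needed for sorting or grouping, a numeric version can be handy alongside the name. The following is a minimal sketch on a small made-up data frame (`toy` and `day_num` are illustrative names, not part of the exercise data). It relies on `dt.dayofweek`, which numbers Monday as 0 and Sunday as 6.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'date' column of MplsStops.
toy = pd.DataFrame({'date': ['2017-01-01 00:00:42', '2017-01-02 13:15:00']})

toy = (
    toy
        # convert the strings to datetime64 values
        .assign(date=lambda df: pd.to_datetime(df['date']))
        .assign(
            # weekday as a string
            day=lambda df: df['date'].dt.day_name(),
            # weekday as an integer, Monday = 0 ... Sunday = 6
            day_num=lambda df: df['date'].dt.dayofweek))

print(toy)
```
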
  3. Create a variable that measures the amount of time that has passed between the prior stop and the current stop.

    MplsStops = (
        MplsStops
            .sort_values(by=['date'])
            .assign(prior_date = lambda df:
                df['date'].shift(1))
            .assign(time_from_prior = lambda df:
                df['date'] - df['prior_date']))
    
    (MplsStops
        .loc[:, ['date', 'prior_date', 'time_from_prior']]
        .head()
        .pipe(print))
                     date          prior_date time_from_prior
    0 2017-01-01 00:00:42                 NaT             NaT
    1 2017-01-01 00:03:07 2017-01-01 00:00:42        00:02:25
    2 2017-01-01 00:23:15 2017-01-01 00:03:07        00:20:08
    3 2017-01-01 00:33:48 2017-01-01 00:23:15        00:10:33
    4 2017-01-01 00:37:58 2017-01-01 00:33:48        00:04:10
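
The difference of two datetime columns is a timedelta column. For summaries or models it is often easier to work with a plain number. The sketch below, again on made-up data (`toy` and `minutes_from_prior` are illustrative names), uses `diff()` as a shorthand for subtracting the shifted column and `dt.total_seconds()` to convert the gap to minutes.

```python
import pandas as pd

# Hypothetical toy data, deliberately out of order.
toy = pd.DataFrame({
    'date': pd.to_datetime([
        '2017-01-01 00:03:07',
        '2017-01-01 00:00:42',
        '2017-01-01 00:23:15'])})

toy = (
    toy
        # gaps are only meaningful once the rows are in time order
        .sort_values(by=['date'])
        # diff() is equivalent to df['date'] - df['date'].shift(1)
        .assign(time_from_prior=lambda df: df['date'].diff())
        # convert the timedelta to a float number of minutes
        .assign(minutes_from_prior=lambda df:
            df['time_from_prior'].dt.total_seconds() / 60))

print(toy)
```

The first row has no prior stop, so both new columns are missing there (`NaT` and `NaN`).
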
  4. On September 8th, 2017, Minneapolis swore in a new police chief. Create an indicator variable that identifies observations in the data frame that occurred on September 9th or later.

    MplsStops = (
        MplsStops
            .assign(new_chief = lambda df:
                df['date'] >= '2017-09-09 00:00:00'))
    
    (MplsStops
        .loc[:, ['id_num', 'date', 'day', 'new_chief']]
        .copy()
        .iloc[37583:37589, :]
        .pipe(print))
              id_num                date       day  new_chief
    37583  17-344351 2017-09-08 23:57:21    Friday      False
    37584  17-344353 2017-09-08 23:58:43    Friday      False
    37585  17-344368 2017-09-09 00:09:59  Saturday       True
    37586  17-344369 2017-09-09 00:10:34  Saturday       True
    37587  17-344377 2017-09-09 00:15:30  Saturday       True
    37588  17-344382 2017-09-09 00:19:43  Saturday       True
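
Comparing a datetime column to a date string works because pandas parses the string for the comparison. One alternative, sketched here on made-up data (`toy` and `cutoff` are illustrative names), is to build the cutoff as an explicit `pd.Timestamp`, which makes the boundary visible and reusable. A stop at exactly midnight on September 9th counts as after the cutoff.

```python
import pandas as pd

# Hypothetical toy data straddling the cutoff.
toy = pd.DataFrame({
    'date': pd.to_datetime([
        '2017-09-08 23:58:43',
        '2017-09-09 00:00:00',
        '2017-09-09 00:09:59'])})

# midnight at the start of September 9th
cutoff = pd.Timestamp('2017-09-09')

toy = toy.assign(new_chief=lambda df: df['date'] >= cutoff)

# count how many observations fall on each side of the cutoff
print(toy['new_chief'].value_counts())
```
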