 Supporting Statistical Analysis for Research
 4.6 Subsets of a data frame
- Import the PSID.csv data set that was imported in the prior section.

      from pathlib import Path
      import pandas as pd

      psid_path = Path('..') / 'datasets' / 'PSID.csv'
      psid_in = pd.read_csv(psid_path)
      psid_in = (
          psid_in
          .rename(
              columns={
                  'Unnamed: 0': 'obs_num',
                  'intnum': 'intvw_num',
                  'persnum': 'person_id',
                  'married': 'marital_status'}))
      psid = psid_in.copy(deep=True)

      print(psid.dtypes)

      obs_num             int64
      intvw_num           int64
      person_id           int64
      age                 int64
      educatn           float64
      earnings            int64
      hours               int64
      kids                int64
      marital_status     object
      dtype: object

  The obs_num variable is retained for these examples. The examples in this section operate on row numbers, and this variable holds the row number of each observation. Note that obs_num counts from one, while pandas positions count from zero.
- Display the last three rows of the data frame using positional values to subset.

      (psid
          .iloc[-3:, :]
          .pipe(print))

            obs_num  intvw_num  person_id  age  ...  earnings  hours  kids marital_status
      4853     4854       9302          1   37  ...     22045   2793    98       divorced
      4854     4855       9305          2   40  ...       134     30     3        married
      4855     4856       9306          2   37  ...     33000   2423     4        married

      [3 rows x 9 columns]

  Display the same rows using the tail() function to confirm that the correct three rows were selected.

      (psid
          .tail(3)
          .pipe(print))

            obs_num  intvw_num  person_id  age  ...  earnings  hours  kids marital_status
      4853     4854       9302          1   37  ...     22045   2793    98       divorced
      4854     4855       9305          2   40  ...       134     30     3        married
      4855     4856       9306          2   37  ...     33000   2423     4        married

      [3 rows x 9 columns]
- Display the first, third, fifth, and seventh rows of columns two and three. Positions count from zero, so these are row positions 0, 2, 4, and 6 and column positions 1 and 2.

      (psid
          .iloc[[0, 2, 4, 6], [1, 2]]
          .pipe(print))

         intvw_num  person_id
      0          4          4
      2          4          7
      4          5          2
      6          6        172
- Create a smaller data frame using the first 20 rows. The stop position of a slice is excluded, so :20 selects positions 0 through 19.

      psid_small = psid.iloc[:20, :]
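The positional rules these exercises rely on can be checked without the PSID file. The sketch below is illustrative only: `demo` is a hypothetical 25-row stand-in for `psid`, and its column values are made up for the example. It shows that a negative slice start counts from the end, that lists select arbitrary positions, and that a slice's stop position is excluded.

```python
import pandas as pd

# Hypothetical stand-in for psid (not the real data): 25 rows,
# with obs_num holding 1-based row numbers as in the PSID file.
demo = pd.DataFrame({
    'obs_num': range(1, 26),
    'age': range(30, 55),
    'kids': [n % 4 for n in range(25)],
})

# A negative start counts from the end, so this matches tail(3).
last_three = demo.iloc[-3:, :]
same_as_tail = last_three.equals(demo.tail(3))

# Lists select arbitrary positions: the first, third, fifth, and
# seventh rows of columns two and three (positions 1 and 2).
odd_rows = demo.iloc[[0, 2, 4, 6], [1, 2]]

# The stop position of a slice is excluded: [:20] covers positions
# 0 through 19 (20 rows), while [1:20] skips the first row (19 rows).
first_twenty = demo.iloc[:20, :]
skips_first = demo.iloc[1:20, :]

print(same_as_tail)
print(odd_rows.columns.tolist())
print(len(first_twenty), len(skips_first))
```

Comparing the two lengths is a quick way to catch an off-by-one slice: a start of 1 silently drops the first observation.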