Supporting Statistical Analysis for Research
2.4 Importing csv files and parsers
The tidyverse function to read a csv file is read_csv().
The following are a few important parameters of read_csv().
- file: the path to the file to be imported.
- col_names: setting this to FALSE indicates that the first row does not contain variable names.
- col_types: setting this to cols() uses guessed types for the columns. Alternatively, the parameters of cols() can be used to define the type of each column.
- na: a vector of strings that indicate missing data.
- guess_max: the number of rows to consider before guessing the type of each column. The default value of 1000 works well on most csv files.
- skip: the number of lines at the start of the file to be ignored. This is used when a csv file contains metadata at the beginning of the file.
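As a minimal sketch of how these parameters fit together, the example below writes a small made-up csv file to a temporary location and reads it back. The file contents, the column names, and the missing-value codes "-99" and "missing" are assumptions chosen for illustration; they are not part of the course data sets.

```r
library(readr)

# Hypothetical file: two metadata lines, then a header row, then data.
# "-99" and "missing" are made-up codes for missing values.
tmp <- tempfile(fileext = ".csv")
writeLines(
  c(
    "# source: made-up example",
    "# created for illustration only",
    "id,score,group",
    "1,10.5,a",
    "2,-99,b",
    "3,7.25,missing"
  ),
  tmp
)

toy <- read_csv(
  tmp,
  skip = 2,                             # ignore the two metadata lines
  col_names = TRUE,                     # first remaining row holds the names
  na = c("", "NA", "-99", "missing"),   # strings treated as missing data
  col_types = cols(                     # define the type of each column
    id = col_integer(),
    score = col_double(),
    group = col_character()
  ),
  guess_max = 1000                      # default; only affects guessed columns
)

toy
```

Because every column type is given explicitly here, guess_max has no effect; it is included only to show where it goes in the call.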
The read_*() functions of the tidyverse use a common set
of parsers.
These parsers convert text into typed values such as numbers, factors,
dates and times.
These parsers can be directly called to parse a column.
The parse_factor() function will be demonstrated in the
Modifying variables section below.
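The following sketch shows a few of the readr parsers called directly on character vectors, and one applied to a column of a data frame. The input values and the data frame `visits` are made up for illustration.

```r
library(readr)

# Each parser takes a character vector and returns a typed vector.
dbl_vals <- parse_double(c("1.5", "2", "3.25"))
int_vals <- parse_integer(c("10", "20", "NA"))    # "NA" becomes a missing value
num_vals <- parse_number(c("$1,200", "45%"))      # drops non-numeric characters
date_vals <- parse_date(c("2020-01-15", "2020-02-29"))
lgl_vals <- parse_logical(c("TRUE", "false", "T"))

# Parsing a column: a hypothetical data frame with dates stored as text.
visits <- data.frame(
  id = 1:2,
  when = c("2021-03-01", "2021-03-15"),
  stringsAsFactors = FALSE
)
visits$when <- parse_date(visits$when)
class(visits$when)
```

Calling a parser directly can be useful when a column was imported as character and needs to be converted after the fact.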
2.4.1 Examples
Importing a csv file
    cps_in <- read_csv(file.path("..", "datasets", "cps1.csv"), col_types = cols())

    Warning: Missing column names filled in: 'X1' [1]

Note that one of the columns did not have a name, so the read_csv() function gave it one.

The head() function returns the beginning values of an object.

    head(cps_in, 3)

    # A tibble: 3 x 11
         X1   trt   age  educ black  hisp  marr nodeg   re74   re75   re78
      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
    1     1     0    45    11     0     0     1     1 21517. 25244. 25565.
    2     2     0    21    14     0     0     0     0  3176.  5853. 13496.
    3     3     0    38    12     0     0     1     0 23039. 25131. 25565.

The glimpse() function displays the column types and the first few values of each column of a data frame.

    glimpse(cps_in)

    Observations: 15,992
    Variables: 11
    $ X1    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1...
    $ trt   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
    $ age   <dbl> 45, 21, 38, 48, 18, 22, 48, 18, 48, 45, 34, 16, 53, 19, ...
    $ educ  <dbl> 11, 14, 12, 6, 8, 11, 10, 11, 9, 12, 14, 10, 10, 12, 12,...
    $ black <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
    $ hisp  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
    $ marr  <dbl> 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0,...
    $ nodeg <dbl> 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0,...
    $ re74  <dbl> 21516.6700, 3175.9710, 23039.0200, 24994.3700, 1669.2950...
    $ re75  <dbl> 25243.550, 5852.565, 25130.760, 25243.550, 10727.610, 18...
    $ re78  <dbl> 25564.670, 13496.080, 25564.670, 25564.670, 9860.869, 25...
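The unnamed first column can be given a meaningful name after import. The sketch below is one possible approach, not part of the original example: it uses a small temporary file whose three rows copy the X1, trt and age values shown by head(), with a blank first header. The name `row_id` is a hypothetical choice. The filled-in name depends on the readr version (X1 in the output shown here, `...1` in readr 2.0 and later), so the column is renamed by position rather than by name.

```r
library(readr)

# Toy file with a blank first column name, mimicking the header of cps1.csv.
tmp <- tempfile(fileext = ".csv")
writeLines(
  c(
    ",trt,age",
    "1,0,45",
    "2,0,21",
    "3,0,38"
  ),
  tmp
)

# read_csv() fills in the missing name and reports that it did so.
toy <- read_csv(tmp, col_types = cols())

# Rename by position so the code does not depend on the filled-in name.
names(toy)[1] <- "row_id"   # "row_id" is a hypothetical name
names(toy)
```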