2.4 Importing csv files and parsers
The tidyverse function to read a csv file is read_csv().
The following are a few important parameters of read_csv().
file, the path to the file to be imported.
col_names, setting this to FALSE indicates that the first row does not contain variable names.
col_types, setting this to cols() uses guessed types for the columns. Alternatively, the parameters of cols() can be used to define the type of each column.
na, a character vector of strings that indicate missing data.
guess_max, specifies the number of rows to consider before making a guess of what type the columns are. The default value of 1000 works well on most csv files.
skip, the number of lines at the front of the file to be ignored. This is used when a csv file contains metadata at the beginning of the file.
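As a minimal sketch of how these parameters fit together, the following writes a small csv file to a temporary location and reads it back. The file contents, the column names (id, grp, x), and the use of "-99" as a missing-data code are all made up for illustration. It assumes the readr package is installed, and it passes a character vector to col_names to supply names for a file with no header row.

```r
library(readr)

# Hypothetical file: two metadata lines, no header row, "-99" marks missing data
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "# source: made-up example",
    "# created for illustration only",
    "1,a,2.5",
    "2,b,-99",
    "3,c,4.0"
  ),
  csv_path
)

example_in <- read_csv(
  csv_path,
  skip = 2,                         # ignore the two metadata lines
  col_names = c("id", "grp", "x"),  # supply names, since there is no header row
  col_types = cols(                 # declare the types instead of guessing them
    id = col_integer(),
    grp = col_character(),
    x = col_double()
  ),
  na = c("", "NA", "-99")           # strings treated as missing data
)

example_in
```

Declaring the column types in cols() makes the import fail loudly if a column does not match the expected type, which can be preferable to relying on guessed types for files that are read repeatedly.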
The read_*() functions of the tidyverse use a common set
of parsers.
These parsers are used to format data such as numbers, factors,
dates and times, etc.
These parsers can be called directly to parse a column.
The parse_factor() function will be demonstrated in the
Modifying variables section below.
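A short sketch of calling a few of these parsers directly on character vectors is given here. The input values and the variable names are made up for illustration, and the readr package is assumed to be installed.

```r
library(readr)

# Made-up character vectors, as they might appear in a raw column
x_dbl  <- parse_double(c("1.5", "2", "NA"))          # "NA" becomes a missing value
x_int  <- parse_integer(c("10", "20", "30"))
x_date <- parse_date(c("2020-01-15", "2020-02-29"))  # ISO 8601 dates by default

# Inspect the resulting types
class(x_dbl)
class(x_int)
class(x_date)
```

Each parser takes a character vector and returns a vector of the corresponding type, so a parser can be applied to a single column that was imported as character.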
2.4.1 Examples
Importing a csv file
cps_in <- read_csv(file.path("..", "datasets", "cps1.csv"), col_types = cols())
Warning: Missing column names filled in: 'X1' [1]
Note that one of the columns did not have a name, so the read_csv() function gave it one (X1).
The head() function returns the first few values of an object.
head(cps_in, 3)
# A tibble: 3 x 11
     X1   trt   age  educ black  hisp  marr nodeg   re74   re75   re78
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
1     1     0    45    11     0     0     1     1 21517. 25244. 25565.
2     2     0    21    14     0     0     0     0  3176.  5853. 13496.
3     3     0    38    12     0     0     1     0 23039. 25131. 25565.
The glimpse() function displays the column types and the first few values of each column of a data frame.
glimpse(cps_in)
Observations: 15,992
Variables: 11
$ X1    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1...
$ trt   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ age   <dbl> 45, 21, 38, 48, 18, 22, 48, 18, 48, 45, 34, 16, 53, 19, ...
$ educ  <dbl> 11, 14, 12, 6, 8, 11, 10, 11, 9, 12, 14, 10, 10, 12, 12,...
$ black <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ hisp  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ marr  <dbl> 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0,...
$ nodeg <dbl> 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0,...
$ re74  <dbl> 21516.6700, 3175.9710, 23039.0200, 24994.3700, 1669.2950...
$ re75  <dbl> 25243.550, 5852.565, 25130.760, 25243.550, 10727.610, 18...
$ re78  <dbl> 25564.670, 13496.080, 25564.670, 25564.670, 9860.869, 25...