2 Data Objects
2.1 Warm-Up
sample(1:10, 1) gives us a random number between 1 and 10.
Run the two blocks of code below in R.
In the first block, after you print x to the console, is x + 1 what you expect?
In the second block, after you print sample(1:10, 1), is sample(1:10, 1) + 1 what you expect?
Run both blocks several more times. Does either of your answers change? Why?
x <- sample(1:10, 1)
x
x + 1
sample(1:10, 1)
sample(1:10, 1) + 1
2.2 Outcomes
Objective: To create, modify, and remove data objects.
Why it matters: Almost all of your work in R will involve data objects, everything from importing datasets to creating plots to fitting statistical models. An understanding of basic object operations is foundational for all your work in R.
Learning outcomes:
Fundamental Skills | Extended Skills |
Key functions and operators:
<-
ls()
rm()
2.3 Naming Data
Data objects in R have three main characteristics:
- type
- structure
- class
A data object may be created, used, and discarded in a single step, as we did in the second block of code in the warm-up. This is called anonymous data. Or, we can save a data object for use in a later step, as we did with the first block of code in the warm-up.
To store a data object in memory for later use, give it a name. Naming a data object is called assignment and is done with the assignment operator: <-
x <- 2
y <- 3
After running that code, x and y are stored in our computer’s memory. You can list all data objects available in your current session with the ls() function:
ls()
[1] "x" "y"
This will match what you have in the Environment pane in the top-right corner of RStudio.
When we close RStudio, our computer deletes all objects in memory. This helps us achieve reproducibility and avoid hidden dependencies across projects.
If your Environment has objects in it when you open RStudio, change your settings following the instructions here. If you want to preserve specific objects across R sessions or share objects with colleagues, learn how to save objects here.
2.3.1 Rules for Names
You can choose the names for your data objects, but you should follow a few conventions:
- Use a consistent case. R is case-sensitive, so x and X are different, as are age, Age, AGE, and aGe.
- Begin names with a letter. Try income2000 instead of 2000income.
- Balance length and meaning. Shorter names are easier to read and type, but they are less informative to whoever reads your code after you (which is probably future you!). Longer names are more informative but also more prone to typos.
Compare the object names below. Which would you like to type today? Which would you like to read next year?
x <- 25 # very short, very uninformative
age <- 25
age_start <- 25
age_start_of_study <- 25
age_at_beginning_of_data_collection <- 25 # very long, very informative
- Combine multi-word names with camel case, dot case, or snake case. Spaces are not allowed within names, nor are most special characters, and using them can produce all sorts of errors and unexpected results:
birth year <- 2000
Error in parse(text = input): <text>:1:7: unexpected symbol
1: birth year
         ^
x$year <- 2000
Warning in x$year <- 2000: Coercing LHS to a list
y-year <- 2000
Error in y - year <- 2000: could not find function "-<-"
Instead, create multi-word names with your favorite of these three strategies:
birthYear <- 2000 # camel case
birth.year <- 2000 # dot case
birth_year <- 2000 # snake case
2.3.2 Non-Syntactic Names
You will encounter names that break these rules, called “non-syntactic” names. To work with these objects, you will need to surround their names with backticks: `object_name`
2000income <- 50000
2000income
Error in parse(text = input): <text>:1:6: unexpected symbol
1: 2000income
^
`2000income` <- 50000
`2000income`
[1] 50000
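If you are unsure what a syntactic version of a name might look like, base R’s make.names() function shows how R itself would convert one: it prepends an X when the first character is not allowed and replaces each invalid character with a dot. The inputs below are simply the non-syntactic names used in this chapter. Treat the result as a starting point only, since a syntactic name is not necessarily a meaningful one.

```r
# Convert non-syntactic names into syntactic ones.
# make.names() adds a leading "X" where needed and turns invalid characters into "."
make.names(c("2000income", "birth year", "(Intercept)"))
# returns "X2000income" "birth.year" "X.Intercept."
```

Nothing is assigned here, so your Environment is unchanged.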
Statistical models in R often include a coefficient named (Intercept), which you can reference with `(Intercept)`. You will also need this strategy to pull up the documentation for operators like %in%:
?`%in%`
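Here is a small sketch of both uses of backticks. It fits a simple linear model to R’s built-in cars dataset only to have an (Intercept) to work with, and because nothing is assigned, it leaves your Environment unchanged. Converting the coefficients with as.list() is just one way to make $ usable; it is an illustrative choice, not the only approach.

```r
# Operators are functions, so backticks let you call them by name
`+`(2, 3)         # 5, same as 2 + 3
`%in%`(3, 1:10)   # TRUE, same as 3 %in% 1:10

# coef() returns a named vector whose first name is "(Intercept)".
# as.list() turns it into a list so that $ can be used, and the
# non-syntactic name must be wrapped in backticks.
as.list(coef(lm(dist ~ speed, data = cars)))$`(Intercept)`  # about -17.58
```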
2.3.3 Reusing Names
When you reuse a name by assigning new data to it, the old data is overwritten. This is considered a routine action, so there is no warning or error, nor a need to tell R you want to replace the old data.
y
[1] 3
y <- 10
y
[1] 10
R will even allow you to reuse the name with data of a different type or structure:
y <- "hello"
y
[1] "hello"
y <- anscombe # built-in dataset
y
x1 x2 x3 x4 y1 y2 y3 y4
1 10 10 10 8 8.04 9.14 7.46 6.58
2 8 8 8 8 6.95 8.14 6.77 5.76
3 13 13 13 8 7.58 8.74 12.74 7.71
4 9 9 9 8 8.81 8.77 7.11 8.84
5 11 11 11 8 8.33 9.26 7.81 8.47
6 14 14 14 8 9.96 8.10 8.84 7.04
7 6 6 6 8 7.24 6.13 6.08 5.25
8 4 4 4 19 4.26 3.10 5.39 12.50
9 12 12 12 8 10.84 9.13 8.15 5.56
10 7 7 7 8 4.82 7.26 6.42 7.91
11 5 5 5 8 5.68 4.74 5.73 6.89
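The class() function makes this change visible: each time y is reassigned, its class reflects whatever data it now holds. This sketch repeats the assignments above so it can run on its own, and it leaves y holding anscombe, just as the code above does.

```r
# Reassign the same name to three different kinds of data
y <- 10
class(y)   # "numeric"

y <- "hello"
class(y)   # "character"

y <- anscombe  # built-in dataset
class(y)   # "data.frame"
```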
2.4 Removing Data
Sometimes you will need to remove an object to free up memory or declutter your environment. One situation where you may need to do this is if you accidentally make multiple copies of a very large dataset.
Remove individual objects with rm(). Remove the object x and then verify it is no longer in the list returned by ls():
rm(x)
ls()
[1] "2000income" "age"
[3] "age_at_beginning_of_data_collection" "age_start"
[5] "age_start_of_study" "birth_year"
[7] "birth.year" "birthYear"
[9] "y"
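The list returned by ls() can get long. To check a single name instead, the base function exists() takes the name as a string and returns TRUE or FALSE. The name temp_value below is hypothetical, created only so it can be removed.

```r
# Create a throwaway object (hypothetical name, for illustration only)
temp_value <- 42
exists("temp_value")  # TRUE

# Remove it and check again
rm(temp_value)
exists("temp_value")  # FALSE
```

Because the block removes what it creates, running it leaves your Environment as it was.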
R does not have an “undo” button, so after you remove an object or reuse its name, the only way to restore the data is to rerun your previous code.
An alternative strategy that I do not recommend is to create objects with different names at every stage as a form of version control, like data_raw, then data_renamed, then data_renamed_ver2, then data_cleaned, then data_for_plotting, then data_merged2_ver3_final_final. This approach often leads to confusion and mistakes.
Instead, break your scripts apart into smaller scripts that accomplish a single task. Your script should not be 3000 lines long and do everything from cleaning the data to plotting it to fitting statistical models. Shorter scripts are easier to manage, and they are easier to understand when you revisit your code years later. We will discuss project organization in First Steps with Dataframes.
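Within each of those shorter scripts, you can reuse a single name as the data moves through each step, rather than inventing a new version name every time. If a step turns out to be wrong, fix that line and rerun the script from the top. The sketch below uses the built-in anscombe dataset and a hypothetical name, first_pair; the square-bracket subsetting is only there to give each step something to do.

```r
# Start from the built-in anscombe dataset
first_pair <- anscombe

# Step 1: keep only the first x/y pair of columns (overwrites first_pair)
first_pair <- first_pair[, c("x1", "y1")]

# Step 2: sort the rows by x1 (overwrites first_pair again)
first_pair <- first_pair[order(first_pair$x1), ]

first_pair
```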
2.5 Exercises
2.5.1 Fundamental
- Give x the value 3. Then give it the value 5. Print x after each command to check its value.
- List all objects in the environment.
- Remove x from the environment.
2.5.2 Extended
- Make these object names syntactic, consistent, and meaningful. There is no one right answer. You can apply your own style and make decisions about what a name might mean.
income1
INCOME2
income2
3income
birth date
y
year_of_birth
state$of$residence
- Run this code, which will create 26 objects in your environment. Then, remove all of them. Hint: see ?rm.
for (i in 1:length(letters)) {
  assign(letters[i], i)
}