SSCC - Social Science Computing Cooperative Supporting Statistical Analysis for Research

1 Data Objects

Data objects in R have three main characteristics:

  • Type
  • Structure
  • Class

A data object may be created, used, and discarded again all in the same step (anonymous data), or a data object may be saved for use in a later step.

1.1 Giving Names to Data

To store a data object, you give it a name. This is called assignment: we assign data values a name.

Assignment may be written several ways. The two most common are the “left arrow”, <-, and the single equals, =. The left arrow is generally preferred, because it is unambiguous. (The single equals sign is also used to name function arguments, as in rnorm(n = 10, mean = 5).) But the equals symbol is often used by people coming to R from other programming languages, so its use is fairly common. (See help("assignOps").)

x <- 7
y = 8

x + y
[1] 15
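The difference between the two symbols shows up inside a function call. The following is a small sketch, not a rule you need to memorize; the names draws, vals, and m are made up for illustration.

```r
# Inside a call, `=` matches a value to an argument name.
# It does not create an object named n or mean in your workspace.
draws <- rnorm(n = 10, mean = 5)
length(draws)
"n" %in% ls()     # FALSE in a fresh session

# The left arrow still means assignment, even inside a call,
# so `vals` is created as a side effect here.
m <- mean(vals <- c(2, 4, 6))
m
vals
```

Assigning inside a call, as in the last example, is legal but hard to read. It is shown here only to make the contrast visible.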

Assigning a name to data stores the data in your computer’s memory. You can list the objects available in your current session with

ls()
[1] "x" "y"

This will match what you have in the Environment pane in the top right corner of RStudio.

1.1.1 Good Names

There are a lot of naming conventions and advice to be found on the internet. Our free advice:

  • Use a consistent naming scheme. This will help you and anyone else who might need to read your code in the future.
  • Meaningful names are helpful. For instance, use “age” rather than “x”.
  • Short names are easier to read and type. Long names make your code harder to scan. There is a trade-off between having meaningful names and having short names.

Some naming rules:

  • Capitalization matters. (age, Age, AGE, and aGe are all different.)
  • Begin a name with a letter. (Try income2000 instead of 2000income.)
  • Keep names compact - no spaces.
  • Avoid most special characters (!@#$%^&*). The main exceptions are periods (.) and underscores (_), which are helpful in creating easy-to-read multi-word variables, like Petal.Length or birth_weight.
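One way to check a name against these rules is the base R function make.names(), which returns a repaired version of any name that breaks them. The sketch below uses made-up candidate names; a name that comes back unchanged is a syntactic name.

```r
# Capitalization matters: these are two separate objects.
age <- 30
Age <- 65
age == Age
# [1] FALSE

# make.names() shows how R would repair names that break the rules.
candidates <- c("income2000", "2000income", "birth weight", "birth_weight")
make.names(candidates)
# [1] "income2000"   "X2000income"  "birth.weight" "birth_weight"

# A name is syntactic when make.names() leaves it unchanged.
candidates == make.names(candidates)
# [1]  TRUE FALSE FALSE  TRUE
```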

(You will eventually encounter names that violate these rules, called non-syntactic names. For example, the base R function lm() (linear regression) assigns the name (Intercept) to a coefficient. Non-syntactic names just make life more difficult.)
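To see such a name in practice, here is a minimal sketch that fits a regression to the built-in mtcars data. The choice of data set and of the variables mpg and wt is an assumption made for illustration only.

```r
# Fit a simple regression using the built-in mtcars data.
fit <- lm(mpg ~ wt, data = mtcars)

# The first coefficient has a name containing parentheses.
names(coef(fit))
# [1] "(Intercept)" "wt"

# Used as a character string, the name works for indexing.
coef(fit)["(Intercept)"]

# Used as an object name, it must be wrapped in backticks.
`(Intercept)` <- unname(coef(fit)["(Intercept)"])
`(Intercept)`
```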

1.2 Removing Data

It is a good idea to clean up your workspace as you go, removing data objects that are no longer needed. This makes it easier to keep track of the key data objects you want to work with.

To clean up your workspace, use

remove(x)

ls()
[1] "y"

A common alias for remove() is rm(). You can use these functions to remove several objects at once. (See help("remove").)
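As a short sketch of removing several objects in one step, the example below uses four made-up names. Objects can be listed directly, or their names can be passed as a character vector through the list argument.

```r
height <- 170
weight <- 65
income <- 40000
region <- "Midwest"

# List several objects in one call.
rm(height, weight)

# Or give the names as a character vector.
rm(list = c("income", "region"))

# None of the four names remain in the workspace.
c("height", "weight", "income", "region") %in% ls()
# [1] FALSE FALSE FALSE FALSE
```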

1.3 Reusing Names

When you reuse an object name for assignment, you are throwing out the old data. This is considered a routine action, so there is no warning or error.

y
[1] 8
y <- c("red", "green", "blue")
y
[1] "red"   "green" "blue" 
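Reusing a name replaces everything about the old object, not just its values. The sketch below repeats the same reassignment and uses class() and length() to show that the new y has no connection to the old one.

```r
y <- 8
class(y)
# [1] "numeric"
length(y)
# [1] 1

# Reassigning y silently replaces the number with a character vector.
y <- c("red", "green", "blue")
class(y)
# [1] "character"
length(y)
# [1] 3
```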

1.4 Data Objects Exercises

  1. Assign the value 3 the name “v”, and the value 2 the name “w”. Then calculate \(v + w\).

  2. Bad Names - assign the value 7 the name “one”, and the value 2 the name “three”. Calculate \(one^{three}\). Never write code that looks like this!

  3. Capitalization - assign 8 to “a” and 3 to “A”. Calculate the mean of “a” and “A”.

    Here, “a” and “A” look different enough that this might be acceptable. However, “x” and “X” look similar enough that they would be a poor choice of names here.

  4. Try assigning a data value to the name “1$” (without the quotes). Try the name “1a” (again, no quotes). It is interesting that this gives two different error messages - any idea why?

    Remarkably, R allows you to use names like these. However, such non-syntactic names require you to use backticks (back quotes). In principle you could use Unicode symbols as names, like \(\mu\) or \(\sigma\). However, current R code editing software does not make this easy - maybe in the future?

  5. Tidy up by removing all the data objects from your workspace.