R for Researchers: Introduction
R is an extremely powerful programming language for statistical computing and graphics. Its flexibility, extensibility, and lack of cost have contributed to its wide use in academic environments and among statisticians. R is a good choice for statistical projects large or small.
R is open source and is supported by an extensive user community. The R Development Core Team and CRAN are at the center of that community. The core team oversees the evolution of the base functionality that is included when R is installed. CRAN, the Comprehensive R Archive Network, is a repository of additional functionality, distributed as packages. CRAN packages are tested against each new version of the core packages and have user documentation available. Every CRAN package is supported by a member of the R user community. Issues with a package are reported to, and addressed by, that supporting member. This support provides a level of stability and quality to users. A great deal of additional functionality is available through CRAN.
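As a rough illustration of how packages relate to an R installation, the sketch below checks which packages from a list are not yet installed and installs only those. The helper name `missing_packages` is made up for this example and is not part of R or CRAN. The packages named here ship with R, so nothing is downloaded when the sketch runs. The packages actually used in this series are covered in the R Scripts article.

```r
# Hypothetical helper (not part of R or CRAN): return the names in `pkgs`
# that are not installed in the current library paths.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# "stats" and "utils" are included with every R installation,
# so they are used here only as safe placeholder names.
needed <- c("stats", "utils")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  # install.packages() downloads from a CRAN repository (needs network access).
  install.packages(to_install)
}

# Attach an installed package so its functions can be called by name.
library(stats)
```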
RStudio provides an integrated development environment (IDE) for R users. This IDE provides support for project organization, source control, and document generation. RStudio will help you write R code faster and more efficiently. The use of RStudio is integrated into this article series.
About This Series
The goal of the R for Researchers article series is to provide you with a solid foundation in the use of the R language and the RStudio development environment. You will be able to build on this foundation to become an expert R user.
R for Researchers includes the following articles:
- Introduction
- R Projects
- R Markdown
- R Scripts
- Data preparation
- Data exploration
- Data presentation
- Regression (ordinary least squares)
- Regression Diagnostics
- Regression (generalized linear models)
- Regression Inference
These articles can be grouped into three chapters. The first chapter, articles two through four, is an introduction to the RStudio tools that are useful for researchers. The tools include source control, integrating R results into documents, and the use of R projects and scripts for reproducible research. This is the background necessary to be productive writing R code. The second chapter, articles five through seven, covers importing data, data types and structures, numerical and graphical summaries of data, and formatting data summaries for publication. These tools will equip you to prepare data for analysis. The third chapter, articles eight through the end, covers the R functions used in building regression models.
The articles on RStudio tools and preparing data for analysis require little statistical knowledge. The regression articles demonstrate statistical regression methods and assume the reader has prior knowledge of those methods. If you have a background in regression techniques, you will be prepared to apply them using R when you are done. If you are new to some or all of the statistical techniques, these articles will help you gain experience with the R commands used for regression and the general approach to regression in R. The Regression (GLM) article and the GLM sections in the Regression Inference article may be skipped if you have no exposure to GLMs. We recommend prior training in a statistical technique before applying the technique using R.
These articles are meant to be read in order. The material in each article builds on prior articles. Data sets are used across multiple articles. Examples and exercises in the articles build on the work of prior examples and exercises. This provides experience with the workflow of doing an analysis.
Each lesson includes examples and exercises, with solutions given for most of them. If you get stuck on an exercise, it's probably best to review the provided solution. Reading other people's solutions is a great way to learn, even when you are able to do all the exercises yourself.
These articles cover some of the common commands used in data preparation and statistical analysis. It would be impossible to show all of what R can do, or what you might want to do, in these articles. With the skills from these articles you will have the background needed to add specific commands and options as you need them.
Materials for these articles
Instructions for loading the datasets needed to work the examples and exercises in these articles are included in the Data preparation article.
Instructions for installing the packages used in the examples and exercises in these articles are included in the R Scripts article.
R versus RStudio
Both R (sometimes called base R) and RStudio execute the same R commands. Scripts and programs written in either tool will run in the other. Thus in many ways they are equivalent.
RStudio's IDE provides a set of convenient tabs for accessing many tools, such as viewing plots or variables, package status and installation, and Git commands for source control. RStudio also supports the integration of R code and output into a variety of document formats. Base R can do all of this as well. However, these activities and tools must be managed and controlled individually in base R. You will likely find RStudio is a better environment for writing new R code.
Base R does have its uses as well. Base R is useful for batch jobs. Long jobs which are run on Linstat use base R. Base R is also useful for jobs which require management of very large amounts of data or for timing the execution of commands.
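As a small sketch of timing the execution of commands, `system.time()` reports how long an expression takes to evaluate. The simulated data and the script name mentioned in the comments are made up for illustration.

```r
# Time a block of commands. system.time() returns the user, system,
# and elapsed times in seconds.
timing <- system.time({
  x <- rnorm(1e5)       # simulated data, for illustration only
  sorted_x <- sort(x)
})

# Show only the elapsed (wall clock) time.
print(timing["elapsed"])

# A saved script can be run as a batch job from a terminal with Rscript,
# for example (hypothetical file name):
#   Rscript my_analysis.R
```

The elapsed time will vary from run to run and from machine to machine.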
Running R at the SSCC
The SSCC makes both R and RStudio available on Winstat and in our computer labs. R is also available on Linstat and Condor. Most SSCC members run R or RStudio on Winstat, but some jobs require different resources. For details about the capabilities of the SSCC's servers see Computing Resources at the SSCC.
Windows vs. Linux
R scripts run and act the same whether they are running on Windows or Linux. However, Linstat (the SSCC's Linux computing cluster) has much more memory than Winstat (the SSCC's Windows terminal server farm), and is better suited for long jobs. Running R jobs on Linstat is probably easier than you think: see Using Linstat to learn how. Winstat is where initial data exploration is typically done and scripts are developed (even if the scripts will be run on Linstat). Winstat is also well suited for running small to moderate size jobs.
The SSCC Condor flock is ideal for running very long jobs (jobs that will take days, weeks or even longer) or for running multiple jobs at the same time.
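One habit that helps a script behave the same on Windows and Linux is building file paths with `file.path()` rather than typing separators by hand. The sketch below is an illustration only; the directory and file names are hypothetical.

```r
# Report the type of operating system R is running on:
# "windows" on Windows, "unix" on Linux and macOS.
os_type <- .Platform$OS.type
cat("Running on:", os_type, "\n")

# file.path() joins path components with "/", which R accepts on both
# Windows and Linux. The names below are hypothetical.
data_file <- file.path("data", "survey.csv")
cat("Data file path:", data_file, "\n")
```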
Running R, RStudio and Git on your own computer
R, RStudio, and Git are all needed on your computer for this article series. They are all free downloads and run on both Windows and Mac. Using RStudio on your own computer can be very effective. It is important to remember that files that are not on the SSCC network are not backed up by the SSCC. We recommend that you regularly save your files to the SSCC network. RStudio can make this easy through its integration with source control; see the central repository section of the R Scripts article.
It is best to install Git prior to R and RStudio. This reduces the amount of configuration required in RStudio.
R can be installed from the R Project website. You will need to click on the download R link and select a CRAN repository to download R from. Note: do not install R into a directory that includes a space in its name. Installing R in the C:\R_Program_Files directory on a PC avoids the problem of a space in the path.
RStudio can be installed from the RStudio website. You will need to click on the Download RStudio link. Note: install RStudio in the same directory in which you installed R (see above).
Git can be installed from the Git website. This is all the Git functionality that is needed for this article series. Note: install Git in the default program directory and not with R and RStudio.
You can also install a Git GUI, which will provide a user interface for Git functionality beyond what is supported by RStudio. There are a number of free Git GUIs available. All of them can be used to support your work in RStudio. We recommend SourceTree for use on individual computers. Winstat has Git Extensions installed, for technical reasons unique to Winstat.
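After installing, a quick check from the R console can confirm the setup described in this section. This is a minimal sketch, not an official installation test. The helper `has_space` is made up for this example. Note that Git missing from the search path does not necessarily mean Git is not installed.

```r
# Hypothetical helper: TRUE if a path contains a space character.
has_space <- function(path) grepl(" ", path, fixed = TRUE)

# Report the installed R version and where R is installed.
r_home <- R.home()
cat("R version:", R.version.string, "\n")
cat("R is installed in:", r_home, "\n")
if (has_space(r_home)) {
  cat("Note: the R installation path contains a space.\n")
}

# Sys.which() returns an empty string when a program is not found
# on the search path.
git_path <- Sys.which("git")
if (nzchar(git_path)) {
  cat("Git found at:", git_path, "\n")
} else {
  cat("Git was not found on the search path.\n")
}
```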
Next: R Projects
Last Revised: 4/13/2015