Data Wrangling in R
“Data wrangling” is the process of preparing data for analysis. If you want to model, plot, or even make tables of summary statistics from your data, you will need the skills taught in this book. These materials are regularly taught as a four-day workshop. Register here.
This book will teach you fundamental data wrangling skills such as:
- standardizing capitalization in text data
- recoding categorical variables
- changing variable names
- combining multiple datasets into one
…and much more.
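As a rough preview of what these tasks can look like, here is a minimal sketch in base R. The data and the names (`people`, `scores`, `dept`) are invented for illustration, and the chapters themselves may use different functions for the same jobs.

```r
# Hypothetical example data, invented for illustration
people <- data.frame(
  ID   = c(1, 2, 3),
  dept = c("sales", "SALES", "Hr")
)
scores <- data.frame(
  ID    = c(1, 2, 3),
  score = c(90, 85, 70)
)

# Standardize capitalization in text data
people$dept <- toupper(people$dept)

# Recode a categorical variable: "HR" becomes "HUMAN RESOURCES"
people$dept[people$dept == "HR"] <- "HUMAN RESOURCES"

# Change a variable name: "dept" becomes "department"
names(people)[names(people) == "dept"] <- "department"

# Combine two datasets into one by their shared ID column
combined <- merge(people, scores, by = "ID")
combined
```

Each step changes the data in place and leaves the original rows intact, so you can inspect `people` after any line to see what that step did.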
This book is written for applied researchers, so technical details on R’s implementation are omitted. For example, for our purposes, both “integer” and “double” are just “numeric” data.
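To see what this simplification means in practice, the following sketch (the object names are arbitrary) shows that R stores the two values differently but treats both as numeric.

```r
whole   <- 3L    # the L suffix asks R to store an integer
decimal <- 3.5   # numbers without the suffix are stored as doubles

typeof(whole)      # "integer"
typeof(decimal)    # "double"

# Both count as numeric, which is the only distinction this book relies on
is.numeric(whole)    # TRUE
is.numeric(decimal)  # TRUE

# Arithmetic mixes the two without any extra work
total <- whole + decimal
total              # 6.5
```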
This book assumes you know the basics of the RStudio interface and the R language, such as:
- where to write and run code
- how to find documentation for a function
- how to install packages
If you do not know how to do those things, read the R Basics with RStudio materials or take the Introduction to R workshop where those materials are taught.
0.1 How to Use this Book
This book is structured so that you can work straight through it. The later chapters rely on skills from the earlier chapters. After you complete the book, you will have the basic skills needed for a wide range of data wrangling tasks.
If you look at the navigation bar on the left, you will see that we do not begin to use dataframes until chapter 10. That is intentional. This book progresses from small to big. You will start with small pieces of data that you can see in full, and gain confidence manipulating them. Then you will progress to vectors, which are series of individual data points. After learning how to work with different kinds of vectors, you will finally start working with dataframes. Be patient. Everything you learn in the first nine chapters is necessary for working with dataframes. Wax on, wax off.
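The small-to-big progression can be pictured with a short sketch. The values and names below are made up for illustration and are not taken from the chapters.

```r
# A single value
height <- 170

# A vector: a series of individual values of the same kind
heights   <- c(170, 165, 180)
names_vec <- c("Ana", "Ben", "Cy")

# A dataframe: vectors of equal length placed side by side as columns
people <- data.frame(name = names_vec, height = heights)

# Each column of a dataframe is itself a vector,
# so anything that works on a vector works on a column
people$height
mean(people$height)
```

This is why the vector chapters come first: once you are comfortable with `heights`, working with `people$height` requires nothing new.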
This book is also intended to be a reference. Each chapter contains a lot of material. You will forget much of it, and that is okay. Bookmark this book and return to it later when you have questions.
0.2 How the Chapters are Structured
Each chapter of this book has four sections:
- Warm-up: Exercises that introduce you to some of the concepts in the chapter. Some will ask you to use your current skills to solve a problem that is more easily solved with that chapter’s materials, while others illustrate situations where the materials will become useful.
- Outcomes: An overview of the objectives, purpose, skills, and functions for each chapter.
- Materials: Examples of R code, explanation of how the code works, and discussions of when the code is useful in real-world tasks.
- Exercises: Opportunities to practice the essential skills from each chapter. No solutions are provided so that you do not shortcut the learning process. If you get stuck, take a break and try again later. If you are still stuck, email the help desk (link below).
The Outcomes and Exercises sections refer to “fundamental” and “extended” skills and exercises. Fundamental skills are necessary for most data wrangling tasks, while extended skills are either quicker ways of completing fundamental tasks or less commonly needed in data wrangling. You should complete at least the fundamental exercises in each chapter.