Please sign the sign-in sheet.
The course material can be found at https://www.ssc.wisc.edu/sscc/pubs/DWE/
If you plan to use your own Windows laptop, you need to check the following.
Run the following code in the R console on your Windows machine.
library(tidyverse)
If the above code does not run without error, you will need to use one of the classroom computers for at least today's class.
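As an optional check, the sketch below tests whether the tidyverse package is installed without stopping on an error. It uses only base R. The variable name `tidyverse_installed` is made up for illustration and is not part of the course material.

```r
# Check whether the tidyverse package can be found and loaded (base R only).
# requireNamespace() returns TRUE or FALSE instead of raising an error.
tidyverse_installed <- requireNamespace("tidyverse", quietly = TRUE)

if (tidyverse_installed) {
  message("tidyverse was found; library(tidyverse) should run on this machine.")
} else {
  message("tidyverse was not found; plan to use a classroom computer today.")
}
```

If the message reports that the package was not found, the `library(tidyverse)` call above will fail as well.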
If you are using one of the classroom computers, you will need to do the following.
Open a web browser.
Log in to Winstat using your SSCC account.
If you do not have an SSCC account, we have a guest account you can use. See one of the SSCC staff in the room for a guest account and password.
The purpose of the course is to explain how (structured) data is prepared for further analysis. The intent is to focus on the data.
Programming skills are needed to apply these data wrangling skills. The course will cover the programming skills that are needed to do data preparation.
R is the programming language that will be used in this class. R, like Python, has many packages that provide additional functionality. This course primarily teaches the tidyverse package. While you will learn some R skills, this course is not designed to make you an R programmer. There is a lot about R and programming that is not covered. You will be able to use the tidyverse to wrangle data when you finish this course.
This course will use RStudio to demonstrate the use of the tidyverse. RStudio allows the integration of R and Python code (even in the same script) and integrates markdown, Bookdown, and git into the IDE.
The data skills that will be covered in this course are part of what a data scientist does. As with programming skills, this course is not meant to prepare you for being a data scientist. Rather, this course teaches you to apply some of the tools that are used and built by data scientists.
The course is organized into chapters and sections. Each section is a discourse on one particular data wrangling skill. Each section generally starts with a discussion of programming or data skills that will be used and is followed by examples and practice problems. Please stop me whenever you have questions.
We will use Post-it notes to signal your status to me while you are working on problems.
Red means you have a question or need help.
Yellow means you are working and doing alright on your own.
Blue means you are done.
You should have a post-it note up at all times when the class is working on problems.
Class will start at 1. If you are late, do your best to get caught up on your own. At the next practice time I can help you as time permits.
Comments and suggestions can be written on your Post-it notes and left for me at the end of class. I would appreciate hearing how the class is going for you, what is working well for you, and suggestions for improvements.
Please make sure you have signed the sign-in sheet before you leave each day. Thank you.
We will do the following steps together as a class.
Open RStudio
Create an RStudio Project for the course material.
Copy the datasets folder into your project folder.
If you are on the SSCC network, the datasets folder is in the following folder.
X:\SSCC Tutorials\DWE
If you are not on the SSCC network, the datasets folder can be downloaded from,
Using the file explorer, create a scripts folder.
Using the file explorer, create an exercises folder.
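The two folders can also be created from the R console instead of the file explorer. This is a minimal sketch, assuming the working directory is your project folder (which is the default when an RStudio Project is open). The helper name `create_course_folders` is hypothetical and not part of the course material.

```r
# Create the scripts and exercises folders inside a project folder.
# project_dir defaults to the current working directory (an assumption:
# the RStudio Project for this course is open).
create_course_folders <- function(project_dir = ".") {
  folders <- file.path(project_dir, c("scripts", "exercises"))
  for (folder in folders) {
    # Only create a folder that does not already exist.
    if (!dir.exists(folder)) {
      dir.create(folder)
    }
  }
  invisible(folders)
}

create_course_folders()
```

Running the function a second time is harmless, since existing folders are left untouched.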