Please sign the sign-in sheet for us.
The course material can be found at https://www.ssc.wisc.edu/sscc/pubs/DWE/
If you plan to use your own Windows laptop, you need to check the following.
Run the following code in the console:
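import pandas as pd
import os
# Small test data frame with two numeric columns
df = (
    pd.DataFrame(data={
        'A': [1, 2, 3, 4, 5],
        'B': [5, 3, 2, 1, 4]}))
# Tell Qt where Anaconda keeps its platform plugins; the r prefix keeps the backslashes literal
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = r'C:\ProgramData\Anaconda3\Library\plugins\platforms'
import plotnine as p9
import matplotlib.pyplot as plt
# Scatter plot of B against A; when run in the console, a figure should appear
(p9.ggplot(data=df, mapping=p9.aes(x='A', y='B')) +
    p9.geom_point() +
    p9.theme_bw())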
If the code above does not run without errors, you will need to use Winstat.
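A failure in the test above can come from a missing package or from the Qt display setup. As a minimal sketch, assuming a standard Python 3 environment, the standard-library function importlib.util.find_spec can report which packages are not installed, without importing them or opening a plot window. The package list is an assumption based on the imports in the test code, and missing_packages is a hypothetical helper, not part of the course materials.

```python
import importlib.util

# Assumed list: the top-level packages imported by the test code above
REQUIRED = ["pandas", "plotnine", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found in the current Python environment.

    find_spec returns None for a top-level module that is not installed,
    and it does not import the module it finds.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If nothing is reported missing but the plot still fails, the problem is more likely the display setup, and Winstat remains the fallback.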
If you are using Winstat, you will need to do the following.
Open a web browser.
Log in to Winstat using your SSCC account.
If you do not have an SSCC account, we have a guest account you can use. See one of the SSCC staff in the room for a guest account and password.
The purpose of the course is to explain how (structured) data is prepared for further analysis. The intent is to focus on the data.
Applying these data wrangling skills requires some programming, so the course also covers the programming skills needed for data preparation.
Python is the programming language used in this class. Python, like R, has many packages that provide additional functionality. The course focuses primarily on the pandas package and teaches it using a "modern pandas" approach. While you will learn some Python skills, this is not a course to teach you to be a programmer, and much about Python and programming is not covered. When you finish this course, you will be able to use pandas to wrangle data.
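"Modern pandas" style commonly means writing a transformation as one chain of DataFrame methods inside parentheses, instead of a series of intermediate variables. The sketch below is an illustrative example under that assumption, not the course's own code: it rebuilds the small df from the setup check, and the column name gap and the particular steps are arbitrary choices.

```python
import pandas as pd

# Same small data frame as in the setup check
df = pd.DataFrame(data={
    'A': [1, 2, 3, 4, 5],
    'B': [5, 3, 2, 1, 4]})

# One chained expression: each method returns a new DataFrame for the next step
result = (
    df
    .assign(gap=lambda d: d['B'] - d['A'])  # add a computed column
    .query('gap >= 0')                      # keep rows where B is at least A
    .sort_values('gap')                     # order by the new column
    .reset_index(drop=True)                 # renumber rows 0, 1, ...
)
print(result)
```

Only the rows with A equal to 1 and 2 satisfy the filter, and sorting by gap puts the A = 2 row first. Because df itself is never modified, each step can be commented out or reordered while exploring.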
This course will use both RStudio and Spyder to demonstrate the use of Python. RStudio allows the integration of R and Python code (even in the same script) and integrates Markdown, bookdown, and Git into the IDE. Spyder is one of the IDEs included with Anaconda. It is a more native Python IDE.
The data skills covered in this course are part of what a data scientist does. As with programming skills, this course is not meant to prepare you to be a data scientist. Rather, it teaches you to apply some of the tools that are used and built by data scientists.
The course is organized into chapters and sections. Each section is a discussion of one particular data wrangling skill. A section generally starts with a discussion of the programming or data skills that will be used, followed by examples and practice problems. Please stop me whenever you have questions.
When working on problems, you will use Post-it notes to signal your status to me.
Red means you have a question or need help.
Yellow means you are working and doing alright on your own.
Blue means you are done.
You should have a post-it note up at all times when the class is working on problems.
Class will start at 2:15 each afternoon. If you are late, do your best to catch up on your own. At the next practice time, I can help you as time permits.
Comments and suggestions can be written on your Post-it notes and left for me at the end of class. I would appreciate hearing how the class is going for you, what is working well, and any suggestions for improvement.
Please make sure you have signed the sign-in sheet before you leave each day. Thank you.
We will do the following steps together as a class.
Open RStudio
Create an RStudio Project for the course material.
Copy the datasets folder into your project folder.
If you are on the SSCC network, the datasets folder is in the following folder:
X:Tutorials
If you are not on the SSCC network, the datasets folder can be downloaded from,
Using the file explorer, create a scripts folder.
Using the file explorer, create an exercises folder.
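The same two folders can also be created from Python if that is more convenient. This is a minimal sketch using the standard-library pathlib module; make_course_folders is a hypothetical helper, and it assumes you pass it the path of your project folder (for example "." when the console's working directory is the project folder). It does not copy the datasets folder.

```python
from pathlib import Path


def make_course_folders(project_dir, names=("scripts", "exercises")):
    """Hypothetical helper: create the course subfolders inside project_dir.

    exist_ok=True makes it safe to run again; parents=True also creates
    project_dir itself if it does not exist yet.
    """
    project = Path(project_dir)
    created = []
    for name in names:
        folder = project / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

For example, make_course_folders(".") creates scripts and exercises in the current working directory and returns their paths.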