Stata Workshop: Importing and Combining Large Numbers of Data Files

In this Stata workshop we'll take on a task that many SSCC researchers have faced: importing a large number of data files and combining them into a single data set. Just to make things interesting, we'll add a couple more real-world challenges: the data files will be fixed format text files rather than the more familiar and easier-to-use delimited files, and they'll contain more than one type of record.

Those who wish to attend should have a solid understanding of basic Stata syntax, such as you'll get from our Stata for Researchers class. You should also be familiar with loops and macros, so if you've never used them, plan on taking our Stata Programming class first. This workshop will extend what you learn in that class to cover loops over files.

You'll also learn how to read fixed format text files into Stata, how to use "branching if" statements (very different from the usual "subsetting if"), and how to detect and handle errors.

Since jobs like this can take a long time, you'll learn how to run them on Linstat, the SSCC's Linux computing cluster, where they'll run faster and you won't have to stay logged in while they run. You'll also learn how to use HTCondor to split a job into smaller parts and run them at the same time on multiple servers.

By design, many of these skills will be useful for any larger-scale project. You'll also gain practical experience in getting things done in Stata.

Instructor: Dimond
Room: 3218 Sewell Social Sciences Building
Date: 2/19
Time: 2:00 - 3:15
Semester: spring19