Stata for Students: Reading Data from a Spreadsheet or CSV File

This article is part of the Stata for Students series. If you are new to Stata we strongly recommend reading all the articles in the Stata Basics section.

Stata can use many kinds of data files. In this section we'll talk about how to import two of the most common kinds of data files: Excel spreadsheets and CSV (comma-separated variable) files.

If you want to carry out the examples for this section, click on the following links to download a GSS sample in Excel format and a GSS sample in CSV format, then move them to your U:\SFS folder as described in Managing Stata Files. Note that these files are subsets of the full GSS results. You can get the full data set from the GSS web site.

Importing Excel Spreadsheets

Normally you tell Stata what data set you want to use with the use command, but if the data set is an Excel spreadsheet, use import excel instead. Here's an example do file:

capture log close
log using imp.log, replace

clear all
set more off

import excel using gss2014, firstrow

save gss2014_from_excel, replace

log close

When you give the filename of the spreadsheet you do not need to include .xls or .xlsx, but if you do it needs to be the right one.

The firstrow option tells Stata that the first row of the spreadsheet contains the names of the variables. Otherwise it will name the variables A, B, C, etc.

There are other options that let you specify which which worksheet to read, or to only read in part of a spreadsheet so you can skip titles or notes that aren't really data. If you have a complicated spreadsheet you might click File, Import Excel, tweak the options in the dialog box you get until it can read the spreadsheet successfully, then copy the resulting command into your do file.

Importing CSV Files

A CSV file is really just a text file, but structured so that each line of text represents one observation and each variable is separated by a comma. Hence the name, comma-separated variable file. Text files that put a specific character between variables are also known as delimited files, and the Stata command to read them is import delimited.

capture log close
log using imp_csv.log, replace

clear all
set more off

import delimited using gss2014

save gss2014_from_csv, replace

log close

The import delimited command will try to figure out whether the first row contains variable names or data, and usually succeeds. If it gets it wrong, you can specify that the first row contains variable names with the varnames(1) option.

Importing an Excel spreadsheet or CSV file takes longer than loading a Stata data set, and there's no need to repeat that process every time you analyze the data. That's why these do files simply import a data set and save it as a Stata data set. You can then write a separate do file that loads the Stata data set and carries out your analysis.

Last Revised: 6/24/2016