This article is part of the Stata for Students series. If you are new to Stata, we strongly recommend reading all the articles in the Stata Basics section.
The describe command gives you a variety of useful information about your data set.
Setting Up
If you plan to carry out the examples in this article, make sure you've downloaded the GSS sample to your U:\SFS folder as described in Managing Stata Files. Then create a do file called desc.do in that folder as described in Doing Your Work Using Do Files and start with the following code:
capture log close
log using desc.log, replace
clear all
set more off
use gss_sample
// do work here
log close
If you plan on applying what you learn directly to your homework, create a similar do file but have it load the data set used for your assignment.
Using describe
If you run describe all by itself, you'll get a description of all the variables in the data set:
describe
This produces the following output:
Contains data from U:\sfs\gss_sample.dta
  obs:           254
 vars:           895                          22 Jun 2016 15:52
 size:       277,622
-----------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------------------------------------------------------
prestg10        byte    %8.0g      LABA       Rs occupational prestige score (2010)
sppres10        byte    %8.0g      LABA       Spouse occupational prestige score (2010)
papres10        byte    %8.0g      LABA       Father's occupational prestige score (2010)
mapres10        byte    %8.0g      LABA       Mother's occupational prestige score (2010)
prestg105plus   byte    %8.0g      LABA       Rs occupational prestige score using threshold method (2010)
sppres105plus   byte    %8.0g      LABA       Spouse occupational prestige score using threshold method (2010)
papres105plus   byte    %8.0g      LABA       Father's occupational prestige score using threshold method (2010)
mapres105plus   byte    %8.0g      LABA       Mother's occupational prestige score using threshold method (2010)
sei10           double  %12.0g     LABB       R's socioeconomic index (2010)
spsei10         double  %12.0g     LABB       R's spouse's socioeconomic index (2010)
pasei10         double  %12.0g     LABB       R's father's socioeconomic index (2010)
masei10         double  %12.0g     LABB       R's mother's socioeconomic index (2010)
sei10educ       double  %12.0g     LABB       Percentage of some college educ in OCC10 based on ACS 2010
spsei10educ     double  %12.0g     LABB       Percentage of some college educ in SPOCC10 based on ACS 2010
pasei10educ     double  %12.0g     LABB       Percentage of some college educ in PAOCC10 based on ACS 2010
masei10educ     double  %12.0g     LABB       Percentage of some college educ in MAOCC10 based on ACS 2010
This is just the beginning of the output: with 895 variables, the describe output for the GSS is very long. Remember that you can press q at a --more-- prompt, or click the red stop sign (Break) button, to make Stata stop what it is doing.
A few highlights of this output:
- This data set has 254 observations, which in this case means 254 people who responded to the General Social Survey. It is a subset of the complete GSS results.
- It has 895 variables.
- The variable name is what you need to use in your commands.
- The variable label can help you understand what each variable means, though it's no substitute for the complete GSS documentation.
- All of the variables shown have something in the value label column. Commands like tab display the value labels by default, but your code must refer to the underlying numeric values.
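Since the full listing is so long, you may sometimes want only the summary information covered in the first two points. A minimal sketch: the short option of describe prints just the header (observations, variables, size) and skips the per-variable table. It assumes gss_sample.dta is in your current working directory, as in the setup above. describe also saves the counts it reports in r(N) and r(k), which a do file can use later.

```stata
* Assumes gss_sample.dta is in the current working directory (e.g. U:\SFS)
use gss_sample, clear

* Print only the header: number of observations, variables, and size
describe, short

* describe saves the counts it reports in r(); display them
display "Observations: " r(N) "   Variables: " r(k)
```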
If you want information about a specific variable, put its name right after describe:
describe sex
This produces:
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------
sex             byte    %8.0g      SEX        RESPONDENTS SEX
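The value label column shows that sex uses a value label named SEX. Because your commands (for example, an if condition) must use the numeric codes rather than the label text, it helps to see which codes the labels are attached to. A minimal sketch, again assuming gss_sample.dta is in your current working directory:

```stata
* Assumes gss_sample.dta is in the current working directory (e.g. U:\SFS)
use gss_sample, clear

* List each numeric code and the text the SEX value label attaches to it
label list SEX

* tabulate shows the value labels by default...
tabulate sex
* ...while the nolabel option shows the underlying numeric codes instead
tabulate sex, nolabel
```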
With so many variables, it can be hard to find what you need in the GSS. One useful trick:
describe *edu*
The asterisks are wildcards that match any characters (including none), so this describes all variables whose names contain "edu" anywhere. The output is:
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------------------------------------
sei10educ       double  %12.0g     LABB       Percentage of some college educ in OCC10 based on ACS 2010
spsei10educ     double  %12.0g     LABB       Percentage of some college educ in SPOCC10 based on ACS 2010
pasei10educ     double  %12.0g     LABB       Percentage of some college educ in PAOCC10 based on ACS 2010
masei10educ     double  %12.0g     LABB       Percentage of some college educ in MAOCC10 based on ACS 2010
coneduc         byte    %8.0g      LABAB      CONFIDENCE IN EDUCATION
educ            byte    %8.0g      LABAJ      HIGHEST YEAR OF SCHOOL COMPLETED
immeduc         byte    %8.0g      IMMEDUC    LEGAL IMMIGRANTS SHOULD HAVE SAME EDUCATION AS AMERICANS
inteduc         byte    %8.0g      INTEDUC    INTERESTED IN LOCAL SCHOOL ISSUES
maeduc          byte    %8.0g      LABAJ      HIGHEST YEAR SCHOOL COMPLETED, MOTHER
nateduc         byte    %8.0g      LABBL      IMPROVING NATIONS EDUCATION SYSTEM
nateducy        byte    %8.0g      LABBL      EDUCATION -- VERSION Y
paeduc          byte    %8.0g      LABAJ      HIGHEST YEAR SCHOOL COMPLETED, FATHER
sexeduc         byte    %8.0g      SEXEDUC    SEX EDUCATION IN PUBLIC SCHOOLS
speduc          byte    %8.0g      LABAJ      HIGHEST YEAR SCHOOL COMPLETED, SPOUSE
usedup          byte    %8.0g      USEDUP     HOW OFTEN DURING PAST MONTH R FELT USED UP
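This is not a complete list of variables related to education, since an education variable whose name lacks "edu" will not match. It also includes one variable that has nothing to do with education, usedup, which matches only because its name happens to contain the letters "edu". But if you're interested in looking at education issues using the GSS, it's a start.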
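A wildcard in describe only looks at variable names. Stata's lookfor command searches variable labels as well as names (ignoring case), so it can turn up variables a name pattern misses. A minimal sketch; the search word school is only an illustration, and the block assumes gss_sample.dta is in your current working directory:

```stata
* Assumes gss_sample.dta is in the current working directory (e.g. U:\SFS)
use gss_sample, clear

* Search both variable names and variable labels for "school"
lookfor school

* lookfor leaves the names of the matching variables in r(varlist)
display "`r(varlist)'"
```

As with describe *edu*, check anything it finds against the GSS documentation before using it.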
Complete Do File
The following is a complete do file for this section.
capture log close
log using desc.log, replace
clear all
set more off
use gss_sample
describe
describe sex
describe *edu*
log close
Last Revised: 7/18/2016