SSCC - Social Science Computing Cooperative Supporting Statistical Analysis for Research

4.3 Running regressions

4.3.1 Linear models - Ordinary Least Squares models (OLS)

The lm() function fits OLS models.

  • Syntax and use of the lm() function

    lm(formula, weights = w, data = dataFrame)

    Returns a model object of class "lm". This is a list of components resulting from the model fit (coefficients, residuals, fitted values, and so on).

    The formula parameter is of the form described above.

    The data parameter is optional. dataFrame names the data.frame that contains the response and/or regressors. Variables not found in the data.frame are looked up in the environment of the formula, which is usually the environment from which lm() is called.

    The weights parameter is optional. When present, lm() fits the model by weighted least squares using the vector w as the weights, that is, it minimizes sum(w * e^2), where e are the residuals.
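As a rough check on what the weights parameter does, the sketch below compares lm()'s weighted fit with the weighted least squares solution computed directly from the normal equations, (X'WX)^-1 X'Wy. It uses the built-in mtcars data rather than sleepstudy, and the weight vector is arbitrary and chosen only for illustration. Because w is not a column of mtcars, lm() finds it in the calling environment, as described for the data parameter.

```r
# Built-in mtcars data; the weights are arbitrary and purely illustrative
w <- mtcars$cyl / 4

# Weighted least squares fit with lm(); w is found outside the data frame
fit <- lm(mpg ~ wt + hp, weights = w, data = mtcars)

# The same estimates from the weighted normal equations: (X'WX)^-1 X'Wy
X <- model.matrix(~ wt + hp, data = mtcars)
y <- mtcars$mpg
beta <- solve(t(X) %*% (w * X), t(X) %*% (w * y))  # w * X scales row i by w[i]

print(cbind(lm = coef(fit), by_hand = drop(beta)))

# The model object is a list with class "lm"
print(class(fit))
print(names(fit))
```

In practice the list components are usually read through extractor functions such as coef(), residuals(), and fitted() rather than by name.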

The following example uses lm() to fit an OLS model to the sleepstudy data from the lme4 package.

  1. Fit an OLS model.

    library(lme4)  # provides the sleepstudy data
    mod <- lm(Reaction ~ Days + Subject, data = sleepstudy)
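A minimal sketch of working with the returned object, assuming lme4 is installed. Because Subject is a factor with 18 levels, lm() codes it as 17 indicator (dummy) variables relative to the first subject, so the model has 19 coefficients in all. The extractor functions shown are standard stats functions; which quantities to report is up to the analyst.

```r
library(lme4)  # provides the sleepstudy data

mod <- lm(Reaction ~ Days + Subject, data = sleepstudy)

# Subject is a factor, so it enters the model as indicator variables
nlevels(sleepstudy$Subject)   # number of subjects
length(coef(mod))             # intercept + Days + (subjects - 1) dummies

# Extractor functions are preferred over indexing the list directly
coef(mod)["Days"]             # estimated change in reaction time per day
summary(mod)$r.squared        # proportion of variance explained
confint(mod, "Days")          # 95% confidence interval for the Days effect
head(residuals(mod))
```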

4.3.2 Generalized linear models (GLM)

The glm() function fits generalized linear models (GLMs). The family parameter specifies the variance and link functions used in the model fit. The variance function specifies how the variance of the response depends on its mean. The link function specifies the transformation applied to the mean of the response, g(μ), which is then modeled as a linear function of the regressors; the response itself is not transformed. As an example, the poisson family uses the log link and the variance function V(μ) = μ. A GLM is therefore defined by both the formula and the family.
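In R a family is itself an object that bundles these pieces together. The sketch below inspects the poisson and binomial family objects from base R's stats package; linkfun, linkinv, and variance are standard components of those objects.

```r
# A family object bundles the link and variance functions
fam <- poisson()
fam$family               # "poisson"
fam$link                 # "log", the canonical link

# The link maps the mean mu to the linear predictor eta = g(mu) ...
eta <- fam$linkfun(5)    # log(5)
# ... and the inverse link maps eta back to the mean
mu <- fam$linkinv(eta)   # back to 5

# Variance as a function of the mean: V(mu) = mu for the Poisson
fam$variance(5)

# Compare the binomial family: logit link and V(mu) = mu * (1 - mu)
bin <- binomial()
bin$link                 # "logit"
bin$variance(0.2)        # 0.2 * 0.8
```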

The default link function for a family can be changed by passing a link argument to the family function. For example, if the response variable is non-negative and its variance is proportional to its mean, you might use the log link with the quasipoisson family function. This would be specified as

family = quasipoisson(link = "log")

The variance function is determined by the family. Because log is the default link for quasipoisson (and the canonical link for poisson), naming it here only makes the default explicit; a non-default link is requested the same way, for example quasipoisson(link = "identity").
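To check that writing the log link explicitly gives the same model as the default quasipoisson(), and to show how quasipoisson differs from poisson, the sketch below fits the built-in warpbreaks counts (chosen only for illustration) three ways. The point estimates coincide; quasipoisson differs in estimating the dispersion from the data instead of fixing it at 1, which scales the standard errors by the square root of that estimate.

```r
# warpbreaks: counts of yarn breaks, a built-in data set used for illustration
m_default <- glm(breaks ~ wool + tension, family = quasipoisson(),
                 data = warpbreaks)
m_log     <- glm(breaks ~ wool + tension, family = quasipoisson(link = "log"),
                 data = warpbreaks)
m_pois    <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Naming the default link explicitly gives the same fit
all.equal(coef(m_default), coef(m_log))

# quasipoisson and poisson share the point estimates ...
all.equal(coef(m_log), coef(m_pois))

# ... but quasipoisson estimates the dispersion rather than fixing it at 1
summary(m_pois)$dispersion
summary(m_log)$dispersion

# so its standard errors are the poisson ones scaled by sqrt(dispersion)
se_ratio <- sqrt(diag(vcov(m_log))) / sqrt(diag(vcov(m_pois)))
se_ratio
```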

  • Syntax and use of the glm() function

    glm(formula, family = family, weights = w, data = dataFrame)

    Returns a model object of class "glm" (which also inherits from "lm"). Like the lm() result, it is a list of components resulting from the model fit.

    The formula, data, and weights parameters are as defined for the lm() function above.

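    The family parameter defines the family used by glm(); if it is omitted, the gaussian family is used. It can be given as a family function (binomial), a call to one (binomial(link = "probit")), or a character string ("binomial"). Some common families are binomial, poisson, gaussian, quasi, quasibinomial, and quasipoisson.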
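One way to see how glm() relates to lm(): with the default gaussian family and its identity link, the fit is ordinary least squares, or weighted least squares when weights are given. The sketch below, using the built-in mtcars data and an arbitrary weight vector purely for illustration, checks that glm() and lm() return the same coefficients.

```r
# With the default family (gaussian, identity link) glm() fits the same model as lm()
ols <- lm(mpg ~ wt + hp, data = mtcars)
gl  <- glm(mpg ~ wt + hp, data = mtcars)   # family = gaussian is the default

all.equal(coef(ols), coef(gl))

# The same holds for a weighted fit; the weights here are arbitrary
w <- mtcars$cyl / 4
all.equal(coef(lm(mpg ~ wt + hp, weights = w, data = mtcars)),
          coef(glm(mpg ~ wt + hp, weights = w, data = mtcars)))

# The family used is stored in the fitted object
family(gl)
```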

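The following example uses glm() to fit a logistic regression to the Cowles data set, which is available after loading the car package (the data are stored in the companion carData package).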

  1. Fit a GLM.

    library(car)  # attaches carData, which provides the Cowles data
    modglm <- glm(volunteer ~ sex + extraversion * neuroticism,
                  family = binomial,
                  data = Cowles)
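A sketch of reading the binomial fit, assuming car (and therefore carData) is installed. In the formula, extraversion * neuroticism expands to both main effects plus their interaction. Because volunteer is a factor, glm() treats its first level as failure and models the probability of the other level. The coefficients are on the log-odds (logit) scale, and predict() with type = "response" applies the inverse link to give probabilities.

```r
library(car)  # attaches carData, which provides the Cowles data

modglm <- glm(volunteer ~ sex + extraversion * neuroticism,
              family = binomial,
              data = Cowles)

# The first level is treated as failure, the second as success
levels(Cowles$volunteer)

# Coefficients are on the log-odds scale; exponentiate for odds ratios
exp(coef(modglm))

# Fitted probabilities: the inverse logit applied to the linear predictor
p_hat <- predict(modglm, type = "response")
eta   <- predict(modglm, type = "link")
all.equal(p_hat, plogis(eta))
summary(p_hat)
```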