4.4 Variable selection functions

SSCC - Social Science Computing Cooperative

Supporting Statistical Analysis for Research

R supports a number of commonly used criteria for selecting variables. These include BIC, AIC, F-tests, likelihood ratio tests, and adjusted R-squared. Adjusted R-squared is reported in the summary of the model object and will be covered along with the summary() function below.
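As a small illustration, the sketch below reads adjusted R-squared from summary() and recomputes it from the usual formula, 1 - (1 - R^2)(n - 1)/(n - p - 1). It uses the built-in mtcars data and a model named fit, both chosen here for illustration. They are not the models used elsewhere in this section.

```r
# Illustrative model on the built-in mtcars data (not the section's `mod`)
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)

n <- nobs(fit)               # number of observations used in the fit
p <- length(coef(fit)) - 1   # number of predictors, excluding the intercept

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)

c(reported = s$adj.r.squared, manual = adj_manual)
```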

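The drop1() function fits and compares every model that can be formed by dropping a single term from the current model. The add1() function does the same for every model formed by adding a single term. The step() function repeats these single-term deletions and additions, at each step keeping the change that lowers AIC the most, and stops when no single change lowers AIC any further. The result is a local AIC minimum, not necessarily the best model overall.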
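To make the "one candidate model per term" idea concrete, the sketch below runs drop1() and add1() on an illustrative mtcars model and then refits the "drop hp" model by hand. The model and the candidate terms qsec and am are assumptions chosen for illustration. extractAIC() is used for the manual check because it computes AIC on the same scale as drop1() for linear models.

```r
# Illustrative model on the built-in mtcars data (not the section's `mod`)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# drop1(): one row per single-term deletion, plus the current model (<none>)
d1 <- drop1(fit)
print(d1)

# Refit the "drop hp" model by hand; its AIC should match the hp row of d1
no_hp <- update(fit, . ~ . - hp)
manual_aic <- extractAIC(no_hp)[2]

# add1(): one row per single-term addition drawn from the scope formula
a1 <- add1(fit, scope = ~ . + qsec + am)
print(a1)
```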

Syntax for the drop1(), add1(), and step() functions.

drop1(modelObj, test = test)
add1(modelObj, scope = scope, test = test)
step(modelObj, scope = scope, test = test, direction = direction)

The scope parameter is a formula giving the largest model to consider when adding terms. For step(), scope may also be a list with lower and upper formulas that bound the search.

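The test parameter can be set to "F" (appropriate for linear models fit with lm()) or "LRT" (for GLMs). If test is not given, no test is performed and terms are compared on AIC alone; the AIC column is reported in either case. step() always selects on AIC, so a test only adds a column to the printed steps.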

The direction parameter can be "both" (the default), "backward", or "forward".
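A minimal sketch of step() with the scope and direction parameters, again on the built-in mtcars data with the candidate terms wt, hp, qsec and am chosen purely for illustration. The second call uses step()'s k argument: the default k = 2 gives the AIC penalty, and k = log(n) gives the BIC penalty instead.

```r
# Illustrative models on the built-in mtcars data (not the section's models)
null_fit <- lm(mpg ~ 1, data = mtcars)
full_fit <- lm(mpg ~ wt + hp + qsec + am, data = mtcars)

# Forward selection: start from the intercept-only model and consider
# adding terms up to the upper model given by `scope`
fwd <- step(null_fit, scope = ~ wt + hp + qsec + am,
            direction = "forward", trace = 0)
formula(fwd)

# Backward elimination from the full model, using the BIC penalty k = log(n)
bwd <- step(full_fit, direction = "backward",
            k = log(nobs(full_fit)), trace = 0)
formula(bwd)
```

Setting trace = 0 suppresses the step-by-step tables; leave it at its default to see the drop1()/add1()-style output at each step.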

The following example performs an F-test for each term of the OLS model from above, and a likelihood ratio test for each of several terms that could be added to the GLM model from above.

Using drop1() and add1().

drop1(mod, test = "F")

Single term deletions

Model:
Reaction ~ Days + Subject
        Df Sum of Sq    RSS    AIC F value    Pr(>F)    
<none>               154634 1254.0                      
Days     1    162703 317336 1381.5 169.401 < 2.2e-16 ***
Subject 17    250618 405252 1393.5  15.349 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

add1(modglm, scope = ~ (sex + extraversion + neuroticism)^2, test = "LRT")

Single term additions

Model:
volunteer ~ sex + extraversion * neuroticism
                 Df Deviance    AIC      LRT Pr(>Chi)
<none>                1897.4 1907.4                  
sex:extraversion  1   1897.2 1909.2 0.232369   0.6298
sex:neuroticism   1   1897.4 1909.4 0.008832   0.9251

The anova() function performs comparisons similar to those of drop1() and add1(). The difference is that anova() takes the models to be compared as arguments rather than generating them itself. That is, to use anova(), the user must fit the models first.
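A short sketch of the anova() workflow, using two hypothetical nested models on the built-in mtcars data. Both models are fit first and then passed to anova(). For a single term in a linear model, the resulting F statistic is the same one drop1() reports for that term, which the last line computes for comparison.

```r
# Illustrative nested models on the built-in mtcars data
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp, data = mtcars)

# The user fits both models first, then anova() compares them with an F-test
cmp <- anova(small, large)
print(cmp)

# The same F statistic appears in the hp row of drop1(large, test = "F")
d1 <- drop1(large, test = "F")
```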

The AIC() and BIC() functions return the AIC and BIC values for a model. The user can then compare these values with those of the other models being considered; for both criteria, lower values are preferred.

Syntax for the AIC() and BIC() functions.

AIC(modelObj)
BIC(modelObj)
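A sketch of comparing candidates, using three hypothetical nested models on the built-in mtcars data. Passing several models to AIC() or BIC() returns a data frame with one row per model. The last lines recompute both criteria from logLik(), where k counts the estimated parameters including the residual variance.

```r
# Illustrative candidate models on the built-in mtcars data
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
m3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# One row per model: df (number of parameters) and the criterion value
tab <- AIC(m1, m2, m3)
tab$BIC <- BIC(m1, m2, m3)$BIC
print(tab)
rownames(tab)[which.min(tab$AIC)]   # model with the lowest AIC

# Recompute both criteria for m2 from its log-likelihood
ll <- logLik(m2)
k <- attr(ll, "df")   # 3 coefficients + residual variance = 4
n <- nobs(m2)
aic_manual <- -2 * as.numeric(ll) + 2 * k
bic_manual <- -2 * as.numeric(ll) + log(n) * k
```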

The following example calculates the AIC and BIC for the OLS model from above.

Getting AIC and BIC.

AIC(mod)

[1] 1766.872

BIC(mod)

[1] 1830.731
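The gap between the two values follows from the definitions: with log-likelihood log L (natural log), k estimated parameters and n observations, AIC and BIC differ only in the penalty term. As a worked check, assuming mod was fit to all n = 180 rows of the sleepstudy data (18 subjects observed on 10 days; the fit itself is not shown in this section), the model has k = 20 parameters: the intercept, 1 for Days, 17 for Subject, and the residual variance, which R's logLik() counts for lm fits.

```latex
\begin{aligned}
\mathrm{AIC} &= -2\log L + 2k, \qquad \mathrm{BIC} = -2\log L + k\log n \\
\mathrm{BIC} - \mathrm{AIC} &= k(\log n - 2) = 20(\log 180 - 2) \approx 20 \times 3.193 = 63.86
\end{aligned}
```

This agrees with the output above, where 1830.731 - 1766.872 = 63.859. Because log n > 2 whenever n is 8 or more, BIC penalizes each additional parameter more heavily than AIC.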