4.4 Variable selection functions
R supports a number of commonly used criteria for selecting variables.
These include BIC, AIC,
F-tests, likelihood ratio tests and adjusted R squared.
Adjusted R squared is returned in the summary of the model object
and will be covered with the summary() function below.
The drop1() function compares all possible models
that can be constructed by dropping a single model term.
The add1() function compares all possible models that can
be constructed by adding a single term.
The step() function repeatedly applies drop1() and add1() until
no single-term change lowers the AIC any further.
Syntax for the drop1(), add1(), and step() functions.
drop1(modelObj, test = test)
add1(modelObj, scope = scope, test = test)
step(modelObj, scope = scope, test = test, direction = direction)
The scope parameter is a formula that provides the largest model to consider for adding a term.
The test parameter can be set to "F" or "LRT".
By default no test is performed and only the AIC is reported.
The direction parameter can be "both", "backward", or "forward".
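As a minimal sketch of how step() can be called, the example below runs a backward and a forward search. It uses the built-in mtcars data and the predictors wt, hp, qsec, and drat purely as illustrative stand-ins; they are assumptions for this sketch and are not the models used elsewhere in this chapter. Setting trace = 0 only suppresses the step-by-step printout.

```r
# Illustrative data only: mtcars ships with R and is not the data used in this chapter.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)

# Backward elimination: start from the full model and drop terms
# for as long as dropping one lowers the AIC.
back <- step(full, direction = "backward", trace = 0)

# Forward selection: start from the intercept-only model.
# scope gives the largest model the search is allowed to reach.
fwd <- step(null, scope = ~ wt + hp + qsec + drat,
            direction = "forward", trace = 0)

# Inspect the selected models.
formula(back)
formula(fwd)
```

The two searches need not end at the same model, since each one only considers single-term changes from its own starting point.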
The following example runs an F-test on each term of the OLS model from above and a likelihood ratio test on several terms that could be added to the GLM model from above.
Using drop1() and add1().
drop1(mod, test = "F")
Single term deletions

Model:
Reaction ~ Days + Subject
        Df Sum of Sq    RSS    AIC F value    Pr(>F)
<none>               154634 1254.0
Days     1    162703 317336 1381.5 169.401 < 2.2e-16 ***
Subject 17    250618 405252 1393.5  15.349 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
add1(modglm, scope = ~ (sex + extraversion + neuroticism)^2, test = "LRT")
Single term additions

Model:
volunteer ~ sex + extraversion * neuroticism
                 Df Deviance    AIC      LRT Pr(>Chi)
<none>                1897.4 1907.4
sex:extraversion  1   1897.2 1909.2 0.232369   0.6298
sex:neuroticism   1   1897.4 1909.4 0.008832   0.9251
The anova() function is similar to the drop1() and add1() functions.
The difference is that anova() takes the models to be compared as
parameters, rather than generating the candidate models itself.
That is, to use anova(), the user must first build each model.
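A small sketch of this workflow is shown below: two nested models are fit by hand and then passed to anova(). The mtcars data and the model names small and large are hypothetical choices for illustration only and do not appear elsewhere in this chapter.

```r
# Illustrative data only: mtcars ships with R.
# Both models are built by the user before the comparison.
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp, data = mtcars)

# For two nested lm fits, anova() reports an F-test of the added term (hp).
comp <- anova(small, large)
comp
```

The second row of the resulting table holds the test of the larger model against the smaller one.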
The AIC() and BIC() functions provide the
AIC and BIC values for a model.
The user can then compare these values to the
values from other models being considered.
Syntax for the AIC() and BIC() functions.
AIC(modelObj)
BIC(modelObj)
The following example calculates the AIC and BIC for the OLS model from above.
Getting AIC and BIC.
AIC(mod)
[1] 1766.872
BIC(mod)
[1] 1830.731
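To compare several candidate models at once, AIC() and BIC() also accept more than one fitted model and return a small table with one row per model. The sketch below assumes two illustrative models fit to the built-in mtcars data; the names m1 and m2 are hypothetical and are not models from this chapter.

```r
# Illustrative data only: mtcars ships with R.
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# With several models, each function returns a data frame
# holding the degrees of freedom and the criterion for each model.
aic_tab <- AIC(m1, m2)
bic_tab <- BIC(m1, m2)

aic_tab
bic_tab
```

For both criteria, a lower value indicates the preferred model among those fit to the same data.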