Supporting Statistical Analysis for Research

4.6 Results

4.6.1 summary() function

The summary() function prints a concise summary of a model object: the call, the residuals, the coefficient table, and the fit statistics. The str() function can be used to see the details of everything that is stored in the model object.

The following examples display the summary for the models created above.

• Displaying the summary of the linear model from above.

summary(mod)

Call:
lm(formula = Reaction ~ Days + Subject, data = sleepstudy)

Residuals:
     Min       1Q   Median       3Q      Max
-100.540  -16.389   -0.341   15.215  131.159

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  295.0310    10.4471  28.240  < 2e-16 ***
Days          10.4673     0.8042  13.015  < 2e-16 ***
Subject309  -126.9008    13.8597  -9.156 2.35e-16 ***
Subject310  -111.1326    13.8597  -8.018 2.07e-13 ***
Subject330   -38.9124    13.8597  -2.808 0.005609 **
Subject331   -32.6978    13.8597  -2.359 0.019514 *
Subject332   -34.8318    13.8597  -2.513 0.012949 *
Subject333   -25.9755    13.8597  -1.874 0.062718 .
Subject334   -46.8318    13.8597  -3.379 0.000913 ***
Subject335   -92.0638    13.8597  -6.643 4.51e-10 ***
Subject337    33.5872    13.8597   2.423 0.016486 *
Subject349   -66.2994    13.8597  -4.784 3.87e-06 ***
Subject350   -28.5311    13.8597  -2.059 0.041147 *
Subject351   -52.0361    13.8597  -3.754 0.000242 ***
Subject352    -4.7123    13.8597  -0.340 0.734300
Subject369   -36.0992    13.8597  -2.605 0.010059 *
Subject370   -50.4321    13.8597  -3.639 0.000369 ***
Subject371   -47.1498    13.8597  -3.402 0.000844 ***
Subject372   -24.2477    13.8597  -1.750 0.082108 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 30.99 on 161 degrees of freedom
Multiple R-squared:  0.7277,    Adjusted R-squared:  0.6973
F-statistic: 23.91 on 18 and 161 DF,  p-value: < 2.2e-16

The summary display starts with the call to lm which generated the model object.

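The residual summary is the five-number summary of the residuals. It can be used as a quick check for skewed residuals: the median should be close to zero, and the quartiles and extremes should be roughly symmetric around it.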

The coefficients summary shows the estimated value, standard error, test statistic, and p-value for each coefficient. The p-values are from Wald tests of each coefficient being equal to zero. For OLS models, the test of a single coefficient is equivalent to an F-test of nested models, with the variable of interest removed in the nested model.
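
The equivalence between the coefficient test and the nested-model F-test can be checked directly. The following is a minimal sketch that uses the built-in mtcars data as a stand-in so that it runs without extra packages; the model and the names full and reduced are illustrative assumptions, not the models fit above.

```r
# Stand-in OLS model on the built-in mtcars data (an assumption for illustration).
full    <- lm(mpg ~ wt + hp, data = mtcars)
reduced <- update(full, . ~ . - hp)   # nested model with hp removed

# t statistic and p-value for hp from the coefficient table
coefTable <- coef(summary(full))
tHp <- coefTable["hp", "t value"]
pHp <- coefTable["hp", "Pr(>|t|)"]

# F-test comparing the nested models
fTest <- anova(reduced, full)
fHp   <- fTest$F[2]
pF    <- fTest$`Pr(>F)`[2]

# The squared t statistic equals the F statistic, and the p-values agree.
all.equal(tHp^2, fHp)
all.equal(pHp, pF)
```

This holds for a term with a single coefficient. For a factor with several levels, such as Subject above, each row of the coefficient table compares one level with the reference level, so testing the factor as a whole requires the nested-model comparison.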

The display ends with information on the model fit: the residual standard error, the R-squared and adjusted R-squared of the model, and the F-test of the significance of the model versus the null model.
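
The values shown in the display are also stored in the object that summary() returns, so they can be used in later calculations. A short sketch, again assuming a stand-in model fit to the built-in mtcars data rather than the sleepstudy model above:

```r
# Stand-in OLS model (an assumption for illustration).
fit <- lm(mpg ~ wt + hp, data = mtcars)
s   <- summary(fit)

names(s)           # components stored in the summary object
s$sigma            # residual standard error
s$df[2]            # residual degrees of freedom
s$r.squared        # multiple R-squared
s$adj.r.squared    # adjusted R-squared
s$fstatistic       # F statistic with numerator and denominator df

# p-value of the overall F-test, which the display prints but does not store
fstat  <- s$fstatistic
modelP <- unname(pf(fstat["value"], fstat["numdf"], fstat["dendf"],
                    lower.tail = FALSE))
modelP
```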

• Displaying the summary of the GLM model from above.

summary(modglm)

Call:
glm(formula = volunteer ~ sex + extraversion * neuroticism, family = binomial,
    data = Cowles)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.4749  -1.0602  -0.8934   1.2609   1.9978

Coefficients:
                          Estimate Std. Error z value Pr(>|z|)
(Intercept)              -2.358207   0.501320  -4.704 2.55e-06 ***
sexmale                  -0.247152   0.111631  -2.214  0.02683 *
extraversion              0.166816   0.037719   4.423 9.75e-06 ***
neuroticism               0.110777   0.037648   2.942  0.00326 **
extraversion:neuroticism -0.008552   0.002934  -2.915  0.00355 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1933.5  on 1420  degrees of freedom
Residual deviance: 1897.4  on 1416  degrees of freedom
AIC: 1907.4

Number of Fisher Scoring iterations: 4

The summary display for glm models includes similar call, residuals, and coefficient sections. The glm model fit summary includes dispersion, deviance, and iteration information.
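
The deviance information can be pulled from the summary object and used, for example, to compare the fitted model with the null model. This is a sketch on a stand-in logistic model fit to the built-in mtcars data (an assumption, not the Cowles model above); the deviance test shown is one common use of these values, not something the summary display reports itself.

```r
# Stand-in logistic regression (an assumption for illustration).
gfit <- glm(vs ~ mpg, family = binomial, data = mtcars)
gs   <- summary(gfit)

gs$dispersion                      # fixed at 1 for the binomial family
c(gs$null.deviance, gs$df.null)    # null deviance and its df
c(gs$deviance, gs$df.residual)     # residual deviance and its df
gs$aic                             # AIC
gs$iter                            # number of Fisher scoring iterations

# Drop in deviance from the null model to the fitted model,
# compared with a chi-squared distribution.
devDrop <- gs$null.deviance - gs$deviance
dfDrop  <- gs$df.null - gs$df.residual
pDrop   <- pchisq(devDrop, df = dfDrop, lower.tail = FALSE)
pDrop
```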

4.6.2 confint() function

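The confint() function can be applied to all of the above model object types. It returns confidence intervals for the model coefficients, at the 95% level by default, and calculates a profile-likelihood confidence interval when that is appropriate, as it does for the glm model below.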

The following example displays the confidence intervals for the models created above.

• Displaying the confidence intervals for the linear model and the GLM model.

confint(mod)
                  2.5 %    97.5 %
(Intercept)  274.399941 315.66215
Days           8.879103  12.05547
Subject309  -154.271100 -99.53060
Subject310  -138.502810 -83.76231
Subject330   -66.282660 -11.54216
Subject331   -60.068030  -5.32753
Subject332   -62.202010  -7.46151
Subject333   -53.345770   1.39473
Subject334   -74.202030 -19.46153
Subject335  -119.434040 -64.69354
Subject337     6.216930  60.95743
Subject349   -93.669610 -38.92911
Subject350   -55.901400  -1.16090
Subject351   -79.406330 -24.66583
Subject352   -32.082540  22.65796
Subject369   -63.469440  -8.72894
Subject370   -77.802310 -23.06181
Subject371   -74.520040 -19.77954
Subject372   -51.617950   3.12255
confint(modglm)
Waiting for profiling to be done...
                               2.5 %       97.5 %
(Intercept)              -3.35652914 -1.389154923
sexmale                  -0.46642058 -0.028694911
extraversion              0.09374678  0.241771712
neuroticism               0.03744357  0.185227757
extraversion:neuroticism -0.01434742 -0.002833714
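
To see what the profiled interval is being contrasted with, the sketch below compares it with the Wald interval (estimate plus or minus a normal quantile times the standard error) returned by confint.default(). It assumes a stand-in logistic model on the built-in mtcars data rather than modglm.

```r
# Stand-in logistic regression (an assumption for illustration).
gfit <- glm(vs ~ mpg, family = binomial, data = mtcars)

profileCI <- confint(gfit)           # profile-likelihood interval
waldCI    <- confint.default(gfit)   # Wald interval from the normal approximation

# The Wald interval rebuilt by hand from the estimates and standard errors
est    <- coef(gfit)
se     <- sqrt(diag(vcov(gfit)))
z      <- qnorm(0.975)
manual <- cbind(est - z * se, est + z * se)
all.equal(unname(waldCI), unname(manual))

# The two kinds of interval are close but not identical.
profileCI
waldCI

# The level argument changes the confidence level.
confint.default(gfit, level = 0.90)
```

Wald intervals are always symmetric around the estimate, while profiled intervals need not be, which is why the two differ most when the likelihood is far from quadratic.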

4.6.3 predict() function

The predict() function is used to determine the predicted values for a particular set of values of the regressors. The predict() function can also return the confidence interval or prediction interval with the predictions for OLS models.

• Syntax and use of the predict() function

predict(modelObj, newObs, interval = intervaltype, level = level, type = type)

The modelObj parameter is an object returned from a regression function.

The newObs parameter is optional. If it is not provided, the predictions will be for the observations the model was fit to. When it is provided, newObs is a data.frame containing the variables that were used as regressors in modelObj.

The intervaltype parameter is available for OLS models. It can be set to "none" (the default, no interval), "confidence" (an interval for the mean response), or "prediction" (an interval for a new observation).

The level parameter is the confidence or prediction level. The default is 0.95.

The type parameter is used with generalized linear models. The default value is "link", which gives predictions on the scale of the linear predictor. It can be set to "response" for predictions on the scale of the response variable.

The following example makes predictions for the models from above.

• Predicting new observations.

Predicting the reaction time for subject 331 at 10 days and subject 372 at 8 days.

newObs <- data.frame(Days = c(10, 8),
                     Subject = c("331", "372")
                     )
predict(mod, newObs, interval = "prediction")
       fit      lwr      upr
1 367.0061 302.2256 431.7867
2 354.5216 290.0925 418.9508

Predicting a male with an 8 for extraversion and 15 for neuroticism.

newObsGlm <- data.frame(sex = c("male"),
                        extraversion = c(8),
                        neuroticism = c(15)
                        )
predict(modglm, newObsGlm, type = "response")
        1
0.3462704 
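
The effect of the interval and type arguments described above can be sketched as follows. The models and new observations here are stand-ins built on the built-in mtcars data, and the names newCars and newMpg are hypothetical; they are not the models or data used above.

```r
# Stand-in OLS model and hypothetical new observations.
fit     <- lm(mpg ~ wt + hp, data = mtcars)
newCars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))

confInt <- predict(fit, newCars, interval = "confidence", level = 0.95)
predInt <- predict(fit, newCars, interval = "prediction", level = 0.95)

# Both calls return the same fitted values; the prediction interval is wider
# because it also accounts for the variation of a single new observation.
all.equal(confInt[, "fit"], predInt[, "fit"])
widthConf <- confInt[, "upr"] - confInt[, "lwr"]
widthPred <- predInt[, "upr"] - predInt[, "lwr"]
all(widthPred > widthConf)

# Stand-in logistic model: link scale versus response scale.
gfit   <- glm(vs ~ mpg, family = binomial, data = mtcars)
newMpg <- data.frame(mpg = c(18, 25))

linkPred <- predict(gfit, newMpg)                     # type = "link" is the default
respPred <- predict(gfit, newMpg, type = "response")  # predicted probabilities

# For the logit link, the response is the inverse logit of the linear predictor.
all.equal(plogis(linkPred), respPred)
```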

4.6.4 Extractors

Extractor functions are the preferred method for retrieving information on the model. Some commonly used extractor functions are listed below.

• fitted()

The fitted() function returns the predicted values for the observations in the data set used to fit the model.

• residuals()

The residuals() function returns the residual values from the fitted model.

• hatvalues()

The hatvalues() function returns the hat values (leverage measures) that result from fitting the model.

• Influence measures

The cooks.distance() function returns the Cook's distance for each observation, and the influence() function returns a set of influence measures that result from fitting the model.
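
The extractors listed above can be tried on any fitted model. The sketch below uses a stand-in OLS model on the built-in mtcars data (an assumption, not one of the models above) and checks a few relationships between the extracted values.

```r
# Stand-in OLS model (an assumption for illustration).
fit <- lm(mpg ~ wt + hp, data = mtcars)

fittedVals <- fitted(fit)      # predicted values for the observed data
resids     <- residuals(fit)   # observed minus fitted

# Fitted values plus residuals reproduce the observed response.
all.equal(unname(fittedVals + resids), mtcars$mpg)

# Hat values (leverages) sum to the number of estimated coefficients.
lev <- hatvalues(fit)
all.equal(sum(lev), length(coef(fit)))

# Cook's distance for each observation; the largest values flag
# observations that have the most influence on the fit.
cooks <- cooks.distance(fit)
head(sort(cooks, decreasing = TRUE), 3)

# influence() returns a list of several influence measures at once.
infl <- influence(fit)
names(infl)
all.equal(unname(infl$hat), unname(lev))
```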