SSCC - Social Science Computing Cooperative Supporting Statistical Analysis for Research

4.6 Results

4.6.1 summary() function

The summary() function provides a nice summary of a model object. You could also use the str() function to see the details of what is included in the model object.
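
As a minimal sketch, the two calls might look like the following. It assumes the linear model is stored under the name mod, and that the sleepstudy data comes from the lme4 package. Both are assumptions made for illustration, so adjust them to match how the model was actually created.

```r
# Assumption: sleepstudy is taken from the lme4 package and the model
# object is named mod; the formula matches the call in the summary output.
data(sleepstudy, package = "lme4")
mod <- lm(Reaction ~ Days + Subject, data = sleepstudy)

# Formatted summary of the fitted model
summary(mod)

# Compact listing of the components stored in the model object
str(mod, max.level = 1)

# Just the names of those components
names(mod)
```

Limiting str() with max.level = 1 keeps the listing short; dropping that argument shows the full nested structure.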

The following examples display the summary for the models created above.

  • Displaying the summary of the linear model from above.

    summary(mod)
    Call:
    lm(formula = Reaction ~ Days + Subject, data = sleepstudy)
    Residuals:
         Min       1Q   Median       3Q      Max 
    -100.540  -16.389   -0.341   15.215  131.159 
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  295.0310    10.4471  28.240  < 2e-16 ***
    Days          10.4673     0.8042  13.015  < 2e-16 ***
    Subject309  -126.9008    13.8597  -9.156 2.35e-16 ***
    Subject310  -111.1326    13.8597  -8.018 2.07e-13 ***
    Subject330   -38.9124    13.8597  -2.808 0.005609 ** 
    Subject331   -32.6978    13.8597  -2.359 0.019514 *  
    Subject332   -34.8318    13.8597  -2.513 0.012949 *  
    Subject333   -25.9755    13.8597  -1.874 0.062718 .  
    Subject334   -46.8318    13.8597  -3.379 0.000913 ***
    Subject335   -92.0638    13.8597  -6.643 4.51e-10 ***
    Subject337    33.5872    13.8597   2.423 0.016486 *  
    Subject349   -66.2994    13.8597  -4.784 3.87e-06 ***
    Subject350   -28.5311    13.8597  -2.059 0.041147 *  
    Subject351   -52.0361    13.8597  -3.754 0.000242 ***
    Subject352    -4.7123    13.8597  -0.340 0.734300    
    Subject369   -36.0992    13.8597  -2.605 0.010059 *  
    Subject370   -50.4321    13.8597  -3.639 0.000369 ***
    Subject371   -47.1498    13.8597  -3.402 0.000844 ***
    Subject372   -24.2477    13.8597  -1.750 0.082108 .  
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Residual standard error: 30.99 on 161 degrees of freedom
    Multiple R-squared:  0.7277,    Adjusted R-squared:  0.6973 
    F-statistic: 23.91 on 18 and 161 DF,  p-value: < 2.2e-16

    The summary display starts with the call to lm which generated the model object.

    The residual summary is the five-number summary of the residuals. This can be used as a quick check for skewed residuals.

    The coefficients table shows the estimate, standard error, test statistic, and p-value for each coefficient. The p-values are from Wald tests of the null hypothesis that each coefficient equals zero. For OLS models, this is equivalent to an F-test comparing the model with a nested model that omits the variable of interest.

    The display ends with information on the model fit. This includes the residual standard error, the R-squared of the model, and the F-test of the significance of the model versus the null model.

  • Displaying the summary of the GLM model from above.

    summary(modglm)
    Call:
    glm(formula = volunteer ~ sex + extraversion * neuroticism, family = binomial, 
        data = Cowles)
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -1.4749  -1.0602  -0.8934   1.2609   1.9978  
    Coefficients:
                              Estimate Std. Error z value Pr(>|z|)    
    (Intercept)              -2.358207   0.501320  -4.704 2.55e-06 ***
    sexmale                  -0.247152   0.111631  -2.214  0.02683 *  
    extraversion              0.166816   0.037719   4.423 9.75e-06 ***
    neuroticism               0.110777   0.037648   2.942  0.00326 ** 
    extraversion:neuroticism -0.008552   0.002934  -2.915  0.00355 ** 
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    (Dispersion parameter for binomial family taken to be 1)
        Null deviance: 1933.5  on 1420  degrees of freedom
    Residual deviance: 1897.4  on 1416  degrees of freedom
    AIC: 1907.4
    Number of Fisher Scoring iterations: 4

    The summary display for glm models includes similar call, residuals, and coefficient sections. The glm model fit summary includes dispersion, deviance, and iteration information.
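
The equivalence between a coefficient test and a nested-model F-test, described for the linear model, can be checked numerically. The sketch below assumes the sleepstudy data from the lme4 package and the model name mod; modNoDays is a hypothetical name for the nested model used only for this illustration. It also shows how the fit quantities at the end of the display can be pulled from the summary object.

```r
# Assumption: sleepstudy comes from lme4 and the model is named mod.
data(sleepstudy, package = "lme4")
mod <- lm(Reaction ~ Days + Subject, data = sleepstudy)

# Coefficient table as a numeric matrix
coefTable <- coef(summary(mod))
tDays <- coefTable["Days", "t value"]

# Nested model with Days removed, compared to the full model by an F-test
modNoDays <- update(mod, . ~ . - Days)
fTest <- anova(modNoDays, mod)

# For a single coefficient, the squared t statistic equals the F statistic
c(tSquared = tDays^2, F = fTest$F[2])

# Model fit quantities reported at the end of the summary display
summary(mod)$sigma        # residual standard error
summary(mod)$r.squared    # multiple R-squared
summary(mod)$fstatistic   # F statistic with its degrees of freedom
```

The two p-values agree as well, which is why the coefficient table is usually enough when only one coefficient is being tested.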

4.6.2 confint() function

The confint() function can be applied to all of the above model object types. It calculates a profile likelihood confidence interval when that is appropriate, as it does for the glm model below.

The following example displays the confidence intervals for the models created above.

  • Displaying the confidence intervals for both of the models.

    confint(mod)
                      2.5 %    97.5 %
    (Intercept)  274.399941 315.66215
    Days           8.879103  12.05547
    Subject309  -154.271100 -99.53060
    Subject310  -138.502810 -83.76231
    Subject330   -66.282660 -11.54216
    Subject331   -60.068030  -5.32753
    Subject332   -62.202010  -7.46151
    Subject333   -53.345770   1.39473
    Subject334   -74.202030 -19.46153
    Subject335  -119.434040 -64.69354
    Subject337     6.216930  60.95743
    Subject349   -93.669610 -38.92911
    Subject350   -55.901400  -1.16090
    Subject351   -79.406330 -24.66583
    Subject352   -32.082540  22.65796
    Subject369   -63.469440  -8.72894
    Subject370   -77.802310 -23.06181
    Subject371   -74.520040 -19.77954
    Subject372   -51.617950   3.12255
    confint(modglm)
    Waiting for profiling to be done...
                                   2.5 %       97.5 %
    (Intercept)              -3.35652914 -1.389154923
    sexmale                  -0.46642058 -0.028694911
    extraversion              0.09374678  0.241771712
    neuroticism               0.03744357  0.185227757
    extraversion:neuroticism -0.01434742 -0.002833714
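
A minimal sketch of calls that could produce intervals like these is given below. It assumes the model names mod and modglm, the sleepstudy data from the lme4 package, and the Cowles data from the carData package; these sources are assumptions for illustration. The call to confint.default() is added only for comparison, since it returns Wald intervals based on the standard errors rather than profiled intervals.

```r
# Assumptions: sleepstudy from lme4, Cowles from carData, and the
# model names mod and modglm.
data(sleepstudy, package = "lme4")
data(Cowles, package = "carData")
mod <- lm(Reaction ~ Days + Subject, data = sleepstudy)
modglm <- glm(volunteer ~ sex + extraversion * neuroticism,
              family = binomial, data = Cowles)

# Intervals for the linear model (95% by default)
ciLm <- confint(mod)
ciLm["Days", ]

# Profiled intervals for the GLM
ciGlm <- confint(modglm)
ciGlm

# A different confidence level, and a single coefficient
confint(modglm, parm = "sexmale", level = 0.90)

# Wald intervals, for comparison with the profiled intervals
ciWald <- confint.default(modglm)
ciWald
```

The profiled and Wald intervals are usually close when the sample is large, and differ more when the likelihood is noticeably asymmetric.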

4.6.3 predict() function

The predict() function is used to determine the predicted values for a particular set of values of the regressors. The predict() function can also return the confidence interval or prediction interval with the predictions for OLS models.

  • Syntax and use of the predict() function

    predict(modelObj, newObs, interval = intervaltype, level = level, type = type)

    The modelObj is an object returned from a regression function.

    The newObs parameter is optional. If it is not provided, the predictions will be for the observed values the model was fit to. The form of newObs is a data.frame with the same columns as used in modelObj.

    The intervaltype parameter is available for OLS models. It can be set to "none" (the default, no interval), "confidence" for a confidence interval, or "prediction" for a prediction interval.

    The level parameter is the confidence or prediction level.

    The type parameter is used with generalized linear models. The default value is "link", for the linear predictor scale. It can be set to "response" for predictions on the scale of the response variable.
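
The effect of the newObs, interval, and level parameters can be sketched as follows. The example assumes the model name mod and the sleepstudy data from the lme4 package; oneObs, confInt, and predInt are hypothetical names used only here.

```r
# Assumption: sleepstudy comes from lme4 and the model is named mod.
data(sleepstudy, package = "lme4")
mod <- lm(Reaction ~ Days + Subject, data = sleepstudy)

# Without newObs, predictions are for the observations used in the fit
head(predict(mod))

# One new observation: subject 331 at 10 days
oneObs <- data.frame(Days = 10, Subject = "331")

# Interval for the mean response at these regressor values
confInt <- predict(mod, oneObs, interval = "confidence", level = 0.95)

# Interval for a single new observation at these regressor values
predInt <- predict(mod, oneObs, interval = "prediction", level = 0.95)

# Both share the same fit; the prediction interval is wider
rbind(confidence = confInt[1, ], prediction = predInt[1, ])

# A lower level gives a narrower interval
predict(mod, oneObs, interval = "prediction", level = 0.90)
```

The prediction interval is wider because it accounts for the residual variation of an individual observation in addition to the uncertainty in the estimated mean.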

The following example makes predictions for each of the models from above.

  • Predicting new observations.

    Predicting subject 331 at 10 days and subject 372 at 8 days.

    newObs <- data.frame(Days = c(10, 8),
                         Subject = c("331", "372"))
    predict(mod, newObs, interval = "prediction")
           fit      lwr      upr
    1 367.0061 302.2256 431.7867
    2 354.5216 290.0925 418.9508

    Predicting a male with an 8 for extraversion and 15 for neuroticism.

    newObsGlm <- data.frame(sex = c("male"),
                            extraversion = c(8),
                            neuroticism = c(15))
    predict(modglm, newObsGlm, type = "response")
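
The relationship between the "link" and "response" scales for this prediction can be sketched as below. The example assumes the model name modglm and the Cowles data from the carData package; linkPred and respPred are hypothetical names. For a binomial model with the default logit link, applying the inverse logit (plogis() in R) to the linear predictor gives the predicted probability.

```r
# Assumption: Cowles comes from carData and the model is named modglm.
data(Cowles, package = "carData")
modglm <- glm(volunteer ~ sex + extraversion * neuroticism,
              family = binomial, data = Cowles)

newObsGlm <- data.frame(sex = c("male"),
                        extraversion = c(8),
                        neuroticism = c(15))

# Prediction on the linear predictor (log-odds) scale, the default
linkPred <- predict(modglm, newObsGlm, type = "link")

# Prediction on the response (probability) scale
respPred <- predict(modglm, newObsGlm, type = "response")

# The inverse logit of the link-scale prediction matches the response scale
c(link = unname(linkPred),
  inverseLogit = unname(plogis(linkPred)),
  response = unname(respPred))
```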

4.6.4 Extractors

Extractor functions are the preferred method for retrieving information on the model. Some commonly used extractor functions are listed below.

  • fitted()

    The fitted() function returns the predicted values for the observations in the data set used to fit the model.

  • residuals()

    The residuals() function returns the residual values from the fitted model.

  • hatvalues()

    The hatvalues() function returns the hat values, leverage measures, that result from fitting the model.

  • Influence measures

    The cooks.distance() and influence() functions return, respectively, the Cook's distance and a set of influence measures that resulted from fitting the model.
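
A short sketch of these extractors applied to the linear model is given below. It assumes the model name mod and the sleepstudy data from the lme4 package; the other object names are hypothetical.

```r
# Assumption: sleepstudy comes from lme4 and the model is named mod.
data(sleepstudy, package = "lme4")
mod <- lm(Reaction ~ Days + Subject, data = sleepstudy)

fit   <- fitted(mod)          # predicted values for the fitted data
res   <- residuals(mod)       # observed minus fitted values
lev   <- hatvalues(mod)       # leverage of each observation
cook  <- cooks.distance(mod)  # Cook's distance for each observation
infl  <- influence(mod)       # list of influence measures

# Fitted values plus residuals reproduce the observed response
all.equal(unname(fit + res), sleepstudy$Reaction)

# The hat values sum to the number of estimated coefficients
sum(lev)
length(coef(mod))

# Observations with the largest Cook's distance
head(sort(cook, decreasing = TRUE))

# Components returned by influence()
names(infl)
```

Using the extractor functions rather than indexing into the model object directly keeps the code working across model classes that store these quantities differently.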