R for Researchers: Regression (OLS) solutions

April 2015

This article contains solutions to exercises for an article in the series R for Researchers. For a list of topics covered by this series, see the Introduction article. If you're new to R, we highly recommend reading the articles in order.

There is often more than one approach to the exercises. Do not be concerned if your approach differs from the solution provided.

These solutions require that the solutions from the prior lesson have already been run in your R session.

Exercise solutions

These exercises use the alfalfa dataset and the work you started on the alfAnalysis script. Open the script and run all of its commands to prepare your session for these problems.

Note: we will treat the shade and irrig variables as continuous for these exercises. They could also be treated as factors. Because both represent increasing levels, we first try them as numeric (scale) variables.
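To illustrate the difference between the two codings, here is a minimal sketch. It uses made-up data (the names demo, level, and y are hypothetical and are not part of the alfalfa dataset), so the numbers carry no meaning for the exercises.

```r
# Synthetic stand-in data (NOT the alfalfa dataset): a five-level numeric treatment.
set.seed(1)
demo <- data.frame(level = rep(1:5, each = 5))
demo$y <- 30 + 1.5 * demo$level + rnorm(25)

# Numeric predictor: one slope, so the effect is assumed linear in the level.
fit_num <- lm(y ~ level, data = demo)

# Factor predictor: a separate mean for each level (four contrasts).
fit_fac <- lm(y ~ factor(level), data = demo)

length(coef(fit_num))   # 2 coefficients: intercept and slope
length(coef(fit_fac))   # 5 coefficients: intercept and four level contrasts

# The numeric model is nested in the factor model, so an F test can compare them.
anova(fit_num, fit_fac)
```

The numeric coding spends fewer degrees of freedom, which matters in a small dataset, at the cost of assuming a straight-line effect across the levels.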

  1. Set the reference level of the inoc variable to cntrl.

    #######################################################
    #######################################################
    ##
    ##   Regression
    ##
    #######################################################
    #######################################################
    
    
    str(alfalfa$inoc)
     Factor w/ 5 levels "A","B","C","cntrl",..: 1 2 5 3 4 5 4 2 1 3 ...
    # Listing cntrl first makes it the reference level.
    alfalfa$inoc <- factor(alfalfa$inoc, levels = c("cntrl", "A", "B", "C", "D"))
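An alternative that may be convenient is relevel(), which moves a single level to the front without retyping the others. The sketch below uses a hypothetical factor, inoc_demo, with the same level names as inoc. It is not the alfalfa data.

```r
# Hypothetical factor with the same level names as inoc (not the real data).
inoc_demo <- factor(c("A", "B", "C", "cntrl", "D", "cntrl", "A"))
levels(inoc_demo)[1]                      # "A": the default reference level

# relevel() moves one level to the front and leaves the others in order.
inoc_ref <- relevel(inoc_demo, ref = "cntrl")
levels(inoc_ref)[1]                       # "cntrl"

# The data values are unchanged; only the level ordering differs.
identical(as.character(inoc_demo), as.character(inoc_ref))   # TRUE
```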

  2. Create a quadratic poly term for the shade variable.

    shade2 <- poly(alfalfa$shade, degree=2)
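By default, poly() returns orthogonal polynomial columns rather than the raw values x and x^2. The two columns of shade2 are what appear as shade21 and shade22 in the regression output. The following sketch, on hypothetical values (shade_demo and y_demo are made up), compares the default with raw = TRUE.

```r
# Hypothetical shade values (not the real data): five levels, five replicates.
shade_demo <- rep(1:5, times = 5)

p_orth <- poly(shade_demo, degree = 2)              # orthogonal columns (default)
p_raw  <- poly(shade_demo, degree = 2, raw = TRUE)  # plain x and x^2

# Orthogonal columns are uncorrelated; raw columns are strongly correlated.
round(cor(p_orth[, 1], p_orth[, 2]), 10)
cor(p_raw[, 1], p_raw[, 2])

# Both bases span the same space, so the fitted values agree.
set.seed(2)
y_demo <- 30 + 2 * shade_demo - 0.3 * shade_demo^2 + rnorm(25)
fit_orth <- lm(y_demo ~ p_orth)
fit_raw  <- lm(y_demo ~ p_raw)
all.equal(fitted(fit_orth), fitted(fit_raw))

# Coefficient names are the matrix name followed by the column number.
names(coef(fit_orth))
```

The coefficients differ between the two bases, but the model fit does not. The orthogonal form avoids the collinearity between x and x^2.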

  3. Regress yield on irrig, inoc, the quadratic shade term, and all their interactions.

    out <- lm(yield~(irrig+inoc+shade2)^2, data=alfalfa)
    summary(out)
    
    Call:
    lm(formula = yield ~ (irrig + inoc + shade2)^2, data = alfalfa)
    
    Residuals:
             1          2          3          4          5          6          7 
    -1.403e-02  2.053e-02 -2.149e-01 -1.712e-01  1.621e-01  6.807e-01 -3.241e-01 
             8          9         10         11         12         13         14 
     2.053e-02  5.610e-02  3.044e-01  1.141e-01 -7.523e-01 -8.415e-02  5.321e-16 
            15         16         17         18         19         20         21 
    -1.847e-01  3.241e-01  5.610e-02 -4.565e-01  2.258e-01  3.224e-01 -8.210e-02 
            22         23         24         25 
     2.092e-01 -1.621e-01 -3.582e-02 -1.403e-02 
    
    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
    (Intercept)    32.4643     1.3937  23.294 0.000173 ***
    irrig          -1.1195     0.4020  -2.785 0.068729 .  
    inocA           4.3853     1.9397   2.261 0.108848    
    inocB          -1.1339     2.5535  -0.444 0.687085    
    inocC           1.5821     1.5433   1.025 0.380738    
    inocD           4.6810     1.6295   2.873 0.063909 .  
    shade21         4.0046     5.0646   0.791 0.486852    
    shade22        -8.5243     7.6699  -1.111 0.347454    
    irrig:inocA     0.6117     0.5686   1.076 0.360816    
    irrig:inocB     2.6848     0.8416   3.190 0.049701 *  
    irrig:inocC     1.7532     0.5001   3.505 0.039332 *  
    irrig:inocD     0.1157     0.4993   0.232 0.831676    
    irrig:shade21   2.5552     1.2161   2.101 0.126428    
    irrig:shade22   3.4764     1.8453   1.884 0.156083    
    inocA:shade21  -9.3599     5.2525  -1.782 0.172771    
    inocB:shade21  -1.4753     3.4398  -0.429 0.696927    
    inocC:shade21   4.1493     3.2650   1.271 0.293373    
    inocD:shade21  -0.5848     4.5746  -0.128 0.906373    
    inocA:shade22  -8.8399     4.1364  -2.137 0.122187    
    inocB:shade22   7.3414     7.2192   1.017 0.384063    
    inocC:shade22   0.8405     3.5126   0.239 0.826294    
    inocD:shade22  -3.5093     3.1239  -1.123 0.343060    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 0.8035 on 3 degrees of freedom
    Multiple R-squared:  0.9935,    Adjusted R-squared:  0.9478 
    F-statistic: 21.74 on 21 and 3 DF,  p-value: 0.01347
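The ^2 operator in a formula expands to the main effects and all two-way interactions. It does not include the three-way interaction. This can be checked symbolically, without any data, as in this small sketch:

```r
# Expand the formulas symbolically; terms() needs no data for this.
two_way <- yield ~ (irrig + inoc + shade2)^2
full    <- yield ~ irrig * inoc * shade2

labels_two_way <- attr(terms(two_way), "term.labels")
labels_full    <- attr(terms(full), "term.labels")

labels_two_way                        # main effects plus two-way interactions
setdiff(labels_full, labels_two_way)  # the three-way term that ^2 leaves out
```

The summary above lists 22 coefficients. With 25 observations, that leaves only the 3 residual degrees of freedom reported, so the individual t tests in this model have very little power.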
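
  4. Use the backward selection method to reduce the model. Use the significance of each term as the criterion, as was done in the lesson.

    Two methods are shown in this solution. The first is step(), which selects on AIC; the test="F" argument only adds the F test columns to its output. Here step() removes nothing, because the full model already has the lowest AIC. The second is repeated calls to drop1(), removing the least significant term at each step.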

    step(out, test="F")
    Start:  AIC=-19.94
    yield ~ (irrig + inoc + shade2)^2
    
                   Df Sum of Sq     RSS      AIC F value  Pr(>F)  
    <none>                       1.9370 -19.9437                  
    - irrig:shade2  2    4.9536  6.8906   7.7819  3.8361 0.14904  
    - inoc:shade2   8   17.6767 19.6137  21.9338  3.4222 0.16995  
    - irrig:inoc    4   15.2447 17.1817  26.6242  5.9028 0.08823 .
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Call:
    lm(formula = yield ~ (irrig + inoc + shade2)^2, data = alfalfa)
    
    Coefficients:
      (Intercept)          irrig          inocA          inocB          inocC  
          32.4643        -1.1195         4.3853        -1.1339         1.5821  
            inocD        shade21        shade22    irrig:inocA    irrig:inocB  
           4.6810         4.0046        -8.5243         0.6117         2.6848  
      irrig:inocC    irrig:inocD  irrig:shade21  irrig:shade22  inocA:shade21  
           1.7532         0.1157         2.5552         3.4764        -9.3599  
    inocB:shade21  inocC:shade21  inocD:shade21  inocA:shade22  inocB:shade22  
          -1.4753         4.1493        -0.5848        -8.8399         7.3414  
    inocC:shade22  inocD:shade22  
           0.8405        -3.5093  
    
    # Remove inoc:shade2, the least significant term above (p = 0.170).
    out2 <- lm(yield~irrig+inoc+shade2+irrig:inoc+irrig:shade2,
               data=alfalfa)
    drop1(out2, test="F")
    Single term deletions
    
    Model:
    yield ~ irrig + inoc + shade2 + irrig:inoc + irrig:shade2
                 Df Sum of Sq    RSS    AIC F value Pr(>F)
    <none>                    19.614 21.934               
    irrig:inoc    4   16.5458 36.159 29.227  2.3199 0.1216
    irrig:shade2  2    1.6273 21.241 19.926  0.4563 0.6451
    
    # Remove irrig:shade2 (p = 0.645).
    out3 <- lm(yield~irrig+inoc+shade2+irrig:inoc, data=alfalfa)
    drop1(out3, test="F")
    Single term deletions
    
    Model:
    yield ~ irrig + inoc + shade2 + irrig:inoc
               Df Sum of Sq    RSS    AIC F value    Pr(>F)    
    <none>                  21.241 19.926                      
    shade2      2    63.341 84.582 50.471 19.3832 0.0001257 ***
    irrig:inoc  4    20.399 41.639 28.754  3.1211 0.0526577 .  
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    # Remove irrig:inoc (p = 0.053), leaving main effects only.
    out4 <- lm(yield~irrig+inoc+shade2, data=alfalfa)
    drop1(out4, test="F")
    Single term deletions
    
    Model:
    yield ~ irrig + inoc + shade2
           Df Sum of Sq     RSS    AIC F value    Pr(>F)    
    <none>               41.639 28.754                      
    irrig   1    14.797  56.436 34.356   6.041   0.02501 *  
    inoc    4   155.894 197.534 59.676  15.912 1.380e-05 ***
    shade2  2    84.328 125.967 52.429  17.214 8.196e-05 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    # Refit using shade as a linear term in place of the quadratic term shade2.
    out5 <- lm(yield~irrig+inoc+shade, data=alfalfa)
    drop1(out5, test="F")
    Single term deletions
    
    Model:
    yield ~ irrig + inoc + shade
           Df Sum of Sq     RSS    AIC F value    Pr(>F)    
    <none>               45.576 29.013                      
    irrig   1    14.797  60.373 34.042  5.8439   0.02646 *  
    inoc    4   155.894 201.470 58.169 15.3924 1.236e-05 ***
    shade   1    80.391 125.967 52.429 31.7501 2.402e-05 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
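The manual drop1() steps above can also be written as a loop. The following is only a sketch under stated assumptions, not the procedure from the lesson: the data are synthetic (demo, x1, x2, and g are hypothetical names), and the 0.05 cutoff alpha is an arbitrary choice for illustration.

```r
# Synthetic data (NOT the alfalfa dataset); only x1 has a real effect on y.
set.seed(3)
n <- 60
demo <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  g  = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
demo$y <- 5 + 2 * demo$x1 + rnorm(n)

fit <- lm(y ~ (x1 + x2 + g)^2, data = demo)
alpha <- 0.05   # assumed significance cutoff

repeat {
  tab <- drop1(fit, test = "F")
  pvals <- tab[["Pr(>F)"]][-1]        # skip the <none> row
  candidates <- rownames(tab)[-1]     # terms drop1() allows to be removed
  if (length(pvals) == 0 || max(pvals) <= alpha) break
  worst <- candidates[which.max(pvals)]
  # Refit without the least significant removable term.
  fit <- update(fit, as.formula(paste(". ~ . -", worst)))
}

formula(fit)
```

Because drop1() only offers terms that are not contained in a higher-order term, a main effect is never removed while one of its interactions remains in the model.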

  5. Commit your changes to alfAnalysis.

    There is no code associated with the solution to this problem.

Return to the Regression (OLS) article.

Last Revised: 3/2/2015