---
title: 'R for Researchers: Diagnostic solutions'
date: "April 2015"
---

This article contains solutions to exercises for an article in the series R for Researchers. For a list of topics covered by this series, see the [Introduction](RFR_Introduction.html) article. If you're new to R, we highly recommend reading the articles in order.

There is often more than one approach to the exercises. Do not be concerned if your approach is different from the solution provided.

These solutions require that the solutions from the prior lesson be run in your R session.

```{r, echo=FALSE, results="hide", message=FALSE, warning=FALSE, fig.show='hide'}
source("Scripts/RFR_alfalfa_DP.R")
source("Scripts/RFR_alfalfa_DE.R")
source("Scripts/RFR_alfalfa_Reg.R")
```

#### Exercise solutions

These exercises use the alfalfa dataset and the work you started on the alfAnalysis script. Open the script and run all the commands in the script to prepare your session for these problems.

Note: we will use the shade and irrig variables as continuous variables for these exercises. They could also be treated as factor variables. Since both represent increasing levels, we first try to use them as scale variables.

Use the model you selected as the best model from the prior exercises.

1. Use plot to generate the prepared diagnostic plots.

```{r, comment=NA }
plot(out5)
```

2. Create a data.frame which includes the model variables as well as the fitted values, residuals, Cook's distance, and leverage.

```{r, comment=NA }
# Start from the model variables, then add one column per diagnostic
out5Diag <- alfalfa[,c("irrig","inoc","shade","yield")]
out5Diag$fit <- fitted(out5)
out5Diag$res <- rstudent(out5)           # studentized residuals
out5Diag$cooks <- cooks.distance(out5)
out5Diag$lev <- hatvalues(out5)          # leverage
str(out5Diag)
```

3. Reshape the data.frame from problem 2 to tall form.

```{r, comment=NA }
# Convert every column to numeric so the model variables can share one value column
out5DiagNum <- out5Diag
for(i in colnames(out5DiagNum)) {
  out5DiagNum[,i] <- as.numeric(out5DiagNum[,i])
}

# Stack irrig, inoc, and shade into varVal, labeled by variable
out5DiagT <- reshape(out5DiagNum,
                     varying=c("irrig","inoc","shade"),
                     v.names="varVal",
                     timevar="variable",
                     times=c("irrig","inoc","shade"),
                     drop=c("yield","fit"),
                     direction="long"
                     )
str(out5DiagT)
```

4. Plot Cook's distance versus the model variables, faceted by the model variables.

```{r, comment=NA }
library(ggplot2)  # may already be loaded by the scripts sourced earlier

ggplot(out5DiagT, aes(x=varVal, y=cooks) ) +
  geom_point() +
  facet_wrap(~variable, scales="free_x") +
  theme_bw() +
  theme(strip.background = element_rect(fill = "White"))
```

5. Rerun the model with the observation with the highest Cook's distance removed.

```{r, comment=NA }
# Flag observations whose Cook's distance is at least 0.5
out5DiagCkId <- which(out5Diag$cooks >= .5)
out5DiagCkId

# Refit the model without the flagged observations
out5Ck <- lm(yield~irrig+inoc+shade, data=alfalfa[-c(out5DiagCkId),])
summary(out5Ck)
```

6. Compare the changes in the model coefficients.

```{r, comment=NA }
# Change in each coefficient, in units of the original model's standard errors
out5CoefDiff <- (coef(out5Ck) - coef(out5)) / sqrt(diag(vcov(out5)))
names(out5CoefDiff) <- names(coef(out5))
out5CoefDiff
```

7. Commit your changes to alfAnalysis.

There is no code associated with the solution to this problem.

Return to the [Diagnostics](RFR_Diagnostics.html) article.

Last Revised: 3/2/2015
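
The refit-and-compare steps in problems 5 and 6 can also be tried without the alfalfa scripts. The following is a minimal sketch on R's built-in `mtcars` data; the model `mpg ~ wt + hp` and the names `fit`, `maxId`, `fitDrop`, and `coefDiff` are assumptions chosen for illustration and are not part of the exercises. It uses `which.max()` to select exactly one observation, where the solution above uses a cutoff on Cook's distance.

```r
# Toy illustration on the built-in mtcars data (not the alfalfa data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Row position of the single observation with the largest Cook's distance
maxId <- which.max(cooks.distance(fit))

# Refit the same model without that observation
fitDrop <- lm(mpg ~ wt + hp, data = mtcars[-maxId, ])

# Change in each coefficient, in units of the full model's standard errors
coefDiff <- (coef(fitDrop) - coef(fit)) / sqrt(diag(vcov(fit)))
coefDiff
```

A cutoff such as `which(cooks >= .5)` can return zero or several observations, while `which.max()` always returns one. Which is preferable depends on whether the goal is to drop the single most influential point or every point above a chosen cutoff.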