broomTo practice the new tests we’ve learned, we’ll use the built-in mtcars dataset.
What is the correlation between horsepower and weight? Is it significant?
mtcars %$% cor.test(hp, wt) %>% tidy()## # A tibble: 1 x 8
##   estimate statistic p.value parameter conf.low conf.high method
##      <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl> <chr> 
## 1    0.659      4.80 4.15e-5        30    0.403     0.819 Pears…
## # ... with 1 more variable: alternative <chr>Plot a scatterplot of the rear axle ratio vs. fuel displacement. Does the relationship look linear?
ggplot(mtcars, aes(x = drat, y = disp)) + 
  geom_point(size = 3) +
  labs(x = 'Rear axle ratio', y = 'Displacement (cu. in.)') +
  theme_classic() Since the relationship looks approximately linear, test it with lm()
lm(disp ~ drat, data = mtcars) %>% tidy()## # A tibble: 2 x 5
##   term        estimate std.error statistic      p.value
##   <chr>          <dbl>     <dbl>     <dbl>        <dbl>
## 1 (Intercept)     823.     108.       7.60 0.0000000179
## 2 drat           -165.      29.8     -5.53 0.00000528Do an ANOVA of quarter mile time over the number of cylinders in the the engine.
aov(qsec ~ cyl, data = mtcars) %>% tidy()## # A tibble: 2 x 6
##   term         df sumsq meansq statistic   p.value
##   <chr>     <dbl> <dbl>  <dbl>     <dbl>     <dbl>
## 1 cyl           1  34.6  34.6       16.1  0.000366
## 2 Residuals    30  64.4   2.15      NA   NANow take the same ANOVA from the chunk above and add a post-hoc Tukey test to it.
aov(qsec ~ as.factor(cyl), data = mtcars) %>% TukeyHSD() %>% tidy()## # A tibble: 3 x 6
##   term           comparison estimate conf.low conf.high adj.p.value
##   <chr>          <chr>         <dbl>    <dbl>     <dbl>       <dbl>
## 1 as.factor(cyl) 6-4           -1.16    -2.94     0.619     0.257  
## 2 as.factor(cyl) 8-4           -2.37    -3.85    -0.883     0.00133
## 3 as.factor(cyl) 8-6           -1.20    -2.91     0.498     0.205Pick one of the three tests you did above and practice tidying it up with the three broom functions!
tidy()Add tidy() onto end of your test.
aov(qsec ~ cyl, data = mtcars) %>% tidy()## # A tibble: 2 x 6
##   term         df sumsq meansq statistic   p.value
##   <chr>     <dbl> <dbl>  <dbl>     <dbl>     <dbl>
## 1 cyl           1  34.6  34.6       16.1  0.000366
## 2 Residuals    30  64.4   2.15      NA   NAglance()Add glance() onto the end of your test to look at the model parameters.
aov(cyl ~ qsec, data = mtcars) %>% glance()## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC
## *     <dbl>         <dbl> <dbl>     <dbl>   <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.350         0.328  1.46      16.1 3.66e-4     2  -56.6  119.  124.
## # ... with 2 more variables: deviance <dbl>, df.residual <int>augment()Add the results of the test back to mtcars using augment()
aov(cyl ~ qsec, data = mtcars) %>% augment(mtcars)## # A tibble: 32 x 19
##    .rownames   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear
##  * <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Mazda RX4  21       6  160    110  3.9   2.62  16.5     0     1     4
##  2 Mazda RX…  21       6  160    110  3.9   2.88  17.0     0     1     4
##  3 Datsun 7…  22.8     4  108     93  3.85  2.32  18.6     1     1     4
##  4 Hornet 4…  21.4     6  258    110  3.08  3.22  19.4     1     0     3
##  5 Hornet S…  18.7     8  360    175  3.15  3.44  17.0     0     0     3
##  6 Valiant    18.1     6  225    105  2.76  3.46  20.2     1     0     3
##  7 Duster 3…  14.3     8  360    245  3.21  3.57  15.8     0     0     3
##  8 Merc 240D  24.4     4  147.    62  3.69  3.19  20       1     0     4
##  9 Merc 230   22.8     4  141.    95  3.92  3.15  22.9     1     0     4
## 10 Merc 280   19.2     6  168.   123  3.92  3.44  18.3     1     0     4
## # ... with 22 more rows, and 8 more variables: carb <dbl>, .fitted <dbl>,
## #   .se.fit <dbl>, .resid <dbl>, .hat <dbl>, .sigma <dbl>, .cooksd <dbl>,
## #   .std.resid <dbl>Use the augmented table from the chunk above to plot something.
aov(cyl ~ qsec, data = mtcars) %>% augment(mtcars) %>%
  ggplot(aes(x = qsec, y = .resid, color = cyl)) +
  geom_point()Use the tests talked about this week, cor(), cor.test(), lm(), aov(), and/or TukeyHSD() to ask two more questions about the mtcars dataset.
Look at mtcars again. What’s another question you could ask about it? Plot the variable(s) you’re interested in below
ggplot(mtcars, aes(x = mpg, color = as.factor(vs))) +
  geom_density(size = 1) +
  labs(x = 'miles per gallon', y = '') + 
  scale_color_discrete(name = 'engine ehape', labels = c('v-shaped', 'straight')) +
  theme_classic()Based on the variables you just plotted, what’s your question?
Does engine shape affect the number of miles per gallon a car gets?
Pick one of the tests to answer your question and run it in the chunk below.
aov(mpg ~ as.factor(vs), data = mtcars) %>% tidy()## # A tibble: 2 x 6
##   term             df sumsq meansq statistic    p.value
##   <chr>         <dbl> <dbl>  <dbl>     <dbl>      <dbl>
## 1 as.factor(vs)     1  497.  497.       23.7  0.0000342
## 2 Residuals        30  630.   21.0      NA   NAWhat do you conclude from your test?
With a p-value of 0.0000342, it’s unlikely that the difference in mpg between the different engine types is due to chance.
Look at mtcars again. What’s another question you could ask about it? Plot the variable(s) you’re interested in below
ggplot(mtcars, aes(x = disp, y = hp)) +
  geom_point(size = 3) +
  labs(x = 'displacement (cu. in.)', y = 'gross horsepower') +
  theme_classic()Based on the variables you just plotted, what’s your question?
Can displacement predict horsepower?
Pick one of the tests to answer your question and run it in the chunk below.
lm(hp ~ disp, data = mtcars) %>% tidy()## # A tibble: 2 x 5
##   term        estimate std.error statistic      p.value
##   <chr>          <dbl>     <dbl>     <dbl>        <dbl>
## 1 (Intercept)   45.7     16.1         2.84 0.00811     
## 2 disp           0.438    0.0618      7.08 0.0000000714What do you conclude from your test?
With a p-value of 0.00811 for the y-intercept and a p-value of 7.14e-8 for the slope, displacement can make a good prediction of the car’s horsepower.