Model selection and overfitting

Lecture 18

Author: Dr. Mine Çetinkaya-Rundel

Affiliation: Duke University, STA 199 - Fall 2024

Published: November 5, 2024

Warm-up

While you wait…

  • Go to your ae project in RStudio.

  • Make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.

  • If you missed class last Thursday, pull to get today’s application exercise file: ae-15-modeling-loans.qmd.

  • Make sure you’ve completed the “Get to know the data” section of your AE.

Announcements

  • My office hours this week:
    • I’ll hold them at a modified time: 1-2 pm on Wednesday at Old Chem 213 (in place of Dav’s office hours)
    • Dav will fill in for me 2-4 pm on Wednesday at Old Chem 203B
  • Make sure you’re caught up with prepare materials before Thursday’s class

Reminders

What is the difference between \(R^2\) and adjusted \(R^2\)?

  • \(R^2\):

    • Proportion of variability in the outcome explained by the model.

    • Useful for quantifying the fit of a given model.

  • Adjusted \(R^2\):

    • \(R^2\) with a penalty applied for the number of predictors in the model, so it can decrease when a predictor that adds little explanatory power is included.

    • Useful for comparing models.
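
For reference, the two quantities above can be written out as formulas. The notation is an assumption for illustration and does not come from the slides: \(n\) is the number of observations, \(k\) the number of predictors, \(y_i\) the observed outcomes, \(\hat{y}_i\) the model's predictions, and \(\bar{y}\) the mean outcome.

```latex
\begin{align*}
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\[6pt]
R^2_{\text{adj}} &= 1 - \left(1 - R^2\right) \frac{n - 1}{n - k - 1}
\end{align*}
```

Because \((n - 1)/(n - k - 1)\) grows with \(k\), an added predictor raises adjusted \(R^2\) only if it reduces the unexplained variability enough to offset the penalty.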

Application exercise

Finish up Thursday’s AE

  • Go to your ae project in RStudio.

  • Get back to working on ae-15-modeling-loans

Goals:

  • Review prediction and interpretation of model results

  • Review main and interaction effects models

  • Discuss model selection further
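
As a reminder of what separates the two kinds of models in the goals above, here is a sketch for the simplest case. The setup is an assumption for illustration, not taken from the AE: one numerical predictor \(x\), one two-level categorical predictor coded as an indicator \(z \in \{0, 1\}\), and fitted coefficients \(b_0, \dots, b_3\).

```latex
\begin{align*}
\text{Main effects:} \quad & \hat{y} = b_0 + b_1 x + b_2 z \\
\text{Interaction effects:} \quad & \hat{y} = b_0 + b_1 x + b_2 z + b_3 x z
\end{align*}
```

In the main effects model both groups share the slope \(b_1\) and differ only in intercept, which gives parallel lines. In the interaction effects model the slope is \(b_1\) when \(z = 0\) and \(b_1 + b_3\) when \(z = 1\), so the lines are not parallel unless \(b_3 = 0\).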

Recap

  • What is the practical difference between a model with parallel and non-parallel lines?

  • What is the definition of \(R^2\)?

  • Why do we choose models based on adjusted \(R^2\) and not \(R^2\)?
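
A small numerical sketch may help with the last question. It is written in Python rather than the R used in the AE, and the \(R^2\) values, sample size, and predictor counts are hypothetical numbers chosen for illustration, not results from the loans data. The helper name `adjusted_r_squared` is also made up for this example.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2 for a linear model with k predictors fit to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("n must be larger than k + 1")
    # Penalize the unexplained share (1 - R^2) by the factor (n - 1) / (n - k - 1).
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical comparison: the larger model has a slightly higher R^2.
n = 50
small = adjusted_r_squared(0.500, n, k=2)  # model with 2 predictors
large = adjusted_r_squared(0.505, n, k=5)  # model with 5 predictors

print(f"2 predictors: R^2 = 0.500, adjusted R^2 = {small:.3f}")
print(f"5 predictors: R^2 = 0.505, adjusted R^2 = {large:.3f}")
print("Larger model preferred by adjusted R^2?", large > small)
```

In this made-up comparison, \(R^2\) favors the larger model while adjusted \(R^2\) favors the smaller one. \(R^2\) cannot decrease when predictors are added to a least squares model, so on its own it would always favor the larger model.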