First Steps to Understand and Improve Your OLS Regression — Part 1

[Figure: Illustrating a basic multiple regression model for house price prediction.]
[Figure: OLS regression summary output for the model.]
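To reproduce a summary table like the one in the figure, here is a minimal statsmodels sketch. The file name and column names (price, sqft_living, bedrooms, bathrooms) are placeholders; substitute whatever your own housing dataset uses.

    # A minimal OLS fit with statsmodels. The CSV file and column names
    # here are placeholders for your own housing data.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("house_prices.csv")

    # Ordinary least squares with a few numeric predictors.
    results = smf.ols("price ~ sqft_living + bedrooms + bathrooms", data=df).fit()

    # The summary table reports R-squared, adjusted R-squared,
    # the F-statistic, and its p-value, among other diagnostics.
    print(results.summary())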
  • Adjusted R-squared. R-squared by itself isn’t very helpful for multiple regression models, since it tends to rise every time you add a feature. In this example, try one-hot encoding the zipcode variable so that each zip code becomes a separate independent variable. You’ll add roughly 70 features to your model and R-squared will improve, but the model itself won’t. Adjusted R-squared is the answer to this issue: it accounts not just for how many variables you add, but for whether those variables are actually useful (see the zipcode sketch after this list).
  • The F-statistic. This statistic tells us about the coefficients of our regression at a macro level. If the coefficients of all your features were jointly equal to zero, the model would be useless, so the F-test checks the null hypothesis that all coefficients on our independent variables jointly equal 0. In our model the F-statistic is 1829, which is large enough to reject that null hypothesis at the 5% level of significance.
  • The Probability of the F-statistic. This is the p-value associated with that F-test. If it is smaller than your chosen level of significance, you can reject the null hypothesis, which is the case with our model (see the F-test sketch below).
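To see the gap between R-squared and adjusted R-squared in action, here is a sketch that continues from the fit above and one-hot encodes zipcode. Again, the column names are placeholders for your own data.

    # Continuing the sketch above: one-hot encode zipcode and compare
    # R-squared with adjusted R-squared on the larger model.
    import statsmodels.api as sm

    dummies = pd.get_dummies(df["zipcode"], prefix="zip", drop_first=True).astype(float)

    features = df[["sqft_living", "bedrooms", "bathrooms"]]
    X_base = sm.add_constant(features)
    X_full = sm.add_constant(pd.concat([features, dummies], axis=1))

    base = sm.OLS(df["price"], X_base).fit()
    full = sm.OLS(df["price"], X_full).fit()

    # R-squared can only rise as features are added; adjusted R-squared
    # penalizes features that don't pull their weight.
    print(f"base: R2={base.rsquared:.3f}, adj R2={base.rsquared_adj:.3f}")
    print(f"full: R2={full.rsquared:.3f}, adj R2={full.rsquared_adj:.3f}")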
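And to run the joint F-test programmatically rather than reading it off the summary table, you can pull the statistic and its p-value straight from the results object fitted earlier:

    # Pull the joint F-test from the fitted results object.
    # H0: all slope coefficients are jointly zero.
    alpha = 0.05

    print(f"F-statistic:        {results.fvalue:.1f}")
    print(f"Prob (F-statistic): {results.f_pvalue:.2e}")

    if results.f_pvalue < alpha:
        print("Reject H0: at least one coefficient is nonzero.")
    else:
        print("Fail to reject H0: the model may have no explanatory power.")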
