Residual Analysis: Understanding Prediction Errors
No predictive model is perfect. Even the best regression model will produce some prediction errors. Residual analysis helps us study these errors and determine whether our model is reliable.
What is a Residual?
A residual is the difference between the observed value and the predicted value from the regression model.
\[ \text{Residual} = \text{Actual Value} - \text{Predicted Value} \]
\[ e_i = y_i - \hat{y}_i \]
Where:
- yᵢ = actual observed value
- ŷᵢ = predicted value from the regression model
- eᵢ = residual
Example: Exam Score Prediction
| Study Hours | Actual Score | Predicted Score | Residual |
|---|---|---|---|
| 2 | 52 | 50 | 2 |
| 4 | 63 | 60 | 3 |
| 6 | 68 | 70 | -2 |
| 8 | 82 | 80 | 2 |
| 10 | 88 | 90 | -2 |
- Positive residual → model underestimated
- Negative residual → model overestimated
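The residual column of the exam score table can be reproduced with a few lines of Python. This is a minimal sketch: the variable names and the `describe` helper are illustrative choices, not part of any library.

```python
# Data copied from the exam score table.
hours = [2, 4, 6, 8, 10]
actual = [52, 63, 68, 82, 88]
predicted = [50, 60, 70, 80, 90]

# Residual e_i = y_i - y_hat_i for each observation.
residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]


def describe(residual):
    """Say which way the model missed for a single residual."""
    if residual > 0:
        return "underestimated"
    if residual < 0:
        return "overestimated"
    return "exact"


for h, e in zip(hours, residuals):
    print(f"{h:>2} hours: residual {e:+d} ({describe(e)})")
```

The sign convention matters: because the residual is actual minus predicted, a positive value means the prediction was too low.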
Residual Plot
Residuals are often visualized using a Residual Plot.
In this plot:
- X-axis → predicted values
- Y-axis → residual values
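A plot with these axes could be drawn as follows. This sketch assumes matplotlib is installed and reuses the exam score values; the function name `plot_residuals` and the styling are illustrative.

```python
import matplotlib.pyplot as plt

# Predicted scores and residuals from the exam score example.
predicted = [50, 60, 70, 80, 90]
residuals = [2, 3, -2, 2, -2]


def plot_residuals(predicted, residuals):
    """Scatter residuals against predicted values with a zero reference line."""
    fig, ax = plt.subplots()
    ax.scatter(predicted, residuals)
    ax.axhline(0, linestyle="--", color="gray")  # residual = 0 reference
    ax.set_xlabel("Predicted value")
    ax.set_ylabel("Residual")
    ax.set_title("Residual plot")
    return ax


ax = plot_residuals(predicted, residuals)
# In an interactive session, plt.show() displays the figure;
# ax.figure.savefig("residual_plot.png") writes it to disk instead.
```

The horizontal line at zero makes it easier to judge whether the points are scattered evenly above and below it.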
What Residual Patterns Tell Us
| Residual Pattern | Interpretation |
|---|---|
| Random scatter around zero | No systematic error is visible; the model form looks adequate |
| Curved pattern | Relationship may not be linear |
| Funnel shape | Variance not constant (heteroscedasticity) |
| Clusters | Possible missing variables |
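Patterns are usually judged by eye, but a rough numeric check can back up the impression of a funnel shape. The sketch below is a crude heuristic, not a formal statistical test: it compares the spread of residuals in the upper half of the predicted values with the spread in the lower half. The function name `spread_ratio` and both data sets are made up for illustration.

```python
from statistics import pstdev


def spread_ratio(predicted, residuals):
    """Residual spread for high predictions divided by spread for low ones."""
    pairs = sorted(zip(predicted, residuals))  # order by predicted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return pstdev(high) / pstdev(low)


predicted = list(range(1, 21))
signs = [1 if p % 2 == 0 else -1 for p in predicted]

# Synthetic residuals with constant spread (alternating +1 / -1).
constant = [float(s) for s in signs]
# Synthetic residuals whose size grows with the prediction (funnel shape).
funnel = [0.2 * p * s for p, s in zip(predicted, signs)]

print(spread_ratio(predicted, constant))
print(spread_ratio(predicted, funnel))
```

A ratio near 1 is consistent with constant variance, while a ratio well above or below 1 hints at heteroscedasticity. How far from 1 counts as a problem is a judgment call.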
Sum of Squared Errors (SSE)
One way to evaluate residuals is by calculating the total squared prediction error.
\[ SSE = \sum (y_i - \hat{y}_i)^2 \]
The least squares method fits the regression model by choosing the coefficients that minimize this value.
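To make the formula concrete, the sketch below computes the SSE for the predictions in the exam score table and compares it with the SSE of a least squares line fitted to the same five points. The helper names `sse` and `fit_line` are illustrative, and the fit uses the standard closed-form solution for simple linear regression.

```python
hours = [2, 4, 6, 8, 10]
actual = [52, 63, 68, 82, 88]
table_predicted = [50, 60, 70, 80, 90]  # equals 40 + 5 * hours


def sse(actual, predicted):
    """Sum of squared residuals."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))


def fit_line(x, y):
    """Closed-form least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_line(hours, actual)
ls_predicted = [intercept + slope * h for h in hours]

print(sse(actual, table_predicted))         # 25
print(round(sse(actual, ls_predicted), 2))  # 15.1
```

The predictions in the table follow a rounded line (40 + 5 × hours), so their SSE of 25 is higher than the 15.1 achieved by the least squares line (about 43.3 + 4.55 × hours). No other straight line gives a smaller SSE on these points.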
Detecting Model Problems
Residual analysis helps identify problems such as:
- Incorrect model form
- Outliers
- Non-linear relationships
- Heteroscedasticity (unequal error variance)
- Missing variables
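Residuals in Machine Learning
Residual analysis is a key step in evaluating machine learning models. Comparing residuals on training data with residuals on held-out data helps to:
- Detect overfitting (small training residuals, much larger held-out residuals)
- Detect underfitting (large residuals on both sets)
- Evaluate prediction errors
- Improve model performance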
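One common way to use residuals for these checks is to compare the average squared residual on training data with the same quantity on held-out validation data. The sketch below uses made-up residuals purely for illustration, and the names `mean_squared_residual` and `compare_fit` are hypothetical. No universal threshold separates a healthy gap from an unhealthy one.

```python
def mean_squared_residual(residuals):
    """Average of the squared residuals (the MSE)."""
    return sum(e ** 2 for e in residuals) / len(residuals)


def compare_fit(train_residuals, val_residuals):
    """Summarize training vs. validation error for a fitted model."""
    train_mse = mean_squared_residual(train_residuals)
    val_mse = mean_squared_residual(val_residuals)
    return {
        "train_mse": train_mse,
        "val_mse": val_mse,
        "gap_ratio": val_mse / train_mse,
    }


# Made-up residuals: tiny on training data, large on validation data.
train_residuals = [0.1, -0.2, 0.1, -0.1]
val_residuals = [3.0, -2.5, 4.0, -3.5]

report = compare_fit(train_residuals, val_residuals)
print(report)
```

A validation error far above the training error, as in this made-up case, is the usual sign of overfitting. Large errors on both sets point to underfitting instead.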
Key Insights
- Residuals measure prediction errors.
- Residual plots help diagnose model problems.
- A good model produces residuals randomly distributed around zero.
- Residual analysis reveals where a model's assumptions fail, so the model can be corrected.
- Common regression metrics such as MSE, RMSE, and MAE are computed directly from residuals.