# Linear Regression – Foundations of Predictive Modeling
Linear regression is a statistical method for modeling the relationship between two variables. It helps us understand how one variable changes when another variable changes, and it allows us to make predictions based on that relationship.
## Purpose of Linear Regression
Linear regression is used to answer questions such as:
- How does study time affect exam scores?
- How does advertising spending influence sales?
- How does house size affect house price?
- How does training data size affect ML model accuracy?
## Variables in Regression
Regression analysis involves two types of variables:
### 1. Independent Variable (Predictor)
The variable used to predict or explain another variable.
Example: Study hours
### 2. Dependent Variable (Response)
The variable we want to predict or explain.
Example: Exam score
## Simple Linear Regression Model
The relationship between two variables can be represented by the linear equation:
\[ y = \beta_0 + \beta_1 x + \epsilon \]
Where:
- y = Dependent variable
- x = Independent variable
- β₀ = Intercept
- β₁ = Slope (rate of change)
- ε = Random error
## Meaning of Regression Components
### Intercept (β₀)
The value of y when x = 0.
### Slope (β₁)
Represents how much y changes when x increases by one unit.
## Example – Study Hours vs Exam Score
Suppose we collect the following data:
| Study Hours | Exam Score |
|---|---|
| 2 | 50 |
| 4 | 60 |
| 6 | 70 |
| 8 | 80 |
| 10 | 90 |
From the pattern we observe, the regression line becomes:
\[ Score = 40 + 5 \times StudyHours \]
This means:
- Every additional hour of study increases the score by about 5 points.
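The slope and intercept above can be recovered directly from the table with the least-squares formulas β₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and β₀ = ȳ − β₁x̄. A minimal sketch in plain Python:

```python
# Least-squares estimates for the study-hours data in the table above.
hours = [2, 4, 6, 8, 10]
scores = [50, 60, 70, 80, 90]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
sxx = sum((x - mean_x) ** 2 for x in hours)
slope = sxy / sxx                     # β₁
intercept = mean_y - slope * mean_x   # β₀

print(intercept, slope)  # 40.0 5.0
```

This reproduces Score = 40 + 5 × StudyHours exactly, since the table's points happen to lie on a straight line.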
## Regression Line
The regression line is the best fitting straight line through the data points in a scatter plot.
It minimizes the total error between the observed data and predicted values.
## Least Squares Method
The regression line is calculated by minimizing the sum of squared errors.
\[ SSE = \sum (y_i - \hat{y}_i)^2 \]
Where:
- yᵢ = actual value
- ŷᵢ = predicted value
Squaring ensures that positive and negative errors do not cancel each other.
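To see that the least-squares line really has the smallest SSE, compare it against a deliberately worse line; Score = 35 + 6 × StudyHours below is an arbitrary guess chosen only for this sketch:

```python
hours = [2, 4, 6, 8, 10]
scores = [50, 60, 70, 80, 90]

def sse(intercept, slope):
    """Sum of squared errors: sum of (y_i - yhat_i)^2 for the given line."""
    return sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(hours, scores))

print(sse(40, 5))  # 0  -> the least-squares line fits this data exactly
print(sse(35, 6))  # 45 -> any other line produces a larger SSE
```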
## Measuring Model Fit – R²
The coefficient of determination (Rยฒ) measures how well the regression model explains the variation in the data.
\[ R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} \]
R² ranges from 0 to 1.
| R² Value | Interpretation |
|---|---|
| 0 | No explanatory power |
| 0.5 | Moderate explanatory power |
| 1 | Perfect prediction |
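Equivalently, R² = 1 − SSE/SST, where SST is the total variation around the mean. Computed for the study-hours table (a plain-Python sketch):

```python
hours = [2, 4, 6, 8, 10]
scores = [50, 60, 70, 80, 90]

mean_y = sum(scores) / len(scores)
predicted = [40 + 5 * x for x in hours]  # fitted line from the example

sse = sum((y, p)[0] - p for y, p in zip(scores, predicted))  # placeholder
sse = sum((y - p) ** 2 for y, p in zip(scores, predicted))   # unexplained variation
sst = sum((y - mean_y) ** 2 for y in scores)                 # total variation

r_squared = 1 - sse / sst
print(r_squared)  # 1.0 -> perfect prediction, matching the table above
```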
## Linear Regression in Machine Learning
Linear regression is one of the most widely used algorithms in machine learning. Common applications include:
- House price prediction
- Stock price forecasting
- Demand prediction
- Sales forecasting
- Risk prediction
## Assumptions of Linear Regression
- Linear relationship between variables
- Independence of errors
- Constant variance of errors (homoscedasticity)
- Errors are normally distributed
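These assumptions are usually checked by examining the residuals eᵢ = yᵢ − ŷᵢ. One useful fact: for a least-squares fit that includes an intercept, the residuals always sum to zero, which makes a quick sanity check before looking at variance and normality. The noisy scores below are hypothetical, invented for this sketch:

```python
hours = [2, 4, 6, 8, 10]
scores = [52, 58, 71, 79, 91]  # hypothetical noisy data, not from the table above

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
         / sum((x - mean_x) ** 2 for x in hours))
intercept = mean_y - slope * mean_x

# Residuals of the OLS fit; with an intercept term they sum to zero
# (up to floating-point rounding).
residuals = [y - (intercept + slope * x) for x, y in zip(hours, scores)]
print(abs(sum(residuals)) < 1e-9)  # True
```

Plotting these residuals against x (or against ŷ) is the usual way to eyeball linearity and constant variance.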
## Key Insights
- Linear regression models relationships between variables.
- The regression line predicts outcomes.
- The slope measures how much the dependent variable changes per unit increase in the independent variable.
- The least squares method finds the best fitting line.
- R² measures how well the model explains the data.
- Linear regression is the foundation of many ML models.