📘 Linear Regression — Foundations of Predictive Modeling

Linear Regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables.

It helps us understand how one variable changes when another variable changes, and it allows us to make predictions based on that relationship.

🎯 Purpose of Linear Regression

Linear regression is used to answer questions such as:

  • How does study time affect exam scores?
  • How does advertising spending influence sales?
  • How does house size affect house price?
  • How does training data size affect ML model accuracy?

In each case, linear regression models the relationship between the variables and uses it to predict outcomes.

📊 Variables in Regression

Regression analysis involves two types of variables:

1๏ธโƒฃ Independent Variable (Predictor)

The variable used to predict or explain another variable.

Example: Study hours


2๏ธโƒฃ Dependent Variable (Response)

The variable we want to predict or explain.

Example: Exam score

📈 Simple Linear Regression Model

The relationship between two variables can be represented by the linear equation:

\[ y = \beta_0 + \beta_1 x + \epsilon \]

Where:

  • y = Dependent variable
  • x = Independent variable
  • β₀ = Intercept
  • β₁ = Slope (rate of change)
  • ε = Random error
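As a quick sketch of what the equation computes, here is the deterministic part of the model in plain Python. The coefficient values β₀ = 40 and β₁ = 5 are illustrative (they match the study-hours example later in this section), and the random error term ε is omitted:

```python
# Deterministic part of the simple linear model: y = b0 + b1 * x.
# b0 and b1 are illustrative values; the random error term is omitted.
def predict(x, b0=40.0, b1=5.0):
    """Predicted y for a given x under the linear model."""
    return b0 + b1 * x

print(predict(6))  # 40 + 5 * 6 = 70.0
```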

๐Ÿ“ Meaning of Regression Components

Intercept (β₀)

The value of y when x = 0.

Slope (ฮฒโ‚)

How much y changes when x increases by one unit.

The sign of the slope gives the direction of the relationship, and its magnitude gives the strength.

๐Ÿ” Example โ€” Study Hours vs Exam Score

Suppose we collect the following data:

| Study Hours | Exam Score |
|-------------|------------|
| 2           | 50         |
| 4           | 60         |
| 6           | 70         |
| 8           | 80         |
| 10          | 90         |

From the pattern we observe, the regression line becomes:

\[ \text{Score} = 40 + 5 \times \text{StudyHours} \]

This means:

  • Every additional hour of study increases the predicted score by 5 points.
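The coefficients above can be recovered from the table using the closed-form least-squares formulas. A minimal sketch in plain Python, with no fitting library assumed:

```python
# Closed-form simple linear regression on the study-hours data.
hours = [2, 4, 6, 8, 10]
scores = [50, 60, 70, 80, 90]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Slope: covariance of x and y divided by the variance of x
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores)) / \
     sum((x - mean_x) ** 2 for x in hours)
# Intercept: the fitted line passes through the point of means
b0 = mean_y - b1 * mean_x

print(b0, b1)  # 40.0 5.0
```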

📊 Regression Line

The regression line is the best-fitting straight line through the data points in a scatter plot.

It minimizes the total squared error between the observed values and the values the line predicts.

The best line is found using the Least Squares Method.

๐Ÿ“ Least Squares Method

The regression line is calculated by minimizing the sum of squared errors.

\[ SSE = \sum (y_i - \hat{y}_i)^2 \]

Where:

  • yᵢ = actual (observed) value
  • ŷᵢ = predicted value

Squaring ensures that positive and negative errors do not cancel each other.
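To make the criterion concrete, here is SSE computed both for the least-squares line from the example and for an arbitrary competing line (the competing line's coefficients are made up purely for comparison):

```python
# Sum of squared errors between actual and predicted values.
def sse(actual, predicted):
    return sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted))

hours = [2, 4, 6, 8, 10]
scores = [50, 60, 70, 80, 90]

best = [40 + 5 * x for x in hours]   # least-squares line from the example
other = [35 + 6 * x for x in hours]  # an arbitrary competing line

print(sse(scores, best))   # 0 — the line passes through every point exactly
print(sse(scores, other))  # 45 — any other line has a larger SSE
```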

📊 Measuring Model Fit — R²

The coefficient of determination (R²) measures how well the regression model explains the variation in the data.

\[ R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} \]

R² ranges from 0 to 1.

| R² Value | Interpretation             |
|----------|----------------------------|
| 0        | No explanatory power       |
| 0.5      | Moderate explanatory power |
| 1        | Perfect prediction         |
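In practice R² is usually computed in the complementary form R² = 1 − SSE/SST. A sketch with hypothetical noisy exam scores, reusing the example line Score = 40 + 5 × Hours as the model:

```python
# R² = 1 - SSE/SST: share of the variation in y explained by the model.
hours = [2, 4, 6, 8, 10]
actual = [52, 58, 71, 79, 90]            # hypothetical noisy scores
predicted = [40 + 5 * x for x in hours]  # the example regression line

mean_y = sum(actual) / len(actual)
sse = sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted))  # unexplained
sst = sum((y - mean_y) ** 2 for y in actual)                      # total

r2 = 1 - sse / sst
print(round(r2, 3))  # close to 1: the line explains most of the variation
```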

🤖 Linear Regression in Machine Learning

Linear regression is one of the most widely used algorithms in machine learning. Common applications include:

  • House price prediction
  • Stock price forecasting
  • Demand prediction
  • Sales forecasting
  • Risk prediction

Many advanced ML algorithms, including ridge and lasso regression, are extensions of linear regression.

⚠️ Assumptions of Linear Regression

  • Linear relationship between variables
  • Independence of errors
  • Constant variance of errors (homoscedasticity)
  • Errors are normally distributed
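A quick way to sanity-check the linearity and constant-variance assumptions is to inspect the residuals, which should scatter around zero with no visible pattern in x. A minimal sketch using hypothetical noisy scores:

```python
# Residuals = actual - predicted; patterns here suggest violated assumptions.
hours = [2, 4, 6, 8, 10]
actual = [52, 58, 71, 79, 90]  # hypothetical noisy scores
residuals = [y - (40 + 5 * x) for x, y in zip(hours, actual)]

print(residuals)  # small values centered on zero, with no trend in x
```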

🧠 Key Insights

  • Linear regression models relationships between variables.
  • The regression line predicts outcomes.
  • The slope measures how much the dependent variable changes per unit change in the independent variable.
  • The least squares method finds the best-fitting line.
  • R² measures how well the model explains the data.
  • Linear regression is the foundation of many ML models.