Correlation: Measuring Relationships Between Variables
In many real-world problems, variables influence each other. Correlation helps us determine whether two variables tend to move together and how strong that relationship is.
Why Correlation Is Important
Many scientific and business questions involve relationships between variables.
Examples:
- Does more study time increase exam scores?
- Do higher temperatures increase ice cream sales?
- Does advertising spending increase sales revenue?
- Does training data size improve machine learning accuracy?
Types of Relationships Between Variables
1. Positive Correlation
Both variables increase or decrease together.
Example:
- Study hours ↑ → Exam scores ↑
- Advertising budget ↑ → Sales ↑
2. Negative Correlation
One variable increases while the other decreases.
Example:
- Product price ↑ → Demand ↓
- Exercise time ↑ → Body fat ↓
3. No Correlation
The variables have no clear relationship.
Example:
- Shoe size vs intelligence
- Hair color vs exam scores
Visualizing Correlation: Scatter Plot
The most common way to visualize relationships between two variables is using a scatter plot.
- X-axis → Independent variable
- Y-axis → Dependent variable
Each point on the graph represents one observation.
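As a rough illustration of how observations map to points, the sketch below draws a crude character-grid scatter plot using only the standard library. The function name `text_scatter`, the grid size, and the data values are all hypothetical; in practice a plotting library would normally be used instead.

```python
def text_scatter(xs, ys, width=21, height=9):
    """Return a crude character-grid scatter plot of the (x, y) pairs."""
    x_min, y_min = min(xs), min(ys)
    # Fall back to a span of 1 so a constant variable does not divide by zero.
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each observation to a column (x) and a row (y) of the grid.
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return "\n".join("".join(line).rstrip() for line in grid)


# Hypothetical observations: one (hours, score) pair per student.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 71, 80]
plot = text_scatter(hours, scores)
print(plot)
```

Each `*` is one observation; points that climb from the lower left to the upper right suggest a positive relationship.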
Pearson Correlation Coefficient
The strength and direction of a linear relationship between two variables is commonly measured using the Pearson correlation coefficient.
\[ r = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})} {\sqrt{\sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2}} \]
Where:
- r = correlation coefficient
- xᵢ = value of variable X
- yᵢ = value of variable Y
- x̄ = mean of X
- ȳ = mean of Y
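The formula above can be translated almost term by term into code. The following is a minimal sketch, assuming two equal-length lists of numbers; the function name `pearson_r` and the sample values are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed directly from the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    x_bar = sum(xs) / len(xs)  # mean of X
    y_bar = sum(ys) / len(ys)  # mean of Y
    # Numerator: sum of products of paired deviations from the means.
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the summed squared deviations.
    sum_sq_x = sum((x - x_bar) ** 2 for x in xs)
    sum_sq_y = sum((y - y_bar) ** 2 for y in ys)
    denominator = math.sqrt(sum_sq_x * sum_sq_y)
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical values, chosen only to exercise the function.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

Note that r is undefined when either variable never changes, because the denominator becomes zero; the sketch raises an error in that case rather than returning a number.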
Range of Correlation Values
| Correlation Value | Interpretation |
|---|---|
| +1 | Perfect positive correlation |
| 0.7 to 1 | Strong positive correlation |
| 0.3 to 0.7 | Moderate positive correlation |
| 0 to 0.3 | Weak positive correlation |
| 0 | No linear correlation |
| -0.3 to 0 | Weak negative correlation |
| -0.7 to -0.3 | Moderate negative correlation |
| -1 to -0.7 | Strong negative correlation |
| -1 | Perfect negative correlation |
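The table can be turned into a small lookup helper. This is only a sketch: the cut-offs 0.3 and 0.7 come from the table, while the function name `describe_r`, the "weak" label for magnitudes below 0.3, and the treatment of the exact boundary values 0.3 and 0.7 are assumptions. Such thresholds are rules of thumb and vary between fields.

```python
def describe_r(r):
    """Map a correlation coefficient to a rule-of-thumb verbal label."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    strength = abs(r)
    if strength == 0:
        return "no correlation"
    if strength == 1:
        label = "perfect"
    elif strength >= 0.7:   # assumption: 0.7 itself counts as strong
        label = "strong"
    elif strength >= 0.3:   # assumption: 0.3 itself counts as moderate
        label = "moderate"
    else:                   # assumption: below 0.3 is labelled weak
        label = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} correlation"


print(describe_r(0.85))   # strong positive correlation
print(describe_r(-0.5))   # moderate negative correlation
```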
Example: Study Time vs Exam Score
Suppose we collect the following data:
| Study Hours | Exam Score |
|---|---|
| 2 | 50 |
| 4 | 60 |
| 6 | 70 |
| 8 | 80 |
| 10 | 90 |
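When plotted on a scatter plot, these points fall exactly on an upward-sloping straight line (score = 40 + 5 × hours), so the Pearson coefficient for this data is r = +1, a perfect positive correlation.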
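Working the Pearson formula through by hand for this table makes the result concrete. The means are:

\[ \bar{x} = \frac{2+4+6+8+10}{5} = 6, \qquad \bar{y} = \frac{50+60+70+80+90}{5} = 70 \]

The sum of the products of the deviations is:

\[ \sum (x_i-\bar{x})(y_i-\bar{y}) = (-4)(-20) + (-2)(-10) + (0)(0) + (2)(10) + (4)(20) = 200 \]

The sums of squared deviations are:

\[ \sum (x_i-\bar{x})^2 = 16+4+0+4+16 = 40, \qquad \sum (y_i-\bar{y})^2 = 400+100+0+100+400 = 1000 \]

Substituting into the formula:

\[ r = \frac{200}{\sqrt{40 \times 1000}} = \frac{200}{200} = 1 \]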
Correlation Does Not Imply Causation
A common mistake is assuming that correlation means one variable causes the other.
Example:
- Ice cream sales and drowning incidents are correlated.
- Neither causes the other: both rise in hot weather, which acts as a common cause.
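The effect of a hidden common cause can be reproduced with simulated data. In the sketch below every number is made up for illustration: temperature drives both series, and neither series depends on the other. The partial correlation formula is one standard way to remove the linear effect of a third variable; it is brought in here for illustration and is not something the text above prescribes.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - x_bar) ** 2 for x in xs)
                    * sum((y - y_bar) ** 2 for y in ys))
    return num / den


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: both series depend on temperature plus independent noise.
temperature = [rng.uniform(10, 35) for _ in range(500)]
ice_cream_sales = [20 * t + rng.gauss(0, 50) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]

r_sales_drown = pearson_r(ice_cream_sales, drownings)
r_sales_temp = pearson_r(ice_cream_sales, temperature)
r_drown_temp = pearson_r(drownings, temperature)
r_controlled = partial_r(r_sales_drown, r_sales_temp, r_drown_temp)

print(f"raw correlation:            {r_sales_drown:.2f}")
print(f"controlling for temperature: {r_controlled:.2f}")
```

The raw correlation is strongly positive, while the value after controlling for temperature should sit close to zero, which is what one would expect when a third variable explains the relationship.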
Correlation in Machine Learning
Correlation is widely used in machine learning and data science. Common uses include:
- Feature selection
- Detecting redundant variables
- Understanding relationships in datasets
- Building predictive models
- Reducing dimensionality
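As one concrete case, redundant variables can be flagged by computing the correlation of every pair of features and reporting the pairs whose magnitude exceeds a threshold. This is a minimal sketch: the function name `redundant_pairs`, the threshold of 0.9, and the feature names and values are all hypothetical.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - x_bar) ** 2 for x in xs)
                    * sum((y - y_bar) ** 2 for y in ys))
    return num / den  # assumes neither column is constant


def redundant_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for feature pairs with |r| >= threshold."""
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs


# Hypothetical dataset: the two height columns carry the same information.
features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "random_id": [7, 3, 9, 4, 6],
}
pairs = redundant_pairs(features)
print(pairs)
```

Only the two height columns are flagged, so one of them could be dropped before modeling without losing much information.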
Key Insights
- Correlation measures the strength and direction of the relationship between two variables.
- Values range from −1 to +1.
- Scatter plots help visualize relationships.
- Correlation does not prove causation.
- Correlation analysis is essential in machine learning and predictive modeling.