📘 Correlation: Measuring Relationships Between Variables

Correlation measures the strength and direction of the relationship between two numerical variables.

In many real-world problems, variables are related to each other. Correlation helps us determine whether two variables tend to move together and how strong that relationship is.

🎯 Why Correlation is Important

Many scientific and business questions involve relationships between variables.

Examples:

Does more study time increase exam scores?
Do higher temperatures increase ice cream sales?
Does advertising spending increase sales revenue?
Does training data size improve machine learning accuracy?

Correlation helps quantify relationships between variables.

📊 Types of Relationships Between Variables

1️⃣ Positive Correlation

Both variables increase or decrease together.

Example:

Study hours ↑ → Exam scores ↑
Advertising budget ↑ → Sales ↑

2️⃣ Negative Correlation

One variable increases while the other decreases.

Example:

Product price ↑ → Demand ↓
Exercise time ↑ → Body fat ↓

3️⃣ No Correlation

The variables have no clear relationship.

Example:

Shoe size vs intelligence
Hair color vs exam scores

📈 Visualizing Correlation — Scatter Plot

The most common way to visualize the relationship between two variables is a scatter plot.

X-axis → Independent variable
Y-axis → Dependent variable

Each point on the graph represents one observation.

Scatter plots help identify patterns between variables.
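A minimal sketch of drawing such a plot, assuming the third-party matplotlib library is installed. It reuses the study-time data from the example later in this article; the output filename `scatter.png` is a hypothetical choice.

```python
# Assumes matplotlib is installed (pip install matplotlib).
import matplotlib
matplotlib.use("Agg")  # render to a file, no display window needed
import matplotlib.pyplot as plt

# Study-time data from the example later in this article
study_hours = [2, 4, 6, 8, 10]
exam_scores = [50, 60, 70, 80, 90]

fig, ax = plt.subplots()
ax.scatter(study_hours, exam_scores)  # one point per observation
ax.set_xlabel("Study hours (independent variable)")  # X-axis
ax.set_ylabel("Exam score (dependent variable)")     # Y-axis
ax.set_title("Study time vs exam score")
fig.savefig("scatter.png")  # hypothetical output path
plt.close(fig)
```

The independent variable goes on the X-axis and the dependent variable on the Y-axis, matching the convention described above.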

📐 Pearson Correlation Coefficient

The Pearson correlation coefficient, r, measures both the strength and the direction of a linear relationship between two variables:

\[ r = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})} {\sqrt{\sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2}} \]

Where:

r = correlation coefficient
xᵢ = value of variable X
yᵢ = value of variable Y
x̄ = mean of X
ȳ = mean of Y

📊 Range of Correlation Values

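Correlation Value (r)	Interpretation
+1	Perfect positive correlation
0.7 to 1	Strong positive correlation
0.3 to 0.7	Moderate positive correlation
0 to 0.3	Weak positive correlation
0	No linear correlation
0 to -0.3	Weak negative correlation
-0.3 to -0.7	Moderate negative correlation
-0.7 to -1	Strong negative correlation
-1	Perfect negative correlation

These cutoffs are common rules of thumb; the exact boundaries vary between textbooks and fields.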
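The table can be turned into a small lookup. This sketch uses the 0.3 and 0.7 cutoffs from the table; `interpret_r` is a hypothetical name, and two details are assumptions: values with |r| below 0.3 are labeled "weak", and a value sitting exactly on a boundary goes to the stronger band.

```python
def interpret_r(r):
    """Map a correlation coefficient to a verbal label (rule-of-thumb cutoffs)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.7:   # assumption: boundary values go to the stronger band
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:               # assumption: |r| < 0.3 is labeled "weak"
        strength = "weak"
    return f"{strength} {direction} correlation"


for r in (0.95, 0.5, -0.2, -1.0):
    print(r, interpret_r(r))
```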

🔍 Example — Study Time vs Exam Score

Suppose we collect the following data:

Study Hours	Exam Score
2	50
4	60
6	70
8	80
10	90

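When plotted on a scatter plot, these points lie exactly on a straight, upward-sloping line: every additional 2 hours of study adds 10 points.

Because the relationship is perfectly linear (Exam Score = 40 + 5 × Study Hours), the correlation coefficient is exactly +1, a perfect positive correlation. Real data is noisier and would normally give a value below +1.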
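Working the Pearson formula by hand on these five points shows where r lands. Here x̄ = 6 and ȳ = 70, so the deviations are xᵢ − x̄ = −4, −2, 0, 2, 4 and yᵢ − ȳ = −20, −10, 0, 10, 20:

\[ r = \frac{(-4)(-20) + (-2)(-10) + (0)(0) + (2)(10) + (4)(20)}{\sqrt{(16+4+0+4+16)\,(400+100+0+100+400)}} = \frac{200}{\sqrt{40 \times 1000}} = \frac{200}{200} = 1 \]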

⚠️ Correlation Does Not Imply Causation

A common mistake is assuming that correlation means one variable causes the other.

Example:

Ice cream sales and drowning incidents are positively correlated.
But neither causes the other: both increase during hot weather, which acts as a hidden (confounding) variable.

Correlation measures association, not causation.

📊 Correlation in Machine Learning

Correlation is widely used in machine learning and data science for:

Feature selection
Detecting redundant variables
Understanding relationships in datasets
Building predictive models
Reducing dimensionality

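Highly correlated input features cause multicollinearity, which makes the coefficients of linear models unstable and hard to interpret. A common remedy is to keep only one feature from each highly correlated pair.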

🧠 Key Insights

Correlation measures relationships between variables.
Values range from −1 to +1.
Scatter plots help visualize relationships.
Correlation does not prove causation.
Correlation analysis is a routine step in machine learning and predictive modeling.
