📊 Plots for the Relationship Between Two Variables

In statistics, we often want to know whether two variables are related (associated) and how they are related.

Graphs help us visualize relationships between variables clearly and quickly.

The type of graph we use depends on whether the variables are:

  • 📁 Categorical (labels or groups)
  • 🔢 Numerical (numbers that can be measured)

🧩 Types of Variable Combinations

There are three common combinations:

  • 1️⃣ One Categorical + One Numerical Variable
  • 2️⃣ Two Categorical Variables
  • 3️⃣ Two Numerical Variables

1️⃣ One Categorical and One Numerical Variable

Best Plot: Side-by-Side Box Plots

These plots compare the distribution of numerical values across categories.

📌 Example (Association Present)

Variable 1 (Categorical): Skin cancer → Yes / No
Variable 2 (Numerical): Hours spent in the sun per day

If people with skin cancer tend to spend more hours in the sun, their box plot will appear higher than the box plot for the group without skin cancer.

This visual difference shows an association between the variables.

📌 Example (No Association)

Variable 1: Biological Sex → Male / Female
Variable 2: Body Temperature

Both box plots would look very similar because body temperature does not differ much by biological sex.

If box plots look similar → No relationship
If box plots look different → Possible relationship
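A box plot is drawn from a five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. As a minimal sketch, the Python below computes that summary for each group in the skin cancer example. The hours values are invented for illustration (not real data), and `five_number_summary` is a hypothetical helper name. Quartiles use the default method of Python's `statistics.quantiles`; other software may compute them slightly differently.

```python
from statistics import median, quantiles

# Hypothetical hours in the sun per day, invented for illustration only
hours = {
    "Skin cancer: Yes": [3.5, 4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 8.0],
    "Skin cancer: No": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0],
}

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a box plot is drawn from."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
    return min(values), q1, median(values), q3, max(values)

for group, values in hours.items():
    low, q1, med, q3, high = five_number_summary(values)
    print(f"{group:18} min={low} Q1={q1} median={med} Q3={q3} max={high}")
```

With these made-up numbers the "Yes" group's median is 5.75 hours versus 2.75 hours for the "No" group, and even its Q1 (4.25) is above the "No" group's Q3 (3.875), so its box would sit clearly higher.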

2️⃣ Two Categorical Variables

These plots compare proportions or percentages across categories.

Best Plots:
  • Side-by-Side Bar Charts
  • Stacked Bar Charts
  • Mosaic Plots

📊 A. Side-by-Side Bar Chart

Example:

  • Variable 1: Smoking → Smoker / Non-smoker
  • Variable 2: Lung Cancer → Yes / No

We draw bars showing the percentage of smokers who have lung cancer next to the percentage of non-smokers who have lung cancer.

If smokers have a much higher percentage, the bars will look noticeably different, showing an association.
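The bar heights come from percentages within each group, not raw counts, because the groups can have very different sizes. A minimal sketch in Python, assuming a small table of invented counts (not real data) and a hypothetical helper `row_percentages`, with `#` characters standing in for bars:

```python
# Hypothetical counts, invented for illustration: rows = smoking status, columns = lung cancer
counts = {
    "Smoker": {"Yes": 30, "No": 70},
    "Non-smoker": {"Yes": 5, "No": 195},
}

def row_percentages(table):
    """Convert each row of counts into percentages of that row's own total."""
    result = {}
    for group, cells in table.items():
        total = sum(cells.values())
        result[group] = {k: 100 * v / total for k, v in cells.items()}
    return result

pct = row_percentages(counts)
for group, cells in pct.items():
    # One text "bar" per group; each '#' is about 2 percentage points
    bar = "#" * round(cells["Yes"] / 2)
    print(f"{group:10} {bar} {cells['Yes']:.1f}% with lung cancer")
```

Here 30.0% of the hypothetical smokers have lung cancer versus 2.5% of the non-smokers, so the two bars would differ sharply even though there are twice as many non-smokers.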

📊 B. Stacked Bar Chart

Similar to side-by-side bars, but the categories of the second variable are stacked on top of each other within each group, so each full bar represents 100%.

This helps compare proportions within each group.
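To see how each stacked bar fills exactly 100%, here is a small text-based sketch in Python. The percentages are hypothetical, `stacked_bar` is an illustrative helper, and `#` marks "Yes" while `.` marks "No". The last segment takes whatever width remains, so rounding can never make a bar longer or shorter than 100%.

```python
# Hypothetical within-group percentages (invented); each group's bar totals 100%
percentages = {
    "Smoker": {"Yes": 30.0, "No": 70.0},
    "Non-smoker": {"Yes": 2.5, "No": 97.5},
}

SYMBOLS = {"Yes": "#", "No": "."}  # one fill character per category

def stacked_bar(cells, width=40):
    """Draw one 100% stacked bar as text; the segments always fill `width` characters."""
    keys = list(cells)
    bar = ""
    for key in keys[:-1]:
        bar += SYMBOLS[key] * round(width * cells[key] / 100)
    # Give the last segment whatever is left so the bar always represents 100%
    bar += SYMBOLS[keys[-1]] * (width - len(bar))
    return bar

for group, cells in percentages.items():
    print(f"{group:10} |{stacked_bar(cells)}|")
```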

📊 C. Mosaic Plot (Most Informative)

A mosaic plot is like a stacked bar chart but also shows how large each group is.

  • Segment height within a bar → percentage within that group
  • Bar width → size of each group

Mosaic plots show both proportions and group sizes in one graph.

📌 No Association in Mosaic Plot

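If the percentages are similar across groups, the dividing lines inside the bars line up at the same height, so the tiles form a regular grid (a cross-like shape for two groups with two categories each), indicating no relationship.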
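The two pieces of a mosaic plot can be computed directly from a table of counts: each group's bar width is its share of all observations, and each tile's height is its share within that group. The Python sketch below assumes invented counts chosen so that both groups have the same 20% "Yes" rate; `mosaic_tiles` is a hypothetical helper, not a library function.

```python
import math

# Hypothetical counts (invented), chosen so both groups have the same 20% "Yes" rate
counts = {
    "Group A": {"Yes": 20, "No": 80},
    "Group B": {"Yes": 60, "No": 240},
}

def mosaic_tiles(table):
    """Return {group: (width, {category: height})} as fractions of the whole plot.

    Width of a group's bar = its share of all observations.
    Height of a tile = its share within that group.
    """
    grand_total = sum(sum(cells.values()) for cells in table.values())
    tiles = {}
    for group, cells in table.items():
        group_total = sum(cells.values())
        heights = {k: v / group_total for k, v in cells.items()}
        tiles[group] = (group_total / grand_total, heights)
    return tiles

tiles = mosaic_tiles(counts)
for group, (width, heights) in tiles.items():
    print(f"{group}: width={width:.2f}, Yes height={heights['Yes']:.2f}")

# Equal "Yes" heights mean the dividing lines line up: no association
print(math.isclose(tiles["Group A"][1]["Yes"], tiles["Group B"][1]["Yes"]))
```

Group A gets width 0.25 and Group B width 0.75, but both have a "Yes" height of 0.20, so the dividing lines line up across the bars. For drawing an actual mosaic plot, statsmodels provides `statsmodels.graphics.mosaicplot.mosaic`.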

3️⃣ Two Numerical Variables

Best Plot: Scatter Plot (XY Plot)

A scatter plot shows how two numerical variables change together.

  • X-axis → First variable
  • Y-axis → Second variable
  • Each point → One observation
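The mapping from observations to points can be sketched with a tiny text-based scatter plot in Python: each (x, y) pair is scaled onto a character grid and marked with `*`. The data are invented ages and heights for children, `text_scatter` is a hypothetical helper, and it assumes the x values (and the y values) are not all identical, since it divides by their range.

```python
def text_scatter(xs, ys, width=30, height=10):
    """Render (x, y) pairs on a character grid, one '*' per observation.

    Assumes max(xs) > min(xs) and max(ys) > min(ys); otherwise the scaling divides by zero.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column / row index on the grid
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top, so flip y
    lines = ["|" + "".join(cells) for cells in grid]  # '|' is the y-axis
    lines.append("+" + "-" * width)                   # x-axis
    return "\n".join(lines)

# Hypothetical data: X-axis = age (years), Y-axis = height (cm)
ages = [2, 4, 6, 8, 10, 12]
heights = [86, 102, 115, 128, 138, 149]
print(text_scatter(ages, heights))
```

The stars climb from the lower left to the upper right, which is the upward pattern described in the age-and-height example.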

📌 Example (Association Present)

Variable 1: Age of children
Variable 2: Height

As age increases, height also increases. Points form an upward pattern.

This shows a positive association.

📌 Example (No Association)

Variable 1: Age of adults (20–65 years)
Variable 2: Height

Points scatter randomly with no clear pattern because adult height changes very little with age over this range.

Random cloud of points → No association

🔎 Types of Relationships in Scatter Plots

  • Positive Relationship: As one variable increases, the other tends to increase
  • Negative Relationship: As one variable increases, the other tends to decrease
  • No Relationship: No visible pattern
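The direction of a straight-line pattern can also be summarized by a number, the Pearson correlation coefficient r, which runs from -1 (perfect negative) through 0 (none) to +1 (perfect positive). A minimal Python sketch, assuming invented data and an arbitrary cut-off of 0.3 for calling a pattern positive or negative; `pearson_r` and `describe` are illustrative helpers, not a standard API.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient: direction and strength of a straight-line pattern."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe(r, threshold=0.3):
    """Label the direction; the 0.3 cut-off is an arbitrary illustrative choice."""
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "no clear relationship"

# Hypothetical data, invented for illustration
child_age = [2, 4, 6, 8, 10, 12]
child_height = [86, 102, 115, 128, 138, 149]
adult_age = [20, 30, 40, 50, 60]
adult_height = [170, 165, 175, 165, 170]

for label, xs, ys in [("children", child_age, child_height),
                      ("adults", adult_age, adult_height)]:
    r = pearson_r(xs, ys)
    print(f"{label}: r = {r:.2f} -> {describe(r)}")
```

With these made-up numbers the children's data give r close to +1 (positive) and the adults' data give r = 0. Because r only measures straight-line patterns, a clearly curved scatter plot can still have r near 0, so it is worth looking at the plot first.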

🌍 Real-Life Applications

  • 📚 Study time vs Exam scores
  • 🏃 Exercise hours vs Calories burned
  • 💰 Income level vs Spending habits
  • 🌡️ Temperature vs Ice cream sales
  • 🚬 Smoking vs Health risks

🧠 Summary Table

Variable Types             | Best Plot                   | Purpose
-------------------------- | --------------------------- | ----------------------------
Categorical + Numerical    | Side-by-side Box Plots      | Compare distributions
Categorical + Categorical  | Bar Charts / Mosaic Plot    | Compare proportions
Numerical + Numerical      | Scatter Plot                | Show correlation/association
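The summary table is essentially a lookup from variable types to a plot, which can be written as a small Python function. This is only an illustrative sketch: `choose_plot` is a hypothetical helper, and the two type names it accepts ('categorical' and 'numerical') are assumptions matching the table's wording.

```python
def choose_plot(type_a, type_b):
    """Suggest a plot for two variables, each 'categorical' or 'numerical'."""
    kinds = {type_a, type_b}  # a set, so the order of the two variables does not matter
    if kinds == {"categorical", "numerical"}:
        return "side-by-side box plots"
    if kinds == {"categorical"}:
        return "side-by-side bar chart, stacked bar chart, or mosaic plot"
    if kinds == {"numerical"}:
        return "scatter plot"
    raise ValueError("each variable must be 'categorical' or 'numerical'")

print(choose_plot("categorical", "numerical"))
print(choose_plot("numerical", "numerical"))
```

For example, `choose_plot("categorical", "numerical")` and `choose_plot("numerical", "categorical")` both return "side-by-side box plots", since swapping the variables does not change the choice.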

✅ Key Takeaways

  • Plots help us visually understand relationships
  • The type of plot depends on variable types
  • Visual patterns suggest possible associations
  • No clear pattern usually means no relationship
  • Graphs are a useful first step before doing detailed calculations

Choosing the correct plot makes relationships between variables easier to understand.