📘 Confidence Interval for Population Mean (σ Unknown)

When the population standard deviation (σ) is unknown, we estimate it using the sample standard deviation (s) and use Student's t-distribution instead of the normal (Z) distribution.

This situation is more realistic because population variability is rarely known in practical studies.

🎯 Objective

To estimate the true population mean (μ) when population variability is unknown.

We account for extra uncertainty by using the t-distribution.

Why Not Use the Z-Distribution?

When σ is unknown, replacing it with the sample standard deviation (s) introduces additional estimation error.

  • Sample standard deviation varies from sample to sample
  • This adds extra uncertainty
  • Z critical values ignore this uncertainty, so a Z-based interval is too narrow
The t-distribution adjusts for this added variability.
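
The effect can be checked with a small simulation. The sketch below is illustrative only: the population (normal, mean 10, SD 2), the sample size n = 5, and the number of trials are all assumptions chosen for the demonstration, and 2.776 is the standard table value for 95% confidence with df = 4. It counts how often an interval built from s captures the true mean when Z versus t critical values are used.

```python
import random
import statistics
from statistics import NormalDist


def coverage(crit, n=5, trials=10000, mu=10.0, sigma=2.0, seed=0):
    """Fraction of simulated intervals (mean ± crit * s/sqrt(n)) that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5  # s / sqrt(n), with s using n - 1
        if abs(mean - mu) <= crit * se:
            hits += 1
    return hits / trials


z_crit = NormalDist().inv_cdf(0.975)  # about 1.96
t_crit = 2.776                        # t-table value for 95%, df = 4

z_coverage = coverage(z_crit)
t_coverage = coverage(t_crit)
print(f"Z critical value: coverage = {z_coverage:.3f}")
print(f"t critical value: coverage = {t_coverage:.3f}")
```

In theory the Z-based interval should capture the mean only about 88% of the time at this sample size, while the t-based interval should stay close to the nominal 95%.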

📊 Properties of the t-Distribution

  • Bell-shaped and symmetric (like normal distribution)
  • Has heavier tails (more spread)
  • Accounts for extra uncertainty in estimating σ
  • Shape depends on Degrees of Freedom (df)
As sample size increases, the t-distribution approaches the normal distribution.
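
To see this convergence numerically, the following sketch computes 95% critical values for several degrees of freedom using only the Python standard library. The helper names (`t_pdf`, `t_cdf`, `t_critical`) are hypothetical, and the approach (Simpson's rule for the CDF plus bisection for the quantile) is a simple hand-rolled approximation rather than how a statistics library would do it; in practice a library routine or a t-table would normally be used.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom (log-gamma for stability)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 by symmetry plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Two-sided critical value t_{alpha/2, df}, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # grow the bracket until it contains the quantile
        lo, hi = hi, hi * 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_crit = NormalDist().inv_cdf(0.975)
for df in (2, 5, 10, 24, 100, 1000):
    print(f"df = {df:>4}: t = {t_critical(0.95, df):.3f}")
print(f"normal   : z = {z_crit:.3f}")
```

The printed critical values shrink toward the normal value of about 1.96 as df grows, which is the heavier-tails-at-small-df behavior described above.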

Degrees of Freedom

Degrees of Freedom (df) measure the number of independent values used to estimate variability.

\[ df = n - 1 \]

  • n = sample size
Smaller samples → smaller df → wider t-distribution → larger margin of error
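
One way to see why one degree of freedom is lost: deviations from the sample mean always sum to zero, so once n − 1 of them are known the last one is fixed. The short sketch below illustrates this with a made-up sample (the values are hypothetical, chosen only for the demonstration).

```python
import statistics

sample = [9.1, 10.4, 11.2, 8.7, 10.6]  # hypothetical measurements
n = len(sample)
mean = statistics.fmean(sample)

# Deviations from the sample mean are constrained to sum to zero.
deviations = [x - mean for x in sample]

# The last deviation can therefore be recovered from the other n - 1.
last_from_others = -sum(deviations[:-1])

df = n - 1
# The sample variance divides by df, not n.
sample_variance = sum(d * d for d in deviations) / df

print(f"sum of deviations      = {sum(deviations):.1e}")
print(f"last deviation         = {deviations[-1]:.3f}")
print(f"recovered from others  = {last_from_others:.3f}")
print(f"variance with df={df}    = {sample_variance:.4f}")
```

The variance computed this way matches `statistics.variance`, which also uses the n − 1 divisor.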

📊 Standard Error (Estimated)

Since σ is unknown, we estimate the Standard Error using the sample standard deviation:

\[ SE = \frac{s}{\sqrt{n}} \]
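
A minimal sketch of this formula follows. The function name `standard_error` is an assumption for illustration; the loop shows that quadrupling the sample size halves the standard error for a fixed s.

```python
import math


def standard_error(s, n):
    """Estimated standard error of the mean: s / sqrt(n)."""
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    return s / math.sqrt(n)


# With s fixed at 2, each fourfold increase in n halves the standard error.
for n in (25, 100, 400):
    print(f"n = {n:>3}: SE = {standard_error(2.0, n):.2f}")
```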

Formula for Confidence Interval

\[ \bar{x} \pm t_{\alpha/2, df} \cdot \frac{s}{\sqrt{n}} \]

Where:

  • x̄ = Sample Mean
  • t = t-score from t-table
  • s = Sample Standard Deviation
  • n = Sample Size
  • df = n − 1
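
As a brief sketch of where this formula comes from, assuming a random sample from an approximately normal population: the standardized sample mean follows a t-distribution with n − 1 degrees of freedom, and rearranging the probability statement for μ gives the interval. Here α is 1 minus the confidence level (for example, α = 0.05 for 95%).

\[ T = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{df}, \qquad df = n - 1 \]

\[ P\left( -t_{\alpha/2, df} \le \frac{\bar{x} - \mu}{s/\sqrt{n}} \le t_{\alpha/2, df} \right) = 1 - \alpha \]

\[ P\left( \bar{x} - t_{\alpha/2, df} \cdot \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2, df} \cdot \frac{s}{\sqrt{n}} \right) = 1 - \alpha \]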

🔢 Example 1: Estimating Average Battery Life

Given:

  • Sample mean battery life = 10 hours
  • Sample standard deviation = 2 hours
  • Sample size = 25
  • Confidence level = 95%

Step 1: Degrees of Freedom

df = 25 − 1 = 24

Step 2: Standard Error

\[ SE = \frac{2}{\sqrt{25}} = \frac{2}{5} = 0.4 \]

Step 3: t-value

From the t-table for 95% confidence (two-tailed, α/2 = 0.025) and df = 24:

t ≈ 2.064

Step 4: Margin of Error

\[ ME = 2.064 \times 0.4 = 0.8256 \approx 0.826 \]

Step 5: Construct Interval

10 ± 0.826

Confidence Interval = (9.174 hours, 10.826 hours)
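
The five steps can be reproduced in a few lines. This is a minimal sketch: the function name `t_interval` is hypothetical, and the critical value 2.064 is passed in as the table value used above rather than computed.

```python
import math


def t_interval(mean, s, n, t_crit):
    """Return (lower, upper, margin) for mean ± t_crit * s / sqrt(n)."""
    se = s / math.sqrt(n)   # Step 2: standard error
    margin = t_crit * se    # Step 4: margin of error
    return mean - margin, mean + margin, margin


# Battery-life example: mean = 10 h, s = 2 h, n = 25, t ≈ 2.064 (df = 24)
lower, upper, margin = t_interval(10.0, 2.0, 25, 2.064)
print(f"margin of error = {margin:.3f}")
print(f"95% CI = ({lower:.3f}, {upper:.3f}) hours")
```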

📊 Interpretation

We are 95% confident that the true average battery life lies between 9.174 and 10.826 hours.

This interval is slightly wider than the one a Z-score of 1.96 would give, which reflects the added uncertainty from estimating σ.
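
To put a number on the difference, the sketch below compares the two margins of error for the battery example. It assumes the same standard error (0.4) for both and takes the Z critical value from Python's `statistics.NormalDist`; the t value is the table value used in the example.

```python
from statistics import NormalDist

se = 0.4                              # standard error from the example
t_crit = 2.064                        # t-table value, 95%, df = 24
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96

me_t = t_crit * se
me_z = z_crit * se

print(f"t-based margin: {me_t:.3f}")
print(f"Z-based margin: {me_z:.3f}")
print(f"t interval is {100 * (me_t / me_z - 1):.1f}% wider")
```

With df = 24 the gap is modest, around five percent; it grows quickly as the sample gets smaller.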

⚖️ t-Distribution vs Normal Distribution

| Feature | Z-Distribution | t-Distribution |
|---|---|---|
| Population SD | Known | Unknown |
| Spread | Narrower | Wider |
| Tail Thickness | Thin tails | Heavy tails |
| Depends on df? | No | Yes |

📈 When to Use t-Distribution

  • Population standard deviation unknown
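  • Sample size small (rule of thumb: n < 30), where t and Z critical values differ most
  • Population approximately normal (this matters most when the sample is small)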
  • Random and independent sampling

🤖 Applications in Machine Learning

  • Estimating true model performance with small validation sets
  • Evaluating uncertainty in experimental results
  • Comparing algorithms using limited data
  • Estimating real-world prediction accuracy
Provided the assumptions above roughly hold, the t-distribution gives a principled way to quantify uncertainty even with limited data.
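
As an illustration of the first application, the sketch below builds a 95% interval for mean accuracy from five validation scores. The scores are invented for the example, and 2.776 is the table value for 95% confidence with df = 4. Note one caveat: scores from cross-validation folds share training data and are not fully independent, so an interval built this way should be read as approximate.

```python
import math
import statistics

# Hypothetical accuracy scores from 5 validation runs (made-up numbers)
scores = [0.81, 0.79, 0.84, 0.80, 0.82]

n = len(scores)
mean = statistics.fmean(scores)
s = statistics.stdev(scores)  # sample SD, divides by n - 1
se = s / math.sqrt(n)

t_crit = 2.776                # t-table value for 95%, df = n - 1 = 4
margin = t_crit * se
lower, upper = mean - margin, mean + margin

print(f"mean accuracy = {mean:.3f}")
print(f"95% CI = ({lower:.3f}, {upper:.3f})")
```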

🧠 Key Insights

  • Use t-distribution when σ is unknown
  • Degrees of freedom control shape of distribution
  • Smaller samples β†’ larger uncertainty
  • Interval is wider than Z-interval
  • t-distribution approaches normal for large samples