📘 Confidence Intervals — Introduction to Estimation

Estimation is the process of using sample data to approximate unknown population parameters.

In real-world situations, studying an entire population is often impossible due to time, cost, or practical limitations. Therefore, we collect data from a smaller group (sample) and use it to estimate characteristics of the whole population.

ðŸŽŊ Why Estimation is Necessary

  • Populations can be extremely large
  • Collecting complete data may be expensive
  • Some measurements are destructive (e.g., testing product strength)
  • Decisions often need to be made quickly
Estimation allows informed decision-making using limited data.

ðŸ‘Ĩ Population Parameters vs Sample Statistics

Concept Population Sample
Mean Ξ (Mu) xĖ„ (Sample Mean)
Standard Deviation σ (Sigma) s (Sample SD)
Proportion p p˂
Population parameters are fixed but usually unknown. Sample statistics are known and used to estimate parameters.

📌 Types of Estimation

1ïļâƒĢ Point Estimation

A point estimate provides a single best value to estimate a population parameter.

Examples:

  • Using sample mean to estimate population mean
  • Using sample proportion to estimate population proportion

Example:

A survey of 100 students shows average study time = 4 hours.

Point estimate of population mean study time = 4 hours

2ïļâƒĢ Interval Estimation

An interval estimate provides a range of values within which the population parameter is likely to lie.

Example:

Average study time is estimated between 3.5 and 4.5 hours.

This range accounts for sampling variability and is more reliable than a single point.

⚠ïļ Why Point Estimates Are Not Enough

Different samples produce different point estimates.

If we repeatedly sample the population, each sample mean will vary.

A single number does not show estimation uncertainty.

Illustration:

  • Sample 1 mean = 68
  • Sample 2 mean = 72
  • Sample 3 mean = 70

Which value is correct? All are possible due to sampling variability.

📏 Sampling Variability

Sampling variability refers to natural differences that occur between sample statistics.

Because samples differ, their estimates also differ.

This variability is explained by the Sampling Distribution and Central Limit Theorem studied earlier.

ðŸŽŊ Motivation for Confidence Intervals

Since point estimates vary, we need a method that:

  • Accounts for variability
  • Provides a range of plausible values
  • Indicates estimation reliability
Confidence intervals provide a scientifically justified range for population parameters.

ðŸ§Ū Example: Estimating Average Height

Suppose we measure the heights of 50 students and find:

  • Sample mean height = 170 cm

This value is a point estimate of the population mean.

However, another sample may produce 168 cm or 172 cm.

Therefore, we estimate the population mean height to lie within a range, not at a single value.

🧠 Key Insights

  • Estimation uses samples to learn about populations
  • Population parameters are unknown but fixed
  • Sample statistics vary from sample to sample
  • Point estimates provide single values
  • Interval estimates provide ranges
  • Interval estimates are more reliable

ðŸĪ– Why Estimation Matters in Machine Learning

  • Estimating true model accuracy
  • Estimating population error rates
  • Parameter tuning
  • Performance reliability measurement
ML models rely on estimates because true population performance is unknown.