📊 Sampling and Population in Statistical Estimation

Statistics uses information from a sample to draw reliable conclusions about a population.

In most real-world situations, studying an entire population is impractical due to limitations of time, cost, or accessibility. Sampling provides an efficient and scientifically valid alternative.

👥 Population

A population is the complete set of individuals, objects, or measurements that share a common characteristic being studied.

Examples

  • All citizens of a country
  • All students in a university
  • All manufactured light bulbs in a factory
  • All patients with a specific disease
Population parameters describe the entire population but are usually unknown.

Common population parameters include:

  • μ (mu) — population mean
  • σ (sigma) — population standard deviation
  • p — population proportion

🔍 Sample

A sample is a subset of the population selected for study.

Samples are used to estimate population parameters.

Sample statistics are numerical measures computed from sample data.

Common sample statistics include:

  • x̄ — sample mean
  • s — sample standard deviation
  • p̂ — sample proportion
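A minimal sketch of how these three statistics can be computed with Python's standard `statistics` module. The data values are hypothetical and chosen only to illustrate the calculations. Note that `statistics.stdev` divides by n − 1, the usual convention for the sample standard deviation s.

```python
import statistics

# Hypothetical sample of five numerical measurements (heights in cm).
heights = [170.0, 182.0, 168.0, 175.0, 180.0]

x_bar = statistics.mean(heights)  # sample mean x̄
s = statistics.stdev(heights)     # sample standard deviation s (divides by n - 1)

# Hypothetical yes/no outcomes: 1 = has the characteristic, 0 = does not.
outcomes = [1, 0, 0, 1, 0]
p_hat = sum(outcomes) / len(outcomes)  # sample proportion p̂

print(f"x̄ = {x_bar:.2f}, s = {s:.2f}, p̂ = {p_hat:.2f}")
```

For these values the mean is 175 cm, the squared deviations sum to 148, so s = √(148 / 4) ≈ 6.08 cm, and p̂ = 2 / 5 = 0.40.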

🎯 Purpose of Sampling

  • Reduce cost and time of data collection
  • Make population estimation feasible
  • Enable scientific inference
  • Allow probability-based modeling
Sampling allows us to estimate unknown population parameters using known sample statistics.

🧮 Example 1 — Estimating a Population Proportion (Categorical Variable)

Suppose researchers study whether individuals have a particular disease.

A random sample of 100 people is selected.

Among them, 12 people are found to have the disease.

Sample Proportion

\[ \hat{p} = \frac{12}{100} = 0.12 \]

Estimated proportion = 12%

This means that in the sample, 12% of individuals have the disease.

This value is used to estimate the true population proportion p.
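The same estimate can be computed from the raw survey records, with each person coded as 1 (has the disease) or 0 (does not). The list below is constructed to match the example's counts (12 cases out of 100); it is illustrative, not real survey data.

```python
# Illustrative records matching the example: 12 people with the disease, 88 without.
# 1 = has the disease, 0 = does not.
sample = [1] * 12 + [0] * 88

n = len(sample)      # sample size
cases = sum(sample)  # number of people with the disease
p_hat = cases / n    # sample proportion p̂

print(f"n = {n}, cases = {cases}, p̂ = {p_hat:.2f} ({p_hat:.0%})")
```

Coding the outcome as 0/1 makes the sample proportion simply the mean of the records, which is why p̂ behaves like a sample mean in later theory.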

📊 Sample Distribution (Categorical Data)

The sample information can be displayed using a bar chart showing:

  • Proportion with the disease = 0.12
  • Proportion without the disease = 0.88

This graphical summary represents the distribution of the sample data.
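A bar chart of this kind plots the relative frequency of each category. The sketch below draws a text-only version (no plotting library assumed) from the example's counts; the scale of one `#` per 2 percentage points is an arbitrary choice to keep the bars short.

```python
# Category counts from the example: 12 with the disease, 88 without.
counts = {"Disease": 12, "No disease": 88}
n = sum(counts.values())

# Relative frequency of each category in the sample (these sum to 1).
rel_freq = {category: count / n for category, count in counts.items()}

# Text bar chart: one '#' per 2 percentage points.
for category, freq in rel_freq.items():
    bar = "#" * round(freq * 50)
    print(f"{category:<11} {freq:.2f} {bar}")
```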

🌍 Population Distribution (Theoretical Model)

Suppose medical records reveal that in the entire population, the true disease rate is:

p = 0.10 (10%)

This population behavior can be modeled using a probability distribution.

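For a categorical outcome with two possibilities (disease or no disease), the number of cases in a random sample of n individuals is modeled by the Binomial Distribution.

The binomial model is defined by two parameters:

  • Number of trials (n), here the number of individuals sampled
  • Probability of success (p), here the probability that an individual has the disease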

Thus, population behavior is described theoretically, while sample behavior is observed empirically.
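As a sketch of how the theoretical model connects to the observed sample, the code below evaluates the binomial probability formula with n = 100 and p = 0.10 using only `math.comb` (Python 3.8+). It asks how likely a sample like Example 1 (12 or more cases) would be if the true disease rate were 10%. The helper `binom_pmf` is written here for illustration.

```python
from math import comb

n, p = 100, 0.10  # sample size and population disease rate from the example

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

expected_cases = n * p  # mean of the binomial distribution, n·p
p_exactly_12 = binom_pmf(12, n, p)
p_at_least_12 = sum(binom_pmf(k, n, p) for k in range(12, n + 1))

print(f"Expected cases: {expected_cases:.0f}")
print(f"P(X = 12) ≈ {p_exactly_12:.3f}")
print(f"P(X ≥ 12) ≈ {p_at_least_12:.3f}")
```

The model expects about 10 cases per 100 people, yet a sample with 12 or more cases is not unusual under it, which is the ordinary sample-to-sample variation that inference has to account for.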

🧮 Example 2 — Estimating a Population Mean (Numerical Variable)

Suppose a researcher studies the heights of individuals.

A random sample of 100 individuals is collected.

Height is a numerical variable.

Sample Statistics Computed

  • Sample Mean = x̄
  • Sample Standard Deviation = s

The distribution of sample heights can be displayed using:

  • Histogram
  • Box plot
These visuals summarize the sample distribution.
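Because no height data is given, the sketch below simulates a hypothetical sample: 100 heights drawn from a normal distribution with mean 175 cm and standard deviation 10 cm, with a fixed seed so the run is repeatable. It computes x̄ and s, prints a text histogram with 10 cm bins, and reports the five-number summary that a box plot displays. The quartiles come from `statistics.quantiles` (Python 3.8+), whose default method is only one of several quartile conventions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulated sample is reproducible
# Hypothetical sample: 100 heights (cm) simulated from Normal(175, 10).
heights = [random.gauss(175, 10) for _ in range(100)]

x_bar = statistics.mean(heights)
s = statistics.stdev(heights)
print(f"x̄ = {x_bar:.1f} cm, s = {s:.1f} cm")

# Text histogram: count heights falling in each 10 cm bin.
bins = {}
for h in heights:
    lower = int(h // 10) * 10
    bins[lower] = bins.get(lower, 0) + 1
for lower in sorted(bins):
    print(f"{lower}-{lower + 10} cm | {'#' * bins[lower]}")

# Five-number summary behind a box plot.
q1, median, q3 = statistics.quantiles(heights, n=4)
print(f"min={min(heights):.1f}, Q1={q1:.1f}, median={median:.1f}, "
      f"Q3={q3:.1f}, max={max(heights):.1f}")
```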

🌍 Population Distribution for Numerical Data

Suppose demographic studies reveal:

  • True mean height μ = 175 cm
  • True standard deviation σ = 10 cm
  • Heights are approximately normally distributed

This population behavior is modeled using the Normal Distribution.

The normal distribution is symmetric and bell-shaped, centered at the population mean.

Population models help predict how sample statistics behave, for example how much x̄ is likely to vary from one random sample to another.
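As a sketch of that prediction step, Python's `statistics.NormalDist` (3.8+) can encode the population model from this example (μ = 175 cm, σ = 10 cm). It gives the share of individuals expected within one standard deviation of the mean and, using the standard result that the mean of n independent observations has standard deviation σ/√n, how tightly x̄ should cluster around μ for samples of 100.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 175, 10  # population mean and standard deviation (cm)
n = 100              # sample size from Example 2

population = NormalDist(mu, sigma)

# Share of individuals with heights between 165 cm and 185 cm (μ ± σ).
share_within_1sd = population.cdf(185) - population.cdf(165)

# Sampling distribution of x̄: same center, standard error σ/√n = 1 cm.
standard_error = sigma / sqrt(n)
x_bar_model = NormalDist(mu, standard_error)
p_xbar_within_2cm = x_bar_model.cdf(177) - x_bar_model.cdf(173)

print(f"P(165 < height < 185) ≈ {share_within_1sd:.3f}")
print(f"Standard error of x̄ = {standard_error:.1f} cm")
print(f"P(173 < x̄ < 177) ≈ {p_xbar_within_2cm:.3f}")
```

About 68% of individual heights fall within 10 cm of the mean, while about 95% of sample means from samples of 100 fall within just 2 cm of it.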

🔁 Connecting Sample and Population

Aspect           | Sample      | Population
Scope            | Subset      | Entire group
Measures         | Statistics  | Parameters
Mean             | x̄           | μ
Std. deviation   | s           | σ
Proportion       | p̂           | p
Distribution     | Empirical   | Theoretical

🎯 Why Sampling Works

If samples are randomly selected:

  • They tend to reflect population characteristics
  • Sample statistics cluster around population parameters
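  • Larger samples give more precise estimates, because a statistic varies less from sample to sample as n grows (for the sample mean, its standard deviation shrinks in proportion to 1/√n)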
This principle forms the foundation of statistical inference.
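A small simulation illustrates these points under a stated assumption: treat the height model from Example 2 (normal, mean 175 cm, standard deviation 10 cm) as the population, draw 1,000 random samples at each of several sizes, and measure how widely the resulting sample means spread around 175 cm. The helper `sample_mean` and the chosen sizes are illustrative.

```python
import random
import statistics

random.seed(1)        # fixed seed so the simulation is reproducible
MU, SIGMA = 175, 10   # assumed population: Normal(175, 10), heights in cm
REPEATS = 1000        # random samples drawn for each sample size

def sample_mean(n: int) -> float:
    """Mean of one random sample of size n drawn from the assumed population."""
    return statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))

spread = {}  # sample size -> standard deviation of the simulated x̄ values
for n in (4, 16, 64, 256):
    means = [sample_mean(n) for _ in range(REPEATS)]
    spread[n] = statistics.stdev(means)
    print(f"n = {n:>3}: average x̄ = {statistics.fmean(means):.2f} cm, "
          f"spread of x̄ = {spread[n]:.2f} cm (σ/√n = {SIGMA / n ** 0.5:.2f})")
```

The sample means center on 175 cm at every size, and each fourfold increase in n roughly halves their spread, matching σ/√n.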

🌍 Real-World Applications

  • Election polling
  • Public health surveys
  • Market research
  • Quality testing in manufacturing
  • Machine learning model training
AI systems rely heavily on sampling because entire populations of data are rarely available.

🧠 Key Insights

  • Populations contain all individuals of interest
  • Samples are subsets used for analysis
  • Statistics estimate unknown parameters
  • Probability distributions model population behavior
  • Sampling enables reliable estimation and prediction
Sampling bridges observed data and theoretical population behavior.