📘 Test Statistic & Standardized Testing Logic

A test statistic is a standardized numerical value that measures how far a sample result deviates from what is expected under the null hypothesis.

It converts sample evidence into a measurable scale so we can evaluate whether the result is unusual.

🎯 Why Test Statistics are Needed

When we collect sample data, the result rarely matches the population value exactly due to sampling variability.

We must determine whether the difference is:

  • Small → plausibly due to random variation
  • Large → evidence against the null hypothesis

A test statistic quantifies "how large" the difference is relative to the variation expected by chance.

📏 General Structure of a Test Statistic

\[ \text{Test Statistic} = \frac{\text{Observed Value} - \text{Expected Value}}{\text{Standard Error}} \]

This measures how many standard errors the sample result is away from the null hypothesis value.

It expresses deviation in standardized units.

🧠 Conceptual Meaning

  • Small test statistic (close to 0) → Sample close to expectation
  • Large test statistic (far from 0 in either direction) → Sample far from expectation

Larger deviations make the null hypothesis less believable.

📐 Standardization Principle

Standardization converts different measurement scales into a common scale.

This allows comparison using probability distributions.

Just like z-scores standardize observations, test statistics standardize evidence.
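
As a minimal sketch of the "common scale" idea, the snippet below standardizes the same hypothetical temperature reading in Celsius and in Fahrenheit. The helper name `standardize` and all numbers are assumptions made for illustration.

```python
def standardize(value, center, spread):
    """Express `value` as a number of `spread` units away from `center`."""
    return (value - center) / spread


# Hypothetical reading in Celsius: value 25, mean 20, standard deviation 2.5.
z_celsius = standardize(25.0, 20.0, 2.5)

# The same reading in Fahrenheit: value 77, mean 68, standard deviation 4.5.
z_fahrenheit = standardize(77.0, 68.0, 4.5)

print(z_celsius, z_fahrenheit)
```

Both calls give the same standardized value, because the measurement unit cancels out when the deviation is divided by the spread.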

🔢 Common Test Statistics

1️⃣ Z-Test Statistic (Known σ or Large Sample)

\[ Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

  • Used when the population standard deviation σ is known
  • Also used for large samples, where the sample standard deviation closely approximates σ
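
The sketch below illustrates one consequence of the Z formula: for a fixed difference and a fixed σ, the statistic grows with the square root of the sample size. The function name `z_statistic` and the numbers are hypothetical.

```python
import math


def z_statistic(sample_mean, mu0, sigma, n):
    """Z statistic for a sample mean when sigma is treated as known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


# Hypothetical setup: the same 0.5 difference with sigma = 2, at growing sample sizes.
z_by_n = {n: z_statistic(10.5, 10.0, 2.0, n) for n in (16, 64, 256)}

for n, z in z_by_n.items():
    print(n, z)
```

Quadrupling the sample size halves the standard error and therefore doubles Z.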

2️⃣ t-Test Statistic (Unknown σ)

\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]

  • Uses the sample standard deviation s in place of σ
  • Matters most for small samples; compared against a t distribution with n − 1 degrees of freedom
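
A minimal sketch of the t statistic computed from raw data, using only the Python standard library. The function name `t_statistic` and the eight measurements are made up for illustration.

```python
import math
import statistics


def t_statistic(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom (n - 1)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (mean - mu0) / (s / math.sqrt(n)), n - 1


# Hypothetical battery-life measurements in hours.
sample = [10.2, 9.8, 10.5, 10.9, 10.1, 10.6, 9.9, 10.4]
t, df = t_statistic(sample, mu0=10.0)
print(f"t = {t:.3f}, df = {df}")
```

The only change from the Z formula is that s is estimated from the sample, which is why the degrees of freedom are reported alongside the statistic.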

3️⃣ Proportion Z-Test

\[ Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \]

  • Used for binary outcomes
  • Common in surveys and classification accuracy

🔍 Example 1 — Mean Test

Claim: Average battery life is 10 hours

Sample Data:

  • Sample mean = 11 hours
  • Population σ = 2 hours
  • Sample size = 100

Step 1: Standard Error

\[ SE = \frac{2}{\sqrt{100}} = \frac{2}{10} = 0.2 \]

Step 2: Test Statistic

\[ Z = \frac{11 - 10}{0.2} = \frac{1}{0.2} = 5 \]

The sample mean is 5 standard errors above the claimed mean → strong evidence against the claim.
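
The two steps of Example 1 can be checked with a few lines of Python. The variable names are illustrative; the numbers are the ones given in the example.

```python
import math

claimed_mean = 10.0  # hours, the value stated in the claim (H0)
sample_mean = 11.0   # hours
sigma = 2.0          # population standard deviation, hours
n = 100              # sample size

# Step 1: standard error of the sample mean.
standard_error = sigma / math.sqrt(n)

# Step 2: test statistic.
z = (sample_mean - claimed_mean) / standard_error

print(round(standard_error, 4), round(z, 4))
```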

🔍 Example 2 — Proportion Test

Claim: Defect rate is 5%

Sample Data:

  • Sample proportion = 8%
  • Sample size = 400

Test Statistic

\[ Z = \frac{0.08 - 0.05}{\sqrt{0.05(0.95)/400}} = \frac{0.03}{\sqrt{0.00011875}} = \frac{0.03}{0.0109} \approx 2.75 \]

The observed defect rate is about 2.75 standard errors above the claimed rate.
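
The arithmetic in Example 2 can be reproduced as follows. This is a short check using the example's own numbers; the variable names are assumptions.

```python
import math

p0 = 0.05     # claimed defect rate (H0)
p_hat = 0.08  # observed sample proportion
n = 400       # sample size

# Standard error under H0 uses the claimed proportion p0, not p_hat.
standard_error = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / standard_error

print(f"SE = {standard_error:.4f}, Z = {z:.2f}")  # SE = 0.0109, Z = 2.75
```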

📊 Interpreting Test Statistics

| Test Statistic Value | Interpretation |
|---|---|
| Near 0 | Data consistent with H₀ |
| Moderate | Some evidence against H₀ |
| Large | Strong evidence against H₀ |

🎯 Critical Regions

If the test statistic falls in an extreme region of its distribution under H₀ (the critical region, whose size is set by the chosen significance level α), we reject H₀.

Such extreme deviations are unlikely when the null hypothesis is true.
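
A minimal sketch of a critical-region check for a Z statistic. It assumes a two-sided test and a significance level of 0.05, neither of which is specified in these notes. The function name `in_critical_region` is hypothetical; `statistics.NormalDist` is part of the Python standard library.

```python
from statistics import NormalDist


def in_critical_region(z, alpha=0.05):
    """Two-sided z-test: return (reject?, critical value) for significance level alpha."""
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical_value, critical_value


# Z = 2.75 is the statistic from the defect-rate example.
reject, cutoff = in_critical_region(2.75)
print(reject, round(cutoff, 2))
```

With these assumptions the cutoff is about 1.96, so a statistic of 2.75 falls inside the critical region.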

🔗 Link with p-value

The test statistic determines the p-value.

  • Larger test statistic (in absolute value) → Smaller p-value
  • Smaller p-value → Stronger evidence against H₀
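
The relationship can be sketched for a Z statistic with the standard normal distribution. The two-sided convention and the function name `two_sided_p_value` are assumptions made for this illustration.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability, under H0, of a Z statistic at least as extreme as `z` in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative Z values, from weak to very strong evidence.
for z in (0.5, 1.96, 2.75, 5.0):
    print(z, two_sided_p_value(z))
```

As the statistic moves away from 0, the printed p-values shrink.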

🤖 Importance in Machine Learning

  • Comparing algorithm performance
  • Evaluating model improvements
  • Feature selection testing
  • A/B testing systems

Test statistics quantify whether an observed improvement is statistically significant or could plausibly be due to chance.
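
As one possible illustration of the A/B testing case, the sketch below compares two conversion rates with a pooled two-proportion Z statistic. This variant is not covered in these notes and is swapped in as an assumption. The function name `two_proportion_z` and the counts are hypothetical.

```python
import math


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion Z statistic for H0: both variants share the same rate."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / standard_error


# Hypothetical A/B test: variant A converts 120 of 1000 users, variant B 160 of 1000.
z_ab = two_proportion_z(120, 1000, 160, 1000)
print(round(z_ab, 2))
```

It follows the same pattern as the other statistics here: an observed difference divided by its standard error under H₀.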

🧠 Key Insights

  • Test statistic standardizes sample evidence
  • Measures deviation from null hypothesis
  • Expressed in standard error units
  • Forms basis for probability-based decisions
  • Used to compute p-values