📘 Chi-Square Test for Independence

The Chi-Square Test for Independence is used to determine if there is an association between two categorical variables in a sample.

This test compares the observed frequency in each cell of a contingency table with the frequency that would be expected if the variables were independent.

🎯 What is the Chi-Square Test for Independence?

The Chi-Square test for independence tests whether two categorical variables are related or independent. It is based on the idea that if the variables are independent, the distribution of one variable should not depend on the other.

This test is commonly used in fields such as market research, healthcare, and social science to understand relationships between categorical variables.

Null Hypothesis (H₀): The two categorical variables are independent.

Alternative Hypothesis (H₁): The two categorical variables are dependent (associated).

⚖️ The Chi-Square Test Statistic

The test statistic for the Chi-Square test is calculated as:

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

Where:

  • \(O\) = Observed frequency (what we actually observed in the data)
  • \(E\) = Expected frequency (what we would expect if the variables were independent)
  • \(\sum\) = Sum over all cells of the contingency table

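Under H₀, and provided the expected frequencies are not too small (a common rule of thumb is at least 5 in every cell), the Chi-Square statistic approximately follows a Chi-Square distribution with \((r - 1) \times (c - 1)\) degrees of freedom, where \(r\) and \(c\) are the numbers of categories (rows and columns) of the two variables.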
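The formula translates almost directly into code. The sketch below is a minimal, dependency-free Python version; the function name `chi_square_statistic` and the input format (observed and expected counts listed cell by cell, in the same order) are assumptions made for illustration.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over paired cells.

    `observed` and `expected` are sequences of counts for the same cells,
    listed in the same order (for example, row by row through the table).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected frequencies must be positive")
        total += (o - e) ** 2 / e
    return total


# Two cells that each deviate by 5 from an expected count of 25:
print(chi_square_statistic([30, 20], [25, 25]))  # 2.0
```

Each cell contributes \((O - E)^2 / E\), so a cell whose observed count matches its expected count adds nothing to the statistic.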

📊 Expected Frequency Calculation

The expected frequency for each cell in the contingency table is calculated by multiplying the row total by the column total and dividing by the grand total:

\[ E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}} \]
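Applied to a whole table, the formula needs only the row totals, column totals, and grand total. A minimal Python sketch, assuming the observed table is given as a list of rows (the name `expected_frequencies` is illustrative, not from any library):

```python
from fractions import Fraction


def expected_frequencies(observed):
    """Return (row total * column total) / grand total for every cell.

    `observed` is a list of rows of counts. Fraction keeps each expected
    frequency exact instead of rounding it.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[Fraction(r * c, grand_total) for c in col_totals] for r in row_totals]


# A small 2 x 2 table: row totals 30 and 70, column totals 40 and 60.
for row in expected_frequencies([[10, 20], [30, 40]]):
    print(*row)
# 12 18
# 28 42
```

Using exact fractions is a design choice: expected frequencies are often non-integers, and keeping them exact avoids rounding before the statistic is computed.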

Steps:

  • Step 1: Create a contingency table with the observed frequencies (O).
  • Step 2: Calculate the expected frequencies (E) for each cell in the table.
  • Step 3: Compute the Chi-Square statistic using the formula.
  • Step 4: Determine the degrees of freedom (df) for the Chi-Square distribution.
  • Step 5: Compare the Chi-Square statistic with the critical value from the Chi-Square distribution table (equivalently, compare the p-value with α).
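The steps above can be strung together in one short function. This is a minimal sketch in plain Python, not a library implementation: the function name, the list-of-rows input, and the small lookup table of 5% critical values (df = 1 to 5 only, as printed in standard χ² tables) are assumptions for illustration.

```python
# Critical values of the chi-square distribution at alpha = 0.05,
# taken from a standard table (3 decimals). Only df = 1..5 are included,
# so larger tables raise a KeyError in this sketch.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_independence(observed):
    """Run Steps 2-5 on a contingency table given as a list of rows (Step 1)."""
    # Step 2: expected frequencies from the marginal totals.
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Step 3: chi-square statistic summed over every cell.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Step 4: degrees of freedom = (rows - 1) * (columns - 1).
    df = (len(observed) - 1) * (len(observed[0]) - 1)

    # Step 5: reject H0 when the statistic exceeds the critical value.
    critical = CRITICAL_05[df]
    return statistic, df, critical, statistic > critical


stat, df, critical, reject = chi_square_independence([[30, 10, 20], [40, 20, 30]])
print(f"chi2 = {stat:.3f}, df = {df}, critical = {critical}, reject H0: {reject}")
# chi2 = 0.794, df = 2, critical = 5.991, reject H0: False
```

For larger tables or other significance levels, a statistics library that provides the χ² distribution is a better source of critical values and p-values than a hardcoded table.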

🔍 Example — Chi-Square Test for Independence

Research Question: Is there an association between gender and preference for a particular brand of soda (A, B, or C)?

Data:

Gender / Brand   Brand A   Brand B   Brand C   Total
Male             30        10        20        60
Female           40        20        30        90
Total            70        30        50        150
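Before computing anything, it is worth confirming that the marginal totals in the table agree with the cell counts. A small Python check, with the table typed in as a list of rows (variable names are illustrative):

```python
# Observed counts: rows are Male, Female; columns are Brand A, B, C.
observed = [
    [30, 10, 20],  # Male
    [40, 20, 30],  # Female
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

print(row_totals)   # [60, 90]
print(col_totals)   # [70, 30, 50]
print(grand_total)  # 150
```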

Step 1: Calculate Expected Frequencies

Formula:

\[ E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}} \]

Example Calculation for Brand A, Male:

\[ E = \frac{60 \times 70}{150} = 28 \]

Repeat this calculation for all cells in the table:

Gender / Brand   Brand A (E)   Brand B (E)   Brand C (E)   Total
Male             28            12            20            60
Female           42            18            30            90
Total            70            30            50            150
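The expected table can be reproduced from the observed counts in a few lines. This Python sketch applies the same formula to every cell, including the Male, Brand A cell worked out above; variable names are illustrative.

```python
from fractions import Fraction

# Observed counts: rows are Male, Female; columns are Brand A, B, C.
observed = [
    [30, 10, 20],
    [40, 20, 30],
]
row_totals = [sum(row) for row in observed]        # [60, 90]
col_totals = [sum(col) for col in zip(*observed)]  # [70, 30, 50]
grand_total = sum(row_totals)                      # 150

# E = (row total * column total) / grand total, kept exact with Fraction.
expected = [[Fraction(r * c, grand_total) for c in col_totals] for r in row_totals]

for label, row in zip(["Male", "Female"], expected):
    print(label, *row)
# Male 28 12 20
# Female 42 18 30
```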

Step 2: Calculate the Chi-Square Statistic

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

For each cell, calculate the difference between the observed (O) and expected (E) frequencies, square it, divide by E, and sum over all cells:

\[ \chi^2 = \frac{(30 - 28)^2}{28} + \frac{(10 - 12)^2}{12} + \frac{(20 - 20)^2}{20} + \frac{(40 - 42)^2}{42} + \frac{(20 - 18)^2}{18} + \frac{(30 - 30)^2}{30} \]

\[ \chi^2 = \frac{4}{28} + \frac{4}{12} + 0 + \frac{4}{42} + \frac{4}{18} + 0 = \frac{50}{63} \]

\[ \chi^2 \approx 0.1429 + 0.3333 + 0 + 0.0952 + 0.2222 + 0 \approx 0.794 \]

Step 3: Degrees of Freedom

\[ df = (\text{Rows} - 1) \times (\text{Columns} - 1) = (2 - 1) \times (3 - 1) = 1 \times 2 = 2 \]

Step 4: Compare with Critical Value

From the Chi-Square distribution table, for df = 2 and α = 0.05, the critical value is 5.991. Since \(\chi^2 \approx 0.794 < 5.991\), we fail to reject the null hypothesis.

Conclusion: At the 5% significance level, there is not enough evidence to conclude that gender and soda brand preference are associated. Failing to reject H₀ does not prove that the two variables are independent.
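The arithmetic of the worked example can be checked in a few lines of Python. Exact fractions avoid rounding each term before adding (the six contributions sum to exactly 50/63), and because df = 2 here, the upper tail of the χ² distribution has the closed form \(P(\chi^2 > x) = e^{-x/2}\), which gives a p-value without a table. That closed form is specific to df = 2; other degrees of freedom need a table or a statistics library. Variable names are illustrative.

```python
import math
from fractions import Fraction

# Cells in the order Male A, B, C, then Female A, B, C.
observed = [30, 10, 20, 40, 20, 30]
expected = [28, 12, 20, 42, 18, 30]  # from Step 1 of the example

# Each cell's contribution (O - E)^2 / E, kept as an exact fraction.
contributions = [Fraction((o - e) ** 2, e) for o, e in zip(observed, expected)]
chi2 = sum(contributions)

df = (2 - 1) * (3 - 1)  # (rows - 1) * (columns - 1)
assert df == 2, "the closed form below is valid only for df = 2"
alpha = 0.05

# For df = 2 the chi-square distribution is exponential with mean 2, so
# P(X > x) = exp(-x / 2), and the critical value solves exp(-x / 2) = alpha.
p_value = math.exp(-float(chi2) / 2)
critical = -2 * math.log(alpha)

print(chi2, round(float(chi2), 3))                      # 50/63 0.794
print(f"p = {p_value:.3f}, critical = {critical:.3f}")  # p = 0.672, critical = 5.991
print("reject H0" if p_value < alpha else "fail to reject H0")  # fail to reject H0
```

Both routes give the same decision: the statistic is below the critical value, and the p-value is above α.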

📊 Key Takeaways

  • The Chi-Square Test for Independence tests whether two categorical variables are related.
  • It uses observed and expected frequencies to calculate the test statistic.
  • We reject H₀ if the test statistic exceeds the critical value or if the p-value is smaller than α.
  • The test is widely used in various fields such as market research, healthcare, and education to understand relationships between variables.

🧠 Real-World Applications

  • Market Research: Determining if there's a relationship between customer demographics and product preferences.
  • Medical Studies: Testing whether treatment outcomes (e.g., improved vs. not improved) are associated with patient group (e.g., age band or gender).
  • Social Science: Understanding the relationship between social factors and behavior patterns.

Chi-Square tests are a standard tool for analyzing categorical data and for making informed decisions based on associations between variables.