Chi-square Tests: Goodness-of-Fit and Contingency Analysis
Study Guide - Smart Notes
Chi-square Tests in Business Statistics
Introduction to Chi-square Tests
Chi-square tests are essential statistical tools for analyzing categorical (count or frequency) data. They help determine whether observed sample counts are compatible with expected counts under a specified null hypothesis. The two primary applications are the Goodness-of-Fit Test and Contingency Analysis.
Count (Attribute or Nominal) Data: Data expressed as the number of sample items in each category (frequency).
Purpose: Assess whether observed frequencies differ significantly from expected frequencies.
Types of Chi-square Tests
Goodness-of-Fit Test
This test evaluates whether a sample comes from a population with a specific probability distribution (e.g., Normal, Uniform, Multinomial).
Hypotheses:
H0: The distribution of X follows a specified probability distribution.
HA: The distribution of X does not follow the specified probability distribution.
Process:
Identify categories for the variable of interest.
Count occurrences in each category (observed counts).
Calculate expected counts based on the hypothesized distribution.
Compare observed and expected counts using the chi-square statistic.
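The four process steps can be sketched in Python as a single function. This is a minimal illustration under stated assumptions: the function name `goodness_of_fit` and the weekday complaint data are hypothetical, and the hypothesized distribution is assumed to be given as category probabilities (here a Uniform distribution over five weekdays).

```python
from collections import Counter


def goodness_of_fit(observations, hypothesized):
    """Chi-square goodness-of-fit statistic (hypothetical helper).

    observations: iterable of category labels (the raw sample)
    hypothesized: dict mapping each category to its probability under H0
    Returns (observed counts, expected counts, chi-square statistic).
    """
    # Steps 1-2: tally the observed count o_i for each category
    counts = Counter(observations)
    unknown = set(counts) - set(hypothesized)
    if unknown:
        raise ValueError(f"observations outside the hypothesized categories: {unknown}")
    n = sum(counts.values())
    observed = {cat: counts.get(cat, 0) for cat in hypothesized}
    # Step 3: expected count e_i = n * p_i under H0
    expected = {cat: n * p for cat, p in hypothesized.items()}
    # Step 4: chi-square = sum of (o_i - e_i)^2 / e_i over all categories
    chi_sq = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in hypothesized)
    return observed, expected, chi_sq


# Hypothetical data: 100 customer complaints by weekday, H0: uniform over 5 weekdays
complaints = ["Mon"] * 25 + ["Tue"] * 18 + ["Wed"] * 17 + ["Thu"] * 20 + ["Fri"] * 20
weekdays = ["Mon", "Tue", "Wed", "Thu", "Fri"]
observed, expected, chi_sq = goodness_of_fit(complaints, {d: 1 / 5 for d in weekdays})
print(round(chi_sq, 2))  # 1.9
```

Each weekday has expected count 100 × 0.2 = 20, so the statistic is (25 + 4 + 9 + 0 + 0) / 20 = 1.9.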
Example: Testing if wooden dowel diameters follow a normal distribution N(4, 0.10).
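For a continuous hypothesized distribution such as the dowel example, the categories are intervals of diameter, and each expected proportion is the area under the normal curve for that interval. The sketch below assumes that N(4, 0.10) means mean 4 and standard deviation 0.10; the bin edges and the sample size of 100 are hypothetical. It uses Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

# Assumption: N(4, 0.10) is read as mean 4 and standard deviation 0.10
dowel = NormalDist(mu=4.0, sigma=0.10)

# Hypothetical category boundaries for the diameter; they create 4 categories:
# below 3.9, 3.9 to 4.0, 4.0 to 4.1, and above 4.1
edges = [3.9, 4.0, 4.1]


def category_probs(dist, edges):
    """Probability of each category between consecutive edges under dist."""
    cdfs = [0.0] + [dist.cdf(x) for x in edges] + [1.0]
    return [upper - lower for lower, upper in zip(cdfs, cdfs[1:])]


probs = category_probs(dowel, edges)
n = 100  # hypothetical sample size
expected = [n * p for p in probs]
print([round(e, 2) for e in expected])  # [15.87, 34.13, 34.13, 15.87]
```

If the mean and standard deviation were instead estimated from the sample, two parameters would be estimated, which reduces the degrees of freedom by 2.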
Contingency Test (Contingency Analysis)
This test determines whether two categorical variables are independent or associated.
Applications:
Assessing relationships between two nominal variables.
Comparing proportions across groups.
Hypotheses:
H0: The two variables are independent.
HA: The two variables are not independent.
Data Summary: Data are organized in a contingency (cross-classification) table.
Example: Is payment method related to customer age group?
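Under independence, each cell's expected count is (row total × column total) / n, and the test has (r − 1)(c − 1) degrees of freedom. The sketch below applies this to a 3 × 4 table of payment method by age group; the counts are invented for illustration, and the function names are hypothetical.

```python
# Hypothetical observed counts: rows = 3 payment methods, columns = 4 age groups
observed = [
    [20, 25, 15, 10],
    [30, 35, 25, 20],
    [10, 15, 20, 25],
]


def expected_counts(table):
    """e_ij = (row i total) * (column j total) / n, assuming independence (H0)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def contingency_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


expected = expected_counts(observed)
stat, df = contingency_chi_square(observed)
print(round(expected[0][0], 2), df)  # 16.8 6
```

The first cell's expected count is 70 × 60 / 250 = 16.8, because the first row totals 70, the first column totals 60, and n = 250.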
Chi-square Test Statistic
Observed and Expected Counts
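Observed Count ($o_i$): The actual number of sample items in a category.
Expected Count ($e_i$): The number expected in a category if H0 is true. For a goodness-of-fit test, $e_i = n p_i$, where $p_i$ is the hypothesized proportion for category $i$; for a contingency table, $e_{ij} = \dfrac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{n}$.
Example: If 68 of the 500 sampled students are both female and specialize in finance, the observed count for that cell is $o_i = 68$.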
Chi-square Test Statistic Formula
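Formula: $\chi^2 = \sum_{i=1}^{k} \dfrac{(o_i - e_i)^2}{e_i}$, summed over all $k$ categories (or over all $r \times c$ cells of a contingency table).
Interpretation: Large differences between observed and expected counts yield a large $\chi^2$ value, suggesting the data are incompatible with H0.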
Chi-square Probability Distribution
Definition and Degrees of Freedom
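Chi-square Distribution: The approximate sampling distribution of the test statistic when H0 is true. It takes only nonnegative values, is right-skewed, and is characterized by its degrees of freedom (df).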
Degrees of Freedom (df):
Goodness-of-Fit: $df = k - 1$ (k = number of categories). If m parameters are estimated from the data, $df = k - 1 - m$.
Contingency Analysis: $df = (r - 1)(c - 1)$ (r = rows, c = columns in the table).
Example: For 3 payment methods (rows) and 4 age groups (columns): $df = (3 - 1)(4 - 1) = 6$.
Decision Rule
Reject H0 if the calculated $\chi^2$ value exceeds the critical value (from tables or software) for the chosen significance level $\alpha$ and df.
Alternatively, reject H0 if the P-value < $\alpha$.
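With 2 degrees of freedom the chi-square distribution has simple closed forms: the right-tail area beyond x is exp(−x/2), and the critical value for significance level α is −2 ln α. That makes it easy to check that the two decision rules agree: because the right-tail area shrinks as x grows, x exceeds the critical value exactly when the P-value falls below α. The function names below are hypothetical, and the formulas hold only for df = 2; other df need tables, Excel, or a numerical routine.

```python
import math


def right_tail_p_df2(stat):
    """P(chi-square with 2 df > stat); valid only for df = 2."""
    return math.exp(-stat / 2)


def critical_value_df2(alpha):
    """Critical value c with P(chi-square with 2 df > c) = alpha; df = 2 only."""
    return -2 * math.log(alpha)


def decide_df2(stat, alpha=0.05):
    """Apply both decision rules and return both verdicts."""
    crit = critical_value_df2(alpha)
    p_value = right_tail_p_df2(stat)
    reject_by_critical = stat > crit
    reject_by_p = p_value < alpha
    return crit, p_value, reject_by_critical, reject_by_p


crit, p_value, by_crit, by_p = decide_df2(7.34)
print(round(crit, 3), round(p_value, 4), by_crit, by_p)  # 5.991 0.0255 True True
```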
Using Probability Tables and Excel
Chi-square tables provide critical values for a given df and right-tail area $\alpha$.
Excel functions:
CHISQ.INV.RT(probability, df): Returns the critical value for the right-tailed test.
CHISQ.DIST.RT(x, df): Returns the right-tailed p-value for a given test statistic.
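Outside Excel, the same two quantities can be computed in Python. The sketch below uses only the standard library; the names `chisq_dist_rt` and `chisq_inv_rt` are hypothetical stand-ins chosen to mirror the Excel functions, not an existing library API. The right-tail area equals the regularized upper incomplete gamma function Q(df/2, x/2), evaluated here with the series and continued-fraction approach described in Numerical Recipes; the critical value is then found by bisection. If SciPy is installed, `scipy.stats.chi2.sf(x, df)` and `scipy.stats.chi2.isf(alpha, df)` provide the same values.

```python
import math


def _upper_gamma_q(a, x):
    """Regularized upper incomplete gamma function Q(a, x) for a > 0, x >= 0."""
    if x <= 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series for the lower function P(a, x); then Q = 1 - P
        term = total = 1.0 / a
        ap = a
        for _ in range(10_000):
            ap += 1
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method
    tiny = 1e-300
    b = x + 1 - a
    c = 1 / tiny
    d = 1 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chisq_dist_rt(x, df):
    """Right-tail area P(chi-square_df > x), analogous to Excel's CHISQ.DIST.RT."""
    return _upper_gamma_q(df / 2, x / 2)


def chisq_inv_rt(p, df):
    """Critical value c with P(chi-square_df > c) = p, analogous to CHISQ.INV.RT."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    lo, hi = 0.0, max(1.0, float(df))
    while chisq_dist_rt(hi, df) > p:  # widen the bracket until it contains c
        hi *= 2
    for _ in range(200):  # bisection: the right-tail area decreases as x grows
        mid = (lo + hi) / 2
        if chisq_dist_rt(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(chisq_dist_rt(7.34, 2), 4))  # 0.0255
print(round(chisq_inv_rt(0.05, 6), 3))   # 12.592
```

The second call reproduces the usual table value for α = 0.05 with 6 df, which matches the 3 × 4 payment-method example.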
Data Conditions (Assumptions)
Goodness-of-Fit:
Simple Random Sample (SRS).
Data summarized as counts per category.
Sample size large enough that all expected counts $e_i \ge 5$.
Contingency Analysis:
Simple Random Sample (SRS).
Data summarized as counts per category.
All expected counts $e_{ij} \ge 5$.
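Of these conditions, only the expected-count rule can be checked from the numbers alone; whether the data form a simple random sample depends on how they were collected. The hypothetical helper below flags expected counts under a threshold, which is assumed here to be the common rule-of-thumb minimum of 5 and is left as a parameter.

```python
def small_expected_counts(expected, minimum=5.0):
    """Return the expected counts that fall below `minimum` (hypothetical helper).

    Accepts a flat list (goodness-of-fit) or a list of rows (contingency table).
    An empty result means the expected-count condition is met.
    """
    cells = []
    for item in expected:
        if isinstance(item, (list, tuple)):
            cells.extend(item)  # one row of a contingency table
        else:
            cells.append(item)
    return [e for e in cells if e < minimum]


print(small_expected_counts([40, 60, 100]))              # []
print(small_expected_counts([[3.2, 6.8], [12.0, 8.0]]))  # [3.2]
```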
Goodness-of-Fit Application Example: Multinomial Test
Suppose we want to determine if the market share distribution for three companies (A, B, C) has changed after Company A introduced a new product.
Hypotheses:
H0: pA = 0.20, pB = 0.30, pC = 0.50 (distribution unchanged).
HA: At least one population proportion has changed.
Sample Data: n = 200 consumers surveyed.
Observed Counts: A = 54, B = 48, C = 98.
Expected Counts:
eA = 200 × 0.20 = 40
eB = 200 × 0.30 = 60
eC = 200 × 0.50 = 100
| Company | Observed ($o_i$) | Expected ($e_i$) | $(o_i - e_i)^2 / e_i$ |
|---|---|---|---|
| A | 54 | 40 | 4.90 |
| B | 48 | 60 | 2.40 |
| C | 98 | 100 | 0.04 |
| Total | 200 | 200 | 7.34 |
Degrees of Freedom: df = k - 1 = 3 - 1 = 2.
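P-value: For $\chi^2 = 7.34$ with 2 df, P-value $= P(\chi^2_2 > 7.34) \approx 0.0255$.
Conclusion: If $\alpha = 0.05$, then since the P-value < 0.05 (equivalently, 7.34 exceeds the critical value 5.991), reject H0. There is evidence that the market share distribution has changed.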
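The whole worked example can be reproduced in a few lines. The counts and proportions are the ones given above; the only extra ingredient is the fact that for df = 2 the right-tail area has the closed form exp(−x/2), so no table lookup is needed for this particular case.

```python
import math

# Data from the market-share example
observed = {"A": 54, "B": 48, "C": 98}
h0_proportions = {"A": 0.20, "B": 0.30, "C": 0.50}

n = sum(observed.values())  # 200 consumers
expected = {k: n * p for k, p in h0_proportions.items()}
contributions = {k: (observed[k] - expected[k]) ** 2 / expected[k] for k in observed}
chi_sq = sum(contributions.values())
df = len(observed) - 1

# For df = 2 only, the right-tail area has the closed form exp(-x / 2)
p_value = math.exp(-chi_sq / 2)
alpha = 0.05
reject_h0 = p_value < alpha

print({k: round(v, 2) for k, v in contributions.items()})  # {'A': 4.9, 'B': 2.4, 'C': 0.04}
print(round(chi_sq, 2), df, round(p_value, 4), reject_h0)   # 7.34 2 0.0255 True
```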
Review Topics
Probability distributions (including Normal distribution).
Logic of hypothesis testing: test statistic, critical value, P-value.
Contingency table concepts: marginal, joint, and conditional probabilities.
Probability rule of independence.
Additional info: Students should be familiar with basic probability, hypothesis testing, and the use of statistical tables or software for critical values and P-values.