Chi-square Tests: Goodness-of-Fit and Contingency Analysis

Chi-square Tests in Business Statistics

Introduction to Chi-square Tests

Chi-square tests are essential statistical tools for analyzing categorical (count or frequency) data. They help determine whether observed sample counts are compatible with expected counts under a specified null hypothesis. The two primary applications are the Goodness-of-Fit Test and Contingency Analysis.

Count (Attribute or Nominal) Data: Data expressed as the number of sample items in each category (frequency).
Purpose: Assess whether observed frequencies differ significantly from expected frequencies.

Types of Chi-square Tests

Goodness-of-Fit Test

This test evaluates whether a sample comes from a population with a specific probability distribution (e.g., Normal, Uniform, Multinomial).

Hypotheses:
- H0: The distribution of X follows a specified probability distribution.
- HA: The distribution of X does not follow the specified probability distribution.
Process:
1. Identify categories for the variable of interest.
2. Count occurrences in each category (observed counts).
3. Calculate expected counts based on the hypothesized distribution.
4. Compare observed and expected counts using the chi-square statistic.
Example: Testing if wooden dowel diameters follow a normal distribution N(4, 0.10).
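
Step 3 of the process (expected counts from the hypothesized distribution) can be sketched for the dowel example as follows. This is only an illustration: it assumes the 0.10 in N(4, 0.10) is a standard deviation, and the bin edges and sample size are hypothetical values chosen for the sketch, not figures from the notes.

```python
from statistics import NormalDist


def expected_counts_normal(edges, n, mean, sd):
    """Expected counts for bins cut at `edges` under a Normal(mean, sd) model.

    The bins are (-inf, edges[0]], (edges[0], edges[1]], ..., (edges[-1], +inf),
    so len(edges) + 1 expected counts are returned.
    """
    dist = NormalDist(mu=mean, sigma=sd)
    cdf_values = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
    # Probability of each bin is the difference of neighbouring CDF values.
    probs = [hi - lo for lo, hi in zip(cdf_values, cdf_values[1:])]
    return [n * p for p in probs]


# Hypothetical inputs: bins cut at 3.9, 4.0, 4.1 and a sample of n = 100 dowels.
# Assumption: 0.10 is the standard deviation of the hypothesized distribution.
expected = expected_counts_normal([3.9, 4.0, 4.1], n=100, mean=4.0, sd=0.10)
```

The observed counts in the same bins would then be compared with these expected counts using the chi-square statistic.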

Contingency Test (Contingency Analysis)

This test determines whether two categorical variables are independent or associated.

Applications:
- Assessing relationships between two nominal variables.
- Comparing proportions across groups.
Hypotheses:
- H0: The two variables are independent.
- HA: The two variables are not independent.
Data Summary: Data are organized in a contingency (cross-classification) table.
Example: Is payment method related to customer age group?
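
Under H0 (independence), the expected count for each cell of a contingency table is conventionally (row total × column total) / n. A minimal sketch of that calculation is shown here; the 2 × 2 table is made-up illustrative data, not taken from the notes, and the function name is hypothetical.

```python
def expected_under_independence(table):
    """Expected counts e_ij = (row i total * column j total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table of observed counts (illustrative only).
observed = [[30, 20],
            [10, 40]]
expected = expected_under_independence(observed)
```

The same function works for any number of rows and columns, for example a 3 × 4 payment-method by age-group table.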

Chi-square Test Statistic

Observed and Expected Counts

Observed Count (oi): The actual number of items in a category.
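Expected Count (ei): The number expected in a category if H0 is true. Calculated as $e_i = n p_i$ for a goodness-of-fit test, where $p_i$ is the category probability under H0; for a contingency table, $e_{ij} = (\text{row } i \text{ total} \times \text{column } j \text{ total}) / n$.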
Example: If 68 students are both female and finance specialists out of 500, oi = 68.

Chi-square Test Statistic Formula

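Formula: $\chi^2 = \sum_i \frac{(o_i - e_i)^2}{e_i}$, where the sum runs over all categories (or all cells of a contingency table).

Interpretation: Large differences between observed and expected counts yield a large $\chi^2$ value, suggesting incompatibility with H0.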
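
As a rough sketch, the statistic is the sum of (o - e)^2 / e over the categories. The function name below is hypothetical; the counts in the usage lines come from the market-share example later in these notes.

```python
def chi_square_statistic(observed, expected):
    """Return (total, contributions), where each contribution is (o - e)^2 / e."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return sum(contributions), contributions


# Observed and expected counts from the market-share example (companies A, B, C).
stat, parts = chi_square_statistic([54, 48, 98], [40, 60, 100])
```

Returning the per-category contributions alongside the total makes it easy to see which category drives a large statistic.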

Chi-square Probability Distribution

Definition and Degrees of Freedom

Chi-square Distribution: Used to evaluate the test statistic. Characterized by degrees of freedom (df).
Degrees of Freedom (df):
- Goodness-of-Fit: $df = k - 1$ (k = number of categories). If m parameters are estimated from data, $df = k - 1 - m$.
- Contingency Analysis: $df = (r - 1)(c - 1)$ (r = rows, c = columns in the table).
Example: For 3 payment methods (rows) and 4 age groups (columns): $df = (3 - 1)(4 - 1) = 6$.
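
The usual degrees-of-freedom rules, df = k - 1 - m for goodness-of-fit and df = (r - 1)(c - 1) for a contingency table, can be written as two small helpers. The function names are hypothetical and the sketch only encodes those two rules.

```python
def df_goodness_of_fit(k, m=0):
    """k categories, m parameters estimated from the sample data."""
    df = k - 1 - m
    if df < 1:
        raise ValueError("need at least one degree of freedom")
    return df


def df_contingency(r, c):
    """r rows and c columns in the contingency table."""
    if r < 2 or c < 2:
        raise ValueError("table needs at least 2 rows and 2 columns")
    return (r - 1) * (c - 1)


df_payment_by_age = df_contingency(3, 4)   # 3 payment methods, 4 age groups
df_three_categories = df_goodness_of_fit(3)  # 3 categories, nothing estimated
```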

Decision Rule

Reject H0 if the calculated $\chi^2$ value exceeds the critical value (from tables or software) for the chosen significance level $\alpha$ and df.
Alternatively, reject H0 if the P-value < $\alpha$.

Using Probability Tables and Excel

Chi-square tables provide critical values for given df and $\alpha$.
Excel functions:
- CHISQ.INV.RT(probability, df): Returns the critical value for the right-tailed test.
- CHISQ.DIST.RT(x, df): Returns the right-tailed p-value for a given test statistic.
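
For readers without Excel, the two functions can be approximated in plain Python. The sketch below is not Excel's implementation: it evaluates the right-tail area through the regularized incomplete gamma function (a series for small arguments, a continued fraction otherwise) and finds the critical value by bisection. The function names are hypothetical.

```python
import math


def chi2_right_tail(x, df):
    """P(chi-square with `df` degrees of freedom > x), like CHISQ.DIST.RT(x, df)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # log of the common prefactor z^a * exp(-z) / Gamma(a)
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1.0:
        # Series for the lower tail P(a, z); the right tail is 1 - P.
        term = total = 1.0 / a
        denom = a
        for _ in range(10_000):
            denom += 1.0
            term *= z / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz method) for the right tail Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefactor)


def chi2_critical_value(alpha, df):
    """Value with right-tail area `alpha`, like CHISQ.INV.RT(alpha, df)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while chi2_right_tail(hi, df) > alpha:  # widen until the root is bracketed
        hi *= 2.0
    for _ in range(200):  # bisection on the decreasing tail function
        mid = (lo + hi) / 2.0
        if chi2_right_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For example, `chi2_critical_value(0.05, 2)` plays the role of `CHISQ.INV.RT(0.05, 2)`, and `chi2_right_tail(7.34, 2)` plays the role of `CHISQ.DIST.RT(7.34, 2)`.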

Data Conditions (Assumptions)

Goodness-of-Fit:
- Simple Random Sample (SRS).
- Data summarized as counts per category.
- Sample size large enough that all expected counts are at least 5 (the usual rule of thumb).
Contingency Analysis:
- Simple Random Sample (SRS).
- Data summarized as counts per category.
- All expected counts are at least 5 (the usual rule of thumb).
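
The expected-count condition is easy to check before running either test. The sketch below assumes the common rule of thumb that every expected count should be at least 5; that threshold is an assumption here and is left as a parameter, and the function name is hypothetical.

```python
def expected_counts_ok(expected, minimum=5):
    """True when every expected count is at least `minimum`.

    Accepts a flat list (goodness-of-fit) or a list of rows (contingency table).
    The default minimum of 5 is an assumed rule of thumb, not a fixed law.
    """
    flat = []
    for item in expected:
        if isinstance(item, (list, tuple)):
            flat.extend(item)
        else:
            flat.append(item)
    return all(e >= minimum for e in flat)


# Expected counts from the market-share example: 40, 60, 100.
market_share_ok = expected_counts_ok([40, 60, 100])
```

When the check fails, a common remedy is to merge sparse categories before computing the statistic.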

Goodness-of-Fit Application Example: Multinomial Test

Suppose we want to determine if the market share distribution for three companies (A, B, C) has changed after Company A introduced a new product.

Hypotheses:
- H0: pA = 0.20, pB = 0.30, pC = 0.50 (distribution unchanged).
- HA: At least one population proportion has changed.
Sample Data: n = 200 consumers surveyed.
Observed Counts: A = 54, B = 48, C = 98.
Expected Counts:
- eA = 200 × 0.20 = 40
- eB = 200 × 0.30 = 60
- eC = 200 × 0.50 = 100

Company	Observed (oi)	Expected (ei)	(oi - ei)² / ei
A	54	40	4.90
B	48	60	2.40
C	98	100	0.04
Total	200	200	7.34

Degrees of Freedom: df = k - 1 = 3 - 1 = 2.
P-value: For $\chi^2 = 7.34$ with 2 df, $P(\chi^2 > 7.34) \approx 0.0255$.
Conclusion: If $\alpha = 0.05$, since P-value < 0.05, reject H0. There is evidence that the market share distribution has changed.
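
The whole example can be reproduced in a few lines of Python. This sketch relies on the fact that, with exactly 2 degrees of freedom, the chi-square right-tail area has the closed form exp(-x / 2); for other df a statistical library or table would be needed. Variable names are illustrative.

```python
import math

proportions = {"A": 0.20, "B": 0.30, "C": 0.50}  # market shares under H0
observed = {"A": 54, "B": 48, "C": 98}           # survey counts
n = sum(observed.values())                       # 200 consumers

# Expected counts e_i = n * p_i under H0.
expected = {company: n * p for company, p in proportions.items()}

# Chi-square statistic: sum of (o - e)^2 / e over the three companies.
chi_sq = sum(
    (observed[c] - expected[c]) ** 2 / expected[c] for c in observed
)
df = len(observed) - 1

# Closed form valid only for df = 2: P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-chi_sq / 2)

alpha = 0.05
reject_h0 = p_value < alpha
```

Here `chi_sq` comes out at about 7.34 and `p_value` at about 0.0255, so `reject_h0` is true at the 0.05 level, matching the conclusion above.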

Review Topics

Probability distributions (including Normal distribution).
Logic of hypothesis testing: test statistic, critical value, P-value.
Contingency table concepts: marginal, joint, and conditional probabilities.
Probability rule of independence.

Additional info: Students should be familiar with basic probability, hypothesis testing, and the use of statistical tables or software for critical values and P-values.