Chi-Square Tests, Simpson’s Paradox, and Assessing Prediction Accuracy

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Chi-Square Tests & Goodness of Fit

Overview of the Chi-Square Test

The Chi-square test is a statistical method used to determine if there is a significant association between categorical variables. It is commonly applied to contingency tables to test hypotheses about relationships between variables such as survival rates and passenger class.

Null Hypothesis (H0): No association exists between the variables.
Alternative Hypothesis (Ha): An association exists between the variables.
Test Statistic: The Chi-square statistic is calculated as: where O is the observed frequency and E is the expected frequency.
Critical Value: Compare the calculated statistic to the critical value from the Chi-square distribution table at the chosen significance level (e.g., 5%).

Chi-square distribution table

Titanic Example: Association Between Passenger Class and Survival

The Titanic dataset is used to test whether passenger class affected survival rates.

Observed and expected frequencies are calculated for each class and survival outcome.
Chi-square statistic is computed and compared to the critical value.
Result: The null hypothesis is rejected, indicating a strong association between passenger class and survival.

Observed Frequencies Table

Passenger class	First	Second	Third	Total
Survival: no	80	97	372	549
Survival: yes	136	87	119	342
Total	216	184	491	891

Expected Frequencies Table

Passenger class	First	Second	Third
Survival: no	133	113	302
Survival: yes	82	70	188

Effect of Gender on Survival

Survival rates are further analyzed by gender and passenger class.

Female Passengers: Higher survival rates across all classes.
Male Passengers: Lower survival rates, especially in third class.

Survival rates for female passengers by class Survival rates for male passengers by class

Assumptions of the Chi-Square Test

For the Chi-square test to be valid:

No more than 20% of the cells should have expected frequencies less than 5.
If this condition is violated, categories may be combined to reduce the table size.

Example: Social and Economic Status for Married Couples

Husband's social class	Higher prof	Lower prof	Routine non-manual	Skilled manual	Semi-skilled manual	Unskilled manual	Total
Wife's Higher prof	0	2	0	0	0	0	2
Wife's Lower prof	1	8	5	3	2	1	20
Wife's Routine non-manual	4	13	9	15	6	0	47
Wife's Skilled manual	0	1	0	0	0	0	1
Wife's Semi-skilled manual	2	5	5	13	8	3	36
Wife's Unskilled manual	0	0	2	4	3	0	9
Total	7	29	21	35	19	4	115

Grouped Table (for valid test)

Husband's social class (grouped)	Professional	Skilled	Semi/unskilled	Total
Wife's Professional	11	8	3	22
Wife's Skilled	18	24	6	48
Wife's Semi/unskilled	7	24	14	45
Total	36	56	23	115

Simpson’s Paradox

Definition and Example

Simpson’s Paradox occurs when a trend appears in several groups of data but disappears or reverses when the groups are combined. This paradox highlights the importance of considering confounding variables before drawing conclusions.

Example: University of Berkeley admissions data initially suggested gender discrimination, but further analysis showed that females applied more to departments with lower admission rates.
Another example: A study on heart disease and smoking showed higher survival rates for smokers, but confounding factors may explain this counter-intuitive result.

Berkeley Admissions Table

Department	Males Admitted	Males Refused	% Admitted	Females Admitted	Females Refused	% Admitted
A	511	314	61.9	88	20	81.5
B	353	207	63.0	17	8	68.0
C	120	205	36.9	201	392	33.9
D	137	280	32.9	131	244	34.9
E	53	138	27.7	94	299	23.9
F	16	256	5.9	24	317	7.0

Assessing the Accuracy of Prediction Results

Confusion Matrix and Performance Metrics

In predictive modeling, accuracy is assessed using a confusion matrix, which compares predicted and actual outcomes.

True Positive (TP): Correctly predicted positive cases
True Negative (TN): Correctly predicted negative cases
False Positive (FP): Incorrectly predicted positive cases
False Negative (FN): Incorrectly predicted negative cases

Example: Fraud Detection

	Fraud	Not Fraud	Total
Predicted Fraud	9000	88000	97000
Predicted Not Fraud	1000	352000	353000
Total	10000	440000	450000

Sensitivity (True Positive Rate):
Specificity (True Negative Rate):
Positive Predictive Value:

Example: Diagnostic Accuracy

	Illness: Yes	Illness: No	Total
Test Positive	15	95	110
Test Negative	5	885	890
Total	20	980	1000

Sensitivity:
Specificity:
Positive Predictive Value:

Summary

Validity of Chi-square Test: Requires sufficient sample size and expected frequencies.
Simpson’s Paradox: Associations must be interpreted with caution, considering confounding variables.
Prediction Accuracy: Evaluated using sensitivity, specificity, and positive predictive value from confusion matrices.