Chapter 17: Comparison – Statistical Methods for Comparing Groups

17.1 Data for Comparisons

Comparing two groups is a fundamental task in business statistics, often used to determine if a new treatment, product, or strategy is more effective than an existing one. Inferential statistics allow us to test for differences between two populations using sample data.

Key Point 1: Framing the Comparison: Define the parameter of interest (e.g., difference in proportions or means) and set up hypotheses to test whether the observed difference is statistically significant.
Key Point 2: Example – Diet Comparison: Suppose a fitness chain wants to know whether a proprietary diet (the Atkins diet) leads to higher membership renewal rates than a conventional diet. Let pA be the proportion of members who renew on the Atkins diet and pC the proportion who renew on the conventional diet. The difference pA - pC measures the effect of the new diet.
Key Point 3: Hypotheses: To be profitable, the difference must exceed 4%, so the test is H0: pA - pC ≤ 0.04 versus Ha: pA - pC > 0.04.
Key Point 4: Confounding: Confounding occurs when the effects of two or more factors are mixed, making it difficult to attribute differences to the treatment alone. Randomization helps eliminate confounding. If randomization is not possible, ensure independent sampling from each population.
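
The hypotheses in Key Point 3 can be checked with a one-sided z-test for a difference in proportions. The following is a minimal sketch, assuming a normal approximation and an unpooled standard error; the function name and the renewal counts are hypothetical and are not taken from the notes.

```python
import math
from statistics import NormalDist


def z_test_diff_proportions(x_a, n_a, x_c, n_c, d0=0.04):
    """One-sided z-test of H0: pA - pC <= d0 against Ha: pA - pC > d0."""
    p_a = x_a / n_a  # sample proportion renewing on the Atkins diet
    p_c = x_c / n_c  # sample proportion renewing on the conventional diet
    # Unpooled standard error of the difference in sample proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_c * (1 - p_c) / n_c)
    z = (p_a - p_c - d0) / se
    # Upper-tail p-value, because Ha says the difference exceeds d0.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical counts for illustration only (not from the notes).
z, p_value = z_test_diff_proportions(x_a=120, n_a=150, x_c=100, n_c=150)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
```

A small p-value would support the claim that the renewal advantage exceeds the 4% break-even threshold; with d0=0 the same function tests for any positive difference.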

Summary Statistics – Diet Comparison

Summary statistics, such as sample proportions and their confidence intervals, are used to compare groups. If the confidence intervals for the two groups do not overlap, the difference is statistically significant. Overlapping intervals are inconclusive: the groups may still differ significantly, so the comparison should rest on a confidence interval for the difference itself.

Figure: Confidence intervals for Atkins and Conventional diets
Table: Summary statistics for Atkins and Conventional diets

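Example: The 95% confidence interval for the difference in renewal proportions (Atkins minus Conventional) is (0.023, 0.217). The interval does not include zero, indicating a statistically significant difference. It does include 0.04, however, so the data do not establish that the difference exceeds the 4% needed for profitability.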

Interpreting the Confidence Interval

If the confidence interval for the difference between two proportions (or means) does not include zero, we conclude that the groups are statistically significantly different. In the diet example, members on the Atkins diet renew at a higher rate than those on the conventional diet.
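
The interval check described above can be sketched in a few lines. This is an illustration only, assuming the usual large-sample (Wald) interval for a difference in proportions; the function name and the counts are hypothetical and do not reproduce the (0.023, 0.217) interval from the notes, whose underlying sample sizes are not given here.

```python
import math
from statistics import NormalDist


def diff_proportion_ci(x1, n1, x2, n2, level=0.95):
    """Large-sample confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Two-sided critical value, about 1.96 for a 95% interval.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical renewal counts for illustration only.
low, high = diff_proportion_ci(x1=120, n1=150, x2=100, n2=150)
excludes_zero = low > 0 or high < 0
print(f"95% CI: ({low:.3f}, {high:.3f}); excludes zero: {excludes_zero}")
```

When `excludes_zero` is true, the two proportions differ significantly at the chosen level; the sign of the interval shows which group has the higher rate.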

17.2 Two-Sample t-Test

The two-sample t-test is used to compare the means of two independent groups. It is commonly applied when comparing average outcomes, such as prices or sales, between two categories.

Key Point 1: Setting Up the Test: Define population means (e.g., μ1 for four-wheel drive cars, μ2 for two-wheel drive cars). The null hypothesis typically states that the difference is less than or equal to a constant D0 (often zero): H0: μ1 - μ2 ≤ D0 versus Ha: μ1 - μ2 > D0.
Key Point 2: Checklist for Validity:
- No obvious lurking variables
- Simple random samples (SRS)
- Similar variances (though the test can adjust for differences)
- Adequate sample size
Key Point 3: Confidence Interval Interpretation: If the 95% confidence interval for the difference in means does not include zero, the means are statistically significantly different.
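
The test statistic behind this comparison can be computed directly. The sketch below assumes the unequal-variance (Welch) form of the two-sample t-test, which is one way the test "can adjust for differences" in variance; the function name and the price data are hypothetical. It returns the t statistic and approximate degrees of freedom, and leaves the p-value lookup to statistical software or a t table.

```python
import math
from statistics import mean, variance


def welch_t(sample1, sample2, d0=0.0):
    """t statistic and Welch-Satterthwaite df for H0: mu1 - mu2 <= d0."""
    n1, n2 = len(sample1), len(sample2)
    # Estimated variance of each sample mean (s^2 / n).
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2
    se = math.sqrt(v1 + v2)
    t = (mean(sample1) - mean(sample2) - d0) / se
    # Approximate degrees of freedom when variances may differ.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical prices in thousands of dollars, for illustration only.
four_wheel = [32.1, 35.4, 30.8, 38.2, 33.5, 36.0]
two_wheel = [27.9, 29.3, 31.0, 26.5, 28.8, 30.2]
t_stat, df = welch_t(four_wheel, two_wheel)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

A large positive t relative to the t distribution with `df` degrees of freedom would lead to rejecting H0 in favor of μ1 - μ2 > D0.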

Figure: t-test output for difference in means

Practice Question: Matched Pairs t-Test

When comparing two measurements taken on the same units (e.g., sales in 2015 vs. 2016 for the same stores), use a matched pairs t-test. Working with the within-pair differences removes store-to-store variation, which usually increases statistical power.

Key Point 1: Data Structure: Each pair consists of two related observations (e.g., sales for the same store in two years).
Key Point 2: Hypothesis Test: Test whether the mean difference is zero. If the confidence interval for the mean difference does not include zero, there is a statistically significant change.
Key Point 3: Example: For 86 retail outlets, the 95% confidence interval for the mean difference in sales (2015 - 2016) is (−27.85, −9.84). Since zero is not included, sales changed significantly. Because the whole interval is negative, average sales were higher in 2016 than in 2015.
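
The matched pairs calculation reduces to a one-sample t-test on the within-pair differences. A minimal sketch follows, assuming the differences are taken as first measurement minus second; the function name and the five-store data set are hypothetical and are not the 86 outlets from the example.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """t statistic and df for H0: mean of (first - second) equals zero."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)   # mean of the differences
    s_d = stdev(diffs)    # sample standard deviation of the differences
    t = d_bar / (s_d / math.sqrt(n))
    return t, n - 1


# Hypothetical sales for five stores, for illustration only.
sales_2015 = [210.0, 185.5, 240.2, 198.7, 225.1]
sales_2016 = [228.4, 199.0, 251.9, 220.3, 239.8]
t_paired, df_paired = paired_t(sales_2015, sales_2016)
print(f"t = {t_paired:.3f}, df = {df_paired}")
```

A negative t here mirrors the negative interval in the example: with differences defined as 2015 minus 2016, it points to higher sales in 2016.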

Figure: JMP menu for matched pairs analysis
Figure: JMP matched pairs variable selection
Figure: Matched pairs t-test output and scatterplot

Formulas and Calculations

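Confidence Interval for a Proportion: $\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p}) / n}$
Confidence Interval for Difference in Proportions: $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\hat{p}_1(1 - \hat{p}_1) / n_1 + \hat{p}_2(1 - \hat{p}_2) / n_2}$
Two-Sample t-Test Statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}$
Matched Pairs t-Test Statistic: $t = \dfrac{\bar{d}}{s_d / \sqrt{n}}$, where $\bar{d}$ is the mean of the differences, $s_d$ is the standard deviation of the differences, and $n$ is the number of pairs.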

Summary Table: Types of Comparisons

Comparison Type	Test Used	Example
Two independent proportions	z-test for proportions	Diet renewal rates
Two independent means	Two-sample t-test	Car prices by drive type
Paired means	Matched pairs t-test	Sales in 2015 vs. 2016 for same stores