Comparing Two Groups: Means and Proportions

Introduction

This section reviews statistical methods for comparing two groups, covering both means and proportions. It distinguishes paired from independent samples, shows how to compute confidence intervals, and outlines the corresponding hypothesis tests.

Comparing Two Groups: Means

Paired Samples

Paired samples involve observations that can be logically linked, allowing for the analysis of differences within pairs. This design reduces extraneous variability and increases statistical efficiency.

Definition: Two samples where each observation in one sample is paired with a unique observation in the other sample.
Purpose: To control for confounding variables by comparing within pairs.
Parameter: $\mu_d$ (mean of the differences)
Statistic: $\bar{d}$ (sample mean of the differences)
Data Collection Examples:
- Comparing durability of two shoe sole materials on the same pair of shoes (Nike).
- Measuring GMAT scores before and after a review course for the same students (ETS).
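Sample Size: $n$ (number of pairs)
Sampling Distribution:
- If the population of differences is Normal, then $\bar{d}$ is Normal.
- If not, $\bar{d}$ is approximately Normal if $n$ is large (Central Limit Theorem).
Standard Error: $SE(\bar{d}) = s_d / \sqrt{n}$, where $s_d$ is the sample standard deviation of the differences
Confidence Interval Formula: $\bar{d} \pm t^*_{n-1} \cdot s_d / \sqrt{n}$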
Example Calculation:
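- Given $n = 5$ pairs (df = 4) and $t^* = 2.776$; the reported interval implies $\bar{d} = -1.20$ and a margin of error of $1.04$:
- Interval: $\bar{d} \pm t^*_{4} \cdot s_d / \sqrt{5} = -1.20 \pm 1.04$
- Result: (-2.24, -0.16)
- Interpretation: We are 95% confident that the population mean difference lies between -2.24 and -0.16 minutes. Because the interval excludes 0, the data point to a real difference.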
Design Types:
1. One sample measured twice under different conditions.
2. Pairs of twin items.
3. Two samples logically paired (e.g., before and after a campaign).
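
The paired-sample interval above can be sketched in Python. This is a minimal illustration rather than a definitive implementation: the function name `paired_t_interval` and both data lists are hypothetical, chosen only so that the differences produce an interval close to the one reported in the example. The critical value 2.776 (95%, df = 4) is passed in by hand because Python's standard library has no t-quantile function.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_interval(first, second, t_star):
    """Confidence interval for the mean of the paired differences (first - second)."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)            # sample mean of the differences
    s_d = stdev(diffs)             # sample standard deviation (n - 1 divisor)
    margin = t_star * s_d / sqrt(n)
    return d_bar - margin, d_bar + margin


# Hypothetical measurements on the same five subjects (not taken from the notes).
first_condition = [5.0, 6.0, 4.0, 7.0, 5.0]
second_condition = [6.0, 8.0, 5.0, 9.0, 5.0]

# t* = 2.776 is the 95% critical value for df = 4, taken from a t table.
paired_low, paired_high = paired_t_interval(first_condition, second_condition, 2.776)
print(round(paired_low, 2), round(paired_high, 2))
```

Only the within-pair differences enter the calculation, which is why the paired design uses $n - 1$ degrees of freedom with $n$ equal to the number of pairs.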

Independent Samples

Independent samples are drawn such that the selection of one sample does not influence the other. The goal is to compare the means of two populations.

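Parameter: $\mu_1 - \mu_2$ (difference between the two population means)
Statistic: $\bar{x}_1 - \bar{x}_2$ (difference between the two sample means)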
Analysis Approaches:
- Assume equal population variances (pooled variance estimate).
- Do not assume equal variances (use separate variance estimates).
Sampling Distribution:
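- Mean: $\mu_1 - \mu_2$
- Variance: $\sigma_1^2 / n_1 + \sigma_2^2 / n_2$
- Standard Error: $\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}$
- Approximately Normal if populations are Normal or sample sizes are large (CLT).
Test Statistic (Known Variances): $z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$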
Conditions:
- Two independent random samples.
- Parent populations are Normal, or both $n_1$ and $n_2$ are large.
- Sample sizes small relative to population sizes.
- Population variances are known (for Z-test).
Example Calculation (Known Variances):
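- $n_1 = 120$, $n_2 = 80$
- Sample means: $\bar{x}_1$, $\bar{x}_2$
- Known population variances: $\sigma_1^2$, $\sigma_2^2$
- 99% CI: $(\bar{x}_1 - \bar{x}_2) \pm 2.576 \sqrt{\sigma_1^2 / 120 + \sigma_2^2 / 80}$
- Interval: (10.25, 23.25), i.e. $16.75 \pm 6.50$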
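
As a rough sketch of the known-variance case, the standard error, z interval, and z statistic can be computed with Python's standard library. The function names are hypothetical, and the means and standard deviations below are made-up inputs (only the sample sizes 120 and 80 come from the example), so the output is not expected to match the interval reported above.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_se(sigma1, sigma2, n1, n2):
    """Standard error of xbar1 - xbar2 when both population SDs are known."""
    return sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


def two_mean_z_interval(xbar1, xbar2, sigma1, sigma2, n1, n2, level=0.99):
    """z confidence interval for mu1 - mu2 (known variances)."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 2.576 for 99%
    margin = z_star * two_mean_se(sigma1, sigma2, n1, n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


def two_mean_z_stat(xbar1, xbar2, sigma1, sigma2, n1, n2, null_diff=0.0):
    """z statistic for testing H0: mu1 - mu2 = null_diff."""
    return ((xbar1 - xbar2) - null_diff) / two_mean_se(sigma1, sigma2, n1, n2)


# Hypothetical summary statistics; only n1 = 120 and n2 = 80 come from the notes.
z_low, z_high = two_mean_z_interval(100.0, 90.0, 12.0, 10.0, 120, 80)
z_value = two_mean_z_stat(100.0, 90.0, 12.0, 10.0, 120, 80)
print(round(z_low, 2), round(z_high, 2), round(z_value, 2))
```
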
Unknown Variances:
- If variances are assumed equal, use pooled variance estimate.
- If not, use separate variance estimates (Welch's t-test).
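
The two unknown-variance approaches differ only in how the standard error and degrees of freedom are formed. The sketch below contrasts them on hypothetical data; the function names are illustrative, and the Welch degrees of freedom use the usual Welch-Satterthwaite approximation, which the notes do not spell out. The critical value $t^*$ would still have to come from a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_se_and_df(x1, x2):
    """Equal-variance approach: pooled variance estimate, df = n1 + n2 - 2."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return sqrt(sp2 * (1 / n1 + 1 / n2)), n1 + n2 - 2


def welch_se_and_df(x1, x2):
    """Unequal-variance approach: separate variances, Welch-Satterthwaite df."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1) / n1, variance(x2) / n2
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return sqrt(v1 + v2), df


# Hypothetical samples (not taken from the notes).
group_a = [20, 22, 19, 24, 25]
group_b = [28, 27, 30, 26, 29, 31, 25]

pooled_se, pooled_df = pooled_se_and_df(group_a, group_b)
welch_se, welch_df = welch_se_and_df(group_a, group_b)
diff = mean(group_a) - mean(group_b)

print("pooled t:", round(diff / pooled_se, 3), "df:", pooled_df)
print("Welch t:", round(diff / welch_se, 3), "df:", round(welch_df, 2))
```

The Welch degrees of freedom are generally not an integer and never exceed the pooled value $n_1 + n_2 - 2$.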

Comparing Two Groups: Proportions

Independent Samples for Proportions

Comparing proportions between two independent groups is common in business statistics, such as comparing market shares or product awareness.

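Parameter: $p_1 - p_2$ (difference between the two population proportions)
Statistic: $\hat{p}_1 - \hat{p}_2$ (difference between the two sample proportions)
Sampling Distribution:
- Mean: $p_1 - p_2$
- Variance: $p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2$
- Standard Error: $SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}$ (sample proportions substituted for the unknown $p_1$ and $p_2$)
- Approximately Normal if sample sizes are large.
Test Statistic: $z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{SE(\hat{p}_1 - \hat{p}_2)}$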
Conditions:
- Two independent random samples.
- Sample sizes large enough for Normal approximation: $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, $n_2(1 - \hat{p}_2)$ each at least 10.
- Sample sizes small relative to population sizes.
Example Calculation:
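- Market 1: $n_1$, $x_1$, $\hat{p}_1 = x_1 / n_1$
- Market 2: $n_2$, $x_2$, $\hat{p}_2 = x_2 / n_2$
- Check: $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, $n_2(1 - \hat{p}_2)$ (all > 10)
- 95% CI: $(\hat{p}_1 - \hat{p}_2) \pm 1.96 \sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}$
- Interval: (-0.18, 0.02), i.e. $-0.08 \pm 0.10$; because the interval contains 0, the data are consistent with no difference between the two markets.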
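
The condition check and interval for two proportions can be sketched as follows. The function name `two_proportion_interval` and the counts are hypothetical; the counts were picked so that the result lands near the interval reported above, and should not be read as the original data.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, level=0.95):
    """Large-sample z interval for p1 - p2, plus the success/failure check."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # n*p_hat and n*(1 - p_hat) are simply the success and failure counts.
    counts = (x1, n1 - x1, x2, n2 - x2)
    large_enough = all(c >= 10 for c in counts)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se, large_enough


# Hypothetical counts: 80 of 200 aware in market 1, 72 of 150 in market 2.
prop_low, prop_high, conditions_ok = two_proportion_interval(80, 200, 72, 150)
print(conditions_ok, round(prop_low, 2), round(prop_high, 2))
```
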

General Review Concepts

Normal Distribution: Symmetrical, bell-shaped distribution characterized by mean $\mu$ and standard deviation $\sigma$.
Z-score: Standardized value indicating how many standard deviations an observation is from the mean, $z = (x - \mu) / \sigma$.
Parameters vs. Statistics:
- Parameters: $\mu$, $\sigma$, $p$ (population values)
- Statistics: $\bar{x}$, $s$, $\hat{p}$ (sample values)
Sampling Distributions: Probability distributions of statistics based on random samples.
Central Limit Theorem (CLT): For large samples, the sampling distribution of the mean is approximately Normal, regardless of population shape.
Critical Value: The multiplier (e.g., $z^*$ or $t^*$) applied to the standard error to set the width of the confidence interval for a chosen confidence level.
Confidence Level (1-α): The long-run proportion of intervals, built by the same procedure from repeated samples, that contain the true parameter.
Margin of Error: The half-width of the confidence interval, equal to the critical value times the standard error.
Hypothesis Testing Methods: Three main approaches (not detailed here).
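t-distribution: Used in place of the Normal distribution when the population standard deviation is unknown and is estimated by the sample standard deviation; the difference matters most for small samples.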
One-sample t inference: Inference about a single mean using the t-distribution.
Rules for Means and Variances: Mathematical properties for combining means and variances.
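
The meaning of the confidence level can be illustrated with a small simulation: draw many samples from a population with a known mean, build a z interval from each, and count how often the interval captures that mean. This is a sketch under stated assumptions (a Normal population with known $\sigma$, a fixed random seed, and hypothetical values for the mean, standard deviation, and sample size), not a procedure taken from the notes.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulated_coverage(mu=50.0, sigma=10.0, n=30, level=0.95, reps=2000, seed=0):
    """Fraction of simulated z intervals for the mean that contain the true mu."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sigma / sqrt(n)   # same margin every time: sigma is known
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps


coverage = simulated_coverage()
print("share of intervals containing mu:", coverage)
```

The share should come out close to 0.95, with some run-to-run variation if the seed is changed.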

Summary Table: Comparison of Paired vs. Independent Samples

Aspect	Paired Samples	Independent Samples
Definition	Observations are logically linked in pairs	Observations are unrelated between groups
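Parameter	$\mu_d$ (mean of the differences)	$\mu_1 - \mu_2$ (difference of the means)
Statistic	$\bar{d}$	$\bar{x}_1 - \bar{x}_2$
Standard Error	$s_d / \sqrt{n}$	$\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}$ (with $s_1^2$, $s_2^2$ when the variances are unknown)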
Example	Before/after measurements on same subjects	Comparing two different groups (e.g., products)
