Two-Sample Tests: Hypothesis Testing for Means and Variances

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Two-Sample Tests

Introduction

Two-sample tests are statistical procedures used to compare the means or variances of two populations. These tests are fundamental in business analytics for determining whether differences observed between two groups are statistically significant. The most common applications include comparing sales, prices, or other business metrics across different locations, time periods, or conditions.

Difference Between Two Means

Independent Samples

When comparing two population means using independent samples, the goal is to test hypotheses or construct confidence intervals for the difference between the means, denoted as .

Independent samples: Data are collected from two unrelated groups.
Point estimate for the difference:
Variance assumptions:
- Population variances unknown but assumed equal: Use pooled-variance t test.
- Population variances unknown and not assumed equal: Use separate-variance t test.

Assumptions for Independent Samples

Samples are randomly and independently drawn.
Populations are normally distributed, or both sample sizes are at least 30 (Central Limit Theorem).
For pooled-variance t test: Population variances are unknown but assumed equal.

Hypothesis Tests for Two Population Means

Types of Tests

Lower-tail test: vs.
Upper-tail test: vs.
Two-tail test: vs.

Rejection regions are determined by the t-distribution critical values:

Lower-tail: Reject if
Upper-tail: Reject if
Two-tail: Reject if or

Pooled-Variance t Test (Equal Variances Assumed)

Pooled variance formula:
Test statistic:
Degrees of freedom:

Confidence Interval for (Equal Variances Assumed)

Formula:
Degrees of freedom:

Example: Pooled-Variance t Test

Comparing sales at two store locations (Special Front and In-Aisle):
- , ,
- , ,
Test statistic calculation:
Decision: Since (critical value at ), reject .
Conclusion: There is evidence of a difference in mean sales between the two locations.

Summary Table: Hypothesis Test Results

Result	Conclusions
	1. Reject 2. Conclude the mean sales are different for the two locations.
t test p-value = 0.0179 < 0.05	3. The probability of observing a difference in the two sample means this large or larger is 0.0179.
is positive	4. Conclude that the mean sales are higher for the special front location.

Evaluating the Normality Assumption

The pooled-variance t test assumes both populations are normally distributed with equal variances.
Normality can be checked using box plots or other graphical methods (e.g., in Excel).

Related Populations: The Paired Difference Test

Paired Samples

Paired difference tests are used when the samples are related, such as before-and-after measurements or matched pairs.

Paired or matched samples: Each observation in one sample is paired with a related observation in the other sample.
Point estimate for paired difference mean: , where
Sample standard deviation:
Assumptions: Differences are normally distributed, or sample size is large.

Test Statistic for Paired Difference Test

Formula:
Degrees of freedom:

Possible Hypotheses

Lower-tail test: vs.
Upper-tail test: vs.
Two-tail test: vs.

Confidence Interval for Paired Difference Mean

Formula:

Example: Paired Difference Test

Comparing mean prices at Costco and Walmart for items.
Calculated ,
Test statistic:
Decision: Do not reject (test statistic not in rejection region).
Conclusion: There is insufficient evidence of a difference in mean price between Costco and Walmart.

Summary Table: Paired Comparison Test Results

Result	Conclusions
	1. Do not reject
t test p-value = 0.1980 > 0.05	3. Conclude that no evidence exists that there is a difference in the mean price of equivalent items purchased at Costco and Walmart.

The F Distribution

Introduction

The F distribution is used to compare two population variances. The F critical value is found from the F table, and two degrees of freedom are required: one for the numerator and one for the denominator.

F statistic formula: , where is the larger sample variance.
Degrees of freedom: ,
In the F table:
- Numerator degrees of freedom determine the column.
- Denominator degrees of freedom determine the row.

Finding the Rejection Region

Two-tailed test: vs. Reject if
One-tailed test: vs. Reject if

Additional info: These procedures are foundational for business decision-making, such as evaluating the effectiveness of marketing strategies, comparing operational processes, or assessing product performance across different markets.