BackTwo-Sample Tests: Hypothesis Testing for Means and Variances
Study Guide - Smart Notes
Tailored notes based on your materials, expanded with key definitions, examples, and context.
Two-Sample Tests
Introduction
Two-sample tests are statistical procedures used to compare the means or variances of two populations. These tests are fundamental in business analytics for determining whether differences observed between two groups are statistically significant. The most common applications include comparing sales, prices, or other business metrics across different locations, time periods, or conditions.
Difference Between Two Means
Independent Samples
When comparing two population means using independent samples, the goal is to test hypotheses or construct confidence intervals for the difference between the means, denoted as .
Independent samples: Data are collected from two unrelated groups.
Point estimate for the difference:
Variance assumptions:
Population variances unknown but assumed equal: Use pooled-variance t test.
Population variances unknown and not assumed equal: Use separate-variance t test.
Assumptions for Independent Samples
Samples are randomly and independently drawn.
Populations are normally distributed, or both sample sizes are at least 30 (Central Limit Theorem).
For pooled-variance t test: Population variances are unknown but assumed equal.
Hypothesis Tests for Two Population Means
Types of Tests
Lower-tail test: vs.
Upper-tail test: vs.
Two-tail test: vs.
Rejection regions are determined by the t-distribution critical values:
Lower-tail: Reject if
Upper-tail: Reject if
Two-tail: Reject if or
Pooled-Variance t Test (Equal Variances Assumed)
Pooled variance formula:
Test statistic:
Degrees of freedom:
Confidence Interval for (Equal Variances Assumed)
Formula:
Degrees of freedom:
Example: Pooled-Variance t Test
Comparing sales at two store locations (Special Front and In-Aisle):
, ,
, ,
Test statistic calculation:
Decision: Since (critical value at ), reject .
Conclusion: There is evidence of a difference in mean sales between the two locations.
Summary Table: Hypothesis Test Results
Result | Conclusions |
|---|---|
1. Reject 2. Conclude the mean sales are different for the two locations. | |
t test p-value = 0.0179 < 0.05 | 3. The probability of observing a difference in the two sample means this large or larger is 0.0179. |
is positive | 4. Conclude that the mean sales are higher for the special front location. |
Evaluating the Normality Assumption
The pooled-variance t test assumes both populations are normally distributed with equal variances.
Normality can be checked using box plots or other graphical methods (e.g., in Excel).
Related Populations: The Paired Difference Test
Paired Samples
Paired difference tests are used when the samples are related, such as before-and-after measurements or matched pairs.
Paired or matched samples: Each observation in one sample is paired with a related observation in the other sample.
Point estimate for paired difference mean: , where
Sample standard deviation:
Assumptions: Differences are normally distributed, or sample size is large.
Test Statistic for Paired Difference Test
Formula:
Degrees of freedom:
Possible Hypotheses
Lower-tail test: vs.
Upper-tail test: vs.
Two-tail test: vs.
Confidence Interval for Paired Difference Mean
Formula:
Example: Paired Difference Test
Comparing mean prices at Costco and Walmart for items.
Calculated ,
Test statistic:
Decision: Do not reject (test statistic not in rejection region).
Conclusion: There is insufficient evidence of a difference in mean price between Costco and Walmart.
Summary Table: Paired Comparison Test Results
Result | Conclusions |
|---|---|
1. Do not reject | |
t test p-value = 0.1980 > 0.05 | 3. Conclude that no evidence exists that there is a difference in the mean price of equivalent items purchased at Costco and Walmart. |
The F Distribution
Introduction
The F distribution is used to compare two population variances. The F critical value is found from the F table, and two degrees of freedom are required: one for the numerator and one for the denominator.
F statistic formula: , where is the larger sample variance.
Degrees of freedom: ,
In the F table:
Numerator degrees of freedom determine the column.
Denominator degrees of freedom determine the row.
Finding the Rejection Region
Two-tailed test: vs. Reject if
One-tailed test: vs. Reject if
Additional info: These procedures are foundational for business decision-making, such as evaluating the effectiveness of marketing strategies, comparing operational processes, or assessing product performance across different markets.