Sampling Distributions, Central Limit Theorem, and Confidence Intervals: Study Notes for Statistics for Business

Introduction

This module covers foundational concepts in inferential statistics, focusing on how sample data can be used to make inferences about populations. Key topics include sampling distributions, the Central Limit Theorem (CLT), and the construction and interpretation of confidence intervals for both proportions and means.

Sampling and Random Processes

Population, Sample, and Random Sampling

Population: The entire group of individuals or items of interest in a study.
Sample: A subset of the population selected for analysis.
Random Sample: A sample in which every member of the population has an equal chance of being selected, which guards against selection bias.
Key Point: The size of the sample, not the size of the population, determines the precision of statistical conclusions (provided the population is much larger than the sample).
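
To make the key point concrete, here is a minimal sketch using the usual 95% margin-of-error formula for a proportion, $1.96\sqrt{\hat{p}(1-\hat{p})/n}$. The function name `margin_of_error` and the value $\hat{p} = 0.5$ are illustrative assumptions, not taken from the course materials. Note that the population size does not appear anywhere in the calculation.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Only the sample size n enters the formula; the population size does not.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Quadrupling the sample size halves the margin of error.
for n in (100, 400, 1600):
    print(f"n = {n:>4}: ME = {margin_of_error(0.5, n):.4f}")
```

Because the margin of error shrinks with $\sqrt{n}$, each halving of the margin requires four times as many observations.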

Sampling Distributions

Definition and Importance

Sampling Distribution: The probability distribution of a statistic (e.g., mean, proportion) across all possible random samples of a given size from the population.
By repeatedly sampling from a population and calculating the statistic each time, we can observe the distribution of that statistic.
This concept allows us to link sample results to population parameters.

Example: Distribution of Sample Means

If we take many samples and compute the mean of each, the distribution of these means forms the sampling distribution of the mean.
The shape of the sampling distribution becomes approximately normal as sample size increases, regardless of the population's distribution (by the CLT).
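
The repeated-sampling idea can be tried out directly. The sketch below is an illustration under assumptions of my own choosing: the population is exponential with mean 1 and standard deviation 1 (strongly right-skewed), and the helper name `sample_means` is hypothetical. It draws many samples, records each sample mean, and compares the spread of those means with $\sigma/\sqrt{n}$.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_means(sample_size, num_samples=5000):
    """Draw repeated samples from an exponential population (mean 1, sd 1)
    and return the list of sample means."""
    means = []
    for _ in range(num_samples):
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

for n in (2, 10, 50):
    means = sample_means(n)
    print(f"n = {n:>2}: mean of means = {statistics.fmean(means):.3f}, "
          f"sd of means = {statistics.stdev(means):.3f}, "
          f"sigma/sqrt(n) = {1 / n ** 0.5:.3f}")
```

Plotting a histogram of `means` for each `n` would show the skew of the population fading as the sample size grows.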

Central Limit Theorem (CLT)

Statement and Implications

The CLT states that the sampling distribution of the sample mean (or proportion) approaches a normal distribution as the sample size increases, regardless of the population's distribution.
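For a binomial random variable with $n$ trials and success probability $p$, the sampling distribution of the sample proportion $\hat{p}$ is approximately:

$\hat{p} \sim N\left(p,\ \sqrt{p(1-p)/n}\right)$ (mean, standard deviation)

For a continuous variable with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean $\bar{x}$ is approximately:

$\bar{x} \sim N\left(\mu,\ \sigma/\sqrt{n}\right)$ (mean, standard deviation)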

Application: The CLT justifies the use of normal probability models for inference about means and proportions when sample sizes are sufficiently large (commonly $n \ge 30$ for means, or $np \ge 10$ and $n(1-p) \ge 10$ for proportions, though some texts use 5 in place of 10).
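
The proportion result can be read as a special case of the mean result, which is one way to see why the same theorem covers both. A short derivation, assuming each observation is coded $X_i = 1$ for a success and $X_i = 0$ otherwise (the $X_i$ notation is introduced here for illustration):

```latex
\begin{align*}
\hat{p} &= \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{x},
  \qquad X_i \in \{0, 1\},\quad P(X_i = 1) = p \\
\mu &= E(X_i) = p \\
\sigma &= \sqrt{\operatorname{Var}(X_i)} = \sqrt{p(1-p)} \\
\frac{\sigma}{\sqrt{n}} &= \sqrt{\frac{p(1-p)}{n}}
\end{align*}
```

Substituting $\mu = p$ and $\sigma = \sqrt{p(1-p)}$ into the result for the sample mean gives the normal approximation for the sample proportion.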

Inference About Proportions

Estimating Population Proportion

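Let $p$ be the true population proportion and $\hat{p}$ the sample proportion.
For large $n$, $\hat{p}$ is approximately normally distributed with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$.
When $p$ is unknown, we estimate the standard deviation by substituting $\hat{p}$ for $p$; the result is called the standard error:

$SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$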

Confidence Interval for Proportions

A confidence interval provides a range of plausible values for the population proportion $p$.
For a 95% confidence interval:

$\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$

Interpretation: We are 95% confident that the interval contains the true population proportion.
Margin of Error (ME): $1.96\sqrt{\hat{p}(1-\hat{p})/n}$ for 95% confidence.
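
The calculation can be sketched in a few lines of Python. This is an illustration, not the course's own method: the function name `proportion_ci` and the poll figures ($n = 1000$, $\hat{p} = 0.52$) are hypothetical. The critical value comes from the standard library's `statistics.NormalDist`, which gives roughly 1.96 for 95% confidence.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion.

    Returns (lower, upper, margin_of_error).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)         # standard error of p_hat
    me = z * se                                     # margin of error
    return p_hat - me, p_hat + me, me

# Hypothetical poll: 1000 respondents, 52% in favour.
low, high, me = proportion_ci(p_hat=0.52, n=1000)
print(f"ME = {me:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With these hypothetical figures the interval extends below 0.50, so a poll of this size would not settle which side is ahead.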

Example: UK Referendum Poll

Sample size $n$ and sample proportion $\hat{p}$ (proportion "Remain").
95% CI: $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$

Inference About Means (Continuous Variables)

Sampling Distribution of the Mean

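For a sample of $n$ independent observations from a population with mean $\mu$ and standard deviation $\sigma$, the sample mean $\bar{x}$ has:

$E(\bar{x}) = \mu, \quad SD(\bar{x}) = \sigma/\sqrt{n}$

When $\sigma$ is unknown, estimate it with the sample standard deviation $s$, giving the standard error $SE(\bar{x}) = s/\sqrt{n}$.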

t-Distribution and t-Statistic

When estimating the mean from a small sample, use the t-distribution to account for the extra uncertainty from estimating $\sigma$ with $s$.
The t-statistic is:

$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$

Degrees of freedom: $df = n - 1$
As $n$ increases, the t-distribution approaches the standard normal distribution.
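
A small simulation shows why the normal cutoff of 1.96 is too narrow for small samples. The sketch below is illustrative only: the normal population with mean 50 and standard deviation 10, and the helper names `t_statistic` and `share_beyond`, are assumptions made for the example. It computes $t = (\bar{x} - \mu)/(s/\sqrt{n})$ for many samples and reports how often $|t|$ exceeds 1.96.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def t_statistic(sample, mu):
    """t = (sample mean - mu) / (s / sqrt(n)), with s the sample sd."""
    n = len(sample)
    s = statistics.stdev(sample)  # uses the n - 1 divisor
    return (statistics.fmean(sample) - mu) / (s / math.sqrt(n))

def share_beyond(cutoff, n, reps=20000, mu=50.0, sigma=10.0):
    """Fraction of simulated samples whose |t| exceeds the cutoff."""
    count = 0
    for _ in range(reps):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        if abs(t_statistic(sample, mu)) > cutoff:
            count += 1
    return count / reps

for n in (5, 30):
    print(f"n = {n:>2}: share of |t| > 1.96 = {share_beyond(1.96, n):.3f}")
```

If $t$ were standard normal, the share would be close to 0.05 for every $n$. For small samples it is noticeably larger, which is the heavier-tail behaviour the t-distribution accounts for.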

Confidence Interval for the Mean

For a 95% confidence interval:

$\bar{x} \pm t^{*} \times s/\sqrt{n}$

$t^{*}$ is the critical value from the t-distribution with $n - 1$ degrees of freedom.
For large $n$, $t^{*} \approx 1.96$ (standard normal critical value).
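
The interval can be computed step by step as follows. This is a sketch under stated assumptions: the ten cost figures are made up for illustration, the function name `mean_ci` is hypothetical, and the critical value 2.262 is the standard t-table value for 95% confidence with 9 degrees of freedom, supplied by hand because Python's standard library has no t-distribution.

```python
import math
import statistics

def mean_ci(sample, t_star):
    """t-based confidence interval for a population mean.

    t_star must be the critical value for len(sample) - 1 degrees of freedom.
    Returns (lower, upper, margin_of_error).
    """
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)     # sample sd, n - 1 divisor
    me = t_star * s / math.sqrt(n)   # margin of error
    return x_bar - me, x_bar + me, me

# Hypothetical data: 10 observed costs (arbitrary units).
costs = [12.1, 9.8, 11.4, 10.6, 13.0, 10.2, 11.9, 12.5, 9.5, 11.0]
T_STAR_DF9 = 2.262  # t-table value: 95% confidence, 9 degrees of freedom

low, high, me = mean_ci(costs, T_STAR_DF9)
print(f"mean = {statistics.fmean(costs):.2f}, ME = {me:.2f}, "
      f"95% CI = ({low:.2f}, {high:.2f})")
```

Passing the critical value in explicitly keeps the sketch dependency-free; with SciPy installed, a library quantile function could supply it instead.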

Example: Property Lease Costs

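Sample size $n$, sample mean $\bar{x}$, sample standard deviation $s$
Critical value: $t^{*}$ from the t-distribution with $n - 1$ degrees of freedom
Margin of error: $ME = t^{*} \times s/\sqrt{n}$
95% CI: $(\bar{x} - ME,\ \bar{x} + ME)$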

Comparing Normal and t-Distributions

The t-distribution is wider (has heavier tails) than the normal distribution, especially for small sample sizes.
As degrees of freedom increase, the t-distribution converges to the normal distribution.
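
The convergence can be checked numerically by comparing 95% critical values. The sketch below is one possible approach, not a standard library routine: the helper names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical, and the t-distribution's CDF is approximated with Simpson's rule and inverted by bisection, using only Python's standard library.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_critical(confidence, df):
    """Two-sided critical value t* such that P(-t* <= T <= t*) = confidence."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection on the increasing CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_star = NormalDist().inv_cdf(0.975)
for df in (4, 9, 24, 1000):
    print(f"df = {df:>4}: t* = {t_critical(0.95, df):.3f}")
print(f"normal:    z* = {z_star:.3f}")
```

The critical values fall toward the normal value as the degrees of freedom rise, so t-based intervals are wider than normal-based ones mainly when samples are small.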

Key Assumptions and Interpretation

Random sampling and independence are required for valid inference.
For t-based inference, the population should be approximately normal, especially for small samples. For large samples, the CLT ensures approximate normality of the sampling distribution.
A confidence interval is about the population mean or proportion, not individual observations.
We are never certain that a particular interval contains the true parameter, but over many repeated samples, about 95% of intervals constructed this way will contain the parameter (for 95% confidence).
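
The "over many samples" interpretation can be illustrated by simulation. In the sketch below the true proportion (0.4), the sample size (500), and the helper names `interval_covers` and `coverage_rate` are assumptions chosen for the example. Each simulated sample produces its own 95% interval, and the code counts how often the interval contains the true value.

```python
import math
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def interval_covers(p_true, n, z=1.96):
    """Draw one sample of size n and report whether its 95% interval
    for the proportion contains the true value p_true."""
    successes = sum(random.random() < p_true for _ in range(n))
    p_hat = successes / n
    me = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me <= p_true <= p_hat + me

def coverage_rate(p_true=0.4, n=500, num_samples=2000):
    """Fraction of simulated intervals that contain p_true."""
    hits = sum(interval_covers(p_true, n) for _ in range(num_samples))
    return hits / num_samples

rate = coverage_rate()
print(f"Share of intervals containing the true proportion: {rate:.3f}")
```

Any single interval either contains the true proportion or it does not; the 95% figure describes the long-run success rate of the procedure.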

Summary Table: Confidence Intervals for Proportions and Means

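Parameter	Point Estimate	Standard Error	Critical Value	Confidence Interval
Proportion ($p$)	$\hat{p}$	$\sqrt{\hat{p}(1-\hat{p})/n}$	1.96 (Normal, 95%)	$\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$
Mean ($\mu$)	$\bar{x}$	$s/\sqrt{n}$	$t^{*}$ with $n - 1$ df (t-distribution, 95%)	$\bar{x} \pm t^{*} \times s/\sqrt{n}$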

Excel Functions for Confidence Intervals

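Proportion: CONFIDENCE.NORM(alpha, SQRT(p*(1-p)), n), where p is the sample proportion
Mean: CONFIDENCE.T(alpha, s, n)
Both functions return the margin of error (ME), with alpha = 0.05 for 95% confidence; the interval is the point estimate plus or minus that value.
Margin of Error (ME) can be visualized using error bars in Excel charts.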

Conclusion

Understanding sampling distributions and the CLT is essential for making valid inferences from sample data.
Confidence intervals provide a practical way to express uncertainty about population parameters based on sample statistics.
Proper interpretation of confidence intervals and awareness of underlying assumptions are crucial for sound statistical reasoning in business contexts.