Sampling Distributions, Central Limit Theorem, and Confidence Intervals: Study Notes for Statistics for Business
Sampling Distributions, Central Limit Theorem, and Confidence Intervals
Introduction
This module covers foundational concepts in inferential statistics, focusing on how sample data can be used to make inferences about populations. Key topics include sampling distributions, the Central Limit Theorem (CLT), and the construction and interpretation of confidence intervals for both proportions and means.
Sampling and Random Processes
Population, Sample, and Random Sampling
Population: The entire group of individuals or items of interest in a study.
Sample: A subset of the population selected for analysis.
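Random Sample: A sample in which every member of the population has an equal chance of being selected. This guards against selection bias, so the sample is representative of the population on average.
Key Point: The precision of statistical conclusions depends on the size of the sample, not the size of the population (provided the population is much larger than the sample).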
Sampling Distributions
Definition and Importance
Sampling Distribution: The probability distribution of a statistic (e.g., mean, proportion) across all possible random samples of a given size from the population.
By repeatedly sampling from a population and calculating the statistic each time, we can observe the distribution of that statistic.
This concept allows us to link sample results to population parameters.
Example: Distribution of Sample Means
If we take many samples and compute the mean of each, the distribution of these means forms the sampling distribution of the mean.
The shape of the sampling distribution becomes approximately normal as sample size increases, regardless of the population's distribution (by the CLT).
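The repeated-sampling idea above can be sketched as a small simulation. This is a minimal illustration, not part of the original notes: the exponential population, the sample sizes, the number of repetitions, and the function name `simulate_sample_means` are all assumptions chosen for the example.

```python
import random
import statistics


def simulate_sample_means(sample_size, num_samples, seed=0):
    """Draw repeated samples from a skewed population and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        # Assumed population: exponential with rate 1 (mean 1, standard deviation 1).
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


# Compare the centre and spread of the sample means for several sample sizes.
for n in (2, 30, 200):
    means = simulate_sample_means(sample_size=n, num_samples=5000)
    centre = statistics.fmean(means)
    spread = statistics.stdev(means)
    print(f"n={n:>3}  mean of means={centre:.3f}  sd of means={spread:.3f}")
```

In a run of this sketch, the mean of the sample means should stay close to the population mean of 1, while their spread should shrink roughly like 1 divided by the square root of the sample size.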
Central Limit Theorem (CLT)
Statement and Implications
The CLT states that the sampling distribution of the sample mean (or proportion) approaches a normal distribution as the sample size increases, regardless of the shape of the population's distribution, provided the population has a finite variance.
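For a binomial random variable with $n$ trials and success probability $p$, the sampling distribution of the sample proportion $\hat{p}$ is approximately normal with mean $p$ and variance $p(1-p)/n$:
$$\hat{p} \sim N\left(p,\ \frac{p(1-p)}{n}\right)$$
For a continuous variable with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean $\bar{x}$ is approximately normal with mean $\mu$ and variance $\sigma^2/n$:
$$\bar{x} \sim N\left(\mu,\ \frac{\sigma^2}{n}\right)$$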
Application: The CLT justifies the use of normal probability models for inference about means and proportions when sample sizes are sufficiently large (commonly $n \ge 30$ for means, or $np \ge 10$ and $n(1-p) \ge 10$ for proportions; some texts use 5 in place of 10).
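As a rough check of the normal approximation for proportions, the sketch below simulates many sample proportions and compares them with the theoretical standard deviation $\sqrt{p(1-p)/n}$. The values `p = 0.3` and `n = 200`, the number of repetitions, and the function name are illustrative assumptions, not figures from the notes.

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=1):
    """Simulate sample proportions from repeated samples of n yes/no trials."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


p, n = 0.3, 200  # assumed population proportion and sample size
props = simulate_sample_proportions(p, n, num_samples=4000)

theoretical_sd = math.sqrt(p * (1 - p) / n)
empirical_sd = statistics.stdev(props)

# Under a normal model, about 95% of sample proportions fall within 1.96 SD of p.
within = sum(abs(x - p) <= 1.96 * theoretical_sd for x in props) / len(props)

print(f"theoretical SD = {theoretical_sd:.4f}, empirical SD = {empirical_sd:.4f}")
print(f"share within 1.96 SD of p = {within:.3f}")
```

If the approximation is reasonable for the chosen `p` and `n`, the two standard deviations should agree closely and the printed share should be near 0.95.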
Inference About Proportions
Estimating Population Proportion
Let $p$ be the true population proportion, and $\hat{p}$ the sample proportion.
For large $n$, $\hat{p}$ is approximately normally distributed with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$.
When $p$ is unknown, we estimate the standard deviation using $\hat{p}$, called the standard error:
$$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Confidence Interval for Proportions
A confidence interval provides a range of plausible values for the population proportion $p$.
For a 95% confidence interval:
$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Interpretation: We are 95% confident that the interval contains the true population proportion.
Margin of Error (ME): $ME = 1.96\sqrt{\hat{p}(1-\hat{p})/n}$ for 95% confidence.
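The interval and margin of error described above can be computed in a few lines. This is a minimal sketch: the function name `proportion_ci` and the counts used in the call (520 successes out of 1000) are hypothetical and are not the figures from any poll in these notes.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Standard error: estimated standard deviation of the sample proportion.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # Two-sided critical value from the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * se
    return p_hat, margin, (p_hat - margin, p_hat + margin)


# Hypothetical data: 520 "yes" answers in a sample of 1000.
p_hat, margin, (low, high) = proportion_ci(successes=520, n=1000)
print(f"p_hat = {p_hat:.3f}, ME = {margin:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The critical value is computed rather than hard-coded so the same function can be reused for other confidence levels, such as 0.90 or 0.99.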
Example: UK Referendum Poll
Sample size $n$, sample proportion $\hat{p}$ (proportion "Remain").
95% CI: $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$
Inference About Means (Continuous Variables)
Sampling Distribution of the Mean
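For a sample of $n$ independent observations from a population with mean $\mu$ and standard deviation $\sigma$, the sample mean $\bar{x}$ has mean $\mu$ and standard deviation $\sigma/\sqrt{n}$:
$$E(\bar{x}) = \mu, \qquad SD(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$
When $\sigma$ is unknown, estimate it with the sample standard deviation $s$.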
t-Distribution and t-Statistic
When estimating the mean from a small sample, use the t-distribution to account for the extra uncertainty from estimating $\sigma$ with $s$.
The t-statistic is:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Degrees of freedom: $df = n - 1$
As $n$ increases, the t-distribution approaches the standard normal distribution.
Confidence Interval for the Mean
For a 95% confidence interval:
$$\bar{x} \pm t^*_{n-1}\,\frac{s}{\sqrt{n}}$$
$t^*_{n-1}$ is the critical value from the t-distribution with $n-1$ degrees of freedom.
For large $n$, $t^*_{n-1} \approx 1.96$ (standard normal critical value).
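A t-based interval for the mean can be sketched as follows. The data values and the function name `mean_ci` are hypothetical. Python's standard library has no t-distribution, so this sketch assumes the critical value is looked up in a t table and passed in; 2.262 is the usual table value for a two-sided 95% interval with 9 degrees of freedom.

```python
import math
import statistics


def mean_ci(data, t_crit):
    """Confidence interval for a population mean using a supplied t critical value."""
    n = len(data)
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)       # standard error of the mean
    margin = t_crit * se
    return x_bar, margin, (x_bar - margin, x_bar + margin)


# Hypothetical sample of n = 10 observations, so df = n - 1 = 9.
data = [12.1, 9.8, 11.4, 10.6, 13.0, 10.2, 11.9, 9.5, 12.4, 11.1]
t_crit_df9 = 2.262  # two-sided 95% critical value for df = 9, taken from a t table

x_bar, margin, (low, high) = mean_ci(data, t_crit_df9)
print(f"x_bar = {x_bar:.2f}, ME = {margin:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Passing the critical value in as a parameter keeps the example free of third-party dependencies; with a statistics package available, the table lookup could be replaced by that package's t quantile function.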
Example: Property Lease Costs
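Sample size $n$, sample mean $\bar{x}$, sample standard deviation $s$
Critical value $t^*_{n-1}$
Margin of error $ME = t^*_{n-1}\, s/\sqrt{n}$
95% CI: ($\bar{x} - ME$, $\bar{x} + ME$)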
Comparing Normal and t-Distributions
The t-distribution is wider (has heavier tails) than the normal distribution, especially for small sample sizes.
As degrees of freedom increase, the t-distribution converges to the normal distribution.
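The heavier tails can be seen by evaluating both densities at a point far from the centre. The sketch below implements the standard Student's t density formula directly; the function name `t_pdf`, the evaluation point 3, and the chosen degrees of freedom are illustrative assumptions.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom at x."""
    # lgamma is used instead of gamma so large df values do not overflow.
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


normal_density = NormalDist().pdf(3.0)
print(f"standard normal density at 3: {normal_density:.5f}")

# The t density in the tail shrinks toward the normal density as df grows.
for df in (2, 5, 30, 1000):
    print(f"t density at 3 with df={df:>4}: {t_pdf(3.0, df):.5f}")
```

For small degrees of freedom the t density at 3 should be several times the normal density, and the gap should narrow as the degrees of freedom increase.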
Key Assumptions and Interpretation
Random sampling and independence are required for valid inference.
For t-based inference, the population should be approximately normal, especially for small samples. For large samples, the CLT ensures approximate normality of the sampling distribution.
A confidence interval is about the population mean or proportion, not individual observations.
We are never certain that a particular interval contains the true parameter, but over many samples, 95% of such intervals will contain the parameter (for 95% confidence).
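The long-run reading of "95% confidence" can be illustrated by simulation: build many intervals from independent samples and count how often they contain the true parameter. In this sketch the true proportion `p = 0.4`, the sample size, the number of intervals, and the function name `coverage_rate` are all assumptions made for the example.

```python
import math
import random


def coverage_rate(p, n, num_intervals, z=1.96, seed=2):
    """Share of simulated 95% intervals for a proportion that contain the true p."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(num_intervals):
        # Draw one sample of n yes/no observations and compute its interval.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hat = successes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            covered += 1
    return covered / num_intervals


rate = coverage_rate(p=0.4, n=500, num_intervals=2000)
print(f"share of intervals containing the true proportion: {rate:.3f}")
```

The printed share should be close to 0.95, though any single interval either contains the true proportion or does not.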
Summary Table: Confidence Intervals for Proportions and Means
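| Parameter | Point Estimate | Standard Error | Critical Value | Confidence Interval |
|---|---|---|---|---|
| Proportion ($p$) | $\hat{p}$ | $\sqrt{\hat{p}(1-\hat{p})/n}$ | 1.96 (Normal, 95%) | $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$ |
| Mean ($\mu$) | $\bar{x}$ | $s/\sqrt{n}$ | $t^*_{n-1}$ (t-distribution, 95%) | $\bar{x} \pm t^*_{n-1}\, s/\sqrt{n}$ |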
Excel Functions for Confidence Intervals
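Proportion: CONFIDENCE.NORM(alpha, SQRT(p*(1-p)), n), where alpha = 0.05 for 95% confidence and p is the sample proportion. The function returns the margin of error.
Mean: CONFIDENCE.T(alpha, s, n), where s is the sample standard deviation. The function returns the margin of error.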
Margin of Error (ME) can be visualized using error bars in Excel charts.
Conclusion
Understanding sampling distributions and the CLT is essential for making valid inferences from sample data.
Confidence intervals provide a practical way to express uncertainty about population parameters based on sample statistics.
Proper interpretation of confidence intervals and awareness of underlying assumptions are crucial for sound statistical reasoning in business contexts.