Chapter 7: Sampling and Sampling Distributions – Study Notes for Business Statistics
Sampling and Sampling Distributions
Introduction
This chapter introduces the fundamental concepts of sampling and sampling distributions, which are essential for making statistical inferences in business contexts. Understanding how samples are drawn and analyzed allows statisticians to draw reliable conclusions about populations without examining every member.
Why Sample?
Population and Sample
Population: The entire set of subjects or items of interest in a statistical study.
Sample: A subset of the population selected for analysis.
Sampling is necessary because:
Examining the entire population is often expensive and time-consuming.
Some measurements may be destructive (e.g., product testing).
If the sample is properly selected and analyzed, it can provide accurate estimates of population parameters.
Types of Sampling
Overview
Sampling methods are classified into two main categories: Probability Sampling and Nonprobability Sampling.
| Probability Sampling | Nonprobability Sampling |
|---|---|
| Simple random | Convenience |
| Systematic | |
| Stratified | |
| Cluster | |
| Resampling | |
Probability Sampling
Probability Sample: Each member of the population has a known, nonzero chance of being selected.
Advantage: Enables inferential statistical tests and reliable conclusions about the population.
Simple Random Sampling
Every member of the population has an equal chance of being chosen.
Often implemented using random number generators or software tools.
Example: Selecting random customers from a database for a survey.
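A minimal sketch of simple random sampling with Python's standard `random` module. The 1,000 customer IDs are hypothetical placeholders for a real customer database, and the seed is fixed only so the draw is reproducible.

```python
import random

# Hypothetical population: 1,000 customer IDs standing in for a database table
customers = [f"C{i:04d}" for i in range(1, 1001)]

rng = random.Random(42)  # fixed seed so the same sample is drawn each run

# random.sample draws without replacement; every customer is equally likely to be chosen
survey_sample = rng.sample(customers, k=50)

print(len(survey_sample), survey_sample[:5])
```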
Systematic Sampling
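Every kth member of the population is selected, where $k = N/n$ (population size divided by desired sample size, rounded down), starting from a randomly chosen member among the first k.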
Advantage: Easy to implement and reduces selection bias.
Disadvantage: If the population list has a recurring pattern whose cycle lines up with k, the sample can be biased.
Example: Selecting every 10th product off an assembly line for quality inspection.
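The selection rule can be sketched as a short function. This is an illustrative sketch, not a standard library routine; `systematic_sample` is a hypothetical helper, and the 500-product run is made-up data chosen so that k works out to 10, as in the inspection example.

```python
import random

def systematic_sample(population, n, rng=random):
    """Take every kth item, k = N // n, after a random start in the first k."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    k = N // n                        # sampling interval
    start = rng.randrange(k)          # random start: index 0 .. k-1
    return population[start::k][:n]   # every kth item, trimmed to exactly n

# Hypothetical assembly-line run of 500 numbered products, inspecting 50
products = list(range(1, 501))
inspected = systematic_sample(products, n=50, rng=random.Random(7))
print(inspected[:5])
```

Only the starting point is random; once it is chosen, the rest of the sample is fixed, which is why a periodic pattern in the list can bias the result.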
Stratified Sampling
Population is divided into mutually exclusive groups (strata) based on important characteristics.
A random sample is drawn from each stratum.
Advantage: Ensures representation of all subgroups.
Example: Sampling students by class year (freshman, sophomore, junior, senior).
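A sketch of stratified sampling using the class-year example. The roster is hypothetical, and the choice to sample the same fraction from every stratum (proportional allocation) is an assumption made for this illustration; other allocation rules exist.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, rng=random):
    """Draw a simple random sample of the same fraction from every stratum
    (proportional allocation, an assumption made for this sketch)."""
    strata = defaultdict(list)
    for r in records:
        strata[stratum_of(r)].append(r)             # split into mutually exclusive strata
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))  # at least one member per stratum
        sample.extend(rng.sample(members, k))       # random sample within the stratum
    return sample

# Hypothetical roster of (student_id, class_year) pairs
years = ["freshman"] * 400 + ["sophomore"] * 300 + ["junior"] * 200 + ["senior"] * 100
roster = list(enumerate(years))
stratified = stratified_sample(roster, stratum_of=lambda s: s[1], fraction=0.10,
                               rng=random.Random(1))
print(len(stratified))
```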
Cluster Sampling
Population is divided into clusters, which are representative mini-subsets of the population.
Randomly select clusters, then sample all or some members within selected clusters.
Advantage: Simplifies sampling, especially for geographically dispersed populations.
Example: Sampling households by city blocks.
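A sketch of one-stage cluster sampling for the city-block example: randomly choose whole blocks, then survey every household in them. The 40 blocks of 25 households are hypothetical data. A two-stage design would instead subsample within each chosen block (for example, `rng.sample(blocks[b], 10)`).

```python
import random

# Hypothetical city: 40 blocks (clusters), each holding 25 household IDs
blocks = {b: [f"B{b:02d}-H{h:02d}" for h in range(1, 26)] for b in range(1, 41)}

rng = random.Random(3)
chosen_blocks = rng.sample(sorted(blocks), k=5)  # randomly select 5 clusters

# One-stage cluster sampling: include every household in the chosen blocks
households = [h for b in chosen_blocks for h in blocks[b]]
print(sorted(chosen_blocks), len(households))
```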
Resampling (Bootstrap Method)
Repeatedly draw samples with replacement from the observed sample, treating it as a stand-in for the population, to approximate the sampling distribution of a statistic.
Used for estimating the variability of sample statistics.
Example: Using computer software to simulate many samples for estimating the mean.
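A minimal bootstrap sketch for estimating the standard error of the mean. The 12 order values are hypothetical, and `bootstrap_se_of_mean` is an illustrative helper, not a library function. Each resample has the same size as the original sample and is drawn with replacement.

```python
import random
import statistics

def bootstrap_se_of_mean(sample, n_resamples=5000, rng=random):
    """Estimate the standard error of the sample mean by resampling the
    observed sample with replacement."""
    n = len(sample)
    boot_means = [
        statistics.fmean(rng.choices(sample, k=n))  # one resample of size n, with replacement
        for _ in range(n_resamples)
    ]
    return statistics.stdev(boot_means)             # spread of the bootstrap means

# Hypothetical observed sample: 12 order values in dollars
observed = [23, 31, 19, 45, 28, 36, 22, 40, 27, 33, 25, 38]
se = bootstrap_se_of_mean(observed, rng=random.Random(0))
print(round(se, 2))
```

The result should land close to the textbook estimate $s/\sqrt{n}$, which is a useful sanity check before applying the bootstrap to statistics (such as the median) that lack a simple formula.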
Nonprobability Sampling
Convenience Sampling: Selecting samples that are easily accessible.
Advantage: Quick and easy to gather data.
Disadvantage: May not be representative of the population.
Example: Surveying people who happen to be in a shopping mall.
Sampling and Nonsampling Errors
Definitions
Parameter: A value that describes a characteristic of a population (e.g., mean, median).
Statistic: A value calculated from a sample (e.g., sample mean, sample median).
Sampling Error
The difference between a sample statistic and the corresponding population parameter.
Sampling error tends to shrink as sample size increases, although any single sample can still be off by a larger or smaller amount.
Formula: Sampling error $= \bar{x} - \mu$
where $\bar{x}$ is the sample mean and $\mu$ is the population mean.
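The definition can be made concrete with a simulated population, where the parameter is actually known. The population below is hypothetical (10,000 normally distributed "monthly spend" values), generated only so that $\mu$ can be computed exactly and compared with $\bar{x}$.

```python
import random
import statistics

rng = random.Random(11)

# Hypothetical finite population of 10,000 values (e.g., monthly spend in dollars)
population = [rng.gauss(250, 60) for _ in range(10_000)]
mu = statistics.fmean(population)      # population parameter

sample = rng.sample(population, k=100)
x_bar = statistics.fmean(sample)       # sample statistic

sampling_error = x_bar - mu            # x̄ − μ
print(f"mu={mu:.2f}  x_bar={x_bar:.2f}  sampling error={sampling_error:.2f}")
```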
Nonsampling Error
Errors not related to sampling variability, such as ambiguous survey questions, respondent bias, or data collection mistakes.
The Central Limit Theorem (CLT)
Statement and Importance
The CLT states that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the population's distribution (provided the population has a finite standard deviation).
As a common rule of thumb, a sample size of 30 or more is considered sufficient; strongly skewed populations may require larger samples.
If the population is normally distributed, the sample means will also be normally distributed for any sample size.
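A small simulation can illustrate the CLT. This sketch assumes a right-skewed exponential population with mean 10 (so its standard deviation is also 10) and draws 10,000 samples at each sample size. As n grows, the mean of the sample means stays near 10, their spread tracks $\sigma/\sqrt{n}$, and their skewness shrinks toward 0 (the value for a symmetric distribution).

```python
import random
import statistics

rng = random.Random(5)
MEAN = 10.0  # assumed population mean (exponential, so sigma is also 10)

def sample_means(n, reps=10_000):
    """Means of `reps` random samples of size n from the skewed population."""
    return [statistics.fmean(rng.expovariate(1 / MEAN) for _ in range(n)) for _ in range(reps)]

def skewness(xs):
    """Moment coefficient of skewness (0 for a symmetric distribution)."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

results = {}
for n in (2, 10, 40):
    means = sample_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    # Compare the spread of the sample means with sigma / sqrt(n)
    print(f"n={n:2d}  mean={results[n][0]:.2f}  SD={results[n][1]:.2f}  "
          f"sigma/sqrt(n)={MEAN / n ** 0.5:.2f}  skew={results[n][2]:.2f}")
```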
Properties
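The mean of the sampling distribution of the mean equals the population mean: $\mu_{\bar{x}} = \mu$
The standard deviation of the sampling distribution (the standard error of the mean) is: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$, where $\sigma$ is the population standard deviation and $n$ is the sample size.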
Application Example
Suppose the average driving distance per year is 12,000 miles with a standard deviation of 2,580 miles. The probability that a sample mean exceeds 12,500 miles can be calculated using the CLT if the sample size is large enough.
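The notes do not state a sample size for this example, so the sketch below assumes a hypothetical n = 40 (large enough for the CLT rule of thumb). It computes the standard error, the z-score for 12,500 miles, and the upper-tail probability with `statistics.NormalDist` from Python's standard library.

```python
from statistics import NormalDist

mu, sigma = 12_000, 2_580   # population mean and standard deviation (miles per year)
n = 40                      # hypothetical sample size; not given in the notes
threshold = 12_500

se = sigma / n ** 0.5                   # standard error of the mean
z = (threshold - mu) / se               # distance of 12,500 from mu in standard errors
p_exceed = 1 - NormalDist().cdf(z)      # P(x̄ > 12,500) under the CLT
print(f"SE={se:.1f}  z={z:.2f}  P={p_exceed:.4f}")
```

Under this assumed n, the probability comes out at roughly 0.11; a different sample size changes the standard error and therefore the answer.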
Testing Claims Using CLT
Use the sampling distribution to assess how unusual a sample result is, given a claim about the population.
If the sample result is very unlikely under the claim, the claim may be rejected.
Example: Testing whether the mean age of health club members is 37 years, given a sample mean of 36.1 years and a population standard deviation of 5 years.
Steps:
Find the standard error: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
Calculate the z-score: $z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
Determine the probability using the standard normal distribution.
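The three steps can be sketched for the health club example. The notes omit the sample size, so n = 40 here is a hypothetical value chosen for illustration; the resulting probability depends directly on that assumption.

```python
from statistics import NormalDist

mu_claim = 37.0   # claimed population mean age (years)
sigma = 5.0       # population standard deviation (years)
x_bar = 36.1      # observed sample mean (years)
n = 40            # hypothetical sample size; not stated in the notes

se = sigma / n ** 0.5               # step 1: standard error
z = (x_bar - mu_claim) / se         # step 2: z-score
p_at_most = NormalDist().cdf(z)     # step 3: P(x̄ <= 36.1) if the claim is true
print(f"SE={se:.3f}  z={z:.2f}  P={p_at_most:.4f}")
```

With this assumed n, a sample mean of 36.1 or lower would occur roughly 13% of the time if the true mean were 37, which is not unusual enough to reject the claim at commonly used thresholds.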
Effect of Sample Size
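As sample size increases, the standard error $\sigma/\sqrt{n}$ decreases (quadrupling n halves it), so sample means tend to fall closer to the population mean.
The interval around the population mean that contains a given share of sample means (e.g., 95%) narrows, increasing precision.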
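The effect can be tabulated with the driving example's standard deviation. The sample sizes are hypothetical, picked as successive factors of 4 so the halving of the standard error is easy to see; the ±1.96 SE band assumes a normal sampling distribution.

```python
sigma = 2_580   # population SD from the driving example (miles per year)
Z_95 = 1.96     # standard normal value enclosing the middle 95%

ses = {}
for n in (10, 40, 160, 640):
    ses[n] = sigma / n ** 0.5       # standard error shrinks like 1/sqrt(n)
    half_width = Z_95 * ses[n]      # about 95% of sample means fall within mu +/- this
    print(f"n={n:3d}  SE={ses[n]:7.1f}  95% band: mu +/- {half_width:7.1f} miles")
```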
Sampling Distribution with a Finite Population
Finite Population Correction
When the sample size is more than 5% of the population and sampling is without replacement, adjust the standard error.
Formula: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}$
where $N$ is the population size and $n$ is the sample size.
Example
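A company with 100 customers (population mean rating 7.2, standard deviation 0.7) samples 40 customers and finds a mean rating of 7.5. Because the sample is more than 5% of the population, the finite population correction is applied to judge how likely a sample mean of 7.5 or higher would be if the population mean really were 7.2.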
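A sketch of this example, reading it as: how likely is a sample mean of 7.5 or higher if the population mean is 7.2? `standard_error` is a hypothetical helper that applies the finite population correction $\sqrt{(N-n)/(N-1)}$ only when a population size is passed.

```python
from statistics import NormalDist

def standard_error(sigma, n, N=None):
    """sigma / sqrt(n), multiplied by the finite population correction
    sqrt((N - n) / (N - 1)) when a population size N is given."""
    se = sigma / n ** 0.5
    if N is not None:
        se *= ((N - n) / (N - 1)) ** 0.5
    return se

N, n = 100, 40           # population and sample sizes
mu, sigma = 7.2, 0.7     # population mean and standard deviation of ratings
x_bar = 7.5              # sample mean rating

se_fpc = standard_error(sigma, n, N)
z = (x_bar - mu) / se_fpc
p_at_least = 1 - NormalDist().cdf(z)   # P(x̄ >= 7.5) if mu really is 7.2
print(f"SE={se_fpc:.4f}  z={z:.2f}  P={p_at_least:.5f}")
```

Because 40 of the 100 customers are sampled, the correction shrinks the standard error noticeably, and the resulting tail probability is well below 0.001: a sample mean this high would be very unusual if the population mean were 7.2.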
Sampling Distribution of the Proportion
Binomial Distribution and Proportion
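Used when analyzing the number of successes x in n independent trials, each with the same probability of success p.
For the normal approximation to apply, check that $np \ge 5$ and $n(1-p) \ge 5$ (some texts use 10 instead of 5).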
Formulas
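Sample proportion: $\bar{p} = \dfrac{x}{n}$
Standard error of the proportion: $\sigma_{\bar{p}} = \sqrt{\dfrac{p(1-p)}{n}}$
Z-score for the sample proportion: $z = \dfrac{\bar{p} - p}{\sigma_{\bar{p}}}$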
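Finite Population Correction for Proportion
When the sample is more than 5% of the population and sampling is without replacement: $\sigma_{\bar{p}} = \sqrt{\dfrac{p(1-p)}{n}}\sqrt{\dfrac{N-n}{N-1}}$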
Example
A college claims 70% of its 770 graduates found jobs related to their majors. A sample of 120 students found 97 with related jobs. The finite population correction is used to test the claim.
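A sketch of the college example: the normal-approximation check, the sample proportion, the corrected standard error, and the z-score. `proportion_z` is a hypothetical helper; it assumes the $np \ge 5$ and $n(1-p) \ge 5$ threshold. The correction applies because 120 is about 16% of 770.

```python
from statistics import NormalDist

def proportion_z(x, n, p, N=None):
    """Sample proportion, its standard error, and z-score against a claimed p,
    applying the finite population correction when N is given."""
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("normal approximation not appropriate: np or n(1-p) < 5")
    p_bar = x / n
    se = (p * (1 - p) / n) ** 0.5
    if N is not None:
        se *= ((N - n) / (N - 1)) ** 0.5
    return p_bar, se, (p_bar - p) / se

p_bar, se, z = proportion_z(x=97, n=120, p=0.70, N=770)
p_at_least = 1 - NormalDist().cdf(z)   # P(p̄ >= 97/120) if the 70% claim is true
print(f"p_bar={p_bar:.4f}  SE={se:.4f}  z={z:.2f}  P={p_at_least:.4f}")
```

The tail probability is well under 1%, so a sample proportion this high would be unusual if exactly 70% of graduates had related jobs.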
Comparison Table: Stratified vs. Cluster Sampling
| Stratified Sampling | Cluster Sampling |
|---|---|
| Strata are defined by a common characteristic (e.g., class year) | Clusters are mini-subsets that each represent the population |
| Members within a stratum tend to be similar (homogeneous) | Members within a cluster tend to be diverse (heterogeneous) |
| Every stratum is sampled | Only randomly selected clusters are sampled |
Summary
Sampling allows for efficient and accurate estimation of population parameters.
Probability sampling methods provide the foundation for statistical inference.
The Central Limit Theorem justifies using the normal distribution for sample means and sample proportions when samples are large enough.
Finite population corrections are needed when sampling without replacement and the sample exceeds 5% of the population.