Chapter 7: Sampling and Sampling Distributions – Study Notes for Business Statistics

Sampling and Sampling Distributions

Introduction

This chapter introduces the fundamental concepts of sampling and sampling distributions, which are essential for making statistical inferences in business contexts. Understanding how samples are drawn and analyzed allows statisticians to make reliable conclusions about populations without examining every member.

Why Sample?

Population and Sample

Population: The entire set of subjects or items of interest in a statistical study.
Sample: A subset of the population selected for analysis.

Sampling is necessary because:

Examining the entire population is often expensive and time-consuming.
Some measurements may be destructive (e.g., product testing).
If the sample is properly selected and analyzed, it can provide accurate estimates of population parameters.

Types of Sampling

Overview

Sampling methods are classified into two main categories: Probability Sampling and Nonprobability Sampling.

Probability Sampling	Nonprobability Sampling
Simple Random, Systematic, Stratified, Cluster, Resampling	Convenience

Probability Sampling

Probability Sample: Each member of the population has a known, nonzero chance of being selected.
Advantage: Enables inferential statistical tests and reliable conclusions about the population.

Simple Random Sampling

Every member of the population has an equal chance of being chosen.
Often implemented using random number generators or software tools.

Example: Selecting random customers from a database for a survey.
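
The customer survey example might be sketched as follows. The customer IDs, the sample size of 25, and the seed are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical customer database: IDs 1 through 500 (illustrative only).
customer_ids = list(range(1, 501))

# A seeded generator makes the draw reproducible; drop the seed in real use.
rng = random.Random(42)

# sample() draws without replacement, so no customer appears twice
# and every customer has the same chance of being chosen.
survey_sample = rng.sample(customer_ids, k=25)

print(sorted(survey_sample))
```

Drawing without replacement is the usual choice for a survey, since contacting the same customer twice adds no information.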

Systematic Sampling

Every $k$th member of the population is selected, where $k = N/n$ ($N$ is the population size and $n$ is the sample size).
Advantage: Easy to implement and reduces selection bias.
Disadvantage: May be affected by periodicity in the population.

Example: Selecting every 10th product off an assembly line for quality inspection.
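
A minimal sketch of the assembly line example, assuming a hypothetical run of 200 units and a target sample of 20, so that the interval works out to every 10th unit. The random starting point is a common refinement rather than something stated in these notes.

```python
import random

# Hypothetical production run: units numbered 1 through 200.
units = list(range(1, 201))
sample_size = 20

# Sampling interval: population size divided by sample size.
k = len(units) // sample_size

# A random start within the first k units keeps the selection from
# always beginning at unit 1.
rng = random.Random(7)
start = rng.randrange(k)

# Take every kth unit from the random start.
inspected = units[start::k]

print(k, inspected)
```

If defects recur on a cycle that matches the interval (say, every 10th unit comes from the same mold), this scheme would see only one phase of the cycle, which is the periodicity problem noted above.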

Stratified Sampling

Population is divided into mutually exclusive groups (strata) based on important characteristics.
A random sample is drawn from each stratum.
Advantage: Ensures representation of all subgroups.

Example: Sampling students by class year (freshman, sophomore, junior, senior).
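
The class-year example could look like the sketch below. The roster sizes and the total sample of 100 are hypothetical, and proportional allocation (each stratum sampled in proportion to its size) is an assumption; other allocation rules exist.

```python
import random

rng = random.Random(1)

# Hypothetical student roster grouped by class year (the strata).
strata = {
    "freshman": [f"F{i}" for i in range(300)],
    "sophomore": [f"S{i}" for i in range(250)],
    "junior": [f"J{i}" for i in range(250)],
    "senior": [f"R{i}" for i in range(200)],
}

total = sum(len(members) for members in strata.values())
sample_size = 100

# Proportional allocation: each stratum contributes in proportion to its size.
stratified_sample = {}
for year, members in strata.items():
    n_stratum = round(sample_size * len(members) / total)
    stratified_sample[year] = rng.sample(members, n_stratum)

for year, chosen in stratified_sample.items():
    print(year, len(chosen))
```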

Cluster Sampling

Population is divided into clusters, which are representative mini-subsets of the population.
Randomly select clusters, then sample all or some members within selected clusters.
Advantage: Simplifies sampling, especially for geographically dispersed populations.

Example: Sampling households by city blocks.
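
The city block example can be sketched in two stages. The number of blocks, the households per block, and the choice of 5 blocks are hypothetical.

```python
import random

rng = random.Random(3)

# Hypothetical city: 40 blocks (clusters), each with 12 households.
blocks = {
    block: [f"block{block}-house{h}" for h in range(12)]
    for block in range(40)
}

# Stage 1: randomly select whole clusters.
chosen_blocks = rng.sample(sorted(blocks), k=5)

# Stage 2: survey every household in the selected blocks
# (sampling only some households per block is the other option).
households = [house for block in chosen_blocks for house in blocks[block]]

print(chosen_blocks, len(households))
```

Unlike stratified sampling, most blocks contribute nothing to the sample; the method relies on each block resembling the city as a whole.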

Resampling (Bootstrap Method)

Repeatedly draw new samples, with replacement, from the data in the original sample to estimate parameters.
Used for estimating the variability of sample statistics.

Example: Using computer software to simulate many samples for estimating the mean.
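
A rough sketch of the bootstrap idea, which resamples an observed sample with replacement. The ten data values and the 2,000 replicates are hypothetical choices for illustration.

```python
import random
import statistics

rng = random.Random(11)

# Hypothetical observed sample (illustrative values only).
observed = [12, 15, 9, 20, 14, 17, 11, 13, 16, 18]

# Each bootstrap replicate resamples the observed data with replacement,
# keeping the same size as the original sample.
replicate_means = []
for _ in range(2000):
    resample = rng.choices(observed, k=len(observed))
    replicate_means.append(statistics.mean(resample))

# The spread of the replicate means estimates the standard error of the mean.
bootstrap_se = statistics.stdev(replicate_means)

print(statistics.mean(observed), bootstrap_se)
```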

Nonprobability Sampling

Convenience Sampling: Selecting samples that are easily accessible.
Advantage: Quick and easy to gather data.
Disadvantage: May not be representative of the population.

Example: Surveying people who happen to be in a shopping mall.

Sampling and Nonsampling Errors

Definitions

Parameter: A value that describes a characteristic of a population (e.g., mean, median).
Statistic: A value calculated from a sample (e.g., sample mean, sample median).

Sampling Error

The difference between a sample statistic and the corresponding population parameter.
Sampling error tends to decrease as sample size increases, because statistics from larger samples vary less around the parameter.

Formula: $\text{Sampling error} = \bar{x} - \mu$

Where $\bar{x}$ is the sample mean and $\mu$ is the population mean.
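
To see the definition in action, the sketch below builds a hypothetical population of 1,000 order values, draws one sample of 50, and subtracts the population mean from the sample mean. All numbers are made up for illustration.

```python
import random
import statistics

rng = random.Random(5)

# Hypothetical population of 1,000 order values (illustrative only).
population = [rng.uniform(10, 100) for _ in range(1000)]
mu = statistics.mean(population)      # population parameter

sample = rng.sample(population, k=50)
x_bar = statistics.mean(sample)       # sample statistic

# Sampling error: sample mean minus population mean.
sampling_error = x_bar - mu
print(round(mu, 2), round(x_bar, 2), round(sampling_error, 2))
```

In practice the population mean is unknown, so the sampling error cannot be computed directly; a simulation like this is only a way to build intuition.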

Nonsampling Error

Errors not related to sampling variability, such as ambiguous survey questions, respondent bias, or data collection mistakes.

The Central Limit Theorem (CLT)

Statement and Importance

The CLT states that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the population's distribution shape.
For most practical purposes, a sample size of 30 or more is considered sufficient.
If the population is normally distributed, the sample means will also be normally distributed for any sample size.

Properties

The mean of the sampling distribution of the mean equals the population mean: $\mu_{\bar{x}} = \mu$
The standard deviation of the sampling distribution (standard error) is: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
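
A quick simulation can make the two properties above concrete. This sketch assumes a hypothetical right-skewed population (exponential with mean 10, which also has standard deviation 10) and samples of size 36; neither value comes from the chapter.

```python
import math
import random
import statistics

rng = random.Random(2024)

# Hypothetical right-skewed population: exponential with mean 10.
# For this distribution the standard deviation also equals 10.
mu, sigma = 10.0, 10.0
n = 36              # size of each sample
num_samples = 5000  # number of samples drawn

sample_means = [
    statistics.fmean([rng.expovariate(1 / mu) for _ in range(n)])
    for _ in range(num_samples)
]

print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("sd of sample means:  ", round(statistics.stdev(sample_means), 3))
print("sigma / sqrt(n):     ", round(sigma / math.sqrt(n), 3))
```

The first printed value should land near the population mean and the second near the standard error, even though the population itself is far from normal.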

Application Example

Suppose the average driving distance per year is 12,000 miles with a standard deviation of 2,580 miles. The probability that a sample mean exceeds 12,500 miles can be calculated using the CLT if the sample size is large enough.
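
The calculation might be sketched as below. The notes do not state a sample size, so `n = 50` is purely an assumption made to get a number out; the mean and standard deviation are taken from the example.

```python
import math
from statistics import NormalDist

mu = 12_000      # population mean (miles per year), from the example
sigma = 2_580    # population standard deviation, from the example
n = 50           # ASSUMPTION: the notes do not give a sample size

standard_error = sigma / math.sqrt(n)

# By the CLT the sample mean is approximately Normal(mu, standard_error).
sampling_dist = NormalDist(mu, standard_error)
prob_exceeds = 1 - sampling_dist.cdf(12_500)

print(f"standard error = {standard_error:.1f}")
print(f"P(sample mean > 12,500) = {prob_exceeds:.4f}")
```

A different sample size changes the standard error and therefore the probability, so the printed figure applies only to the assumed `n`.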

Testing Claims Using CLT

Use the sampling distribution to assess how unusual a sample result is, given a claim about the population.
If the sample result is very unlikely under the claim, the claim may be rejected.

Example: Testing whether the mean age of health club members is 37 years, given a sample mean of 36.1 years and a population standard deviation of 5 years.

Steps:

Find the standard error: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
Calculate the z-score: $z = (\bar{x} - \mu) / \sigma_{\bar{x}}$
Determine the probability using the standard normal distribution.
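
The three steps can be walked through for the health club example as follows. The claimed mean, sample mean, and standard deviation come from the example; the sample size is not given in these notes, so `n = 100` is an assumption.

```python
import math
from statistics import NormalDist

claimed_mean = 37.0   # claimed population mean age, from the example
sample_mean = 36.1    # observed sample mean, from the example
sigma = 5.0           # population standard deviation, from the example
n = 100               # ASSUMPTION: sample size is not stated in the notes

# Step 1: standard error of the mean.
standard_error = sigma / math.sqrt(n)

# Step 2: z-score of the observed sample mean under the claim.
z = (sample_mean - claimed_mean) / standard_error

# Step 3: probability of a sample mean this low or lower if the claim holds.
prob_this_low = NormalDist().cdf(z)

print(f"SE = {standard_error:.2f}, z = {z:.2f}, P = {prob_this_low:.4f}")
```

Whether that probability counts as "very unlikely" depends on the cutoff chosen in advance; the notes do not specify one.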

Effect of Sample Size

As sample size increases, the standard error decreases, so sample means tend to fall closer to the population mean.
The interval that is likely to contain the sample mean narrows around the population mean, increasing precision.
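
A small table makes the effect visible. The sketch reuses the mean of 37 and standard deviation of 5 from the health club example; the three sample sizes and the choice of a central 95% interval are assumptions for illustration.

```python
import math
from statistics import NormalDist

mu, sigma = 37.0, 5.0             # reused from the health club example
z = NormalDist().inv_cdf(0.975)   # about 1.96, for a central 95% interval

standard_errors = {}
for n in (25, 100, 400):          # hypothetical sample sizes
    se = sigma / math.sqrt(n)
    standard_errors[n] = se
    low, high = mu - z * se, mu + z * se
    print(f"n={n:>3}  SE={se:.2f}  95% of sample means in [{low:.2f}, {high:.2f}]")
```

Quadrupling the sample size halves the standard error, because the sample size enters through its square root.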

Sampling Distribution with a Finite Population

Finite Population Correction

When the sample size is more than 5% of the population and sampling is without replacement, adjust the standard error.

Formula: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

Where $N$ is the population size and $n$ is the sample size.

Example

A company has 100 customers whose ratings have a mean of 7.2 and a standard deviation of 0.7. A sample of 40 customers has a mean rating of 7.5. Because the sample is 40% of the population, the finite population correction is applied when finding the probability of a sample mean of 7.5 or higher.
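
One reading of this example is to ask how likely a sample mean of 7.5 or more would be if the population mean really is 7.2. The sketch below follows that reading using the numbers given; the interpretation itself is an assumption.

```python
import math
from statistics import NormalDist

N = 100            # population size (customers)
n = 40             # sample size
mu = 7.2           # population mean rating
sigma = 0.7        # population standard deviation
sample_mean = 7.5  # observed sample mean

# n / N = 40%, well above the 5% threshold, so the correction applies.
fpc = math.sqrt((N - n) / (N - 1))
standard_error = sigma / math.sqrt(n) * fpc

z = (sample_mean - mu) / standard_error
prob_at_least = 1 - NormalDist().cdf(z)

print(f"FPC = {fpc:.4f}, SE = {standard_error:.4f}, z = {z:.2f}")
print(f"P(sample mean >= 7.5) = {prob_at_least:.5f}")
```

Without the correction the standard error would be larger, which would understate how unusual the sample result is.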

Sampling Distribution of the Proportion

Binomial Distribution and Proportion

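Used when analyzing the number of successes in $n$ trials, where each trial succeeds with probability $p$.
Check that $np \ge 5$ and $n(1 - p) \ge 5$ before using the normal approximation (some texts use 10 as the threshold).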

Formulas

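Sample proportion: $\hat{p} = x / n$, where $x$ is the number of successes in the sample
Standard error of the proportion: $\sigma_{\hat{p}} = \sqrt{p(1 - p) / n}$
Z-score for the sample proportion: $z = (\hat{p} - p) / \sigma_{\hat{p}}$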

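Finite Population Correction for Proportion

When the sample is more than 5% of the population, the standard error of the proportion becomes: $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}}$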

Example

A college claims 70% of its 770 graduates found jobs related to their majors. A sample of 120 students found 97 with related jobs. The finite population correction is used to test the claim.
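
The graduate example can be worked through with the numbers given. This is a sketch of one way to assess the claim, namely the probability of a sample proportion at least this high if the true proportion is 70%; the threshold of 5 in the normal approximation check is a rule of thumb.

```python
import math
from statistics import NormalDist

N = 770      # graduates (population size)
n = 120      # students sampled
x = 97       # sampled students with a related job
p = 0.70     # the college's claimed proportion

# Normal approximation check: np and n(1 - p) should both be at least 5.
assert n * p >= 5 and n * (1 - p) >= 5

p_hat = x / n   # sample proportion

# n / N is about 15.6%, above 5%, so apply the finite population correction.
fpc = math.sqrt((N - n) / (N - 1))
standard_error = math.sqrt(p * (1 - p) / n) * fpc

z = (p_hat - p) / standard_error
prob_at_least = 1 - NormalDist().cdf(z)

print(f"p_hat = {p_hat:.4f}, SE = {standard_error:.4f}, z = {z:.2f}")
print(f"P(sample proportion >= p_hat) = {prob_at_least:.4f}")
```

A small probability here would suggest the sample is hard to square with a true proportion of exactly 70%.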

Comparison Table: Stratified vs. Cluster Sampling

Stratified Sampling	Cluster Sampling
Strata are defined by a common characteristic (e.g., class year)	Clusters are mini-subsets representing the population
Strata tend to be homogeneous	Clusters tend to be heterogeneous
Each stratum is sampled	Some clusters are sampled

Summary

Sampling allows for efficient and accurate estimation of population parameters.
Probability sampling methods provide the foundation for statistical inference.
The Central Limit Theorem justifies using the normal distribution for sample means and proportions when the sample size is large enough.
Finite population corrections are needed when sampling without replacement and the sample is more than 5% of the population.