Random Variables, Sampling Distributions, Confidence Intervals, and Hypothesis Testing: Key Concepts for Business Statistics

Study Guide - Smart Notes

Random Variables and Probability Distributions

Definition and Types of Random Variables

A random variable is a numeric value associated with the outcome of a probability experiment. Random variables are classified as either discrete or continuous:

Discrete Random Variable: Takes on a countable set of values, often whole numbers. Typically associated with counting (e.g., number of defective items).
Continuous Random Variable: Can take on any value within an interval. Typically associated with measurement (e.g., weight, time, volume).

Common notation for random variables includes X, Y, and Z.

Probability Distributions

Discrete Probability Distribution: Represented by a table or chart listing all possible values and their probabilities.
Continuous Probability Distribution: Represented by a smooth curve called a probability density function (PDF). The total area under the curve equals 1.

Key Properties and Formulas for Discrete Random Variables

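Mean (Expected Value): \mu = E(X) = \sum x \, P(x)
Variance: \sigma^2 = \sum (x - \mu)^2 \, P(x); the standard deviation is \sigma = \sqrt{\sigma^2}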
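
As a rough illustration, the mean of a discrete random variable is the probability-weighted sum of its values, and the variance is the probability-weighted sum of squared deviations from that mean. The sketch below computes both from a probability table. The function name and the defect probabilities are made up for this example.

```python
from math import sqrt

def discrete_mean_variance(dist):
    """Return (mean, variance) for a distribution given as {value: probability}."""
    total = sum(dist.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in dist.items())
    variance = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, variance

# Hypothetical table: number of defective items in a shipment
defects = {0: 0.5, 1: 0.3, 2: 0.2}
mean, variance = discrete_mean_variance(defects)
std_dev = sqrt(variance)  # standard deviation is the square root of the variance
```

For this table the mean works out to 0.7 defective items and the variance to 0.61.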

Key Discrete Random Variables

Binomial Random Variable:
- Fixed number of independent trials
- Each trial has two outcomes: success or failure
- Constant probability of success, p
- Counts the number of successes in n trials
- Calculator commands: BINOMPDF (exactly X successes), BINOMCDF (X or fewer successes)
Poisson Random Variable:
- Counts the number of events in a fixed interval of time or space
- Events occur independently
- Constant average rate, λ (lambda)
- Calculator commands: POISSONPDF (exactly X events), POISSONCDF (X or fewer events)
Hypergeometric Random Variable:
- Sampling without replacement from a finite population with known composition
- Useful for probability of a certain composition in the sample (e.g., selecting colored marbles from a jar)
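
For readers working without a classroom calculator, the three distributions above can be sketched with the Python standard library. The function names and argument order below are this example's own choices and do not match any calculator's syntax. The "pmf" functions play the role of the PDF commands (exactly X) and the "cdf" functions play the role of the CDF commands (X or fewer).

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(k or fewer successes in n trials)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def poisson_pmf(k, lam):
    """P(exactly k events) when events occur at constant average rate lam."""
    return exp(-lam) * lam**k / factorial(k)

def poisson_cdf(k, lam):
    """P(k or fewer events)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

def hypergeom_pmf(k, N, K, n):
    """P(exactly k successes) when drawing n items without replacement
    from a population of N items that contains K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Illustrative jar: 10 marbles, 4 of them red; draw 3 without replacement
p_one_red = hypergeom_pmf(1, N=10, K=4, n=3)
```

In the marble example, the probability of drawing exactly one red marble is 4 · 15 / 120 = 0.5.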

Continuous Random Variables

Result from measurement; can take any value within an interval
Probability distribution described by a probability density function (PDF)
Probability is the area under the PDF curve over an interval

Normal Random Variable

Most important continuous random variable
PDF is bell-shaped, centered at the mean (μ), with inflection points at one standard deviation (σ) from the mean
Calculator commands: NORMALCDF (probabilities), INVNORM (find value for a given percentile)
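
Outside the calculator, the same two tasks (area under the normal curve over an interval, and the value at a given percentile) can be sketched with `statistics.NormalDist` from the Python standard library. The wrapper names `normal_cdf` and `inv_norm` are made up here to echo the calculator commands; they are not calculator or library functions.

```python
from statistics import NormalDist

def normal_cdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_left, mu=0.0, sigma=1.0):
    """Value with the given area (percentile) to its left."""
    return NormalDist(mu, sigma).inv_cdf(area_left)

within_one_sd = normal_cdf(-1, 1)  # about 0.6827
z_975 = inv_norm(0.975)            # about 1.96
```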

Standard Normal Random Variable (Z)

Mean = 0, Standard deviation = 1
Z-scores represent the number of standard deviations a value lies above or below the mean: z = (x - \mu) / \sigma
Useful for comparing different normal distributions (e.g., SAT vs. ACT scores)
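
A small sketch of that kind of comparison follows. The means, standard deviations, and scores are invented for illustration and are not actual SAT or ACT statistics.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical exam parameters and scores (illustrative only)
sat_z = z_score(1250, mu=1050, sigma=200)  # 1.0 standard deviation above the mean
act_z = z_score(27, mu=21, sigma=5)        # 1.2 standard deviations above the mean

better_exam = "ACT" if act_z > sat_z else "SAT"
```

Under these assumed numbers, the ACT score is the stronger result relative to its own distribution, even though the raw scores are on different scales.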

Sampling Distributions

Definition and Importance

A sampling distribution is the probability distribution of a statistic (such as the sample mean or sample proportion) computed from a random sample. Sampling distributions help us understand the variability of sample statistics and make inferences about population parameters.

Key Properties

Sample averages (\bar{x}) and sample proportions (\hat{p}) are random variables, because their values change from sample to sample; for large samples both are modeled as continuous.
Their probability distributions are called sampling distributions.

Formulas for Sampling Distributions

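Sample Mean:
- Expected value: E(\bar{x}) = \mu
- Standard deviation: \sigma_{\bar{x}} = \sigma / \sqrt{n} (approximated by s / \sqrt{n} for large samples)
Sample Proportion:
- Expected value: E(\hat{p}) = p
- Standard deviation: \sigma_{\hat{p}} = \sqrt{p(1 - p) / n}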
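
The standard deviations of the sample mean (σ divided by the square root of n) and of the sample proportion (the square root of p(1 – p)/n) are often called standard errors. A minimal sketch, with hypothetical function names and illustrative inputs:

```python
from math import sqrt

def se_mean(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / sqrt(n)

def se_proportion(p, n):
    """Standard deviation of the sample proportion for samples of size n."""
    return sqrt(p * (1 - p) / n)

se_xbar = se_mean(12, 36)          # 12 / 6 = 2.0
se_phat = se_proportion(0.5, 100)  # sqrt(0.0025), about 0.05
```

Both shrink as n grows, which is why larger samples give more precise estimates.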

Central Limit Theorem (CLT)

As a rule of thumb, if the sample size is n ≥ 30, the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population's distribution.
If the population is normal, the sampling distribution of the sample mean is normal for any sample size.
Larger samples yield a tighter (less variable) sampling distribution.
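
One way to see the Central Limit Theorem at work is a small simulation. The sketch below assumes a strongly skewed population (exponential, with mean 1 and standard deviation 1), draws many samples of size 36, and looks at the center and spread of the resulting sample means. The population, seed, and sample counts are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

random.seed(42)     # fixed seed so the run is repeatable
n = 36              # size of each sample
num_samples = 5000  # number of repeated samples

# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

center = mean(sample_means)   # should land near the population mean, 1
spread = stdev(sample_means)  # should land near sigma / sqrt(n) = 1/6
```

Even though the population is far from normal, the sample means cluster around 1 with a spread close to 1/6, as the sampling distribution formulas predict.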

Estimator Properties

Unbiased Estimator: An estimator whose expected value equals the population parameter (e.g., sample mean for population mean).
Minimum Variance: Among all unbiased estimators, the one with the smallest variance is preferred.

Confidence Intervals for Population Mean (μ) and Proportion (p)

Definition and Interpretation

A confidence interval (CI) is a range of values, derived from sample statistics, that is likely to contain the population parameter with a specified level of confidence (e.g., 95%).

Confidence Level (CL): The long-run proportion of intervals, each built the same way from a new random sample, that contain the parameter (e.g., 95%).
Margin of Error: The half-width of the CI; reflects sampling variability.

Constructing Confidence Intervals

Large Sample CI for μ:
- Use the normal distribution (Central Limit Theorem applies).
- Substitute the sample standard deviation s for the population σ if σ is unknown.
- Calculator command: Zinterval
- Critical value: Find z for area α/2 in each tail using INVNORM.
Small Sample CI for μ:
- Use the t-distribution if the population is approximately normal.
- Calculator command: Tinterval
- Degrees of freedom (DF): n – 1
- Critical value: Find t for area α/2 in each tail using INVT.
Large Sample CI for p:
- Requires at least 15 successes and 15 failures in the sample.
- Use the normal approximation.
- Calculator command: 1prop-ZINT
Small Sample CI for p:
- Used when there are fewer than 15 successes or fewer than 15 failures.
- No standard calculator command; see detailed notes for methods.
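Determining Sample Size: To achieve a desired margin of error for a proportion, use p = 0.5 if no prior estimate is available; this choice maximizes p(1 – p) and so gives the most conservative (largest) sample size.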
Alpha (α): α = 1 – confidence level; determines critical z or t values (area α/2 in each tail).
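
The sample-size step can be sketched as follows. These notes do not state the formula, so the sketch assumes the standard large-sample result n = z² · p(1 – p) / E², where E is the desired margin of error and z is the critical value with area α/2 in the upper tail. The function name and the 3% margin are illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Smallest n whose large-sample margin of error for p is at most `margin`."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, area alpha/2 in each tail
    return ceil(z**2 * p * (1 - p) / margin**2)  # always round up

n_needed = sample_size_for_proportion(0.03)  # 95% confidence, p = 0.5
```

With p = 0.5 and a 3% margin at 95% confidence, this gives n = 1068. Supplying a prior estimate of p away from 0.5 lowers the required sample size.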

Hypothesis Testing for Population Mean (μ) and Proportion (p)

Key Concepts and Steps

Null Hypothesis (H₀): Represents the status quo; always contains an equality (e.g., H₀: μ = μ₀).
Alternative Hypothesis (Hₐ): Represents the claim to be tested; uses >, <, or ≠.
Type I Error: Rejecting H₀ when it is true; its probability is α.
Type II Error: Failing to reject H₀ when it is false; its probability is β.
Significance Level (α): Probability of a Type I error; sets the threshold for rejection.
Test Statistic: Converts the sample statistic to a z or t value for comparison.
Critical Value: The z or t value marking the start of the rejection region.
Rejection Region: Area(s) in the tails where H₀ is rejected.
P-value: Probability of observing a result at least as extreme as the sample result, assuming H₀ is true.
Decision Rule:
- If p-value < α, reject H₀.
- If p-value ≥ α, fail to reject H₀ (H₀ is plausible).
- Alternatively, compare test statistic to critical value(s).
Reporting Results:
- "There is sufficient evidence at α = xx to reject H₀ in favor of Hₐ."
- "There is insufficient evidence at α = xx to reject H₀. H₀ is plausible."
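
The p-value decision rule for a z test can be sketched in a few lines. The function names and the tail labels ("left", "right", "two") are this example's own conventions, and the sketch covers z statistics only.

```python
from statistics import NormalDist

def z_test_p_value(z, tail):
    """P-value for a z test statistic; tail is 'left', 'right', or 'two'."""
    phi = NormalDist().cdf  # standard normal cumulative probability
    if tail == "left":
        return phi(z)
    if tail == "right":
        return 1 - phi(z)
    if tail == "two":
        return 2 * (1 - phi(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

def decide(p_value, alpha):
    """Apply the decision rule: reject H0 only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

p_two_sided = z_test_p_value(2.0, "two")  # about 0.0455
decision = decide(p_two_sided, alpha=0.05)
```

The tail argument follows the direction of Hₐ: "left" for <, "right" for >, and "two" for ≠.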

Types of Tests

One-Tailed Test (Left): Hₐ: parameter < value
One-Tailed Test (Right): Hₐ: parameter > value
Two-Tailed Test: Hₐ: parameter ≠ value

Test Selection and Calculator Commands

Large Sample Test for μ: Use normal distribution; ZTEST
Small Sample Test for μ: Use t-distribution if population is normal; TTEST
Large Sample Test for p: Use the normal distribution if both n*p₀ and n*(1–p₀) are ≥ 15; 1propZtest

Summary Table: Key Random Variables and Their Properties

Random Variable	Type	Key Properties	Calculator Command
Binomial	Discrete	Fixed n, independent trials, constant p, two outcomes	BINOMPDF, BINOMCDF
Poisson	Discrete	Counts events in interval, constant rate λ, independence	POISSONPDF, POISSONCDF
Hypergeometric	Discrete	Sampling without replacement, known population composition	--
Normal	Continuous	Bell-shaped, mean μ, std dev σ	NORMALCDF, INVNORM
Standard Normal (Z)	Continuous	Mean 0, std dev 1	NORMALCDF, INVNORM

Example: Constructing a 95% Confidence Interval for the Mean

Suppose a sample of n = 36 has a mean of 50 and a standard deviation of 12.
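95% CI for μ: \bar{x} \pm z_{\alpha/2} \cdot s / \sqrt{n}
For 95% confidence, \alpha = 0.05 and z_{\alpha/2} = z_{0.025} = 1.96
Margin of error: 1.96 \cdot 12 / \sqrt{36} = 1.96 \cdot 2 = 3.92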
CI: (50 – 3.92, 50 + 3.92) = (46.08, 53.92)
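
The arithmetic in this example can be checked with a short script. It uses the large-sample interval (sample mean plus or minus z times s divided by the square root of n) with the values given above; the variable names are this sketch's own.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, s = 36, 50, 12
confidence = 0.95

# Critical z value with area alpha/2 in each tail (about 1.96 for 95%)
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * s / sqrt(n)  # about 3.92
ci = (x_bar - margin, x_bar + margin)
rounded_ci = (round(ci[0], 2), round(ci[1], 2))  # (46.08, 53.92)
```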

Example: Hypothesis Test for a Proportion

Suppose in a sample of 100, 60 are successes. Test H₀: p = 0.5 vs. Hₐ: p ≠ 0.5 at α = 0.05.
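Test statistic: z = (\hat{p} - p_0) / \sqrt{p_0 (1 - p_0) / n}
\hat{p} = 60/100 = 0.60, p_0 = 0.5, n = 100, so z = 0.10 / 0.05 = 2.00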
p-value ≈ 0.0455 (two-tailed)
Since p-value < 0.05, reject H₀.
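
The same test can be reproduced in a few lines as a check on the numbers above. The sketch computes the one-proportion z statistic under H₀ and a two-tailed p-value; the variable names are this sketch's own.

```python
from math import sqrt
from statistics import NormalDist

n, successes, p0, alpha = 100, 60, 0.5, 0.05

p_hat = successes / n                       # sample proportion, 0.60
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # about 2.00
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed, about 0.0455
reject_h0 = p_value < alpha                 # True at alpha = 0.05
```

Note that the standard deviation in the denominator uses the hypothesized value p₀, not the sample proportion, because the test statistic is computed assuming H₀ is true.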

Additional info: Some calculator commands and detailed procedures are referenced for classroom calculators (e.g., TI-84), but the underlying statistical principles apply universally.