Random Variables, Normal Probability Model, Sampling, and Confidence Intervals: Study Notes for Business Statistics


Random Variables for Counts: Bernoulli, Binomial, and Poisson Models

Bernoulli Random Variable

The Bernoulli random variable models a single trial with only two possible outcomes: success or failure. It is foundational for understanding more complex count models.

Definition: A random variable B where B = 1 if success, B = 0 if failure.
Key Properties:
- Only two outcomes: success or failure
- Fixed probability of success: p
- Trials are independent
Expected Value: $E(B) = p$
Variance: $\mathrm{Var}(B) = p(1 - p)$

Binomial Random Variable

The Binomial random variable extends the Bernoulli model to multiple independent trials, counting the number of successes.

Definition: Y = sum of n independent and identically distributed (iid) Bernoulli random variables.
Parameters: n (number of trials), p (probability of success per trial)
When to Use:
- Fixed number of trials (n)
- Each trial is success/failure
- Constant probability of success (p)
- Trials are independent
- Random variable is the count of successes
10% Condition: If sampling without replacement, treat trials as independent if sample size is less than 10% of the population.
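Probability of Exactly y Successes: $P(Y = y) = \binom{n}{y} p^{y} (1 - p)^{n - y}$
- Binomial coefficient: $\binom{n}{y} = \dfrac{n!}{y!\,(n - y)!}$
Mean: $E(Y) = np$
Variance: $\mathrm{Var}(Y) = np(1 - p)$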

Example: Basketball Free Throws

Player makes free throws with p = 0.7
Attempts n = 12 free throws
Y ~ Binomial(12, 0.7)
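Mean: $E(Y) = 12(0.7) = 8.4$
Variance: $\mathrm{Var}(Y) = 12(0.7)(0.3) = 2.52$
Probability of exactly 10 made: $P(Y = 10) = \binom{12}{10}(0.7)^{10}(0.3)^{2} \approx 0.168$
Probability of at least 10 made: $P(Y \ge 10) = P(Y = 10) + P(Y = 11) + P(Y = 12) \approx 0.253$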
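
The free-throw numbers can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name binom_pmf is made up for illustration and is not from the course materials.

```python
from math import comb

def binom_pmf(y: int, n: int, p: float) -> float:
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

n, p = 12, 0.7                      # 12 attempts, 70% success rate
mean = n * p                        # np
variance = n * p * (1 - p)          # np(1 - p)
p_exactly_10 = binom_pmf(10, n, p)
# "At least 10" means 10, 11, or 12 made shots.
p_at_least_10 = sum(binom_pmf(y, n, p) for y in range(10, n + 1))

print(f"mean = {mean:.2f}, variance = {variance:.2f}")
print(f"P(Y = 10)  = {p_exactly_10:.4f}")
print(f"P(Y >= 10) = {p_at_least_10:.4f}")
```

The two probabilities should agree with BINOM.DIST(10, 12, 0.7, FALSE) and 1 - BINOM.DIST(9, 12, 0.7, TRUE) in Excel.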

Excel Functions for Binomial

Exact k successes: BINOM.DIST(k, n, p, FALSE)
At most k successes: BINOM.DIST(k, n, p, TRUE)
At least k successes: 1 - BINOM.DIST(k-1, n, p, TRUE)

Poisson Model

The Poisson model is used for counting events in a fixed interval of time or space, with no fixed maximum count.

When to Use:
- Counting events during a fixed interval
- Possible values: 0, 1, 2, ... (no fixed max)
- Controlled by λ (average rate per interval)
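Estimating λ: $\hat{\lambda} = \dfrac{\text{total events observed}}{\text{number of intervals}}$
Probability Distribution: $P(X = x) = \dfrac{e^{-\lambda} \lambda^{x}}{x!}$ for x = 0, 1, 2, ...
Mean and Variance: $E(X) = \lambda$, $\mathrm{Var}(X) = \lambda$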

Example: Student Emails per Day

Over 20 days, professor received 100 emails
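Rate: $\lambda = 100/20 = 5$ emails/day
Mean = 5, Variance = 5
Probability of exactly 7 emails: $P(X = 7) = \dfrac{e^{-5}\, 5^{7}}{7!} \approx 0.104$
Probability of at least 7 emails: $P(X \ge 7) = 1 - P(X \le 6) \approx 0.238$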
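
The email example can be reproduced the same way. The sketch below assumes the Poisson model fits the data; poisson_pmf is an illustrative helper name, not something from the course materials.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 100 / 20                       # 100 emails over 20 days = 5 per day
p_exactly_7 = poisson_pmf(7, lam)
# There is no fixed maximum count, so "at least 7" uses the complement.
p_at_most_6 = sum(poisson_pmf(x, lam) for x in range(7))
p_at_least_7 = 1 - p_at_most_6

print(f"lambda = {lam}")
print(f"P(X = 7)  = {p_exactly_7:.4f}")
print(f"P(X >= 7) = {p_at_least_7:.4f}")
```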

Excel Functions for Poisson

Exact x events: POISSON.DIST(x, lambda, FALSE)
At most x events: POISSON.DIST(x, lambda, TRUE)
At least x events: 1 - POISSON.DIST(x-1, lambda, TRUE)

Quick Translator: Wording to Math

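Exactly k: $P(X = k)$
At most k: $P(X \le k)$
At least k: $P(X \ge k) = 1 - P(X \le k - 1)$
Between a and b (inclusive): $P(a \le X \le b) = P(X \le b) - P(X \le a - 1)$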
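
Each phrase can be written in terms of a single cumulative function, which mirrors how the Excel TRUE (cumulative) option is used. The sketch below uses a hypothetical Binomial(10, 0.5) count purely for illustration; "between" is assumed to include both endpoints.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); the empty sum gives 0 when k < 0."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(0, k + 1))

n, p = 10, 0.5                                           # hypothetical values

exactly_3 = binom_cdf(3, n, p) - binom_cdf(2, n, p)      # P(X = 3)
at_most_3 = binom_cdf(3, n, p)                           # P(X <= 3)
at_least_3 = 1 - binom_cdf(2, n, p)                      # P(X >= 3)
between_3_and_6 = binom_cdf(6, n, p) - binom_cdf(2, n, p)  # P(3 <= X <= 6)

print(exactly_3, at_most_3, at_least_3, between_3_and_6)
```

The subtraction uses k - 1 (or a - 1) because counts are whole numbers: removing everything up to 2 leaves 3 and above.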

Binomial vs Poisson: Fast Checklist

Binomial	Poisson
"Out of n trials/attempts/items"	"Per day/per hour/per mile/per page..."
"Success/failure"	Rate/average events λ
Constant p	No fixed maximum count

The Normal Probability Model

Central Limit Theorem (CLT)

The Central Limit Theorem states that the sum or average of many independent random variables with similar variance tends to follow a Normal distribution as the number of variables increases.

Explains why bell-shaped distributions are common in practice.
Example: Stock market value depends on many small contributions.

The Normal Probability Distribution

Continuous and bell-shaped
Probabilities are areas under the curve
Total area under the curve = 1

The Normal Model

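Defined by mean ($\mu$) and standard deviation ($\sigma$)
Notation: $X \sim N(\mu, \sigma^2)$ (variance as second parameter) or $X \sim N(\mu, \sigma)$ (standard deviation as second parameter)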

Standardizing: Z-Scores

Standardization converts any normal variable to the standard normal (mean 0, SD 1).

Standard normal variable: Z
Standardization formula: $z = \dfrac{x - \mu}{\sigma}$
Interpretation: How many standard deviations x is above/below the mean
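
A short sketch of standardizing and converting back, using hypothetical values (mean 100, SD 15) chosen only for illustration:

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def unstandardize(z: float, mu: float, sigma: float) -> float:
    """Convert a z-score back to the original scale."""
    return mu + z * sigma

mu, sigma = 100.0, 15.0              # hypothetical mean and SD
z_high = z_score(130.0, mu, sigma)   # 2 SDs above the mean
z_low = z_score(85.0, mu, sigma)     # 1 SD below the mean
x_back = unstandardize(z_high, mu, sigma)

print(z_high, z_low, x_back)
```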

Finding Probabilities with the Normal Model

Sketch the desired area (left tail, right tail, between values)
Convert x-values to z-scores
Use Normal tables or software to find probabilities
Use complement and symmetry as needed

Normal Tables: Reading and Conversions

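Left-tail probability: $P(Z \le z)$, read directly from the table
Right-tail: $P(Z > z) = 1 - P(Z \le z)$
Between two values: $P(a \le Z \le b) = P(Z \le b) - P(Z \le a)$
Symmetry: $P(Z \le -z) = P(Z \ge z) = 1 - P(Z \le z)$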
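
In place of a printed table, Python's standard library offers statistics.NormalDist, whose cdf method returns left-tail areas. The sketch below assumes a hypothetical Normal model with mean 100 and SD 15 and works through each type of area.

```python
from statistics import NormalDist

Z = NormalDist()                     # standard normal: mean 0, SD 1
mu, sigma = 100.0, 15.0              # hypothetical model parameters

def z(x: float) -> float:
    """Standardize x under the assumed model."""
    return (x - mu) / sigma

left_tail = Z.cdf(z(85))                   # P(X <= 85)  = P(Z <= -1)
right_tail = 1 - Z.cdf(z(130))             # P(X > 130)  = 1 - P(Z <= 2)
between = Z.cdf(z(115)) - Z.cdf(z(85))     # P(85 <= X <= 115)
symmetric = 1 - Z.cdf(1)                   # equals P(Z <= -1) by symmetry

print(f"left tail  = {left_tail:.4f}")
print(f"right tail = {right_tail:.4f}")
print(f"between    = {between:.4f}")
```

The "between" result is the familiar statement that about 68% of a Normal distribution lies within one SD of the mean.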

Percentiles

The kth percentile is the value below which k% of the distribution falls.
Workflow:
1. Convert percentile to probability (e.g., 90th percentile → 0.90)
2. Find z-score with that left-tail area
3. Convert back: $x = \mu + z\sigma$
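
The percentile workflow can be sketched with NormalDist.inv_cdf, which returns the z-score for a given left-tail area. The mean 100 and SD 15 below are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0              # hypothetical model parameters

prob = 0.90                          # step 1: 90th percentile -> 0.90
z90 = NormalDist().inv_cdf(prob)     # step 2: z with left-tail area 0.90
x90 = mu + z90 * sigma               # step 3: convert back to original units

print(f"z = {z90:.4f}, x = {x90:.2f}")
```

In Excel the same z-score comes from NORM.S.INV(0.90).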

Departures from Normality

Multimodality: More than one peak; suggests mixed groups.
Skewness: Lack of symmetry; long tail left or right.
Outliers: Extreme values not matching main pattern.

Normal Quantile Plot (QQ Plot)

Scatterplot to check normality
Points tracking a diagonal line indicate normality
Curved or patterned points suggest non-normality
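
The coordinates behind a normal quantile plot can be computed by hand: sort the data and pair each value with the z-score expected at its rank. The sketch below assumes the plotting positions (i - 0.5)/n, which is one common convention among several, and uses made-up data.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return (theoretical z, observed value) pairs for a normal quantile plot."""
    xs = sorted(data)
    n = len(xs)
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    zs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(zs, xs))

data = [12.1, 9.8, 10.4, 11.3, 10.9, 9.5, 10.0, 11.8]   # hypothetical sample
points = normal_quantile_points(data)

for z_val, x_val in points:
    print(f"{z_val:6.2f}  {x_val:6.2f}")
```

Plotting the second column against the first gives the QQ plot; points close to a straight line are consistent with normality.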

Exam Cheat Sheet: Normal Model

Measurement/continuous variable/bell curve/average/CLT → Normal/CLT
Probability between/above/below → area under curve + z-score
Percentile value → find z, then
Is Normal appropriate? → check multimodality, skewness, outliers, QQ plot

Must-Know Formulas

Standardize:
Unstandardize:
Right tail:
Between:

Samples and Surveys

Cast of Characters

Population: Entire group of interest (e.g., all undergrads)
Sample: Subset of the population (e.g., 300 students)
Survey: Asking questions to a sample
Representative: Sample reflects population mix
Bias: Systematic error in sample selection

Surprising Properties of Sampling

Random selection is best for representativeness
Large populations do not require proportionally large samples; sample size depends on desired precision

Randomization

Reduces bias
Allows inference from sample to population
Excel method: Add random number with =RAND(), sort, pick first n
Nonresponse can introduce bias; replace non-responders from randomized list
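
The Excel method (random number, sort, take the first n) translates directly to Python. This is a sketch with a hypothetical roster of 200 students; the function name and seed are illustrative choices.

```python
import random

def srs_by_random_sort(frame, n, seed=None):
    """Attach a random number to each item, sort by it, keep the first n."""
    rng = random.Random(seed)
    keyed = [(rng.random(), item) for item in frame]   # plays the role of =RAND()
    keyed.sort(key=lambda pair: pair[0])               # sort by the random key only
    return [item for _, item in keyed[:n]]

roster = [f"student_{i:03d}" for i in range(1, 201)]   # hypothetical sampling frame
sample = srs_by_random_sort(roster, 10, seed=42)

print(sample)
```

Keeping the sorted list also provides the replacement order for non-responders: the next names after the first n are already randomized.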

Sampling Frame

List from which sample is drawn
Good frame: complete roster
Risky frame: incomplete or biased lists

Simple Random Sample (SRS) and Systematic Sampling

SRS: Every possible sample of size n has equal chance
Systematic Sampling: Pick every k-th item (random start)
Hidden patterns in list can bias systematic samples
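
Systematic sampling with a random start can be sketched as a slice of the list. The frame of 100 numbered items below is hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick every k-th item, beginning at a random position among the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)         # random start index in 0..k-1
    return frame[start::k]

frame = list(range(1, 101))          # hypothetical list of 100 items
sample = systematic_sample(frame, 10, seed=1)

print(sample)
```

If the list has a cycle of length k (for example, every 10th record is a manager), every draw lands on the same kind of item, which is the hidden-pattern risk noted above.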

Sampling Variation

Different random samples yield different results
Parameters: describe population (fixed, unknown)
Statistics: describe sample (computed, varies)

Alternative Sampling Methods

Method	Description	Risks/Notes
Stratified	Split into groups, random sample within each	Improves representation
Cluster	Divide into clusters, randomly pick clusters, survey all inside	Efficient, but clusters must be representative
Census	Survey entire population	Often impractical
Voluntary Response	People opt in	Biased toward strong opinions
Convenience	Sample easy-to-reach	Usually not representative

Survey Checklist

What was the sampling frame?
Was it an SRS?
Nonresponse rate?
Question wording?
Interviewer effects?
Survivor bias?

Quick Guide: Spotting Sampling Methods

"Everyone had equal chance" → SRS
"Random + =RAND() + take first n" → SRS
"Every 10th / 25th" → Systematic
"Split into groups, random within each" → Stratified
"Randomly pick groups, survey all inside" → Cluster
"Poll on Instagram / call-in vote" → Voluntary response
"Asked people outside library" → Convenience

Sampling Variation and Quality

Sampling Distribution of the Mean

The sampling distribution describes how a statistic (like the sample mean) varies from sample to sample.

Used to monitor processes (e.g., GPS chip testing)
Control charts help detect process changes

Type I and Type II Errors

Type I error: False alarm; stopping a process that's actually fine
Type II error: Miss; failing to detect a broken process

Benefits of Averaging

Sample means vary less than individual values
Distribution of sample means is more bell-shaped

Normal Models for Sample Means

Sample means are normally distributed if the original data are normal, and approximately normal if the sample size is large (CLT)
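
A small simulation illustrates both points. It draws from a skewed exponential distribution (mean 1, SD 1), an arbitrary choice for illustration, and compares the spread of individual values with the spread of means of n = 25 values. The standard result is that the SD of the sample mean is the population SD divided by the square root of n.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)               # fixed seed so the run is repeatable
n = 25                               # observations per sample
num_samples = 4000

# Individual values from a skewed population with SD 1.
individuals = [rng.expovariate(1.0) for _ in range(num_samples)]

# Means of many samples of size n from the same population.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

sd_individuals = pstdev(individuals)
sd_means = pstdev(sample_means)

print(f"SD of individual values: {sd_individuals:.3f}")
print(f"SD of sample means:      {sd_means:.3f}  (theory: 1/sqrt(25) = 0.2)")
```

A histogram of sample_means would also look far more symmetric than a histogram of individuals.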

Control Limits and Control Charts

Control limits define "safe zone" for process variation
Set by choosing Type I error rate and process parameters
Wide limits: fewer false alarms, more misses
Narrow limits: more false alarms, fewer misses
Convention: focus on Type I error (e.g., 5% or 1%)
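
One way to compute limits for a chart of sample means is mean ± z × σ/√n, with z chosen from the Type I error rate. The sketch below assumes the in-control process is Normal with known parameters; the numbers (mean 50, SD 4, samples of 16) are hypothetical.

```python
from statistics import NormalDist

def control_limits(mu: float, sigma: float, n: int, alpha: float):
    """Two-sided limits for the mean of n items, with Type I error rate alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. about 1.96 for alpha = 0.05
    se = sigma / n ** 0.5                     # SD of the sample mean
    return mu - z * se, mu + z * se

mu, sigma, n = 50.0, 4.0, 16                  # hypothetical process parameters

lo95, hi95 = control_limits(mu, sigma, n, alpha=0.05)
lo99, hi99 = control_limits(mu, sigma, n, alpha=0.01)

print(f"95% limits: {lo95:.2f} to {hi95:.2f}")
print(f"99% limits: {lo99:.2f} to {hi99:.2f}")
```

The 99% limits are wider, which is the trade-off described above: fewer false alarms, more misses.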

X-bar Chart

Tracks sample mean over time
99% limits: wider band, so fewer points are flagged and the process more often appears in control
95% limits: narrower band, so more false alarms

Repeated Testing Problem

Repeated checks increase chance of false alarms
Type I error accumulates over multiple tests

3-Sigma Limits

Type I error ≈ 0.0027 for a single point
Probability a normal variable falls more than 3 SDs from mean
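
The 0.0027 figure and the effect of repeated checking can both be verified numerically. The sketch below assumes the checks are independent and the process is in control throughout; the choice of 30 checks is arbitrary.

```python
from statistics import NormalDist

# Chance that a single in-control point falls more than 3 SDs from the mean.
alpha_single = 2 * (1 - NormalDist().cdf(3))

def prob_any_false_alarm(alpha: float, k: int) -> float:
    """Chance of at least one false alarm in k independent checks."""
    return 1 - (1 - alpha) ** k

any_alarm_3sigma = prob_any_false_alarm(alpha_single, 30)
any_alarm_95 = prob_any_false_alarm(0.05, 30)

print(f"single-point Type I error (3-sigma): {alpha_single:.4f}")
print(f"30 checks, 3-sigma limits: {any_alarm_3sigma:.3f}")
print(f"30 checks, 95% limits:     {any_alarm_95:.3f}")
```

With 95% limits a false alarm somewhere in 30 checks is more likely than not, which is why tight per-point error rates such as 3-sigma are common for charts that are checked repeatedly.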

Recognizing a Problem

Point outside control limit could be false alarm or real issue
Management must confirm actual cause

Control Charts for Variation

S-chart: Tracks sample standard deviation
R-chart: Tracks sample range

Confidence Intervals

Big Picture

True population parameters (p, μ) are usually unknown
Sample statistics (p̂, x̄) are used to estimate parameters
Confidence interval (CI): Range of plausible values for parameter
CIs indicate precision (tight vs wide)

Confidence Interval for a Proportion

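Estimate: $\hat{p} = \dfrac{\text{number of successes}}{n}$
Standard error: $se(\hat{p}) = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
95% CI: $\hat{p} \pm 1.96\, se(\hat{p})$
General template: estimate ± (critical value)(standard error)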
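
A sketch of the proportion interval in Python, using the z-based formula. The counts (120 successes out of 300) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """z-interval for a proportion: p_hat +/- z * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    moe = z * se
    return p_hat - moe, p_hat + moe

low, high = proportion_ci(120, 300)   # hypothetical survey result

print(f"95% CI for p: {low:.4f} to {high:.4f}")
```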

Confidence Interval for the Mean

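Estimate: $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
Sample standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Standard error: $se(\bar{x}) = \dfrac{s}{\sqrt{n}}$
Use t-distribution when population SD is unknown
95% CI: $\bar{x} \pm t_{0.025,\, n-1}\, \dfrac{s}{\sqrt{n}}$
Degrees of freedom: $n - 1$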
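
A sketch of the t-interval for a mean. Python's standard library has no t-distribution, so the critical value is passed in by hand: 2.262 is the 95% value for 9 degrees of freedom, as given by T.INV.2T(0.05, 9) or a t-table. The ten data values are made up.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci(data, t_star: float):
    """t-interval for a mean: x_bar +/- t_star * s / sqrt(n)."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                  # sample SD, divides by n - 1
    se = s / sqrt(n)
    return x_bar - t_star * se, x_bar + t_star * se

data = [48, 52, 50, 47, 53, 51, 49, 50, 46, 54]   # hypothetical sample, n = 10
T_STAR_DF9 = 2.262                                # 95% critical value, df = 9

low, high = mean_ci(data, T_STAR_DF9)

print(f"95% CI for the mean: {low:.3f} to {high:.3f}")
```

For a different sample size the critical value must be looked up again with df = n - 1.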

Interpreting Confidence Intervals

Correct: "We are 95% confident the true parameter is in this interval."
Incorrect: "95% chance the parameter is in the interval" (parameter is fixed)
Incorrect: "95% of all customers have balances between..." (applies to mean, not individuals)
Incorrect: "Mean of 95% of samples will fall between..." (applies to sampling distribution)

Manipulating Confidence Intervals

Do not combine the endpoints of separate CIs with ad hoc arithmetic; define a new variable and build its interval directly
Example: Define profit per customer, then build CI for profit

Margin of Error (MOE)

MOE is the "±" part of CI
Smaller MOE = more precise interval
MOE affected by:
1. Confidence level (higher → wider interval)
2. Variation in data (higher → wider interval)
3. Number of observations (higher n → narrower interval)
To tighten CI: increase n, reduce variability, or accept lower confidence
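
The three effects can be seen by varying one input at a time in MOE = (critical value)(SD)/√n. The sketch below uses the standard z critical values 1.96 (95%) and 2.576 (99%) and a hypothetical SD of 10.

```python
from math import sqrt

def margin_of_error(critical_value: float, sd: float, n: int) -> float:
    """MOE = critical value * SD / sqrt(n)."""
    return critical_value * sd / sqrt(n)

moe_base = margin_of_error(1.96, sd=10, n=100)        # baseline
moe_more_data = margin_of_error(1.96, sd=10, n=400)   # 4x the data halves the MOE
moe_more_conf = margin_of_error(2.576, sd=10, n=100)  # 99% confidence widens it
moe_more_var = margin_of_error(1.96, sd=20, n=100)    # doubling the SD doubles it

print(moe_base, moe_more_data, moe_more_conf, moe_more_var)
```

Because n sits under a square root, cutting the MOE in half takes four times as many observations.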

How to Do Any CI Problem: Checklist

Data should be SRS from population
Check conditions for procedure
Four-step workflow:
1. Identify parameter (p or μ)
2. Compute estimate (p̂ or x̄)
3. Compute standard error (SE)
4. CI = estimate ± (critical value)(SE)

Excel Cheat Sheet

CI for Proportion (95%)	CI for Mean (95%, t-interval)
p̂ in B1, n in B2	x̄ in B1, s in B2, n in B3
Standard error: =SQRT(B1*(1-B1)/B2)	Standard error: =B2/SQRT(B3)
z for 95%: =NORM.S.INV(0.975)	t* for 95%: =T.INV.2T(0.05, B3-1)
Lower endpoint: =B1 - NORM.S.INV(0.975)*SQRT(B1*(1-B1)/B2)	Lower endpoint: =B1 - T.INV.2T(0.05, B3-1)*(B2/SQRT(B3))
Upper endpoint: =B1 + NORM.S.INV(0.975)*SQRT(B1*(1-B1)/B2)	Upper endpoint: =B1 + T.INV.2T(0.05, B3-1)*(B2/SQRT(B3))

End-of-Chapter Practical Tips

Ensure data is SRS
Check conditions before calculation
Use 95% confidence intervals unless otherwise specified
Round endpoints when presenting
Keep full precision in intermediate steps