Random Variables, Normal Probability Model, Sampling, and Confidence Intervals: Study Notes for Business Statistics
Random Variables for Counts: Bernoulli, Binomial, and Poisson Models
Bernoulli Random Variable
The Bernoulli random variable models a single trial with only two possible outcomes: success or failure. It is foundational for understanding more complex count models.
Definition: A random variable B where B = 1 if success, B = 0 if failure.
Key Properties:
Only two outcomes: success or failure
Fixed probability of success: p
Trials are independent
Expected Value: $E(B) = p$
Variance: $\operatorname{Var}(B) = p(1 - p)$
Binomial Random Variable
The Binomial random variable extends the Bernoulli model to multiple independent trials, counting the number of successes.
Definition: Y = sum of n independent and identically distributed (iid) Bernoulli random variables.
Parameters: n (number of trials), p (probability of success per trial)
When to Use:
Fixed number of trials (n)
Each trial is success/failure
Constant probability of success (p)
Trials are independent
Random variable is the count of successes
10% Condition: If sampling without replacement, treat trials as independent if sample size is less than 10% of the population.
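Probability of Exactly y Successes: $P(Y = y) = \binom{n}{y} p^{y} (1 - p)^{n - y}$ for $y = 0, 1, \dots, n$
Binomial coefficient: $\binom{n}{y} = \dfrac{n!}{y!\,(n - y)!}$
Mean: $E(Y) = np$
Variance: $\operatorname{Var}(Y) = np(1 - p)$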
Example: Basketball Free Throws
Player makes free throws with p = 0.7
Attempts n = 12 free throws
Y ~ Binomial(12, 0.7)
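Mean: $E(Y) = np = 12 \times 0.7 = 8.4$
Variance: $\operatorname{Var}(Y) = np(1 - p) = 12 \times 0.7 \times 0.3 = 2.52$
Probability of exactly 10 made: $P(Y = 10) = \binom{12}{10}(0.7)^{10}(0.3)^{2} \approx 0.1678$
Probability of at least 10 made: $P(Y \ge 10) = P(Y = 10) + P(Y = 11) + P(Y = 12) \approx 0.2528$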
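The free-throw example can be checked numerically. The sketch below uses only Python's standard library; the helper names `binom_pmf` and `binom_cdf` are illustrative and not part of the notes.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(Y <= k): add up the pmf from 0 through k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 12, 0.7                              # free-throw example
mean = n * p                                # 8.4
variance = n * p * (1 - p)                  # 2.52
p_exactly_10 = binom_pmf(10, n, p)          # about 0.1678
p_at_least_10 = 1 - binom_cdf(9, n, p)      # about 0.2528
```

The "at least 10" line uses the complement, the same idea as `1 - BINOM.DIST(k-1, n, p, TRUE)` in Excel.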
Excel Functions for Binomial
Exact k successes: BINOM.DIST(k, n, p, FALSE)
At most k successes: BINOM.DIST(k, n, p, TRUE)
At least k successes: 1 - BINOM.DIST(k-1, n, p, TRUE)
Poisson Model
The Poisson model is used for counting events in a fixed interval of time or space, with no fixed maximum count.
When to Use:
Counting events during a fixed interval
Possible values: 0, 1, 2, ... (no fixed max)
Controlled by λ (average rate per interval)
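Estimating λ: $\hat{\lambda} = \dfrac{\text{total number of events}}{\text{number of intervals observed}}$
Probability Distribution: $P(X = x) = \dfrac{e^{-\lambda} \lambda^{x}}{x!}$ for $x = 0, 1, 2, \dots$
Mean and Variance: $E(X) = \lambda$, $\operatorname{Var}(X) = \lambda$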
Example: Student Emails per Day
Over 20 days, professor received 100 emails
Rate: $\hat{\lambda} = 100 / 20 = 5$ emails/day
Mean = 5, Variance = 5
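Probability of exactly 7 emails: $P(X = 7) = \dfrac{e^{-5}\, 5^{7}}{7!} \approx 0.1044$
Probability of at least 7 emails: $P(X \ge 7) = 1 - P(X \le 6) \approx 0.2378$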
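As a rough check of the email example, the Poisson probabilities can be computed directly from the formula. This is a minimal standard-library sketch; `poisson_pmf` and `poisson_cdf` are illustrative names.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x): add up the pmf from 0 through x."""
    return sum(poisson_pmf(i, lam) for i in range(x + 1))

lam = 100 / 20                              # 5 emails per day
p_exactly_7 = poisson_pmf(7, lam)           # about 0.1044
p_at_least_7 = 1 - poisson_cdf(6, lam)      # about 0.2378
```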
Excel Functions for Poisson
Exact x events: POISSON.DIST(x, lambda, FALSE)
At most x events: POISSON.DIST(x, lambda, TRUE)
At least x events: 1 - POISSON.DIST(x-1, lambda, TRUE)
Quick Translator: Wording to Math
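Exactly k: $P(X = k)$
At most k: $P(X \le k)$
At least k: $P(X \ge k) = 1 - P(X \le k - 1)$
Between a and b (inclusive): $P(a \le X \le b) = P(X \le b) - P(X \le a - 1)$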
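For count variables, each phrase above reduces to a difference of cumulative probabilities. The sketch below treats "between" as inclusive of both endpoints (an assumption; check the wording of the question) and reuses the Binomial(12, 0.7) free-throw model. All function names are illustrative.

```python
from math import comb
from typing import Callable

Cdf = Callable[[int], float]  # maps k to P(X <= k)

def exactly(cdf: Cdf, k: int) -> float:
    return cdf(k) - cdf(k - 1)

def at_least(cdf: Cdf, k: int) -> float:
    return 1 - cdf(k - 1)

def between(cdf: Cdf, a: int, b: int) -> float:
    """P(a <= X <= b), both endpoints included."""
    return cdf(b) - cdf(a - 1)

def free_throw_cdf(k: int) -> float:
    """P(Y <= k) for Y ~ Binomial(12, 0.7); returns 0 when k < 0."""
    n, p = 12, 0.7
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(0, min(k, n) + 1))

p_8_to_10 = between(free_throw_cdf, 8, 10)
```

Subtracting `cdf(a - 1)` rather than `cdf(a)` keeps the value `a` itself in the interval, which matters for discrete counts.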
Binomial vs Poisson: Fast Checklist
Binomial | Poisson |
|---|---|
"Out of n trials/attempts/items" | "Per day/per hour/per mile/per page..." |
"Success/failure" | Rate/average events λ |
Constant p | No fixed maximum count |
The Normal Probability Model
Central Limit Theorem (CLT)
The Central Limit Theorem states that the sum or average of many independent random variables with similar variance tends to follow a Normal distribution as the number of variables increases.
Explains why bell-shaped distributions are common in practice.
Example: Stock market value depends on many small contributions.
The Normal Probability Distribution
Continuous and bell-shaped
Probabilities are areas under the curve
Total area under the curve = 1
The Normal Model
Defined by mean ($\mu$) and standard deviation ($\sigma$)
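Notation: $X \sim N(\mu, \sigma^2)$ or $X \sim N(\mu, \sigma)$ (texts differ on whether the second parameter is the variance or the standard deviation; follow your course's convention)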
Standardizing: Z-Scores
Standardization converts any normal variable to the standard normal (mean 0, SD 1).
Standard normal variable: Z
Standardization formula: $z = \dfrac{x - \mu}{\sigma}$
Interpretation: How many standard deviations x is above/below the mean
Finding Probabilities with the Normal Model
Sketch the desired area (left tail, right tail, between values)
Convert x-values to z-scores
Use Normal tables or software to find probabilities
Use complement and symmetry as needed
Normal Tables: Reading and Conversions
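Left-tail probability: $P(Z \le z)$, read directly from the table
Right-tail: $P(Z > z) = 1 - P(Z \le z)$
Between two values: $P(a \le Z \le b) = P(Z \le b) - P(Z \le a)$
Symmetry: $P(Z \le -z) = P(Z \ge z) = 1 - P(Z \le z)$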
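Software can stand in for the Normal table. The sketch below uses `statistics.NormalDist` from Python's standard library (Python 3.8+); the values $\mu = 100$, $\sigma = 15$, and $x = 115$ are hypothetical and chosen only so that $z = 1$.

```python
from statistics import NormalDist

mu, sigma = 100, 15           # hypothetical Normal model
x = 115
z = (x - mu) / sigma          # standardize: z = 1.0

Z = NormalDist()              # standard normal: mean 0, SD 1
left_tail = Z.cdf(z)                  # P(Z <= 1), about 0.8413
right_tail = 1 - Z.cdf(z)             # P(Z > 1), about 0.1587
middle = Z.cdf(z) - Z.cdf(-z)         # P(-1 <= Z <= 1), about 0.6827
mirror = Z.cdf(-z)                    # equals right_tail by symmetry
```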
Percentiles
The kth percentile is the value below which k% of the distribution falls.
Workflow:
Convert percentile to probability (e.g., 90th percentile → 0.90)
Find z-score with that left-tail area
Convert back: $x = \mu + z\sigma$
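The percentile workflow can be sketched with the inverse of the Normal cumulative distribution. The model parameters ($\mu = 100$, $\sigma = 15$) are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 100, 15                  # hypothetical Normal model
prob = 0.90                          # 90th percentile -> left-tail area 0.90

z90 = NormalDist().inv_cdf(prob)     # z-score with 90% of the area to its left, about 1.2816
x90 = mu + z90 * sigma               # unstandardize, about 119.2
```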
Departures from Normality
Multimodality: More than one peak; suggests mixed groups.
Skewness: Lack of symmetry; long tail left or right.
Outliers: Extreme values not matching main pattern.
Normal Quantile Plot (QQ Plot)
Scatterplot to check normality
Points tracking a diagonal line indicate normality
Curved or patterned points suggest non-normality
Exam Cheat Sheet: Normal Model
Measurement/continuous variable/bell curve/average/CLT → Normal/CLT
Probability between/above/below → area under curve + z-score
Percentile value → find z, then $x = \mu + z\sigma$
Is Normal appropriate? → check multimodality, skewness, outliers, QQ plot
Must-Know Formulas
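Standardize: $z = \dfrac{x - \mu}{\sigma}$
Unstandardize: $x = \mu + z\sigma$
Right tail: $P(Z > z) = 1 - P(Z \le z)$
Between: $P(a \le Z \le b) = P(Z \le b) - P(Z \le a)$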
Samples and Surveys
Cast of Characters
Population: Entire group of interest (e.g., all undergrads)
Sample: Subset of the population (e.g., 300 students)
Survey: Asking questions to a sample
Representative: Sample reflects population mix
Bias: Systematic error in sample selection
Surprising Properties of Sampling
Random selection is best for representativeness
Large populations do not require proportionally large samples; sample size depends on desired precision
Randomization
Reduces bias
Allows inference from sample to population
Excel method: Add random number with =RAND(), sort, pick first n
Nonresponse can introduce bias; replace non-responders from randomized list
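The Excel recipe (attach a random number to each row, sort, keep the first n) translates directly to code. In this sketch the roster and the function name `srs_by_random_sort` are hypothetical; the seed is fixed only to make the example repeatable.

```python
import random

def srs_by_random_sort(frame, n, seed=None):
    """Mimic the =RAND() method: random key per item, sort by key, take the first n."""
    rng = random.Random(seed)
    keyed = [(rng.random(), item) for item in frame]
    keyed.sort(key=lambda pair: pair[0])      # sort on the random key only
    return [item for _, item in keyed[:n]]

# Hypothetical sampling frame: a complete roster of 500 students.
roster = [f"student_{i:03d}" for i in range(1, 501)]
sample = srs_by_random_sort(roster, 30, seed=1)
```

The items after position n in the sorted list remain in random order, so they can serve as the replacement list for non-responders.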
Sampling Frame
List from which sample is drawn
Good frame: complete roster
Risky frame: incomplete or biased lists
Simple Random Sample (SRS) and Systematic Sampling
SRS: Every possible sample of size n has an equal chance of being selected
Systematic Sampling: Pick every k-th item (random start)
Hidden patterns in list can bias systematic samples
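A minimal sketch of systematic sampling, assuming the frame is an ordered list and the start is chosen at random among the first k positions. The function name and the 100-item frame are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start in positions 0..k-1, then take every k-th item."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return frame[start::k]

frame = list(range(100))                      # hypothetical ordered list of 100 items
every_tenth = systematic_sample(frame, 10, seed=0)
gaps = [b - a for a, b in zip(every_tenth, every_tenth[1:])]
```

If the list repeats with a period that matches k (for example, every 10th item comes from the same machine), every selected item shares that trait, which is the hidden-pattern risk noted above.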
Sampling Variation
Different random samples yield different results
Parameters: describe population (fixed, unknown)
Statistics: describe sample (computed, varies)
Alternative Sampling Methods
Method | Description | Risks/Notes |
|---|---|---|
Stratified | Split into groups, random sample within each | Improves representation |
Cluster | Divide into clusters, randomly pick clusters, survey all inside | Efficient, but clusters must be representative |
Census | Survey entire population | Often impractical |
Voluntary Response | People opt in | Biased toward strong opinions |
Convenience | Sample easy-to-reach | Usually not representative |
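The difference between stratified and cluster sampling is easiest to see side by side. This is a sketch under simple assumptions (equal sample size per stratum, every member of a chosen cluster surveyed); the student and dorm data and both function names are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units by stratum, then draw a random sample inside every stratum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for unit in units:
        groups[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(groups):
        sample.extend(rng.sample(groups[name], n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep everyone inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical population: 50 students in each of four class years.
students = [(year, i) for year in ("first", "second", "third", "fourth")
            for i in range(50)]
strat = stratified_sample(students, stratum_of=lambda s: s[0],
                          n_per_stratum=5, seed=1)

# Hypothetical clusters: six dorms with 20 residents each.
dorms = {f"dorm_{d}": [f"dorm_{d}_resident_{i}" for i in range(20)]
         for d in "ABCDEF"}
clus = cluster_sample(dorms, n_clusters=2, seed=1)
```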
Survey Checklist
What was the sampling frame?
Was it an SRS?
Nonresponse rate?
Question wording?
Interviewer effects?
Survivor bias?
Quick Guide: Spotting Sampling Methods
"Everyone had equal chance" → SRS
"Random + =RAND() + take first n" → SRS
"Every 10th / 25th" → Systematic
"Split into groups, random within each" → Stratified
"Randomly pick groups, survey all inside" → Cluster
"Poll on Instagram / call-in vote" → Voluntary response
"Asked people outside library" → Convenience
Sampling Variation and Quality
Sampling Distribution of the Mean
The sampling distribution describes how a statistic (like the sample mean) varies from sample to sample.
Used to monitor processes (e.g., GPS chip testing)
Control charts help detect process changes
Type I and Type II Errors
Type I error: False alarm; stopping a process that's actually fine
Type II error: Miss; failing to detect a broken process
Benefits of Averaging
Sample means vary less than individual values
Distribution of sample means is more bell-shaped
Normal Models for Sample Means
Sample means are normally distributed if the original data are normal, and approximately normal if the sample size is large (CLT)
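A small simulation illustrates the benefit of averaging. It assumes a skewed population (exponential with mean 1 and SD 1, chosen only for illustration) and samples of size n = 25, so the sample means should have an SD near $\sigma/\sqrt{n} = 0.2$.

```python
import random
from math import sqrt
from statistics import fmean, stdev

rng = random.Random(42)      # fixed seed so the run is repeatable
n = 25                       # observations per sample
reps = 4000                  # number of simulated samples

# Individual values from a skewed population (mean 1, SD 1).
individuals = [rng.expovariate(1.0) for _ in range(reps)]

# One sample mean per simulated sample of size n.
sample_means = [fmean([rng.expovariate(1.0) for _ in range(n)])
                for _ in range(reps)]

sd_individuals = stdev(individuals)       # close to 1
sd_means = stdev(sample_means)            # close to 1 / sqrt(25) = 0.2
expected_sd_means = 1 / sqrt(n)
```

A histogram of `sample_means` would also look far more symmetric than one of `individuals`, which is the CLT at work.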
Control Limits and Control Charts
Control limits define "safe zone" for process variation
Set by choosing Type I error rate and process parameters
Wide limits: fewer false alarms, more misses
Narrow limits: more false alarms, fewer misses
Convention: focus on Type I error (e.g., 5% or 1%)
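Control limits for a sample mean can be sketched as $\mu \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, assuming the sample means follow a Normal model. The process values ($\mu = 10$, $\sigma = 2$, $n = 16$) and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def control_limits(mu, sigma, n, alpha):
    """Limits for the sample mean with two-sided Type I error rate alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return mu - half_width, mu + half_width

mu, sigma, n = 10.0, 2.0, 16                        # hypothetical in-control process
lo95, hi95 = control_limits(mu, sigma, n, 0.05)     # about (9.02, 10.98)
lo99, hi99 = control_limits(mu, sigma, n, 0.01)     # about (8.71, 11.29)
```

Lowering the Type I error rate from 5% to 1% widens the limits, which is the false alarm versus miss trade-off described above.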
X-bar Chart
Tracks sample mean over time
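99% limits: wider band (about 1% false alarms per sample); a process whose means stay inside appears in control
95% limits: narrower band (about 5% false alarms per sample), so more false alarms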
Repeated Testing Problem
Repeated checks increase chance of false alarms
Type I error accumulates over multiple tests
3-Sigma Limits
Type I error ≈ 0.0027 for a single point
Probability a normal variable falls more than 3 SDs from mean
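Both the 3-sigma error rate and the repeated-testing problem can be checked with a short calculation. The sketch assumes a Normal model for each plotted point and independence between points; the function names are illustrative.

```python
from statistics import NormalDist

def false_alarm_rate(z_limit: float) -> float:
    """P(|Z| > z_limit): chance one in-control point lands outside the limits."""
    return 2 * (1 - NormalDist().cdf(z_limit))

def prob_any_false_alarm(alpha: float, k: int) -> float:
    """Chance of at least one false alarm in k independent checks."""
    return 1 - (1 - alpha) ** k

alpha_3sigma = false_alarm_rate(3.0)                    # about 0.0027
p_any_100 = prob_any_false_alarm(alpha_3sigma, 100)     # about 0.237
p_any_20_at_5pct = prob_any_false_alarm(0.05, 20)       # about 0.64
```

Even with 3-sigma limits, roughly one chart in four would show a false alarm somewhere in 100 in-control points.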
Recognizing a Problem
Point outside control limit could be false alarm or real issue
Management must confirm actual cause
Control Charts for Variation
S-chart: Tracks sample standard deviation
R-chart: Tracks sample range
Confidence Intervals
Big Picture
True population parameters (p, μ) are usually unknown
Sample statistics (p̂, x̄) are used to estimate parameters
Confidence interval (CI): Range of plausible values for parameter
CIs indicate precision (tight vs wide)
Confidence Interval for a Proportion
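Estimate: $\hat{p} = x / n$, where $x$ is the number of successes in the sample
Standard error: $\operatorname{se}(\hat{p}) = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
95% CI: $\hat{p} \pm 1.96\,\operatorname{se}(\hat{p})$
General template: estimate ± (critical value) × (standard error)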
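A minimal sketch of the z-interval for a proportion. The survey counts (120 successes out of 300) are hypothetical, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Return (lower, upper) for the z-interval p_hat +/- z * se."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    return p_hat - z * se, p_hat + z * se

# Hypothetical survey: 120 of 300 respondents answer "yes".
lower, upper = proportion_ci(120, 300)      # about (0.345, 0.455)
```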
Confidence Interval for the Mean
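Estimate: $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
Sample standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Standard error: $\operatorname{se}(\bar{x}) = \dfrac{s}{\sqrt{n}}$
Use t-distribution when population SD is unknown
95% CI: $\bar{x} \pm t_{0.025,\,n-1}\,\dfrac{s}{\sqrt{n}}$
Degrees of freedom: $n - 1$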
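A sketch of the t-interval for a mean. Python's standard library has no t-distribution, so the critical value is passed in as an argument; here it is 2.262, the two-sided 95% value for 9 degrees of freedom from a t-table. The ten data values and the function name are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci(data, t_crit):
    """Return (lower, upper, se) for x_bar +/- t_crit * s / sqrt(n)."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                  # sample SD, divides by n - 1
    se = s / sqrt(n)
    return x_bar - t_crit * se, x_bar + t_crit * se, se

# Hypothetical sample of n = 10 observations, so df = 9.
data = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
t_crit = 2.262                       # t-table value for 95% confidence, df = 9
lower, upper, se = mean_ci(data, t_crit)     # about (12.37, 14.63), se = 0.5
```

Passing the critical value in keeps the t-table lookup visible; for large n it approaches the z value 1.96.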
Interpreting Confidence Intervals
Correct: "We are 95% confident the true parameter is in this interval."
Incorrect: "95% chance the parameter is in the interval" (parameter is fixed)
Incorrect: "95% of all customers have balances between..." (applies to mean, not individuals)
Incorrect: "Mean of 95% of samples will fall between..." (applies to sampling distribution)
Manipulating Confidence Intervals
Do not combine the endpoints of separate CIs to get an interval for a new quantity; define the new variable first, then build a CI for it
Example: Define profit per customer, then build CI for profit
Margin of Error (MOE)
MOE is the "±" part of CI
Smaller MOE = more precise interval
MOE affected by:
Confidence level (higher → wider interval)
Variation in data (higher → wider interval)
Number of observations (higher n → narrower interval)
To tighten CI: increase n, reduce variability, or accept lower confidence
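The effect of sample size and confidence level on the margin of error can be sketched for a proportion. The value $\hat{p} = 0.5$ and the sample sizes are hypothetical; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def moe_proportion(p_hat: float, n: int, confidence: float = 0.95) -> float:
    """Margin of error: z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)

moe_100 = moe_proportion(0.5, 100)               # about 0.098
moe_400 = moe_proportion(0.5, 400)               # about 0.049
moe_100_99 = moe_proportion(0.5, 100, 0.99)      # about 0.129
```

Because n sits under a square root, cutting the margin of error in half takes four times as many observations.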
How to Do Any CI Problem: Checklist
Data should be SRS from population
Check conditions for procedure
Four-step workflow:
Identify parameter (p or μ)
Compute estimate (p̂ or x̄)
Compute standard error (SE)
CI = estimate ± (critical value)(SE)
Excel Cheat Sheet
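CI for Proportion (95%) | CI for Mean (95%, t-interval) |
|---|---|
Estimate: =x/n | Estimate: =AVERAGE(range) |
SE: =SQRT(phat*(1-phat)/n) | SE: =STDEV.S(range)/SQRT(n) |
Critical value: =NORM.S.INV(0.975) | Critical value: =T.INV.2T(0.05, n-1) |
Endpoints: =phat-z*SE and =phat+z*SE | Endpoints: =xbar-t*SE and =xbar+t*SE |
Here x, n, phat, xbar, z, t, SE, and range stand for the cells or named ranges that hold those values.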
End-of-Chapter Practical Tips
Ensure data is SRS
Check conditions before calculation
Use 95% confidence intervals unless otherwise specified
Round endpoints when presenting
Keep full precision in intermediate steps