Random Variables, Normal Probability Model, Sampling, and Confidence Intervals: Study Notes for Business Statistics

Random Variables for Counts: Bernoulli, Binomial, and Poisson Models

Bernoulli Random Variable

The Bernoulli random variable models a single trial with only two possible outcomes: success or failure. It is foundational for understanding more complex count models.

  • Definition: A random variable B where B = 1 if success, B = 0 if failure.

  • Key Properties:

    • Only two outcomes: success or failure

    • Fixed probability of success: p

    • Trials are independent

  • Expected Value: $E(B) = p$

  • Variance: $\text{Var}(B) = p(1 - p)$

Binomial Random Variable

The Binomial random variable extends the Bernoulli model to multiple independent trials, counting the number of successes.

  • Definition: Y = sum of n independent and identically distributed (iid) Bernoulli random variables.

  • Parameters: n (number of trials), p (probability of success per trial)

  • When to Use:

    • Fixed number of trials (n)

    • Each trial is success/failure

    • Constant probability of success (p)

    • Trials are independent

    • Random variable is the count of successes

  • 10% Condition: If sampling without replacement, treat trials as independent if sample size is less than 10% of the population.

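  • Probability of Exactly y Successes: $P(Y = y) = \binom{n}{y} p^y (1 - p)^{n - y}$

    • Binomial coefficient: $\binom{n}{y} = \frac{n!}{y!\,(n - y)!}$

  • Mean: $E(Y) = np$

  • Variance: $\text{Var}(Y) = np(1 - p)$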

Example: Basketball Free Throws

  • Player makes free throws with p = 0.7

  • Attempts n = 12 free throws

  • Y ~ Binomial(12, 0.7)

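  • Mean: $E(Y) = np = 12(0.7) = 8.4$

  • Variance: $\text{Var}(Y) = np(1 - p) = 12(0.7)(0.3) = 2.52$

  • Probability of exactly 10 made: $P(Y = 10) = \binom{12}{10}(0.7)^{10}(0.3)^{2} \approx 0.168$

  • Probability of at least 10 made: $P(Y \ge 10) = P(Y = 10) + P(Y = 11) + P(Y = 12) \approx 0.253$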
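
The free-throw numbers can be checked with a short script. This is a minimal sketch using only Python's standard library; the helper name `binomial_pmf` is an illustrative choice, not something from the course materials.

```python
from math import comb

def binomial_pmf(y: int, n: int, p: float) -> float:
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n, p = 12, 0.7  # values from the free-throw example

mean = n * p                # np
variance = n * p * (1 - p)  # np(1 - p)

p_exactly_10 = binomial_pmf(10, n, p)
# "At least 10" means 10, 11, or 12 made shots.
p_at_least_10 = sum(binomial_pmf(y, n, p) for y in range(10, n + 1))

print(f"mean = {mean:.2f}, variance = {variance:.2f}")
print(f"P(Y = 10)  = {p_exactly_10:.4f}")   # about 0.1678
print(f"P(Y >= 10) = {p_at_least_10:.4f}")  # about 0.2528
```

Summing the individual probabilities for "at least 10" gives the same result as the complement form 1 - P(Y ≤ 9).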

Excel Functions for Binomial

  • Exact k successes: BINOM.DIST(k, n, p, FALSE)

  • At most k successes: BINOM.DIST(k, n, p, TRUE)

  • At least k successes: 1 - BINOM.DIST(k-1, n, p, TRUE)

Poisson Model

The Poisson model is used for counting events in a fixed interval of time or space, with no fixed maximum count.

  • When to Use:

    • Counting events during a fixed interval

    • Possible values: 0, 1, 2, ... (no fixed max)

    • Controlled by λ (average rate per interval)

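  • Estimating λ: $\lambda = \frac{\text{total number of events}}{\text{number of intervals}}$

  • Probability Distribution: $P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$ for $x = 0, 1, 2, \ldots$

  • Mean and Variance: $E(X) = \lambda$, $\text{Var}(X) = \lambda$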

Example: Student Emails per Day

  • Over 20 days, professor received 100 emails

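  • Rate: $\lambda = 100/20 = 5$ emails/day

  • Mean = 5, Variance = 5

  • Probability of exactly 7 emails: $P(X = 7) = \frac{e^{-5}\,5^{7}}{7!} \approx 0.104$

  • Probability of at least 7 emails: $P(X \ge 7) = 1 - P(X \le 6) \approx 0.238$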
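
The email example can be reproduced the same way. The sketch below assumes only the numbers stated in the example (100 emails over 20 days); `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 100 / 20  # estimated rate: 5 emails per day

p_exactly_7 = poisson_pmf(7, lam)
# There is no fixed maximum count, so "at least 7" uses the complement:
# P(X >= 7) = 1 - P(X <= 6).
p_at_most_6 = sum(poisson_pmf(x, lam) for x in range(7))
p_at_least_7 = 1 - p_at_most_6

print(f"lambda     = {lam}")
print(f"P(X = 7)   = {p_exactly_7:.4f}")   # about 0.1044
print(f"P(X >= 7)  = {p_at_least_7:.4f}")  # about 0.2378
```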

Excel Functions for Poisson

  • Exact x events: POISSON.DIST(x, lambda, FALSE)

  • At most x events: POISSON.DIST(x, lambda, TRUE)

  • At least x events: 1 - POISSON.DIST(x-1, lambda, TRUE)

Quick Translator: Wording to Math

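  • Exactly k: $P(X = k)$

  • At most k: $P(X \le k)$

  • At least k: $P(X \ge k) = 1 - P(X \le k - 1)$

  • Between a and b (inclusive): $P(a \le X \le b) = P(X \le b) - P(X \le a - 1)$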

Binomial vs Poisson: Fast Checklist

Binomial | Poisson

"Out of n trials/attempts/items" | "Per day/per hour/per mile/per page..."

"Success/failure" | Rate/average events λ

Constant p | No fixed maximum count

The Normal Probability Model

Central Limit Theorem (CLT)

The Central Limit Theorem states that the sum or average of many independent random variables with similar variance tends to follow a Normal distribution as the number of variables increases.

  • Explains why bell-shaped distributions are common in practice.

  • Example: Stock market value depends on many small contributions.

The Normal Probability Distribution

  • Continuous and bell-shaped

  • Probabilities are areas under the curve

  • Total area under the curve = 1

The Normal Model

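  • Defined by mean ($\mu$) and standard deviation ($\sigma$)

  • Notation: $X \sim N(\mu, \sigma^2)$ or $X \sim N(\mu, \sigma)$, depending on whether the second argument is given as the variance or the standard deviation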

Standardizing: Z-Scores

Standardization converts any normal variable to the standard normal (mean 0, SD 1).

  • Standard normal variable: Z

  • Standardization formula: $z = \frac{x - \mu}{\sigma}$

  • Interpretation: How many standard deviations x is above/below the mean

Finding Probabilities with the Normal Model

  • Sketch the desired area (left tail, right tail, between values)

  • Convert x-values to z-scores

  • Use Normal tables or software to find probabilities

  • Use complement and symmetry as needed
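
The steps above can be sketched with `statistics.NormalDist` from Python's standard library standing in for a Normal table. The mean, standard deviation, and x-values below are hypothetical numbers chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical Normal model and cutoffs (not from the notes).
mu, sigma = 100.0, 15.0
x_low, x_high = 85.0, 130.0

Z = NormalDist()  # standard normal: mean 0, SD 1

# Step 1: convert x-values to z-scores.
z_low = (x_low - mu) / sigma    # -1.0
z_high = (x_high - mu) / sigma  # 2.0

# Step 2: look up left-tail areas, then use complement and subtraction.
left_tail = Z.cdf(z_high)               # P(X <= 130)
right_tail = 1 - Z.cdf(z_high)          # P(X > 130), by complement
between = Z.cdf(z_high) - Z.cdf(z_low)  # P(85 <= X <= 130)

print(f"z-scores: {z_low:.2f}, {z_high:.2f}")
print(f"left tail  = {left_tail:.4f}")
print(f"right tail = {right_tail:.4f}")
print(f"between    = {between:.4f}")
```

By symmetry, `Z.cdf(-2.0)` equals the right-tail area beyond 2.0, which is a useful check when a table lists only one side.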

Normal Tables: Reading and Conversions

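  • Left-tail probability: $P(Z \le z)$, read directly from the table

  • Right-tail: $P(Z > z) = 1 - P(Z \le z)$

  • Between two values: $P(a \le Z \le b) = P(Z \le b) - P(Z \le a)$

  • Symmetry: $P(Z \le -z) = P(Z \ge z)$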

Percentiles

  • The kth percentile is the value below which k% of the distribution falls.

  • Workflow:

    1. Convert percentile to probability (e.g., 90th percentile → 0.90)

    2. Find z-score with that left-tail area

    3. Convert back: $x = \mu + z\sigma$
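
The percentile workflow can be sketched as follows, again using `statistics.NormalDist` in place of a table. The Normal model (mean 100, SD 15) is a hypothetical example, not taken from the notes.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # hypothetical Normal model

# Step 1: the 90th percentile corresponds to a left-tail probability of 0.90.
prob = 0.90

# Step 2: find the z-score with that left-tail area.
z = NormalDist().inv_cdf(prob)  # about 1.2816

# Step 3: convert back to the original units.
x = mu + z * sigma  # about 119.22

print(f"z = {z:.4f}, 90th percentile = {x:.2f}")
```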

Departures from Normality

  • Multimodality: More than one peak; suggests mixed groups.

  • Skewness: Lack of symmetry; long tail left or right.

  • Outliers: Extreme values not matching main pattern.

Normal Quantile Plot (QQ Plot)

  • Scatterplot to check normality

  • Points tracking a diagonal line indicate normality

  • Curved or patterned points suggest non-normality

Exam Cheat Sheet: Normal Model

  • Measurement/continuous variable/bell curve/average/CLT → Normal/CLT

  • Probability between/above/below → area under curve + z-score

  • Percentile value → find z, then $x = \mu + z\sigma$

  • Is Normal appropriate? → check multimodality, skewness, outliers, QQ plot

Must-Know Formulas

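  • Standardize: $z = \frac{x - \mu}{\sigma}$

  • Unstandardize: $x = \mu + z\sigma$

  • Right tail: $P(Z > z) = 1 - P(Z \le z)$

  • Between: $P(a \le Z \le b) = P(Z \le b) - P(Z \le a)$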

Samples and Surveys

Cast of Characters

  • Population: Entire group of interest (e.g., all undergrads)

  • Sample: Subset of the population (e.g., 300 students)

  • Survey: Asking questions to a sample

  • Representative: Sample reflects population mix

  • Bias: Systematic error in sample selection

Surprising Properties of Sampling

  • Random selection is best for representativeness

  • Large populations do not require proportionally large samples; sample size depends on desired precision

Randomization

  • Reduces bias

  • Allows inference from sample to population

  • Excel method: Add random number with =RAND(), sort, pick first n

  • Nonresponse can introduce bias; replace non-responders from randomized list

Sampling Frame

  • List from which sample is drawn

  • Good frame: complete roster

  • Risky frame: incomplete or biased lists

Simple Random Sample (SRS) and Systematic Sampling

  • SRS: Every possible sample of size n has equal chance

  • Systematic Sampling: Pick every k-th item (random start)

  • Hidden patterns in list can bias systematic samples
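
The two selection schemes can be compared in code. This is a rough sketch with a hypothetical roster of 100 names as the sampling frame; the function names are illustrative, and the fixed seed is there only so the run is repeatable.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS: every subset of size n is equally likely to be chosen."""
    return rng.sample(frame, n)

def systematic_sample(frame, k, rng):
    """Systematic: random start among the first k items, then every k-th item."""
    start = rng.randrange(k)
    return frame[start::k]

rng = random.Random(42)  # fixed seed for a reproducible illustration

# Hypothetical sampling frame: a complete roster of 100 students.
frame = [f"student_{i:03d}" for i in range(1, 101)]

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)

print("SRS:       ", srs)
print("Systematic:", systematic)
```

If the roster were ordered in a repeating pattern with period 10 (for example, by seat position in rows of 10), the systematic sample would pick the same position every time, which is the hidden-pattern risk noted above.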

Sampling Variation

  • Different random samples yield different results

  • Parameters: describe population (fixed, unknown)

  • Statistics: describe sample (computed, varies)

Alternative Sampling Methods

Method | Description | Risks/Notes

Stratified | Split into groups, random sample within each | Improves representation

Cluster | Divide into clusters, randomly pick clusters, survey all inside | Efficient, but clusters must be representative

Census | Survey entire population | Often impractical

Voluntary Response | People opt in | Biased toward strong opinions

Convenience | Sample easy-to-reach | Usually not representative

Survey Checklist

  • What was the sampling frame?

  • Was it an SRS?

  • Nonresponse rate?

  • Question wording?

  • Interviewer effects?

  • Survivor bias?

Quick Guide: Spotting Sampling Methods

  • "Everyone had equal chance" → SRS

  • "Random + =RAND() + take first n" → SRS

  • "Every 10th / 25th" → Systematic

  • "Split into groups, random within each" → Stratified

  • "Randomly pick groups, survey all inside" → Cluster

  • "Poll on Instagram / call-in vote" → Voluntary response

  • "Asked people outside library" → Convenience

Sampling Variation and Quality

Sampling Distribution of the Mean

The sampling distribution describes how a statistic (like the sample mean) varies from sample to sample.

  • Used to monitor processes (e.g., GPS chip testing)

  • Control charts help detect process changes

Type I and Type II Errors

  • Type I error: False alarm; stopping a process that's actually fine

  • Type II error: Miss; failing to detect a broken process

Benefits of Averaging

  • Sample means vary less than individual values

  • Distribution of sample means is more bell-shaped

Normal Models for Sample Means

  • Sample means are normally distributed if the original data are normal, and approximately normal if the sample size is large (CLT)

Control Limits and Control Charts

  • Control limits define "safe zone" for process variation

  • Set by choosing Type I error rate and process parameters

  • Wide limits: fewer false alarms, more misses

  • Narrow limits: more false alarms, fewer misses

  • Convention: focus on Type I error (e.g., 5% or 1%)

X-bar Chart

  • Tracks sample mean over time

  • 99% limits (wider): process appears in control

  • 95% limits (narrower): more false alarms
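
To see why 95% limits produce more false alarms than 99% limits, the sketch below computes both for a hypothetical process. It assumes the process mean and standard deviation are known and that sample means are approximately Normal, so the limits take the form μ ± z*·σ/√n; the numbers and the `control_limits` helper are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def control_limits(mu, sigma, n, alpha):
    """Limits for the sample mean with Type I error rate alpha (two-sided)."""
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_star * sigma / sqrt(n)  # z* times the SD of the sample mean
    return mu - half_width, mu + half_width

# Hypothetical process: mean 10, SD 2, samples of 4 items each.
mu, sigma, n = 10.0, 2.0, 4

limits_99 = control_limits(mu, sigma, n, alpha=0.01)
limits_95 = control_limits(mu, sigma, n, alpha=0.05)

print(f"99% limits: {limits_99[0]:.3f} to {limits_99[1]:.3f}")
print(f"95% limits: {limits_95[0]:.3f} to {limits_95[1]:.3f}")
```

The 95% limits sit inside the 99% limits, so an in-control process crosses them more often.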

Repeated Testing Problem

  • Repeated checks increase chance of false alarms

  • Type I error accumulates over multiple tests
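
A quick calculation shows how fast false alarms accumulate. The sketch assumes the checks are independent and the process stays in control, so the chance of at least one false alarm in k checks is 1 - (1 - α)^k; the function name is a hypothetical choice.

```python
def prob_any_false_alarm(alpha: float, k: int) -> float:
    """P(at least one false alarm in k independent checks), process in control."""
    return 1 - (1 - alpha) ** k

alpha = 0.05  # Type I error rate for a single check
for k in (1, 10, 50):
    print(f"{k:>2} checks: {prob_any_false_alarm(alpha, k):.4f}")
```

With a 5% rate per check, the chance of at least one false alarm is already about 40% after 10 checks.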

3-Sigma Limits

  • Type I error ≈ 0.0027 for a single point

  • Probability a normal variable falls more than 3 SDs from mean
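
The 0.0027 figure can be verified directly from the standard normal distribution, here with Python's `statistics.NormalDist` as a stand-in for a table.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Two-sided tail area: more than 3 SDs above or below the mean.
alpha_3_sigma = 2 * (1 - Z.cdf(3))

print(f"Type I error with 3-sigma limits = {alpha_3_sigma:.4f}")  # about 0.0027
```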

Recognizing a Problem

  • Point outside control limit could be false alarm or real issue

  • Management must confirm actual cause

Control Charts for Variation

  • S-chart: Tracks sample standard deviation

  • R-chart: Tracks sample range

Confidence Intervals

Big Picture

  • True population parameters (p, μ) are usually unknown

  • Sample statistics (p̂, x̄) are used to estimate parameters

  • Confidence interval (CI): Range of plausible values for parameter

  • CIs indicate precision (tight vs wide)

Confidence Interval for a Proportion

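  • Estimate: $\hat{p} = \frac{\text{number of successes}}{n}$

  • Standard error: $SE(\hat{p}) = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

  • 95% CI: $\hat{p} \pm 1.96\,SE(\hat{p})$

  • General template: estimate $\pm$ (critical value)(SE), here $\hat{p} \pm z^{*}\,SE(\hat{p})$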
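
A proportion interval can be computed in a few lines. This is a sketch under assumed inputs: the sample values (p̂ = 0.40, n = 600) are hypothetical, and `proportion_ci` is an illustrative helper rather than a library function.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat: float, n: int, confidence: float = 0.95):
    """z-interval for a proportion: p_hat ± z* · sqrt(p_hat(1 - p_hat)/n)."""
    se = sqrt(p_hat * (1 - p_hat) / n)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    moe = z_star * se
    return p_hat - moe, p_hat + moe

# Hypothetical survey result: 40% of 600 respondents answered "yes".
lower, upper = proportion_ci(0.40, 600)
print(f"95% CI for p: {lower:.4f} to {upper:.4f}")  # about 0.3608 to 0.4392
```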

Confidence Interval for the Mean

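  • Estimate: $\bar{x}$

  • Sample standard deviation: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

  • Standard error: $SE(\bar{x}) = \frac{s}{\sqrt{n}}$

  • Use t-distribution when population SD is unknown

  • 95% CI: $\bar{x} \pm t^{*}_{0.025,\,n-1}\,\frac{s}{\sqrt{n}}$

  • Degrees of freedom: $df = n - 1$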
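
The t-interval follows the same pattern. Python's standard library has no t-distribution quantile function, so this sketch takes the critical value t* as an input, looked up from a t-table or from Excel's T.INV.2T. The sample values (x̄ = 50, s = 10, n = 25) are hypothetical, and t* ≈ 2.064 is the two-sided 95% value for 24 degrees of freedom.

```python
from math import sqrt

def mean_ci(x_bar: float, s: float, n: int, t_star: float):
    """t-interval for a mean: x_bar ± t* · s/sqrt(n).

    t_star must be the critical value for df = n - 1 at the chosen
    confidence level; it is supplied by the caller, not computed here.
    """
    se = s / sqrt(n)
    moe = t_star * se
    return x_bar - moe, x_bar + moe

# Hypothetical sample: mean 50, SD 10, n = 25 (so df = 24, t* about 2.064).
lower, upper = mean_ci(50.0, 10.0, 25, t_star=2.064)
print(f"95% CI for the mean: {lower:.3f} to {upper:.3f}")
```

Quadrupling n to 100 would halve the standard error, which is why larger samples give narrower intervals.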

Interpreting Confidence Intervals

  • Correct: "We are 95% confident the true parameter is in this interval."

  • Incorrect: "95% chance the parameter is in the interval" (parameter is fixed)

  • Incorrect: "95% of all customers have balances between..." (applies to mean, not individuals)

  • Incorrect: "Mean of 95% of samples will fall between..." (applies to sampling distribution)

Manipulating Confidence Intervals

  • Do not combine the endpoints of separate CIs; define a new variable for the quantity of interest and build a CI for it directly

  • Example: Define profit per customer, then build CI for profit

Margin of Error (MOE)

  • MOE is the "±" part of CI

  • Smaller MOE = more precise interval

  • MOE affected by:

    1. Confidence level (higher → wider interval)

    2. Variation in data (higher → wider interval)

    3. Number of observations (higher n → narrower interval)

  • To tighten CI: increase n, reduce variability, or accept lower confidence

How to Do Any CI Problem: Checklist

  • Data should be SRS from population

  • Check conditions for procedure

  • Four-step workflow:

    1. Identify parameter (p or μ)

    2. Compute estimate (p̂ or x̄)

    3. Compute standard error (SE)

    4. CI = estimate ± (critical value)(SE)

Excel Cheat Sheet

CI for Proportion (95%)

  • p̂ in B1, n in B2

  • Standard error: =SQRT(B1*(1-B1)/B2)

  • z* for 95%: =NORM.S.INV(0.975)

  • Lower endpoint: =B1 - NORM.S.INV(0.975)*SQRT(B1*(1-B1)/B2)

  • Upper endpoint: =B1 + NORM.S.INV(0.975)*SQRT(B1*(1-B1)/B2)

CI for Mean (95%, t-interval)

  • x̄ in B1, s in B2, n in B3

  • Standard error: =B2/SQRT(B3)

  • t* for 95%: =T.INV.2T(0.05, B3-1)

  • Lower endpoint: =B1 - T.INV.2T(0.05, B3-1)*(B2/SQRT(B3))

  • Upper endpoint: =B1 + T.INV.2T(0.05, B3-1)*(B2/SQRT(B3))

End-of-Chapter Practical Tips

  • Ensure data is SRS

  • Check conditions before calculation

  • Use 95% confidence intervals unless otherwise specified

  • Round endpoints when presenting

  • Keep full precision in intermediate steps
