Describing Data, Probability, and Random Variables: Study Notes for Business Statistics
Describing Data
Types of Data: Discrete vs. Continuous
In statistics, data refers to raw observations of interest, which can include numbers, letters, or images. For business statistics, we primarily focus on numerical data. Data can be classified into two main types:
Discrete Data: Data with a limited number of possible values. Examples include gender, grade, or performance evaluation. Discrete data often arise from counting (e.g., number of visits, number of products sold).
Continuous Data: Data with an infinite or very large number of possible values within a range. Examples include weight, temperature, earnings, or stock price. Continuous data typically result from measuring.
A variable is a characteristic that can differ across observations (e.g., the height of players), while a constant takes the same value for every observation (e.g., on a single-gender team, every player is male).
Summarizing Discrete Data
Discrete data can be summarized using frequency tables and visualized with bar charts. Frequency tables show the count of occurrences for each value.
| Count | Customers |
|---|---|
| 0 | 475 |
| 1 | 553 |
| 2 | 154 |
| 3 | 36 |
| 4 | 41 |
| 5 | 9 |
It is common to use proportions (percentages) instead of raw counts:
| Count | Customers | Proportion |
|---|---|---|
| 0 | 475 | 0.37 |
| 1 | 553 | 0.44 |
| 2 | 154 | 0.12 |
| 3 | 36 | 0.03 |
| 4 | 41 | 0.03 |
| 5 | 9 | 0.01 |
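As a quick check on the table, the proportions can be recomputed from the raw counts. This is a minimal sketch; the variable names are illustrative and not part of the original notes.

```python
# Counts from the frequency table: value -> number of customers.
counts = {0: 475, 1: 553, 2: 154, 3: 36, 4: 41, 5: 9}

total = sum(counts.values())  # total number of customers

# Proportion = count / total, rounded to two decimals as in the table.
proportions = {value: round(n / total, 2) for value, n in counts.items()}

for value, n in counts.items():
    print(f"{value}: {n:4d} customers, proportion {proportions[value]:.2f}")
```

Each proportion is a count divided by the total of 1268 customers, so the unrounded proportions sum to exactly 1.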
Pie charts are sometimes used, but they are appropriate only when the categories make up a whole, so that the proportions sum to 100%. A chart whose slice areas are not proportional to the values they represent violates the area principle.
Describing Distributions of Data
A distribution characterizes how data values are spread across possible values. Key aspects to describe a distribution include:
Shape: The form of the distribution (e.g., unimodal, bimodal, uniform, symmetric, skewed).
Center: The typical or central value (mean or median).
Spread: The variability or dispersion (variance, standard deviation).
Modes are peaks in a histogram. A distribution can be:
Unimodal: One main peak.
Bimodal: Two peaks.
Multimodal: Three or more peaks.
Uniform: All values occur with roughly equal frequency.
A distribution is symmetric if both sides are mirror images. If one tail is longer, the distribution is skewed (right or left).
Measures of Center
Mean (Arithmetic Average): The sum of all values divided by the number of values. Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Median: The middle value when data are ordered. Preferred for skewed distributions or when outliers are present.
Mode: The most frequently occurring value.
For symmetric, unimodal distributions, mean ≈ median ≈ mode. For right-skewed distributions, the mean is typically higher than the median.
Measures of Spread
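Variance: The average squared deviation from the mean. For a sample, the sum of squared deviations is divided by $n - 1$ rather than $n$. Formula: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Standard Deviation: The square root of the variance, representing spread in the same units as the data. Formula: $s = \sqrt{s^2}$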
For unimodal, symmetric distributions:
About 68% of observations fall within 1 standard deviation of the mean.
About 95% fall within 2 standard deviations.
About 99.7% fall within 3 standard deviations.
z-scores indicate how many standard deviations a value is from the mean: $z = \frac{x - \bar{x}}{s}$
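The measures of center and spread above can be computed with Python's standard `statistics` module. The sketch below uses a small hypothetical sample (not taken from the notes) and assumes the sample versions of variance and standard deviation, which divide by n - 1.

```python
import statistics

# Hypothetical sample, e.g. items bought by six customers (not from the notes).
data = [2, 4, 4, 5, 7, 8]

mean = statistics.mean(data)          # sum of values / number of values
median = statistics.median(data)      # middle value of the sorted data
mode = statistics.mode(data)          # most frequently occurring value
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # square root of the sample variance


def z_score(x):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / std_dev


print("mean:", mean, "median:", median, "mode:", mode)
print("variance:", variance, "std dev:", round(std_dev, 3))
print("z-score of 8:", round(z_score(8), 3))
```

For this sample the mean is 5 and the sample variance is 4.8, so the value 8 lies about 1.37 standard deviations above the mean.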
Probability
Random Phenomena and Sample Space
A random phenomenon is one where individual outcomes are uncertain, but a regular distribution emerges over many repetitions. The sample space is the set of all possible outcomes. An event is a subset of the sample space (one or more outcomes).
Trial: A single performance of a random phenomenon (e.g., rolling a die).
Outcome: The result of a trial (e.g., rolling a 3).
Events: Combinations of outcomes (e.g., rolling an even number).
Independence: Trials are independent if the outcome of one does not affect another.
Probability Rules
Probabilities are numbers between 0 and 1.
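$P(A) = 0$ means event A cannot occur; $P(A) = 1$ means A is certain.
The probability of the complement: $P(A^c) = 1 - P(A)$
Multiplication Rule (AND): For independent events A and B: $P(A \text{ and } B) = P(A) \times P(B)$
Addition Rule (OR): For mutually exclusive events: $P(A \text{ or } B) = P(A) + P(B)$
For non-mutually exclusive events: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$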
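The complement, multiplication, and addition rules can be checked on a single fair die by counting outcomes. This is a sketch under the assumption of six equally likely faces; the event names and the helper `prob` are illustrative.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die


def prob(event):
    """Probability of an event (a set of outcomes) with equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


even = {2, 4, 6}
one = {1}               # mutually exclusive with `even`
at_least_five = {5, 6}  # overlaps with `even` at 6

# Complement rule: P(not even) = 1 - P(even)
p_not_even = 1 - prob(even)

# Addition rule for mutually exclusive events.
p_even_or_one = prob(even) + prob(one)

# General addition rule: subtract the overlap so it is not counted twice.
p_even_or_high = prob(even) + prob(at_least_five) - prob(even & at_least_five)

# Multiplication rule for independent trials: two rolls, both even.
p_two_evens = prob(even) * prob(even)

print(p_not_even, p_even_or_one, p_even_or_high, p_two_evens)
```

Counting directly gives the same answers: the event "even or at least five" contains the four outcomes 2, 4, 5, and 6, so its probability is 4/6 = 2/3.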
Law of Large Numbers
As the number of trials increases, the observed frequency of an event approaches its theoretical probability. Similarly, the sample mean and standard deviation approach the population mean ($\mu$) and standard deviation ($\sigma$).
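A small simulation can illustrate the Law of Large Numbers. The sketch below assumes a fair six-sided die and tracks the proportion of even rolls, whose theoretical probability is 0.5; the seed and trial counts are arbitrary choices.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable


def proportion_even(n_trials):
    """Roll a fair die n_trials times and return the share of even results."""
    evens = sum(1 for _ in range(n_trials) if rng.randint(1, 6) % 2 == 0)
    return evens / n_trials


# Observed frequency for increasingly long runs of trials.
results = {n: proportion_even(n) for n in (10, 100, 1_000, 100_000)}

for n, share in results.items():
    print(f"{n:>7} rolls: proportion even = {share:.4f}")
```

Short runs can land far from 0.5, while the longest run should sit very close to it. The law describes long-run behavior and says nothing about any individual trial.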
Examples: Roulette
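Probability of landing on red (18 red pockets out of 38 on an American wheel): $P(\text{red}) = 18/38 \approx 0.474$
Probability of not red: $P(\text{not red}) = 1 - 18/38 = 20/38 \approx 0.526$
Probability of red two times in a row: $P(\text{red, red}) = (18/38)^2 \approx 0.224$
Probability at least one red in two spins: $1 - P(\text{no red in two spins}) = 1 - (20/38)^2 \approx 0.723$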
Random Variables
Definition and Types
A random variable is a variable whose value is determined by the outcome of a random phenomenon. It is usually denoted by a capital letter (e.g., X). Random variables can be:
Discrete: Takes on a countable number of values (e.g., number of heads in two coin tosses).
Continuous: Takes on any value within a range (e.g., time, weight).
Probability Distribution of a Discrete Random Variable
The probability distribution lists all possible values of the random variable and their associated probabilities. For example, tossing two coins (X = number of heads):
| X | P(X = x) |
|---|---|
| 0 | 0.25 |
| 1 | 0.50 |
| 2 | 0.25 |
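The two-coin distribution can be derived by listing the four equally likely outcomes and counting heads in each. This is a minimal sketch assuming fair, independent coins.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# X = number of heads in each outcome.
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (number of outcomes with x heads) / (total outcomes).
distribution = {
    x: Fraction(n, len(outcomes)) for x, n in sorted(heads_counts.items())
}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p} = {float(p):.2f}")
```

One head is twice as likely as zero or two heads because two of the four outcomes (HT and TH) produce it.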
Expected Value (Mean) of a Random Variable
The expected value (mean) is the long-run average value of the random variable: $E(X) = \mu = \sum x \cdot P(X = x)$
Example: Slot machine payouts (simplified):
| Outcome | Payout (x) | P(X = x) |
|---|---|---|
| Cherry | 2.00 | 0.01 |
| Gold Bar | 1.50 | 0.05 |
| Neither | 0.00 | 0.94 |
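The expected payout for the slot machine table is the sum of each payout weighted by its probability. A short sketch follows; the variable names are illustrative.

```python
# Payout distribution from the slot machine table: (payout, probability).
payouts = [(2.00, 0.01), (1.50, 0.05), (0.00, 0.94)]

# The probabilities of a valid distribution must sum to 1.
total_probability = sum(p for _, p in payouts)

# E(X) = sum of x * P(X = x) over all possible payouts.
expected_payout = sum(x * p for x, p in payouts)

print(f"Total probability: {total_probability:.2f}")
print(f"E(X) = {expected_payout:.3f}")
```

The result, 2.00(0.01) + 1.50(0.05) + 0(0.94) = 0.095, is the average payout per play over many plays, not the payout of any single play.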
Variance and Standard Deviation of a Random Variable
The variance of a random variable is the expected value of the squared deviation from the mean: $Var(X) = \sigma^2 = \sum (x - \mu)^2 \cdot P(X = x)$
The standard deviation is the square root of the variance: $SD(X) = \sigma = \sqrt{Var(X)}$
Standard deviation measures the risk or volatility of the random variable.
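Applying these definitions to the slot machine payouts from the earlier table gives a sense of how volatile a single play is. This is a minimal sketch; the variable names are illustrative.

```python
import math

# Payout distribution from the slot machine table: (payout, probability).
payouts = [(2.00, 0.01), (1.50, 0.05), (0.00, 0.94)]

# Mean: E(X) = sum of x * P(X = x).
mu = sum(x * p for x, p in payouts)

# Variance: expected squared deviation from the mean.
variance = sum((x - mu) ** 2 * p for x, p in payouts)

# Standard deviation: square root of the variance, in payout units.
std_dev = math.sqrt(variance)

print(f"E(X) = {mu:.3f}, Var(X) = {variance:.6f}, SD(X) = {std_dev:.4f}")
```

The standard deviation (about 0.38) is roughly four times the expected payout (0.095), which reflects how rarely the machine pays anything.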
Transforming Random Variables
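Adding/subtracting a constant $c$ shifts the mean but leaves the spread unchanged: $E(X \pm c) = E(X) \pm c$ and $Var(X \pm c) = Var(X)$
Multiplying by a constant $a$ rescales both the mean and the spread: $E(aX) = a \cdot E(X)$, $Var(aX) = a^2 \cdot Var(X)$, and $SD(aX) = |a| \cdot SD(X)$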
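The effect of shifting and rescaling can be verified exactly on the two-coin distribution (X = number of heads). The sketch below uses a hypothetical payout Y = 3X + 2; the constants `a` and `c` are illustrative and not from the notes.

```python
from fractions import Fraction

# Distribution of X = number of heads in two fair coin tosses.
x_dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def mean(dist):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = sum of (x - mu)^2 * P(X = x)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


a, c = 3, 2  # hypothetical scale and shift: Y = a*X + c

# Each value of X maps to a*x + c with the same probability.
y_dist = {a * x + c: p for x, p in x_dist.items()}

print("E(X) =", mean(x_dist), " Var(X) =", variance(x_dist))
print("E(Y) =", mean(y_dist), " Var(Y) =", variance(y_dist))
```

Here E(Y) = 3(1) + 2 = 5, while Var(Y) = 9(1/2) = 9/2. The shift `c` moves the mean but does not change the variance.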
Combining Independent Random Variables
For independent random variables X and Y: $E(X \pm Y) = E(X) \pm E(Y)$
Variance always adds, even when subtracting: $Var(X \pm Y) = Var(X) + Var(Y)$
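That variances add even for a difference can be confirmed by enumerating two independent fair dice. This is a sketch under that assumption; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Distribution of one fair die.
die = {face: Fraction(1, 6) for face in range(1, 7)}


def mean(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def combine(dist_x, dist_y, op):
    """Distribution of op(X, Y) for independent X and Y."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        z = op(x, y)
        # Independence: joint probability is the product of the marginals.
        out[z] = out.get(z, 0) + px * py
    return out


total = combine(die, die, lambda x, y: x + y)
difference = combine(die, die, lambda x, y: x - y)

print("Var(X) =", variance(die))
print("Var(X + Y) =", variance(total))
print("Var(X - Y) =", variance(difference))
```

Both the sum and the difference have variance 35/12 + 35/12 = 35/6, while their means differ (7 and 0). Subtracting a random quantity adds uncertainty rather than removing it.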
Application: Portfolio Diversification
Suppose a financial portfolio holds 70% in bonds (mean return 0.04, SD 0.046) and 30% in stocks (mean return 0.115, SD 0.199). Assuming independence:
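Expected return: $E(R) = 0.7(0.04) + 0.3(0.115) = 0.0625$
Variance: $Var(R) = 0.7^2(0.046)^2 + 0.3^2(0.199)^2 \approx 0.0046$
Standard deviation: $SD(R) = \sqrt{Var(R)} \approx 0.068$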
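The portfolio figures follow from the rules for scaling and combining independent random variables. The sketch below reproduces the calculation with the weights, means, and standard deviations stated above; the variable names are illustrative, and independence of bond and stock returns is assumed as in the text.

```python
import math

# Portfolio weights and asset statistics from the example.
w_bonds, w_stocks = 0.70, 0.30
mean_bonds, sd_bonds = 0.04, 0.046
mean_stocks, sd_stocks = 0.115, 0.199

# E(aX + bY) = a*E(X) + b*E(Y)
expected_return = w_bonds * mean_bonds + w_stocks * mean_stocks

# Var(aX + bY) = a^2*Var(X) + b^2*Var(Y), valid only under independence.
variance = w_bonds**2 * sd_bonds**2 + w_stocks**2 * sd_stocks**2

std_dev = math.sqrt(variance)

print(f"Expected return: {expected_return:.4f}")
print(f"Variance:        {variance:.6f}")
print(f"Std deviation:   {std_dev:.4f}")
```

The portfolio's standard deviation (about 0.068) is well below the stock standard deviation of 0.199. If the two returns were correlated, a covariance term would have to be added to the variance.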
Law of Large Numbers (Revisited)
As the number of observations increases, the sample mean ($\bar{x}$) and sample standard deviation ($s$) converge to the population mean ($\mu$) and standard deviation ($\sigma$).
Summary Table: Key Properties of Probability
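| Operation | Rule |
|---|---|
| NOT | $P(A^c) = 1 - P(A)$ |
| AND (independent) | $P(A \text{ and } B) = P(A) \times P(B)$ |
| OR (mutually exclusive) | $P(A \text{ or } B) = P(A) + P(B)$ |
| OR (not mutually exclusive) | $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$ |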
Additional info:
Pie charts can be misleading if the total does not sum to 100% or if the area principle is violated.
For business applications, understanding the spread (risk) is as important as understanding the mean (expected value).
Portfolio diversification reduces risk when assets are independent or not perfectly correlated.