Describing Data, Probability, and Random Variables: Study Notes for Business Statistics

Describing Data

Types of Data: Discrete vs. Continuous

In statistics, data refers to raw observations of interest, which can include numbers, letters, or images. For business statistics, we primarily focus on numerical data. Data can be classified into two main types:

Discrete Data: Data with a limited number of possible values. Examples include gender, grade, or performance evaluation. Discrete data often arise from counting (e.g., number of visits, number of products sold).
Continuous Data: Data with an infinite or very large number of possible values within a range. Examples include weight, temperature, earnings, or stock price. Continuous data typically result from measuring.

A variable is a characteristic that can vary across observations (e.g., height of players), while a constant does not vary (e.g., gender on a single-gender team, where all players are male).

Summarizing Discrete Data

Discrete data can be summarized using frequency tables and visualized with bar charts. Frequency tables show the count of occurrences for each value.

Count	Customers
0	475
1	553
2	154
3	36
4	41
5	9

It is common to use proportions (percentages) instead of raw counts:

Count	Customers	Proportion
0	475	0.37
1	553	0.44
2	154	0.12
3	36	0.03
4	41	0.03
5	9	0.01
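
The proportions in the table can be reproduced from the raw counts. The following is a minimal sketch; the variable names are illustrative, and the rounding to two decimals is an assumption chosen to match the table.

```python
# Customer counts for each value, copied from the frequency table above.
counts = {0: 475, 1: 553, 2: 154, 3: 36, 4: 41, 5: 9}

total = sum(counts.values())  # total number of customers

# Proportion = count / total, rounded to two decimals (assumed rounding).
proportions = {value: round(n / total, 2) for value, n in counts.items()}

# Print the table in the same tab-separated layout as the notes.
print("Count\tCustomers\tProportion")
for value, n in counts.items():
    print(f"{value}\t{n}\t{proportions[value]:.2f}")
```

Because each proportion is rounded separately, rounded proportions will not always sum to exactly 1, although they do for this table.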

Pie charts are sometimes used, but they mislead when the categories do not sum to 100% or when the slices are drawn so that their areas are not proportional to the values they represent (a violation of the area principle).

Describing Distributions of Data

A distribution characterizes how data values are spread across possible values. Key aspects to describe a distribution include:

Shape: The form of the distribution (e.g., unimodal, bimodal, uniform, symmetric, skewed).
Center: The typical or central value (mean or median).
Spread: The variability or dispersion (variance, standard deviation).

Modes are peaks in a histogram. A distribution can be:

Unimodal: One main peak.
Bimodal: Two peaks.
Multimodal: Three or more peaks.
Uniform: All values occur with roughly equal frequency.

A distribution is symmetric if both sides are mirror images. If one tail is longer, the distribution is skewed (right or left).

Measures of Center

Mean (Arithmetic Average): The sum of all values divided by the number of values. Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Median: The middle value when data are ordered. Preferred for skewed distributions or when outliers are present.
Mode: The most frequently occurring value.

For symmetric, unimodal distributions, mean ≈ median ≈ mode. For right-skewed distributions, the mean is typically higher than the median.
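
As a small illustration of how an outlier pulls the mean above the median, the sketch below uses Python's standard `statistics` module on a made-up data set. The `earnings` values are hypothetical and not taken from the notes.

```python
import statistics

# Hypothetical right-skewed sample: most values are close together,
# and one very large value forms the long right tail.
earnings = [30, 32, 32, 35, 38, 40, 120]

mean = statistics.mean(earnings)      # sum of values / number of values
median = statistics.median(earnings)  # middle value of the sorted data
mode = statistics.mode(earnings)      # most frequently occurring value

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

Here the single large value raises the mean well above the median, which is why the median is preferred for skewed data.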

Measures of Spread

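Variance: The average squared deviation from the mean. Formula (for sample data, dividing by $n - 1$ rather than $n$): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Standard Deviation: The square root of the variance, representing spread in the same units as the data. Formula: $s = \sqrt{s^2}$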

For unimodal, symmetric (roughly bell-shaped) distributions:

About 68% of observations fall within 1 standard deviation of the mean.
About 95% fall within 2 standard deviations.
About 99.7% fall within 3 standard deviations.

z-scores indicate how many standard deviations a value is from the mean: $z = \frac{x - \bar{x}}{s}$
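
The spread measures and z-scores can be computed with the standard `statistics` module. This is a sketch on hypothetical data. Note that `statistics.variance` and `statistics.stdev` use the sample formulas (dividing by n - 1), while `statistics.pvariance` divides by n.

```python
import statistics

# Hypothetical sample, chosen so the mean is a whole number.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
variance = statistics.variance(data)  # sample variance, divides by n - 1
sd = statistics.stdev(data)           # square root of the sample variance

# z-score: distance from the mean measured in standard deviations.
z_scores = [(x - mean) / sd for x in data]

print(f"mean = {mean}, variance = {variance:.3f}, sd = {sd:.3f}")
for x, z in zip(data, z_scores):
    print(f"x = {x}: z = {z:+.2f}")
```

Values below the mean get negative z-scores, values above it get positive ones, and the z-scores of a data set always average to zero.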

Probability

Random Phenomena and Sample Space

A random phenomenon is one where individual outcomes are uncertain, but a regular distribution emerges over many repetitions. The sample space is the set of all possible outcomes. An event is a subset of the sample space (one or more outcomes).

Trial: A single performance of a random phenomenon (e.g., rolling a die).
Outcome: The result of a trial (e.g., rolling a 3).
Event: A combination of one or more outcomes (e.g., rolling an even number).
Independence: Trials are independent if the outcome of one does not affect another.

Probability Rules

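Probabilities are numbers between 0 and 1: $0 \le P(A) \le 1$.
$P(A) = 0$ means event A cannot occur; $P(A) = 1$ means A is certain.
The probability of the complement: $P(\text{not } A) = 1 - P(A)$
Multiplication Rule (AND): For independent events A and B: $P(A \text{ and } B) = P(A) \times P(B)$
Addition Rule (OR): For mutually exclusive events: $P(A \text{ or } B) = P(A) + P(B)$. For non-mutually exclusive events: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$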

Law of Large Numbers

As the number of trials increases, the observed relative frequency of an event approaches its theoretical probability. Similarly, the sample mean and standard deviation approach the population mean ($\mu$) and standard deviation ($\sigma$).
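
A quick simulation illustrates the idea. The sketch below tosses a simulated fair coin and reports the relative frequency of heads for increasing numbers of tosses. The function name, the seed, and the trial counts are arbitrary choices made for illustration.

```python
import random

random.seed(42)  # fixed seed so repeated runs give the same results


def relative_frequency_of_heads(n_tosses):
    """Simulate n_tosses fair coin tosses and return the share of heads."""
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The relative frequency tends to settle near 0.5 as n grows.
for n in (10, 100, 10_000, 100_000):
    print(f"n = {n:>7}: relative frequency = {relative_frequency_of_heads(n):.4f}")
```

Short runs can stray far from 0.5. The law describes long-run behavior and says nothing about the next few trials.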

Examples: Roulette

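Probability of landing on red (18 red numbers out of 40, as given here; a standard American wheel has 18 red pockets out of 38): $P(\text{red}) = 18/40 = 0.45$
Probability of not red: $P(\text{not red}) = 1 - 0.45 = 0.55$
Probability of red two times in a row (spins are independent): $0.45 \times 0.45 = 0.2025$
Probability at least one red in two spins: $1 - P(\text{no red in two spins}) = 1 - 0.55^2 = 0.6975$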
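
The roulette calculations can be checked with exact fractions. The sketch below uses the 18-out-of-40 figure stated in these notes and computes "at least one red" two ways, with the complement rule and with the general addition rule, to show that both give the same answer. The variable names are illustrative.

```python
from fractions import Fraction

# Probability of red on a single spin, using the figure given in the notes.
p_red = Fraction(18, 40)

# Complement rule: P(not red) = 1 - P(red).
p_not_red = 1 - p_red

# Multiplication rule for independent spins: P(red and red).
p_red_twice = p_red * p_red

# At least one red in two spins, via the complement of "no red at all".
p_at_least_one = 1 - p_not_red ** 2

# The same quantity via the general addition rule:
# P(A or B) = P(A) + P(B) - P(A and B).
p_at_least_one_alt = p_red + p_red - p_red_twice

print(float(p_red), float(p_not_red), float(p_red_twice), float(p_at_least_one))
```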

Random Variables

Definition and Types

A random variable is a variable whose value is determined by the outcome of a random phenomenon. It is usually denoted by a capital letter (e.g., X). Random variables can be:

Discrete: Takes on a countable number of values (e.g., number of heads in two coin tosses).
Continuous: Takes on any value within a range (e.g., time, weight).

Probability Distribution of a Discrete Random Variable

The probability distribution lists all possible values of the random variable and their associated probabilities. For example, tossing two coins (X = number of heads):

X	P(X = x)
0	0.25
1	0.50
2	0.25
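
The table can be derived by listing the sample space. The sketch below enumerates the four equally likely outcomes of two fair coin tosses and counts the heads in each. Fair coins are assumed, as the probabilities in the table imply.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two tosses: HH, HT, TH, TT (equally likely for fair coins).
outcomes = list(product("HT", repeat=2))

# X = number of heads in each outcome; count how often each value occurs.
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (number of outcomes with x heads) / (total number of outcomes).
distribution = {
    x: Fraction(n, len(outcomes)) for x, n in sorted(heads_counts.items())
}

for x, p in distribution.items():
    print(f"{x}\t{float(p):.2f}")
```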

Expected Value (Mean) of a Random Variable

The expected value (mean) is the long-run average value of the random variable, found by weighting each possible value by its probability: $E(X) = \mu = \sum x \cdot P(X = x)$

Example: Slot machine payouts (simplified):

Outcome	Payout (x)	P(X = x)
Cherry	2.00	0.01
Gold Bar	1.50	0.05
Neither	0.00	0.94
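
For the slot machine table, the expected payout is each payout weighted by its probability, summed. A minimal sketch, with illustrative variable names:

```python
# (payout, probability) pairs copied from the slot machine table.
payouts = [
    (2.00, 0.01),  # Cherry
    (1.50, 0.05),  # Gold Bar
    (0.00, 0.94),  # Neither
]

# The probabilities of a valid distribution must sum to 1.
total_probability = sum(p for _, p in payouts)

# E(X) = sum of x * P(X = x) over all possible values.
expected_payout = sum(x * p for x, p in payouts)

print(f"Expected payout per play: {expected_payout:.3f}")
```

The result, 0.095, is a long-run average per play. No single play ever pays exactly that amount.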

Variance and Standard Deviation of a Random Variable

The variance of a random variable is the expected value of the squared deviation from the mean: $Var(X) = \sigma^2 = \sum (x - \mu)^2 \cdot P(X = x)$

The standard deviation is the square root of the variance: $SD(X) = \sigma = \sqrt{Var(X)}$

Standard deviation measures the risk or volatility of the random variable.
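
Continuing with the simplified slot machine distribution, the variance and standard deviation of the payout can be computed as follows. This is a sketch; the variable names are illustrative.

```python
import math

# (payout, probability) pairs for the simplified slot machine.
payouts = [(2.00, 0.01), (1.50, 0.05), (0.00, 0.94)]

# Mean: E(X) = sum of x * P(X = x).
mu = sum(x * p for x, p in payouts)

# Variance: probability-weighted average of squared deviations from the mean.
variance = sum((x - mu) ** 2 * p for x, p in payouts)

# Standard deviation: square root of the variance, in payout units.
sd = math.sqrt(variance)

print(f"mean = {mu:.3f}, variance = {variance:.6f}, sd = {sd:.4f}")
```

The standard deviation (about 0.38) is roughly four times the mean payout (0.095), which reflects how uneven the payouts are: most plays pay nothing and a few pay comparatively large amounts.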

Transforming Random Variables

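Adding/subtracting a constant $c$: $E(X \pm c) = E(X) \pm c$ and $Var(X \pm c) = Var(X)$. Shifting every value moves the center but leaves the spread unchanged.
Multiplying by a constant $a$: $E(aX) = a \cdot E(X)$, $Var(aX) = a^2 \cdot Var(X)$, and $SD(aX) = |a| \cdot SD(X)$.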
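
These transformation rules can be verified exactly on a small distribution. The sketch below uses the two-coin distribution (X = number of heads) and the transformed variable Y = 3X + 2. The constants 3 and 2 and the helper functions `mean` and `variance` are hypothetical choices made for illustration.

```python
from fractions import Fraction


def mean(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X): probability-weighted squared deviation from the mean."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


# Number of heads in two fair coin tosses.
X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Transform every value: Y = a * X + c (probabilities are unchanged).
a, c = 3, 2
Y = {a * x + c: p for x, p in X.items()}

print("E(X) =", mean(X), " Var(X) =", variance(X))
print("E(Y) =", mean(Y), " Var(Y) =", variance(Y))
```

The mean of Y equals a * E(X) + c, while the variance of Y equals a^2 * Var(X): the added constant c does not affect the spread.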

Combining Independent Random Variables

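For independent random variables X and Y: $E(X \pm Y) = E(X) \pm E(Y)$
Variance always adds, even when subtracting: $Var(X \pm Y) = Var(X) + Var(Y)$, so $SD(X \pm Y) = \sqrt{Var(X) + Var(Y)}$. Standard deviations themselves do not add.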
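
That variances add even for a difference can be checked by exact enumeration. The sketch below uses two independent fair dice, an example not taken from the notes, and builds the distributions of X + Y and X - Y from the joint probabilities. The helper functions are illustrative.

```python
from fractions import Fraction
from itertools import product


def mean(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X): probability-weighted squared deviation from the mean."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def combine(dist_x, dist_y, op):
    """Distribution of op(X, Y), assuming X and Y are independent."""
    result = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        value = op(x, y)
        # Independence: the joint probability is the product px * py.
        result[value] = result.get(value, 0) + px * py
    return result


die = {face: Fraction(1, 6) for face in range(1, 7)}

total = combine(die, die, lambda x, y: x + y)       # X + Y
difference = combine(die, die, lambda x, y: x - y)  # X - Y

print("Var(die)   =", variance(die))
print("Var(X + Y) =", variance(total))
print("Var(X - Y) =", variance(difference))
```

Both the sum and the difference have variance 35/6, twice the variance of a single die, even though their means differ (7 and 0).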

Application: Portfolio Diversification

Suppose a financial portfolio holds 70% in bonds (mean return 0.04, SD 0.046) and 30% in stocks (mean return 0.115, SD 0.199). Assuming independence:

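Expected return: $E(R) = 0.7(0.04) + 0.3(0.115) = 0.0625$
Variance: $Var(R) = 0.7^2(0.046)^2 + 0.3^2(0.199)^2 \approx 0.00460$
Standard deviation: $SD(R) = \sqrt{0.00460} \approx 0.0678$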
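
The portfolio figures follow from the rules for scaling and combining independent random variables. The sketch below uses the weights, means, and standard deviations stated above; the variable names are illustrative, and independence of the two asset returns is assumed, as in the notes.

```python
import math

# Portfolio weights and asset statistics from the example above.
w_bonds, w_stocks = 0.70, 0.30
mean_bonds, sd_bonds = 0.04, 0.046
mean_stocks, sd_stocks = 0.115, 0.199

# E(aX + bY) = a * E(X) + b * E(Y)
expected_return = w_bonds * mean_bonds + w_stocks * mean_stocks

# Var(aX + bY) = a^2 * Var(X) + b^2 * Var(Y), valid under independence.
variance = (w_bonds * sd_bonds) ** 2 + (w_stocks * sd_stocks) ** 2

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

# For comparison only: a naive weighted average of the two SDs.
weighted_avg_sd = w_bonds * sd_bonds + w_stocks * sd_stocks

print(f"expected return = {expected_return:.4f}")
print(f"variance = {variance:.6f}, sd = {sd:.4f}")
print(f"weighted average of SDs = {weighted_avg_sd:.4f}")
```

The portfolio standard deviation (about 0.068) is lower than the weighted average of the two standard deviations (about 0.092), which shows the risk reduction from diversifying across independent assets.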

Law of Large Numbers (Revisited)

As the number of observations increases, the sample mean ($\bar{x}$) and sample standard deviation ($s$) converge to the population mean ($\mu$) and standard deviation ($\sigma$).

Summary Table: Key Properties of Probability

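Operation	Rule
NOT	$P(\text{not } A) = 1 - P(A)$
AND (independent)	$P(A \text{ and } B) = P(A) \times P(B)$
OR (mutually exclusive)	$P(A \text{ or } B) = P(A) + P(B)$
OR (not mutually exclusive)	$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$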

Additional info:

Pie charts can be misleading if the total does not sum to 100% or if the area principle is violated.
For business applications, understanding the spread (risk) is as important as understanding the mean (expected value).
Portfolio diversification reduces risk when assets are independent or not perfectly correlated.