
Statistics for Business: Study Guide on Data Visualization, Distributions, Percentiles, and Association

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Data Visualization and Misleading Graphs

Bar Charts and Value Plots

Data visualization is a key tool in statistics for summarizing and communicating information. Bar charts and value plots are common methods for displaying categorical and quantitative data.

  • Bar Chart: Represents data with rectangular bars, where the length of each bar is proportional to the value it represents.

  • Value Plot: Plots individual data points, often used for small datasets or to show exact values.

  • Misleading Graphs: Graphs can be misleading if the axes are manipulated or if the visual representation exaggerates differences. For example, truncating the y-axis can make small differences appear large.

Example: A bar chart comparing the number of murders in the U.S. for 2018 and 2019 may exaggerate the difference if the y-axis does not start at zero.
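A quick calculation shows how a truncated axis distorts a comparison. The sketch below compares the true ratio of two values with the ratio of the drawn bar lengths when the y-axis starts at a nonzero baseline. The helper name `drawn_ratio`, the two values and the baseline of 95 are hypothetical illustrations, not the actual 2018 and 2019 murder counts.

```python
def drawn_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar lengths for values a and b when the y-axis starts at `baseline`.

    With baseline = 0 this equals the true ratio b / a.
    """
    if baseline >= min(a, b):
        raise ValueError("baseline must lie below both values")
    return (b - baseline) / (a - baseline)


# Hypothetical values for two years (illustration only, not real data).
year1, year2 = 100.0, 110.0

true_ratio = drawn_ratio(year1, year2)                # axis starts at zero
truncated = drawn_ratio(year1, year2, baseline=95.0)  # axis starts at 95
print(f"true ratio: {true_ratio:.2f}, drawn ratio with truncated axis: {truncated:.2f}")
```

Here a 10% increase is drawn as a bar three times as tall as the other, because only the parts of the bars above 95 are visible.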

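Formula for Percent Change: $\text{Percent change} = \dfrac{\text{New value} - \text{Old value}}{\text{Old value}} \times 100\%$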
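Percent change divides the difference between two values by the starting (old) value. A minimal sketch follows; the function name `percent_change` and the sample values are hypothetical.

```python
def percent_change(old: float, new: float) -> float:
    """Percent change from `old` to `new`: (new - old) / old * 100."""
    if old == 0:
        raise ZeroDivisionError("percent change is undefined when the old value is 0")
    return (new - old) / old * 100


# Hypothetical values for two periods (illustration only).
print(f"{percent_change(100, 110):.1f}%")  # 10.0%
print(f"{percent_change(100, 90):.1f}%")   # -10.0%
```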

Additional info: Always check axis scales and context when interpreting graphs.

Histograms and Density Estimation

Frequency and Density Histograms

Histograms are used to display the distribution of a dataset. There are two main types:

  • Frequency Histogram: Shows the count of data points within each bin.

  • Density Histogram: Adjusts the height of bars so that the total area equals 1, representing probability density.

Example: With bins (0, 0.1), (0.1, 0.2), ... of width 0.1 and a sample of size 30, each bar's height is $\dfrac{\text{count}}{30 \times 0.1} = \dfrac{\text{count}}{3}$. The bar areas therefore sum to 1.

Area Under Density Histogram: The total area under the histogram equals 1.
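Each bar's height is count / (n × bin width), which makes the total area equal 1. The sketch below shows this under a few assumptions: the bins are half-open intervals [left, right), the 30 values are a hypothetical random sample rather than real data, and the helper name `density_heights` is made up for illustration.

```python
import random
from bisect import bisect_right


def density_heights(data, edges):
    """Bar heights of a density histogram: count / (n * bin width) for each bin.

    Bins are half-open [edges[k], edges[k+1]); values outside the edges are not
    counted, so in that case the total area falls below 1.
    """
    n = len(data)
    counts = [0] * (len(edges) - 1)
    for x in data:
        k = bisect_right(edges, x) - 1  # index of the bin containing x
        if 0 <= k < len(counts):
            counts[k] += 1
    return [c / (n * (edges[k + 1] - edges[k])) for k, c in enumerate(counts)]


# Hypothetical sample of 30 values in [0, 1), with bins of width 0.1.
rng = random.Random(0)
data = [rng.random() for _ in range(30)]
edges = [i / 10 for i in range(11)]

heights = density_heights(data, edges)
total_area = sum(h * (edges[k + 1] - edges[k]) for k, h in enumerate(heights))
print(f"total area = {total_area:.6f}")
```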

Estimating Probabilities: To estimate the probability that a value falls within a range, sum the areas of the bars covering that range.
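Summing bar areas can be written as a short function. This sketch goes slightly beyond the text: when a range covers only part of a bar, it counts the overlapping fraction of that bar's area, which assumes values are spread evenly within each bin. The heights, edges and the name `estimate_probability` are hypothetical.

```python
def estimate_probability(heights, edges, lo, hi):
    """Approximate P(lo <= X < hi) from a density histogram by summing bar area in [lo, hi).

    Assumes values are spread evenly within each bin, so a partly covered bar
    contributes height * (width of the overlap).
    """
    prob = 0.0
    for k, h in enumerate(heights):
        left, right = edges[k], edges[k + 1]
        overlap = max(0.0, min(right, hi) - max(left, lo))
        prob += h * overlap
    return prob


# Hypothetical density histogram: bins of width 0.5 on [0, 2); areas 0.1, 0.3, 0.4, 0.2.
edges = [0.0, 0.5, 1.0, 1.5, 2.0]
heights = [0.2, 0.6, 0.8, 0.4]

print(f"P(0.5 <= X < 1.5) is about {estimate_probability(heights, edges, 0.5, 1.5):.2f}")
```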

Additional info: Density histograms are useful for approximating probabilities and visualizing continuous distributions.

Normal Distribution and Percentiles

Properties and Applications

The normal distribution is a bell-shaped curve commonly used in business statistics to model continuous data.

  • Mean ($\mu$): The average value.

  • Standard Deviation ($\sigma$): Measures the spread of the data.

  • Percentile: The value below which a given percentage of observations fall.

Standard Normal Distribution: The normal distribution with mean 0 and standard deviation 1, written $Z \sim N(0, 1)$.

Z-Score Formula: $z = \dfrac{x - \mu}{\sigma}$

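Percentile Calculation: Use the cumulative distribution function (CDF) to find the percentile. The $p$th percentile $x_p$ satisfies $P(X \le x_p) = p$. Standardizing gives $x_p = \mu + z_p \sigma$, where $z_p$ is the $p$th percentile of the standard normal distribution.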

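Example: Suppose birthweights are normally distributed with mean 7 lbs and standard deviation 1.1 lbs. The 10th percentile is found by solving $P(X \le x) = 0.10$, which gives $x = 7 + z_{0.10}(1.1) \approx 7 + (-1.2816)(1.1) \approx 5.59$ lbs.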
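Python's standard library provides the normal CDF and inverse CDF through `statistics.NormalDist`. The sketch below uses it to find the birthweight percentile in two ways: directly with the inverse CDF, and by standardizing with $x = \mu + z\sigma$. Only the mean and standard deviation come from the example. The variable names are illustrative.

```python
from statistics import NormalDist

# Birthweights modeled as normal with mean 7 lbs and standard deviation 1.1 lbs.
birthweight = NormalDist(mu=7, sigma=1.1)

# 10th percentile: the x with P(X <= x) = 0.10, via the inverse CDF.
p10 = birthweight.inv_cdf(0.10)

# Same answer via standardization: x = mu + z * sigma, z = standard normal 10th percentile.
z10 = NormalDist().inv_cdf(0.10)
p10_via_z = 7 + z10 * 1.1

print(f"10th percentile = {p10:.2f} lbs (z = {z10:.4f})")
```

Going the other way, `birthweight.cdf(x)` returns the percentile rank of a given weight `x`.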

Additional info: Percentiles are used for decision-making, such as determining bonuses or terminations based on performance scores.
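Percentile cutoffs work the same way for decisions such as bonuses or terminations. The sketch below assumes performance scores are normal with a mean of 70 and a standard deviation of 8. These parameters, and the "top 10%" and "bottom 5%" policies, are hypothetical.

```python
from statistics import NormalDist

# Hypothetical: performance scores modeled as normal with mean 70 and standard deviation 8.
scores = NormalDist(mu=70, sigma=8)

bonus_cutoff = scores.inv_cdf(0.90)   # top 10% (above the 90th percentile) earn a bonus
bottom_cutoff = scores.inv_cdf(0.05)  # bottom 5% (below the 5th percentile) are flagged

print(f"bonus above {bonus_cutoff:.1f}, flagged below {bottom_cutoff:.1f}")
```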

Five-Number Summary and Interquartile Range (IQR)

Descriptive Statistics

The five-number summary provides a quick overview of the distribution of a dataset:

  • Minimum

  • First Quartile (Q1)

  • Median (Q2)

  • Third Quartile (Q3)

  • Maximum

Interquartile Range (IQR): $\mathrm{IQR} = Q_3 - Q_1$

Outlier Detection: Values below $Q_1 - 1.5 \times \mathrm{IQR}$ or above $Q_3 + 1.5 \times \mathrm{IQR}$ are considered outliers.
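The five-number summary, the IQR and the 1.5 × IQR outlier rule can be computed with the standard library. Quartile conventions differ between textbooks and software. This sketch assumes the "inclusive" method of `statistics.quantiles`, which interpolates the same way as Excel's QUARTILE.INC, so hand calculations may give slightly different quartiles. The data set and function names are hypothetical.

```python
from statistics import quantiles


def five_number_summary(data):
    """(min, Q1, median, Q3, max), with quartiles from the 'inclusive' method."""
    xs = sorted(data)
    q1, q2, q3 = quantiles(xs, n=4, method="inclusive")
    return xs[0], q1, q2, q3, xs[-1]


def iqr_outliers(data):
    """Return the IQR and the values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return iqr, [x for x in data if x < lower or x > upper]


# Hypothetical data set.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = five_number_summary(data)
iqr, outliers = iqr_outliers(data)
print(summary, iqr, outliers)
```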

Additional info: The five-number summary is useful for summarizing data and identifying outliers.

Association and Contingency Tables

Two-Way Tables and Conditional Probability

Contingency tables summarize the relationship between two categorical variables. They are used to compute conditional probabilities and assess association.

  • Conditional Probability: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$

  • Rate Ratio: Compares the probability of an event under two conditions.

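  • Lift: Measures how much knowing that one event occurred changes the probability of another event, relative to that event's overall probability. A lift above 1 means the event becomes more likely; a lift below 1 means it becomes less likely.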

Example Table (counts by response):

Treatment   Success   Failure
Drug A           48        42
Drug B          158        42

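Rate Ratio Formula: $\text{Rate ratio} = \dfrac{P(\text{Success} \mid \text{Drug A})}{P(\text{Success} \mid \text{Drug B})}$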

Lift Formula: $\text{Lift} = \dfrac{P(A \mid B)}{P(A)} = \dfrac{P(A \cap B)}{P(A)\,P(B)}$
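Applied to the Drug A / Drug B counts from the example table, these formulas give the conditional success rates, the rate ratio (Drug A relative to Drug B) and the lift of Success for each drug. The sketch computes point estimates only. Whether a difference from 1 is statistically significant would require a separate test, which is not shown. The dictionary layout and helper name are illustrative choices.

```python
# Counts from the example table: (successes, failures) for each drug.
table = {
    "Drug A": (48, 42),
    "Drug B": (158, 42),
}


def success_rate(drug: str) -> float:
    """P(Success | drug) = successes / row total."""
    s, f = table[drug]
    return s / (s + f)


total = sum(s + f for s, f in table.values())
p_success = sum(s for s, _ in table.values()) / total  # overall P(Success)

rate_ratio = success_rate("Drug A") / success_rate("Drug B")
lift_a = success_rate("Drug A") / p_success  # lift of Success given Drug A
lift_b = success_rate("Drug B") / p_success  # lift of Success given Drug B

print(f"P(S|A) = {success_rate('Drug A'):.3f}, P(S|B) = {success_rate('Drug B'):.3f}")
print(f"rate ratio = {rate_ratio:.3f}, lift A = {lift_a:.3f}, lift B = {lift_b:.3f}")
```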

Additional info: A rate ratio or lift equal to 1 indicates no association. Association is present when the value differs substantially from 1.

Standardization and Z-Scores

Handicapping and Fairness

Standardization transforms scores to a common scale, allowing fair comparison across different groups.

  • Z-Score: Measures how many standard deviations a value is from the mean.

  • Handicapping: Adjusts scores to account for differences in ability or conditions.

Z-Score Formula: $z = \dfrac{x - \mu}{\sigma}$

Application: In tournaments, z-scores can be used to compare performance fairly across players with different averages and standard deviations.
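The sketch below standardizes each player's score against that player's own mean and standard deviation. Two players who both beat their averages by the same number of points can then be compared fairly. The players, scores and parameters are hypothetical, and the sketch assumes higher scores are better; for a sport like golf, the sign would be reversed.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical players: (today's score, player's usual mean, player's usual standard deviation).
players = {
    "Player 1": (210, 180, 15),
    "Player 2": (150, 120, 20),
}

zs = {name: z_score(*stats) for name, stats in players.items()}
best = max(zs, key=zs.get)  # highest z-score = most exceptional relative to own record
print(zs, "->", best)
```

Both players scored 30 points above their averages. Player 1's performance is more exceptional, because 30 points is 2 standard deviations for Player 1 but only 1.5 for Player 2.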

Additional info: Standardization is essential for fairness in comparisons and for combining data from different sources.

Pivot Tables and Data Summarization

Excel Tools for Business Statistics

Pivot tables are powerful tools for summarizing and analyzing large datasets. They can be used to create frequency tables, compute conditional distributions, and visualize associations.

  • Two-Way Table: Shows counts or frequencies for combinations of two categorical variables.

  • Conditional Distribution: Distribution of one variable given the value of another.

Example: Use a pivot table to summarize the purchase of Beer and Cauliflower, and compute conditional probabilities.
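Outside Excel, the same two-way summary can be built by counting pairs. The sketch below is a Python stand-in for a pivot table, not Excel's own tool. It uses hypothetical baskets recording whether Beer and Cauliflower were bought, and it reports counts and the conditional probability of Cauliflower given each Beer value (a pivot table's row percentages). The data and the helper name are made up for illustration.

```python
from collections import Counter

# Hypothetical baskets: (bought_beer, bought_cauliflower). Illustration only.
baskets = [
    (True, False), (True, True), (False, True), (False, False), (True, True),
    (False, False), (True, True), (False, True), (False, False), (True, False),
]

# Two-way table of counts, analogous to a pivot table with Beer as rows and Cauliflower as columns.
counts = Counter(baskets)


def p_cauliflower_given_beer(beer: bool) -> float:
    """Conditional probability P(Cauliflower | Beer = beer), i.e. a row percentage."""
    row_total = counts[(beer, True)] + counts[(beer, False)]
    return counts[(beer, True)] / row_total


for beer in (True, False):
    print(
        f"Beer={str(beer):<5}  Cauliflower: yes={counts[(beer, True)]}, no={counts[(beer, False)]}, "
        f"P(Cauliflower | row) = {p_cauliflower_given_beer(beer):.2f}"
    )
```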

Additional info: Pivot tables are widely used in business analytics for data exploration and reporting.

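Summary Table: Key Statistical Concepts

Concept          Definition                         Formula
Percent Change   Change in value as a percentage    $\dfrac{\text{New value} - \text{Old value}}{\text{Old value}} \times 100\%$
Z-Score          Standardized value                 $z = \dfrac{x - \mu}{\sigma}$
IQR              Interquartile Range                $\mathrm{IQR} = Q_3 - Q_1$
Rate Ratio       Relative probability               $\dfrac{P(A \mid B_1)}{P(A \mid B_2)}$ for two conditions $B_1$, $B_2$
Lift             Association measure                $\dfrac{P(A \mid B)}{P(A)} = \dfrac{P(A \cap B)}{P(A)\,P(B)}$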
Additional info: These formulas are foundational for business statistics and data analysis.
