Chapter 2: Displays and Summaries – STAT 201 Elementary Statistics
Displays and Summaries in Statistics
Introduction
This chapter introduces fundamental concepts for summarizing and displaying data in statistics. It covers types of variables, tables and graphs, measures of center, position, and variability, as well as the characteristics of distributions and the Empirical Rule.
Types of Variables
Categorical vs. Quantitative Variables
Categorical Variables: Variables that represent categories or groups (e.g., classification, residence type).
Quantitative Variables: Variables that represent numerical values. These can be further classified as:
Discrete Quantitative: Countable values (e.g., number of classes).
Continuous Quantitative: Measurable values that can take any value within a range (e.g., GPA).
Tables and Graphs
Tables for Categorical Variables
Tables are used to summarize categorical data by showing the frequency of each category.
Frequency Table: Shows the count of each category.
Contingency Table: Used when working with two categorical variables, showing the frequency for each combination.
| Classification | Frequency |
|---|---|
| FR | 4 |
| JR | 5 |
| SO | 3 |
| SR | 6 |
| Residence | FR | JR | SO | SR | Total |
|---|---|---|---|---|---|
| OFF | 1 | 3 | 1 | 4 | 9 |
| ON | 3 | 2 | 2 | 2 | 9 |
| Total | 4 | 5 | 3 | 6 | 18 |
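As a rough sketch, both kinds of table can be built by counting raw records. The code below expands the contingency-table counts into one record per student (the raw records themselves are not in these notes, so this expansion is an assumption) and then tallies them with Python's `collections.Counter`. The variable names are illustrative only.

```python
from collections import Counter

# Cell counts copied from the contingency table: (residence, classification) -> frequency.
cell_counts = {
    ("OFF", "FR"): 1, ("OFF", "JR"): 3, ("OFF", "SO"): 1, ("OFF", "SR"): 4,
    ("ON", "FR"): 3, ("ON", "JR"): 2, ("ON", "SO"): 2, ("ON", "SR"): 2,
}

# Expand the counts into one (residence, classification) record per student.
records = [pair for pair, count in cell_counts.items() for _ in range(count)]

# Frequency table: count each category of a single variable.
classification_freq = Counter(classification for _, classification in records)
residence_freq = Counter(residence for residence, _ in records)

# Contingency table: count each combination of the two variables.
contingency = Counter(records)

print(dict(classification_freq))
print(dict(residence_freq))
print(contingency[("OFF", "SR")])
```

The single-variable counts are the row and column totals of the contingency table, which is a quick way to check a table for consistency.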
Graphs for Categorical Variables
Pie Chart: Displays proportions of categories as slices of a circle.
Bar Graph: Uses bars to show the frequency of each category. Bars do not touch and the axis is not a number line.
Pareto Chart: A bar graph with categories ordered by descending frequency.
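The ordering step behind a Pareto chart can be sketched in a few lines. This example sorts the classification frequencies from the frequency table in descending order and prints a simple text bar for each; the text rendering is only an illustration, not a replacement for a real chart.

```python
# Frequencies from the classification frequency table.
frequencies = {"FR": 4, "JR": 5, "SO": 3, "SR": 6}

# A Pareto chart orders the bars from most frequent to least frequent.
pareto_order = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

for category, count in pareto_order:
    print(f"{category:<3} {'#' * count} ({count})")
```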
Tables for Quantitative Variables
Frequency tables for quantitative variables summarize how often each value or range of values occurs.
| Number of Classes Fall 2022 | Frequency | Relative Frequency | Percent of Total | Cumulative Percent of Total |
|---|---|---|---|---|
| 2 | 1 | 0.0556 | 5.56 | 5.56 |
| 3 | 2 | 0.1667 | 16.67 | 22.22 |
| 4 | 7 | 0.3889 | 38.89 | 61.11 |
| 5 | 8 | 0.4444 | 44.44 | 100 |
Frequency: Number of values in the range.
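Relative Frequency: Proportion of values for a given range ($\text{frequency} / n$, where $n$ is the total number of observations).
Percent of Total: Relative frequency as a percentage ($\text{relative frequency} \times 100$).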
Cumulative Percent: Running total of percentages up to a given value.
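The four columns can be computed directly from raw values. The sketch below uses a small hypothetical dataset of 10 students (made up for illustration, not taken from the course data) so that the arithmetic is easy to check by hand.

```python
from collections import Counter

# Hypothetical data (not from the notes): number of classes taken by 10 students.
classes_taken = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5]

n = len(classes_taken)
frequency = Counter(classes_taken)

rows = []
cumulative_count = 0
for value in sorted(frequency):
    count = frequency[value]
    cumulative_count += count
    rows.append({
        "value": value,
        "frequency": count,
        "relative_frequency": count / n,                   # frequency / n
        "percent": 100 * count / n,                        # relative frequency x 100
        "cumulative_percent": 100 * cumulative_count / n,  # running total of percents
    })

for row in rows:
    print(row)
```

Two checks that apply to any such table: the relative frequencies add up to 1, and the last cumulative percent is 100.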
Graphs for Quantitative Variables
Histogram: The most common graph for quantitative data; bars touch and the horizontal axis is a number line.
Dotplot: Works well for small datasets with limited range.
Stemplot (Stem-and-Leaf Plot): Displays data where each value is split into a "stem" and "leaf"; useful for small datasets.
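To make the stem/leaf split concrete, here is a minimal sketch that builds a stemplot for the dataset 6, 8, 14, 18, 23, 30 used elsewhere in these notes. It assumes non-negative whole numbers below 100, with the tens digit as the stem and the ones digit as the leaf.

```python
from collections import defaultdict

data = [6, 8, 14, 18, 23, 30]

# Split each value into a stem (tens digit) and a leaf (ones digit).
stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

# List every stem from smallest to largest, even if it has no leaves.
stemplot_lines = []
for stem in range(min(stems), max(stems) + 1):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    stemplot_lines.append(f"{stem} | {leaves}")

print("\n".join(stemplot_lines))
```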
Outliers and Resistance
Definition of Outlier
An outlier is a value that lies an abnormal distance from other values in a dataset.
Outliers can be unusually large or small compared to the rest of the data.
Resistance of Measurements
Non-resistant: Measurements greatly affected by outliers (e.g., mean).
Resistant: Measurements not greatly affected by outliers (e.g., median, mode).
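A quick way to see resistance is to change one value into an outlier and compare the summaries. In this sketch the dataset 6, 8, 14, 18, 23, 30 is altered by replacing 30 with 300; the value 300 is a hypothetical outlier chosen only for illustration.

```python
import statistics

original = [6, 8, 14, 18, 23, 30]
with_outlier = [6, 8, 14, 18, 23, 300]  # hypothetical: 30 replaced by an extreme value

# How much does each measure move when the outlier is introduced?
mean_shift = statistics.mean(with_outlier) - statistics.mean(original)
median_shift = statistics.median(with_outlier) - statistics.median(original)

print("mean shift:", mean_shift)
print("median shift:", median_shift)
```

The mean moves from 16.5 to 61.5 while the median stays at 16, which is what "non-resistant" and "resistant" mean in practice.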
Measures of Center
Mean, Median, and Mode
Mean: The arithmetic average of the dataset.
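Formula: $\bar{x} = \frac{\sum x_i}{n}$, where $n$ is the number of values.
Median: The middle value when the data are ordered. If $n$ is even, average the two middle values.
Mode: The most common value in the dataset.
Example: For the dataset 6, 8, 14, 18, 23, 30, the mean is $99/6 = 16.5$, the median is $(14 + 18)/2 = 16$, and there is no mode because every value occurs exactly once.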
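The three measures of center for this dataset can be checked with a short script. This is a sketch using Python's standard library; treating a dataset with no repeated values as having no mode is a common textbook convention and is assumed here.

```python
import statistics
from collections import Counter

data = [6, 8, 14, 18, 23, 30]

# Mean: sum of the values divided by how many there are.
mean = sum(data) / len(data)

# Median: middle value of the ordered data (average of the two middle values when n is even).
median = statistics.median(data)

# Mode: the most common value(s); reported only if some value actually repeats.
counts = Counter(data)
highest = max(counts.values())
modes = [value for value, count in counts.items() if count == highest] if highest > 1 else []

print(mean, median, modes)
```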
Measures of Position
Quartiles and Five Number Summary
Quartiles: Divide data into four equal parts.
First Quartile (Q1): 25% of data falls below this value.
Second Quartile (Q2): Median; 50% of data falls below this value.
Third Quartile (Q3): 75% of data falls below this value.
Five Number Summary: Minimum, Q1, Median (Q2), Q3, Maximum.
For the dataset 6, 8, 14, 18, 23, 30:
| Minimum | Q1 | Median (Q2) | Q3 | Maximum |
|---|---|---|---|---|
| 6 | 8 | 16 | 23 | 30 |
Box-and-Whisker Plot: Visualizes the five number summary, showing the spread and center of the data.
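One way to compute the five number summary is sketched below. It assumes the "median of each half" convention for quartiles, which is common in introductory courses; software packages often use other quartile conventions and can give slightly different Q1 and Q3 values. The function name is illustrative.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention.

    Assumes at least two values. For an odd number of values the overall
    median is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (
        ordered[0],
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
        ordered[-1],
    )

summary = five_number_summary([6, 8, 14, 18, 23, 30])
print(summary)
```

For the dataset 6, 8, 14, 18, 23, 30 the lower half is 6, 8, 14 and the upper half is 18, 23, 30, so Q1 is 8 and Q3 is 23.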
Measures of Variability
Range, Interquartile Range, and Standard Deviation
Range: Difference between maximum and minimum values.
Formula: $\text{Range} = \text{Maximum} - \text{Minimum}$
Interquartile Range (IQR): Difference between Q3 and Q1.
Formula: $\text{IQR} = Q_3 - Q_1$
Standard Deviation: Measures the typical distance of data points from the mean.
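Sample Standard Deviation Formula: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Example: For the dataset 6, 8, 14, 18, 23, 30, calculate the mean ($\bar{x} = 16.5$), then the squared deviations, sum them (415.5), divide by $n - 1 = 5$, and take the square root, giving $s \approx 9.12$.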
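The steps of the standard deviation calculation, along with the range and IQR, can be followed in code. This sketch works through the dataset 6, 8, 14, 18, 23, 30 and assumes the median-of-halves convention for the quartiles.

```python
import math
import statistics

data = sorted([6, 8, 14, 18, 23, 30])
n = len(data)

# Range: maximum minus minimum.
data_range = data[-1] - data[0]

# IQR: Q3 minus Q1, with quartiles taken as medians of the lower and upper halves
# (n is even here, so the halves split cleanly).
q1 = statistics.median(data[: n // 2])
q3 = statistics.median(data[n // 2:])
iqr = q3 - q1

# Sample standard deviation, step by step.
mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]
sample_variance = sum(squared_deviations) / (n - 1)  # divide by n - 1, not n
sample_sd = math.sqrt(sample_variance)

print(data_range, iqr, round(sample_sd, 2))
```

The step-by-step result agrees with `statistics.stdev`, which also divides by $n - 1$.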
Characteristics of Distributions
Shape of Distributions
Unimodal: One peak.
Bimodal: Two peaks.
Symmetric: Both sides are mirror images.
Skewed Left (Negatively Skewed): Majority of data is in the upper portion; tail is to the left.
Skewed Right (Positively Skewed): Majority of data is in the lower portion; tail is to the right.
Mean and Median in Skewed Distributions:
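Left Skewed: Typically $\text{mean} < \text{median}$, because the long left tail pulls the mean down.
Right Skewed: Typically $\text{mean} > \text{median}$, because the long right tail pulls the mean up.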
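The usual relationship between the mean and median in skewed data can be illustrated with a small example. Both datasets below are hypothetical and chosen only to show the pattern; the left-skewed set is the mirror image of the right-skewed one.

```python
import statistics

# Hypothetical right-skewed data: most values are small, with one large value in the right tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]

# Mirror image of the data above, giving a long tail on the left instead.
left_skewed = sorted(21 - x for x in right_skewed)

right_mean = statistics.mean(right_skewed)
right_median = statistics.median(right_skewed)
left_mean = statistics.mean(left_skewed)
left_median = statistics.median(left_skewed)

print("right skewed:", right_mean, right_median)
print("left skewed:", left_mean, left_median)
```

In each case the mean is pulled toward the tail, while the median stays near the bulk of the data.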
Z-Scores and Outliers
Z-Score Calculation
Z-Score: Indicates how many standard deviations a value is from the mean.
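Formula: $z = \frac{x - \bar{x}}{s}$
A value with a z-score between -3 and 3 is not considered an outlier; a value with a z-score outside this range is considered unusual.
Example: If the mean is 22.25 and the standard deviation is 32.15, a value of 6 yields $z = \frac{6 - 22.25}{32.15} \approx -0.51$.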
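The z-score calculation and the ±3 check can be sketched as two small functions. The function names are illustrative, and the second value tested (130) is a hypothetical one added to show a case that falls outside the range.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / standard_deviation

def is_unusual(z, threshold=3.0):
    """Flag z-scores outside the interval from -threshold to threshold."""
    return abs(z) > threshold

# Example from the notes: mean 22.25, standard deviation 32.15, value 6.
z = z_score(6, 22.25, 32.15)
print(round(z, 2), is_unusual(z))

# Hypothetical value far above the mean, for comparison.
z_high = z_score(130, 22.25, 32.15)
print(round(z_high, 2), is_unusual(z_high))
```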
The Empirical Rule
Empirical (68-95-99.7) Rule
For symmetric, bell-shaped (normal) distributions:
Approximately 68% of data falls within one standard deviation of the mean.
Approximately 95% within two standard deviations.
Approximately 99.7% within three standard deviations.
Application: If the mean is \$100 and the standard deviation is \$50:
68% of data falls between \$50 and \$150.
95% falls between \$0 and \$200.
99.7% falls between -\$50 and \$250.
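The interval endpoints follow from adding and subtracting one, two, and three standard deviations from the mean. The sketch below assumes a mean of 100 and a standard deviation of 50; the standard deviation is an assumption chosen to be consistent with the endpoints 50, 0, and 250 in the example. The function name is illustrative.

```python
def empirical_rule_intervals(mean, standard_deviation):
    """Return {k: (low, high)} for k = 1, 2, 3 standard deviations from the mean."""
    return {
        k: (mean - k * standard_deviation, mean + k * standard_deviation)
        for k in (1, 2, 3)
    }

# Approximate share of data inside each interval for a bell-shaped distribution.
coverage = {1: "68%", 2: "95%", 3: "99.7%"}

intervals = empirical_rule_intervals(mean=100, standard_deviation=50)
for k, (low, high) in intervals.items():
    print(f"About {coverage[k]} of the data falls between {low} and {high}")
```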
Summary of Data Display Methods
Graphs
Categorical Variables: Pie chart, bar graph, Pareto chart.
Quantitative Variables: Histogram, dot plot, stem-and-leaf plot.
Tables
Categorical Variables: Frequency table, contingency table.
Quantitative Variables: Frequency table.
Summary Statistics
Measures of center: mean, median, mode.
Measures of position: quartiles, five number summary.
Measures of variability: range, IQR, standard deviation.