Chapter 2: Displays and Summaries – STAT 201 Elementary Statistics

Displays and Summaries in Statistics

Introduction

This chapter introduces fundamental concepts for summarizing and displaying data. It covers types of variables, tables and graphs, outliers and resistance, measures of center, position, and variability, the shapes of distributions, z-scores, and the Empirical Rule.

Types of Variables

Categorical vs. Quantitative Variables

Categorical Variables: Variables that represent categories or groups (e.g., classification, residence type).
Quantitative Variables: Variables that represent numerical values. These can be further classified as:
- Discrete Quantitative: Countable values (e.g., number of classes).
- Continuous Quantitative: Measurable values that can take any value within a range (e.g., GPA).

Tables and Graphs

Tables for Categorical Variables

Tables are used to summarize categorical data by showing the frequency of each category.

Frequency Table: Shows the count of each category.
Contingency Table: Used when working with two categorical variables, showing the frequency for each combination.

Classification	Frequency
FR	4
JR	5
SO	3
SR	6

	FR	JR	SO	SR	Total
OFF	1	3	1	4	9
ON	3	2	2	2	9
Total	4	5	3	6	18
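Both tables are produced by counting. Here is a minimal Python sketch. It assumes each student is stored as a hypothetical `(residence, classification)` pair. Because the individual responses are not given in the notes, the records are rebuilt from the contingency-table counts.

```python
from collections import Counter

# Hypothetical student records rebuilt from the contingency-table counts,
# stored as (residence, classification) pairs.
counts_by_cell = {
    ("OFF", "FR"): 1, ("OFF", "JR"): 3, ("OFF", "SO"): 1, ("OFF", "SR"): 4,
    ("ON", "FR"): 3, ("ON", "JR"): 2, ("ON", "SO"): 2, ("ON", "SR"): 2,
}
records = [cell for cell, k in counts_by_cell.items() for _ in range(k)]

# Frequency table: count each classification (one categorical variable).
frequency = Counter(cls for _, cls in records)

# Contingency table: count each (residence, classification) combination.
contingency = Counter(records)

classes = sorted(frequency)                       # FR, JR, SO, SR
residences = sorted({res for res, _ in records})  # OFF, ON

# Print the contingency table with row and column totals.
print("\t" + "\t".join(classes) + "\tTotal")
for res in residences:
    row = [contingency[(res, cls)] for cls in classes]
    print(res + "\t" + "\t".join(map(str, row)) + f"\t{sum(row)}")
print("Total\t" + "\t".join(str(frequency[c]) for c in classes) + f"\t{len(records)}")
```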

Graphs for Categorical Variables

Pie Chart: Displays proportions of categories as slices of a circle.
Bar Graph: Uses bars to show the frequency of each category. The bars are separated by gaps, and the category axis is not a number line.
Pareto Chart: A bar graph with categories ordered by descending frequency.
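Ordering the bars of a Pareto chart and sizing the slices of a pie chart are both simple computations on the frequencies. Here is a minimal Python sketch using the classification counts from the frequency table above. It prints text instead of drawing the graphs, which would normally be done with a plotting library.

```python
# Classification frequencies from the frequency table.
frequency = {"FR": 4, "JR": 5, "SO": 3, "SR": 6}
n = sum(frequency.values())  # 18 students

# Pareto chart: order the categories by descending frequency.
pareto_order = sorted(frequency, key=frequency.get, reverse=True)

# Pie chart: each slice's angle is its proportion of the full 360 degrees.
slice_degrees = {cat: 360 * f / n for cat, f in frequency.items()}

for cat in pareto_order:
    print(f"{cat}: {frequency[cat]} ({slice_degrees[cat]:.0f} degrees of the pie)")
```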

Tables for Quantitative Variables

Frequency tables for quantitative variables summarize how often each value or range of values occurs.

Number of Classes (Fall 2022)	Frequency	Relative Frequency	Percent of Total	Cumulative Percent of Total
2	1	0.0556	5.56	5.56
3	2	0.1111	11.11	16.67
4	7	0.3889	38.89	55.56
5	8	0.4444	44.44	100.00

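Frequency: Number of observations with a given value (or in a given range).
Relative Frequency: Proportion of observations with a given value: $f/n$, where $f$ is the frequency and $n$ is the total number of observations.
Percent of Total: Relative frequency expressed as a percentage: $(f/n) \times 100$.
Cumulative Percent: Running total of the percents up to and including a given value.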
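Each column of the table follows directly from the frequencies. Here is a minimal Python sketch. It assumes the 18 individual values can be expanded from the frequency column, since the raw responses are not listed in the notes.

```python
from collections import Counter

# Number of classes for the 18 students, expanded from the frequency column
# (1 student took 2 classes, 2 took 3, 7 took 4, 8 took 5).
data = [2] * 1 + [3] * 2 + [4] * 7 + [5] * 8
n = len(data)

freq = Counter(data)
table = []
cumulative = 0.0
for value in sorted(freq):
    rel = freq[value] / n   # relative frequency f / n
    pct = rel * 100         # percent of total
    cumulative += pct       # running total of percents
    table.append((value, freq[value], round(rel, 4), round(pct, 2), round(cumulative, 2)))

for row in table:
    print(*row, sep="\t")
```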

Graphs for Quantitative Variables

Histogram: The most common graph for quantitative data. Adjacent bars touch because the horizontal axis is a number line.
Dotplot: Works well for small datasets with limited range.
Stemplot (Stem-and-Leaf Plot): Displays data where each value is split into a "stem" and "leaf"; useful for small datasets.
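A stemplot splits each value into a stem and a leaf. The minimal Python sketch below assumes whole-number data and uses the tens digit as the stem and the ones digit as the leaf (other stem choices are possible). It plots the dataset 6, 8, 14, 18, 23, 30 used later in these notes.

```python
from collections import defaultdict

def stemplot(values):
    """Return stem-and-leaf lines: tens digit as stem, ones digit as leaf.
    Assumes non-negative whole numbers."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the data remain visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems[stem])
        lines.append(f"{stem} | {leaves}")
    return lines

for line in stemplot([6, 8, 14, 18, 23, 30]):
    print(line)
# 0 | 6 8
# 1 | 4 8
# 2 | 3
# 3 | 0
```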

Outliers and Resistance

Definition of Outlier

An outlier is a value that lies an abnormal distance from other values in a dataset.
Outliers can be unusually large or small compared to the rest of the data.

Resistance of Measurements

Non-resistant: Measures greatly affected by outliers (e.g., mean, standard deviation, range).
Resistant: Measures not greatly affected by outliers (e.g., median, IQR, mode).
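The effect of an outlier is easy to see numerically. In this Python sketch, the value 300 is a hypothetical outlier substituted for 30 in the dataset 6, 8, 14, 18, 23, 30. The mean jumps, but the median does not move.

```python
from statistics import mean, median

data = [6, 8, 14, 18, 23, 30]
# Hypothetical: replace the largest value with an extreme outlier.
with_outlier = [6, 8, 14, 18, 23, 300]

print(mean(data), median(data))                  # 16.5 16.0
print(mean(with_outlier), median(with_outlier))  # 61.5 16.0
```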

Measures of Center

Mean, Median, and Mode

Mean: The arithmetic average of the dataset.
- Formula: $\bar{x} = \frac{\sum x_i}{n}$, where $n$ is the number of observations.
Median: The middle value when the data are ordered. If $n$ is even, average the two middle values.
Mode: The most common value in the dataset.

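Example: For the dataset 6, 8, 14, 18, 23, 30 ($n = 6$), the mean is $\bar{x} = 99/6 = 16.5$. The median is the average of the two middle values, $(14 + 18)/2 = 16$. Every value occurs once, so this dataset has no mode.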
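The three measures can be checked in a few lines. This minimal Python sketch follows the definitions above directly rather than calling a library. Returning an empty list when no value repeats is a convention chosen for this sketch.

```python
from collections import Counter

def summarize_center(values):
    """Return (mean, median, modes) for a list of numbers.
    modes is an empty list when no value repeats (no mode)."""
    n = len(values)
    ordered = sorted(values)
    mean = sum(ordered) / n
    mid = n // 2
    if n % 2 == 1:
        median = ordered[mid]
    else:
        # Even n: average the two middle values.
        median = (ordered[mid - 1] + ordered[mid]) / 2
    counts = Counter(ordered)
    top = max(counts.values())
    modes = [v for v, c in counts.items() if c == top] if top > 1 else []
    return mean, median, modes

print(summarize_center([6, 8, 14, 18, 23, 30]))  # (16.5, 16.0, [])
```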

Measures of Position

Quartiles and Five Number Summary

Quartiles: Divide data into four equal parts.
- First Quartile (Q1): 25% of data falls below this value.
- Second Quartile (Q2): Median; 50% of data falls below this value.
- Third Quartile (Q3): 75% of data falls below this value.
Five Number Summary: Minimum, Q1, Median (Q2), Q3, Maximum.

Five number summary for the dataset 6, 8, 14, 18, 23, 30:
Minimum	Q1	Median (Q2)	Q3	Maximum
6	8	16	23	30

Box-and-Whisker Plot: Visualizes the five number summary, showing the spread and center of the data.
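Here is a minimal Python sketch of the five number summary. It assumes the median-of-halves convention for quartiles, in which Q1 and Q3 are the medians of the lower and upper halves and the overall median is excluded when $n$ is odd. Software often uses other quartile conventions (Python's `statistics.quantiles`, for example, defaults to a different interpolation method), so its Q1 and Q3 can differ slightly.

```python
def median(sorted_vals):
    """Median of an already sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves method.
    Assumes at least two values."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]           # values below the median position
    upper = data[(n + 1) // 2 :]     # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

print(five_number_summary([6, 8, 14, 18, 23, 30]))  # (6, 8, 16.0, 23, 30)
```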

Measures of Variability

Range, Interquartile Range, and Standard Deviation

Range: Difference between maximum and minimum values.
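- Formula: $\text{Range} = \text{Maximum} - \text{Minimum}$
Interquartile Range (IQR): Difference between Q3 and Q1.
- Formula: $\text{IQR} = Q_3 - Q_1$
Standard Deviation: Measures the typical distance of data points from the mean.
- Sample Standard Deviation Formula: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$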

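Example: For the dataset 6, 8, 14, 18, 23, 30, the range is $30 - 6 = 24$ and the IQR is $23 - 8 = 15$. For the standard deviation, start with the mean (16.5). Square each deviation from the mean and sum the squares (415.5). Divide by $n - 1 = 5$ (83.1), then take the square root: $s \approx 9.12$.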
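The steps of the example can be traced in code. This Python sketch computes the quartiles with the median-of-halves convention, which is an assumption, and cross-checks the hand computation against `statistics.stdev`, the standard library's sample standard deviation.

```python
import math
import statistics

data = sorted([6, 8, 14, 18, 23, 30])
n = len(data)

# Range: maximum minus minimum.
data_range = data[-1] - data[0]

# IQR: Q3 - Q1, with quartiles as medians of the lower and upper halves
# (median-of-halves convention; other software conventions can differ).
q1 = statistics.median(data[: n // 2])
q3 = statistics.median(data[(n + 1) // 2 :])
iqr = q3 - q1

# Sample standard deviation, step by step.
x_bar = sum(data) / n
squared_devs = [(x - x_bar) ** 2 for x in data]
ss = sum(squared_devs)          # sum of squared deviations
s = math.sqrt(ss / (n - 1))     # divide by n - 1, then take the square root

# Cross-check against the standard library.
assert math.isclose(s, statistics.stdev(data))

print(data_range, iqr, x_bar, ss, round(s, 2))  # 24 15 16.5 415.5 9.12
```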

Characteristics of Distributions

Shape of Distributions

Unimodal: One peak.
Bimodal: Two peaks.
Symmetric: Both sides are mirror images.
Skewed Left (Negatively Skewed): Majority of data is in the upper portion; tail is to the left.
Skewed Right (Positively Skewed): Majority of data is in the lower portion; tail is to the right.

Mean and Median in Skewed Distributions:

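Left Skewed: The mean is typically less than the median ($\bar{x} <$ median), because the long left tail pulls the mean down.
Right Skewed: The mean is typically greater than the median ($\bar{x} >$ median), because the long right tail pulls the mean up.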
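Comparing the mean and median gives a quick guess at the direction of skew. The Python sketch below applies that comparison to two small hypothetical datasets, one with a long right tail and one with a long left tail. It is only a rule of thumb and can fail, especially for discrete data, so a graph of the data remains the better check.

```python
from statistics import mean, median

def skew_direction(values):
    """Rule-of-thumb skew guess from comparing the mean and the median."""
    m, md = mean(values), median(values)
    if m > md:
        return "right"
    if m < md:
        return "left"
    return "neither"

right_tail = [1, 2, 2, 3, 3, 3, 4, 10]   # hypothetical: mean 3.5, median 3
left_tail = [1, 7, 8, 8, 9, 9, 9, 10]    # hypothetical: mean 7.625, median 8.5
print(skew_direction(right_tail), skew_direction(left_tail))  # right left
```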

Z-Scores and Outliers

Z-Score Calculation

Z-Score: Indicates how many standard deviations a value is from the mean.
- Formula: $z = \frac{x - \bar{x}}{s}$
A value with a z-score between -3 and 3 is not considered an outlier. A value with a z-score below -3 or above 3 is considered unusual (an outlier).

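Example: If the mean is 22.25 and the standard deviation is 32.15, a value of 6 yields $z = \frac{6 - 22.25}{32.15} \approx -0.51$. Because this lies between -3 and 3, 6 is not an outlier.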
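The z-score calculation and the outlier check translate directly into code. In this Python sketch, the function names are hypothetical, and the cutoff of 3 follows the rule stated in these notes.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def is_unusual(z, cutoff=3.0):
    """True when the z-score falls outside -cutoff..cutoff."""
    return abs(z) > cutoff

z = z_score(6, 22.25, 32.15)
print(round(z, 2), is_unusual(z))  # -0.51 False
```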

The Empirical Rule

Empirical (68-95-99.7) Rule

For symmetric, bell-shaped (normal) distributions:

Approximately 68% of data falls within one standard deviation of the mean.
Approximately 95% within two standard deviations.
Approximately 99.7% within three standard deviations.

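Application: If the mean is 100 dollars and the standard deviation is 50 dollars:

Approximately 68% of data falls between 50 and 150 dollars.
Approximately 95% falls between 0 and 200 dollars.
Approximately 99.7% falls between -50 and 250 dollars.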
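The Empirical Rule intervals are the mean plus or minus 1, 2, or 3 standard deviations. This Python sketch, with a hypothetical function name, computes them for a mean of 100 and a standard deviation of 50 (both in dollars). It applies only to roughly bell-shaped data.

```python
def empirical_intervals(mean, sd):
    """Intervals mean +/- k*sd for k = 1, 2, 3, paired with the approximate
    percent of a bell-shaped distribution each contains (68-95-99.7 rule)."""
    shares = {1: 68, 2: 95, 3: 99.7}
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in shares.items()}

for k, (low, high, pct) in empirical_intervals(100, 50).items():
    print(f"about {pct}% within {k} SD: {low} to {high}")
```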

Summary of Data Display Methods

Graphs

Categorical Variables: Pie chart, bar graph, Pareto chart.
Quantitative Variables: Histogram, dot plot, stem-and-leaf plot.

Tables

Categorical Variables: Frequency (summary) table, contingency table.
Quantitative Variables: Frequency table.

Summary Statistics

Measures of center: mean, median, mode.
Measures of position: quartiles, five number summary.
Measures of variability: range, IQR, standard deviation.

Additional info:

Some context and examples were inferred to clarify definitions and applications.
Tables were recreated and expanded for clarity.