Descriptive Statistics: Frequency Distributions, Graphs, and Measures of Central Tendency
Study Guide - Smart Notes
Descriptive Statistics
Introduction
Descriptive statistics are methods for summarizing and organizing data so that patterns and key features can be easily understood. This chapter covers frequency distributions, graphical representations of data, and measures of central tendency, which are foundational concepts in statistics.
Frequency Distributions
Definition and Construction
A frequency distribution is a table that displays classes (or intervals) of data entries with a count of the number of entries in each class. The frequency of a class is the number of data entries that fall into that class. Frequency distributions help organize data and reveal patterns.
Class Limits: The smallest and largest data values that can belong to a class.
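Class Width: The distance between the lower (or upper) limits of consecutive classes. When constructing a distribution, it is calculated as $\text{class width} = \dfrac{\text{range}}{\text{number of classes}}$ (rounded up to the nearest whole number).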
Lower Class Limit (LCL): The smallest value in a class.
Upper Class Limit (UCL): The largest value in a class.
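Class Boundaries: Numbers that separate classes without forming gaps between them. Each boundary is the average of the upper limit of one class and the lower limit of the next; for classes 14–19 and 20–25, the shared boundary is 19.5.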
Steps to Construct a Frequency Distribution
Find the range: $\text{range} = \text{maximum entry} - \text{minimum entry}$
Determine the class width.
Find the lower class limits.
Find the upper class limits.
Tally the data into classes.
Count the frequency for each class.
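Find the midpoint: $\text{midpoint} = \dfrac{\text{lower class limit} + \text{upper class limit}}{2}$
Calculate relative frequency: $\text{relative frequency} = \dfrac{f}{n}$, where $f$ is the class frequency and $n$ is the total number of data entries.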
Find cumulative frequency: The sum of frequencies for the current and all previous classes.
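The steps above can be sketched in Python. This is a minimal illustration, not a definitive procedure: the data set `sample` and the choice of 5 classes are hypothetical, and the function assumes integer data so that consecutive class limits differ by 1. It rounds the class width up, and when the quotient is already a whole number it moves to the next whole number so the maximum entry still fits in the last class.

```python
import math

def frequency_distribution(data, num_classes):
    """Return one dict per class for a list of integer data entries."""
    data_range = max(data) - min(data)
    # Round the quotient up; if it is already whole, step to the next whole
    # number so that the maximum entry still falls inside the last class.
    width = math.floor(data_range / num_classes) + 1

    rows = []
    cumulative = 0
    for i in range(num_classes):
        lower = min(data) + i * width
        upper = lower + width - 1            # integer data: limits are 1 apart
        freq = sum(1 for x in data if lower <= x <= upper)
        cumulative += freq
        rows.append({
            "class": (lower, upper),
            "frequency": freq,
            "midpoint": (lower + upper) / 2,
            "relative": freq / len(data),
            "cumulative": cumulative,
        })
    return rows

# Hypothetical data set, invented for this example.
sample = [7, 12, 15, 18, 21, 22, 25, 27, 30, 31, 34, 39]
table = frequency_distribution(sample, 5)
for row in table:
    print(row)
```

Two quick checks are worth making on any result: the relative frequencies should sum to 1, and the last cumulative frequency should equal the number of data entries.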
Example Frequency Distribution Table
| Class | Frequency | Midpoint | Relative Frequency | Cumulative Frequency |
|---|---|---|---|---|
| 14–19 | 1 | 16.5 | 0.0625 | 1 |
| 20–25 | 4 | 22.5 | 0.2500 | 5 |
| 26–31 | 3 | 28.5 | 0.1875 | 8 |
| 32–37 | 3 | 34.5 | 0.1875 | 11 |
| 38–43 | 2 | 40.5 | 0.1250 | 13 |
| 44–49 | 1 | 46.5 | 0.0625 | 14 |
| 50–55 | 2 | 52.5 | 0.1250 | 16 |
Additional info: Table values inferred for illustration.
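As a small sketch, the class boundaries, midpoints, class width, and cumulative frequencies can be derived directly from the class limits and frequencies in the table. The variable names are chosen for illustration only.

```python
from itertools import accumulate

# Class limits and frequencies copied from the example table.
limits = [(14, 19), (20, 25), (26, 31), (32, 37), (38, 43), (44, 49), (50, 55)]
frequencies = [1, 4, 3, 3, 2, 1, 2]

# Gap between the upper limit of one class and the lower limit of the next.
gap = limits[1][0] - limits[0][1]            # 20 - 19 = 1

# Each boundary lies half a gap outside the class limits.
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in limits]

# Midpoint: average of the lower and upper class limits.
midpoints = [(lo + hi) / 2 for lo, hi in limits]

# Class width: distance between consecutive lower class limits.
class_width = limits[1][0] - limits[0][0]

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequencies))

print(boundaries[0], midpoints[0], class_width, cumulative[-1])
```

The upper boundary of each class equals the lower boundary of the next, which is why a histogram drawn on class boundaries has no gaps.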
Graphical Representation of Data
Histograms
A histogram is a bar graph that represents the frequency distribution of a data set. The horizontal axis shows the classes, and the vertical axis shows the frequencies. Bars must touch, indicating continuous data.
Relative Frequency Histogram: The vertical axis shows relative frequencies instead of raw counts.
Class Boundaries: Used to avoid gaps between bars.
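To make the idea concrete, here is a rough text rendering of a histogram, turned on its side so each class gets one row. It uses the classes and frequencies from the example frequency distribution table; the helper `text_histogram` is a hypothetical name, and a real histogram would normally be drawn with a plotting library.

```python
# Classes and frequencies copied from the example frequency distribution table.
classes = ["14-19", "20-25", "26-31", "32-37", "38-43", "44-49", "50-55"]
frequencies = [1, 4, 3, 3, 2, 1, 2]

def text_histogram(labels, counts, symbol="#"):
    """Return one line per class; the bar length equals the class frequency."""
    return [f"{label} | {symbol * count} ({count})"
            for label, count in zip(labels, counts)]

histogram_lines = text_histogram(classes, frequencies)
print("\n".join(histogram_lines))
```

A relative frequency histogram of the same data would have the same shape, because every bar length is divided by the same total.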
Frequency Polygon
A frequency polygon is a graph that uses line segments to connect points plotted at the class midpoints and frequencies. It emphasizes the continuous change in frequencies.
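The points of a frequency polygon can be listed with a few lines of Python. This sketch uses the midpoints and frequencies from the example frequency distribution table. Extending the polygon by one class width on each side, so that it starts and ends on the horizontal axis, is a common textbook convention and is assumed here rather than taken from these notes.

```python
# Midpoints and frequencies copied from the example frequency distribution table.
midpoints = [16.5, 22.5, 28.5, 34.5, 40.5, 46.5, 52.5]
frequencies = [1, 4, 3, 3, 2, 1, 2]

# Distance between consecutive midpoints equals the class width.
class_width = midpoints[1] - midpoints[0]

# One point per class: (midpoint, frequency).
points = list(zip(midpoints, frequencies))

# Assumed convention: add a zero-frequency point one class width before the
# first midpoint and one class width after the last midpoint.
polygon = ([(midpoints[0] - class_width, 0)]
           + points
           + [(midpoints[-1] + class_width, 0)])

for x, y in polygon:
    print(x, y)
```

Connecting consecutive entries of `polygon` with line segments gives the graph.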
Stem-and-Leaf Plot
A stem-and-leaf plot separates each number into a stem (all but the last digit) and a leaf (the last digit). It retains the original data values and is useful for small data sets.
Example: For the data set 82, 85, 95, 91, 73, 70, 75, 78, 82, 97, 59, 65, 89, 72, 71, 67, 55, 94, 91, 80, 54, 73, 51, 80, 77, 92, 66, 50, 75, 76, 82, 90, 81, 53, 78, 74, 78, 81, 85, 76, the stem-and-leaf plot would organize values by tens (stems) and units (leaves).
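A minimal sketch of how the plot for this data set could be built: each value is split into a tens digit (stem) and a units digit (leaf) with `divmod`. The function name `stem_and_leaf` is chosen for illustration, and the sketch assumes non-negative two-digit integers.

```python
from collections import defaultdict

# Data set from the example above.
scores = [82, 85, 95, 91, 73, 70, 75, 78, 82, 97,
          59, 65, 89, 72, 71, 67, 55, 94, 91, 80,
          54, 73, 51, 80, 77, 92, 66, 50, 75, 76,
          82, 90, 81, 53, 78, 74, 78, 81, 85, 76]

def stem_and_leaf(values):
    """Group sorted values by stem (tens digit); leaves are the units digits."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because every leaf is kept, the original 40 values can be read back from the plot, which a histogram does not allow.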
Dot Plot
A dot plot displays each data entry as a point above a horizontal axis. It is useful for visualizing frequency and distribution for small data sets.
Pie Chart
A pie chart presents qualitative data graphically as percents of a whole. Each sector's area is proportional to the frequency of each category.
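As an illustration of how sector sizes follow from frequencies, the sketch below converts category counts into relative frequencies and central angles (relative frequency times 360 degrees). The categories and counts are hypothetical and exist only for this example.

```python
# Hypothetical category counts, invented for this example.
counts = {"Business": 8, "Health": 6, "Engineering": 4, "Arts": 2}

total = sum(counts.values())

# For each category: (relative frequency, central angle in degrees).
sectors = {name: (f / total, 360 * f / total) for name, f in counts.items()}

for name, (share, angle) in sectors.items():
    print(f"{name}: {share:.0%} of the whole, {angle:.0f} degrees")
```

The angles always add up to 360 degrees, just as the relative frequencies add up to 1.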
Pareto Chart
A Pareto chart is a bar graph for qualitative data, with bars arranged in order of decreasing height. It is used to highlight the most significant categories.
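The defining step of a Pareto chart is the ordering of the bars. A small sketch, using hypothetical complaint counts invented for this example, sorts the categories by decreasing frequency and prints a rough text version of the bars.

```python
# Hypothetical complaint counts, invented for this example.
complaints = {
    "Late delivery": 12,
    "Damaged item": 30,
    "Wrong item": 7,
    "Billing error": 21,
}

# Sort categories from highest to lowest frequency.
pareto_order = sorted(complaints.items(), key=lambda item: item[1], reverse=True)

for category, count in pareto_order:
    print(f"{category:<14} {'#' * count} {count}")
```

With the tallest bar first, the category that contributes most (here "Damaged item") is visible at a glance.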
Scatter Plot
A scatter plot displays ordered pairs as points in a coordinate plane, showing the relationship between two quantitative variables. It is useful for identifying correlations.
Example: A scatter plot of Fisher's Iris data set can show petal length versus petal width for different species of iris.
Time Series
A time series consists of quantitative entries taken at regular intervals over time. It is used to analyze trends and patterns.
| Year | Degrees (thousands) |
|---|---|
| 2012 | 93.1 |
| 2013 | 98.7 |
| 2014 | 103.0 |
| 2015 | 109.0 |
| 2016 | 115.1 |
| 2017 | 123.9 |
| 2018 | 133.8 |
| 2019 | 140.7 |
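One simple way to look for a trend in this table is to compute the change from each year to the next. The sketch below does that for the values above; rounding to one decimal place matches the precision of the table, and the variable names are chosen for illustration.

```python
# Degrees (in thousands) by year, copied from the table above.
degrees = {2012: 93.1, 2013: 98.7, 2014: 103.0, 2015: 109.0,
           2016: 115.1, 2017: 123.9, 2018: 133.8, 2019: 140.7}

years = sorted(degrees)

# Year-over-year change, rounded to the table's one-decimal precision.
changes = {year: round(degrees[year] - degrees[year - 1], 1) for year in years[1:]}

# Year with the largest increase over the previous year.
largest_year = max(changes, key=changes.get)

# Overall percent change from the first year to the last.
overall_pct = round(100 * (degrees[2019] - degrees[2012]) / degrees[2012], 1)

print(changes)
print(largest_year, overall_pct)
```

Every yearly change is positive, so the series rises throughout the period, with the largest single-year increase from 2017 to 2018.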
Measures of Central Tendency
Mean, Median, and Mode
Measures of central tendency describe the center of a data set. The most common are the mean, median, and mode.
Population Mean: $\mu = \dfrac{\sum x}{N}$, where $\mu$ is the population mean, $\sum x$ is the sum of all values, and $N$ is the number of values in the population.
Sample Mean: $\bar{x} = \dfrac{\sum x}{n}$, where $\bar{x}$ is the sample mean, $\sum x$ is the sum of all sample values, and $n$ is the sample size.
Median: The middle value when the data are ordered. If the number of entries is even, the median is the mean of the two middle values.
Mode: The value(s) that occur most frequently. A data set may have no mode, one mode, or multiple modes.
Example Calculation
Data: 75, 67, 80, 76, 84, 90, 89, 75, 80, 83, 89, 62, 79, 81, 78
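Mean: $\dfrac{\sum x}{n} = \dfrac{1188}{15} = 79.2$
Median: Arrange the data in order (62, 67, 75, 75, 76, 78, 79, 80, 80, 81, 83, 84, 89, 89, 90) and find the middle value. With 15 entries, the median is the 8th value, 80.
Mode: Identify the value(s) that appear most often. Here 75, 80, and 89 each appear twice, so the data set has three modes.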
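The three measures for this data set can be checked with Python's standard `statistics` module. This is one convenient way to do the calculation, not the only one; `multimode` is used because it returns every value tied for the highest frequency.

```python
import statistics

# Data from the example calculation.
data = [75, 67, 80, 76, 84, 90, 89, 75, 80, 83, 89, 62, 79, 81, 78]

mean = statistics.mean(data)          # sum of the entries divided by their count
median = statistics.median(data)      # middle value of the sorted entries
modes = statistics.multimode(data)    # all values tied for most frequent

print(mean, median, modes)
```

With 15 entries the median is the 8th value in sorted order, and three values tie for the highest frequency, so the data set has more than one mode.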
Outliers
An outlier is a data entry that is far removed from other entries. Outliers can greatly affect the mean, making it less representative of the data set's center.
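A short experiment shows the effect. The sketch below reuses the data from the example calculation and appends a single low value of 10, which is a hypothetical outlier added only for illustration, then compares how far the mean and the median move.

```python
import statistics

# Data from the example calculation.
data = [75, 67, 80, 76, 84, 90, 89, 75, 80, 83, 89, 62, 79, 81, 78]

# 10 is a hypothetical outlier, not part of the original data.
with_outlier = data + [10]

mean_shift = statistics.mean(data) - statistics.mean(with_outlier)
median_shift = statistics.median(data) - statistics.median(with_outlier)

print(f"mean falls by {mean_shift:.3f}, median falls by {median_shift:.1f}")
```

The mean drops by more than 4 points while the median moves by only half a point, which is why the median is often preferred when outliers are present.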
Misleading Graphs
Graphical Integrity
Graphs can be misleading if scales are manipulated or if visual representations exaggerate differences. Always check axis scales and graphical proportions.
Example: Bar heights may exaggerate differences if the vertical axis does not start at zero.
Example: 3D effects or disproportionate images can distort perception of data.
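The effect of a truncated vertical axis can be quantified. In this sketch, `apparent_ratio` is a hypothetical helper that compares the drawn heights of two bars when the axis starts at a given value; the values 50, 55, and 45 are invented for illustration.

```python
def apparent_ratio(smaller, larger, axis_start=0.0):
    """Ratio of drawn bar heights when the vertical axis starts at axis_start."""
    if axis_start >= smaller:
        raise ValueError("axis_start must be below the smaller value")
    return (larger - axis_start) / (smaller - axis_start)

# Hypothetical values: two bars representing 50 and 55.
true_ratio = apparent_ratio(50, 55)               # axis starts at zero
truncated_ratio = apparent_ratio(50, 55, 45)      # axis starts at 45

print(true_ratio, truncated_ratio)
```

With the axis starting at zero, the second bar is 10% taller than the first. With the axis starting at 45, it is drawn twice as tall, although the data have not changed.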
Summary Table: Types of Graphs and Their Uses
| Graph Type | Data Type | Main Purpose |
|---|---|---|
| Histogram | Quantitative | Show frequency distribution |
| Frequency Polygon | Quantitative | Show continuous change in frequency |
| Stem-and-Leaf Plot | Quantitative | Retain original data values |
| Dot Plot | Quantitative | Visualize frequency for small data sets |
| Pie Chart | Qualitative | Show proportions of categories |
| Pareto Chart | Qualitative | Highlight most significant categories |
| Scatter Plot | Quantitative (paired) | Show relationship between variables |
| Time Series | Quantitative (over time) | Analyze trends and patterns |
Key Takeaways
Descriptive statistics organize and summarize data for easier interpretation.
Frequency distributions and graphs reveal patterns and relationships in data.
Measures of central tendency (mean, median, mode) describe the center of a data set.
Outliers and misleading graphs can distort statistical conclusions.