Describing and Comparing Distributions Using Stemplots

Describing and Comparing Distributions

Introduction to Data Collection

In statistics, collecting and analyzing data is essential for understanding patterns and making informed decisions. One common classroom example is recording the number of pairs of shoes owned by students, which serves as a sample for larger populations.

  • Data Collection: Students record their responses (e.g., number of pairs of shoes) on a whiteboard or data sheet.

  • Variable Type: The variable "number of pairs of shoes" is quantitative because it represents numerical values, not categories.

  • Example: If students report 3, 5, 7, and 10 pairs, these are quantitative data points.

Stemplots (Stem-and-Leaf Plots)

A stemplot is a graphical method used to display quantitative data, particularly useful for small to moderate-sized data sets. It helps visualize the distribution, center, and spread of the data.

  • Stem: Represents all but the final digit of each data value.

  • Leaf: The last digit of each data value.

  • Key: Always include a key to explain how to read the stemplot (e.g., "3 | 2 = 32 pairs of shoes").

  • Include Empty Stems: Even if a stem has no leaves, include it to show gaps in the data.

  • Orientation: Rotating a stemplot 90° counterclockwise makes it resemble a histogram, with each row of leaves forming a bar.

Example Stemplot:

Stem | Leaves
0 | 5 7
1 | 0 2 2 3 4 5 5 5 6 7
2 | 0 2 3 5 5 6 7
3 | 2

Key: 3 | 2 = 32 pairs of shoes
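The construction rules above can be sketched in Python. This is a minimal illustration, assuming non-negative whole-number data with the tens digit as the stem; the function name `stemplot` is hypothetical, and the data are the values read off the example stemplot.

```python
from collections import defaultdict


def stemplot(values):
    """Return one text line per stem for non-negative integers.

    Stem = all but the final digit, leaf = final digit.
    Empty stems between the smallest and largest stem are kept
    so that gaps in the data stay visible.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)

    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines


# Values read off the example stemplot (pairs of shoes).
shoes = [5, 7, 10, 12, 12, 13, 14, 15, 15, 15,
         16, 17, 20, 22, 23, 25, 25, 26, 27, 32]

for line in stemplot(shoes):
    print(line)
print("Key: 3 | 2 = 32 pairs of shoes")
```

Calling `stemplot([5, 32])` would still print stems 1 and 2 with no leaves, which is how the empty-stem rule shows a gap.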

Describing Distributions

When describing a distribution, consider the following characteristics:

  1. Shape: Is the distribution symmetric, skewed left, or skewed right?

  2. Center: What is the typical or middle value? (e.g., mean or median)

  3. Variability (Spread): How spread out are the data? (e.g., range, interquartile range)

  4. Outliers: Are there any unusually high or low values?

  • Skewed Right: Most data are on the lower end, with a tail extending to the right (higher values).

  • Skewed Left: Most data are on the higher end, with a tail extending to the left (lower values).

  • Symmetric: Data are evenly distributed around the center.

Example: In the shoe data, if most students have between 10 and 20 pairs, but a few have 30 or more, the distribution is skewed right.

Identifying Outliers

Outliers are data points that are significantly higher or lower than the rest of the data. They can be identified visually in a stemplot or by using mathematical rules (e.g., values more than 1.5 times the interquartile range above the third quartile or below the first quartile).

  • Example: If most students have fewer than 20 pairs of shoes, but one student has 50, 50 is a possible outlier.
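The 1.5 × IQR rule can be sketched as follows. This is an illustration under stated assumptions: quartiles are taken as the medians of the lower and upper halves of the ordered data (one common textbook convention; software often uses other methods and may give slightly different quartiles), the helper names `quartiles` and `iqr_outliers` are hypothetical, and the value 50 is appended only to mirror the example above.

```python
from statistics import median


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    Assumes at least two values. For an odd count, the overall
    median is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return median(lower), median(upper)


def iqr_outliers(values, k=1.5):
    """Return values more than k * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Shoe counts from the example stemplot.
shoes = [5, 7, 10, 12, 12, 13, 14, 15, 15, 15,
         16, 17, 20, 22, 23, 25, 25, 26, 27, 32]

print(iqr_outliers(shoes))         # [] : 32 is inside the upper fence of 41.25
print(iqr_outliers(shoes + [50]))  # [50] : a hypothetical student with 50 pairs
```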

Splitting Stems

To provide a clearer picture of the distribution, especially when data are clustered on a few stems, each stem can be split in two. For example, a stem of '2' can be split into '2L' (leaves 0-4) and '2H' (leaves 5-9).

  • Example: Splitting the stem '0' into '0L' (0-4) and '0H' (5-9) can show more detail in the distribution.

  • Effect: Splitting stems does not change the overall shape (e.g., still skewed right), but makes the distribution clearer.
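A short sketch of stem splitting, assuming non-negative whole-number data and the 'L'/'H' labels described above. The function name `split_stemplot` is hypothetical, and the data are the shoe counts from the example stemplot.

```python
def split_stemplot(values):
    """Return stemplot lines with each stem split into L (0-4) and H (5-9)."""
    rows = {}
    for v in sorted(values):
        # v // 5 numbers the half-stems: 0 -> 0L, 1 -> 0H, 2 -> 1L, ...
        rows.setdefault(v // 5, []).append(v % 10)

    lines = []
    for idx in range(min(rows), max(rows) + 1):
        label = f"{idx // 2}{'LH'[idx % 2]}"
        leaves = " ".join(str(d) for d in rows.get(idx, []))
        lines.append(f"{label} | {leaves}".rstrip())
    return lines


shoes = [5, 7, 10, 12, 12, 13, 14, 15, 15, 15,
         16, 17, 20, 22, 23, 25, 25, 26, 27, 32]

for line in split_stemplot(shoes):
    print(line)
```

With these data the four original stems become six rows, from 0H to 3L, so the cluster on stem 1 is divided into two rows of five leaves each.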

Comparing Distributions: Back-to-Back Stemplots

Back-to-back stemplots are used to compare two related distributions, such as the percent of people wearing seat belts in states with different laws.

  • Purpose: To visually compare the shape, center, and spread of two groups.

  • Example: Comparing primary enforcement states vs. secondary enforcement states for seat belt usage.

Primary Enforcement | Stem | Secondary Enforcement
55 56 57 | 55 | 54 55 56
60 62 63 | 60 | 59 60 61
70 72 74 | 70 | 68 70 71

Key: 65 | 2 = 65.2% seat belt usage
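A back-to-back layout can be sketched as below, following the key convention "65 | 2 = 65.2%" (stem = whole percent, leaf = tenths digit). The function name `back_to_back` is hypothetical, and the percentages are invented purely for illustration; they are not real seat belt data.

```python
def back_to_back(left_values, right_values):
    """Return lines of a back-to-back stemplot for one-decimal values.

    Left-hand leaves increase outward from the shared stem, so they
    read in descending order from left to right.
    """
    def group(values):
        rows = {}
        for v in values:
            tenths = round(v * 10)  # work in whole tenths to avoid float noise
            rows.setdefault(tenths // 10, []).append(tenths % 10)
        return rows

    left, right = group(left_values), group(right_values)
    stems = range(min(min(left), min(right)), max(max(left), max(right)) + 1)

    left_text = {
        s: " ".join(str(d) for d in sorted(left.get(s, []), reverse=True))
        for s in stems
    }
    width = max(len(text) for text in left_text.values())

    lines = []
    for s in stems:
        right_text = " ".join(str(d) for d in sorted(right.get(s, [])))
        lines.append(f"{left_text[s]:>{width}} | {s} | {right_text}".rstrip())
    return lines


# Hypothetical percentages, for illustration only.
primary = [65.2, 65.8, 66.1, 68.4]
secondary = [64.9, 65.3, 67.0]

for line in back_to_back(primary, secondary):
    print(line)
print("Key: 65 | 2 = 65.2% seat belt usage")
```

Sharing one column of stems puts both groups on the same scale, which is what makes shape, center, and spread directly comparable.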

Summary Table: Describing Distributions

Characteristic | Description | Example
Shape | Skewed left, skewed right, symmetric | Skewed right: shoe data with a few high values
Center | Typical or middle value (mean, median) | Median number of shoes
Variability | How spread out the data are (range, IQR) | Range from 5 to 50 pairs
Outliers | Unusually high or low values | One student with 50 pairs

Key Formulas

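  • Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

  • Median: Middle value when data are ordered (the average of the two middle values when $n$ is even)

  • Range: $\text{Range} = \text{maximum} - \text{minimum}$

  • Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$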
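These four summaries can be computed with Python's standard `statistics` module. The sketch below uses the shoe counts from the example stemplot and assumes quartiles are the medians of the lower and upper halves of the ordered data; other quartile conventions exist and can give slightly different IQR values. The helper name `summarize` is hypothetical.

```python
from statistics import mean, median


def summarize(values):
    """Return mean, median, range, and IQR for a list of numbers.

    Assumes at least two values. Q1 and Q3 are the medians of the
    lower and upper halves of the ordered data.
    """
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    return {
        "mean": mean(data),
        "median": median(data),
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


shoes = [5, 7, 10, 12, 12, 13, 14, 15, 15, 15,
         16, 17, 20, 22, 23, 25, 25, 26, 27, 32]

summary = summarize(shoes)
for name, value in summary.items():
    print(f"{name}: {value}")
```

For these data the mean (17.55) sits a little above the median (15.5), the range is 27 pairs, and the IQR is 11.5 pairs.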

Applications

  • Stemplots: Useful for small data sets to quickly visualize distribution.

  • Back-to-Back Stemplots: Effective for comparing two groups, such as different states or classes.

  • Describing Distributions: Essential for summarizing data and making comparisons in research and real-world contexts.

Additional info: Mathematical rules for identifying outliers (such as the 1.5*IQR rule) will be covered in more detail in later lessons.
