Statistics Study Guide: Data Types, Sampling, Descriptive Statistics, and Data Visualization
Data Types and Measurement Scales
Qualitative and Quantitative Variables
Variables in statistics are classified based on their nature and the type of data they represent.
Qualitative (Categorical) Variables: Describe qualities or categories (e.g., brand of cell phone, color).
Quantitative Variables: Represent numerical values and can be measured.
Quantitative Discrete: Countable values (e.g., number of messages sent).
Quantitative Continuous: Measurable values within a range (e.g., monthly cell bill).
Example: The number of text messages sent in one month is a quantitative discrete variable.
Levels of Measurement
Variables can be measured at different levels:
Nominal: Categories without order (e.g., cell phone brand).
Ordinal: Categories with a meaningful order (e.g., rating stars).
Interval: Ordered, equal intervals, no true zero (e.g., temperature in °C or °F).
Ratio: Ordered, equal intervals, true zero (e.g., weight).
Example: The actual weight of cereal in a box is a ratio variable.
Descriptive and Inferential Statistics
Parameters vs. Statistics
A parameter describes a characteristic of a population, while a statistic describes a characteristic of a sample.
Parameter: The average salary of all employees at a company.
Statistic: The average salary of a sample of employees.
Experimental and Observational Studies
Studies can be classified as:
Experimental Study: Researcher manipulates variables (e.g., dividing patients into treatment and placebo groups).
Observational Study: Researcher observes without intervention (e.g., comparing cancer rates in different populations).
Sampling Methods
Types of Sampling
Sampling is the process of selecting a subset of individuals from a population.
Simple Random Sampling: Every member has an equal chance of selection.
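Stratified Sampling: Population is divided into subgroups (strata), and a random sample is taken from each stratum.
Cluster Sampling: Population is divided into clusters, some clusters are randomly selected, and the members of the selected clusters are sampled.
Systematic Sampling: Every nth member is selected, usually from a random starting point.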
Convenience Sampling: Sample is taken from easily accessible members.
Example: Selecting every 5th cereal box from a shelf is systematic sampling.
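The sampling methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `seed` parameter, and the shelf of 20 cereal boxes are hypothetical and not part of the original notes.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k members without replacement; each member is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, start=0):
    """Take every `step`-th member, beginning at index `start`."""
    return population[start::step]

def stratified_sample(strata, k_per_stratum, seed=None):
    """Draw k_per_stratum members at random from every stratum (dict of lists)."""
    rng = random.Random(seed)
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep all members of the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical shelf of 20 cereal boxes, labeled box1 ... box20.
boxes = [f"box{i}" for i in range(1, 21)]

# Every 5th box, starting with the 5th: box5, box10, box15, box20.
every_fifth = systematic_sample(boxes, step=5, start=4)
```

Convenience sampling has no random step to code: it would simply take whichever boxes are easiest to reach, for example `boxes[:4]`.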
Descriptive Statistics: Measures of Central Tendency and Spread
Mean, Median, and Mode
These are measures of central tendency:
Mean (μ or x̄): The average value.
Median: The middle value when data is ordered (for an even number of values, the average of the two middle values).
Mode: The most frequently occurring value.
Formula for Mean:
$\mu = \frac{\sum x_i}{N}$ (population mean)
$\bar{x} = \frac{\sum x_i}{n}$ (sample mean)
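As a quick check of these three measures, here is a minimal sketch using Python's standard `statistics` module. The list of message counts is made up for illustration.

```python
import statistics

# Hypothetical number of text messages sent on seven days.
messages = [12, 15, 15, 18, 20, 22, 45]

# Mean: the sum of the values divided by how many there are.
mean_by_hand = sum(messages) / len(messages)
mean_value = statistics.mean(messages)

# Median: the middle value of the ordered data (the 4th of 7 here).
median_value = statistics.median(messages)

# Mode: the value that occurs most often (15 appears twice).
mode_value = statistics.mode(messages)
```

Here the mean is 21 while the median is 18: the single large value 45 pulls the mean upward but leaves the median unchanged.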
Standard Deviation and Variance
These measure the spread of data:
Standard Deviation (σ for population, s for sample): Measures the typical distance of data values from the mean.
Variance: The square of the standard deviation.
Formulas:
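Population standard deviation: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
Sample standard deviation: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$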
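The difference between the population and sample versions is only the divisor (N versus n − 1). The sketch below computes both by hand on a small made-up data set; the helper names are hypothetical, and the standard `statistics` module offers the same calculations as `pvariance`, `pstdev`, `variance`, and `stdev`.

```python
import math
import statistics

def population_variance(data):
    """Average squared deviation from the mean, dividing by N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

def sample_variance(data):
    """Squared deviations from the mean, dividing by n - 1."""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

# Hypothetical data with mean 5 and squared deviations summing to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(data)   # 32 / 8
pop_sd = math.sqrt(pop_var)
samp_var = sample_variance(data)      # 32 / 7
samp_sd = math.sqrt(samp_var)
```

The sample values are slightly larger than the population values because dividing by n − 1 compensates for estimating the mean from the same data.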
Range, Interquartile Range, and Outliers
Range: Difference between maximum and minimum values.
Interquartile Range (IQR): $IQR = Q_3 - Q_1$
Outliers: Data points below $Q_1 - 1.5 \times IQR$ or above $Q_3 + 1.5 \times IQR$
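A minimal sketch of the range, IQR, and the common 1.5 × IQR outlier rule follows. The monthly bills are hypothetical, and the quartiles come from `statistics.quantiles` with `method="inclusive"`, which is an assumption: textbooks and calculators use several slightly different quartile conventions, so hand-computed quartiles may differ a little.

```python
import statistics

def iqr_outliers(data):
    """Return the IQR, both fences, and the points outside the fences."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return iqr, lower_fence, upper_fence, outliers

# Hypothetical monthly cell phone bills in dollars.
bills = [40, 45, 50, 55, 60, 65, 70, 75, 80, 200]

data_range = max(bills) - min(bills)   # maximum minus minimum
iqr, lower_fence, upper_fence, outliers = iqr_outliers(bills)
```

With these numbers the range is 160 but the IQR is only 22.5, because the range is driven by the single bill of 200, which the fences flag as an outlier.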
Five Number Summary
Minimum
First Quartile ($Q_1$)
Median ($Q_2$)
Third Quartile ($Q_3$)
Maximum
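The five values can be gathered in one small helper, sketched below. The function name and the scores are hypothetical, and the `"inclusive"` quartile method is one convention among several, so quartiles found by hand may differ slightly.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

# Hypothetical quiz scores (unordered on purpose; quantiles sorts internally).
scores = [3, 7, 8, 5, 12, 14, 21, 13, 18]
summary = five_number_summary(scores)
# summary is (3, 7.0, 12.0, 14.0, 21)
```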
Frequency Distributions and Data Visualization
Frequency Tables
Frequency tables summarize data by showing the number of occurrences for each category or interval.
Example Table: Frequency Distribution of Political Affiliation
| Political Affiliation | Frequency |
|---|---|
| D | 5 |
| R | 4 |
| I | 3 |
Additional info: Frequencies inferred from visible data.
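A frequency table like this one can be tallied from raw responses with `collections.Counter`. The raw list below is an assumption, built only to match the counts in the table (5 D, 4 R, 3 I).

```python
from collections import Counter

# Hypothetical raw survey responses consistent with the table above.
responses = ["D"] * 5 + ["R"] * 4 + ["I"] * 3

# Frequency: number of occurrences of each category.
frequency = Counter(responses)

# Relative frequency: each count divided by the total number of responses.
total = sum(frequency.values())
relative_frequency = {party: count / total for party, count in frequency.items()}

for party, count in frequency.items():
    print(f"{party}: {count} ({relative_frequency[party]:.1%})")
```

The relative frequencies are what a pie chart displays, while the raw counts are what a bar graph typically shows.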
Histograms, Bar Graphs, Pie Charts, and Dot Plots
Histogram: Displays frequency of data within intervals (useful for continuous data).
Bar Graph: Compares frequencies of categorical data.
Pie Chart: Shows proportions of categories as slices of a circle.
Dot Plot: Each data point is shown as a dot above its value on a number line.
Example Table: Pie Chart Data for Road Construction Funding
| Response | Relative Frequency | Frequency |
|---|---|---|
| New Tolls | 51% | Not specified |
| No New Roads | 34% | Not specified |
| Increase Gas Tax | 15% | Not specified |
Descriptive Statistics from Grouped Data
Frequency Distribution and Histogram
Grouped data can be summarized using class intervals, midpoints, and frequencies.
Relative Frequency: Proportion of total observations in each class.
Cumulative Frequency: Running total of frequencies up to each class.
Example Table: Births by Age of Mother
| Age of Mother (yrs) | Midpoint | Births (Frequency) | Relative Frequency | Cumulative Frequency |
|---|---|---|---|---|
| 10-14.99 | 12.5 | 10 | 0.0030 | 10 |
| 15-19.99 | 17.5 | 400 | 0.1190 | 410 |
| 20-24.99 | 22.5 | 1050 | 0.3125 | 1460 |
| 25-29.99 | 27.5 | 1200 | 0.3571 | 2660 |
| 30-34.99 | 32.5 | 500 | 0.1488 | 3160 |
| 35-39.99 | 37.5 | 100 | 0.0298 | 3260 |
| 40-44.99 | 42.5 | 100 | 0.0298 | 3360 |
Additional info: Relative frequencies are each class frequency divided by the total of 3360 births, rounded to four decimal places.
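The derived columns can be recomputed from the midpoints and frequencies alone, as in the sketch below. It also estimates the mean age from the grouped data by weighting each midpoint by its frequency, a standard approximation that the notes do not state explicitly; the variable names are illustrative.

```python
from itertools import accumulate

# Midpoints and frequencies taken from the births table.
midpoints = [12.5, 17.5, 22.5, 27.5, 32.5, 37.5, 42.5]
births = [10, 400, 1050, 1200, 500, 100, 100]

total = sum(births)

# Relative frequency: class frequency divided by the total; these sum to 1.
relative = [f / total for f in births]

# Cumulative frequency: running total; the last entry equals the total.
cumulative = list(accumulate(births))

# Grouped mean: sum of (midpoint * frequency) divided by the total.
grouped_mean = sum(m * f for m, f in zip(midpoints, births)) / total

for m, f, r, c in zip(midpoints, births, relative, cumulative):
    print(f"{m:5.1f} {f:5d} {r:7.4f} {c:5d}")
```

A useful sanity check on any grouped table is that the relative frequencies add up to 1 and the final cumulative frequency matches the total count.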
Boxplots and Outlier Detection
Boxplot Construction
Boxplots visually display the five number summary and help identify outliers.
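Draw a box from $Q_1$ to $Q_3$ with a line at the median.
Whiskers extend to the smallest and largest data values that lie within 1.5 × IQR of the box (no lower than $Q_1 - 1.5 \times IQR$ and no higher than $Q_3 + 1.5 \times IQR$).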
Points outside whiskers are outliers.
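The numbers needed to draw a boxplot by hand can be computed as sketched below. The helper name and the weights are hypothetical, and the `"inclusive"` quartile method is an assumption; plotting software may place the box edges slightly differently.

```python
import statistics

def boxplot_stats(data):
    """Return the box edges, whisker ends, and outliers for a boxplot."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data values still inside the fences.
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }

# Hypothetical weights in kilograms, with one unusually large value.
weights = [52, 55, 57, 58, 60, 61, 63, 65, 68, 95]
stats = boxplot_stats(weights)
```

Note that the upper whisker ends at 68, the largest value inside the fence, not at the fence itself; the value 95 is plotted separately as an outlier.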
Z-Scores and Standardization
Calculating Z-Scores
A z-score indicates how many standard deviations a value is from the mean.
Formula: $z = \frac{x - \mu}{\sigma}$
Example: For a female with weight 160 lbs, mean 155 lbs, standard deviation 50 lbs: $z = \frac{160 - 155}{50} = 0.1$
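The same calculation in code, as a minimal sketch; the function name `z_score` is illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Weight example from the notes: x = 160 lbs, mean = 155 lbs, SD = 50 lbs.
z = z_score(160, 155, 50)
```

A z-score of 0.1 means the weight is only one tenth of a standard deviation above the mean, so it is a very typical value.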
Data Analysis Examples
Descriptive Statistics for Egg Weights
| Statistic | Value |
|---|---|
| Mean | 1.615 |
| Median | 1.6 |
| Mode | 1.6 |
| Standard Deviation | 0.06514 |
| Sample Variance | 0.004245 |
| Range | 0.27 |
| Minimum | 1.47 |
| Maximum | 1.74 |
Additional info: Skewness and kurtosis values indicate the shape of the distribution.
Summary
Classify variables and understand measurement scales.
Distinguish between parameters and statistics.
Apply appropriate sampling methods.
Calculate and interpret mean, median, mode, standard deviation, variance, range, IQR, and z-scores.
Construct and interpret frequency tables, histograms, bar graphs, pie charts, dot plots, and boxplots.
Analyze grouped data and use descriptive statistics for data interpretation.