Introduction to Statistics: Exam 1 Study Notes (Lectures 1–6)
Introduction to Statistics
What is Statistics?
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data. It provides methods for making sense of data and drawing conclusions about populations based on samples.
Descriptive Statistics: Summarizes data using numbers and graphs.
Inferential Statistics: Uses sample data to make inferences about a population.
Types of Data and Variables
Definitions and Classifications
Understanding the types of data and variables is fundamental in statistics.
Data: The pieces of information you record.
Population: The entire group of individuals or items of interest.
Sample: A subset of the population from which data are actually obtained.
Observation/Subject: An individual we gather data from.
Variable: A characteristic measured about the subjects in the population or sample.
Numerical Variable: A variable that can be measured or counted (quantitative).
Categorical Variable: A variable that can be placed into categories (qualitative).
Example: In a crash-test dummy study, the categorical variables might include car make, model, and number of doors (a number used here as a label that groups cars into 2-door and 4-door types), while weight and head injury are numerical.
Tables: Organizing Data
Tables are used to organize and summarize data. Below is an example of a data table from a crash-test study:
| Make | Model | Doors | Weight | Head Injury |
|---|---|---|---|---|
| Acura | Integra | 2 | 2630 | 999 |
| Chevrolet | Camaro | 2 | 3518 | 594 |
| Chevrolet | S-10 Blazer 4X4 | 2 | 4200 | 834 |
| Ford | Escort | 2 | 2290 | 851 |
| Ford | Taurus | 4 | 3385 | 650 |
| Hyundai | Excel | 4 | 2090 | 1182 |
| Mazda | 626 | 4 | 2990 | 846 |
| Volkswagen | Passat | 4 | 2490 | 1138 |
| Toyota | Tercel | 2 | 2120 | 1138 |
Additional info: This table is used to illustrate the difference between categorical and numerical variables.
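The different summaries suited to each variable type can be sketched in Python. This is a minimal illustration using the rows of the table above; the names `crash_tests`, `doors_counts`, and `mean_weight` are chosen here for illustration and do not come from the course materials.

```python
from collections import Counter
from statistics import mean

# Rows copied from the crash-test table: (make, model, doors, weight, head_injury)
crash_tests = [
    ("Acura", "Integra", 2, 2630, 999),
    ("Chevrolet", "Camaro", 2, 3518, 594),
    ("Chevrolet", "S-10 Blazer 4X4", 2, 4200, 834),
    ("Ford", "Escort", 2, 2290, 851),
    ("Ford", "Taurus", 4, 3385, 650),
    ("Hyundai", "Excel", 4, 2090, 1182),
    ("Mazda", "626", 4, 2990, 846),
    ("Volkswagen", "Passat", 4, 2490, 1138),
    ("Toyota", "Tercel", 2, 2120, 1138),
]

# Categorical variable (doors): count how many cars fall in each category.
doors_counts = Counter(row[2] for row in crash_tests)

# Numerical variable (weight): an average is meaningful.
mean_weight = mean(row[3] for row in crash_tests)

print("Cars per door count:", dict(doors_counts))
print("Mean weight:", mean_weight)
```

Counting makes sense for a categorical variable, while averaging only makes sense for a numerical one.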
Organizing Categorical Data
Stacked vs. Unstacked Data
Data can be organized in different formats:
Stacked Data: Each row represents an individual and the columns represent all variables measured for that individual.
Unstacked Data: Each column holds the values of one variable for a different group, so the groups sit side by side (for example, one column of weights for 2-door cars and another for 4-door cars).
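As a rough sketch of the difference, the snippet below converts stacked (doors, weight) pairs taken from the crash-test table into an unstacked layout with one list of weights per door count. The helper name `unstack` is hypothetical and used only for this illustration.

```python
from collections import defaultdict

# Stacked: one row per car, with the grouping variable (doors) and the value (weight).
stacked = [
    (2, 2630), (2, 3518), (2, 4200), (2, 2290), (4, 3385),
    (4, 2090), (4, 2990), (4, 2490), (2, 2120),
]

def unstack(rows):
    """Group values by their category, giving one 'column' per group."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return dict(groups)

unstacked = unstack(stacked)
for doors, weights in sorted(unstacked.items()):
    print(f"{doors}-door weights: {weights}")
```

The stacked form keeps every car on its own row; the unstacked form makes it easier to compare the two groups side by side.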
Frequency and Two-Way Tables
Tables are essential for summarizing categorical data and exploring relationships between variables.
Frequency Table: Shows all possible outcomes for a variable and how frequently each occurred.
Two-Way Table (Contingency Table): Displays all possible outcomes of two categorical variables and the frequency (or relative frequency) of each outcome.
Example: Survey data on seatbelt use by gender:
| | Male | Female |
|---|---|---|
| Not Always | 2 | 3 |
| Always | 3 | 7 |
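A two-way table is just a tally of pairs of categories. The sketch below assumes the 15 individual survey responses can be reconstructed from the counts in the table above (the raw responses are not given in the notes) and tallies them with `collections.Counter`.

```python
from collections import Counter

# Individual (gender, seatbelt use) responses, reconstructed from the table counts.
responses = (
    [("Male", "Not Always")] * 2
    + [("Female", "Not Always")] * 3
    + [("Male", "Always")] * 3
    + [("Female", "Always")] * 7
)

# Two-way table: frequency of each (gender, seatbelt use) combination.
two_way = Counter(responses)

# One-variable frequency tables, obtained by tallying each variable separately.
gender_totals = Counter(gender for gender, _ in responses)
seatbelt_totals = Counter(use for _, use in responses)

for (gender, use), count in sorted(two_way.items()):
    print(f"{gender:<6} {use:<10} {count}")
```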
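Relative Frequency: The proportion of observations in a particular category, calculated as $\text{relative frequency} = \dfrac{\text{frequency of the category}}{\text{total number of observations}}$.
Population Proportion: $p = \dfrac{\text{number of individuals in the population with the characteristic}}{N}$, where $N$ is the population size.
Sample Proportion: $\hat{p} = \dfrac{x}{n}$, where $x$ is the number of individuals in the sample with the characteristic and $n$ is the sample size.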
Percent: Relative frequency expressed as a percentage.
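As a small worked sketch, the counts from the seatbelt table can be turned into relative frequencies and percents. The function names `relative_frequency` and `percent` are illustrative choices, not notation from the course.

```python
def relative_frequency(count, total):
    """Proportion of observations in a category: count divided by total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total

def percent(count, total):
    """Relative frequency expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * count / total

# Counts from the seatbelt table: 15 respondents in all, 10 of them female.
always_total = 3 + 7      # males + females who always wear a seatbelt
everyone = 2 + 3 + 3 + 7  # all respondents

# Proportion of all respondents who always wear a seatbelt.
p_always = relative_frequency(always_total, everyone)

# Percent of females who always wear a seatbelt (7 out of 10 females).
pct_female_always = percent(7, 3 + 7)

print(f"Always (all respondents): {p_always:.3f}")
print(f"Always (females only): {pct_female_always:.1f}%")
```

Note that the denominator changes with the question: all 15 respondents in the first case, only the 10 females in the second.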
Collecting Data and Causality
Types of Studies
Controlled Experiment: The researcher assigns subjects to treatment and control groups to study the effect of a variable.
Observational Study: The researcher observes subjects without intervention.
Key Terms:
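Treatment Variable: The variable whose effect is being studied. In a controlled experiment, the researcher decides which subjects receive it.
Response Variable: The outcome variable, measured to see whether the treatment had an effect.
Confounding Variable: A variable other than the treatment that is related to both the treatment and the response, making it hard to tell which one caused the observed difference.
Placebo Effect: When subjects respond (often by improving) because they believe they are receiving treatment, even if the treatment they receive is inactive.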
Double-Blind Study: Neither the subject nor the experimenter knows who receives the treatment.
Sampling Methods
Simple Random Sample: Every possible sample of the chosen size has the same chance of being selected, so each individual has an equal chance of being chosen.
Convenience Sample: Individuals are chosen because they are easy to reach.
Voluntary Response Sample: Individuals self-select to be in the sample.
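A simple random sample can be sketched with Python's standard `random` module. The population of ID numbers 1 to 100 below is hypothetical, as is the helper name `simple_random_sample`; the seed is only there to make the illustration repeatable.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals so that every size-n subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical population: 100 individuals labeled 1 through 100.
population_ids = range(1, 101)

sample = simple_random_sample(population_ids, 10, seed=1)
print("Selected IDs:", sorted(sample))
```

By contrast, a convenience or voluntary response sample has no such chance mechanism, which is why it can misrepresent the population.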
Examples
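Observational Study: A study of cannabis use and IQ in New Zealand adolescents. The treatment variable is cannabis use (categorical) and the response variable is IQ (numerical). Because the researchers did not assign cannabis use to the subjects, the study is observational.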
Controlled Experiment: Pfizer COVID-19 vaccine trial, where participants are randomly assigned to vaccine or placebo groups. The treatment variable is vaccine (categorical), and the response variable is contracting COVID-19 (categorical).
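Random assignment, as in the vaccine trial example, can be sketched as shuffling the subjects and splitting them into two groups. The participant labels and the helper `randomly_assign` are hypothetical; this is an illustration of the idea, not the procedure used in any actual trial.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into treatment and control halves."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participants labeled P01 through P20.
participants = [f"P{i:02d}" for i in range(1, 21)]

treatment, control = randomly_assign(participants, seed=42)
print("Treatment group:", sorted(treatment))
print("Control group:  ", sorted(control))
```

Because chance alone decides who lands in each group, potential confounding variables tend to balance out between the groups.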
The Data Cycle
Steps in the Data Cycle
Ask Questions: Formulate a question of interest.
Consider Data: Gather and organize relevant data.
Analyze Data: Use statistical tools to summarize and interpret the data.
Interpret Data: Draw conclusions and make inferences.
Example: Predicting the time until the next eruption of Old Faithful geyser using eruption data. Data is collected, organized in a table, and analyzed using a dotplot.
| Length of Eruption (minutes) | Wait Time for Next Eruption (minutes) |
|---|---|
| 1.6 | 54 |
| 3.3 | 74 |
| 2.3 | 62 |
Additional info: The data cycle is a framework for conducting statistical investigations, from formulating questions to interpreting results.
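The "Analyze Data" step for the Old Faithful example can be sketched with a simplified text dotplot of the three wait times in the table. A real dotplot places dots along a number line; the version below only lists each observed value with one mark per occurrence, and the function name `text_dotplot` is an illustrative choice.

```python
from collections import Counter

# Wait times (minutes) from the Old Faithful table above.
wait_times = [54, 74, 62]

def text_dotplot(values):
    """Return one line per distinct value, with one '*' per occurrence."""
    counts = Counter(values)
    return "\n".join(f"{value:>4} | {'*' * counts[value]}" for value in sorted(counts))

print(text_dotplot(wait_times))
```

With only three observations the plot is sparse, but the same function works unchanged on a longer list of wait times.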