Introduction to Statistics: Exam 1 Study Notes (Lectures 1–6)

Introduction to Statistics

What is Statistics?

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data. It provides methods for making sense of data and drawing conclusions about populations based on samples.

Descriptive Statistics: Summarizes data using numbers and graphs.
Inferential Statistics: Uses sample data to make inferences about a population.

Types of Data and Variables

Definitions and Classifications

Understanding the types of data and variables is fundamental in statistics.

Data: The pieces of information you record.
Population: The entire group of individuals or items of interest.
Sample: A subset of the population from which data are actually obtained.
Observation/Subject: An individual we gather data from.
Variable: A characteristic measured about the subjects in the population or sample.
Numerical Variable: A variable that can be measured or counted (quantitative).
Categorical Variable: A variable that can be placed into categories (qualitative).

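Example: In a crash-test dummy study, make, model, and number of doors are categorical variables, while weight and head injury are numerical variables. Number of doors is recorded as a number, but here it works as a category label (2-door versus 4-door cars).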

Tables: Organizing Data

Tables are used to organize and summarize data. Below is an example of a data table from a crash-test study:

Make	Model	Doors	Weight	Head Injury
Acura	Integra	2	2630	999
Chevrolet	Camaro	2	3518	594
Chevrolet	S-10 Blazer 4X4	2	4200	834
Ford	Escort	2	2290	851
Ford	Taurus	4	3385	650
Hyundai	Excel	4	2090	1182
Mazda	626	4	2990	846
Volkswagen	Passat	4	2490	1138
Toyota	Tercel	2	2120	1138

Additional info: This table is used to illustrate the difference between categorical and numerical variables.
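
A short Python sketch can make the distinction concrete. It copies three rows from the table above and summarizes each variable according to its type: counts for categorical variables and a mean for numerical ones. The names crash_tests, variable_types, and summarize are made up for this illustration, and the type labels follow the classification given in these notes.

```python
from collections import Counter

# Three rows copied from the crash-test table above.
crash_tests = [
    {"make": "Acura", "model": "Integra", "doors": 2, "weight": 2630, "head_injury": 999},
    {"make": "Chevrolet", "model": "Camaro", "doors": 2, "weight": 3518, "head_injury": 594},
    {"make": "Ford", "model": "Taurus", "doors": 4, "weight": 3385, "head_injury": 650},
]

# Variable types as classified in these notes (doors is treated as categorical).
variable_types = {
    "make": "categorical",
    "model": "categorical",
    "doors": "categorical",
    "weight": "numerical",
    "head_injury": "numerical",
}


def summarize(rows, name, kind):
    """Return counts for a categorical variable, or the mean for a numerical one."""
    values = [row[name] for row in rows]
    if kind == "numerical":
        return sum(values) / len(values)
    return Counter(values)


for name, kind in variable_types.items():
    print(name, kind, summarize(crash_tests, name, kind))
```

Counting categories and averaging numbers are only two of many possible summaries; the point is that the variable type decides which summaries make sense.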

Organizing Categorical Data

Stacked vs. Unstacked Data

Data can be organized in different formats:

Stacked Data: Each row represents one individual, and each column holds one variable measured for that individual.
Unstacked Data: Each column holds the values of one variable for a single group, so the groups appear side by side in separate columns.
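
As a rough illustration, the sketch below converts stacked data into unstacked form, assuming "unstacked" means one column of values per group. The (doors, weight) pairs come from the crash-test table earlier in these notes; the function name unstack is hypothetical.

```python
from collections import defaultdict

# Stacked: one (doors, weight) row per car, taken from the crash-test table.
stacked = [
    (2, 2630), (2, 3518), (2, 4200), (2, 2290), (4, 3385),
    (4, 2090), (4, 2990), (4, 2490), (2, 2120),
]


def unstack(rows):
    """Group the values by their group label, giving one list (column) per group."""
    columns = defaultdict(list)
    for group, value in rows:
        columns[group].append(value)
    return dict(columns)


unstacked = unstack(stacked)
print("2-door weights:", unstacked[2])
print("4-door weights:", unstacked[4])
```

Note that the unstacked columns can have different lengths (five 2-door cars, four 4-door cars), which is one reason stacked data is usually easier for software to work with.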

Frequency and Two-Way Tables

Tables are essential for summarizing categorical data and exploring relationships between variables.

Frequency Table: Shows all possible outcomes for a variable and how frequently each occurred.
Two-Way Table (Contingency Table): Displays all possible outcomes of two categorical variables and the frequency (or relative frequency) of each outcome.

Example: Survey data on seatbelt use by gender:

	Male	Female
Not Always	2	3
Always	3	7
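
The sketch below shows one way such tables could be tallied from raw survey responses. The individual responses are reconstructed from the counts in the table above (they are not the original survey records), and the variable names are chosen only for this example.

```python
from collections import Counter

# One (gender, seatbelt use) pair per respondent, rebuilt from the table counts.
responses = (
    [("Male", "Not Always")] * 2
    + [("Male", "Always")] * 3
    + [("Female", "Not Always")] * 3
    + [("Female", "Always")] * 7
)

# Frequency table for a single variable: how often each gender occurs.
gender_freq = Counter(gender for gender, _ in responses)

# Two-way table: how often each combination of the two variables occurs.
two_way = Counter(responses)

for use in ("Not Always", "Always"):
    print(use, two_way[("Male", use)], two_way[("Female", use)])
```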

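Relative Frequency: The proportion of observations in a particular category, calculated as:

relative frequency = (frequency of the category) / (total number of observations)

Population Proportion: p = (number of individuals in the population in the category) / (population size N)
Sample Proportion: p-hat = (number of individuals in the sample in the category) / (sample size n)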
Percent: Relative frequency expressed as a percentage.
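
Here is a small worked example using the seatbelt counts from the two-way table above. It assumes the usual definition of relative frequency (a category's count divided by the total count); the variable names are made up for this sketch.

```python
# Counts from the seatbelt two-way table: (gender, seatbelt use) -> frequency.
counts = {
    ("Male", "Not Always"): 2,
    ("Male", "Always"): 3,
    ("Female", "Not Always"): 3,
    ("Female", "Always"): 7,
}

total = sum(counts.values())  # 15 respondents in the sample
always = counts[("Male", "Always")] + counts[("Female", "Always")]  # 10

# Sample proportion of respondents who always wear a seatbelt.
sample_proportion_always = always / total
percent_always = 100 * sample_proportion_always

# Relative frequencies within each gender allow a fair comparison
# even though the two groups have different sizes.
male_total = counts[("Male", "Not Always")] + counts[("Male", "Always")]
female_total = counts[("Female", "Not Always")] + counts[("Female", "Always")]
male_always_rate = counts[("Male", "Always")] / male_total
female_always_rate = counts[("Female", "Always")] / female_total

print(f"Always (overall): {percent_always:.1f}%")
print(f"Always (male): {male_always_rate:.0%}, Always (female): {female_always_rate:.0%}")
```

Comparing 3 of 5 males with 7 of 10 females shows why proportions are more informative than raw counts when group sizes differ.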

Collecting Data and Causality

Types of Studies

Controlled Experiment: The researcher assigns subjects to treatment and control groups to study the effect of a variable.
Observational Study: The researcher observes subjects without intervention.

Key Terms:

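Treatment Variable: The variable whose effect is being studied. In a controlled experiment, the researcher decides which subjects receive which treatment.
Response Variable: The outcome variable, measured to see whether it changes with the treatment.
Confounding Variable: A variable, other than the treatment, that is related to both the treatment and the response, making it hard to tell what actually caused the outcome.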
Placebo Effect: When subjects improve because they believe they are receiving treatment.
Double-Blind Study: Neither the subject nor the experimenter knows who receives the treatment.
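
Random assignment is what separates a controlled experiment from an observational study. The sketch below shows one simple way it could be done, by shuffling the subjects and splitting the list in half. The function name assign_groups and the subject labels are hypothetical, and real trials often use more elaborate schemes.

```python
import random


def assign_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(subjects)  # copy so the original list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Ten made-up subject labels for illustration.
subjects = [f"subject_{i}" for i in range(1, 11)]
treatment, control = assign_groups(subjects, seed=1)

print("Treatment:", treatment)
print("Control:  ", control)
```

Because chance alone decides who lands in each group, potential confounding variables tend to be balanced across the two groups.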

Sampling Methods

Simple Random Sample: Individuals are chosen at random so that every individual, and every possible sample of the same size, has an equal chance of being selected.
Convenience Sample: Individuals are chosen because they are easy to reach.
Voluntary Response Sample: Individuals self-select to be in the sample.
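
To contrast the first two methods, the sketch below draws a simple random sample with Python's random.sample and a convenience sample that just takes the first few names on a list. The population of 100 labeled students is invented for this example.

```python
import random

# A made-up population of 100 students.
population = [f"student_{i}" for i in range(1, 101)]

# Simple random sample: random.sample draws without replacement, and every
# possible group of 10 students is equally likely to be chosen.
rng = random.Random(42)  # seeded only so the example is reproducible
simple_random_sample = rng.sample(population, 10)

# Convenience sample: whoever is easiest to reach, here the first 10 on the list.
convenience_sample = population[:10]

print("Simple random:", simple_random_sample)
print("Convenience:  ", convenience_sample)
```

The convenience sample is the same every time and only ever includes students 1 through 10, which is why it can misrepresent the population.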

Examples

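Observational Study: A study of cannabis use and IQ in New Zealand adolescents. The treatment variable is cannabis use (categorical), and the response variable is IQ (numerical). Because the researchers did not assign who used cannabis, the study is observational.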
Controlled Experiment: Pfizer COVID-19 vaccine trial, where participants are randomly assigned to vaccine or placebo groups. The treatment variable is vaccine (categorical), and the response variable is contracting COVID-19 (categorical).

The Data Cycle

Steps in the Data Cycle

Ask Questions: Formulate a question of interest.
Consider Data: Gather and organize relevant data.
Analyze Data: Use statistical tools to summarize and interpret the data.
Interpret Data: Draw conclusions and make inferences.

Example: Predicting the time until the next eruption of Old Faithful geyser using eruption data. Data is collected, organized in a table, and analyzed using a dotplot.

Length of Eruption (minutes)	Wait Time for Next Eruption (minutes)
1.6	54
3.3	74
2.3	62
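
As a rough text-only stand-in for a dotplot, the sketch below stacks one mark per observation next to each distinct wait time from the three rows above. The function name dotplot_lines is hypothetical, and a real dotplot would place the dots along a scaled number line.

```python
from collections import Counter

# Wait times (minutes) from the three rows of the table above.
wait_times = [54, 74, 62]


def dotplot_lines(values):
    """Return one text line per distinct value, with one '*' per observation."""
    counts = Counter(values)
    return [f"{value} | {'*' * counts[value]}" for value in sorted(counts)]


for line in dotplot_lines(wait_times):
    print(line)
```

With only three observations each value gets a single mark; with more eruptions recorded, taller stacks would show which wait times are most common.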

Additional info: The data cycle is a framework for conducting statistical investigations, from formulating questions to interpreting results.