
Introduction to Statistics: Exam 1 Study Notes (Lectures 1–6)

Introduction to Statistics

What is Statistics?

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data. It provides methods for making sense of data and drawing conclusions about populations based on samples.

  • Descriptive Statistics: Summarizes data using numbers and graphs.

  • Inferential Statistics: Uses sample data to make inferences about a population.

Types of Data and Variables

Definitions and Classifications

Understanding the types of data and variables is fundamental in statistics.

  • Data: The pieces of information you record.

  • Population: The entire group of individuals or items of interest.

  • Sample: A subset of the population from which data are actually obtained.

  • Observation/Subject: An individual we gather data from.

  • Variable: A characteristic measured about the subjects in the population or sample.

  • Numerical Variable: A variable that can be measured or counted (quantitative).

  • Categorical Variable: A variable that can be placed into categories (qualitative).

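Example: In a crash-test dummy study, the variables might include car make, model, and number of doors (categorical), along with weight and head injury (numerical). Number of doors is stored as a number, but it is treated as categorical here because it labels a type of car rather than measuring a quantity.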

Tables: Organizing Data

Tables are used to organize and summarize data. Below is an example of a data table from a crash-test study:

Make        Model            Doors  Weight  Head Injury
Acura       Integra          2      2630    999
Chevrolet   Camaro           2      3518    594
Chevrolet   S-10 Blazer 4X4  2      4200    834
Ford        Escort           2      2290    851
Ford        Taurus           4      3385    650
Hyundai     Excel            4      2090    1182
Mazda       626              4      2990    846
Volkswagen  Passat           4      2490    1138
Toyota      Tercel           2      2120    1138

Additional info: This table is used to illustrate the difference between categorical and numerical variables.
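The distinction matters in practice because the two kinds of variable are summarized differently. The sketch below is illustrative only: it copies three rows from the crash-test table, and the `variable_types` mapping and the `summarize` helper are hypothetical names introduced here, not part of the original notes. It assumes the analyst declares each variable's type, since storage alone cannot decide it (doors is stored as a number but treated as categorical).

```python
# Three rows copied from the crash-test table (not the full data set).
cars = [
    {"make": "Acura", "model": "Integra", "doors": 2, "weight": 2630, "head_injury": 999},
    {"make": "Chevrolet", "model": "Camaro", "doors": 2, "weight": 3518, "head_injury": 594},
    {"make": "Ford", "model": "Taurus", "doors": 4, "weight": 3385, "head_injury": 650},
]

# Assumption: types are declared by the analyst, following the notes.
variable_types = {
    "make": "categorical",
    "model": "categorical",
    "doors": "categorical",
    "weight": "numerical",
    "head_injury": "numerical",
}


def summarize(records, variable, kind):
    """Return the mean for a numerical variable, or the distinct
    categories (sorted) for a categorical variable."""
    values = [record[variable] for record in records]
    if kind == "numerical":
        return sum(values) / len(values)
    return sorted(set(values))


summaries = {
    name: summarize(cars, name, kind) for name, kind in variable_types.items()
}

for name, summary in summaries.items():
    print(f"{name} ({variable_types[name]}): {summary}")
```

Averaging the doors column would run without error, which is exactly why the type has to be a deliberate choice rather than something read off the data format.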

Organizing Categorical Data

Stacked vs. Unstacked Data

Data can be organized in different formats:

  • Stacked Data: Each row represents an individual and the columns represent all variables measured for that individual.

  • Unstacked Data: Each column holds the values for one group, so the data for one group sit in one column and the data for another group sit in another column.
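As a rough illustration of the two layouts, the sketch below converts stacked rows into unstacked columns. It uses the doors and head injury values from the crash-test table earlier in these notes; the `unstack` helper is a hypothetical name chosen for this example, and grouping by doors is just one possible choice.

```python
from collections import defaultdict

# Stacked layout: one row per car, as (doors, head_injury),
# copied from the crash-test table.
stacked = [
    (2, 999), (2, 594), (2, 834), (2, 851), (4, 650),
    (4, 1182), (4, 846), (4, 1138), (2, 1138),
]


def unstack(rows):
    """Group the values by their group label: one 'column' per group."""
    columns = defaultdict(list)
    for group, value in rows:
        columns[group].append(value)
    return dict(columns)


unstacked = unstack(stacked)

for doors, injuries in sorted(unstacked.items()):
    print(f"{doors}-door head injury values: {injuries}")
```

The unstacked columns can have different lengths (five 2-door cars, four 4-door cars), which is one reason stacked data is often the easier format to store.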

Frequency and Two-Way Tables

Tables are essential for summarizing categorical data and exploring relationships between variables.

  • Frequency Table: Shows all possible outcomes for a variable and how frequently each occurred.

  • Two-Way Table (Contingency Table): Displays all possible outcomes of two categorical variables and the frequency (or relative frequency) of each outcome.

Example: Survey data on seatbelt use by gender:

            Male  Female
Not Always  2     3
Always      3     7
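A two-way table such as this one is a tally of individual (stacked) responses. The sketch below shows one way to build it with Python's `collections.Counter`. The individual responses are hypothetical: they are constructed only so that their counts match the seatbelt table, since the notes do not list the raw survey records.

```python
from collections import Counter

# Hypothetical raw responses, built to match the counts in the table.
responses = (
    [("Male", "Not Always")] * 2
    + [("Female", "Not Always")] * 3
    + [("Male", "Always")] * 3
    + [("Female", "Always")] * 7
)

# Two-way table: count of each (gender, seatbelt use) combination.
two_way = Counter(responses)

# One-way frequency tables (the row and column totals).
gender_totals = Counter(gender for gender, _ in responses)
seatbelt_totals = Counter(use for _, use in responses)

for use in ("Not Always", "Always"):
    male = two_way[("Male", use)]
    female = two_way[("Female", use)]
    print(f"{use:<10}  Male: {male}  Female: {female}")
```

The totals give the frequency table for each variable on its own: 5 males and 10 females, and 5 "Not Always" and 10 "Always" responses out of 15.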

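Relative Frequency: The proportion of observations in a particular category, calculated as the number of observations in the category divided by the total number of observations.

  • Population Proportion: $p = \frac{X}{N}$, where $X$ is the number of individuals in the population that fall in the category and $N$ is the population size.

  • Sample Proportion: $\hat{p} = \frac{x}{n}$, where $x$ is the number of individuals in the sample that fall in the category and $n$ is the sample size.

  • Percent: Relative frequency expressed as a percentage (the proportion multiplied by 100).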
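As a worked example, the sketch below computes relative frequencies from the seatbelt counts in the two-way table above. It assumes the usual definition (count in the category divided by the total count); the `sample_proportion` helper and the variable names are hypothetical, introduced only for this illustration.

```python
# Counts copied from the seatbelt two-way table.
counts = {
    ("Male", "Not Always"): 2,
    ("Female", "Not Always"): 3,
    ("Male", "Always"): 3,
    ("Female", "Always"): 7,
}


def sample_proportion(x, n):
    """Relative frequency: x observations in the category out of n."""
    return x / n


n = sum(counts.values())  # total number of respondents

# Proportion of all respondents who always wear a seatbelt.
always = sum(c for (_, use), c in counts.items() if use == "Always")
p_hat_always = sample_proportion(always, n)
percent_always = 100 * p_hat_always

# Proportion of females who always wear a seatbelt
# (the denominator is the number of females, not all respondents).
females = sum(c for (gender, _), c in counts.items() if gender == "Female")
p_hat_always_among_females = sample_proportion(counts[("Female", "Always")], females)

print(f"Always, overall: {always}/{n} = {p_hat_always:.3f} ({percent_always:.1f}%)")
print(f"Always, among females: {p_hat_always_among_females:.2f}")
```

The second calculation shows why the denominator needs to be stated: 7 of the 10 females always wear a seatbelt, but those same 7 people are a smaller share of all 15 respondents.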

Collecting Data and Causality

Types of Studies

  • Controlled Experiment: The researcher assigns subjects to treatment and control groups to study the effect of a variable.

  • Observational Study: The researcher observes subjects without intervention.

Key Terms:

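  • Treatment Variable: The variable whose effect is being studied. In a controlled experiment, the researcher assigns it.

  • Response Variable: The outcome variable, measured to see whether it changes with the treatment.

  • Confounding Variable: A variable, other than the treatment, that is related to both the treatment and the response, so its effect on the response cannot be separated from the effect of the treatment.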

  • Placebo Effect: When subjects improve because they believe they are receiving treatment.

  • Double-Blind Study: Neither the subject nor the experimenter knows who receives the treatment.

Sampling Methods

  • Simple Random Sample: Every individual in the population has an equal chance of being selected, and every possible sample of the same size is equally likely.

  • Convenience Sample: Individuals are chosen because they are easy to reach.

  • Voluntary Response Sample: Individuals self-select to be in the sample.
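The difference between a simple random sample and a convenience sample can be sketched in a few lines. The example below is a minimal illustration, assuming a hypothetical population of 100 ID numbers; the function names and the seed value are choices made for this sketch. It relies on `random.sample` from Python's standard library, which draws without replacement.

```python
import random

# Hypothetical population: ID numbers 1 through 100.
population = list(range(1, 101))


def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals so that every possible sample
    of size n is equally likely."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return rng.sample(population, n)


def convenience_sample(population, n):
    """Take whoever is easiest to reach: here, the first n on the list."""
    return population[:n]


srs = simple_random_sample(population, 10, seed=42)
easy = convenience_sample(population, 10)

print("Simple random sample:", sorted(srs))
print("Convenience sample:  ", easy)
```

The convenience sample always contains IDs 1 through 10, so anyone further down the list has no chance of being chosen, which is the source of its bias.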

Examples

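  • Observational Study: Studying cannabis use and IQ in New Zealand adolescents. The treatment variable is cannabis use (categorical) and the response variable is IQ (numerical). Because the researchers did not assign cannabis use to the subjects, the study is observational.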

  • Controlled Experiment: Pfizer COVID-19 vaccine trial, where participants are randomly assigned to vaccine or placebo groups. The treatment variable is vaccine (categorical), and the response variable is contracting COVID-19 (categorical).
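Random assignment, the feature that distinguishes a controlled experiment, can be sketched as follows. This is a generic illustration, not a description of how any actual trial assigned its participants: the participant IDs, the `randomly_assign` helper, the even split, and the seed are all assumptions made for the example.

```python
import random


def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into two groups,
    so chance alone decides who receives the treatment."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    shuffled = list(subjects)  # copy, so the caller's list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Hypothetical participant IDs P01 through P20.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, seed=1)

print("Treatment:", sorted(groups["treatment"]))
print("Control:  ", sorted(groups["control"]))
```

Because assignment depends only on the shuffle, potential confounding variables (age, health, and so on) tend to balance out across the two groups as the number of subjects grows.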

The Data Cycle

Steps in the Data Cycle

  1. Ask Questions: Formulate a question of interest.

  2. Consider Data: Gather and organize relevant data.

  3. Analyze Data: Use statistical tools to summarize and interpret the data.

  4. Interpret Data: Draw conclusions and make inferences.

Example: Predicting the time until the next eruption of Old Faithful geyser using eruption data. Data is collected, organized in a table, and analyzed using a dotplot.

Length of Eruption (minutes)  Wait Time for Next Eruption (minutes)
1.6                           54
3.3                           74
2.3                           62
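The "Analyze Data" step can be sketched with a simple text dotplot of the wait times. The example below uses only the three wait times shown in the table; the `text_dotplot` helper and the 5-minute bin width are assumptions made for this illustration. It draws the plot sideways (one row per bin), which is a rough stand-in for a conventional dotplot with a horizontal axis.

```python
from collections import Counter

# Wait times (minutes) copied from the table; only three are shown there.
wait_times = [54, 74, 62]


def text_dotplot(values, bin_width=5):
    """Return a sideways dotplot: one row per bin, one dot per value."""
    # Round each value down to the start of its bin.
    bins = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in range(min(bins), max(bins) + bin_width, bin_width):
        lines.append(f"{start:>3} | " + "." * bins[start])
    return "\n".join(lines)


plot = text_dotplot(wait_times)
print(plot)
```

With a full set of eruption data, clusters of dots would show which wait times are most common, which is the kind of pattern the "Interpret Data" step would then describe.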

Additional info: The data cycle is a framework for conducting statistical investigations, from formulating questions to interpreting results.
