Chapter 1: Data and Decisions – Foundations of Statistical Thinking

Data and Decisions

Introduction to Data in Business and Statistics

Modern organizations rely on data to inform planning, improve efficiency, and enhance quality. Data are systematically collected, stored, and analyzed to support decision-making processes. Understanding the nature and context of data is fundamental to statistical analysis.

What Are Data?

Definition and Context of Data

  • Data are recorded values—numbers, names, or other labels—collected about subjects of interest.

  • All data have a context, described by the "Five W's": Who (cases), What (variables), When, Where, and Why. Sometimes How is also included.

  • Data are often organized in data tables, where rows represent cases and columns represent variables.

Figure: Example of a data table with cases and variables.
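The rows-as-cases, columns-as-variables layout can be sketched in a few lines of Python. This is only an illustration: the field names (account_id, age, segment) and every value are invented, not taken from the chapter's data.

```python
# Hypothetical data table: each dict is one case (row),
# and each key is one variable (column). All values are invented.
data_table = [
    {"account_id": "A001", "age": 34, "segment": "Travel"},
    {"account_id": "A002", "age": 51, "segment": "Retail"},
    {"account_id": "A003", "age": 27, "segment": "Travel"},
]

# The variables are the column names shared by every case.
variables = list(data_table[0].keys())

# The number of cases is the number of rows.
n_cases = len(data_table)

# Reading "down a column" collects one variable across all cases.
age_column = [row["age"] for row in data_table]

print(variables)
print(n_cases)
print(age_column)
```

Reading across one dict gives everything recorded about a single case; reading one key across all dicts gives a single variable.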

Transactional data are collected from business transactions and stored in digital repositories called data warehouses. The process of extracting useful information from these data is known as data mining or predictive analytics. Business analytics refers to any statistical analysis used to drive business decisions.

Cases and Variables

  • Cases are the subjects (people, animals, objects, etc.) about whom or which data are collected.

  • Cases go by different names depending on the setting: respondents (people who answer a survey), subjects or participants (people in an experiment), and experimental units (animals, plants, or inanimate objects in an experiment).

  • Variables are characteristics recorded about each case, typically shown as columns in a data table.

Figure: Variables as columns in a data table.

Metadata and Data Organization

  • Metadata describe how, when, and where the data were collected, what each case represents, and how each variable is defined.

  • Data are often stored in spreadsheets, but complex datasets may use relational databases—multiple linked tables, each about a specific set of cases and variables.

Figure: Example of a relational database with customer, item, and transaction tables.
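A minimal sketch of linked tables, using Python's built-in sqlite3 module with an in-memory database. The table layouts, column names, and all rows are assumptions made up for illustration; the chapter names only the three tables (customer, item, transaction).

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Three hypothetical tables. The transactions table links the other
# two through the identifier columns customer_id and item_id.
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE items (item_id INTEGER PRIMARY KEY, description TEXT, price REAL);
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    item_id INTEGER REFERENCES items(item_id),
    quantity INTEGER
);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO items VALUES (10, 'Notebook', 4.0), (11, 'Pen', 1.5);
INSERT INTO transactions VALUES (100, 1, 10, 2), (101, 1, 11, 4), (102, 2, 11, 2);
""")

# Join the tables on their identifiers to get total spending per customer.
rows = conn.execute("""
SELECT c.name, SUM(t.quantity * i.price) AS total
FROM transactions AS t
JOIN customers AS c ON c.customer_id = t.customer_id
JOIN items AS i ON i.item_id = t.item_id
GROUP BY c.customer_id, c.name
ORDER BY c.customer_id
""").fetchall()

spend_by_customer = dict(rows)
print(spend_by_customer)
conn.close()
```

Each table describes one kind of case (a customer, an item, a transaction), and the identifier columns are what allow the tables to be combined.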

Example: Credit Card Bank

  • Cases: Individual customers of a credit card bank.

  • Variables: Account ID, Pre Spending, Post Spending, Age, Segment, Enroll?, Offer, Segment Spend.

  • Context: Data from the bank's internal records covering 6 months: 3 months before and 3 months after a marketing offer.

Types of Variables

Categorical (Qualitative) Variables

Categorical variables name categories and answer questions about how cases fall into those categories. They may be descriptive (e.g., type of advertising) or coded numerically (e.g., zip code).

  • Examples: Yes/No responses, class standing (Freshman, Sophomore, etc.), satisfaction ratings.

Figure: Examples of categorical survey questions and responses.

Quantitative Variables

Quantitative variables take measured numerical values with units and tell how much or how many of something there is.

  • Units are essential for interpreting quantitative variables (e.g., dollars, years, kilograms).

  • Examples: Age (in years), spending amount (in dollars).
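One practical consequence of the categorical/quantitative distinction is which summaries make sense. The sketch below uses invented survey responses: the categorical variable is summarized by counts, the quantitative one by an average that carries its units.

```python
from collections import Counter
from statistics import mean

# Invented responses, for illustration only.
class_standing = ["Freshman", "Senior", "Freshman", "Junior", "Freshman"]  # categorical
spending_dollars = [120.0, 80.0, 100.0, 60.0, 140.0]  # quantitative, in dollars

# Categorical: count how many cases fall into each category.
standing_counts = Counter(class_standing)

# Quantitative: a numerical summary, reported with the variable's units.
average_spending = mean(spending_dollars)

print(standing_counts)
print(f"Average spending: ${average_spending:.2f}")
```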

Variables with Dual Roles

Some variables can be either categorical or quantitative, depending on the context and purpose of analysis. For example, Age can be measured in years (quantitative) or classified into groups (categorical).
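The Age example can be sketched as follows. The ages and the group cutoffs are arbitrary choices made for illustration; the chapter does not specify any particular grouping.

```python
# Age as a quantitative variable: measured in years (invented values).
ages_years = [19, 23, 35, 47, 62]

def age_group(age):
    """Classify an age into a hypothetical set of groups (cutoffs are arbitrary)."""
    if age < 25:
        return "Under 25"
    if age < 50:
        return "25-49"
    return "50 and over"

# Age as a categorical variable: the same cases, now labeled by group.
age_groups = [age_group(a) for a in ages_years]

print(age_groups)
```

The underlying measurements are unchanged; only the role the variable plays in the analysis differs.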

Identifier Variables

  • Identifier variables uniquely identify each case (e.g., Social Security Number, Student ID).

  • They are a special type of categorical variable: they have no units and are not themselves analyzed statistically, but they are useful for linking records about the same case across tables.
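A small sketch of how an identifier is typically used: checking that it is unique and looking up a case by it, rather than summarizing it. The records and the student_id format are invented.

```python
# Invented records; student_id plays the role of the identifier variable.
records = [
    {"student_id": "S1001", "standing": "Freshman"},
    {"student_id": "S1002", "standing": "Senior"},
    {"student_id": "S1003", "standing": "Junior"},
]

# An identifier should take a different value for every case.
ids = [r["student_id"] for r in records]
ids_are_unique = len(ids) == len(set(ids))

# Its job is lookup and linking, not summary statistics.
by_id = {r["student_id"]: r for r in records}
lookup = by_id["S1002"]["standing"]

print(ids_are_unique)
print(lookup)
```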

Other Data Types: Ordinal and Nominal Variables

  • Ordinal variables have ordered categories (e.g., employee rank, satisfaction level).

  • Nominal variables are categorical variables without order (e.g., gender, color).
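For an ordinal variable the category order carries information, so software has to be told what that order is. A minimal sketch, assuming a hypothetical three-level satisfaction scale:

```python
# Hypothetical satisfaction scale, listed from lowest to highest.
SATISFACTION_ORDER = ["Low", "Medium", "High"]

# Invented responses.
responses = ["High", "Low", "Medium", "Low"]

# Sorting by position in the scale respects the ordinal meaning.
ranked = sorted(responses, key=SATISFACTION_ORDER.index)

# Plain alphabetical sorting treats the labels as nominal and
# puts "High" first, which ignores the intended order.
alphabetical = sorted(responses)

print(ranked)
print(alphabetical)
```

For a nominal variable such as color there is no meaningful order, so any listing order is equally valid.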

Cross-Sectional and Time Series Data

  • Time series data are measured at regular intervals over time (e.g., monthly sales).

  • Cross-sectional data are measured at a single point in time across multiple cases (e.g., sales revenue for all stores in one month).
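The two views can be read off the same set of records. In this sketch the store names and sales figures are invented: following one store across months gives a time series, and taking one month across stores gives a cross-section.

```python
# Invented monthly sales (in thousands of dollars) for three hypothetical stores.
sales = {
    "Store A": {"Jan": 10, "Feb": 12, "Mar": 15},
    "Store B": {"Jan": 8, "Feb": 9, "Mar": 7},
    "Store C": {"Jan": 20, "Feb": 18, "Mar": 21},
}
months = ["Jan", "Feb", "Mar"]

# Time series: one case followed at regular intervals over time.
time_series_store_a = [sales["Store A"][m] for m in months]

# Cross-section: many cases observed at a single point in time.
cross_section_feb = {store: by_month["Feb"] for store, by_month in sales.items()}

print(time_series_store_a)
print(cross_section_feb)
```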

Data Sources: Where, How, and When

Importance of Data Collection Context

  • When and where data are collected can affect their meaning and interpretation.

  • How data are collected (experiment, survey, or observational study) determines which conclusions can validly be drawn from them.

  • Data can be obtained from experiments, surveys, public/private agencies, or online sources.

Example: Credit Card Bank Data Collection

  • Carly's data: Transactional data (not a survey or experiment).

  • Ying Mei's data: Designed survey.

  • Gregg's data: Designed experiment (random assignment of offers).

Best Practices and Cautions in Data Analysis

  • Always consider the context (the W's) before analyzing data.

  • Do not assume a variable is quantitative just because it is numeric; context determines its role.

  • Be skeptical of data sources and question the representativeness and intent behind data collection.
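The caution about numeric-looking variables can be illustrated with zip codes. The values below are used only as example labels. Python will happily average them, but the result has no meaning; counting cases per code is the sensible summary.

```python
from collections import Counter
from statistics import mean

# Zip codes stored as numbers, although they are really category labels.
zip_codes = [10001, 94105, 10001, 60614]

# This runs without error, but an "average zip code" describes nothing.
meaningless_average = mean(zip_codes)

# Treating the codes as labels: convert to text and count cases per category.
zip_counts = Counter(str(z).zfill(5) for z in zip_codes)

print(meaningless_average)
print(zip_counts)
```

Storing such codes as text from the start (as zfill does here, which also preserves leading zeros) makes it harder to apply arithmetic to them by accident.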

Summary Table: Types of Variables

Variable Type          | Description                     | Examples
-----------------------|---------------------------------|------------------------------------
Categorical (Nominal)  | Names categories without order  | Gender, Color, Zip Code
Categorical (Ordinal)  | Names categories with order     | Class standing, Satisfaction level
Quantitative           | Numerical values with units     | Age (years), Income ($)
Identifier             | Unique label for each case      | Student ID, SSN

Key Takeaways

  • Data are values with context; understanding the W's is essential for meaningful analysis.

  • Variables are categorical (nominal or ordinal), quantitative, or identifiers; the classification depends on context and purpose.

  • Data collection methods and context influence the validity and interpretation of statistical results.
