Chapter 1: Data and Decisions – Foundations of Statistical Thinking
Data and Decisions
Introduction to Data in Business and Statistics
Modern organizations rely on data to inform planning, improve efficiency, and enhance quality. Data are systematically collected, stored, and analyzed to support decision-making processes. Understanding the nature and context of data is fundamental to statistical analysis.
What Are Data?
Definition and Context of Data
Data are recorded values—numbers, names, or other labels—collected about subjects of interest.
All data have a context, described by the "Five W's": Who (cases), What (variables), When, Where, and Why. Sometimes How is also included.
Data are often organized in data tables, where rows represent cases and columns represent variables.

Transactional data are collected from business transactions and stored in digital repositories called data warehouses. The process of extracting useful information from these data is known as data mining or predictive analytics. Business analytics refers to any statistical analysis used to drive business decisions.
Cases and Variables
Cases are the subjects (people, animals, objects, etc.) about whom or which data are collected.
Examples of cases include respondents (survey participants), subjects/participants (experiment participants), and experimental units (non-human subjects).
Variables are characteristics recorded about each case, typically shown as columns in a data table.

Metadata and Data Organization
Metadata provide information about how, when, and where data were collected, who each case represents, and definitions of variables.
Data are often stored in spreadsheets, but complex datasets may use relational databases—multiple linked tables, each about a specific set of cases and variables.
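To make the idea of linked tables concrete, here is a minimal sketch in plain Python. The table names, field names, and values are all hypothetical; a real relational database would use SQL or a similar tool. The point is only that two tables about different sets of cases can be connected through a shared key.

```python
# Hypothetical tables; every name and value here is invented for illustration.
# Table 1: one row per customer, keyed by a customer ID.
customers = {
    "C1": {"name": "Ana", "city": "Austin"},
    "C2": {"name": "Ben", "city": "Boston"},
}

# Table 2: one row per purchase; customer_id links back to Table 1.
purchases = [
    {"purchase_id": 1, "customer_id": "C1", "amount_usd": 25.00},
    {"purchase_id": 2, "customer_id": "C2", "amount_usd": 40.00},
    {"purchase_id": 3, "customer_id": "C1", "amount_usd": 15.50},
]


def join_purchases(purchases, customers):
    """Attach the matching customer record to each purchase via customer_id."""
    joined = []
    for purchase in purchases:
        customer = customers[purchase["customer_id"]]
        joined.append({**purchase, **customer})
    return joined


joined = join_purchases(purchases, customers)
for row in joined:
    print(row["purchase_id"], row["name"], row["amount_usd"])
```

Keeping customers and purchases in separate tables avoids repeating each customer's details on every purchase row.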

Example: Credit Card Bank
Cases: Individual customers of a credit card bank.
Variables: Account ID, Pre Spending, Post Spending, Age, Segment, Enroll?, Offer, Segment Spend.
Context: Data from internal records covering 6 months (3 months before and 3 months after a marketing offer).
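The structure of this example can be sketched as a small data table in Python. The variable names come from the example above; the three rows of values are invented purely for illustration and are not taken from the bank's records.

```python
# Variable names are from the example; all row values below are hypothetical.
VARIABLES = [
    "Account ID", "Pre Spending", "Post Spending", "Age",
    "Segment", "Enroll?", "Offer", "Segment Spend",
]

# Each dict is one case (one customer); each key is one variable (one column).
data_table = [
    {"Account ID": "A001", "Pre Spending": 287.71, "Post Spending": 252.73,
     "Age": 38, "Segment": "Travel", "Enroll?": "No", "Offer": "None",
     "Segment Spend": 0.00},
    {"Account ID": "A002", "Pre Spending": 1050.30, "Post Spending": 1148.95,
     "Age": 52, "Segment": "Retail", "Enroll?": "Yes", "Offer": "Miles",
     "Segment Spend": 310.40},
    {"Account ID": "A003", "Pre Spending": 64.12, "Post Spending": 70.00,
     "Age": 27, "Segment": "Retail", "Enroll?": "No", "Offer": "Miles",
     "Segment Spend": 12.50},
]


def column(table, variable):
    """Return every recorded value of one variable, in row order."""
    return [row[variable] for row in table]


n_cases = len(data_table)
n_variables = len(VARIABLES)
print(n_cases, "cases,", n_variables, "variables")
print(column(data_table, "Age"))
```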
Types of Variables
Categorical (Qualitative) Variables
Categorical variables name categories and answer questions about how cases fall into those categories. They may be descriptive (e.g., type of advertising) or coded numerically (e.g., zip code).
Examples: Yes/No responses, class standing (Freshman, Sophomore, etc.), satisfaction ratings.

Quantitative Variables
Quantitative variables have measured numerical values with units, indicating how much or how many of something is measured.
Units are essential for interpreting quantitative variables (e.g., dollars, years, kilograms).
Examples: Age (in years), spending amount (in dollars).
Variables with Dual Roles
Some variables can be either categorical or quantitative, depending on the context and purpose of analysis. For example, Age can be measured in years (quantitative) or classified into groups (categorical).
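As a rough illustration of the Age example, the sketch below treats the same variable both ways. The age bands and cutoffs (18 and 65) are assumptions chosen for the example, not a standard from the text.

```python
def age_group(age_years):
    """Classify an age in years into a hypothetical age band."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years < 18:
        return "Child"
    if age_years < 65:
        return "Adult"
    return "Senior"


ages = [12, 34, 70, 18]                 # quantitative: numbers with units (years)
groups = [age_group(a) for a in ages]   # categorical: labels for groups

mean_age = sum(ages) / len(ages)        # an average makes sense for years
print(mean_age)
print(groups)
```

An average is meaningful for the quantitative form, while the categorical form is naturally summarized by counting cases per group.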
Identifier Variables
Identifier variables uniquely identify each case (e.g., Social Security Number, Student ID).
They are a special type of categorical variable, have no units, and are not analyzed statistically but are useful for linking data.
Other Data Types: Ordinal and Nominal Variables
Ordinal variables are categorical variables whose categories have a natural order (e.g., employee rank, satisfaction level).
Nominal variables are categorical variables without order (e.g., gender, color).
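The practical difference can be sketched in Python: ordinal categories support sorting and comparison once their order is stated, while nominal categories only support counting. The satisfaction levels and colors below are hypothetical sample values.

```python
from collections import Counter

# Ordinal: the order of the categories is part of the variable's definition.
SATISFACTION_ORDER = ["Low", "Medium", "High"]   # hypothetical levels
RANK = {level: i for i, level in enumerate(SATISFACTION_ORDER)}

responses = ["High", "Low", "Medium", "Low"]
sorted_responses = sorted(responses, key=RANK.__getitem__)
highest = max(responses, key=RANK.__getitem__)

# Nominal: no order exists, so counting per category is the natural summary.
colors = ["red", "blue", "red", "green"]
color_counts = Counter(colors)

print(sorted_responses)
print(highest)
print(color_counts["red"])
```

Note that sorting the responses alphabetically would give the wrong order ("High" before "Low"), which is why the ranking is stated explicitly.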
Cross-Sectional and Time Series Data
Time series data are measured at regular intervals over time (e.g., monthly sales).
Cross-sectional data are measured at a single point in time across multiple cases (e.g., sales revenue for all stores in one month).
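One way to see the distinction is to slice the same records in two directions. The sketch below uses invented store names, months, and sales figures: fixing the store and varying the month gives a time series, while fixing the month and varying the store gives a cross-section.

```python
# Hypothetical monthly sales, keyed by (store, month); values are invented.
sales = {
    ("Store A", "2024-01"): 100,
    ("Store A", "2024-02"): 120,
    ("Store A", "2024-03"): 90,
    ("Store B", "2024-01"): 80,
    ("Store B", "2024-02"): 85,
    ("Store B", "2024-03"): 95,
}


def time_series(sales, store):
    """One store followed over time, in month order."""
    return [(month, value)
            for (s, month), value in sorted(sales.items())
            if s == store]


def cross_section(sales, month):
    """All stores observed in a single month."""
    return {s: value for (s, m), value in sales.items() if m == month}


print(time_series(sales, "Store A"))
print(cross_section(sales, "2024-02"))
```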
Data Sources: Where, How, and When
Importance of Data Collection Context
When and where data are collected can affect their meaning and interpretation.
How data are collected (experiment, survey, observational study) determines the validity of inferences.
Data can be obtained from experiments, surveys, public/private agencies, or online sources.
Example: Credit Card Bank Data Collection
Carly's data: Transactional data (not a survey or experiment).
Ying Mei's data: Designed survey.
Gregg's data: Designed experiment (random assignment of offers).
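The random assignment used in a designed experiment can be sketched as follows. This is only an illustration under stated assumptions: the customer IDs, offer names, and the `randomly_assign` helper are hypothetical and do not describe how the bank actually assigned offers.

```python
import random


def randomly_assign(case_ids, treatments, seed=0):
    """Shuffle the cases, then deal them out to treatments in turn.

    Shuffling first means chance alone decides who gets which treatment;
    dealing in turn keeps the group sizes as equal as possible.
    """
    rng = random.Random(seed)      # seeded so the assignment is reproducible
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    return {case_id: treatments[i % len(treatments)]
            for i, case_id in enumerate(shuffled)}


customer_ids = ["c1", "c2", "c3", "c4", "c5", "c6"]   # hypothetical cases
offers = ["Offer A", "Offer B"]                        # hypothetical treatments
assignment = randomly_assign(customer_ids, offers, seed=1)
for customer, offer in sorted(assignment.items()):
    print(customer, offer)
```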
Best Practices and Cautions in Data Analysis
Always consider the context (the W's) before analyzing data.
Do not assume a variable is quantitative just because it is numeric; context determines its role.
Be skeptical of data sources and question the representativeness and intent behind data collection.
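The caution about numeric variables can be illustrated with zip codes. In the sketch below, the zip codes are hypothetical sample values: Python will happily compute their mean, but the result has no interpretation, whereas counting cases per zip code is a sensible summary.

```python
from collections import Counter
from statistics import mean

# Hypothetical zip codes, stored as numbers the way a spreadsheet might.
zip_codes = [10001, 94105, 10001, 60614]

# This runs without error, but an "average zip code" describes no real place.
meaningless_average = mean(zip_codes)

# Treating the variable as categorical: count the cases in each category.
counts = Counter(zip_codes)
most_common_zip, n_cases = counts.most_common(1)[0]

print(meaningless_average)
print(most_common_zip, n_cases)
```

Storing such codes as text rather than numbers is one common way to prevent accidental arithmetic and to preserve leading zeros.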
Summary Table: Types of Variables
| Variable Type | Description | Examples |
|---|---|---|
| Categorical (Nominal) | Names categories without order | Gender, Color, Zip Code |
| Categorical (Ordinal) | Names categories with order | Class standing, Satisfaction level |
| Quantitative | Numerical values with units | Age (years), Income ($) |
| Identifier | Unique label for each case | Student ID, SSN |
Key Takeaways
Data are values with context; understanding the W's is essential for meaningful analysis.
Variables can be categorical (nominal or ordinal), quantitative, or identifiers; the classification depends on context and purpose.
Data collection methods and context influence the validity and interpretation of statistical results.