Defining and Collecting Data: Foundations of Business Statistics

Study Guide - Smart Notes

Chapter 1: Defining and Collecting Data

Objectives of the Chapter

This chapter introduces foundational concepts in business statistics, focusing on the definition and classification of variables, measurement scales, data collection methods, sampling techniques, data preparation, and survey errors. These concepts are essential for understanding how to gather and analyze data for business decision-making.

  • Defining Variables: Learn how to identify and define different types of variables.

  • Measurement Scales: Understand the scales used to measure variables.

  • Data Collection: Explore various methods for collecting data.

  • Sampling: Identify ways to collect samples and understand sampling issues.

  • Data Preparation: Learn about data cleaning and preprocessing.

  • Survey Errors: Recognize types of errors that can occur in surveys.

Classifying Variables By Type

Categorical vs. Numerical Variables

Variables are the characteristics or properties that are measured or observed in a study. They are classified into two main types: categorical and numerical.

  • Categorical (Qualitative) Variables: Take on values that are categories, such as "yes", "no", or colors like "blue", "brown", "green". These variables describe qualities or attributes.

  • Numerical (Quantitative) Variables: Represent quantities that can be counted or measured. These are further divided into:

    • Discrete Variables: Arise from a counting process (e.g., number of text messages sent).

    • Continuous Variables: Arise from a measuring process (e.g., time taken to download an app).

Examples of Types of Variables

Question | Responses | Variable Type
---------|-----------|--------------
Do you have a Facebook profile? | Yes or No | Categorical
How many text messages did you send in the past 7 days? | Numerical value | Numerical (Discrete)
How long did the mobile update take to download? | Numerical value | Numerical (Continuous)
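
The distinction between categorical, discrete, and continuous variables can be sketched in Python. This is a rough heuristic based only on how the values are stored, and the function name `classify_variable` and the sample responses are hypothetical. In practice the classification depends on whether the values arise from categories, counting, or measuring, not on the storage type alone.

```python
def classify_variable(values):
    """Guess a variable type from stored values (heuristic, not a rule).

    Strings are treated as categories, whole numbers as counts (discrete),
    and any other numbers as measurements (continuous).
    """
    def is_number(v):
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if all(isinstance(v, str) for v in values):
        return "categorical"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "numerical (discrete)"
    if all(is_number(v) for v in values):
        return "numerical (continuous)"
    return "mixed: needs cleaning"


# Hypothetical responses to the three survey questions
has_profile = ["Yes", "No", "Yes"]
texts_sent = [42, 0, 17]
download_seconds = [12.4, 9.8, 15.0]

for name, data in [("has_profile", has_profile),
                   ("texts_sent", texts_sent),
                   ("download_seconds", download_seconds)]:
    print(name, "->", classify_variable(data))
```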

Measurement Scales

Types of Measurement Scales

Measurement scales determine how variables are categorized, ordered, and quantified. There are four main types:

  • Nominal Scale: Classifies data into distinct categories with no implied ranking.

  • Ordinal Scale: Classifies data into distinct categories with an implied order or ranking.

  • Interval Scale: An ordered scale where the difference between measurements is meaningful, but there is no true zero point.

  • Ratio Scale: An ordered scale with meaningful differences and a true zero point.

Examples of Measurement Scales

Scale Type | Variable | Categories/Values
-----------|----------|------------------
Nominal | Do you have a Facebook profile? | Yes, No
Nominal | Type of investment | Growth, Value, Other
Nominal | Cellular provider | AT&T, Sprint, Verizon, Other, None
Ordinal | Student class designation | Freshman, Sophomore, Junior, Senior
Ordinal | Product satisfaction | Very unsatisfied, Fairly unsatisfied, Neutral, Fairly satisfied, Very satisfied
Ordinal | Faculty rank | Professor, Associate Professor, Assistant Professor, Instructor
Ordinal | Standard & Poor's bond ratings | AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Ordinal | Student grades | A, B, C, D, F

  • Interval Scale Example: Temperature in degrees Celsius or Fahrenheit (no true zero).

  • Ratio Scale Example: Height in inches or centimeters, weight in pounds or kilograms, income in dollars (true zero exists).
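
A short arithmetic sketch may help show why the true zero point matters. The numbers below are made up for illustration. On an interval scale such as temperature, a ratio like "twice as hot" changes when the unit changes, while comparisons of differences survive. On a ratio scale such as height, ratios are the same in any unit.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


def cm_to_inches(cm):
    return cm / 2.54


# Interval scale: the ratio of two temperatures depends on the unit
t_low_c, t_mid_c, t_high_c = 10.0, 20.0, 30.0
ratio_c = t_mid_c / t_low_c
ratio_f = celsius_to_fahrenheit(t_mid_c) / celsius_to_fahrenheit(t_low_c)
print(ratio_c, ratio_f)  # the two ratios disagree

# Differences are still comparable: both gaps are equal in either unit
gap_ratio_c = (t_high_c - t_mid_c) / (t_mid_c - t_low_c)
gap_ratio_f = (
    (celsius_to_fahrenheit(t_high_c) - celsius_to_fahrenheit(t_mid_c))
    / (celsius_to_fahrenheit(t_mid_c) - celsius_to_fahrenheit(t_low_c))
)
print(gap_ratio_c, gap_ratio_f)

# Ratio scale: "twice as tall" holds in centimeters and in inches
h_short_cm, h_tall_cm = 90.0, 180.0
ratio_cm = h_tall_cm / h_short_cm
ratio_in = cm_to_inches(h_tall_cm) / cm_to_inches(h_short_cm)
print(ratio_cm, ratio_in)
```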

Data Collection: Populations and Samples

Population vs. Sample

Data can be collected from an entire population or a sample. Understanding the distinction is crucial for statistical inference.

  • Population: All items or individuals of interest in a study.

  • Sample: A subset of the population, selected for analysis.

Sampling is often preferred because it is less time-consuming, less costly, and more practical than studying the entire population.

  • Population Parameter: A summary measure describing a characteristic of the population.

  • Sample Statistic: A summary measure describing a characteristic of the sample, used to estimate population parameters.
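
The relationship between a parameter and a statistic can be sketched with simulated data. The population below is randomly generated and purely hypothetical (imagined weekly text-message counts). In a real study the population mean would usually be unknown, which is why the sample mean is used to estimate it.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: weekly text-message counts for 10,000 people
population = [rng.randint(0, 200) for _ in range(10_000)]

# Population parameter: computed from every item in the population
population_mean = mean(population)

# Sample statistic: computed from a subset, used to estimate the parameter
sample = rng.sample(population, 100)
sample_mean = mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):    ", sample_mean)
```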

Sources of Data

Primary vs. Secondary Data Sources

Data can originate from various activities and sources, which are classified as primary or secondary.

  • Primary Sources: Data collected directly by the analyst (e.g., surveys, experiments, observations).

  • Secondary Sources: Data collected by others and used for analysis (e.g., census data, published reports).

Examples of Data Collection Methods

  • Ongoing Business Activities: Transaction records, web analytics, economic indicators.

  • Distributed Data (data published by an organization or individual): Financial reports, market research, published statistics.

  • Surveys: Consumer preferences, political polls, satisfaction ratings.

  • Designed Experiments: Product testing, material evaluation, market testing.

  • Observational Studies: Focus groups, time measurements, traffic counts.

Sampling Methods

Sampling Frame

The sampling frame is a list of items that make up the population. An accurate frame is essential for unbiased sampling.

  • Frames can be population lists, directories, or maps.

  • Excluding groups from the frame can lead to biased results.

Types of Samples

  • Nonprobability Samples: Items are chosen without regard to their probability of occurrence.

    • Convenience Sampling: Selection based on ease or convenience.

    • Judgment Sampling: Selection based on expert opinion.

  • Probability Samples: Items are chosen based on known probabilities.

    • Simple Random Sample: Every item has an equal chance of selection. Selection can be with or without replacement.

    • Systematic Sample: Select every k-th item after a random start within the first k items, where k = N/n, N is the population size, and n is the sample size.

    • Stratified Sample: Divide the population into subgroups (strata) that share a characteristic, then draw a simple random sample from each stratum, with sample sizes proportional to stratum sizes.

    • Cluster Sample: Divide population into clusters, randomly select clusters, and sample all or some items within selected clusters.
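
The four probability sampling methods can be sketched with Python's standard `random` module. This is a minimal illustration under simplifying assumptions: the frame is a numbered list of 1,000 items, the strata and clusters are hypothetical, the population size divides evenly by the sample size, and proportional allocation happens to give whole numbers. The function names are illustrative, not a standard API.

```python
import random


def simple_random_sample(frame, n, rng):
    # Every item has the same chance of selection (without replacement)
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    # k = N / n; pick a random start in the first k items, then every k-th
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(strata, n, rng):
    # Simple random sample within each stratum, sized in proportion
    # to the stratum's share of the population (rounded)
    total = sum(len(items) for items in strata.values())
    sample = []
    for items in strata.values():
        n_stratum = round(n * len(items) / total)
        sample.extend(rng.sample(items, n_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, then keep every item inside them
    chosen = rng.sample(list(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]


rng = random.Random(42)
frame = list(range(1, 1001))  # hypothetical frame: items numbered 1..1000

srs = simple_random_sample(frame, 50, rng)
systematic = systematic_sample(frame, 50, rng)

# Hypothetical strata: 600 undergraduates and 400 graduate students
strata = {"undergraduate": frame[:600], "graduate": frame[600:]}
stratified = stratified_sample(strata, 50, rng)

# Hypothetical clusters: 10 stores with 100 customers each
clusters = {f"store_{i}": frame[i * 100:(i + 1) * 100] for i in range(10)}
clustered = cluster_sample(clusters, 2, rng)

print(len(srs), len(systematic), len(stratified), len(clustered))
```

With uneven stratum sizes the rounded allocations may not add up to exactly n, so a real implementation would need a rule for distributing the remainder.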

Comparison of Sampling Methods

Method | Advantages | Disadvantages
-------|------------|--------------
Simple Random/Systematic | Simple to use | May not represent all population characteristics
Stratified | Ensures representation across subgroups | Requires knowledge of subgroup membership
Cluster | Cost effective | Less efficient; larger sample needed for precision

Data Preparation and Cleaning

Importance of Data Cleaning

Data cleaning is essential before analysis to correct irregularities and ensure data quality.

  • Invalid Variable Values: Non-numeric data for numeric variables, invalid categories, out-of-range values.

  • Coding Errors: Inconsistent values, case sensitivity, extraneous characters.

  • Integration Errors: Redundant columns, duplicated rows, inconsistent units or scales.

  • Missing Values: Data not collected or absent for certain variables.

Data cleaning can be semi-automated using software tools, but manual review is often necessary. Always preserve the original data for reference.
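
As a rough sketch of semi-automated cleaning, the plain-Python example below flags several of the irregularities listed above. The rows, field names, and the `clean` function are hypothetical. The original rows are left untouched and a cleaned copy is returned together with a log of issues for manual review.

```python
VALID_PROVIDERS = {"AT&T", "Sprint", "Verizon", "Other", "None"}
CANONICAL = {name.lower(): name for name in VALID_PROVIDERS}

# Hypothetical raw survey rows, as they might arrive from a data export
raw_rows = [
    {"id": 1, "provider": "Verizon", "texts": "42"},
    {"id": 2, "provider": "verizon ", "texts": "17"},  # case and extra space
    {"id": 3, "provider": "AT&T", "texts": "many"},    # non-numeric value
    {"id": 4, "provider": "Sprint", "texts": "-5"},    # out-of-range value
    {"id": 5, "provider": "Sprint", "texts": ""},      # missing value
    {"id": 1, "provider": "Verizon", "texts": "42"},   # duplicated row
]


def clean(rows):
    """Return (cleaned_rows, issues) without modifying the input rows."""
    cleaned, issues, seen_ids = [], [], set()
    for row in rows:
        if row["id"] in seen_ids:
            issues.append((row["id"], "duplicated row"))
            continue
        seen_ids.add(row["id"])

        # Coding errors: normalize case and strip extraneous whitespace
        provider = CANONICAL.get(row["provider"].strip().lower())
        if provider is None:
            issues.append((row["id"], "invalid category"))

        text = row["texts"].strip()
        texts = None
        if text == "":
            issues.append((row["id"], "missing value"))
        else:
            try:
                texts = int(text)
            except ValueError:
                issues.append((row["id"], "non-numeric value"))
            else:
                if texts < 0:  # a count cannot be negative
                    issues.append((row["id"], "out-of-range value"))
                    texts = None

        cleaned.append({"id": row["id"], "provider": provider, "texts": texts})
    return cleaned, issues


cleaned_rows, issues = clean(raw_rows)
for issue in issues:
    print(issue)
```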

Other Data Preprocessing Tasks

  • Data Formatting: Rearranging structure or encoding.

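  • Stacking/Unstacking Data: Stacked data place all values of a numerical variable in one column, with a second column identifying the group each value belongs to; unstacked data place each group's values in a separate column.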

  • Recoding Variables: Redefining categories or transforming numerical variables into categorical ones. Ensure new categories are mutually exclusive and collectively exhaustive.
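
Recoding and unstacking can be sketched as follows. The category names, cutoffs, and return figures are hypothetical. The half-open ranges make the categories mutually exclusive (no count falls in two of them) and collectively exhaustive (every non-negative count falls in one).

```python
def recode_texts(count):
    """Recode a weekly text-message count into a usage category."""
    if count < 0:
        raise ValueError("a count cannot be negative")
    if count < 50:       # 0 to 49
        return "Light"
    if count < 200:      # 50 to 199
        return "Moderate"
    return "Heavy"       # 200 and above


print([recode_texts(c) for c in (0, 49, 50, 199, 200)])

# Stacked layout: one column of values plus a column naming the group
stacked = [("Growth", 8.2), ("Value", 5.1), ("Growth", 7.4), ("Value", 6.0)]

# Unstacked layout: one list ("column") of values per group
unstacked = {}
for group, value in stacked:
    unstacked.setdefault(group, []).append(value)

print(unstacked)
```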

Survey Errors and Ethical Issues

Types of Survey Errors

  • Coverage Error: Some groups are excluded from the sampling frame and have no chance of being selected, which leads to selection bias.

  • Nonresponse Error: People who do not respond may differ from those who do, which leads to nonresponse bias.

  • Sampling Error: Natural variation between samples.

  • Measurement Error: Poor question design or respondent mistakes.
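
Sampling error in particular is easy to see in a small simulation. The sketch below uses a randomly generated, hypothetical population of download times and draws many samples of the same size. Each sample mean differs from the population mean by chance alone, even though the frame is complete and nothing is measured incorrectly.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: download times in seconds for 5,000 users
population = [rng.uniform(5, 60) for _ in range(5_000)]
population_mean = mean(population)

# Draw 200 simple random samples of 50 users and record each sample mean
sample_means = [mean(rng.sample(population, 50)) for _ in range(200)]

# Sampling error: how far each sample mean lands from the population mean
errors = [m - population_mean for m in sample_means]

print("population mean:", round(population_mean, 2))
print("smallest error: ", round(min(errors), 2))
print("largest error:  ", round(max(errors), 2))
```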

Ethical Issues in Surveys

  • Coverage and nonresponse errors can be manipulated to bias results.

  • Sampling error may be misrepresented if margins of error are omitted.

  • Measurement error can be introduced by leading questions or intentional respondent deception.

Summary

This chapter covers the essential steps in defining and collecting data for business statistics, including variable classification, measurement scales, data sources, sampling methods, data cleaning, and survey errors. Mastery of these concepts is foundational for accurate and ethical statistical analysis in business contexts.
