Defining and Collecting Data: Foundations of Business Statistics
Study Guide - Smart Notes
Chapter 1: Defining and Collecting Data
Objectives of the Chapter
This chapter introduces foundational concepts in business statistics, focusing on the definition and classification of variables, measurement scales, data collection methods, sampling techniques, data preparation, and survey errors. These concepts are essential for understanding how to gather and analyze data for business decision-making.
Defining Variables: Learn how to identify and define different types of variables.
Measurement Scales: Understand the scales used to measure variables.
Data Collection: Explore various methods for collecting data.
Sampling: Identify ways to collect samples and understand sampling issues.
Data Preparation: Learn about data cleaning and preprocessing.
Survey Errors: Recognize types of errors that can occur in surveys.
Classifying Variables By Type
Categorical vs. Numerical Variables
Variables are the characteristics or properties that are measured or observed in a study. They are classified into two main types: categorical and numerical.
Categorical (Qualitative) Variables: Take on values that are categories, such as "yes", "no", or colors like "blue", "brown", "green". These variables describe qualities or attributes.
Numerical (Quantitative) Variables: Represent quantities that can be counted or measured. These are further divided into:
Discrete Variables: Arise from a counting process (e.g., number of text messages sent).
Continuous Variables: Arise from a measuring process (e.g., time taken to download an app).
Examples of Types of Variables
| Question | Responses | Variable Type |
|---|---|---|
| Do you have a Facebook profile? | Yes or No | Categorical |
| How many text messages did you send in the past 7 days? | Numerical value | Numerical (Discrete) |
| How long did the mobile update take to download? | Numerical value | Numerical (Continuous) |
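The classification above can be sketched in Python. This is a rough heuristic, not a standard rule: it assumes text or boolean responses are categorical, whole-number counts are discrete, and fractional measurements are continuous. The function name `classify_variable` and the sample responses are hypothetical.

```python
def classify_variable(values):
    """Heuristically label a variable from its observed values.

    Assumption: text or boolean responses are categorical, whole-number
    counts are discrete, and fractional measurements are continuous.
    """
    if all(isinstance(v, (str, bool)) for v in values):
        return "Categorical"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "Numerical (Discrete)"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "Numerical (Continuous)"
    return "Mixed or unknown"


# Hypothetical survey responses matching the three example questions.
responses = {
    "has_facebook_profile": ["Yes", "No", "Yes"],
    "texts_sent_past_7_days": [42, 0, 118],
    "download_time_seconds": [12.7, 8.3, 15.0],
}

labels = {name: classify_variable(vals) for name, vals in responses.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

The stored data type is only a hint. Numeric codes such as ZIP codes or ID numbers are stored as integers but are categorical, so the final call depends on what the values mean.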
Measurement Scales
Types of Measurement Scales
Measurement scales determine how variables are categorized, ordered, and quantified. There are four main types:
Nominal Scale: Classifies data into distinct categories with no implied ranking.
Ordinal Scale: Classifies data into distinct categories with an implied order or ranking.
Interval Scale: An ordered scale where the difference between measurements is meaningful, but there is no true zero point.
Ratio Scale: An ordered scale with meaningful differences and a true zero point.
Examples of Measurement Scales
| Scale Type | Variable | Categories/Values |
|---|---|---|
| Nominal | Do you have a Facebook profile? | Yes, No |
| Nominal | Type of investment | Growth, Value, Other |
| Nominal | Cellular Provider | AT&T, Sprint, Verizon, Other, None |
| Ordinal | Student class designation | Freshman, Sophomore, Junior, Senior |
| Ordinal | Product satisfaction | Very unsatisfied, Fairly unsatisfied, Neutral, Fairly satisfied, Very satisfied |
| Ordinal | Faculty rank | Professor, Associate Professor, Assistant Professor, Instructor |
| Ordinal | Standard & Poor's bond ratings | AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D |
| Ordinal | Student Grades | A, B, C, D, F |
Interval Scale Example: Temperature in degrees Celsius or Fahrenheit (0° is an arbitrary reference point, not the absence of temperature, so there is no true zero).
Ratio Scale Example: Height in inches or centimeters, weight in pounds or kilograms, income in dollars (true zero exists).
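A short sketch of why the true zero matters: on a ratio scale, a ratio between two values survives a change of units, while on an interval scale it does not. The specific temperatures and heights below are made-up illustration values.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9 / 5 + 32


def inches_to_cm(inches):
    """Convert a length in inches to centimeters."""
    return inches * 2.54


# Interval scale: the ratio depends on the unit, so "twice as hot" is meaningless.
c_low, c_high = 10.0, 20.0
ratio_celsius = c_high / c_low  # 2.0
ratio_fahrenheit = celsius_to_fahrenheit(c_high) / celsius_to_fahrenheit(c_low)  # 68 / 50

# Ratio scale: the ratio is the same in any unit, so "twice as tall" is meaningful.
h_short, h_tall = 30.0, 60.0
ratio_inches = h_tall / h_short  # 2.0
ratio_cm = inches_to_cm(h_tall) / inches_to_cm(h_short)

print(f"Temperature ratio: {ratio_celsius:.2f} in Celsius, {ratio_fahrenheit:.2f} in Fahrenheit")
print(f"Height ratio: {ratio_inches:.2f} in inches, {ratio_cm:.2f} in centimeters")
```

Differences remain meaningful on both scales: a 10-degree gap in Celsius is always an 18-degree gap in Fahrenheit, wherever it sits on the scale.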
Data Collection: Populations and Samples
Population vs. Sample
Data can be collected from an entire population or a sample. Understanding the distinction is crucial for statistical inference.
Population: All items or individuals of interest in a study.
Sample: A subset of the population, selected for analysis.
Sampling is often preferred because it is less time-consuming, less costly, and more practical than studying the entire population.
Population Parameter: A summary measure describing a characteristic of the population.
Sample Statistic: A summary measure describing a characteristic of the sample, used to estimate population parameters.
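The difference between a parameter and a statistic can be sketched with a simulated population. Everything here is an assumption for illustration: the population is 10,000 hypothetical order values generated at random, and the sample size of 100 is arbitrary.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: order values (in dollars) for 10,000 customers.
population = [round(rng.uniform(5, 200), 2) for _ in range(10_000)]

# Population parameter: computed from every item in the population.
population_mean = statistics.mean(population)

# Sample statistic: computed from a subset and used to estimate the parameter.
sample = rng.sample(population, 100)  # drawn without replacement
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

In practice the population mean is usually unknown, which is why the sample mean is used to estimate it.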
Sources of Data
Primary vs. Secondary Data Sources
Data can originate from various activities and sources, which are classified as primary or secondary.
Primary Sources: Data collected directly by the analyst (e.g., surveys, experiments, observations).
Secondary Sources: Data collected by others and used for analysis (e.g., census data, published reports).
Examples of Data Collection Methods
Ongoing Business Activities: Transaction records, web analytics, economic indicators.
Distributed Data: Financial reports, market research, published statistics.
Surveys: Consumer preferences, political polls, satisfaction ratings.
Designed Experiments: Product testing, material evaluation, market testing.
Observational Studies: Focus groups, time measurements, traffic counts.
Sampling Methods
Sampling Frame
The sampling frame is a list of items that make up the population. An accurate frame is essential for unbiased sampling.
Frames can be population lists, directories, or maps.
Excluding groups from the frame can lead to biased results.
Types of Samples
Nonprobability Samples: Items are selected without knowing their probability of selection, so the results cannot support statistical inference about the population.
Convenience Sampling: Selection based on ease or convenience.
Judgment Sampling: Selection based on expert opinion.
Probability Samples: Items are chosen based on known probabilities.
Simple Random Sample: Every item has an equal chance of selection. Selection can be with or without replacement.
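Systematic Sample: Select every k-th item after a random start among the first k items, where $k = N/n$, $N$ is the population size, and $n$ is the sample size.
Stratified Sample: Divide the population into subgroups (strata) based on a common characteristic, then take a simple random sample from each stratum, with sample sizes proportional to stratum size.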
Cluster Sample: Divide population into clusters, randomly select clusters, and sample all or some items within selected clusters.
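The four probability sampling methods listed here can be sketched with Python's standard `random` module. The frame of 40 employees, the department strata, the branch clusters, and the sample size of 8 are all hypothetical values chosen so the arithmetic works out evenly.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def department(i):
    """Assign a hypothetical department (the stratum) from the employee ID."""
    if i <= 20:
        return "Sales"
    if i <= 32:
        return "Support"
    return "IT"


# Hypothetical frame: 40 employees; branch (the cluster) groups IDs in fives.
frame = [{"id": i, "dept": department(i), "branch": (i - 1) // 5} for i in range(1, 41)]
N, n = len(frame), 8

# Simple random sample: every item equally likely, drawn without replacement.
simple_random = rng.sample(frame, n)

# Systematic sample: k = N / n, random start among the first k items,
# then every k-th item after it.
k = N // n
start = rng.randrange(k)
systematic = frame[start::k]

# Stratified sample: simple random sample within each stratum,
# sized in proportion to the stratum's share of the population.
strata = {}
for person in frame:
    strata.setdefault(person["dept"], []).append(person)
stratified = []
for members in strata.values():
    quota = round(n * len(members) / N)  # rounding may not always sum to n
    stratified.extend(rng.sample(members, quota))

# Cluster sample: randomly pick 2 of the 8 branches and keep everyone in them.
chosen_branches = rng.sample(sorted({p["branch"] for p in frame}), 2)
cluster = [p for p in frame if p["branch"] in chosen_branches]

for name, chosen in [("Simple random", simple_random), ("Systematic", systematic),
                     ("Stratified", stratified), ("Cluster", cluster)]:
    print(f"{name}: {sorted(p['id'] for p in chosen)}")
```

With these numbers the stratified quotas are 4 Sales, 2 Support, and 2 IT. With other population or sample sizes the rounded quotas may need adjusting to hit the target sample size exactly.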
Comparison of Sampling Methods
| Method | Advantages | Disadvantages |
|---|---|---|
| Simple Random/Systematic | Simple to use | May not represent all population characteristics |
| Stratified | Ensures representation across subgroups | Requires knowledge of subgroup membership |
| Cluster | Cost effective | Less efficient; larger sample needed for precision |
Data Preparation and Cleaning
Importance of Data Cleaning
Data cleaning is essential before analysis to correct irregularities and ensure data quality.
Invalid Variable Values: Non-numeric data for numeric variables, invalid categories, out-of-range values.
Coding Errors: Inconsistent values, case sensitivity, extraneous characters.
Integration Errors: Redundant columns, duplicated rows, inconsistent units or scales.
Missing Values: Data not collected or absent for certain variables.
Data cleaning can be semi-automated using software tools, but manual review is often necessary. Always preserve the original data for reference.
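A minimal sketch of semi-automated cleaning, assuming a small hypothetical dataset with `id`, `region`, and `age` fields. The valid region list and the plausible age range are assumptions made for the example. The function flags problems for manual review rather than guessing at corrections, and it leaves the original rows untouched.

```python
# Hypothetical raw records, stored as text the way a CSV import would deliver them.
raw_rows = [
    {"id": "1", "region": "East ", "age": "34"},   # extraneous space, mixed case
    {"id": "2", "region": "east", "age": "abc"},   # non-numeric value
    {"id": "3", "region": "WEST", "age": "212"},   # out-of-range value
    {"id": "3", "region": "WEST", "age": "212"},   # duplicated row
    {"id": "4", "region": "North", "age": ""},     # missing value
    {"id": "5", "region": "Mars", "age": "41"},    # invalid category
]
VALID_REGIONS = {"east", "west", "north", "south"}  # assumed category list
AGE_RANGE = (18, 100)                               # assumed plausible range


def clean(rows):
    """Return (cleaned copies, list of issues) without modifying the input rows."""
    cleaned, issues, seen = [], [], set()
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            issues.append((row["id"], "duplicate row dropped"))
            continue
        seen.add(key)

        record = dict(row)  # copy, so the original data is preserved
        record["region"] = row["region"].strip().lower()
        if record["region"] not in VALID_REGIONS:
            issues.append((row["id"], "invalid region"))
            record["region"] = None

        age_text = row["age"].strip()
        if age_text == "":
            issues.append((row["id"], "missing age"))
            record["age"] = None
        elif not age_text.isdigit():
            issues.append((row["id"], "non-numeric age"))
            record["age"] = None
        elif not AGE_RANGE[0] <= int(age_text) <= AGE_RANGE[1]:
            issues.append((row["id"], "age out of range"))
            record["age"] = None
        else:
            record["age"] = int(age_text)
        cleaned.append(record)
    return cleaned, issues


cleaned_rows, issues = clean(raw_rows)
for row_id, problem in issues:
    print(f"row {row_id}: {problem}")
```

Setting a flagged value to `None` is only one possible policy. Whether to drop, correct, or impute such values is a judgment call that depends on the analysis.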
Other Data Preprocessing Tasks
Data Formatting: Rearranging structure or encoding.
Stacking/Unstacking Data: Grouping or separating variables for analysis.
Recoding Variables: Redefining categories or transforming numerical variables into categorical ones. Ensure new categories are mutually exclusive and collectively exhaustive.
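Recoding a numerical variable into categories can be sketched as follows. The age bands are hypothetical. The point is that the cut points leave no gaps (collectively exhaustive) and no overlaps (mutually exclusive), so every valid age falls into exactly one band.

```python
def recode_age(age):
    """Map a numeric age onto one of four hypothetical age bands."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age < 18:
        return "Under 18"
    if age < 35:
        return "18 to 34"
    if age < 55:
        return "35 to 54"
    return "55 or older"


ages = [17, 18, 34, 35, 54, 55, 90]
recoded = [recode_age(a) for a in ages]
for age, band in zip(ages, recoded):
    print(f"{age}: {band}")
```

A common mistake is to define bands such as "18 to 35" and "35 to 50", where an age of 35 belongs to two categories at once. Using strict upper bounds, as in this sketch, avoids that.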
Survey Errors and Ethical Issues
Types of Survey Errors
Coverage Error (Selection Bias): Some groups are excluded from the sampling frame.
Nonresponse Error: Differences between respondents and non-respondents.
Sampling Error: Chance differences from sample to sample; it is present in every sample and is typically reported as a margin of error.
Measurement Error: Poor question design or respondent mistakes.
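Sampling error, unlike the other three error types, can be illustrated by simulation. The sketch below draws repeated random samples from one hypothetical population and measures how much the sample means vary. The population values, sample sizes, and number of repetitions are all arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: 5,000 satisfaction scores centered near 50.
population = [rng.gauss(50, 10) for _ in range(5000)]


def sample_means(sample_size, repetitions=200):
    """Draw repeated simple random samples and return the mean of each."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]


# The spread of the sample means shows how much results vary by chance alone.
spread_small = statistics.stdev(sample_means(25))
spread_large = statistics.stdev(sample_means(400))

print(f"Spread of sample means, n = 25:  {spread_small:.2f}")
print(f"Spread of sample means, n = 400: {spread_large:.2f}")
```

Larger samples reduce sampling error, but they do nothing for coverage, nonresponse, or measurement error, which come from how the survey is designed and run.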
Ethical Issues in Surveys
Coverage and nonresponse errors can be manipulated to bias results.
Sampling error may be misrepresented if margins of error are omitted.
Measurement error can be introduced by leading questions or intentional respondent deception.
Summary
This chapter covers the essential steps in defining and collecting data for business statistics, including variable classification, measurement scales, data sources, sampling methods, data cleaning, and survey errors. Mastery of these concepts is foundational for accurate and ethical statistical analysis in business contexts.