Introduction to Statistics: Data Types and Levels of Measurement
Study Guide - Smart Notes
Introduction to Statistics
Statistical and Critical Thinking
Statistics is the science of collecting, analyzing, interpreting, and presenting data. Critical thinking in statistics involves evaluating the validity of data sources, methods, and conclusions. Understanding the types of data and their properties is essential for choosing appropriate statistical methods.
Statistics refers to the study and use of data to make informed decisions.
Critical thinking is necessary to assess the reliability and relevance of statistical findings.
Key Statistical Concepts
Parameters and Statistics
In statistics, it is important to distinguish between parameters and statistics, as they refer to measurements from different groups.
Parameter: A numerical measurement describing some characteristic of a population.
Statistic: A numerical measurement describing some characteristic of a sample.
Example
Parameter: The mean age of all FAU students is 19.5 years.
Statistic: The mean age of FAU students in a particular class is 18.75 years.
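The distinction can be sketched in a few lines of Python. The ages below are made-up values for a tiny hypothetical population, not real FAU data; the point is only that the parameter uses every member of the population while the statistic uses a subset.

```python
from statistics import mean

# Hypothetical ages for a tiny "population" of eight students (made-up data).
population_ages = [18, 19, 19, 20, 21, 18, 22, 23]

# A sample is a subset of the population; here, the first four students.
sample_ages = population_ages[:4]

# Parameter: computed from every member of the population.
population_mean = mean(population_ages)

# Statistic: computed from the sample only.
sample_mean = mean(sample_ages)

print("Parameter (population mean):", population_mean)
print("Statistic (sample mean):", sample_mean)
```

The two values differ because the sample leaves out part of the population, which is why a statistic is treated as an estimate of the parameter.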
Types of Data
Quantitative Data
Quantitative (or numerical) data consist of numbers representing counts or measurements. These data are used for mathematical calculations and statistical analysis.
Examples: The weights of supermodels, the ages of respondents.
Categorical Data
Categorical (or qualitative or attribute) data consist of names or labels that do not represent counts or measurements. These data classify individuals into groups or categories.
Examples: The gender (male/female) of professional athletes, shirt numbers on professional athletes (the numbers serve as substitutes for names, not as counts or measurements).
Working with Quantitative Data
Quantitative data can be further classified as discrete or continuous based on the nature of the values.
Discrete Data: Result when the data values are quantitative and the number of possible values is finite or countable (0, 1, 2, and so on). Example: The number of tosses of a coin before getting heads.
Continuous Data: Result from infinitely many possible quantitative values, where the collection of values is not countable. Example: Lengths anywhere from 0 cm to 12 cm.
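As a rough illustration, the sketch below generates one data set of each kind. The function name `tosses_until_heads`, the seed, and the choice to count the toss that lands heads are all assumptions made for this example.

```python
import random

rng = random.Random(42)  # fixed seed so repeated runs give the same values


def tosses_until_heads(rng):
    """Count fair-coin tosses up to and including the first heads."""
    count = 1
    while rng.random() >= 0.5:  # values below 0.5 are treated as heads
        count += 1
    return count


# Discrete: only whole numbers 1, 2, 3, ... can occur.
discrete_values = [tosses_until_heads(rng) for _ in range(5)]

# Continuous: any length between 0 cm and 12 cm can occur.
continuous_values = [rng.uniform(0.0, 12.0) for _ in range(5)]

print("Discrete (toss counts):", discrete_values)
print("Continuous (lengths in cm):", continuous_values)
```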
Levels of Measurement
Data can be classified according to four levels of measurement: nominal, ordinal, interval, and ratio. The level of measurement determines the types of statistical analyses that are appropriate.
Nominal Level
The nominal level of measurement is characterized by data that consist of names, labels, or categories only. The data cannot be arranged in any meaningful order.
Example: Survey responses of yes, no, and undecided.
Ordinal Level
The ordinal level of measurement involves data that can be arranged in some order, but differences between data values either cannot be determined or are not meaningful.
Example: Course grades A, B, C, D, or F.
Interval Level
The interval level of measurement involves data that can be arranged in order, and the differences between data values can be found and are meaningful. However, there is no natural zero starting point.
Example: Years 1000, 2000, 1776, and 1492.
Ratio Level
The ratio level of measurement is characterized by data that can be arranged in order, for which differences can be found and are meaningful, and that have a natural zero starting point (zero indicates that none of the quantity is present). Both differences and ratios are meaningful.
Example: Class times of 50 minutes and 100 minutes.
Summary Table: Levels of Measurement
| Level | Description | Example |
|---|---|---|
| Nominal | Categories only | Eye color, survey responses (yes/no) |
| Ordinal | Categories with some order | Course grades, Likert scale responses |
| Interval | Differences but no natural zero point | Years, temperature in Celsius |
| Ratio | Differences and a natural zero point | Age, time taken to complete a task |
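The four definitions can be read as a sequence of yes/no questions. The function below is a minimal sketch of that reading; its name and parameters are hypothetical, and answering the three questions for a given data set still requires judgment.

```python
def level_of_measurement(can_be_ordered, differences_meaningful, natural_zero):
    """Return the level of measurement implied by three yes/no answers."""
    if not can_be_ordered:
        return "nominal"   # names, labels, or categories only
    if not differences_meaningful:
        return "ordinal"   # order exists, but differences do not mean anything
    if not natural_zero:
        return "interval"  # meaningful differences, arbitrary zero
    return "ratio"         # meaningful differences and a natural zero


# Examples taken from the notes above.
print(level_of_measurement(False, False, False))  # yes / no / undecided
print(level_of_measurement(True, False, False))   # course grades A to F
print(level_of_measurement(True, True, False))    # calendar years
print(level_of_measurement(True, True, True))     # class times in minutes
```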
Ratio Level vs Interval Level
To distinguish between ratio and interval levels, consider whether the concept of 'twice as much' makes sense and whether there is a true zero point.
Ratio Test: Does the use of the term 'twice' make sense?
True Zero: Is there a zero quantity that means none of the variable is present?
Examples
Ratio Level: Time (minutes) taken to complete a statistics exam. It makes sense to say 'one student took twice as much time as another,' and 0 minutes is a true zero.
Interval Level: Body temperatures (Celsius) of students. It does not make sense to say that one temperature is 'twice as much' as another, and 0°C is not a true zero because it does not mean that no heat is present.
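One way to check the 'twice' test is to change units and see whether the ratio survives. The sketch below uses illustrative values (20°C and 40°C, and the 50- and 100-minute class times from earlier); the helper `celsius_to_fahrenheit` is written for this example.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius temperature to Fahrenheit."""
    return c * 9 / 5 + 32


# Interval level: the ratio depends on where the scale puts its zero.
ratio_celsius = 40.0 / 20.0
ratio_fahrenheit = celsius_to_fahrenheit(40.0) / celsius_to_fahrenheit(20.0)

# Ratio level: the ratio is the same in any unit because zero means "no time".
ratio_minutes = 100 / 50
ratio_seconds = (100 * 60) / (50 * 60)

print("Temperature ratio in Celsius:", ratio_celsius)
print("Same temperatures in Fahrenheit:", round(ratio_fahrenheit, 3))
print("Time ratio in minutes:", ratio_minutes)
print("Same times in seconds:", ratio_seconds)
```

The temperature ratio changes from 2.0 to about 1.53 when the unit changes, so 'twice as hot' is not a meaningful statement. The time ratio stays at 2.0 in either unit.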
Questions: Levels of Measurement
Examples of categorizing data by their level of measurement:
Ratio: Age of students
Nominal: Data with eye colors (brown, blue, green)
Nominal: Survey data with numeric labels (1: Business student, 2: Art student, 3: Engineering student); the numbers are codes, not quantities
Ordinal: Survey using Likert-type questions
Big Data and Data Science
Big Data
Big data refers to data sets so large and complex that traditional software tools cannot analyze them efficiently. Analysis of big data often requires parallel processing on multiple computers.
Data Science
Data science involves the application of statistics, computer science, software engineering, and other relevant fields to analyze and interpret complex data sets.
Handling Missing Data
Types of Missing Data
Missing Completely at Random (MCAR): The likelihood of a data value being missing is independent of its value or any other values in the data set.
Missing Not at Random (MNAR): The missing value is related to the reason that it is missing. For example, respondents with very high incomes may be less likely to report them.
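The practical difference between the two types can be sketched with made-up numbers. The incomes, the 30% missingness rate, and the cutoff of 80 below are all assumptions chosen for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so repeated runs give the same values

# Hypothetical incomes in thousands: 20, 25, ..., 115 (made-up data).
incomes = list(range(20, 120, 5))

# MCAR: every value has the same 30% chance of being missing,
# regardless of how large or small it is.
mcar = [x if rng.random() >= 0.3 else None for x in incomes]

# MNAR: values above 80 are missing because of their own size.
mnar = [x if x <= 80 else None for x in incomes]


def observed_mean(values):
    """Mean of the non-missing values, or None if nothing was observed."""
    observed = [v for v in values if v is not None]
    return mean(observed) if observed else None


print("True mean:", mean(incomes))
print("Observed mean under MCAR:", observed_mean(mcar))
print("Observed mean under MNAR:", observed_mean(mnar))
```

Under MNAR the observed mean is pulled below the true mean because only the large values were lost. Under MCAR the missing values are an arbitrary subset, so the observed mean is not pushed systematically in either direction.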
Correcting for Missing Data
Delete Cases: Remove all subjects with any missing values from the analysis.
Impute Missing Values: Substitute missing values with estimated or calculated values.
Additional info: Imputation methods may include mean substitution, regression imputation, or more advanced techniques such as multiple imputation.
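Both corrections can be sketched on a small table. The records below are hypothetical (age, exam minutes) pairs with `None` marking a missing value, and the imputation shown is plain mean substitution, the simplest of the methods listed above.

```python
from statistics import mean

# Hypothetical records: (age, exam_minutes); None marks a missing value.
records = [(19, 50), (20, None), (None, 45), (21, 55)]

# Delete cases: keep only subjects with no missing values.
complete_cases = [row for row in records if None not in row]


def impute_column_means(rows):
    """Replace each missing value with the mean of its column's observed values."""
    columns = list(zip(*rows))
    column_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        tuple(m if v is None else v for v, m in zip(row, column_means))
        for row in rows
    ]


imputed = impute_column_means(records)

print("After deleting cases:", complete_cases)
print("After mean imputation:", imputed)
```

Deleting cases shrinks the data set from four subjects to two, while mean substitution keeps all four but fills the gaps with values that were never observed.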