Introduction to Statistics: Key Concepts and Sampling Methods
Study Guide - Smart Notes
Chapter 1: Stats Starts Here
Introduction to Statistics
Statistics is the science of collecting, analyzing, interpreting, and presenting data. It provides methods for making informed decisions in the presence of uncertainty and variation.
Key Definitions
Data: Collections of observations such as measurements, grades, or survey responses. "Data" is the plural of "datum."
Statistic: A calculation carried out on a set of data, such as the average of a data set.
Population: The entire group of subjects that we are interested in studying.
Census: The collection of data from every member of the population.
Sample: A subcollection of members from a population.
Example: If we are interested in the study habits of Douglas College students and survey 200 students, the sample is the 200 surveyed students, and the population is all Douglas College students.
Subjects, Respondents, and Cases
Subjects: Individuals who participate in a study or experiment.
Respondents: Individuals who answer a survey.
Experimental units: Non-human subjects such as animals, plants, or objects.
Cases: The most generic term for the units or subjects in a dataset.
Variables: Specific data categories collected for each subject.
Parameter vs. Statistic
Parameter (Population Parameter): A numerical measurement describing a characteristic of a population.
Statistic (Sample Statistic): A numerical measurement describing a characteristic of a sample.
Note: Population parameters are typically unknown and must be estimated using sample statistics.
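The distinction between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch: the population of exam scores, its size, and the sample size of 200 are made-up values chosen for illustration, not data from the notes.

```python
import random
import statistics

# Hypothetical population: 10,000 made-up exam scores (illustrative data only).
rng = random.Random(42)
population = [rng.gauss(70, 10) for _ in range(10_000)]

# Parameter: computed from every member of the population.
population_mean = statistics.fmean(population)

# Statistic: computed from a sample of 200 members drawn without replacement.
sample = rng.sample(population, 200)
sample_mean = statistics.fmean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In practice the population list is not available, so only the sample mean can be computed, and it serves as an estimate of the unknown population mean.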
Statistical Questions and Significance
Statistical Question: A question that can be answered by collecting data and where variability is expected in the data.
Statistical Significance: A result is statistically significant if it would be unlikely to occur by chance alone, that is, if it is larger than ordinary sample-to-sample variation can reasonably explain.
Example: If a sample of 200 women and 150 men is surveyed and 41 women and 30 men smoke, we use these sample statistics to infer about the population proportions and test if the rates of smoking are different.
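One common way to carry out the comparison in the smoking example is a two-proportion z-test. The notes do not name a specific test, so the choice of test, the function name `two_proportion_z_test`, and the two-sided alternative are assumptions made for this sketch.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two sample proportions."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion, used under the assumption that both rates are equal.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1, p2, z, p_value

# Counts from the example: 41 of 200 women and 30 of 150 men smoke.
p_women, p_men, z, p_value = two_proportion_z_test(41, 200, 30, 150)
print(f"women: {p_women:.3f}, men: {p_men:.3f}, z = {z:.2f}, p-value = {p_value:.2f}")
```

Here the sample proportions are 0.205 and 0.200, and the z statistic is close to zero, so a difference this small is well within what sample variation alone could produce.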
Types of Variables and Data
Quantitative vs. Qualitative Variables
Quantitative (Numerical) Variables: Variables that consist of numbers representing measurements or counts.
Qualitative (Categorical) Variables: Variables that consist of names or labels that are not measurements or counts.
Types of Quantitative Data
Discrete Data: Data for which the number of possible values is finite or countable (e.g., number of students in a class).
Continuous Data: Data for which the number of possible values is infinite and not countable (e.g., height, weight).
Types of Categorical Data
Nominal Data: Categorical data that does not have an order (e.g., gender, color).
Identifier Variables: Unique identifiers for individuals, such as student numbers. These are categorical but not used for analysis.
Note: Not all variables with numerical values are quantitative. For example, student numbers are categorical because they are labels, not measurements.
Sampling and Bias
Sampling Basics
Sample: A part of the whole population, selected for analysis.
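Randomization: Selecting subjects by chance protects against selection bias and makes the sample representative of the population on average; it does not guarantee that any single sample is perfectly representative.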
Sample Size: The number of subjects in the sample; larger samples generally yield more reliable results.
Bias: Occurs when a portion of the population is over- or underrepresented by a sample.
Example: An online survey posted on a city website may introduce bias if only certain groups are likely to respond.
Sampling Frame
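The sampling frame is the list of population members from which the sample is actually drawn, that is, the part of the population that has a chance of being selected. It may not cover the entire population.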
Simple Random Sample (SRS)
Every possible sample of a given size has an equal chance of being selected, which also means every member of the population has an equal chance of being chosen.
Random selection can be done using random number generators or drawing names from a hat.
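A simple random sample can be drawn with a random number generator, as in this minimal sketch. The sampling frame of 5,000 student IDs and the sample size of 200 are hypothetical values used only for illustration.

```python
import random

# Hypothetical sampling frame: student IDs 1 through 5000.
sampling_frame = list(range(1, 5001))

# Seeded generator so the example is reproducible.
rng = random.Random(2024)

# random.sample draws without replacement; every subset of size 200
# is equally likely, which is the defining property of an SRS.
srs = rng.sample(sampling_frame, k=200)

print(len(srs), "students selected; first five:", srs[:5])
```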
Other Sampling Methods
Systematic Sampling: Select every k-th subject from a list after a random start.
Convenience Sampling: Sample subjects that are easiest to reach; often not representative.
Cluster Sampling: Divide the population into clusters, randomly select clusters, and sample all subjects within chosen clusters.
Multistage Sampling: Combine several sampling methods.
Voluntary Response Sampling: Individuals choose to participate; often leads to bias.
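Undercoverage: Some groups in the population are left out of the sampling process. This is a source of bias rather than a sampling method; it typically arises when the sampling frame is incomplete.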
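Systematic and cluster sampling, as described in the list above, can be sketched as follows. The helper names `systematic_sample` and `cluster_sample`, the frame of 100 IDs, and the six classes of five students are all hypothetical.

```python
import random

def systematic_sample(frame, k, rng):
    """Pick a random start among the first k positions, then take every k-th subject."""
    start = rng.randrange(k)
    return frame[start::k]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and include every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(7)

# Hypothetical frame of 100 subject IDs; take every 10th subject.
frame = list(range(1, 101))
sys_sample = systematic_sample(frame, 10, rng)

# Hypothetical clusters: six classes of five students each; sample two whole classes.
clusters = {
    f"class_{c}": [f"class_{c}_student_{i}" for i in range(1, 6)]
    for c in "ABCDEF"
}
clu_sample = cluster_sample(clusters, 2, rng)

print("systematic:", sys_sample)
print("cluster:", clu_sample)
```

A multistage design could combine the two, for example by choosing clusters at random and then drawing a random sample within each chosen cluster instead of taking every member.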
Types of Bias
Nonresponse Bias: Not everyone selected responds; those who do may differ from those who do not.
Response Bias: Survey design or respondent behavior influences answers (e.g., wording of questions, desire to please interviewer).
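The effect of nonresponse bias can be illustrated with a small simulation. Everything here is assumed for the sake of the sketch: the population of 1,000 people, the 50% dissatisfaction rate, and the response rates of 80% and 20%.

```python
import random

rng = random.Random(1)

# Hypothetical population of 1,000 people: half dissatisfied (True), half not (False).
population = [True] * 500 + [False] * 500
true_proportion = sum(population) / len(population)

# Assumed response rates: dissatisfied people are far more likely to respond.
RESPONSE_RATE = {True: 0.8, False: 0.2}

# Each person responds independently with the rate for their group.
respondents = [p for p in population if rng.random() < RESPONSE_RATE[p]]
estimated_proportion = sum(respondents) / len(respondents)

print(f"true proportion dissatisfied:      {true_proportion:.2f}")
print(f"estimate from respondents only:    {estimated_proportion:.2f}")
```

Because those who respond differ systematically from those who do not, the estimate overstates dissatisfaction, and collecting more responses under the same conditions would not correct it.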
Survey Design and Question Wording
Good Survey Practices
Ask specific, quantitative questions rather than general ones.
Phrase questions neutrally to avoid influencing responses.
Ensure questions are clear and address the information needed.
Consider who is being asked and whether they are the right respondents.
Example: "How many hours did you sleep last night?" is better than "How much do you sleep?"
Example of Question Wording Effect: Two similar questions about government wiretaps received different approval rates due to differences in wording, demonstrating the impact of question phrasing on survey results.
Summary Table: Types of Variables
| Type | Description | Examples |
|---|---|---|
| Quantitative (Numerical) | Numbers representing measurements or counts | Height, weight, number of books |
| Qualitative (Categorical) | Names or labels, not measurements | Gender, color, student number |
| Discrete | Finite/countable values | Number of students in a class |
| Continuous | Infinite/not countable values | Height, time, temperature |
| Nominal | Categorical data with no order | Color, gender |
Key Formulas
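Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $n$ is the sample size.
Population Mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$, where $N$ is the population size.
Sample Proportion: $\hat{p} = \frac{x}{n}$, where $x$ is the number of sample members with the characteristic of interest.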
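As a short worked example, the sample mean and sample proportion can be computed directly from data. The sleep-hours values below are made up for illustration; the smoking counts (41 of 200) come from the earlier example in these notes.

```python
import statistics

# Hypothetical responses to "How many hours did you sleep last night?"
sleep_hours = [7, 6.5, 8, 5, 7.5]

# Sample mean: sum of the observations divided by the sample size n.
sample_mean = statistics.fmean(sleep_hours)

# Sample proportion: number with the characteristic divided by the sample size.
smokers, n = 41, 200
sample_proportion = smokers / n

print(f"sample mean = {sample_mean:.1f} hours")
print(f"sample proportion = {sample_proportion:.3f}")
```

The population mean uses the same arithmetic as the sample mean, applied to every member of the population rather than to a sample.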