(Lecture 6) Association Between Two Categorical Variables: Contingency Tables, Conditional Proportions, and Measures of Association
Study Guide - Smart Notes
Association Between Two Categorical Variables
Introduction to Association
In statistics, understanding the relationship between two variables is fundamental for data analysis. This section introduces methods for determining whether an association exists between two variables and for describing the strength of that association.
Association refers to a relationship where particular values of one variable are more likely to occur with certain values of another variable.
Key questions include: Does one variable influence or relate to another? For example, does smoking cause premature death, or do schools with higher funding have higher SAT scores?
Response and Explanatory Variables
Definitions and Examples
When studying associations, it is important to distinguish between the response variable (dependent variable) and the explanatory variable (independent variable).
Response Variable: The outcome variable on which comparisons are made.
Explanatory Variable: The variable whose effect on the response is being studied.
Examples:
Carbon dioxide level (response) / Amount of gasoline used by cars (explanatory)
College GPA (response) / Number of hours a week spent studying (explanatory)
Survival status (response) / Smoking status (explanatory)
When the explanatory variable is categorical, it defines the groups to be compared. When it is quantitative, we compare the response across different numerical values of the explanatory variable.
Contingency Tables
Definition and Structure
Contingency tables (also called two-way tables) are used to summarize the relationship between two categorical variables. They are a special type of frequency distribution table.
Rows list the categories of one variable.
Columns list the categories of the other variable.
Entries in the table are frequencies (counts).
Example: Food Type and Pesticide Status
The following contingency table displays the frequencies of foods for all possible category combinations of two variables: food type and pesticide status.
Food Type | Pesticides Present | Pesticides Not Present | Total |
|---|---|---|---|
Organic | 29 | 98 | 127 |
Conventional | 19,485 | 7,086 | 26,571 |
Total | 19,514 | 7,184 | 26,698 |
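As a quick check on the table's structure, the cell counts can be stored by row and the row, column, and grand totals recomputed from them. This is a minimal Python sketch; the dictionary layout and the names `table` and `margins` are assumptions made for illustration, not part of the lecture.

```python
# Cell counts from the food type / pesticide status table above.
# Keys are row categories (food type); tuples are (present, not present).
table = {
    "Organic": (29, 98),
    "Conventional": (19485, 7086),
}

def margins(tab):
    """Return (row_totals, column_totals, grand_total) for a two-way table."""
    row_totals = {row: sum(cells) for row, cells in tab.items()}
    # zip(*...) groups the entries column by column
    column_totals = [sum(col) for col in zip(*tab.values())]
    grand_total = sum(row_totals.values())
    return row_totals, column_totals, grand_total

rows, cols, n = margins(table)
print(rows)  # {'Organic': 127, 'Conventional': 26571}
print(cols)  # [19514, 7184]
print(n)     # 26698
```

The recomputed totals match the Total row and column of the table.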
Conditional and Marginal Proportions
Calculating Proportions
Proportions are used to compare groups within contingency tables. Conditional proportions are calculated for each group defined by the explanatory variable, while marginal proportions are calculated for the entire sample.
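Proportion of organic foods with pesticides: $29/127 \approx 0.23$
Proportion of organic foods without pesticides: $98/127 \approx 0.77$
Proportion of conventional foods with pesticides: $19{,}485/26{,}571 \approx 0.73$
Proportion of conventional foods without pesticides: $7{,}086/26{,}571 \approx 0.27$
Proportion of all sampled items with pesticides (a marginal proportion): $19{,}514/26{,}698 \approx 0.73$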
Food Type | Pesticides Present | Pesticides Not Present | Total | n (row total) |
|---|---|---|---|---|
Organic | 0.23 | 0.77 | 1.00 | 127 |
Conventional | 0.73 | 0.27 | 1.00 | 26,571 |
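The conditional proportions come from dividing each cell by its row total, and the marginal proportions from dividing each column total by the grand total. The following minimal Python sketch reproduces the table of proportions from the counts; the helper names `conditional_proportions` and `marginal_proportions` are hypothetical.

```python
# Counts (pesticides present, not present) by food type, from the table above.
table = {
    "Organic": (29, 98),
    "Conventional": (19485, 7086),
}

def conditional_proportions(tab):
    """Divide each cell by its row total, giving proportions within each group."""
    return {
        row: tuple(count / sum(cells) for count in cells)
        for row, cells in tab.items()
    }

def marginal_proportions(tab):
    """Divide each column total by the grand total, ignoring the grouping."""
    column_totals = [sum(col) for col in zip(*tab.values())]
    n = sum(column_totals)
    return [total / n for total in column_totals]

for row, props in conditional_proportions(table).items():
    print(row, [round(p, 2) for p in props])
# Organic [0.23, 0.77]
# Conventional [0.73, 0.27]
print([round(p, 2) for p in marginal_proportions(table)])  # [0.73, 0.27]
```

Each row of conditional proportions sums to 1, which is why the Total column reads 1.00.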
Visualizing Proportions
Side-by-side bar charts or stacked bar graphs are commonly used to visually compare conditional proportions across groups.
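A stacked bar graph draws one bar per group, split according to that group's conditional proportions. A plotting library such as matplotlib would normally be used; to stay within the standard library, this sketch renders the same idea as text, with `#` for "pesticides present" and `.` for "not present". The function `stacked_bar` and the bar width of 40 characters are assumptions for illustration.

```python
# Conditional proportions of foods with pesticide residues (from the table above).
proportions_present = {"Organic": 29 / 127, "Conventional": 19485 / 26571}

def stacked_bar(p, width=40):
    """Render one bar: '#' for the 'present' share, '.' for the 'not present' share."""
    filled = round(p * width)
    return "#" * filled + "." * (width - filled)

for food, p in proportions_present.items():
    # Pad the label so both bars start in the same column.
    print(f"{food:<12} |{stacked_bar(p)}| {p:.2f}")
```

Because every bar has the same total length, the groups are compared on their proportions rather than on their very different sample sizes (127 versus 26,571).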
Measuring Strength of Association
Difference of Proportions
The difference of proportions is a simple measure of association between two categorical variables.
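Difference of proportions: $p_1 - p_2 = 0.73 - 0.23 = 0.50$, where $p_1$ is the proportion of conventionally grown food with pesticide residues and $p_2$ is the proportion of organic food with residues.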
This means the percentage of conventionally grown food with pesticide residues is 50 percentage points higher than for organic food.
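The same difference can be computed directly from the counts rather than from the rounded proportions. This is a minimal sketch; the variable names are illustrative.

```python
# Conditional proportions of foods with pesticide residues, from the counts.
p_conventional = 19485 / 26571  # about 0.733
p_organic = 29 / 127            # about 0.228

difference = p_conventional - p_organic
print(f"{difference:.2f}")                          # 0.50
print(f"{difference * 100:.1f} percentage points")  # 50.5 percentage points
```

Working from the unrounded proportions gives about 50.5 percentage points, consistent with the 50-point figure obtained from the rounded values.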
Ratio of Proportions (Relative Risk)
The ratio of conditional proportions (also called relative risk or risk ratio) compares the likelihood of an outcome between two groups.
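Ratio of proportions: $p_1 / p_2 = 0.73 / 0.23 \approx 3.2$
The proportion of food with pesticide residues is about 3.2 times as high for conventionally grown food as for organic food.
If the variables are not associated, the two conditional proportions are equal, so the ratio equals 1 (and the difference of proportions equals 0).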
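A minimal Python sketch of the ratio of proportions follows. The function name `relative_risk` is hypothetical, and the choice to raise an error when the second proportion is 0 is an assumption made here, since the ratio is undefined in that case.

```python
def relative_risk(p1, p2):
    """Ratio of conditional proportions; equals 1 when p1 == p2 (no association)."""
    if p2 == 0:
        raise ValueError("relative risk is undefined when the second proportion is 0")
    return p1 / p2

p_conventional = 19485 / 26571
p_organic = 29 / 127

print(round(relative_risk(p_conventional, p_organic), 1))  # 3.2
print(relative_risk(0.4, 0.4))                             # 1.0
```

The second call illustrates the no-association case: equal proportions give a ratio of exactly 1.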
Odds Ratio
The odds ratio is another measure of association, using the ratio of odds rather than proportions.
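Odds for a group: $\dfrac{p}{1-p}$, where $p$ is the probability of the outcome in that group.
Odds ratio: $\text{OR} = \dfrac{p_1/(1-p_1)}{p_2/(1-p_2)}$, where $p_1$ and $p_2$ are the probabilities of the outcome in groups 1 and 2. If the variables are not associated, the odds ratio equals 1.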
Commonly used in medical studies to compare the odds of an outcome (e.g., disease) between exposed and unexposed groups.
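The lung cancer example below does not give counts for non-smokers, so this sketch applies the odds and odds-ratio formulas to the food data instead, with conventional food as group 1. The helper names `odds` and `odds_ratio` are hypothetical.

```python
def odds(p):
    """Odds of an outcome with probability p: p / (1 - p)."""
    return p / (1 - p)

def odds_ratio(p1, p2):
    """Odds of the outcome in group 1 divided by the odds in group 2."""
    return odds(p1) / odds(p2)

p_conventional = 19485 / 26571
p_organic = 29 / 127

print(f"{odds(p_conventional):.2f}")                   # 2.75
print(f"{odds(p_organic):.2f}")                        # 0.30
print(f"{odds_ratio(p_conventional, p_organic):.1f}")  # 9.3
```

For the same data the odds ratio (about 9.3) is noticeably larger than the relative risk (about 3.2). The two measures are close only when the outcome is rare in both groups, which is not the case here.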
Example: Odds Ratio for Lung Cancer and Smoking
Smoking Status | Lung Cancer: Yes | Lung Cancer: No | Total |
|---|---|---|---|
Smoker | 1269 | 29 | 1298 |
Non-Smoker | 0 | 0 | 0 |
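To calculate the odds ratio:
Odds of lung cancer for smokers: $\dfrac{\text{smokers with lung cancer}}{\text{smokers without lung cancer}}$
Odds of lung cancer for non-smokers: $\dfrac{\text{non-smokers with lung cancer}}{\text{non-smokers without lung cancer}}$ (the non-smoker counts are not provided in the table above)
Odds ratio: $\text{OR} = \dfrac{\text{odds for smokers}}{\text{odds for non-smokers}}$
Interpretation: The odds of lung cancer for smokers are estimated to be many times the odds for non-smokers.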
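Label the cells of a $2 \times 2$ table as $a$ and $b$ (outcome yes, no) for group 1 and $c$ and $d$ for group 2. The odds in each group are then simply $a/b$ and $c/d$, because $p_1/(1-p_1) = \frac{a/(a+b)}{b/(a+b)} = a/b$, and the odds ratio reduces to a cross-product ratio. The letters $a$ to $d$ are labels introduced for this illustration; the numeric check uses the food table (conventional as group 1), since the non-smoker counts are not given.

```latex
\[
\text{OR} = \frac{a/b}{c/d} = \frac{ad}{bc},
\qquad\text{e.g.}\qquad
\frac{19{,}485 \times 98}{7{,}086 \times 29} = \frac{1{,}909{,}530}{205{,}494} \approx 9.29
\]
```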
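Additional info: In practice, the odds ratio is a key measure in case-control studies and epidemiology. In a case-control study the numbers of cases and controls are fixed by the design, so the proportion with the outcome in each exposure group (and hence the relative risk) cannot be estimated directly, but the odds ratio can.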
Summary Table: Measures of Association
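Measure | Formula | Interpretation |
|---|---|---|
Difference of Proportions | $p_1 - p_2$ | Absolute difference in proportions between two groups; equals 0 if there is no association |
Relative Risk (Risk Ratio) | $p_1 / p_2$ | Ratio of probabilities; how many times as likely the outcome is in one group; equals 1 if there is no association |
Odds Ratio | $\dfrac{p_1/(1-p_1)}{p_2/(1-p_2)}$ | Ratio of odds; commonly used in case-control studies; equals 1 if there is no association |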
Key Takeaways
Contingency tables are essential for summarizing relationships between categorical variables.
Conditional and marginal proportions help describe group differences.
Measures such as difference of proportions, relative risk, and odds ratio quantify the strength of association.
Visual tools like bar charts aid in interpreting associations.