
(Lecture 6) Association Between Two Categorical Variables: Contingency Tables, Conditional Proportions, and Measures of Association

Study Guide - Smart Notes


Association Between Two Categorical Variables

Introduction to Association

In statistics, understanding the relationship between two variables is fundamental for data analysis. This section introduces methods for determining whether an association exists between two variables and for describing the strength of that association.

  • Association refers to a relationship where particular values of one variable are more likely to occur with certain values of another variable.

  • Key questions include: Does one variable influence or relate to another? For example, does smoking cause premature death, or do schools with higher funding have higher SAT scores?

Response and Explanatory Variables

Definitions and Examples

When studying associations, it is important to distinguish between the response variable (dependent variable) and the explanatory variable (independent variable).

  • Response Variable: The outcome variable on which comparisons are made.

  • Explanatory Variable: The variable whose effect on the response is being studied.

  • Examples:

    • Carbon dioxide level (response) / Amount of gasoline used by cars (explanatory)

    • College GPA (response) / Number of hours a week spent studying (explanatory)

    • Survival status (response) / Smoking status (explanatory)

When the explanatory variable is categorical, it defines groups for comparison. When it is quantitative, it defines changes in numerical values for comparison.

Contingency Tables

Definition and Structure

Contingency tables (also called two-way tables) are used to summarize the relationship between two categorical variables. They are a special type of frequency distribution table.

  • Rows list the categories of one variable.

  • Columns list the categories of the other variable.

  • Entries in the table are frequencies (counts).

Example: Food Type and Pesticide Status

The following contingency table displays the frequencies of foods for all possible category combinations of two variables: food type and pesticide status.

| Food Type | Pesticides Present | Pesticides Not Present | Total |
|---|---|---|---|
| Organic | 29 | 98 | 127 |
| Conventional | 19,485 | 7,086 | 26,571 |
| Total | 19,514 | 7,184 | 26,698 |
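The following minimal Python sketch stores the counts as nested dictionaries and recomputes the marginal totals. This checks that the row and column totals in the table agree with the cell counts. The variable names are illustrative and are not part of the course material.

```python
# Counts from the food type x pesticide status table.
# Rows: food type; columns: pesticide residue present / not present.
table = {
    "Organic": {"Present": 29, "Not Present": 98},
    "Conventional": {"Present": 19_485, "Not Present": 7_086},
}

# Row totals: sum across the pesticide-status columns for each food type.
row_totals = {food: sum(cells.values()) for food, cells in table.items()}

# Column totals: sum down the food-type rows for each pesticide status.
columns = ["Present", "Not Present"]
col_totals = {col: sum(table[food][col] for food in table) for col in columns}

# Grand total: every sampled item counted exactly once.
grand_total = sum(row_totals.values())

print(row_totals)   # {'Organic': 127, 'Conventional': 26571}
print(col_totals)   # {'Present': 19514, 'Not Present': 7184}
print(grand_total)  # 26698
```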

Conditional and Marginal Proportions

Calculating Proportions

Proportions are used to compare groups within contingency tables. Conditional proportions are calculated for each group defined by the explanatory variable, while marginal proportions are calculated for the entire sample.

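  • Proportion of organic foods with pesticides: $29/127 \approx 0.23$

  • Proportion of organic foods without pesticides: $98/127 \approx 0.77$

  • Proportion of conventional foods with pesticides: $19{,}485/26{,}571 \approx 0.73$

  • Proportion of conventional foods without pesticides: $7{,}086/26{,}571 \approx 0.27$

  • Proportion of all sampled items with pesticides (marginal proportion): $19{,}514/26{,}698 \approx 0.73$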

| Food Type | Pesticides Present | Pesticides Not Present | Total | n |
|---|---|---|---|---|
| Organic | 0.23 | 0.77 | 1.00 | 127 |
| Conventional | 0.73 | 0.27 | 1.00 | 26,571 |
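Each conditional proportion divides a cell count by its row (food type) total, so every row sums to 1. The marginal proportion divides a column total by the grand total. Below is a minimal Python sketch using the food-table counts. The variable names are illustrative.

```python
table = {
    "Organic": {"Present": 29, "Not Present": 98},
    "Conventional": {"Present": 19_485, "Not Present": 7_086},
}

# Conditional proportions: divide each cell by its row total (n for that food type).
conditional = {}
for food, cells in table.items():
    n = sum(cells.values())
    conditional[food] = {status: count / n for status, count in cells.items()}

# Marginal proportion: items with pesticides out of the entire sample.
grand_total = sum(sum(cells.values()) for cells in table.values())
present_total = sum(cells["Present"] for cells in table.values())
marginal_present = present_total / grand_total

for food, props in conditional.items():
    print(food, {status: round(p, 2) for status, p in props.items()})
print("All foods, Present:", round(marginal_present, 2))
```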

Visualizing Proportions

Side-by-side bar charts or stacked bar graphs are commonly used to visually compare conditional proportions across groups.

Measuring Strength of Association

Difference of Proportions

The difference of proportions is a simple measure of association between two categorical variables.

  • Difference of proportions: $0.73 - 0.23 = 0.50$

  • This means the percentage of conventionally grown food with pesticide residues is 50 percentage points higher than for organic food.
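The minimal Python sketch below computes the same difference from the raw counts rather than the rounded proportions. The names are illustrative. Unrounded, the difference is about 0.505, which rounds to the 50 percentage points stated above. The order of subtraction (conventional minus organic) fixes the sign, so a positive value means conventional food has the higher proportion.

```python
# Conditional proportions of foods with pesticide residues, from the table counts.
p_conventional = 19_485 / 26_571
p_organic = 29 / 127

# Difference of proportions: group 1 (conventional) minus group 2 (organic).
difference = p_conventional - p_organic

# Report both as a proportion and in percentage points.
print(f"Difference: {difference:.3f} ({difference * 100:.1f} percentage points)")
```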

Ratio of Proportions (Relative Risk)

The ratio of conditional proportions (also called relative risk or risk ratio) compares the likelihood of an outcome between two groups.

  • Ratio of proportions: $0.73 / 0.23 \approx 3.2$

  • The proportion of food with pesticide residues is about 3.2 times as large for conventionally grown food as for organic food.

  • If the variables are not associated, the ratio equals 1.
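The following minimal Python sketch implements the ratio of proportions. It uses `relative_risk`, a hypothetical helper name. The sketch also uses a constructed table with no association (illustrative counts, not real data) in which both groups have the same proportion, so the ratio comes out to exactly 1.

```python
def relative_risk(p1: float, p2: float) -> float:
    """Ratio of conditional proportions p1 / p2 (group 1 relative to group 2)."""
    if p2 == 0:
        raise ValueError("p2 must be nonzero for the ratio to be defined")
    return p1 / p2

# Food data: proportion with pesticide residues in each group.
p_conventional = 19_485 / 26_571
p_organic = 29 / 127
print(round(relative_risk(p_conventional, p_organic), 2))  # about 3.21

# Constructed example with no association: 10 of 40 and 20 of 80 are both 0.25.
print(relative_risk(10 / 40, 20 / 80))  # 1.0
```

Computed from the unrounded proportions, the ratio is about 3.21, which matches the 3.2 obtained from the rounded values.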

Odds Ratio

The odds ratio is another measure of association, using the ratio of odds rather than proportions.

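  • Odds for a group: $p/(1-p)$, where $p$ is the probability of the outcome in that group.

  • Odds ratio: $\dfrac{p_1/(1-p_1)}{p_2/(1-p_2)}$, the odds in group 1 divided by the odds in group 2.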

  • Commonly used in medical studies to compare the odds of an outcome (e.g., disease) between exposed and unexposed groups.
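The odds ratio can be illustrated with the food data, treating "pesticide residues present" as the outcome. The following is a minimal Python sketch that uses `odds`, a hypothetical helper name.

```python
def odds(p: float) -> float:
    """Odds p / (1 - p) for an outcome with probability p, where 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1) for the odds to be finite")
    return p / (1 - p)

# Proportion of foods with pesticide residues in each group.
p_conventional = 19_485 / 26_571
p_organic = 29 / 127

odds_conventional = odds(p_conventional)  # about 2.75
odds_organic = odds(p_organic)            # about 0.30
odds_ratio = odds_conventional / odds_organic
print(round(odds_ratio, 1))               # about 9.3
```

The odds ratio (about 9.3) is much farther from 1 than the relative risk (about 3.2) for the same data. The two measures are close only when the outcome is rare in both groups. Here pesticide residues are common among conventional foods, so an odds ratio should not be read as "times as likely."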

Example: Odds Ratio for Lung Cancer and Smoking

| Smoking Status | Lung Cancer: Yes | Lung Cancer: No | Total |
|---|---|---|---|
| Smoker | 1269 | 29 | 1298 |
| Non-Smoker | not provided | not provided | not provided |

To calculate the odds ratio:

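  • Odds of lung cancer for smokers: $\dfrac{\text{smokers with lung cancer}}{\text{smokers without lung cancer}}$

  • Odds of lung cancer for non-smokers: $\dfrac{\text{non-smokers with lung cancer}}{\text{non-smokers without lung cancer}}$ (these counts are not provided in these notes)

  • Odds ratio: $\dfrac{\text{odds for smokers}}{\text{odds for non-smokers}}$

  • Interpretation: The odds of lung cancer for smokers are estimated to be many times the odds for non-smokers.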

  • Additional info: In practice, the odds ratio is a key measure in case-control studies and epidemiology.
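For a 2×2 table with cells $a, b$ (group 1) and $c, d$ (group 2), the odds are $a/b$ and $c/d$. Their ratio therefore simplifies to the cross-product $ad/bc$. The non-smoker counts are not given in these notes, so this minimal Python sketch applies the formula to the food data instead. `odds_ratio_2x2` is a hypothetical helper name. With all four smoking and lung-cancer counts, the same function would apply with smokers as group 1.

```python
def odds_ratio_2x2(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table laid out as

                 outcome yes | outcome no
        group 1       a      |     b
        group 2       c      |     d

    Odds are a/b (group 1) and c/d (group 2), so the ratio equals (a*d)/(b*c).
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is 0")
    return (a * d) / (b * c)

# Food data: conventional = group 1, organic = group 2, outcome = pesticides present.
print(round(odds_ratio_2x2(19_485, 7_086, 29, 98), 2))  # about 9.29
```

The zero check matters in practice. A single empty cell makes the odds ratio undefined or zero, so a table with zero counts cannot support an odds-ratio estimate as is.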

Summary Table: Measures of Association

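| Measure | Formula | Interpretation |
|---|---|---|
| Difference of Proportions | $p_1 - p_2$ | Absolute difference in proportions between two groups |
| Relative Risk (Risk Ratio) | $p_1 / p_2$ | Ratio of probabilities; how many times as likely the outcome is in group 1 as in group 2 |
| Odds Ratio | $\dfrac{p_1/(1-p_1)}{p_2/(1-p_2)}$ | Ratio of odds; commonly used in case-control studies |

Here $p_1$ and $p_2$ are the conditional proportions of the outcome in groups 1 and 2.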

Key Takeaways

  • Contingency tables are essential for summarizing relationships between categorical variables.

  • Conditional and marginal proportions help describe group differences.

  • Measures such as difference of proportions, relative risk, and odds ratio quantify the strength of association.

  • Visual tools like bar charts aid in interpreting associations.
