Simple Linear Regression: Concepts, Computation, and Interpretation

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Chapter 11: Simple Regression

Chapter Goals

  • Explain the simple linear regression model

  • Obtain and interpret the simple linear regression equation for a set of data

  • Describe R^2 as a measure of explanatory power of the regression model

  • Understand the assumptions behind regression analysis

  • Explain measures of variation and determine whether the independent variable is significant

  • Calculate and interpret confidence intervals for the regression coefficients

  • Use a regression equation for prediction

  • Form forecast intervals around an estimated Y value for a given X

  • Use graphical analysis to recognize potential problems in regression analysis

  • Explain the correlation coefficient and perform a hypothesis test for zero population correlation

Section 11.1 Overview of Linear Models

Simple Linear Regression Model

The simple linear regression model describes the relationship between two variables using a straight line. The general form is:

  • Equation: Y = \beta_0 + \beta_1 X + \varepsilon

  • Y: Dependent variable (the outcome we wish to explain)

  • X: Independent variable (the predictor or explanatory variable)

  • \beta_0: Y-intercept (value of Y when X = 0)

  • \beta_1: Slope (change in Y for a one-unit change in X)

  • \varepsilon: Random error term
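To make the model's components concrete, here is a small simulation sketch; the parameter values (\beta_0 = 10, \beta_1 = 2, error standard deviation 3) are invented for illustration, not taken from the chapter. Each observation is the linear component plus a random error draw.

```python
import random

# Sketch: simulate data from Y = beta0 + beta1*X + epsilon.
# Parameter values below are illustrative assumptions, not from the text.
random.seed(42)  # reproducible error draws

beta0, beta1, sigma = 10.0, 2.0, 3.0
x = [float(i) for i in range(1, 21)]

# Each y is the linear component beta0 + beta1*x plus a mean-0 error term
y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]

print(len(y))  # 20 simulated (x, y) pairs scattered around the true line
```

Because the errors have mean 0 and constant spread, the simulated points scatter evenly around the true line rather than following it exactly.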

Least Squares Regression

Estimates for the coefficients are found using the least squares regression technique, which minimizes the sum of squared errors between observed and predicted values.

  • Sample regression equation: \hat{y}_i = b_0 + b_1 x_i

  • Slope estimator: b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2

  • Intercept estimator: b_0 = \bar{y} - b_1 \bar{x}

Introduction to Regression Analysis

  • Purpose:

    • Predict the value of a dependent variable based on the value of at least one independent variable

    • Explain the impact of changes in an independent variable on the dependent variable

  • Dependent variable: Also called the endogenous variable

  • Independent variable: Also called the exogenous variable

Section 11.2 Linear Regression Model

Population Regression Equation

  • The relationship between X and Y is described by a linear function: y_i = \beta_0 + \beta_1 x_i + \varepsilon_i

  • \beta_0 and \beta_1 are population coefficients

  • \varepsilon_i is a random error term

Regression Model Components

  • Linear component: \beta_0 + \beta_1 x_i

  • Random error component: \varepsilon_i

Assumptions of Linear Regression

  • The true relationship is linear: Y is a linear function of X plus random error

  • Error terms are independent of X values

  • Error terms are random variables with mean 0 and constant variance (homoscedasticity):

    • E[\varepsilon_i] = 0 and E[\varepsilon_i^2] = \sigma^2 for all i

    • Cov(\varepsilon_i, \varepsilon_j) = 0 for all i \neq j (no autocorrelation)

Section 11.3 Least Squares Coefficient Estimators

Finding the Best-Fit Line

  • Coefficients b_0 and b_1 are chosen to minimize the sum of squared errors (SSE): SSE = \sum (y_i - \hat{y}_i)^2

  • Slope estimator: b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2

  • Intercept estimator: b_0 = \bar{y} - b_1 \bar{x}

  • The regression line always passes through the point of means (\bar{x}, \bar{y})
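The estimators above translate directly into code. A minimal pure-Python sketch (the helper name `least_squares` is ours, not from the chapter):

```python
# Least-squares estimators from the formulas above:
#   b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
#   b0 = ybar - b1 * xbar
def least_squares(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Sanity check: data lying exactly on y = 2 + 3x recover b0 = 2, b1 = 3,
# and the fitted line passes through (xbar, ybar) as noted above
b0, b1 = least_squares([1, 2, 3, 4], [5, 8, 11, 14])
print(b0, b1)  # 2.0 3.0
```

Since this toy sample lies exactly on a line, SSE is zero and the estimators reproduce the line's coefficients.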

Computation Using Software

  • Hand calculations are tedious; statistical software (e.g., Excel) is commonly used

Interpretation of Coefficients

  • Intercept (b_0): Estimated average value of Y when X = 0 (if X = 0 is within the observed range)

  • Slope (b_1): Estimated change in the average value of Y for a one-unit increase in X

Simple Linear Regression Example

Application: House Price and Size

  • Dependent variable (Y): House price in $1000s

  • Independent variable (X): Square feet

  • Sample data (10 houses):

House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700

  • Scatter plot and regression line can be generated using Excel

Regression Equation from Excel Output

  • Estimated regression equation: house price ($1000s) = 98.24833 + 0.10977 \times (square feet)

  • Interpretation of b_0: No house in the sample had 0 square feet, so the intercept 98.24833 ($98,248.33) has no direct practical meaning; it can be viewed as the portion of house price not explained by square feet

  • Interpretation of b_1: For each additional square foot, the average house price increases by 0.10977 \times \$1000 = \$109.77
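The printed coefficients can be checked by computing the estimators directly on the sample data (the 1,550 ft² house is entered here at $219k, the value consistent with the quoted output):

```python
# Reproduce the Excel coefficients for the house-price example
# using b1 = Sxy / Sxx and b0 = ybar - b1 * xbar.
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]           # Y, $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X

n = len(sqft)
xbar, ybar = sum(sqft) / n, sum(price) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(sqft, price))
sxx = sum((x - xbar) ** 2 for x in sqft)
b1 = sxy / sxx
b0 = ybar - b1 * xbar
print(round(b0, 5), round(b1, 5))  # 98.24833 0.10977
```

Hand calculation by these formulas matches the software output exactly, which is the point of Section 11.3: software only automates the least-squares arithmetic.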

Section 11.4 Explanatory Power of a Linear Regression Equation

Measures of Variation

  • Total Sum of Squares (SST): SST = \sum (y_i - \bar{y})^2

  • Regression Sum of Squares (SSR): SSR = \sum (\hat{y}_i - \bar{y})^2

  • Error Sum of Squares (SSE): SSE = \sum (y_i - \hat{y}_i)^2

  • Relationship: SST = SSR + SSE

Analysis of Variance (ANOVA)

  • SST: Variation of Y values around their mean

  • SSR: Explained variation due to X

  • SSE: Unexplained variation (random error)

Coefficient of Determination (R^2)

  • Proportion of total variation in Y explained by X: R^2 = SSR / SST

  • Ranges from 0 to 1

  • Interpretation: Higher R^2 indicates a better fit

Examples of R^2 Values

  • R^2 = 1: Perfect linear relationship between X and Y

  • 0 < R^2 < 1: Weaker linear relationship; some but not all of the variation in Y is explained by X

  • R^2 = 0: No linear relationship between X and Y

Excel Output Example

  • In the house price example, R^2 = 0.58082

  • Interpretation: 58.08% of the variation in house prices is explained by variation in square feet
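These quantities are easy to verify in code for the house-price data (same sample as the example; the 1,550 ft² house is entered at $219k, consistent with the quoted output):

```python
# Verify SST = SSR + SSE and R^2 = SSR/SST for the house-price data.
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]           # Y, $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X

n = len(sqft)
xbar, ybar = sum(sqft) / n, sum(price) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(sqft, price))
      / sum((x - xbar) ** 2 for x in sqft))
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * x for x in sqft]

sst = sum((y - ybar) ** 2 for y in price)               # total variation
ssr = sum((f - ybar) ** 2 for f in fitted)              # explained by X
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))  # unexplained

r2 = ssr / sst
print(round(r2, 5))  # 0.58082 -- about 58.08% of variation explained
```

The decomposition SST = SSR + SSE holds up to floating-point rounding, and R^2 matches the Excel output.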

Relationship Between Correlation and R^2

  • For simple regression: R^2 = r^2, where r is the sample correlation coefficient between X and Y

Estimation of Model Error Variance

  • Estimator for variance of model error: s_e^2 = SSE / (n - 2)

  • s_e = \sqrt{SSE / (n - 2)} is the standard error of the estimate

  • Division by n - 2 (not n - 1) because two parameters are estimated (b_0 and b_1)
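Applied to the house-price data (same sample as the example, 1,550 ft² house at $219k), a short sketch of this estimator:

```python
import math

# Standard error of the estimate: s_e = sqrt(SSE / (n - 2)).
# n - 2 degrees of freedom because b0 and b1 are both estimated.
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]           # Y, $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X

n = len(sqft)
xbar, ybar = sum(sqft) / n, sum(price) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(sqft, price))
      / sum((x - xbar) ** 2 for x in sqft))
b0 = ybar - b1 * xbar

sse = sum((y - (b0 + b1 * x)) ** 2 for y, x in zip(price, sqft))
s_e = math.sqrt(sse / (n - 2))
print(round(s_e, 2))  # 41.33 -- typical residual size, in $1000s
```

Since Y is measured in $1000s, s_e ≈ 41.33 says a typical prediction misses the actual price by roughly $41,330.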

Summary

  • Simple linear regression models the relationship between two variables using a straight line

  • Coefficients are estimated using least squares, minimizing the sum of squared errors

  • Interpretation of coefficients provides insight into the relationship between variables

  • Measures of variation and R^2 assess the explanatory power of the model

  • Statistical software is commonly used for computation
