13. Chi-Square Tests & Goodness of Fit

Goodness of Fit Test

Problem 12.1.13a

Textbook Question

Benford’s Law, Part I. Our number system consists of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. The first significant digit in any number must be 1, 2, 3, 4, 5, 6, 7, 8, or 9 because we do not write numbers such as 12 as 012. Although we may think that each first digit appears with equal frequency so that each digit has a 1/9 probability of being the first significant digit, this is not true. In 1881, Simon Newcomb discovered that first digits do not occur with equal frequency. This same result was discovered again in 1938 by physicist Frank Benford. After studying much data, he was able to assign probabilities of occurrence to the first digit in a number as shown.
[Table: Benford's Law probabilities for first digits 1 through 9; image not reproduced]
Source: T. P. Hill, “The First Digit Phenomenon,” American Scientist, July–August 1998.
The probability distribution is now known as Benford’s Law and plays a major role in identifying fraudulent data on tax returns and accounting books. For example, the following distribution represents the first digits in 200 allegedly fraudulent checks written to a bogus company by an employee attempting to embezzle funds from his employer.
a. Because these data are meant to prove that someone is guilty of fraud, what would be an appropriate level of significance when performing a goodness-of-fit test?

Verified step-by-step guidance

Understand the context: The problem involves testing whether the observed first-digit distribution of 200 checks follows Benford's Law, which provides expected probabilities for each digit from 1 to 9.

Identify the null and alternative hypotheses for the goodness-of-fit test: The null hypothesis (H0) states that the observed data follow Benford's Law distribution, while the alternative hypothesis (H1) states that the data do not follow this distribution.

Choose an appropriate level of significance (α): Rejecting H0 here serves as evidence of fraud, so a Type I error (rejecting a true H0) means falsely accusing an innocent person. Common significance levels are 0.05 and 0.01; the stricter α = 0.01 is the more appropriate choice because it lowers the probability of a false accusation.

Calculate the expected frequencies for each digit by multiplying the total number of observations (200) by the corresponding Benford's Law probabilities: For each digit d, Expected frequency = 200 × P(d), where P(d) is the probability from the table.

Perform the chi-square goodness-of-fit test: compute the chi-square test statistic from the observed and expected frequencies, then reject H0 if the statistic exceeds the critical value for the chosen α with 9 − 1 = 8 degrees of freedom (equivalently, if the P-value is less than α).
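The expected-frequency and test-statistic steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Benford probabilities are generated from the standard formula P(d) = log10(1 + 1/d) because the problem's probability table is not reproduced here, and the function names are hypothetical. The observed counts for the 200 checks are also not reproduced, so they must be supplied from the problem's table.

```python
import math

# Assumption: probabilities come from the standard Benford formula,
# P(d) = log10(1 + 1/d); the problem's own table should match after rounding.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def expected_counts(n, probs=BENFORD):
    """Expected frequency for each first digit: E_d = n * P(d)."""
    return {d: n * p for d, p in probs.items()}

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all digits; both arguments map digit -> count."""
    return sum((observed[d] - expected[d]) ** 2 / expected[d] for d in expected)

# Expected counts for the 200 checks in the problem.
expected = expected_counts(200)
for d, e in expected.items():
    print(f"digit {d}: expected {e:.1f}")
```

To finish the test, pass a dictionary of the observed counts (digit to count) together with `expected` to `chi_square_statistic`. Every expected count here is above 5, which is the usual rule of thumb for using the chi-square approximation.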

Key Concepts

Here are the essential concepts you must grasp in order to answer the question correctly.

Benford's Law

Benford's Law describes the frequency distribution of the first significant digit in many real-life sets of numerical data. Contrary to intuition, lower digits like 1 appear as the first digit more frequently than higher digits, with probabilities decreasing logarithmically from 1 to 9. This law is useful in detecting anomalies or fraud in datasets such as financial records.
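For reference, the law is usually stated in closed form. The problem's own probability table is not reproduced in this text, so the assumption here is that it agrees with this formula after rounding:

\[
P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \ldots, 9
\]

For example, P(1) = log10(2) ≈ 0.301 and P(9) = log10(10/9) ≈ 0.046, and the nine probabilities sum to log10(10) = 1.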

Goodness-of-Fit Test

A goodness-of-fit test evaluates how well observed data match an expected distribution, such as Benford's Law. It compares observed frequencies with the frequencies expected under that distribution to determine whether the deviations are plausibly due to chance or indicate a significant difference. The most common version is the chi-square test, which requires selecting a significance level to decide whether to reject the null hypothesis.
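The chi-square statistic used in this comparison can be written as follows. The symbols are notation chosen here rather than taken from the problem: O_i and E_i are the observed and expected counts for digit i, n = 200 is the number of checks, and p_i is the Benford probability for digit i.

\[
\chi^2 = \sum_{i=1}^{9} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = n\,p_i, \qquad \text{df} = 9 - 1 = 8
\]

A larger statistic means the observed counts sit farther from what Benford's Law predicts, so the test rejects H0 only for large values (a right-tailed test).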

Level of Significance

The level of significance (alpha) is the probability threshold for rejecting the null hypothesis. It equals the probability of a Type I error, which here means concluding fraud when none exists. In fraud detection, a small value such as 0.01 is preferred over 0.05 to limit false accusations, at the cost of a higher chance of missing real fraud (a Type II error).
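A short sketch shows how the choice of alpha changes the decision. The critical values are the standard chi-square table entries for 8 degrees of freedom (9 digits minus 1); the test statistic below is a made-up value chosen only for illustration, not a result from the problem, and the function name is hypothetical.

```python
# Standard chi-square table critical values for df = 8 (right-tailed).
CRITICAL_DF8 = {0.10: 13.362, 0.05: 15.507, 0.01: 20.090}

def reject_h0(statistic, alpha):
    """Reject H0 (data follow Benford's Law) when the statistic exceeds the critical value."""
    return statistic > CRITICAL_DF8[alpha]

# Hypothetical statistic for illustration only; not computed from the problem's data.
hypothetical_stat = 17.0
for alpha in sorted(CRITICAL_DF8, reverse=True):
    decision = "reject H0" if reject_h0(hypothetical_stat, alpha) else "do not reject H0"
    print(f"alpha = {alpha}: {decision}")
```

With this hypothetical value the test rejects H0 at alpha = 0.05 but not at alpha = 0.01, which is why the significance level should be fixed before looking at the data when an accusation of fraud depends on the outcome.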
