Are the proportions what we expected? Are two variables related?
Introductory Statistics for Accounting
Three applications of the \(\chi^2\) distribution: goodness-of-fit, independence, and homogeneity.
The Chi-Squared Distribution
The chi-squared (\(\chi^2\)) test is a hypothesis test for categorical data. It compares what we observed in our sample to what we would expect if a particular hypothesis were true.
The test statistic is always computed the same way:
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
where \(O_i\) = observed frequency in category \(i\), and \(E_i\) = expected frequency in category \(i\).
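The formula above translates almost directly into code. The following is a minimal sketch; the function name `chi_squared_statistic` and the three-category counts are hypothetical and chosen only for illustration.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 observations, 20 expected in each of three categories.
observed = [18, 22, 20]
expected = [20, 20, 20]
statistic = chi_squared_statistic(observed, expected)
print(f"chi-squared = {statistic:.2f}")
```

Each term is divided by its own expected frequency, so a gap of 2 matters more in a category where few observations are expected than in one where many are.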
The \(\chi^2\) distribution is a family of curves, each defined by its degrees of freedom (df). As df increases, the distribution shifts right and becomes more symmetric.
All three chi-squared tests follow the same five-step hypothesis testing procedure from Week 9:
| Step | Action |
|---|---|
| Step 1 | State the null and alternative hypotheses (\(H_0\) and \(H_1\)). |
| Step 2 | Choose the significance level (\(\alpha\)), usually 0.05. |
| Step 3 | Compute the test statistic: \(\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\) |
| Step 4 | Find the critical value from the \(\chi^2\) table using the appropriate df, or compute the p-value. |
| Step 5 | Decision: reject \(H_0\) if \(\chi^2 > \chi^2_{\alpha}\), or if p-value \(< \alpha\). |
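Steps 4 and 5 can also be carried out with a p-value instead of a table lookup. The sketch below uses only the Python standard library and relies on the closed-form expression for the upper-tail probability that exists when df is a positive integer. The names `chi2_sf` and `decide`, and the example statistic of 10.0 with 2 degrees of freedom, are assumptions made for illustration. In practice a statistics package would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable.

    Assumes df is a positive integer, which covers every test in this deck.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    df = int(df)
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
        total = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * total
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{m} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    m = (df - 1) // 2
    total = sum(half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, m + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def decide(statistic, df, alpha=0.05):
    """Return (p_value, reject_h0) using the rule: reject H0 if p-value < alpha."""
    p_value = chi2_sf(statistic, df)
    return p_value, p_value < alpha


# Hypothetical test statistic, for illustration only.
p_value, reject = decide(10.0, df=2)
print(f"p-value = {p_value:.4f}, reject H0: {reject}")
```

The critical-value rule and the p-value rule always give the same decision, because the p-value falls below \(\alpha\) exactly when the statistic exceeds \(\chi^2_{\alpha}\).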
The Goodness-of-Fit Test
We have one variable with \(k\) categories. We observe frequencies in each category and compare them to what we'd expect under \(H_0\).
Hypotheses:
\(H_0\): The population proportions are \(p_1, p_2, \ldots, p_k\) (as specified).
\(H_1\): At least one proportion differs from the specified value.
Degrees of freedom: \(\text{df} = k - 1\), where \(k\) = number of categories.
An audit firm claims that invoice errors are equally distributed across four quarters. You sample 200 error reports and count how many fall in each quarter. Does the data support the claim?
BrightPath Financial Services audited 200 invoice processing errors over the past year and recorded which quarter each error occurred in.
| Quarter | Q1 | Q2 | Q3 | Q4 | Total |
|---|---|---|---|---|---|
| Observed (\(O_i\)) | 62 | 48 | 40 | 50 | 200 |
| Expected (\(E_i\)) | 50 | 50 | 50 | 50 | 200 |
If errors are equally distributed (\(H_0\): \(p_1 = p_2 = p_3 = p_4 = 0.25\)), we expect \(200 \times 0.25 = 50\) per quarter.
Step 3 — Compute the test statistic:
$$\chi^2 = \frac{(62-50)^2}{50} + \frac{(48-50)^2}{50} + \frac{(40-50)^2}{50} + \frac{(50-50)^2}{50}$$
$$= \frac{144}{50} + \frac{4}{50} + \frac{100}{50} + \frac{0}{50} = 2.88 + 0.08 + 2.00 + 0.00 = 4.96$$
Step 4 — Critical value: With \(\text{df} = 4 - 1 = 3\) and \(\alpha = 0.05\):
\(\chi^2_{\text{critical}} = 7.815\)
Step 5 — Decision: Since \(4.96 < 7.815\), we do not reject \(H_0\).
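The BrightPath goodness-of-fit calculation can be reproduced in a few lines. This is a sketch under the slide's own numbers: the observed counts, the equal proportions under \(H_0\), and the table critical value 7.815 are taken from above, while the variable names are arbitrary.

```python
# Observed invoice errors per quarter (Q1..Q4), from the table above.
observed = [62, 48, 40, 50]
# Proportions claimed under H0: errors are equally distributed.
proportions = [0.25, 0.25, 0.25, 0.25]

n = sum(observed)
expected = [n * p for p in proportions]  # 50 per quarter

# Step 3: test statistic.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Step 4: degrees of freedom and the critical value read from the table.
df = len(observed) - 1
critical_value = 7.815  # chi-squared table, df = 3, alpha = 0.05

# Step 5: decision.
reject_h0 = statistic > critical_value
print(f"chi-squared = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
```

Because the statistic stays below the critical value, the sample does not provide enough evidence at the 5% level that errors are unevenly spread across quarters.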
Test of Independence
Data is arranged in an \(r \times c\) contingency table (\(r\) rows, \(c\) columns). The hypotheses are:
\(H_0\): The two variables are independent.
\(H_1\): The two variables are not independent (i.e., they are associated).
Degrees of freedom: \(\text{df} = (r - 1)(c - 1)\)
The expected frequency for each cell is:
$$E_{ij} = \frac{(\text{Row } i \text{ total}) \times (\text{Column } j \text{ total})}{\text{Grand total}}$$
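The expected-frequency formula applies the same rule to every cell, which makes it easy to compute in one pass. The sketch below assumes the table is stored as a list of rows; the function name `expected_frequencies` and the 2 × 2 counts are hypothetical.

```python
def expected_frequencies(table):
    """Return E_ij = (row i total * column j total) / grand total for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table, for illustration only.
table = [[20, 30],
         [30, 20]]
expected = expected_frequencies(table)
for row in expected:
    print(row)
```

The expected table always has the same row and column totals as the observed table; only the way counts are spread across cells changes.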
BrightPath surveyed 300 clients about their satisfaction (Satisfied / Neutral / Dissatisfied) across three service types (Tax, Audit, Advisory). Is satisfaction independent of service type?
| | Tax | Audit | Advisory | Row Total |
|---|---|---|---|---|
| Satisfied | 60 | 40 | 55 | 155 |
| Neutral | 30 | 35 | 20 | 85 |
| Dissatisfied | 10 | 25 | 25 | 60 |
| Col Total | 100 | 100 | 100 | 300 |
Computing expected frequencies (example cells):
\(E_{\text{Satisfied, Tax}} = \frac{155 \times 100}{300} = 51.67\)
\(E_{\text{Dissatisfied, Audit}} = \frac{60 \times 100}{300} = 20.00\)
Complete expected frequency table:
| | Tax | Audit | Advisory |
|---|---|---|---|
| Satisfied | 51.67 | 51.67 | 51.67 |
| Neutral | 28.33 | 28.33 | 28.33 |
| Dissatisfied | 20.00 | 20.00 | 20.00 |
Since all column totals are equal (100), each row's expected values are simply the row total / 3.
Test statistic:
$$\chi^2 = \frac{(60-51.67)^2}{51.67} + \frac{(40-51.67)^2}{51.67} + \cdots + \frac{(25-20)^2}{20}$$
$$= 1.34 + 2.63 + 0.21 + 0.10 + 1.57 + 2.45 + 5.00 + 1.25 + 1.25 = \mathbf{15.80}$$
\(\text{df} = (3-1)(3-1) = 4\). At \(\alpha = 0.05\): \(\chi^2_{\text{critical}} = 9.488\). Since \(15.80 > 9.488\), we reject \(H_0\).
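The whole independence test for the BrightPath survey can be checked with a short script. This is a sketch using the observed counts and the table critical value from above; variable names are arbitrary. Because the script does not round the individual terms, it gives a statistic of about 15.81, marginally different from the 15.80 obtained by summing rounded terms.

```python
# Observed counts: rows = Satisfied, Neutral, Dissatisfied;
# columns = Tax, Audit, Advisory.
observed = [[60, 40, 55],
            [30, 35, 20],
            [10, 25, 25]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (O - E)^2 / E over every cell, with E = row total * column total / grand total.
statistic = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        statistic += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 9.488  # chi-squared table, df = 4, alpha = 0.05
reject_h0 = statistic > critical_value
print(f"chi-squared = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
```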
We found a statistically significant association between service type and client satisfaction. But the test tells us that a relationship exists, not where. How do we dig deeper?
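Look at which cells contributed most to the test statistic. The largest contributions came from Dissatisfied/Tax (5.00), Satisfied/Audit (2.63), and Neutral/Advisory (2.45); in each of these cells the observed count was below the expected count.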
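One way to see the per-cell contributions is to compute \((O - E)^2 / E\) for every cell of the satisfaction table and sort the results. The sketch below does this with the observed counts from the survey; the variable names and output layout are arbitrary choices.

```python
rows = ["Satisfied", "Neutral", "Dissatisfied"]
cols = ["Tax", "Audit", "Advisory"]
observed = [[60, 40, 55],
            [30, 35, 20],
            [10, 25, 25]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# One record per cell: (contribution, row label, column label, O - E).
contributions = []
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        contributions.append(((o - e) ** 2 / e, rows[i], cols[j], o - e))

# Largest contributions first.
contributions.sort(key=lambda record: record[0], reverse=True)
for value, row_label, col_label, diff in contributions[:3]:
    direction = "above" if diff > 0 else "below"
    print(f"{row_label}/{col_label}: {value:.2f} (observed {direction} expected)")
```

The sign of \(O - E\) is worth reporting alongside each contribution, because the squared term alone does not show whether a cell has more or fewer observations than expected.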
Test of Homogeneity
Independence (one sample): Survey 300 BrightPath clients. Record their service type and satisfaction level. Are they associated?
Homogeneity (separate samples): Sample 100 clients from each of three offices (Sydney, Melbourne, Brisbane). Is the distribution of satisfaction the same?
BrightPath sampled 100 clients from each of its Sydney and Melbourne offices and recorded satisfaction:
| | Satisfied | Neutral | Dissatisfied | Total |
|---|---|---|---|---|
| Sydney | 55 | 30 | 15 | 100 |
| Melbourne | 45 | 25 | 30 | 100 |
| Total | 100 | 55 | 45 | 200 |
\(H_0\): The distribution of satisfaction is the same in both offices.
\(H_1\): The distributions differ.
\(\text{df} = (2-1)(3-1) = 2\). You would compute expected frequencies exactly as before and compare \(\chi^2\) to \(\chi^2_{0.05, 2} = 5.991\).
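The homogeneity calculation for the two offices can be completed with the same mechanics as the independence test. The sketch below uses the observed counts from the table above and the critical value 5.991; the variable names are arbitrary. Run as written, it gives a statistic of about 6.45, which exceeds 5.991, so under these numbers \(H_0\) would be rejected at the 5% level.

```python
# Observed counts: rows = Sydney, Melbourne;
# columns = Satisfied, Neutral, Dissatisfied.
observed = [[55, 30, 15],
            [45, 25, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts use the same formula as the test of independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 5.991  # chi-squared table, df = 2, alpha = 0.05
reject_h0 = statistic > critical_value
print(f"chi-squared = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
```

The arithmetic is identical to the independence test; what differs is the sampling design, with a fixed number of clients drawn from each office.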
| Test | Question | Degrees of Freedom |
|---|---|---|
| Goodness-of-Fit | Does one variable's distribution match a specified set of proportions? | \(k - 1\) |
| Independence | Are two variables associated? (single sample) | \((r-1)(c-1)\) |
| Homogeneity | Is the distribution the same across populations? (separate samples) | \((r-1)(c-1)\) |