Section 12.4 Two Proportions with Independent Samples

We are finally done discussing inference for differences between population means. So now it’s time to look at the difference between two population proportions. There are many interesting business applications for this scenario. For example, perhaps USAA would like to know if the proportion of young male drivers who have car accidents differs from the proportion of young female drivers who have car accidents.
The sampling distribution for the difference in sample proportions, \(\overline{p}_1-\overline{p}_2\text{,}\) is approximately normal as long as we have relatively large (\(n\geq 30\)) sample sizes. The mean of this sampling distribution is the difference in population proportions, \(p_1-p_2\text{.}\)
The standard error for this distribution is
\begin{equation*} \boxed{ \sigma_{p_1-p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}} }. \end{equation*}
However, \(p_1\) and \(p_2\) are unknown and so the sample proportions, \(\overline{p}_1\) and \(\overline{p}_2\text{,}\) computed from the sample data, are used as estimates. This allows us to approximate the standard error for the difference in population proportions with:
\begin{equation*} \boxed{ \hat{\sigma}_{p_1-p_2} = \sqrt{\frac{\overline{p}_1(1-\overline{p}_1)}{n_1}+\frac{\overline{p}_2(1-\overline{p}_2)}{n_2}} }. \end{equation*}
So the confidence interval for the difference between the two proportions is found using the formula:
\begin{equation*} \boxed{ (\overline{p}_1-\overline{p}_2)\pm z_{\alpha/2}\cdot\hat{\sigma}_{p_1-p_2}} \end{equation*}
And the test statistic for the hypothesis test for the difference between two proportions is defined by the formula:
\begin{equation*} \boxed{z_p = \frac{(\overline{p}_1-\overline{p}_2)-(p_1-p_2)_{H_0}}{\hat{\sigma}_{p_1-p_2}}} \end{equation*}
However, if we assume that \(p_1=p_2\) in the null hypothesis (i.e. there is no difference in the population proportions, \(p_1-p_2=0\text{,}\) which is very common), the sample data can be pooled. That is, we define the weighted average of the two sample proportions by:
\begin{equation*} \hat{p}=\frac{x_1+x_2}{n_1+n_2}. \end{equation*}
Here \(x_1\) and \(x_2\) are the numbers of successes observed in the two samples. Pooling simplifies the test statistic formula to:
\begin{equation*} z_p = \frac{(\overline{p}_1-\overline{p}_2)-0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} \end{equation*}
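To see where the pooled denominator comes from, one can substitute the pooled estimate \(\hat{p}\) for both \(\overline{p}_1\) and \(\overline{p}_2\) in the approximate standard error, which is reasonable when \(H_0\) claims \(p_1=p_2\text{:}\)
\begin{equation*} \sqrt{\frac{\hat{p}(1-\hat{p})}{n_1}+\frac{\hat{p}(1-\hat{p})}{n_2}} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} \end{equation*}
The pooled estimate is a weighted average because \(x_1=n_1\overline{p}_1\) and \(x_2=n_2\overline{p}_2\text{,}\) so
\begin{equation*} \hat{p}=\frac{n_1\overline{p}_1+n_2\overline{p}_2}{n_1+n_2}. \end{equation*}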

Note 12.4.1.

Unfortunately, the Data Analysis tool in Excel does not perform the calculations for this scenario, so we will do all of these problems “by hand”.

Exercise 12.4.2.

(Donnelly 10.37)
Economists theorize that the 2007-2008 recession affected men more than women because men are typically employed in industries that have been hit hardest by the recession. Women, on the other hand, are typically employed in services which are considered more recession resistant. A sample of 170 men and a sample of 150 women were drawn. In those samples, 21 men were unemployed, and 11 of the women were unemployed. Perform a hypothesis test using \(\alpha=0.05\) to determine if the unemployment rate for men is higher than the rate for women.
Answer.
critical value:
\begin{equation*} NORM.S.INV(0.95)\approx 1.645 \end{equation*}
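population 1 = men
population 2 = women
\begin{equation*} H_0: p_1-p_2\leq 0 \end{equation*}
\begin{equation*} H_1: p_1-p_2\gt 0 \end{equation*}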
\begin{equation*} n_1=170,\; x_1=21,\; \overline{p}_1=\frac{21}{170}\approx 0.124 \end{equation*}
\begin{equation*} n_2=150,\; x_2=11,\; \overline{p}_2=\frac{11}{150}\approx 0.073 \end{equation*}
\begin{equation*} \hat{p}=\frac{21+11}{170+150}=0.1 \end{equation*}
test statistic:
\begin{equation*} z_p\approx\frac{(0.124-0.073)-0}{\sqrt{0.1(0.9)\left(\frac{1}{170}+\frac{1}{150}\right)}}\approx 1.52 \end{equation*}
\begin{equation*} \text{test statistic}\lt \text{ critical value}\text{,} \end{equation*}
so fail to reject \(H_0\)
There is not enough evidence to conclude that the unemployment rate is higher for men than women.
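Since Excel's Data Analysis tool does not cover this case, the calculation can also be checked with a short script. The following is a minimal sketch in Python using only the standard library. The function name `pooled_two_proportion_z` is a hypothetical helper, not part of any library, and `NormalDist().inv_cdf` is assumed to play the role of `NORM.S.INV`. Because the script carries unrounded sample proportions, it gives a test statistic of about 1.49 rather than the 1.52 obtained from the rounded values above. The conclusion is the same.

```python
from math import sqrt
from statistics import NormalDist


def pooled_two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 - p2 = 0 (hypothetical helper)."""
    p1_bar = x1 / n1                      # sample proportion, population 1
    p2_bar = x2 / n2                      # sample proportion, population 2
    p_hat = (x1 + x2) / (n1 + n2)         # pooled proportion
    std_error = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    z = (p1_bar - p2_bar) / std_error
    return z, p_hat


# Data from the exercise: 21 of 170 men and 11 of 150 women unemployed.
z, p_hat = pooled_two_proportion_z(x1=21, n1=170, x2=11, n2=150)

# Right-tailed test at alpha = 0.05.
critical_value = NormalDist().inv_cdf(0.95)
p_value = 1 - NormalDist().cdf(z)
reject_h0 = z > critical_value

print(f"pooled p-hat = {p_hat:.3f}")
print(f"z = {z:.2f}, critical value = {critical_value:.3f}")
print(f"p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

The p-value is an addition to the critical-value approach used in the answer. It leads to the same decision, since it exceeds \(\alpha=0.05\text{.}\)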

Exercise 12.4.3.

(Donnelly 10.61)
Negative equity (also known as being “underwater”) refers to a scenario where the market value of a residence is less than the outstanding balance on the mortgage for that home. Suppose the Federal Housing Administration (FHA), which is the government agency charged with supporting the home financing market, would like to test the hypothesis that the proportion of home mortgages with negative equity in Florida is more than \(7\%\) higher than the national proportion. A random sample of \(180\) mortgages from Florida found that \(67\) were underwater. A random sample of \(190\) mortgages across the United States found that \(42\) were underwater.

(a)

Using \(\alpha=0.05\text{,}\) perform this hypothesis test for the FHA.
Answer.
population 1: Florida, population 2: US
\begin{equation*} n_1=180,\; x_1=67,\; \overline{p}_1=\frac{67}{180}\approx 0.372 \end{equation*}
\begin{equation*} n_2=190,\; x_2=42,\; \overline{p}_2=\frac{42}{190}\approx 0.221 \end{equation*}
\begin{equation*} H_0: p_1-p_2\leq 0.07 \end{equation*}
\begin{equation*} H_1: p_1-p_2\gt 0.07 \end{equation*}
Critical value: \(NORM.S.INV(0.95)\approx 1.645\)
\begin{equation*} \hat{\sigma}_{p_1-p_2}\approx \sqrt{\frac{0.372(0.628)}{180}+\frac{0.221(0.779)}{190}}\approx 0.0469 \end{equation*}
Test statistic:
\begin{equation*} z_p\approx \frac{(0.372-0.221)-0.07}{0.0469}\approx 1.73 \end{equation*}
Since the test statistic is bigger than the critical value, and this is a right-tailed test, we reject \(H_0\text{.}\)
There is enough evidence to conclude that the proportion of underwater mortgages in FL is more than \(7\%\) higher than nationwide.
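Because the null hypothesis here specifies a nonzero difference, the samples are not pooled and the unpooled standard error is used. A minimal Python sketch of that calculation follows. The helper name `two_proportion_z` and its parameter `hypothesized_diff` are assumptions made for illustration, and `NormalDist().inv_cdf` stands in for `NORM.S.INV`.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2, hypothesized_diff):
    """Unpooled z statistic for H0: p1 - p2 = hypothesized_diff (hypothetical helper)."""
    p1_bar = x1 / n1
    p2_bar = x2 / n2
    # Approximate standard error built from the two sample proportions.
    std_error = sqrt(p1_bar * (1 - p1_bar) / n1 + p2_bar * (1 - p2_bar) / n2)
    z = ((p1_bar - p2_bar) - hypothesized_diff) / std_error
    return z, std_error


# Data from the exercise: 67 of 180 Florida mortgages and
# 42 of 190 nationwide mortgages were underwater.
z_stat, std_error = two_proportion_z(67, 180, 42, 190, hypothesized_diff=0.07)

# Right-tailed test at alpha = 0.05.
critical = NormalDist().inv_cdf(0.95)
reject = z_stat > critical

print(f"standard error = {std_error:.4f}")
print(f"z = {z_stat:.2f}, critical value = {critical:.3f}, reject H0: {reject}")
```

Passing `hypothesized_diff=0.07` rather than `0.7` matters: the hypothesized difference has to be on the same scale as the proportions.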

(b)

Construct a \(95\%\) confidence interval to estimate the difference in the proportion of underwater mortgages for these two populations. Interpret the results.
Answer.
Formula:
\begin{equation*} (\overline{p}_1-\overline{p}_2)\pm z_{\alpha/2}\cdot\hat{\sigma}_{p_1-p_2} \end{equation*}
\begin{equation*} \overline{p}_1-\overline{p}_2\approx 0.372-0.221=0.151 \end{equation*}
\begin{equation*} z_{\alpha/2}=NORM.S.INV(0.975)\approx 1.96 \end{equation*}
\begin{equation*} \hat{\sigma}_{p_1-p_2}\approx 0.0469 \end{equation*}
Margin of error: \(1.96(0.0469)\approx 0.0919\)
LCL:
\begin{equation*} 0.151-0.0919\approx 0.0591 \end{equation*}
UCL:
\begin{equation*} 0.151+0.0919\approx 0.2429 \end{equation*}
Interval: We are \(95\%\) confident that
\begin{equation*} 0.0591\lt p_1-p_2\lt 0.2429. \end{equation*}
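Since \(0.07\) is in the interval, we cannot conclude from this interval that the difference between \(p_1\) and \(p_2\) is more than \(7\%\text{.}\) This differs from part (a) because a two-sided \(95\%\) interval puts only \(2.5\%\) in each tail (critical value \(1.96\)), while the right-tailed test puts all \(5\%\) in one tail (critical value \(1.645\)).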
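The interval can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the hand calculation. The function name `two_proportion_ci` is hypothetical, and `NormalDist().inv_cdf` is used in place of `NORM.S.INV`. Since no intermediate values are rounded, the limits may differ from the hand-computed ones in the last decimal place or two.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error (hypothetical helper)."""
    p1_bar = x1 / n1
    p2_bar = x2 / n2
    std_error = sqrt(p1_bar * (1 - p1_bar) / n1 + p2_bar * (1 - p2_bar) / n2)
    # Two-sided critical value z_{alpha/2}.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * std_error
    diff = p1_bar - p2_bar
    return diff - margin, diff + margin


# Data from the exercise: Florida (67 of 180) versus nationwide (42 of 190).
lower, upper = two_proportion_ci(67, 180, 42, 190, confidence=0.95)
contains_007 = lower < 0.07 < upper

print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
print(f"0.07 inside interval: {contains_007}")
```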