
Section 12.2 Comparing Two Means with Independent Samples

In this section, we study two-sample hypothesis tests and confidence intervals for the difference between two population means. First, we will assume that the population standard deviations, \(\sigma_1\) and \(\sigma_2\text{,}\) are known. Then we will look at the more realistic case where they are unknown.

Definition 12.2.1.

  • The sampling distribution of the difference in sample means, \(\overline{x}_1-\overline{x}_2\text{,}\) is normal given normal populations or large samples (\(n_1\geq 30\) and \(n_2\geq 30\)). The mean of this sampling distribution is
    \begin{equation*} \mu_{\overline{x}_1-\overline{x}_2}=\mu_{\overline{x}_1}-\mu_{\overline{x}_2}=\mu_1-\mu_2. \end{equation*}
  • The standard error for this sampling distribution is
    \begin{equation*} \sigma_{\overline{x}_1-\overline{x}_2}=\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}. \end{equation*}
  • The test statistic for a hypothesis test comparing the difference between two means with independent samples and known standard deviations is defined by:
    \begin{equation*} z_{\overline{x}} = \frac{(\overline{x}_1 - \overline{x}_2)-(\mu_1-\mu_2)_{H_0}}{\sigma_{\overline{x}_1 - \overline{x}_2}}, \end{equation*}
    where
    \((\mu_1-\mu_2)_{H_0}\) = the hypothesized difference in population means (defined by the null hypothesis)
    \(\sigma_{\overline{x}_1-\overline{x}_2}\) = the standard error for the difference between the two means
    \(\overline{x}_1-\overline{x}_2\) = the difference in sample means between Populations 1 and 2
This formula is a little unwieldy, so whenever we are working with raw data (not just summary statistics), we will use Excel to do the time-consuming calculations.
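When only summary statistics are available, Definition 12.2.1 is also easy to evaluate in a few lines of code. The following is a minimal Python sketch; the function name `two_sample_z` is our own illustration, not an Excel or library feature. It uses the summary statistics of Exercise 12.2.2 as a check.

```python
import math

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, hyp_diff=0.0):
    """Return (standard error, z test statistic) for two independent samples
    whose population standard deviations sigma1 and sigma2 are known."""
    # Standard error of the difference in sample means
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    # How many standard errors the observed difference lies from the hypothesized one
    z = ((xbar1 - xbar2) - hyp_diff) / se
    return se, z

# Exercise 12.2.2: Population 1 = Japan, Population 2 = US, hypothesized difference 0
se, z = two_sample_z(67.5, 64.6, 4.5, 4.0, 30, 30)
print(f"standard error = {se:.4f}, z = {z:.2f}")  # standard error = 1.0992, z = 2.64
```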
Now let’s demonstrate the two ways we can perform this type of hypothesis test. First, in Exercise 12.2.2, we will complete the hypothesis test by hand, computing each required quantity manually. Then, in Exercise 12.2.3, we will rely on Excel to do the calculations: after stating the two hypotheses, we will navigate to the Data Analysis tool and choose z-Test Two-Sample for Means to complete the test.

Exercise 12.2.2.

(Donnelly 10.7)
Suppose the Bureau of Labor Statistics would like to investigate if the average retirement age for a worker in Japan is higher than the average retirement age for a worker in the United States. A random sample of 30 retired U.S. workers had an average retirement age of 64.6 years. A random sample of 30 retired Japanese workers had an average retirement age of 67.5 years. Assume the population standard deviation for the retirement age in the U.S. is 4.0 years and for Japan is 4.5 years. Perform a hypothesis test using \(\alpha=0.05\) to determine if the average retirement age in Japan is higher than it is in the United States.
Answer.
Population 1 = average retirement age in Japan
Population 2 = average retirement age in US
\(\alpha=0.05,\; n_1=n_2=30,\; \overline{x}_1=67.5,\; \overline{x}_2=64.6,\; \sigma_1=4.5,\; \sigma_2=4.0\)
\(H_0: \; \mu_1-\mu_2\leq 0\)
\(H_1: \; \mu_1-\mu_2\gt 0\)
critical value: \(NORM.S.INV(0.95)\approx 1.645\)
standard error: \(\sigma_{\overline{x}_1-\overline{x}_2}=\sqrt{\frac{(4.5)^2}{30}+\frac{(4.0)^2}{30}}\approx 1.0992\)
test statistic: \(z_{\overline{x}}\approx \frac{(67.5-64.6)-0}{1.0992}\approx 2.64\)
Since this is a right-tailed test and the test statistic (\(\approx 2.64\)) is greater than the critical value (\(\approx 1.645\)), we reject \(H_0\text{.}\)
There is enough evidence to conclude that the average retirement age in Japan is higher than in the United States.
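The decision in Exercise 12.2.2 can also be double-checked with Python's standard library (3.8 or later), whose `statistics.NormalDist.inv_cdf` plays the role of Excel's NORM.S.INV. This sketch (the helper name `right_tail_z_test` is hypothetical) takes the rounded test statistic from the answer above and also reports a p-value; comparing the p-value with \(\alpha\) leads to the same decision as comparing \(z_{\overline{x}}\) with the critical value.

```python
from statistics import NormalDist

def right_tail_z_test(z, alpha):
    """Right-tailed z test: return (critical value, p-value, whether to reject H0)."""
    std_normal = NormalDist(mu=0, sigma=1)
    critical = std_normal.inv_cdf(1 - alpha)  # analogous to NORM.S.INV(1 - alpha)
    p_value = 1 - std_normal.cdf(z)           # area under the curve to the right of z
    return critical, p_value, z > critical

# Exercise 12.2.2 with alpha = 0.05 and the rounded test statistic 2.64
critical, p_value, reject = right_tail_z_test(2.64, 0.05)
print(f"critical value = {critical:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```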

Exercise 12.2.3.

(Donnelly Your Turn 1)
Major League Baseball officials (and many fans) have been concerned about the lengths of games, particularly playoff games. Suppose the officials would like to test the hypothesis that the mean length of a playoff game is longer than the mean length of a regular season game. The data in the Excel file below shows the length of games, in minutes, for randomly selected games from the regular season and from the playoffs. Assume the population standard deviations of the lengths of playoff and regular season games are 25 and 21 minutes, respectively. Using \(\alpha = 0.02\text{,}\) can we conclude that playoff games are longer, on average, than regular season games?
Answer.
The sample provides enough evidence to conclude that the average length of playoff games is longer than the average length of regular season games.

Definition 12.2.4.

The formula for the confidence interval for the difference between two means given known standard deviations is
\begin{equation*} ( \overline{x}_1-\overline{x}_2 ) \pm z_{\alpha/2}\cdot \sigma_{\overline{x}_1-\overline{x}_2} \end{equation*}
where again
\(\sigma_{\overline{x}_1-\overline{x}_2}\) \(=\) the standard error for the difference between two means
\(\overline{x}_1-\overline{x}_2\) \(=\) the difference in sample means between Populations 1 and 2
Recall that the textbook labels the two endpoints of the interval LCL (lower confidence limit) and UCL (upper confidence limit).

Exercise 12.2.5.

(Donnelly 10.9)
Expedia.com would like to estimate the difference between the average rental price of a car with automatic transmission versus the average rental price of a car with manual transmission at a certain airport. The table below shows the average one-week rental prices for two random samples, as well as the population standard deviations and sample sizes for each type of car.
Transmission | Sample mean | Sample size | Population standard deviation
Automatic | \(\$ 411.30\) | \(52\) | \(\$ 23\)
Manual | \(\$ 372.80\) | \(36\) | \(\$ 27\)
Construct a \(90 \%\) confidence interval to estimate the difference in the average cost of a one-week rental between these two types of cars at the airport. Can you conclude that a difference exists in the average rental price of the two types of cars?
Answer.
  • Population 1: automatic
  • Population 2: manual
Formula:
\begin{equation*} (\overline{x}_1-\overline{x}_2)\pm z_{\alpha/2}\cdot \sigma_{\overline{x}_1-\overline{x}_2} \end{equation*}
Standard error:
\begin{equation*} \sigma_{\overline{x}_1-\overline{x}_2}=\sqrt{\frac{23^2}{52}+\frac{27^2}{36}}\approx 5.5157 \end{equation*}
\begin{equation*} z_{\alpha/2}=NORM.S.INV(0.95)\approx 1.645 \end{equation*}
MOE:
\begin{equation*} \approx 1.645(5.5157)\approx 9.0733 \end{equation*}
LCL:
\begin{equation*} \approx (411.30-372.80)-9.0733=29.4267 \end{equation*}
UCL:
\begin{equation*} \approx (411.30-372.80)+9.0733=47.5733 \end{equation*}
We are \(90\%\) confident that
\begin{equation*} 29.43\lt \mu_1-\mu_2\lt 47.57. \end{equation*}
Since the interval does not contain 0, we can conclude that there is a difference in the average cost of a rental with different transmissions.
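The confidence interval in Exercise 12.2.5 can be reproduced with a short Python sketch of Definition 12.2.4 (the function name `z_interval` is hypothetical; `statistics.NormalDist` requires Python 3.8 or later). It uses the unrounded \(z_{\alpha/2}\text{,}\) so the limits agree with the hand calculation after rounding to two decimal places.

```python
import math
from statistics import NormalDist

def z_interval(xbar1, xbar2, sigma1, sigma2, n1, n2, confidence):
    """Return (LCL, UCL) for mu1 - mu2 when sigma1 and sigma2 are known."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # standard error
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # analogous to NORM.S.INV(1 - alpha/2)
    moe = z_crit * se                                 # margin of error
    diff = xbar1 - xbar2
    return diff - moe, diff + moe

# Exercise 12.2.5: Population 1 = automatic, Population 2 = manual, 90% confidence
lcl, ucl = z_interval(411.30, 372.80, 23, 27, 52, 36, confidence=0.90)
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}")    # LCL = 29.43, UCL = 47.57
print("Interval contains 0:", lcl < 0 < ucl)  # Interval contains 0: False
```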
When we don’t know the population standard deviations, we substitute the sample standard deviations in their place. Recall from Section 9.4 that this means the proper sampling distribution is the Student’s \(t\)-distribution instead of the normal distribution (as long as the sample sizes are large and/or the samples are drawn from normal populations).
In addition, we need to decide whether to assume that the unknown population variances are equal or unequal. We could use the \(F\)-test from Chapter 11 to make this decision, but most of the time (in this class) you will be told which situation applies. Let’s examine the formulas for each of these two cases.
  • Case 1: The population variances are not equal, i.e., \(\sigma_1^2\neq \sigma_2^2\)
    The test statistic and confidence interval formulas are
    \begin{equation*} t_{\overline{x}}=\frac{(\overline{x}_1-\overline{x}_2)-(\mu_1-\mu_2)_{H_0}}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\;\;\;\;\;\;\; (\overline{x}_1-\overline{x}_2)\pm t_{\alpha/2}\cdot\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}} \end{equation*}
    where
    \((\mu_1-\mu_2)_{H_0}\) \(=\) the hypothesized difference in population means (defined by the null hypothesis)
    \(\overline{x}_1-\overline{x}_2\) \(=\) the difference in sample means between Populations 1 and 2
    \(s_1^2, n_1\) \(=\) the variance and size, respectively, of the sample from Population 1
    \(s_2^2, n_2\) \(=\) the variance and size, respectively, of the sample from Population 2
    The test statistic, \(t_{\overline{x}}\text{,}\) has degrees of freedom, \(df\text{,}\) defined by the following formula:
    \begin{equation*} df = \frac{\left( \frac{s_1^2}{n_1}+\frac{s_2^2}{n_2} \right)^2}{\frac{\left(\frac{s_1^2}{n_1}\right)^2}{n_1-1}+\frac{\left(\frac{s_2^2}{n_2}\right)^2}{n_2-1}} \end{equation*}
    This looks messy, but note that you’ve already computed \(\frac{s_1^2}{n_1}\) and \(\frac{s_2^2}{n_2}\) during the test statistic calculation!
  • Case 2: The population variances are equal, i.e., \(\sigma_1^2=\sigma_2^2\)
    Because the two population variances are assumed equal, we can pool the two sample variances into a single estimate of the common variance. This simplifies the denominator of the test statistic, which in Case 1 contained the two sample variances separately.
    The test statistic and confidence interval formulas are:
    \begin{equation*} t_{\overline{x}}=\frac{(\overline{x}_1-\overline{x}_2)-(\mu_1-\mu_2)_{H_0}}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}\;\;\;\;\;\;\; (\overline{x}_1-\overline{x}_2)\pm t_{\alpha/2}\cdot\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} \end{equation*}
    The pooled variance, \(s_p^2\text{,}\) is a weighted average of the two sample variances, where each sample variance is weighted by its degrees of freedom, \(n_1-1\) or \(n_2-1\text{.}\)
    This formula is actually quite easy to use:
    \begin{equation*} s_p^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{(n_1-1)+(n_2-1)} \end{equation*}
    The test statistic, \(t_{\overline{x}}\text{,}\) has degrees of freedom, \(df=n_1+n_2-2\text{.}\)
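Both cases translate directly into code; Case 1 is commonly known as Welch's \(t\)-test. The sketch below (hypothetical function names `welch_t` and `pooled_t`; it works from summary statistics only) computes each test statistic and its degrees of freedom, using the summary statistics from Exercise 12.2.7 as an example. As a sanity check, when the two samples have the same size and the same standard deviation, both formulas give the same \(t_{\overline{x}}\) and \(df=n_1+n_2-2\text{.}\)

```python
import math

def welch_t(xbar1, xbar2, s1, s2, n1, n2, hyp_diff=0.0):
    """Case 1 (unequal variances): return (t statistic, degrees of freedom)."""
    a = s1**2 / n1  # s1^2 / n1, reused in the df formula
    b = s2**2 / n2  # s2^2 / n2, reused in the df formula
    t = ((xbar1 - xbar2) - hyp_diff) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, df

def pooled_t(xbar1, xbar2, s1, s2, n1, n2, hyp_diff=0.0):
    """Case 2 (equal variances): return (t statistic, degrees of freedom, pooled variance)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))
    t = ((xbar1 - xbar2) - hyp_diff) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2, sp2

# Summary statistics from Exercise 12.2.7 (Population 1 = 2008, Population 2 = 2018)
t, df, sp2 = pooled_t(2462.3, 2257.0, 760.8, 730.2, 45, 40)
print(f"pooled variance = {sp2:.4f}, t = {t:.4f}, df = {df}")
```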
Once again, we will rely on Excel to do these computations for us whenever we have raw data. After stating the two hypotheses, go to the Data Analysis tool and choose “t-Test Two-Sample Assuming Equal Variances” or “t-Test Two-Sample Assuming Unequal Variances” to complete the test.

Exercise 12.2.6.

(Donnelly 10.52)
The airline industry measures fuel efficiency by calculating how many miles one seat can travel, whether occupied or not, on one gallon of jet fuel. The data in the Excel file below shows the fuel economy, in miles per seat, for 15 randomly selected flights on Delta and US Airways. Perform a hypothesis test using \(\alpha=0.05\) to determine if the average fuel efficiency differs between the two airlines. Assume the population variances for the fuel efficiency for these two airlines are not equal.

Exercise 12.2.7.

(Donnelly 10.45)
During a decline in the housing market in the 2010s, it appeared that the average size of a newly constructed house fell. To investigate this trend, the square footages of a random sample of houses built in 2008 were compared to those of a random sample of houses built in 2018. The following table summarizes the sample means, sample standard deviations, and sample sizes for the two samples. Assume that the population variances for the square footages of houses built in these two years are equal.
Statistic | 2008 | 2018
Sample mean (square feet) | 2,462.3 | 2,257.0
Sample standard deviation (square feet) | 760.8 | 730.2
Sample size | 45 | 40

(a)

Using \(\alpha=0.05\text{,}\) perform a hypothesis test to determine if the average home constructed in 2008 was larger than the average home built in 2018.
Answer.
  • Population 1: 2008
  • Population 2: 2018
\(H_0:\) \(\mu_1-\mu_2=0\)
\(H_1:\) \(\mu_1-\mu_2\gt 0\)
\(\alpha=0.05,\;\;\; df=45+40-2=83\)
\begin{equation*} s_p^2 = \frac{(44)(578,816.64)+(39)(533,192.04)}{44+39}\approx 557,378.5749 \end{equation*}
test statistic: \(t_{\overline{x}}=\frac{2462.3-2257-0}{\sqrt{557,378.5749\left( \frac{1}{45}+\frac{1}{40} \right)}}\approx 1.2654\)
\begin{equation*} \text{p-value}=T.DIST.RT(1.2654,83)\approx 0.1046 \end{equation*}
Since the p-value is bigger than \(\alpha\) (and \(t_{\overline{x}}\) is not in the rejection region), we fail to reject \(H_0\text{.}\)
There is not enough evidence to conclude that the average size of a newly constructed house fell in 2018 compared to 2008.
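Python's standard library has no Student's \(t\)-distribution (SciPy users would typically call `scipy.stats.t.sf`). As a dependency-free stand-in for Excel's T.DIST.RT, the sketch below integrates the \(t\) density numerically with Simpson's rule; the function names are hypothetical, and the result should closely match the p-value above.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_dist_rt(t, df, steps=2000):
    """Right-tail probability P(T > t), similar in purpose to Excel's T.DIST.RT.
    Integrates the density over [0, |t|] with Simpson's rule (steps must be even)
    and uses the symmetry of the t-distribution about 0."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2  # Simpson weights: 4 for odd nodes, 2 for even
        total += weight * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area

# Exercise 12.2.7(a): test statistic and df from the answer above
p_value = t_dist_rt(1.2654, 83)
print(f"p-value = {p_value:.4f}")
```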

(b)

Construct a \(95\%\) confidence interval to estimate the difference in the average square footage of new homes constructed in these two years.
Answer.
\begin{equation*} T.INV.2T(0.05,83)\approx 1.9889, \end{equation*}
so the critical values are approximately \(-1.9889\) and \(1.9889\text{.}\)
Formula for confidence interval:
\begin{equation*} (\overline{x}_1-\overline{x}_2)\pm t_{\alpha/2}\cdot\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} \end{equation*}
Margin of error:
\begin{equation*} \approx 1.9889\sqrt{557,378.5749\left(\frac{1}{45}+\frac{1}{40}\right)}\approx 322.7 \end{equation*}
LCL:
\begin{equation*} \approx (2462.3-2257)-322.7=-117.4 \end{equation*}
UCL:
\begin{equation*} \approx (2462.3-2257)+322.7=528.0 \end{equation*}
We are \(95\%\) confident that
\begin{equation*} -117.4\lt \mu_1-\mu_2\lt 528.0. \end{equation*}
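Part (b) can likewise be checked with a short sketch (the function name `pooled_t_interval` is hypothetical). Because Python's standard library has no inverse \(t\) function, the critical value 1.9889 from T.INV.2T(0.05,83) is passed in as an argument rather than computed.

```python
import math

def pooled_t_interval(xbar1, xbar2, s1, s2, n1, n2, t_crit):
    """Return (margin of error, LCL, UCL) for mu1 - mu2, assuming equal population
    variances. t_crit is supplied by the caller, e.g. from T.INV.2T(alpha, n1 + n2 - 2)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    moe = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))            # margin of error
    diff = xbar1 - xbar2
    return moe, diff - moe, diff + moe

# Exercise 12.2.7(b): 95% interval with df = 83, so t_crit = T.INV.2T(0.05, 83)
moe, lcl, ucl = pooled_t_interval(2462.3, 2257.0, 760.8, 730.2, 45, 40, t_crit=1.9889)
print(f"MOE = {moe:.1f}, LCL = {lcl:.1f}, UCL = {ucl:.1f}")
```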

Exercise 12.2.8.

A statistics student wants to verify the claim that in a certain Fortune 500 company the average salary of female employees is different from the average salary of male employees. Two samples of male and female workers were obtained independently and analyzed. The sample data is in the Excel file below. (Assume that the samples are from normal populations.)

(a)

Which of the samples has a larger mean?
Answer.
The sample of men’s salaries has a larger mean than the sample of women’s salaries. (The sample mean women’s salary is about \(\$ 57,200\text{,}\) and the sample mean men’s salary is about \(\$ 69,338\text{.}\))

(b)

Is there sufficient evidence to conclude that male employees in this company have higher salaries than female employees?
Answer.
Population 1: women
Population 2: men
(Let’s use \(\alpha=0.10\text{.}\))
\(H_0: \; \mu_1 - \mu_2 \geq 0\)
\(H_1: \; \mu_1 - \mu_2 \lt 0\)
No, there is not sufficient evidence to conclude that male employees in this company have higher salaries than female employees because we got a p-value of \(\approx 0.4182\text{,}\) which is bigger than \(\alpha\text{.}\)