QUAN 2010 Comparing Variances from Two Populations

Section 11.4 Comparing Variances from Two Populations

To compare the variances of two populations, we conduct a test of the ratio of the variances. If the ratio is equal to \(1\text{,}\) then the variances are equal; if not, then they are unequal.

The sample variance is a good estimate of the population variance. Not surprisingly, the ratio of the two sample variances, \(\frac{s_1^2}{s_2^2}\text{,}\) drawn from their respective populations is a good estimate of the ratio of the two population variances, \(\frac{\sigma_1^2}{\sigma_2^2}\text{.}\) If we have independent samples from two normal populations and the population variances are equal, the sampling distribution of \(\frac{s_1^2}{s_2^2}\) is \(F\)-distributed with \(D_1=n_1-1\) and \(D_2=n_2-1\) degrees of freedom. Since we are comparing two variances, the test statistic is:

\begin{equation*} F=\frac{s_1^2}{s_2^2}. \end{equation*}

This test statistic is easy to compute by hand; nevertheless, we can also use Excel to do the work if we have raw data (not summary statistics). After stating the two hypotheses, we go to the Data Analysis tool and choose “F-Test Two-Sample for Variances” to complete the test.
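For readers working outside Excel, the same statistic can be sketched in a few lines of Python. This is only an illustration: the function name `f_statistic` and the two small data sets are hypothetical and are not taken from any exercise in this section.

```python
from statistics import variance


def f_statistic(sample_1, sample_2):
    """Return (F, D1, D2) for the ratio of two sample variances."""
    s1_sq = variance(sample_1)  # sample variance (divides by n - 1)
    s2_sq = variance(sample_2)
    return s1_sq / s2_sq, len(sample_1) - 1, len(sample_2) - 1


# Hypothetical raw data, chosen so the arithmetic is easy to check by hand.
sample_1 = [2.0, 4.0, 6.0, 8.0]
sample_2 = [4.0, 5.0, 5.0, 6.0]

f_stat, d1, d2 = f_statistic(sample_1, sample_2)
print(f"F = {f_stat:.4f} with D1 = {d1} and D2 = {d2}")
```

Here `statistics.variance` plays the role of Excel's VAR.S, so the ratio it produces is the same \(F=\frac{s_1^2}{s_2^2}\) defined above.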

Note 11.4.1.

In general, if we are testing to see if one variance is larger than the other, we want to choose population 1 to be the one with the larger sample standard deviation so that we will be doing a right-tail test.
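
Relabeling the populations does not change the outcome of the test. This follows from a standard property of the \(F\) distribution, stated here without derivation: the reciprocal of an \(F\)-distributed ratio with \(D_1\) and \(D_2\) degrees of freedom is \(F\)-distributed with \(D_2\) and \(D_1\) degrees of freedom. So a left-tail probability for one labeling equals a right-tail probability for the swapped labeling:

\begin{equation*} P\left(F_{D_1,D_2}\leq f\right)=P\left(F_{D_2,D_1}\geq \frac{1}{f}\right), \qquad f \gt 0. \end{equation*}

For example, a left-tail test with \(\frac{s_1^2}{s_2^2}=0.5\) gives the same p-value as a right-tail test with the labels swapped and a ratio of \(2\text{,}\) provided the degrees of freedom are swapped as well.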

Exercise 11.4.2.

A company is doing a hypothesis test on the variation in quality from two suppliers. Both distributions are normal, and the populations are independent. Use \(\alpha=0.05\text{.}\) A sample of 33 products was selected from Supplier 1, and the standard deviation of quality was found to be 4.147. A sample of 17 products was selected from Supplier 2, and the standard deviation of quality was found to be 2.7311. Test to see if the variance in quality for Supplier 1 is larger than that for Supplier 2.

(a)

What are the correct hypotheses?

\(H_0: \sigma_1^2 \leq \sigma_2^2\) and \(H_1: \sigma_1^2 \gt \sigma_2^2\)
\(H_0: \sigma_1^2 \geq \sigma_2^2\) and \(H_1: \sigma_1^2 \lt \sigma_2^2\)
\(H_0: \sigma_1^2 = \sigma_2^2\) and \(H_1: \sigma_1^2 \neq \sigma_2^2\)

(b)

What test statistic should we use for this problem?

\(z\)
\(t\)
\(F\)
\(\chi^2\)

(c)

Find the test statistic.

Answer.

\begin{equation*} F=\frac{s_1^2}{s_2^2}=\frac{(4.147)^2}{(2.7311)^2}\approx 2.305647 \end{equation*}

(d)

What is the p-value?

Answer.

\begin{equation*} \text{p-value}\approx F.DIST.RT(2.305647,33-1,17-1)\approx 0.0396 \end{equation*}

(e)

The decision is to...

Reject the null hypothesis.
Fail to reject the null hypothesis.

(f)

The correct summary would be:

There is enough evidence to support the claim that the variance in quality for Supplier 1 is larger than Supplier 2.
There is not enough evidence to support the claim that the variance in quality for Supplier 1 is larger than Supplier 2.
There is enough evidence to reject the claim that the variance in quality for Supplier 1 is larger than Supplier 2.
There is not enough evidence to reject the claim that the variance in quality for Supplier 1 is larger than Supplier 2.

Exercise 11.4.3.

A statistics student wants to investigate salaries of men and women in a certain Fortune 500 company. Two samples of salaries of men and women employees were obtained independently and analyzed. The sample data is in the Excel file below. (Assume that the samples are from normal populations.)

external/sheets/CompanySalaries.xlsx

(a)

Which of the samples has a larger sample standard deviation?

Answer.

Use STDEV.S in Excel to find the sample standard deviations.

\begin{equation*} s_{\text{men}}=s_1\approx 15.0973 \end{equation*}

\begin{equation*} s_{\text{women}}=s_2\approx 13.9944 \end{equation*}

(b)

Use a \(10\%\) level of significance to decide whether there is sufficient evidence that in the Fortune 500 company, the standard deviation of the salary of men is greater than the standard deviation of the salary of women.

Answer.

\begin{equation*} H_0: \sigma_1^2 \leq \sigma_2^2 \end{equation*}

\begin{equation*} H_1: \sigma_1^2 \gt \sigma_2^2 \end{equation*}

Since we have the raw data from the samples, we can use “F-Test Two-Sample for Variances” in the Data Analysis tool in Excel.

external/sheets/CompanySalariesSolution.xlsx

From this, we’ll get a p-value of approximately 0.4182, which is bigger than \(\alpha=0.10\text{.}\) So we fail to reject the null hypothesis and do not have enough evidence to conclude that \(\sigma_1^2 \gt \sigma_2^2\text{.}\)

Exercises

1.

2.

A pharmaceutical company is about to launch a new manufacturing process in addition to the existing one. The quality control manager believes that the new method results in a different variation in the weights of the capsules. To verify the claim, the samples from each production line were obtained and the results are in the Excel file below (in mg):

external/sheets/1434306_data.xlsx

Use a \(5\%\) significance level to test the claim that the standard deviation of the capsule weights in production line 1 is smaller than the standard deviation of the capsule weights in production line 2. If normality plots are not provided, assume that the samples are from normal populations.

Solution.

In the question, look for keywords such as mean/average, proportion/percentage, or variance/standard deviation. In this case, we have "standard deviation of the capsule weights in the production line" and two populations, so we will perform a two-variance \(F\) hypothesis test.

Find the sample standard deviations for each production line:

Population 1:
\(STDEV.S(A2:E4)\approx 1.074\)
Population 2:
\(STDEV.S(A8:E11)\approx 1.909\)

Since \(1.909\gt 1.074\text{,}\) we’ll let \(s_1\approx 1.909\) and \(s_2\approx 1.074\text{,}\) and this will be a right-tailed test.

Hypotheses:

\begin{equation*} H_0: \;\;\sigma_1^2 \leq \sigma_2^2 \end{equation*}

\begin{equation*} H_1: \;\;\sigma_1^2 \gt \sigma_2^2 \end{equation*}

Here \(\sigma_1\) is the standard deviation of the capsule weights (in mg) in production line 2, and \(\sigma_2\) is the standard deviation of the capsule weights (in mg) in production line 1.

Test statistic:

\begin{equation*} F=\frac{s_1^2}{s_2^2}\approx \frac{1.909^2}{1.074^2}\approx 3.1574 \end{equation*}

Now find the p-value:

\begin{equation*} F.DIST.RT(3.1574,16-1,11-1)\approx 0.0359 \end{equation*}

Since the p-value is less than \(\alpha\text{,}\) we reject the null hypothesis and conclude that \(\sigma_1^2 \gt \sigma_2^2\text{,}\) so production line 2 has a larger standard deviation of capsule weights than production line 1.

(Alternatively, we can use the “F-Test Two-Sample for Variances” in the Data Analysis tool in Excel.)

3.

A nonprofit wants to verify the claim that in a certain Fortune 500 company the average salary of female employees is different from the average salary of male employees. Two samples of male and female workers were obtained independently and analyzed. The salaries of workers in the samples are shown in the Excel file below.

external/sheets/1434305_data.xlsx

The standard deviation for the first sample appears to be different from the standard deviation of the second sample. Use a \(10\%\) level of significance to decide whether there is sufficient evidence that in the Fortune 500 company, the standard deviation of the salary of males is the same as the standard deviation of the salary of females. Assume that the samples are from normal populations.

Answer.

In the question, look for keywords such as mean/average, proportion/percentage, or variance/standard deviation. In this case, we have "the standard deviation of the salary" and two populations, so we will perform a two-variance \(F\) hypothesis test.

Unfortunately, the Data Analysis tool in Excel only provides the results for a one-tail hypothesis test for comparing two population variances.

In the sample of females: \(s=STDEV.S(A2:E4)\approx 13.1744\)
In the sample of males: \(s=STDEV.S(A8:E10)\approx 10.7037\)

Let females be population 1 and males be population 2.

Hypotheses:

\(\displaystyle H_0:\;\;\sigma_1^2=\sigma_2^2\)
\(\displaystyle H_1:\;\;\sigma_1^2\neq\sigma_2^2\)

Test statistic:

\begin{equation*} F=\frac{s_1^2}{s_2^2}=\frac{13.1744^2}{10.7037^2}\approx 1.5149 \end{equation*}

Find the critical value, \(F_{\alpha/2}\text{:}\)

\(\alpha=.1\Rightarrow \alpha/2=.05\)

\(F_{\alpha/2}=F.INV.RT(.05, 14-1, 14-1)\approx 2.5769\)

Even though this is a two-tailed test, we only consider the rejection region on the right side of the distribution. This is because when comparing two population variances, we always choose population 1 to be the one with the larger sample standard deviation, which guarantees that the test statistic \(\frac{s_1^2}{s_2^2}\) is at least \(1\text{.}\)

Since the test statistic, \(1.5149\text{,}\) is less than the critical value, \(2.5769\text{,}\) it is not in the rejection region. We fail to reject the null hypothesis, and we do not have enough evidence to conclude that \(\sigma_1^2\neq\sigma_2^2\text{.}\)
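
The right-tail-only rule used in this answer can be checked roughly by simulation. The sketch below is not part of the Excel workflow. It assumes Python, draws both samples from the same standard normal population (so the null hypothesis is true), uses \(n=14\) for both samples as in this exercise, and always places the larger sample variance in the numerator. The function names, seed, and number of trials are illustrative choices.

```python
import random


def sample_variance(xs):
    """Sample variance with n - 1 in the denominator (like Excel's VAR.S)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def larger_over_smaller_rate(n, critical_value, trials=10_000, seed=2010):
    """Share of simulated ratios (larger variance on top) above critical_value."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        v_a = sample_variance([rng.gauss(0.0, 1.0) for _ in range(n)])
        v_b = sample_variance([rng.gauss(0.0, 1.0) for _ in range(n)])
        ratio = max(v_a, v_b) / min(v_a, v_b)
        if ratio > critical_value:
            exceed += 1
    return exceed / trials


# Critical value F_{alpha/2} with 13 and 13 degrees of freedom, from the answer above.
rate = larger_over_smaller_rate(n=14, critical_value=2.5769)
print(f"share of ratios beyond the critical value: {rate:.3f}")
```

If the rule is sound, the printed share should come out close to \(\alpha=0.10\text{:}\) the right-tail area is only \(\alpha/2\text{,}\) but putting the larger variance on top folds both tails into one. This sketch relies on the two sample sizes being equal, so that a single critical value applies whichever sample ends up in the numerator.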

4.

A group of economists wants to study the average annual leave among US and EU workers. Two samples of US and EU workers were obtained independently and analyzed. The sample of 17 US workers had an average annual leave of 20.84 days and a standard deviation of 4.378 days. The sample of 11 EU workers had an average annual leave of 25.92 days and a standard deviation of 3.655 days. The standard deviation for the first sample appears to be different from the standard deviation of the second sample. Use a \(10\%\) level of significance to decide whether there is sufficient evidence that the standard deviation of the annual leave of US workers is greater than the standard deviation of the annual leave of EU workers. If normality plots are not provided, assume that the samples are from normal populations.

Answer.

Population 1: US workers, Population 2: EU workers

\(s_1=4.378\text{,}\) \(s_2=3.655\)

\(\bar{x}_1=20.84\text{,}\) \(\bar{x}_2=25.92\)

\(n_1=17\text{,}\) \(n_2=11\)

\(\alpha=.10\)

\(H_0:\;\;\sigma_1^2\leq \sigma_2^2\)

\(H_1:\;\;\sigma_1^2\gt \sigma_2^2\)

Test statistic:

\begin{equation*} F=\frac{s_1^2}{s_2^2}\approx 1.43475 \end{equation*}

p-value:

\(F.DIST.RT(1.43475, 16,10)\approx 0.2855\gt\alpha\)

Since the p-value is greater than \(\alpha\text{,}\) we do not have enough evidence to reject the null hypothesis. So we do not have enough evidence to conclude that \(\sigma_1^2\gt \sigma_2^2\text{.}\)
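
When only summary statistics are available and Excel is not at hand, the right-tail probability can also be sketched in plain Python. The code below is one possible approach, not the method used in this text. It evaluates the \(F\) right tail through the regularized incomplete beta function, computed with a standard continued-fraction expansion. The helper names `_beta_cf`, `reg_inc_beta`, and `f_right_tail` are hypothetical; `f_right_tail` is meant to play the role of F.DIST.RT.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use whichever form of the continued fraction converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_right_tail(f, d1, d2):
    """P(F > f) for an F distribution with d1 and d2 degrees of freedom."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


# Summary statistics from this exercise (US = population 1, EU = population 2).
s1, n1 = 4.378, 17
s2, n2 = 3.655, 11
alpha = 0.10

f_stat = s1 ** 2 / s2 ** 2
p_value = f_right_tail(f_stat, n1 - 1, n2 - 1)
print(f"F = {f_stat:.5f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The identity used in `f_right_tail`, \(P(F \gt f)=I_{x}\left(\frac{D_2}{2},\frac{D_1}{2}\right)\) with \(x=\frac{D_2}{D_2+D_1 f}\text{,}\) is a standard relationship between the \(F\) and beta distributions. Where a statistics library is available, its built-in \(F\) distribution is the more practical choice.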

5.

6.

7.

Is the variance for the amount of money, in dollars, that shoppers spend on Saturdays at the mall the same as the variance for the amount of money that shoppers spend on Sundays at the mall? Suppose that the Excel file below shows the results of a study:

external/sheets/1434301_data.xlsx

Use a \(5\%\) significance level to test the claim that the standard deviation for the amount of money that shoppers spend on Saturday is the same as the standard deviation for the amount of money that shoppers spend on Sunday. If normality plots are not provided, assume that the samples are from normal populations.

Answer.

Procedure: In the question, look for keywords such as mean/average, proportion/percentage, or variance/standard deviation. In this case, we have "standard deviation for the amount of money that shoppers spend" and two populations, so we will perform a two-variance \(F\) hypothesis test.

Saturday: \(s=STDEV.S(A2:E3)\approx 13.9939 \)

Sunday: \(s=STDEV.S(A7:E10)\approx 13.7633 \)

So let Saturday be “population 1” and Sunday be “population 2”.

\(H_0: \frac{\sigma_1^2}{\sigma_2^2}=1\)

\(H_1: \frac{\sigma_1^2}{\sigma_2^2}\neq 1\)

Test statistic:

\begin{equation*} F=\frac{s_1^2}{s_2^2}=\frac{13.9939^2}{13.7633^2}\approx 1.03377991 \end{equation*}

Critical value:

\(\alpha=5\%\Rightarrow \alpha/2=.025\)

\(F_{.025}=F.INV.RT(.025, 10-1, 19-1)\approx 2.92911\)

Since the test statistic, about \(1.0338\text{,}\) is less than the critical value, about \(2.9291\text{,}\) it is not in the rejection region. We fail to reject the null hypothesis, and we do not have enough evidence to conclude that \(\frac{\sigma_1^2}{\sigma_2^2}\neq 1\text{.}\)
