
Section 9.4 Confidence Intervals for the Mean (\(\sigma\) unknown)

Up to this point we have assumed that the population standard deviation, \(\sigma\text{,}\) was known. This is unrealistic: if we need an interval to estimate the population mean, \(\mu\text{,}\) we likely don’t know the population standard deviation either. Hence we estimate \(\sigma\) with the sample standard deviation, \(s\text{.}\) But this introduces another source of uncertainty, especially in small samples. To keep the confidence level at the desired value, we make the intervals wider by replacing the critical values in our confidence interval formula, \(z_{\alpha/2}\text{,}\) with larger critical values, \(t_{\alpha/2}\text{.}\)

Subsection 9.4.1 The Student’s \(t\)-Distribution

The larger critical values come from the Student’s \(t\)-distribution, developed in 1908 by William S. Gosset, an employee of the Irish brewer Guinness who was researching new methods of manufacturing ale. He needed a distribution that could be used with small samples. Guinness employees were not allowed to publish research results, so he published under the pseudonym “Student”.
The key properties of the Student’s \(t\)-Distribution:
  1. It is symmetric around the mean (which is 0 just like the standard normal distribution) and mound-shaped (similar to bell-shaped).
  2. It is a family of curves indexed by the degrees of freedom (df), which refer to the number of values that are free to vary. As the degrees of freedom increase, the shape of the \(t\)-distribution approaches that of the standard normal distribution.
    When dealing with the sample mean, \(df=n-1\text{.}\)
  3. The area under the curve is equal to \(1\text{.}\)
  4. The \(t\)-distribution is flatter and wider than the standard normal distribution, with more area in its tails. The critical \(t\)-score is therefore larger than the critical \(z\)-score for the same confidence level, which results in wider confidence intervals when using the \(t\)-distribution.
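As an illustration of properties 2 and 4, compare a few rounded critical values for \(90\%\) confidence (an area of \(.05\) in each tail) with the corresponding standard normal value. The critical \(t\)-values are always larger, and they shrink toward \(z_{.05}\) as the degrees of freedom grow:
\begin{equation*} t_{.05}\approx 2.015\ (df=5),\quad 1.740\ (df=17),\quad 1.684\ (df=40),\quad 1.669\ (df=63),\qquad z_{.05}\approx 1.645. \end{equation*}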

Subsection 9.4.2 Confidence Interval When \(\sigma\) is Unknown

The formula for the confidence interval when \(\sigma\) is unknown is:
\begin{equation*} \overline{x}\pm t_{\alpha/2}\cdot\hat{\sigma}_{\overline{x}}, \end{equation*}
where
\begin{equation*} \hat{\sigma}_{\overline{x}}=\frac{s}{\sqrt{n}}. \end{equation*}
How do we find the critical values, \(t_{\alpha/2}\text{?}\)
  1. Table: Use this link https://www.brockport.edu/live/files/6866-studentstdistributiontablepdf (or Table 5 in Appendix A of the textbook). Locate the degrees of freedom along the left column and the confidence level across the top of the table; the desired critical value is where that row and column meet.
  2. Excel: Use the Excel formula
    \begin{equation*} \boxed{T.INV.2T(\alpha,df)}, \end{equation*}
    where \(\alpha=1-\text{confidence level}\) is the significance level and \(df=n-1\text{.}\)
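Outside of Excel, the same critical value and interval can be computed numerically. The Python sketch below is only an illustration and is not part of the textbook's method: the function names `t_pdf`, `t_cdf`, `t_inv_2t`, and `t_interval` are made up for this example, and the critical value is approximated with Simpson's rule and bisection using only the standard library. In practice a statistics library would normally do this step.

```python
import math

def t_pdf(x, df):
    """Density of the Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_inv_2t(alpha, df):
    """Two-tailed critical value t with P(|T| > t) = alpha (like T.INV.2T)."""
    if not 0 < alpha < 1 or df <= 0:
        raise ValueError("need 0 < alpha < 1 and df > 0")
    target = 1 - alpha / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:      # widen the bracket until it holds t
        hi *= 2
    for _ in range(60):                # bisection on the bracket [lo, hi]
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(xbar, s, n, confidence):
    """Confidence interval xbar +/- t * s / sqrt(n), with df = n - 1."""
    t_crit = t_inv_2t(1 - confidence, n - 1)
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Example inputs: xbar = 68, s = 13.9, n = 18, 90% confidence
print(round(t_inv_2t(0.10, 17), 3))
print(t_interval(68, 13.9, 18, 0.90))
```

Because the sketch does not round the critical value before multiplying, its limits can differ in the third decimal place from hand calculations that use a rounded \(t\)-value.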

Exercise 9.4.1.

(Donnelly 8.19)
Construct a \(90\%\) confidence interval to estimate the population mean when \(\overline{x}=68\) and \(s=13.9\) for the sample sizes below.
(a)
\(n=18\)
Answer.
\(df=17,\; t_{.05}=T.INV.2T(.10,17)\approx 1.74\)
\(\hat{\sigma}_{\overline{x}}=\frac{13.9}{\sqrt{18}}\approx 3.276\)
\(ME_{\overline{x}}\approx 1.74\cdot (3.276)\approx 5.7007\)
\(\overline{x}\pm ME_{\overline{x}}\rightarrow 68\pm 5.7007\)
We are \(90\%\) confident that
\begin{equation*} 62.299\lt \mu\lt 73.701 \end{equation*}
(b)
\(n=41\)
Answer.
\(df=40,\; t_{.05}=T.INV.2T(.10,40)\approx 1.684\)
\(\hat{\sigma}_{\overline{x}}=\frac{13.9}{\sqrt{41}}\approx 2.171\)
\(ME_{\overline{x}}\approx 1.684\cdot (2.171)\approx 3.656\)
\(\overline{x}\pm ME_{\overline{x}}\rightarrow 68\pm 3.656\)
We are \(90\%\) confident that
\begin{equation*} 64.344\lt \mu\lt 71.656 \end{equation*}
(c)
\(n=64\)
Answer.
\(df=63,\; t_{.05}=T.INV.2T(.10,63)\approx 1.669\)
\(\hat{\sigma}_{\overline{x}}=\frac{13.9}{\sqrt{64}}\approx 1.7375\)
\(ME_{\overline{x}}\approx 1.669\cdot (1.7375)\approx 2.8999\)
\(\overline{x}\pm ME_{\overline{x}}\rightarrow 68\pm 2.8999\)
We are \(90\%\) confident that
\begin{equation*} 65.1001\lt \mu\lt 70.8999 \end{equation*}

Exercise 9.4.2.

    Describe the effect of increasing the sample size on the width of the interval.
  • As the sample size increases, the width of the interval decreases.
  • As the sample size increases, the width of the interval increases.
  • As the sample size increases, the width of the interval might increase or decrease.
  • As the sample size increases, the width of the interval stays the same.

Exercise 9.4.3.

(Donnelly 8.23)
A cruise company would like to estimate the average beer consumption to plan its beer inventory levels on future seven-day cruises. (The ship certainly doesn’t want to run out of beer in the middle of the ocean!) The average beer consumption over 18 randomly selected seven-day cruises was \(81,977\) bottles with a sample standard deviation of \(4,502\) bottles.
(a)
Construct a \(90\%\) confidence interval to estimate the average beer consumption per cruise.
Answer.
\(n=18\rightarrow df=17,\;\; \overline{x}=81977,\;\; s=4502\)
\(\hat{\sigma}_{\overline{x}}=\frac{4502}{\sqrt{18}}\approx 1061,\;\; t_{.05}=T.INV(.95,17)\approx 1.74\)
\(ME_{\overline{x}}\approx 1.74\cdot (1061)\approx 1846.14\)
\(\overline{x}\pm ME_{\overline{x}}\rightarrow 81977\pm 1846.14\)
  • LCL: \(80131\)
  • UCL: \(83823\)
We are \(90\%\) confident that the average number of bottles consumed is between 80131 and 83823.
(b)
What assumptions need to be made to construct this interval?
Answer.
Since \(n\lt 30\text{,}\) we must assume that the population is normally distributed.

Exercise 9.4.4.

(Donnelly 8.27)
According to a travel website, workers in a certain country lead the world in vacation days, averaging 41 days per year. The data in the Excel file below shows the number of paid vacation days for a random sample of 20 workers from this country.
(a)
Construct a \(95\%\) confidence interval to estimate the average number of paid vacation days for workers from this country.
Answer.
  • LCL: \(32.712\)
  • UCL: \(47.588\)
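The answer above lists only the limits. A minimal Python sketch of the computation behind such limits, assuming the sample has been read into a list, might look as follows. The function name `t_interval_from_data` is hypothetical, and the critical value is supplied by the caller (for \(n=20\) and \(95\%\) confidence, the table gives \(t_{.025}\approx 2.093\) with \(df=19\)). The Excel file's values are not reproduced here, so the sample in the usage lines is a placeholder and not the exercise's data.

```python
import math
import statistics

def t_interval_from_data(sample, t_crit):
    """Return (LCL, UCL) for the mean from raw data and a critical t-value."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)        # sample standard deviation (n - 1 divisor)
    margin = t_crit * s / math.sqrt(n)  # margin of error: t * s / sqrt(n)
    return xbar - margin, xbar + margin

# Placeholder sample (NOT the exercise's data): n = 8, so df = 7 and t_.025 is about 2.365
placeholder = [2, 4, 4, 4, 5, 5, 7, 9]
lcl, ucl = t_interval_from_data(placeholder, 2.365)
print(round(lcl, 3), round(ucl, 3))
```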
(b)
Do the results from this sample validate the travel website’s findings?
Answer.
Since \(41\) falls in this interval, the sample is consistent with the website’s claim, so the findings are validated.
(c)
What assumptions need to be made about this population?
Answer.
Since \(n\lt 30\text{,}\) we must assume that the population is normally distributed.