
Section 13.1 Correlation Analysis

Recall that a scatterplot is a graph used to explore the relationship between two variables. The two variables are classified as the independent variable and the dependent variable.
Figure 13.1.1. Scatter Plot (Made in GeoGebra by Mark Beckwith)

Definition 13.1.2.

  • An independent variable explains the variation in the dependent variable
  • A dependent variable is explained by one or more independent variables
Correlation analysis measures the strength and direction of the linear relationship between the independent and dependent variables. This is done by computing the sample correlation coefficient, \(r\text{.}\)
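As a minimal sketch of what this computation involves, the Python snippet below calculates \(r\) as the sum of cross-deviations divided by the square root of the product of the two sums of squared deviations. The function name `sample_correlation` and the small data set are made up for illustration and do not come from the text.

```python
import math

def sample_correlation(x, y):
    """Return the sample correlation coefficient r for paired lists x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and the two sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data (not from the text)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)
print(round(r, 4))  # 0.7746
```

For the same pairs of values, this should agree with the result of Excel's CORREL function up to rounding.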

Exercise 13.1.3.

(a)

    The range of the correlation coefficient is ...
  • -1 to 1
  • 0 to 1
  • -1 to 0
  • 0 to 100

(b)

    A positive value of the correlation coefficient indicates that as \(x\) increases, ...
  • \(y\) also increases.
  • \(y\) decreases.
  • \(y\) stays the same.

(c)

    A negative value of the correlation coefficient indicates that as \(x\) increases, ...
  • \(y\) increases.
  • \(y\) decreases.
  • \(y\) stays the same.
Recall that we used the Excel function CORREL to calculate this number in a previous chapter. Now we build on that skill by learning how to use a hypothesis test to assess whether the linear relationship described by \(r\) is statistically significant.
The population correlation coefficient, \(\rho\text{,}\) refers to the correlation between all values of two variables in a population. A value of \(\rho=0\) means that there is no linear relationship between \(x\) and \(y\text{.}\) We don’t know the value of \(\rho\text{,}\) so we use the sample correlation coefficient to test whether the sample provides enough evidence to conclude that there is a linear relationship between the variables in the population. For a right-tailed test, which looks for a positive linear relationship, the two hypotheses are:
\begin{gather*} H_0 : \rho\leq 0 \\ H_1 : \rho>0 \end{gather*}
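To test instead whether \(\rho\) differs from zero in either direction, use \(H_0 : \rho = 0\) and \(H_1 : \rho \neq 0\text{.}\) In both cases, the test statistic follows the Student’s \(t\)-distribution and is computed with the formula: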
\begin{equation*} t = \frac{r}{\sqrt{\frac{1-r^2}{n-2}}} \end{equation*}
Here \(n\) is the number of paired observations in the sample, and the degrees of freedom are \(df=n-2 \text{.}\)
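The test statistic and its degrees of freedom can be computed directly from \(r\) and \(n\text{.}\) The sketch below assumes hypothetical values \(r=0.5\) and \(n=27\text{,}\) chosen only for illustration. The function name `correlation_t_statistic` is also made up.

```python
import math

def correlation_t_statistic(r, n):
    """Return (t, df) for testing a correlation, given sample r and sample size n."""
    if n <= 2:
        raise ValueError("n must be greater than 2 so that df = n - 2 is positive")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    df = n - 2
    # t = r / sqrt((1 - r^2) / (n - 2))
    t = r / math.sqrt((1 - r ** 2) / df)
    return t, df

# Hypothetical values (not from the text)
t, df = correlation_t_statistic(0.5, 27)
print(round(t, 4), df)  # 2.8868 25
```

The resulting \(t\) would then be compared with the critical value of the \(t\)-distribution with \(df\) degrees of freedom at the chosen \(\alpha\text{,}\) or converted to a \(p\)-value.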

Exercise 13.1.4.

The housing market in the United States is generally affected by economic conditions and interest rates, but also by the time of year. More people usually buy and sell homes in the spring and summer months. We want to see if there is a strong correlation between new home listings and temperature. The Excel file below includes the number of new housing listings in Colorado, as well as temperature and precipitation at Denver International Airport (DEN). external/sheets/Colorado Housing Inventory and Weather Data.xlsx

(a)

Determine the sample correlation coefficient between the number of new listings and the precipitation.
Answer.
\(r\approx 0.2867\)

(b)

Determine the sample correlation coefficient between the number of new listings and the maximum temperature.
Answer.
\(r\approx 0.6864\)

(c)

Using \(\alpha=0.02\text{,}\) test if the population correlation coefficient between the number of new listings and the maximum temperature is different from zero. What conclusions can you draw?
Answer.
We reject the null hypothesis and can conclude that the population correlation coefficient between the number of new listings and the maximum temperature is different from zero, and there is a linear relationship between these two variables.