Recall that a scatterplot is a graph used to explore the relationship between two variables. These two variables can be further classified as the independent variable, \(x\text{,}\) and the dependent variable, \(y\text{.}\)
Correlation analysis measures the strength and direction of the linear relationship between these two variables. This is done by computing the sample correlation coefficient, \(r\text{.}\)
Recall that we used the Excel function CORREL to calculate this number in a previous chapter. Now we build on that skill by learning how to use a hypothesis test to determine whether the linear relationship described by \(r\) is statistically significant.
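For readers working outside Excel, the value CORREL returns (the Pearson sample correlation coefficient \(r\)) can be computed directly from its definition. The following is a minimal Python sketch; the function name `sample_correlation` is hypothetical, and it assumes the two variables are stored as equal-length lists of numbers.

```python
import math

def sample_correlation(x, y):
    """Pearson sample correlation coefficient r (the quantity Excel's CORREL returns)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

print(sample_correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive line)
```

The sign of \(r\) gives the direction of the relationship and its magnitude (between 0 and 1) gives the strength.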
The population correlation coefficient, \(\rho\text{,}\) is the correlation between all values of the two variables in the population. A value of \(\rho=0\) means that there is no linear relationship between \(x\) and \(y\text{.}\) We don't know the value of \(\rho\text{,}\) so we use the sample correlation coefficient to test whether the sample provides enough evidence to conclude that there is a linear relationship between the variables in the population. The two hypotheses for this test are:
\(H_0: \rho = 0\) (there is no linear relationship between \(x\) and \(y\))
\(H_a: \rho \neq 0\) (there is a linear relationship between \(x\) and \(y\))
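For reference, the standard statistic for this test, assuming the \((x, y)\) pairs are a random sample and the usual conditions for inference in linear regression hold, converts \(r\) into a \(t\) value with \(n-2\) degrees of freedom, where \(n\) is the number of pairs:
\[
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad df = n-2
\]
A large value of \(|t|\text{,}\) or equivalently a p-value at or below \(\alpha\text{,}\) is evidence against \(\rho=0\text{.}\)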
The housing market in the United States is generally affected by economic conditions and interest rates, but also by the time of year. More people usually buy and sell homes in the spring and summer months. We want to see whether there is a significant linear correlation between new home listings and temperature. The Excel file below contains the number of new housing listings in Colorado, along with the temperature and precipitation at Denver International Airport (DEN).
external/sheets/Colorado Housing Inventory and Weather Data.xlsx
Using \(\alpha=0.02\text{,}\) test if the population correlation coefficient between the number of new listings and the maximum temperature is different from zero. What conclusions can you draw?
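As a cross-check outside Excel, the whole two-sided test can be sketched in Python. This is a minimal sketch, assuming the new-listings and maximum-temperature columns have already been read into two equal-length lists (reading the .xlsx file is not shown). To stay within the standard library, the p-value is obtained by numerically integrating the Student \(t\) density with Simpson's rule; this is a swapped-in approach for illustration, and a statistics package would normally be used instead. The function names are hypothetical, and the decision rule assumed is "reject \(\rho=0\) when the p-value is at most \(\alpha\)."

```python
import math

def _t_pdf(t, df):
    # Student's t density with df degrees of freedom; lgamma avoids overflow for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_p_value(t, df, steps=2000):
    # P(|T| >= |t|) = 1 - 2 * (area under the density from 0 to |t|),
    # with the area approximated by composite Simpson's rule (steps must be even)
    a = abs(t)
    h = a / steps
    total = _t_pdf(0.0, df) + _t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * _t_pdf(i * h, df)
    area = total * h / 3
    return min(1.0, max(0.0, 1.0 - 2.0 * area))

def correlation_t_test(x, y, alpha=0.02):
    """Two-sided test of rho = 0 using t = r*sqrt(n-2)/sqrt(1-r^2), df = n-2.

    Returns (r, t, p_value, reject), where reject is True when p_value <= alpha.
    Constant data (zero variance) is not handled and raises ZeroDivisionError.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y need the same length, with at least 3 pairs")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2
    if abs(r) >= 1.0:
        # Perfect linear fit: t is unbounded and the p-value is 0
        t, p = math.copysign(math.inf, r), 0.0
    else:
        t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
        p = two_sided_p_value(t, df)
    return r, t, p, p <= alpha
```

Calling `correlation_t_test(listings, max_temps, alpha=0.02)` on the two columns would return \(r\text{,}\) \(t\text{,}\) the p-value, and whether to reject \(\rho=0\) at \(\alpha=0.02\text{.}\)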
We reject the null hypothesis and can conclude that the population correlation coefficient between the number of new listings and the maximum temperature is different from zero, and there is a linear relationship between these two variables.