Now we are going to learn techniques to create models that fit our data. Regression analysis is the modeling procedure that we will study. We will discuss how to perform the calculations in the formulas involved in creating and assessing regression models. However, in most cases we will rely on the βRegressionβ tool in the Data Analysis tab of Excel.
\begin{align*}
\hat{y} \amp= \text{the predicted value of $y$ given all the $x$'s in the model} \\
x_1,x_2,\cdots, x_k \amp= \text{ the independent variables in the model}\\
k \amp= \text{ the number of independent variables in the model} \\
b_0 \amp= \text{ the $y$-intercept of the regression line} \\
b_k \amp= \text{ the average change in } \hat{y} \text{ due to a one-unit change in } x_k \text{ with all }\\
\amp \text{ other $x$'s constant}
\end{align*}
As a measure of productivity, Verizon Wireless records the number of customers each of its retail employees activates weekly. An activation is defined as either a new customer signing a cell phone contract or an existing customer renewing a contract. The data table found in this lessonβs Excel file shows the number of weekly activations for eight randomly selected employees along with their job-satisfaction levels rated on a scale of \(1-10\) (\(10=\) Most satisfied). external/sheets/VerizonExample.xlsx
Using \(\alpha=0.10\text{,}\) test to determine if the population correlation coefficient is not equal to zero. What conclusions can be made based on these results?
A hospital would like to develop a regression model to predict the total hospital bill for a patient based on the age of the patient (\(x_1\)), the patientβs length of stay (\(x_2\)), and the number of days in the hospitalβs intensive care unit (ICU) (\(x_3\)). Data for these variables can be found in the table in the Excel file below. external/sheets/HospitalExample.xlsx
For each additional year of age, the hospital bill increases by $113.56. For each additional day spent in the hospital, the hospital bill increases by $1218.63. For each additional day spent in the ICU, the hospital bill increases by $2213.21.
The model predicts that the hospital bill for a 76 year old, in the hospital for 5 days, with 3 days spent in the ICU to be y-hat = -462.5 + 113.555*76 + 1218.626*5 + 2213.213*3 = $20,900.45
The regression line will not pass through each of the data points. Hence, there is error between the true value of \(y\) from the data and the value, \(\hat{y}\text{,}\) predicted by the regression line. This difference is called the residual, \(e_i\text{.}\)
The mathematical procedure that is used to find the regression line is the least squares method. The least squares method aims to minimize the total squared error between the values of \(y\) and \(\hat{y}\text{.}\) This sum is also called the sum of squares error (SSE), and is definted by the formula
The ratio of these two numbers, \(R^2=\frac{SSR}{SST}\text{,}\) is called the coefficient of determination. It measures the percentage of the total variation of the dependent variable that is explained by the independent variable(s) in the model.
As a measure of productivity, Verizon Wireless records the number of customers each of its retail employees activates weekly. An activation is defined as either a new customer signing a cell phone contract or an existing customer renewing a contract. The data table found in this lessonβs Excel file shows the number of weekly activations for eight randomly selected employees along with their job-satisfaction levels rated on a scale of \(1-10\) (\(10=\) Most satisfied). external/sheets/VerizonExample.xlsx