Forrest Young's Notes
Copyright © 1997-9 by Forrest W. Young.
We return to the GPA and Verbal SAT variables. The scatterplot for these data is shown below:
Simple Regression Example:
1999 GPA and Verbal SAT
Note: We are using Verbal SAT divided by 100 to clarify the discussion of the slope (so that we can see a change of one unit on the plot). This change in the variable (dividing by a constant) does not change the relationship between the two variables, and does not change either the correlation or regression analysis.
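The claim in the note above can be checked numerically. The sketch below is not part of the ViSta analysis: it uses a small made-up data set (the names `verbal_sat` and `gpa` and all the values are assumptions for illustration only) to show that dividing the predictor by 100 leaves the correlation and the intercept unchanged and multiplies the slope by 100.

```python
import math

# Hypothetical (made-up) Verbal SAT scores and GPAs, for illustration only.
# These are NOT the 1999 class data discussed in these notes.
verbal_sat = [420, 480, 510, 550, 590, 620, 660, 700]
gpa = [2.4, 2.9, 2.6, 3.1, 2.8, 3.4, 3.2, 3.6]


def mean(values):
    return sum(values) / len(values)


def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# The same rescaling the notes apply: Verbal SAT divided by 100.
verbal_sat_100 = [v / 100 for v in verbal_sat]

r_raw = correlation(verbal_sat, gpa)
r_scaled = correlation(verbal_sat_100, gpa)
a_raw, b_raw = least_squares(verbal_sat, gpa)
a_scaled, b_scaled = least_squares(verbal_sat_100, gpa)

print(f"correlation: raw {r_raw:.4f}, scaled {r_scaled:.4f}")
print(f"intercept:   raw {a_raw:.4f}, scaled {a_scaled:.4f}")
print(f"slope:       raw {b_raw:.6f}, scaled {b_scaled:.4f}")
```

Only the units of the slope change: it becomes "GPA points per 100 SAT points" instead of "GPA points per SAT point".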
ViSta Regres: Before doing the regression analysis, we divided the SAT scores by 100 so that we can understand the results better (see below). We do this by clicking on the data object, and then typing in the listener:
:variables '("GPA" "MSAT/100" "VSAT/100")
:data (combine (bind-columns gpa
(/ mathsat 100)
(/ verbsat 100))))
Now we do the regression analysis using ViSta's Regression Analysis module, which can be done by clicking on the Regres button on the workmap.
Report: We then ask for the regression report. It is shown below.
The regression analysis report has three major sections, each containing important information about the analysis:
- Parameter Estimates: The parameter estimates section of the report presents information about the slope and intercept.
Under the "Estimate" column the report presents the values for the intercept and slope of the line that regression analysis estimates to be the best fit to the points.
The intercept and slope are often called the "coefficients", because they are the coefficients of the regression line. They are called "estimates" (short for "estimated coefficients") because they are estimates of what the coefficients are in the population.
- Intercept: The report calls the intercept the "Constant". Regression analysis estimates it to be a=1.41. This means that if we had someone with a Verbal SAT of zero, we would estimate that person's GPA to be 1.41.
Notice that this value doesn't make much sense! In fact, the Intercept is usually not interpreted, especially if a value of zero for the predictor variable can't really be obtained in practice.
- Slope: The report presents the slope for the "VerbSAT/100" variable. Regression analysis estimates it to be b=0.28. (It would have been .0028 if we hadn't divided by 100, which would appear as 0.00 on the report: not very useful!) This means that for every one-point change in VerbSAT/100 (which corresponds to a 100-point change in Verbal SAT) we expect a change in GPA of 0.28.
Thus, for a person whose SAT is 100 points higher than another person's, we would predict that the first person's GPA would be .28 points higher than the second person's. This makes good sense, and is an important part of the results of regression analysis.
- Regression Line: The equation for the regression line is:
GPA = 1.41 + 0.28(VerbSAT/100)
This regression line appears on the regression plot that is part of the visualization. It is shown here. Notice that it goes up .28 GPA units for each VerbSAT/100 unit that we move to the right.
- Std. Error: The report has a column labeled Standard Error. This column presents the standard errors of the estimated coefficients. These measure the stability of the estimates, that is, how much each estimate would be expected to vary from sample to sample.
Note: This is not what the book calls the "Standard Error of Estimate". That value is presented below by the name "Sigma Hat (RMS Error)".
- t-ratio, P-Value: These provide a significance test for each of the estimated coefficients. The test is of the null hypothesis that the tested coefficient (intercept or slope) is zero.
Note: The test of whether the slope is zero gets at the nature of the relationship between the two variables. This is important because it answers the question: Does one variable change when the other does? (A test of whether the intercept is zero rarely has a meaningful interpretation, so it is usually ignored.)
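The quantities in the Parameter Estimates section can be sketched in a few lines of code. This is a minimal illustration using the standard textbook formulas for simple regression, not ViSta's own implementation. The data are made up (so the numbers will not match a=1.41 and b=0.28), and the names `fit_simple_regression` and `predict` are hypothetical helpers introduced only for this example. P-values are omitted because they require the t distribution.

```python
import math

# Hypothetical (made-up) data, for illustration only: Verbal SAT / 100 and GPA.
verb_sat_100 = [4.2, 4.8, 5.1, 5.5, 5.9, 6.2, 6.6, 7.0]
gpa = [2.4, 2.9, 2.6, 3.1, 2.8, 3.4, 3.2, 3.6]


def fit_simple_regression(x, y):
    """Least-squares fit of y = a + b*x, with standard errors and t-ratios."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual standard deviation, using n - 2 degrees of freedom.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma_hat = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

    # Standard errors of the two estimated coefficients.
    se_slope = sigma_hat / math.sqrt(sxx)
    se_intercept = sigma_hat * math.sqrt(1 / n + mean_x ** 2 / sxx)

    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": se_intercept,
        "se_slope": se_slope,
        # t-ratio: estimate divided by its standard error (null value is zero).
        "t_intercept": intercept / se_intercept,
        "t_slope": slope / se_slope,
        "df": n - 2,
    }


def predict(fit, x_value):
    """Predicted GPA on the regression line for a given VerbSAT/100 value."""
    return fit["intercept"] + fit["slope"] * x_value


fit = fit_simple_regression(verb_sat_100, gpa)
for name in ("intercept", "slope"):
    print(f"{name:9s} estimate {fit[name]:.3f}  "
          f"std. error {fit['se_' + name]:.3f}  t-ratio {fit['t_' + name]:.2f}")

# Two people whose Verbal SAT scores differ by 100 points (one VerbSAT/100 unit).
gap = predict(fit, 6.0) - predict(fit, 5.0)
print(f"predicted GPA gap for a 100-point SAT difference: {gap:.3f}")
```

The predicted gap between two people one VerbSAT/100 unit apart equals the slope, which is the interpretation given for b above.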
Summary of Fit
- R Squared: This is the square of the correlation between the two variables. It is the coefficient of determination, which measures the proportion of variance shared between the variables.
- Sigma Hat (RMS Error): This is what the book calls the "Standard Error of Estimate". It specifies the Root-Mean-Squared (RMS) average distance, or standard distance, between the points and the line, measured vertically.
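Both Summary of Fit values can be computed directly from the data and the fitted line. The sketch below again uses made-up data (all values are assumptions for illustration). It assumes the conventional n - 2 divisor for Sigma Hat; whether ViSta uses exactly this divisor is not stated in these notes.

```python
import math

# Hypothetical (made-up) data, for illustration only: Verbal SAT / 100 and GPA.
verb_sat_100 = [4.2, 4.8, 5.1, 5.5, 5.9, 6.2, 6.6, 7.0]
gpa = [2.4, 2.9, 2.6, 3.1, 2.8, 3.4, 3.2, 3.6]

n = len(gpa)
mean_x = sum(verb_sat_100) / n
mean_y = sum(gpa) / n
sxx = sum((x - mean_x) ** 2 for x in verb_sat_100)
syy = sum((y - mean_y) ** 2 for y in gpa)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(verb_sat_100, gpa))

# Least-squares line.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Vertical distances between the points and the line, squared and summed.
ss_residual = sum((y - (intercept + slope * x)) ** 2
                  for x, y in zip(verb_sat_100, gpa))

# R squared two ways: squared correlation, and 1 - (unexplained / total).
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2
r_squared_from_fit = 1 - ss_residual / syy

# Sigma Hat: root-mean-square residual, assuming an n - 2 divisor.
sigma_hat = math.sqrt(ss_residual / (n - 2))

print(f"R squared (r^2):          {r_squared:.4f}")
print(f"R squared (from the fit): {r_squared_from_fit:.4f}")
print(f"Sigma Hat (RMS error):    {sigma_hat:.4f}")
```

In simple regression the two ways of computing R squared agree, which is why the report can describe it as the squared correlation.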
Analysis of Variance: An analysis of variance is reported that tells us whether the entire regression model fits the response variable significantly. The entire model includes both the slope and intercept simultaneously. The null hypothesis is that there is no relation between the two variables. The F-Ratio and P-Value summarize this test's results. The R-Squared tells us the proportion of variance in GPA that is accounted for by Verbal SAT.
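The analysis of variance splits the total variation in GPA into a part explained by the regression line and a residual part. The following sketch uses the same made-up data as an illustration (all values are assumptions, not the class data). It also shows that, with a single predictor, the F-Ratio equals the square of the slope's t-ratio. The P-Value is omitted because it requires the F distribution.

```python
import math

# Hypothetical (made-up) data, for illustration only: Verbal SAT / 100 and GPA.
verb_sat_100 = [4.2, 4.8, 5.1, 5.5, 5.9, 6.2, 6.6, 7.0]
gpa = [2.4, 2.9, 2.6, 3.1, 2.8, 3.4, 3.2, 3.6]

n = len(gpa)
mean_x = sum(verb_sat_100) / n
mean_y = sum(gpa) / n
sxx = sum((x - mean_x) ** 2 for x in verb_sat_100)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(verb_sat_100, gpa))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * x for x in verb_sat_100]

# Sums of squares: total = regression + residual.
ss_total = sum((y - mean_y) ** 2 for y in gpa)
ss_regression = sum((f - mean_y) ** 2 for f in fitted)
ss_residual = sum((y - f) ** 2 for y, f in zip(gpa, fitted))

# Mean squares: 1 degree of freedom for the slope, n - 2 for the residual.
ms_regression = ss_regression / 1
ms_residual = ss_residual / (n - 2)
f_ratio = ms_regression / ms_residual

# Proportion of variance in GPA accounted for by the predictor.
r_squared = ss_regression / ss_total

# t-ratio for the slope, for comparison with the F-Ratio.
t_slope = slope / (math.sqrt(ms_residual) / math.sqrt(sxx))

print(f"SS regression {ss_regression:.4f}, SS residual {ss_residual:.4f}, "
      f"SS total {ss_total:.4f}")
print(f"F-Ratio {f_ratio:.3f} on 1 and {n - 2} degrees of freedom")
print(f"t-ratio for slope squared: {t_slope ** 2:.3f}")
print(f"R squared {r_squared:.4f}")
```

Because F equals t squared here, the overall test and the test of the slope lead to the same conclusion in simple regression.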