¶ … Building and Assumptions
Use the Best Subsets approach to refine the predictive models constructed using multiple linear regression
Employ techniques (including residual analysis) to test the assumptions of predictive models obtained through multiple linear regression
The core of predictive modeling is the search for useful predictors. A prediction problem is defined by the size of the data set (the number of cases or observations) and by its width (the number of potential predictors available to address the problem). A common difficulty is that many of the potential predictors have only a weak association with the response. Computer modeling makes it practical to fit a large number of models to subsets of the data and to test them on additional subsets, and each test gives evidence about the strength of individual predictors. The focus of predictive modeling is therefore the search for good subsets of explanatory variables (predictors). Models that fit the data well are desirable, and models that fit poorly are not; other things being equal, simple models are preferred over complex ones. The process is to generate a list of useful explanatory variables, fit many models to the available data, and then judge each model on both its simplicity and its fit to the data.
B. Observations
When judging the best subset in a linear regression, the following criteria may be used:
The model with the largest R-squared
The model with the largest adjusted R-squared
The model with the smallest MSE (or S = square root of MSE)
The R-squared criterion and the MSE criterion were used to select the best subset in this activity. [Note: Mallows' Cp statistic was not used for these observations.]
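The criteria listed above can be sketched in code. The following is a minimal illustration, assuming NumPy is available and using synthetic data rather than the data set from this activity; the helper names `fit_stats` and `best_subsets` are hypothetical, and the variable labels are borrowed from this paper only for readability.

```python
from itertools import combinations

import numpy as np


def fit_stats(X, y):
    """Fit ordinary least squares with an intercept; return (R2, adjusted R2, MSE)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])           # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)                     # error sum of squares
    ssto = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    mse = sse / (n - p - 1)
    r2 = 1.0 - sse / ssto
    adj_r2 = 1.0 - mse / (ssto / (n - 1))
    return r2, adj_r2, mse


def best_subsets(X, y, names):
    """For each subset size, keep the subset with the largest R2."""
    best = {}
    for k in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), k):
            r2, adj_r2, mse = fit_stats(X[:, list(cols)], y)
            if k not in best or r2 > best[k]["r2"]:
                best[k] = {
                    "vars": [names[c] for c in cols],
                    "r2": r2,
                    "adj_r2": adj_r2,
                    "mse": mse,
                }
    return best


# Synthetic data (an assumption for illustration, not the activity's data).
rng = np.random.default_rng(0)
names = ["CU", "IN", "TE", "RT", "PT"]
X = rng.normal(size=(60, 5))
y = 2.0 * X[:, 0] + 3.0 * X[:, 4] + rng.normal(size=60)

best = best_subsets(X, y, names)
for k in sorted(best):
    row = best[k]
    print(k, row["vars"], round(row["r2"], 3), round(row["adj_r2"], 3), round(row["mse"], 3))
```

Within a fixed subset size, the largest R-squared, the largest adjusted R-squared, and the smallest MSE all pick the same subset, so the criteria only disagree when models of different sizes are compared.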
Step 1. The variable entered in this step is 5 Pre-Test (PT). The R-Square value is 0.462, and the MSE is 7,440,136.68.
Step 2. The variables entered in this step are 1 Curriculum (CU) and 5 Pre-Test (PT). The R-Square value is 0.803, and the MSE is 6,469,609.65.
Step 3. The variables entered in this step are 1 Curriculum (CU), 4 Readiness Test (RT), and 5 Pre-Test (PT). The R-Square value is 0.863, and the MSE is 4,637,852.14.
Step 4. The variables entered in this step are 1 Curriculum (CU), 2 Household Income (IN), 4 Readiness Test (RT), and 5 Pre-Test (PT). The R-Square value is 0.884, and the MSE is 3,563,385.51.
Step 5. The variables entered in this step are 1 Curriculum (CU), 2 Household Income (IN), 3 Teacher Experience (TE), 4 Readiness Test (RT), and 5 Pre-Test (PT). The R-Square value is 0.884, and the MSE is 2,850,735.33.
Residuals are shown in the ANOVA as 8,678,396, 57.
Conducting a Stepwise Forward Regression yielded similar results.
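For comparison, forward selection can be sketched as a greedy loop that adds one predictor at a time. This is a simplified illustration under stated assumptions: statistical packages usually decide entry with an F-to-enter or p-value threshold, whereas the hypothetical `forward_selection` below uses a minimum gain in R-squared (`min_gain`, an assumed parameter), and the data are synthetic rather than the data from this activity.

```python
import numpy as np


def r_squared(X, y):
    """R2 of an ordinary least squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())


def forward_selection(X, y, names, min_gain=0.01):
    """Greedily add the predictor that raises R2 the most; stop when the gain is small."""
    selected, path, current = [], [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # R2 of the current model plus each candidate predictor in turn
        scores = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - current < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        current = scores[j_best]
        path.append(([names[j] for j in selected], current))
    return path


# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
names = ["CU", "IN", "TE", "RT", "PT"]
X = rng.normal(size=(60, 5))
y = 2.0 * X[:, 0] + 3.0 * X[:, 4] + rng.normal(size=60)

path = forward_selection(X, y, names)
for variables, r2 in path:
    print(variables, round(r2, 3))
```

Unlike best subsets, forward selection never removes a variable once it has entered, so the two procedures can disagree when predictors are strongly correlated.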
C. Findings
Given that the formula for the R2-value is:
R2 = SSR/SSTO = 1 - SSE/SSTO
Because SSE can only decrease or stay the same when a predictor is added, while SSTO is fixed, the R2-value never decreases as more variables are added. Choosing the model with the largest R2-value would therefore always favor the largest model, so that rule alone cannot identify the "best" model. Instead, the R2-values are used to find the point at which adding more predictors is no longer worthwhile because the additional predictors yield only tiny increases in the R2-value. Essentially, then, both the number of predictors and the size of the increase in R2 are considered.
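The behavior of R2 in nested models can be checked numerically. The sketch below assumes NumPy and synthetic data: one real predictor is followed by three columns of pure noise, and the two forms of the R2 formula are compared at each step. The helper name `sums_of_squares` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)                    # the only predictor related to y
noise_cols = rng.normal(size=(n, 3))      # predictors unrelated to y
y = 2.0 * x + rng.normal(size=n)


def sums_of_squares(X, y):
    """Return (SSR, SSE, SSTO) for an ordinary least squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    ssr = float(((fitted - y.mean()) ** 2).sum())
    sse = float(((y - fitted) ** 2).sum())
    ssto = float(((y - y.mean()) ** 2).sum())
    return ssr, sse, ssto


r2_path = []
identity_gaps = []
for k in range(4):
    # model with x plus the first k noise predictors
    X = np.column_stack([x, noise_cols[:, :k]])
    ssr, sse, ssto = sums_of_squares(X, y)
    identity_gaps.append(abs(ssr / ssto - (1.0 - sse / ssto)))
    r2_path.append(1.0 - sse / ssto)

print([round(r, 4) for r in r2_path])
```

Even though the added columns carry no information about y, the printed R2 values do not go down, which is why the size of each increase matters more than the raw R2.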
Using the R2-value criterion, the R2-value rises from 0.462 for the best one-predictor model to 0.803 for the best two-predictor model, a substantial increase of 0.341. The rise from the best two-predictor model to the best three-predictor model, from 0.803 to 0.863 (an increase of 0.060), is smaller but still warrants attention. Finally, the increase from the best three-predictor model to the best four-predictor model is from 0.863 to 0.884 (an increase of 0.021), after which the best five-predictor model also shows 0.884, so the fifth predictor adds nothing at this precision. It is reasonable to recommend the four-predictor model, although the three-predictor model could also be considered, given that Mallows' Cp statistic is not being used here to account for the bias and variation in the predicted responses. The recommended model is therefore the four-predictor model using the predictors CU, IN, RT, and PT.
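The comparison of increases can be reproduced from the R-Square values reported in Steps 1 through 5. The cutoffs passed as `min_gain` below are assumptions chosen for illustration, not values from the activity, and `last_worthwhile_size` is a hypothetical helper.

```python
# R-Square of the best model at each subset size, as reported in Steps 1-5.
r2_by_size = {1: 0.462, 2: 0.803, 3: 0.863, 4: 0.884, 5: 0.884}

# Increase in R2 gained by moving to each size from the previous one.
gains = {k: round(r2_by_size[k] - r2_by_size[k - 1], 3) for k in range(2, 6)}


def last_worthwhile_size(r2_by_size, min_gain):
    """Largest size reached before the next increase in R2 falls below min_gain."""
    sizes = sorted(r2_by_size)
    chosen = sizes[0]
    for prev, k in zip(sizes, sizes[1:]):
        if r2_by_size[k] - r2_by_size[prev] < min_gain:
            break
        chosen = k
    return chosen


print(gains)                                   # {2: 0.341, 3: 0.06, 4: 0.021, 5: 0.0}
print(last_worthwhile_size(r2_by_size, 0.01))  # 4
print(last_worthwhile_size(r2_by_size, 0.05))  # 3
```

A lenient cutoff points to the four-predictor model and a stricter one to the three-predictor model, which mirrors the two options weighed above.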