Problems of Regression Models Using Ordinary Least Squares |
||||
Problem |
Consequences |
Check |
Remedial action |
|
Problems due to assumptions of least squares |
||||
Residuals |
not normal |
Inferential test procedures based on the F test may be invalid |
Rankit plot; Shapiro-Wilk W test (and others) |
Transform y values (Box-Cox transformation). Use a different error model (generalised linear modelling). |
heteroskedasticity |
Biased estimation of error variance and hence inferential test procedures may be invalid |
Plot residuals against y, x’s and other variables. Anscombe’s test (and others) |
Transform y variable: y^(1/2), log(y), y^(-1) |
|
not independent |
Inferential test procedures may be invalid. The true sampling variance of the regression estimates is underestimated. Inflated R² |
Residual plots. Some tests (e.g. time: Durbin-Watson; space: Moran) |
Iterated generalised least squares |
|
Non-linearity of functional relationship |
Poor fit; meaningless results; non-independent residuals |
Scatterplots of y against x’s. Added variable plots. |
Transform x’s and/or y variables |
|
Problems due to the nature of data |
||||
Multicollinearity amongst explanatory variables |
(XᵀX) is close to singular, so regression estimates are unstable |
Correlation measures. Tests based on eigenvalues of (XᵀX) |
Transform explanatory variables. Delete variables. Ridge regression |
|
Difficulties in performing efficient analysis and sifting out variables |
|
Added variable plots for variable selection. Transform x’s and/or y to simplify model. Stepwise regression |
|
|
Outliers and leverage effects |
May severely distort model fit. Model fit is dependent on a few values. |
|
Robust, resistant regression. Data deletion |
|
Inaccurate data |
Meaningless results |
Exploratory data methods may highlight errors |
Delete or replace inaccurate values |
|
Incomplete data |
|
Missing at random: could be wasteful of other information if this has to be discarded. Not missing at random: suspect inferences |
|
Estimate missing values (missing at random). Reduce data matrix to the cases with full information |
|
Categorical |
“Normal” linear regression model inappropriate |
|
Generalised linear model (e.g. logistic regression). |
|
Source: nfm |
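
Two of the checks listed in the table, the Durbin-Watson test for non-independent residuals and the detection of leverage effects, can be illustrated with a minimal sketch in plain Python. The data values, the function names and the 2p/n leverage cut-off (a common rule of thumb, with p = 2 parameters here) are assumptions made for illustration and are not taken from the source. The sketch covers only a single explanatory variable.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(x, y, intercept, slope):
    """Observed y minus fitted y for every case."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def durbin_watson(res):
    """Durbin-Watson statistic; values near 2 suggest no first-order
    serial correlation, values near 0 or 4 suggest strong correlation."""
    num = sum((res[i] - res[i - 1]) ** 2 for i in range(1, len(res)))
    den = sum(e ** 2 for e in res)
    return num / den


def leverages(x):
    """Hat values h_i = 1/n + (x_i - x_bar)^2 / Sxx for a one-predictor model."""
    n = len(x)
    x_bar = mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Made-up illustrative data: the last x value lies far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 30.5]

intercept, slope = fit_simple_ols(x, y)
res = residuals(x, y, intercept, slope)
dw = durbin_watson(res)
h = leverages(x)

# Assumed rule of thumb: flag cases whose leverage exceeds 2p/n (p = 2).
cutoff = 2 * 2 / len(x)
high_leverage = [i for i, hi in enumerate(h) if hi > cutoff]

print("Durbin-Watson statistic:", round(dw, 3))
print("High-leverage case indices:", high_leverage)
```

A flagged case is not necessarily an error. It only indicates that the fit depends heavily on that observation, which is the situation the "Outliers and leverage effects" row warns about.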