Problems of Regression Model Using Ordinary Least Squares 

Each entry gives, in order: Problem, Consequences, Check, Remedial action.

Problems due to assumptions of least squares 

Problem: Residuals not normal
Consequences: Inferential test procedures based on the F test may be invalid
Check: Rankit plot; Shapiro-Wilk W test (and others)
Remedial action: Transform y values (Box-Cox transformation). Use a different error model (generalised linear modelling).

Problem: Residuals heteroskedastic
Consequences: Biased estimation of the error variance, and hence inferential test procedures may be invalid
Check: Plot residuals against y, the x’s and other variables. Anscombe’s test (and others)
Remedial action: Transform the y variable: y^(1/2), log(y), y^(-1)

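The y transformations listed as remedies for non-normal and heteroskedastic residuals can be sketched as members of the Box-Cox family. This is a minimal illustration in plain Python, not a fitted procedure: the function name box_cox, the sample values and the lambda values tried are assumptions chosen for the example, and lambda is picked by hand rather than estimated from the data.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of positive values: (y**lam - 1) / lam, or log(y) when lam == 0."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive y values")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

# Hypothetical response values whose spread grows with their level.
y = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]

# lam = 1 leaves the shape unchanged, 0.5 is a square-root type transform,
# 0 is the log transform and -1 is a reciprocal type transform.
for lam in (1.0, 0.5, 0.0, -1.0):
    t = box_cox(y, lam)
    gaps = [b - a for a, b in zip(t, t[1:])]  # spacing between successive values
    print(lam, [round(g, 3) for g in gaps])
```

For this geometric series the log transform (lambda = 0) makes the gaps between successive transformed values equal, which is the sense in which such a transform evens out a spread that grows with the level of y.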
Problem: Residuals not independent
Consequences: Inferential test procedures may be invalid. The true sampling variance of the regression estimates is underestimated. Inflated R^2
Check: Residual plots. Some tests (e.g. time: Durbin-Watson; space: Moran)
Remedial action: Iterated generalised least squares

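The Durbin-Watson check named above applies to residuals taken in serial order and reduces to a short formula: the sum of squared successive differences divided by the sum of squared residuals. A minimal sketch follows; the function name and the toy residual series are assumptions for illustration, and no critical values are computed.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e[t] - e[t-1])**2) / sum(e[t]**2)."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den

# Hypothetical residual series in time order.
alternating = [1.0, -1.0, 1.0, -1.0]   # sign flips every step
persistent = [1.0, 1.0, 1.0, 1.0]      # no change from step to step

print(durbin_watson(alternating))
print(durbin_watson(persistent))
```

Values near 2 are consistent with no first-order autocorrelation, values towards 0 suggest positive autocorrelation and values towards 4 suggest negative autocorrelation.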
Problem: Non-linearity of functional relationship
Consequences: Poor fit; meaningless results; non-independent residuals
Check: Scatterplots of y against the x’s. Added variable plots.
Remedial action: Transform the x’s and/or y variables

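As an illustration of transforming the x and y variables to recover a linear relationship, the sketch below takes a power-law relationship and compares the linear correlation before and after taking logs of both variables. The data, the power law and the helper name pearson are assumptions made up for the example.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Hypothetical data following y = 2 * x**1.5 exactly (no noise).
x = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
y = [2.0 * xi ** 1.5 for xi in x]

r_raw = pearson(x, y)                                                   # curved on the raw scale
r_log = pearson([math.log(v) for v in x], [math.log(v) for v in y])     # straight on the log-log scale

print(round(r_raw, 4), round(r_log, 4))
```

On the log-log scale the relationship is log(y) = log(2) + 1.5 log(x), a straight line, so an ordinary least squares fit on the transformed variables is appropriate where a fit on the raw variables is not.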
Problems due to the nature of data 

Problem: Multicollinearity amongst explanatory variables
Consequences: (X’X) …
Check: Correlation measures. Tests based on eigenvalues of (X’X)
Remedial action: Transform explanatory variables. Delete variables. Ridge regression

Consequences: Difficulties in performing efficient analysis; sifting out variables
Remedial action: Added variable plots for variable selection. Transform x’s and/or y to simplify model. Stepwise regression

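The eigenvalue check and the ridge regression remedy listed for multicollinearity can be sketched for the smallest possible case, two centred explanatory variables with no intercept. Everything named here (the helper functions, the toy data and the ridge constant k = 1) is an assumption for illustration; the closed-form 2x2 algebra stands in for what a linear algebra library would normally do.

```python
import math

def cross_products(x1, x2):
    """Entries (a, b, d) of the symmetric 2x2 matrix X'X = [[a, b], [b, d]]."""
    a = sum(v * v for v in x1)
    b = sum(p * q for p, q in zip(x1, x2))
    d = sum(v * v for v in x2)
    return a, b, d

def eigenvalues_sym2(a, b, d):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, d]]."""
    mid = (a + d) / 2.0
    half = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return mid + half, mid - half

def ridge_2col(x1, x2, y, k):
    """Solve (X'X + k I) beta = X'y for two centred predictors; k = 0 gives OLS."""
    a, b, d = cross_products(x1, x2)
    a, d = a + k, d + k
    c1 = sum(p * q for p, q in zip(x1, y))
    c2 = sum(p * q for p, q in zip(x2, y))
    det = a * d - b * b
    return (d * c1 - b * c2) / det, (a * c2 - b * c1) / det

# Hypothetical centred data: x2 is almost a copy of x1.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.0, 1.1, 1.9]
y = [-4.0, -2.1, 0.1, 2.0, 4.0]

big, small = eigenvalues_sym2(*cross_products(x1, x2))
print("eigenvalues:", big, small, "ratio:", big / small)
print("OLS   (k=0):", ridge_2col(x1, x2, y, 0.0))
print("ridge (k=1):", ridge_2col(x1, x2, y, 1.0))
```

In this toy data the ratio of the largest to the smallest eigenvalue is of the order of a thousand, and the OLS coefficients split very unevenly between two nearly identical predictors, whereas the ridge solution shares the effect roughly equally between them.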

Problem: Outliers and leverage effects
Consequences: May severely distort model fit. Model fit is dependent on a few values.
Remedial action: Robust, resistant regression. Data deletion

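The table gives no check for outliers and leverage effects. One widely used diagnostic, offered here as an assumption rather than as the source's method, is the hat value (leverage) of each observation. For a simple regression with an intercept it is h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. The sketch below flags points whose leverage exceeds the common rule of thumb 2p/n, with p = 2 fitted parameters; the data and the threshold choice are illustrative.

```python
def leverages(x):
    """Hat values for a simple linear regression of y on x with an intercept."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    return [1.0 / n + (v - xbar) ** 2 / sxx for v in x]

# Hypothetical predictor values; the last one lies far from the rest.
x = [1.0, 2.0, 3.0, 4.0, 20.0]
h = leverages(x)

p = 2                                   # intercept and slope
threshold = 2.0 * p / len(x)            # rule-of-thumb cut-off, an assumption
flagged = [i for i, hi in enumerate(h) if hi > threshold]

print([round(hi, 3) for hi in h], flagged)
```

The hat values always sum to the number of fitted parameters, so a single value close to 1 means that the fitted line is largely determined by that one observation, which is the situation the table describes as model fit depending on a few values.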
Problem: Inaccurate data
Consequences: Meaningless results
Check: Exploratory data methods may highlight errors
Remedial action: Delete or replace inaccurate values

Problem: Incomplete data
Consequences: Missing at random: could be wasteful of other information if this has to be discarded. Not missing at random: suspect inferences
Remedial action: Estimate missing values (missing at random). Reduce data matrix to the cases with full information

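The two remedies for incomplete data can be contrasted in a few lines. In this sketch, missing values are represented by None, reduction to complete cases drops every row with a gap, and column-mean imputation stands in for "estimate missing values". Mean imputation is only the simplest possible estimator and is an assumption of the example, as are the function names and the data.

```python
def complete_cases(rows):
    """Keep only the rows that have no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r)]

def mean_impute(rows):
    """Replace each missing value by the mean of the observed values in its column."""
    ncol = len(rows[0])
    means = []
    for j in range(ncol):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(ncol)] for r in rows]

# Hypothetical data matrix with two incomplete rows.
rows = [
    [1.0, 2.0],
    [2.0, None],
    [3.0, 6.0],
    [None, 8.0],
]

print(complete_cases(rows))   # half of the rows are discarded
print(mean_impute(rows))      # all rows kept, gaps filled with column means
```

Complete-case reduction discards the observed values in the incomplete rows, which is the waste of information the table refers to, while any imputation is only defensible when the values are missing at random.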
Problem: Categorical data
Consequences: “Normal” linear regression model inappropriate
Remedial action: Generalised linear model (e.g. logistic regression).

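As a minimal sketch of the logistic regression remedy for a binary (0/1) response, the code below fits a one-predictor model by plain gradient ascent on the log-likelihood. The function names, the learning rate, the number of steps and the toy data are all assumptions for illustration; in practice a statistics library would fit the model by iteratively reweighted least squares.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, steps=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - sigmoid(b0 + b1 * xi)   # observed minus fitted probability
            g0 += err
            g1 += err * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical binary outcomes that become more likely as x grows (not perfectly separable).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(x, y)
print(b0, b1)
print([round(sigmoid(b0 + b1 * xi), 2) for xi in x])
```

Unlike a straight-line fit to the 0/1 values, the fitted probabilities stay between 0 and 1 for every x, which is why the “normal” linear model is replaced by a generalised linear model for this kind of response.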
Source: nfm 