5.7.2 Prediction and residual values
All regression variants in microdata.no have associated commands that generate, among other things, residual and prediction values. These values can be used to analyze the spread of the data and to test regression models. Prediction values can also be used as input for further analyses.
The commands have the same name as the associated regression command plus the suffix -predict.
Syntax:
logit-predict <variable> <variable list> [if <condition>] [, <options>]
probit-predict <variable> <variable list> [if <condition>] [, <options>]
The variables are specified in the same way as for the associated regression model, which is run with the command logit or probit.
The following values can be retrieved:
logit-predict: Probability values, prediction values, and residuals
probit-predict: Probability values and prediction values
You choose which values to generate through options. Each run produces a set of variables that contain the requested values. By default, the first value type listed above is generated, but it is still recommended to specify the value type explicitly, since the options let you name the generated variables inside parentheses, as shown in the syntax example below. If you run several predict commands, each run must use new names for the generated variables.
Syntax example:
logit-predict highwage age man wealth, residuals(res4) predicted(pred4) probabilities(prob4)
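To make the three value types concrete, the following Python sketch computes them by hand for a single observation. It is an illustration under stated assumptions, not a description of how microdata.no calculates them: it assumes that the prediction value is the linear predictor (log-odds), that the probability is the logistic transform of it, and that the residual is the observed outcome minus the probability. The function name logit_values and the coefficients are hypothetical. Use help logit-predict to confirm the definitions that apply in microdata.no.

```python
import math

def logit_values(coefs, intercept, row, outcome):
    """Return (linear prediction, probability, residual) for one observation."""
    # Linear prediction: intercept plus the weighted sum of the explanatory variables.
    linear = intercept + sum(b * x for b, x in zip(coefs, row))
    # Probability: the logistic function maps the linear prediction into (0, 1).
    probability = 1.0 / (1.0 + math.exp(-linear))
    # Response residual (an assumed definition): observed 0/1 outcome minus probability.
    residual = outcome - probability
    return linear, probability, residual

# Hypothetical coefficients for age, man and wealth; not estimated from real data.
coefs = [0.03, 0.5, 0.000001]
intercept = -2.0

# One hypothetical person: age 40, man = 1, wealth 500000, observed highwage = 1.
pred, prob, res = logit_values(coefs, intercept, [40, 1, 500000], outcome=1)
print(pred, prob, res)
```

Under these assumptions the probability and the residual always sum to the observed outcome, which is a quick sanity check on the generated variables.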
The automatically generated variables can be used as input for further analyses or displayed graphically. The graphical commands currently available are hexbin and histogram. By running histogram on the residual variable, one can check whether the residuals are normally distributed. The hexbin command can also be used to create anonymized scatter plots that combine two sets of values.
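As a rough picture of what a histogram of residuals summarizes, the Python sketch below counts residuals in equal-width bins and prints a text bar per bin. It is not how the histogram command in microdata.no is implemented, and the function name bin_counts and the residual values are hypothetical. A roughly symmetric, bell-shaped pattern of counts would be consistent with normally distributed residuals.

```python
def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 if all values are identical.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than one past it.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical residuals, for illustration only.
residuals = [-0.62, -0.41, -0.33, -0.12, 0.02, 0.08, 0.21, 0.27, 0.44, 0.58]
counts = bin_counts(residuals, 4)

# Print one text bar per bin.
for i, c in enumerate(counts):
    print(f"bin {i}: {'#' * c}")
```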
For more details, use the command help logit-predict or help probit-predict.