Linear regression analysis
Linear regression (ordinary least squares, OLS) estimates the coefficients, or marginal effects, of a set of explanatory variables on a metric (continuous) outcome variable. Options let you adapt the output, for example to hide the constant term, change the significance level, show model diagnostics, report robust standard errors or use clustered standard errors.
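To make concrete what the regression command estimates, the OLS point estimates can be obtained by solving the normal equations (X'X) b = X'y. The following is a minimal pure-Python sketch on made-up toy numbers, not microdata.no code. The function name `ols` and its `add_constant` flag (standing in for the option to drop the constant term) are illustrative assumptions. It returns coefficients only; the standard errors, robust or clustered variants and diagnostics that `regress` reports are left out.

```python
# Minimal OLS sketch (not microdata.no code): solve the normal equations (X'X) b = X'y.

def ols(X, y, add_constant=True):
    """Return OLS coefficients; the constant (if any) comes first."""
    if add_constant:
        # Prepend a column of ones so the first coefficient is the constant term
        X = [[1.0] + list(row) for row in X]
    k = len(X[0])
    # Cross-products X'X (k x k) and X'y (length k)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [X'X | X'y]
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: explanatory variables are perfectly collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    # After elimination the left block is diagonal, so each coefficient is a ratio
    return [a[i][k] / a[i][i] for i in range(k)]


# Made-up toy data built exactly as income = 100 + 50*male + 2*age
X = [[0, 20], [1, 20], [0, 30], [1, 45], [0, 50], [1, 60]]  # columns: male, age
y = [140, 190, 160, 240, 200, 270]
print(ols(X, y))  # approximately [100, 50, 2] (constant, male, age)
```

Because the toy outcome is an exact linear function of the regressors, the sketch recovers the generating coefficients. The singular-matrix check also shows why strongly correlated explanatory variables are a concern: with perfect collinearity, the coefficients cannot be separated at all.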
//Connect to database
require no.ssb.fdb:23 as db
//Start by importing relevant variables
create-dataset demographydata
import db/BEFOLKNING_KJOENN as gender
import db/BEFOLKNING_FOEDSELS_AAR_MND as birth_year_month
import db/BEFOLKNING_STATUSKODE 2020-01-01 as regstat
import db/SIVSTANDFDT_SIVSTAND 2020-01-01 as civstat
import db/INNTEKT_BRUTTOFORM 2020-01-01 as wealth
import db/INNTEKT_WYRKINNT 2021-01-01 as work_income21
//Limit the population to registered residents (status code '1') aged 16-66
generate age = 2020 - int(birth_year_month / 100)
keep if regstat == '1' & age > 15 & age < 67
//Adapt the independent variables to the statistical model (most of them need to be recoded as dummy variables)
generate male = 0
replace male = 1 if gender == '1'
generate married = 0
replace married = 1 if civstat == '2'
generate wealth_high = 0
replace wealth_high = 1 if wealth > 1500000
//Check the correlation between two of the independent variables (strong correlation signals multicollinearity)
correlate age wealth_high
//Run the regression analysis; the dependent variable is always listed first
regress work_income21 male married age wealth_high
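For readers who want to check the recoding logic outside microdata.no, the preparation steps of the script can be mirrored in plain Python on a few made-up records. This is a sketch under stated assumptions: the records and the helper names `prepare`, `keep_population` and `pearson` are hypothetical, and the sketch assumes that `correlate` reports Pearson's correlation coefficient.

```python
# Sketch (not microdata.no code) of the script's recoding and correlation steps on made-up records.
from math import sqrt

def prepare(person):
    """Hypothetical helper: recode one raw record the way the generate/replace commands do."""
    return {
        "regstat": person["regstat"],
        "age": 2020 - person["birth_year_month"] // 100,  # YYYYMM -> birth year
        "male": 1 if person["gender"] == "1" else 0,
        "married": 1 if person["civstat"] == "2" else 0,
        "wealth_high": 1 if person["wealth"] > 1_500_000 else 0,
    }

def keep_population(rows):
    """Hypothetical helper: keep status code '1' and age 16-66, like the keep command."""
    return [r for r in rows if r["regstat"] == "1" and 15 < r["age"] < 67]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists (assumed to match correlate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up raw records (codes stored as strings, as in the script)
raw = [
    {"gender": "1", "birth_year_month": 198005, "regstat": "1", "civstat": "2", "wealth": 2_000_000},
    {"gender": "2", "birth_year_month": 199512, "regstat": "1", "civstat": "1", "wealth": 300_000},
    {"gender": "1", "birth_year_month": 194501, "regstat": "1", "civstat": "2", "wealth": 900_000},
    {"gender": "2", "birth_year_month": 196003, "regstat": "1", "civstat": "2", "wealth": 1_800_000},
    {"gender": "1", "birth_year_month": 199001, "regstat": "3", "civstat": "1", "wealth": 100_000},
]
rows = keep_population([prepare(p) for p in raw])
print(len(rows))  # 3: the 75-year-old and the record with status code '3' are dropped
print(pearson([r["age"] for r in rows], [r["wealth_high"] for r in rows]))
```

Coding each dummy as 1 for the category of interest and 0 otherwise, as the generate/replace pairs do, means each regression coefficient on a dummy is read as the difference in expected work income relative to the reference group.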