Multilevel analysis
The script below shows an example of how to prepare data for and run multilevel analyses.
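Two cases are demonstrated: geographic levels (part of the country and county), and levels linked to education and occupation.
Apart from this, the population and variable set-up are the same in both cases, with one adjustment: a variable must not be used both as an explanatory variable and as a group variable. Education group is therefore an explanatory variable in the first case, where the group variables are geographic. County is an explanatory variable in the second case, where the group variables are education level and occupational group. The response variable in both cases is wage.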
Note that the variance estimates for the group variables part_of_country (part of the country) and county are equal in the example below. This is most likely because the variance in wage is approximately the same across the groups of both variables, and because the two variables capture very similar information about the variation in wage. One solution could be to drop one of the group variables and run a two-level analysis instead. In the second case, with group variables for education level and occupational group, the variances differ more between the two group variables, and the variance estimates in the multilevel model differ correspondingly more.
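A rough, descriptive way to see whether two nested group variables carry the same information is to compute, for each one, the share of the total variance in wage that lies between the group means. This is the one-way ANOVA ratio often called eta squared. The Python sketch below does this outside microdata.no on a small made-up data set. The wages, county codes and part_of_country codes are hypothetical, and this is not the estimator used by regress-mml.

```python
from collections import defaultdict
from statistics import fmean

def between_group_share(values, groups):
    """Share of the total sum of squares explained by group means (eta squared)."""
    grand_mean = fmean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    # Between-group sum of squares: group size times squared distance of group mean from grand mean
    between_ss = sum(len(vs) * (fmean(vs) - grand_mean) ** 2 for vs in by_group.values())
    return between_ss / total_ss

# Made-up wages (in 1000 NOK) and hypothetical group codes, for illustration only.
# Counties 3 and 30 lie in part 1; counties 50 and 54 lie in part 2 (nested levels).
wage = [520, 540, 560, 500, 530, 470, 450, 480, 460, 440]
county = [3, 3, 30, 30, 30, 50, 50, 54, 54, 54]
part_of_country = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

print(round(between_group_share(wage, county), 3))
print(round(between_group_share(wage, part_of_country), 3))
```

In this made-up data, counties within the same part of the country have identical mean wages, so both calls print 0.803. County then explains no more of the variation than part_of_country does. This is the kind of overlap that can make the separate variance components of a three-level model hard to tell apart.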
The commands boxplot and histogram are useful tools for studying the distribution of the response variable within groups. In practice, you give the response variable (wage) as input to the command. You then group on the relevant group variables, following the syntax conventions of the command in question. In the script below, boxplot is used to study the group variables.
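A boxplot summarizes each group's distribution of the response variable by its median and quartiles. The rules for whiskers and outlier points vary between implementations. As a sketch of the figures behind a command such as boxplot wage, over(part_of_country), the Python code below computes the five-number summary (minimum, first quartile, median, third quartile, maximum) per group for made-up wages. The quartile method (statistics.quantiles with method="inclusive") is an assumption, and microdata.no may compute quartiles differently.

```python
from collections import defaultdict
from statistics import quantiles

def five_number_summary_by_group(values, groups):
    """Return {group: (min, Q1, median, Q3, max)} for each group, sorted by group code."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    summary = {}
    for g, vs in sorted(by_group.items()):
        # Three cut points splitting the group into quarters (requires at least two values)
        q1, med, q3 = quantiles(vs, n=4, method="inclusive")
        summary[g] = (min(vs), q1, med, q3, max(vs))
    return summary

# Made-up wages (in 1000 NOK) and part_of_country codes, for illustration only
wage = [520, 540, 560, 500, 530, 470, 450, 480, 460, 440]
part_of_country = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

for group, stats in five_number_summary_by_group(wage, part_of_country).items():
    print(group, stats)
```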
If you use the command tabulate county, summarize(wage) std, the standard deviation of wage is shown separately for each group (county) in the group variable county, cf. the examples in the script below. The overall standard deviation of wage is shown at the bottom of the same table. The same can be done for other group variables, such as part_of_country. Comparing the standard deviations for counties and parts of the country shows that they differ little, both between the groups within each group variable and between the two group variables. This is not the case for the group variables education_level and occupation_group.
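To make the table concrete, the Python sketch below computes the kind of figures that tabulate county, summarize(wage) std reports: the number of observations, the mean and the standard deviation of wage per county, plus a total row. It runs outside microdata.no on made-up data. The use of the sample standard deviation (divisor n - 1) is an assumption about how the figures are computed.

```python
from collections import defaultdict
from statistics import fmean, stdev

def summarize_by_group(values, groups):
    """Return {group: (n, mean, sample std)} plus a 'Total' row for all observations."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    # stdev uses the n - 1 divisor and needs at least two values per group
    table = {g: (len(vs), fmean(vs), stdev(vs)) for g, vs in sorted(by_group.items())}
    table["Total"] = (len(values), fmean(values), stdev(values))
    return table

# Made-up wages (in 1000 NOK) and county codes, for illustration only
wage = [520, 540, 560, 500, 530, 470, 450, 480, 460, 440]
county = [3, 3, 30, 30, 30, 50, 50, 54, 54, 54]

for group, (n, mean, sd) in summarize_by_group(wage, county).items():
    print(f"{group!s:>6} n={n:2d} mean={mean:6.1f} sd={sd:5.1f}")
```

Comparing the sd column across rows, and against the same table for a coarser grouping, is the comparison described above.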
require no.ssb.fdb:30 as db
// Import and prepare data for multilevel analysis
create-dataset mldata
import db/BOSATTEFDT_BOSTED 2022-12-31 as municipality
import db/BEFOLKNING_FOEDSELS_AAR_MND as birthdate
generate age = 2022 - int(birthdate/100) // birthdate is year and month of birth (YYYYMM), so int(birthdate/100) is the birth year
keep if inrange(age,20,60)
import db/BEFOLKNING_KJOENN as gender
import db/INNTEKT_LONN 2022-12-31 as wage
import db/SIVSTANDFDT_SIVSTAND 2022-12-31 as marital_status
import db/BEFOLKNING_BARN_I_REGSTAT_FAMNR 2022-01-01 as num_children
import db/NUDB_SOSBAK as social_background
import db/NUDB_BU 2022-09-01 as education
import db/REGSYS_ARB_YRKE_STYRK08 2022-11-16 as occupation
generate male = gender == '1'
generate married = marital_status == '2'
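generate education_level = substr(education,1,1) // First digit of the NUS education code = education level
generate occupation_group = substr(occupation,1,1) // First digit of the STYRK-08 code = major occupation group
generate county = substr(municipality,1,2) // First two digits of the municipality number = county number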
generate high_social_background = inlist(social_background,'1','2')
generate region = 1 if inlist(county,'03','30','34','38') // Eastern Norway (Østlandet)
replace region = 2 if inlist(county,'11','42') // Southern Norway (Sørlandet) and Rogaland
replace region = 3 if inlist(county,'46') // Western Norway (Vestlandet)
replace region = 4 if inlist(county,'15','50') // Central Norway (Midt-Norge)
replace region = 5 if inlist(county,'18','54') // Northern Norway (Nord-Norge)
generate part_of_country = 1 if inlist(region,1,2,3)
replace part_of_country = 2 if inlist(region,4,5)
define-labels part_of_country_txt 1 "South Norway" 2 "Mid-/North Norway"
assign-labels part_of_country part_of_country_txt
destring county education_level occupation occupation_group
define-labels county_text 3 "Oslo" 11 "Rogaland" 15 "Møre og Romsdal" 18 "Nordland" 30 "Viken" 34 "Innlandet" 38 "Vestfold og Telemark" 42 "Agder" 46 "Vestland" 50 "Trøndelag" 54 "Troms og Finnmark"
assign-labels county county_text
generate education_group = 1 if inrange(education_level,0,4)
replace education_group = 2 if inrange(education_level,5,6)
replace education_group = 3 if inrange(education_level,7,8)
define-labels education_group_txt 1 "Low education" 2 "Medium education" 3 "High education"
assign-labels education_group education_group_txt
// --------- Case 1: Wage vs part of country and county -------------
tabulate county part_of_country
boxplot wage, over(county)
boxplot wage, over(part_of_country)
tabulate county, summarize(wage) std
tabulate part_of_country, summarize(wage) std
// Single-level analysis (OLS)
regress wage male married num_children high_social_background age i.education_group
// Two-level analysis
regress-mml wage male married num_children high_social_background age i.education_group by part_of_country
regress-mml wage male married num_children high_social_background age i.education_group by county
// Three-level analysis
regress-mml wage male married num_children high_social_background age i.education_group by part_of_country county
// ---------- Case 2: Wage vs education level and occupation hierarchy ---------
tabulate education_level, missing
tabulate education_group, missing
tabulate occupation_group, missing
boxplot wage, over(education_level)
boxplot wage, over(education_group)
boxplot wage, over(occupation_group)
tabulate education_level, summarize(wage) std
tabulate education_group, summarize(wage) std
tabulate occupation_group, summarize(wage) std
// Single-level analysis (OLS)
regress wage male married num_children high_social_background age i.county
// Two-level analysis
regress-mml wage male married num_children high_social_background age i.county by education_level
regress-mml wage male married num_children high_social_background age i.county by occupation_group
// Three-level analysis
regress-mml wage male married num_children high_social_background age i.county by education_level occupation_group