How to aggregate data to family level
Each individual can be linked to a family number, which can be used to aggregate data to family level. Individuals belonging to the same family are registered with the same family number, which is the personal id number of the eldest person in the family.
In the example below, an individual-level dataset is created first. It is then restricted to persons in families consisting of married couples with small children (code value 2.1.1). Next, demographic information is imported into the dataset.
To create family-level income, a new dataset must be created for this purpose, since variables with different unit types cannot be mixed in a single dataset. A variable measuring work-related income at individual level is imported into the new dataset, and the command collapse (sum) with the option by(famnr) is then used to sum income per family. This results in a dataset with family as unit type.
Finally, family income is merged into the individual-level dataset using the command merge. Because the family number is the personal id of the eldest person in the family, the merged value is attached to that person only.
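The aggregation step can be illustrated outside microdata.no with a small plain-Python sketch. It only mimics the idea behind collapse (sum) with by(famnr); it is not how the platform implements the command. The person ids, family numbers and amounts are made up, and skipping missing incomes is an assumption of the sketch, not documented behaviour.

```python
from collections import defaultdict

# Hypothetical person-level rows: (person_id, famnr, yrkesinnt).
# famnr is the person id of the eldest family member; all values are invented.
persons = [
    ("P1", "P1", 500_000),
    ("P2", "P1", 300_000),
    ("P3", "P1", None),      # family member with no registered income
    ("P4", "P4", 450_000),
    ("P5", "P4", 150_000),
]


def collapse_sum(rows):
    """Sum income per family number, skipping missing values (an assumption)."""
    totals = defaultdict(int)
    for _person_id, famnr, income in rows:
        if income is not None:
            totals[famnr] += income
    return dict(totals)


# One entry per family instead of one row per person.
familieinnt = collapse_sum(persons)
print(familieinnt)  # {'P1': 800000, 'P4': 600000}
```

After this step the unit is the family: five person rows have become two family totals, keyed by the family number.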
//Create an individual level dataset consisting of persons in families defined by married couples with small children
create-dataset persondata
import BEFOLKNING_REGSTAT_FAMTYP 2010-01-01 as famtype
tabulate famtype
keep if famtype == '2.1.1'

//Add demographic information
import BEFOLKNING_KJOENN as kjønn
import BEFOLKNING_FOEDSELS_AAR_MND as faarmnd
generate alder = 2010 - int(faarmnd/100)
import BOSATTEFDT_BOSTED 2010-01-01 as bosted
generate fylke = substr(bosted,1,2)
import BEFOLKNING_BARN_I_HUSH 2010-01-01 as antbarn

//Create dataset for generating family level income (unit type = family)
create-dataset familiedata
import BEFOLKNING_REGSTAT_FAMNR 2010-01-01 as famnr
import INNTEKT_WYRKINNT 2010-01-01 as yrkesinnt
collapse (sum) yrkesinnt, by(famnr)
rename yrkesinnt familieinnt

//Merge family income into individual level dataset (unit type = persons)
merge familieinnt into persondata on PERSONID_1

//Generate family level statistics.
//The family number consists of the personal id of the eldest person in the family,
//so by removing individuals with missing family level income the dataset now has unit type = family.
//All individual information will be associated with the eldest person in the family.
use persondata
drop if sysmiss(familieinnt)
rename alder alder_eldst
rename kjønn kjønn_eldst

define-labels fylketekst '01' 'Østfold' '02' 'Akershus' '03' 'Oslo' '04' 'Hedmark' '05' 'Oppland' '06' 'Buskerud' '07' 'Vestfold' '08' 'Telemark' '09' 'Aust-Agder' '10' 'Vest-Agder' '11' 'Rogaland' '12' 'Hordaland' '14' 'Sogn og Fjordane' '15' 'Møre og Romsdal' '16' 'Sør-Trøndelag' '17' 'Nord-Trøndelag' '18' 'Nordland' '19' 'Troms' '20' 'Finnmark' '99' 'Uoppgitt'
assign-labels fylke fylketekst

tabulate fylke
histogram alder_eldst, discrete
histogram antbarn, percent
tabulate antbarn
tabulate antbarn, cellpct
tabulate antbarn kjønn_eldst
summarize familieinnt
barchart (mean) familieinnt, by(fylke)
barchart (mean) familieinnt, by(antbarn)
histogram familieinnt, freq
histogram familieinnt, by(antbarn) percent
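The last part of the script, where family income is merged onto persons and rows with missing family income are dropped, can be sketched in plain Python as follows. This only illustrates the matching logic described in the script comments; the helper names merge_on_person_id and drop_if_missing, the person ids and all values are hypothetical and are not part of microdata.no.

```python
# Hypothetical person-level dataset keyed by person id (invented values).
persondata = {
    "P1": {"alder": 41, "kjønn": "mann"},
    "P2": {"alder": 38, "kjønn": "kvinne"},
    "P3": {"alder": 4, "kjønn": "kvinne"},
    "P4": {"alder": 35, "kjønn": "kvinne"},
}

# Hypothetical family-level dataset keyed by family number,
# i.e. by the person id of the eldest family member.
familieinnt = {"P1": 800_000, "P4": 600_000}


def merge_on_person_id(persons, family_values):
    """Attach the family value where person id equals the family number."""
    return {
        pid: {**row, "familieinnt": family_values.get(pid)}
        for pid, row in persons.items()
    }


def drop_if_missing(rows, column):
    """Keep only rows where the given column has a value."""
    return {pid: row for pid, row in rows.items() if row[column] is not None}


merged = merge_on_person_id(persondata, familieinnt)
families = drop_if_missing(merged, "familieinnt")
print(sorted(families))  # ['P1', 'P4']
```

Only the eldest person in each family keeps a value after the merge, so dropping the rows with missing family income leaves one row per family. This is why age and sex are renamed to alder_eldst and kjønn_eldst in the script.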