How to aggregate data to family level
Each individual can be linked to a family number, which can be used to aggregate data to family level. Individuals belonging to the same family are registered with the same family number, which is the personal id number of the eldest person in the family.
In the example below, an individual level dataset is first created and then filtered down to persons in families consisting of married couples with small children (code value 2.1.1). Next, demographic information is imported into the dataset.
To create family level income, a new dataset must first be created for this purpose, since variables of different unit types cannot be mixed in a single dataset. A variable measuring work-related income at individual level is then imported into the new dataset, and the command collapse (sum) is used to sum income to family level (by(famnr)). This results in a dataset with family as unit type.
Finally, family income is merged into the individual level dataset using the command merge.
//Connect to databank
require no.ssb.fdb:1 as fdb1

//Create an individual level dataset consisting of persons in families defined by married couples with small children
create-dataset persondata
import fdb1/BEFOLKNING_REGSTAT_FAMTYP 2010-01-01 as famtype
tabulate famtype
keep if famtype == '2.1.1'

//Add demographic information
import fdb1/BEFOLKNING_KJOENN as sex
import fdb1/BEFOLKNING_FOEDSELS_AAR_MND as birthyearmonth
generate age = 2010 - int(birthyearmonth/100)
import fdb1/BOSATTEFDT_BOSTED 2010-01-01 as municipality
generate county = substr(municipality,1,2)
import fdb1/BEFOLKNING_BARN_I_HUSH 2010-01-01 as children

//Create dataset for generating family level income (unit type = family)
create-dataset familydata
import fdb1/BEFOLKNING_REGSTAT_FAMNR 2010-01-01 as famnr
import fdb1/INNTEKT_WYRKINNT 2010-01-01 as income
collapse (sum) income, by(famnr)
rename income familyincome

//Merge family income into individual level dataset (unit type = persons)
merge familyincome into persondata on PERSONID_1

//Generate family level statistics.
//The family number consists of the personal id of the eldest person in the family,
//so by removing individuals with missing family level income the dataset now has unit type = family.
//All individual information will be associated with the eldest person in the family.
use persondata
drop if sysmiss(familyincome)
rename age age_oldest
rename sex sex_oldest
define-labels countytxt '01' 'Østfold' '02' 'Akershus' '03' 'Oslo' '04' 'Hedmark' '05' 'Oppland' '06' 'Buskerud' '07' 'Vestfold' '08' 'Telemark' '09' 'Aust-Agder' '10' 'Vest-Agder' '11' 'Rogaland' '12' 'Hordaland' '14' 'Sogn og Fjordane' '15' 'Møre og Romsdal' '16' 'Sør-Trøndelag' '17' 'Nord-Trøndelag' '18' 'Nordland' '19' 'Troms' '20' 'Finnmark' '99' 'Uoppgitt'
assign-labels county countytxt
tabulate county
histogram age_oldest, discrete
histogram children, percent
tabulate children
tabulate children, cellpct
tabulate children sex_oldest
summarize familyincome
barchart (mean) familyincome, by(county)
barchart (mean) familyincome, by(children)
histogram familyincome, freq
histogram familyincome, by(children) percent
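
The collapse, merge and drop steps in the script can be hard to picture, so here is a toy sketch of the same logic in plain Python. It is not microdata.no script and does not claim to show how the platform works internally. The person ids, family numbers and incomes are made up, and the helper names collapse_sum and merge_on_person_id are hypothetical. It assumes, as the script comments indicate, that the family number equals the person id of the eldest family member, so a merge on person id attaches family income only to that person.

```python
from collections import defaultdict

# Made-up toy data: person id -> family number and individual income.
# Assumption: the family number is the person id of the eldest member.
persons = {
    101: {"famnr": 101, "income": 400},
    102: {"famnr": 101, "income": 300},
    103: {"famnr": 101, "income": 0},
    201: {"famnr": 201, "income": 500},
    202: {"famnr": 201, "income": 250},
}


def collapse_sum(rows, by, value):
    """Sum `value` within each group of `by` (one result row per group)."""
    totals = defaultdict(int)
    for row in rows.values():
        totals[row[by]] += row[value]
    return dict(totals)


def merge_on_person_id(family_values, rows, name):
    """Attach a family-level value to persons whose id equals a family key.

    Persons whose id is not a family number get None (a missing value).
    """
    merged = {}
    for person_id, row in rows.items():
        merged[person_id] = dict(row)
        merged[person_id][name] = family_values.get(person_id)
    return merged


# Step 1: aggregate individual income to one total per family number.
family_income = collapse_sum(persons, by="famnr", value="income")

# Step 2: merge the family totals back into the person-level data.
person_data = merge_on_person_id(family_income, persons, "familyincome")

# Step 3: drop persons with missing family income, leaving one row per family.
family_level = {
    person_id: row
    for person_id, row in person_data.items()
    if row["familyincome"] is not None
}

for person_id, row in family_level.items():
    print(person_id, row["familyincome"])
```

In this sketch only persons 101 and 201 remain after the last step, one per family, which is why the script renames age and sex to age_oldest and sex_oldest.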