How to aggregate data to the family level

Each individual can be linked to a family number, which can be used to aggregate data to the family level. All members of the same family are registered with the same family number, which is the personal ID number of the eldest person in the family.
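The rule for the family number can be illustrated with a small plain-Python sketch (not microdata.no code). The records, the `household_key` grouping field and the function `assign_family_numbers` are hypothetical. In the register, family membership and family numbers are already assigned, so this only shows what "the personal ID of the eldest person" means in practice.

```python
from datetime import date

# Hypothetical toy records: (person_id, birth_date, household_key).
# household_key is an illustrative grouping only; the register assigns
# family membership itself.
persons = [
    ("P001", date(1975, 4, 2), "H1"),
    ("P002", date(1978, 9, 15), "H1"),
    ("P003", date(2006, 1, 30), "H1"),
    ("P004", date(1980, 6, 1), "H2"),
    ("P005", date(1982, 3, 12), "H2"),
]


def assign_family_numbers(records):
    """Map each person_id to the person_id of the eldest member of their family."""
    eldest = {}  # household_key -> (birth_date, person_id) of the eldest seen so far
    for pid, born, key in records:
        # Strict '<' means that on a tie the first person seen is kept.
        if key not in eldest or born < eldest[key][0]:
            eldest[key] = (born, pid)
    return {pid: eldest[key][1] for pid, _, key in records}


famnr = assign_family_numbers(persons)
print(famnr)
```

Every member of family H1 ends up with family number P001, and every member of H2 with P004.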

In the example below, an individual-level dataset is first created and then filtered down to persons in families consisting of married couples with small children (code value 2.1.1). Next, demographic information is imported into the dataset.
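The filtering step and the age calculation used later in the script can be mirrored in plain Python as a rough sketch. The rows are invented, the code "1.1.1" simply stands for any other family type, and `faarmnd` is assumed to hold birth year and month as a YYYYMM number (which is what `int(faarmnd/100)` in the script implies).

```python
# Hypothetical toy rows: person id, family type code, birth year-month (YYYYMM).
rows = [
    {"id": "P001", "famtype": "2.1.1", "faarmnd": 197504},
    {"id": "P002", "famtype": "2.1.1", "faarmnd": 197809},
    {"id": "P006", "famtype": "1.1.1", "faarmnd": 199011},  # any other family type
]

# Equivalent of: keep if famtype == '2.1.1'
kept = [r for r in rows if r["famtype"] == "2.1.1"]

# Equivalent of: generate alder = 2010 - int(faarmnd/100)
# For positive numbers, integer division by 100 drops the month part,
# leaving the birth year, just like truncating with int().
for r in kept:
    r["alder"] = 2010 - r["faarmnd"] // 100

print([(r["id"], r["alder"]) for r in kept])
```

Only the two persons in family type 2.1.1 remain, with ages 35 and 32.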

To create family-level income data, you must first create a new dataset for this purpose, because variables of different unit types cannot be mixed in a single dataset. The family number and a variable measuring work-related income at the individual level are imported into the new dataset, and the command collapse (sum) is then used to sum income to the family level (by(famnr)). The result is a dataset with family as its unit type.
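What collapse (sum) does with by(famnr) can be sketched in plain Python: person-level incomes are grouped on the family number and summed, giving one value per family. The rows and the helper `collapse_sum` are hypothetical, and the sketch assumes every person has a non-missing income.

```python
from collections import defaultdict

# Hypothetical person-level rows: (person_id, famnr, yrkesinnt).
person_income = [
    ("P001", "P001", 450_000),
    ("P002", "P001", 380_000),
    ("P003", "P001", 0),
    ("P004", "P004", 520_000),
    ("P005", "P004", 410_000),
]


def collapse_sum(rows):
    """Sum income per family number, like collapse (sum) yrkesinnt, by(famnr)."""
    totals = defaultdict(int)
    for _pid, fam, income in rows:
        totals[fam] += income
    return dict(totals)


familieinnt = collapse_sum(person_income)
print(familieinnt)
```

Five person rows become two family rows, keyed by family number.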

Finally, family income is merged into the individual-level dataset with the command merge.
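The merge can be pictured as a lookup of each person's ID in the family-level table. Because the family number equals the personal ID of the eldest family member, only that person finds a match. This plain-Python sketch uses invented data and represents a missing value as `None`; it illustrates the matching logic only and is not how microdata.no implements merge.

```python
# Hypothetical inputs: a person-level table and the family-level totals
# produced by the collapse step (keyed by famnr).
persondata = [
    {"PERSONID_1": "P001"},
    {"PERSONID_1": "P002"},
    {"PERSONID_1": "P003"},
    {"PERSONID_1": "P004"},
]
familiedata = {"P001": 830_000, "P004": 930_000}

# Match each person's ID against the family numbers. Only the eldest
# member of each family (whose ID is the family number) gets a value;
# everyone else is left missing (None).
for person in persondata:
    person["familieinnt"] = familiedata.get(person["PERSONID_1"])

print([(p["PERSONID_1"], p["familieinnt"]) for p in persondata])
```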

//Create an individual-level dataset of persons in families consisting of married couples with small children
create-dataset persondata
import BEFOLKNING_REGSTAT_FAMTYP 2010-01-01 as famtype
tabulate famtype
keep if famtype == '2.1.1'

//Add demographic information
import BEFOLKNING_KJOENN as kjønn
import BEFOLKNING_FOEDSELS_AAR_MND as faarmnd
generate alder = 2010 - int(faarmnd/100)
import BOSATTEFDT_BOSTED 2010-01-01 as bosted
generate fylke = substr(bosted,1,2)
import BEFOLKNING_BARN_I_HUSH 2010-01-01 as antbarn

//Create dataset for generating family level income (unit type = family)
create-dataset familiedata
import BEFOLKNING_REGSTAT_FAMNR 2010-01-01 as famnr
import INNTEKT_WYRKINNT 2010-01-01 as yrkesinnt
collapse (sum) yrkesinnt, by(famnr)
rename yrkesinnt familieinnt

//Merge family income into individual level dataset (unit type = persons)
merge familieinnt into persondata on PERSONID_1

//Generate family-level statistics. The family number is the personal ID of the eldest person in the family, so only that person received a family income in the merge. Dropping individuals with missing family income therefore leaves one row per family (unit type = family). All individual information now refers to the eldest person in the family.
use persondata
drop if sysmiss(familieinnt)

rename alder alder_eldst
rename kjønn kjønn_eldst

define-labels fylketekst '01' 'Østfold' '02' 'Akershus' '03' 'Oslo' '04' 'Hedmark' '05' 'Oppland' '06' 'Buskerud' '07' 'Vestfold' '08' 'Telemark' '09' 'Aust-Agder' '10' 'Vest-Agder' '11' 'Rogaland' '12' 'Hordaland' '14' 'Sogn og Fjordane' '15' 'Møre og Romsdal' '16' 'Sør-Trøndelag' '17' 'Nord-Trøndelag' '18' 'Nordland' '19' 'Troms' '20' 'Finnmark' '99' 'Uoppgitt'

assign-labels fylke fylketekst

tabulate fylke

histogram alder_eldst, discrete
histogram antbarn, percent

tabulate antbarn
tabulate antbarn, cellpct
tabulate antbarn kjønn_eldst

summarize familieinnt
barchart (mean) familieinnt, by(fylke)
barchart (mean) familieinnt, by(antbarn)
histogram familieinnt, freq
histogram familieinnt, by(antbarn) percent
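The last part of the script relies on the fact that dropping persons with missing family income leaves exactly one row per family, after which group means can be computed. The plain-Python sketch below shows that logic on invented rows; `mean_by` is a hypothetical helper that computes the same numbers a barchart (mean) ... by(...) would display, not the plotting itself.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical merged person-level rows; only the eldest member of each
# family carries familieinnt (others are None, i.e. system missing).
persondata = [
    {"PERSONID_1": "P001", "fylke": "03", "antbarn": 1, "familieinnt": 830_000},
    {"PERSONID_1": "P002", "fylke": "03", "antbarn": 1, "familieinnt": None},
    {"PERSONID_1": "P004", "fylke": "03", "antbarn": 2, "familieinnt": 930_000},
    {"PERSONID_1": "P007", "fylke": "12", "antbarn": 1, "familieinnt": 610_000},
]

# Equivalent of: drop if sysmiss(familieinnt), leaving one row per family.
families = [p for p in persondata if p["familieinnt"] is not None]


def mean_by(rows, key):
    """Mean family income per group, the values behind barchart (mean) familieinnt, by(key)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r["familieinnt"])
    return {k: mean(v) for k, v in groups.items()}


print(len(families))
print(mean_by(families, "fylke"))
print(mean_by(families, "antbarn"))
```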