How to aggregate data to family level

Individuals can be linked to a family number, which can be used to aggregate data to family level. All individuals in the same family are registered with the same family number, which is the personal ID number of the eldest person in the family.

In the example below, an individual-level dataset is created first. It is then filtered down to persons in families consisting of married couples with small children (code value 2.1.1). Next, demographic information is imported into the dataset.

To create family-level income data, a separate dataset must be created for this purpose, since variables of different unit types cannot be mixed in a single dataset. A variable measuring work-related income at individual level is imported into the new dataset together with the family number, and the command collapse (sum) then sums income to family level (by(famnr)). The result is a dataset with family as unit type.
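
As a rough illustration of what collapse (sum) with by(famnr) does, the sketch below reproduces the same grouping logic in plain Python on a handful of made-up rows. It is not microdata.no code: the person ids, incomes, and the function name collapse_sum are hypothetical, and skipping missing incomes is an assumption made for the sketch.

```python
from collections import defaultdict

# Hypothetical person-level rows: (person_id, famnr, income).
# famnr is the personal id of the eldest family member, as described above.
persons = [
    ("P1", "P1", 500_000),
    ("P2", "P1", 300_000),
    ("P3", "P1", None),      # child with no registered income
    ("P4", "P4", 450_000),
    ("P5", "P4", 250_000),
]


def collapse_sum(rows):
    """Sum income within each family number.

    Missing incomes (None) are skipped; this is an assumption of the sketch.
    """
    totals = defaultdict(int)
    for _person_id, famnr, income in rows:
        if income is not None:
            totals[famnr] += income
    return dict(totals)


# One entry per family instead of one row per person.
family_income = collapse_sum(persons)
print(family_income)
```

The five person rows reduce to two entries, one per family number, which mirrors the change of unit type from person to family.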

Finally, family income is merged into the individual-level dataset using the command merge.
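
The effect of this merge can be sketched in plain Python with made-up data. Because the family number is the personal id of the eldest family member, matching family rows against person ids gives a value only to that person; everyone else ends up with a missing family income. The ids, ages, and the function name merge_on_person_id are hypothetical, and the matching behaviour is an assumption based on the comments in the script, not a description of how the platform implements merge.

```python
# Hypothetical family-level table: family number -> summed income.
family_income = {"P1": 800_000, "P4": 700_000}

# Hypothetical person-level table keyed by person id.
persons = {
    "P1": {"age": 41},
    "P2": {"age": 39},
    "P3": {"age": 4},
    "P4": {"age": 36},
    "P5": {"age": 35},
}


def merge_on_person_id(person_rows, income_by_famnr):
    """Attach family income to the person whose id equals the family number."""
    merged = {}
    for person_id, attrs in person_rows.items():
        row = dict(attrs)
        # None marks a missing value: no family number matches this person id.
        row["familyincome"] = income_by_famnr.get(person_id)
        merged[person_id] = row
    return merged


merged = merge_on_person_id(persons, family_income)

# Dropping rows with missing family income leaves one row per family,
# carrying the attributes of the eldest family member.
one_row_per_family = {
    person_id: row
    for person_id, row in merged.items()
    if row["familyincome"] is not None
}
print(sorted(one_row_per_family))
```

This is why the script later drops persons with missing family income: what remains is one row per family, described by the characteristics of its eldest member.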

//Connect to databank
require no.ssb.fdb:1 as fdb1

//Create an individual level dataset consisting of persons in families defined by married couples with small children
create-dataset persondata
import fdb1/BEFOLKNING_REGSTAT_FAMTYP 2010-01-01 as famtype
tabulate famtype
keep if famtype == '2.1.1'

//Add demographic information
import fdb1/BEFOLKNING_KJOENN as sex
import fdb1/BEFOLKNING_FOEDSELS_AAR_MND as birthyearmonth
//birthyearmonth has the form YYYYMM, so int(birthyearmonth/100) extracts the birth year
generate age = 2010 - int(birthyearmonth/100)

import fdb1/BOSATTEFDT_BOSTED 2010-01-01 as municipality
//The first two digits of the municipality code identify the county
generate county = substr(municipality,1,2)

import fdb1/BEFOLKNING_BARN_I_HUSH 2010-01-01 as children

//Create dataset for generating family level income (unit type = family)
create-dataset familydata
import fdb1/BEFOLKNING_REGSTAT_FAMNR 2010-01-01 as famnr
import fdb1/INNTEKT_WYRKINNT 2010-01-01 as income
collapse (sum) income, by(famnr)
rename income familyincome

//Merge family income into individual level dataset (unit type = persons)
merge familyincome into persondata on PERSONID_1

//Generate family level statistics.
//The family number consists of the personal id of the eldest person in the family,
//so only that person receives a value for familyincome in the merge.
//Removing individuals with missing family level income therefore leaves one row per family (unit type = family).
//All individual information is then associated with the eldest person in the family.
use persondata
drop if sysmiss(familyincome)

rename age age_oldest
rename sex sex_oldest

define-labels countytxt '01' 'Østfold' '02' 'Akershus' '03' 'Oslo' '04' 'Hedmark' '05' 'Oppland' '06' 'Buskerud' '07' 'Vestfold' '08' 'Telemark' '09' 'Aust-Agder' '10' 'Vest-Agder' '11' 'Rogaland' '12' 'Hordaland' '14' 'Sogn og Fjordane' '15' 'Møre og Romsdal' '16' 'Sør-Trøndelag' '17' 'Nord-Trøndelag' '18' 'Nordland' '19' 'Troms' '20' 'Finnmark' '99' 'Unspecified'
assign-labels county countytxt

tabulate county

histogram age_oldest, discrete
histogram children, percent

tabulate children
tabulate children, cellpct
tabulate children sex_oldest

summarize familyincome
barchart (mean) familyincome, by(county)
barchart (mean) familyincome, by(children)
histogram familyincome, freq
histogram familyincome, by(children) percent