
2.4 Datasets containing regular time measurements (panel data)

To perform advanced regression analyses in the form of panel data analysis, the data must be organized differently than for ordinary regression analyses. Panel data are datasets in which each unit has values for all included variables at each of a specified number of measurement times. This has two advantages. The time component can be included in the analyses, and the datasets become much larger, which often results in analyses of better quality.

There are several panel data analysis techniques, and they differ in the assumptions they make about how the variables vary over time. The most commonly used variants are fixed-effects and random-effects analysis. These are covered in section 5.9.

Data to be used in panel data analysis must be imported as follows:

create-dataset <dataset>

import-panel <variable list> <measurement date list> as <alias>


Example: Data matrix using import-panel (3 variables, 3 measurements)

ID      Date        Variable 1  Variable 2  Variable 3
123456  2000-01-01  1           200000      0301
123456  2001-01-01  1           210000      0301
123456  2002-01-01  2           215000      1201
135791  2000-01-01  2           305011      1101
135791  2001-01-01  2           301000      1101
135791  2002-01-01  3           299000      0301
147036  2000-01-01  1           150000      2030
147036  2001-01-01  1           159000      2030
147036  2002-01-01  3           199000      0301
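The following Python sketch makes the long layout of the example table concrete. It is an illustration only, not microdata.no code. The names `measurements` and `to_long_rows` are hypothetical. The sketch shows that each unit contributes one row per measurement date, using the values from the table.

```python
from typing import Dict, List, Tuple

# Hypothetical input (values taken from the example table):
# for each unit ID, the values (Variable 1, Variable 2, Variable 3) per measurement date.
measurements: Dict[str, Dict[str, Tuple[int, int, str]]] = {
    "123456": {
        "2000-01-01": (1, 200000, "0301"),
        "2001-01-01": (1, 210000, "0301"),
        "2002-01-01": (2, 215000, "1201"),
    },
    "135791": {
        "2000-01-01": (2, 305011, "1101"),
        "2001-01-01": (2, 301000, "1101"),
        "2002-01-01": (3, 299000, "0301"),
    },
    "147036": {
        "2000-01-01": (1, 150000, "2030"),
        "2001-01-01": (1, 159000, "2030"),
        "2002-01-01": (3, 199000, "0301"),
    },
}

def to_long_rows(data: Dict[str, Dict[str, Tuple[int, int, str]]]) -> List[tuple]:
    """Return one (ID, date, var1, var2, var3) row per unit and measurement date."""
    rows = []
    for unit_id in sorted(data):
        # ISO dates sort correctly as strings
        for date in sorted(data[unit_id]):
            rows.append((unit_id, date, *data[unit_id][date]))
    return rows

rows = to_long_rows(measurements)
for row in rows:
    print(row)
```

With 3 units and 3 measurement dates, the result has 3 × 3 = 9 rows, matching the table.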
NOTE
  • Panel datasets quickly become very large, since every unit/individual in the dataset is measured T times, where T is the number of measurements. This is especially true if you also import many variables.

  • A good practice when creating panel datasets is to first create a population of suitable size, then copy its units into a new, empty dataset, and finally import the panel data into that dataset.
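The growth in dataset size described above is simple multiplication. With N units and T measurement dates, a panel has N × T rows, and each imported variable adds one value per row. The following Python sketch shows that arithmetic. The function name `panel_size` and the figures in the example are hypothetical.

```python
from typing import Tuple

def panel_size(n_units: int, n_dates: int, n_variables: int) -> Tuple[int, int]:
    """Return (rows, values) for a panel with one row per unit and measurement date.

    The unit identifier and date columns are not counted in `values`.
    """
    rows = n_units * n_dates
    values = rows * n_variables
    return rows, values

# Hypothetical example: 200 000 units, 4 measurement dates, 3 imported variables.
rows, values = panel_size(200_000, 4, 3)
print(rows, values)  # 800000 2400000
```

Doubling either the number of measurement dates or the number of variables doubles the number of stored values. This is why it pays to restrict the population before importing panel data.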


Example: Create a population, copy its units into a new dataset, and finally import panel data for that population (residents of Oslo as of 1 January 2010, aged 18-39)

 
require no.ssb.fdb:23 as db

create-dataset population 
import db/BOSATTEFDT_BOSTED 2010-01-01 as residence
import db/BEFOLKNING_FOEDSELS_AAR_MND as birth_year_month
generate age = 2010 - int(birth_year_month/100)
keep if age >= 18 & age < 40 & residence == '0301'

clone-units population paneldata

use paneldata
import-panel db/INNTEKT_WLONN db/SIVSTANDFDT_SIVSTAND db/BOSATTEFDT_BOSTED 2011-12-31 2012-12-31 2013-12-31 2014-12-31
 

A panel dataset is created with a single import-panel command, and several batches cannot be imported into the same panel dataset. Nor is it possible to mix cross-sectional data or event-based data with panel data. However, you can add variables that contain fixed information (gender, date of birth, country of birth, etc.) using the merge command.
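Fixed information has only one value per unit. Merging it into a panel dataset therefore repeats that value on every row belonging to the unit. The following Python sketch illustrates the idea with hypothetical data and a hypothetical helper `merge_fixed`. It does not reproduce the syntax or the detailed behaviour of the merge command.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical panel rows: (unit ID, date, income)
panel: List[Tuple[str, str, int]] = [
    ("123456", "2011-12-31", 200000),
    ("123456", "2012-12-31", 210000),
    ("135791", "2011-12-31", 305011),
    ("135791", "2012-12-31", 301000),
]

# Hypothetical fixed information: one value per unit (coding is illustrative).
gender: Dict[str, str] = {"123456": "2", "135791": "1"}

def merge_fixed(rows: List[tuple], fixed: Dict[str, str]) -> List[Tuple[object, ...]]:
    """Append the unit's fixed value to each of its panel rows (None if the unit has no value)."""
    return [row + (fixed.get(row[0]),) for row in rows]

merged = merge_fixed(panel, gender)
for row in merged:
    print(row)
```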

It is also possible to create a panel dataset by converting an existing cross-sectional dataset into panel (long) format using the reshape-to-panel command. See section 2.9.1 for a description of this command.
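Conceptually, conversion to panel (long) format replaces one row per unit, with one column per year, by one row per unit and year. The following Python sketch shows that transformation on hypothetical data. The column naming pattern (a variable name followed by a year) and the helper `wide_to_long` are assumptions made for the illustration. They do not describe the actual requirements of reshape-to-panel.

```python
from typing import Dict, List, Tuple

# Hypothetical wide (cross-sectional) dataset: one row per unit, one income column per year.
wide: Dict[str, Dict[str, int]] = {
    "123456": {"income2011": 200000, "income2012": 210000},
    "135791": {"income2011": 305011, "income2012": 301000},
}

def wide_to_long(data: Dict[str, Dict[str, int]], prefix: str) -> List[Tuple[str, int, int]]:
    """Return (unit ID, year, value) rows from columns named <prefix><year>."""
    rows = []
    for unit_id in sorted(data):
        for column in sorted(data[unit_id]):
            if column.startswith(prefix):
                year = int(column[len(prefix):])  # the part after the prefix is the year
                rows.append((unit_id, year, data[unit_id][column]))
    return rows

long_rows = wide_to_long(wide, "income")
for row in long_rows:
    print(row)
```

The two units with two yearly columns each become four rows, one per unit and year.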