2.4 Datasets containing regular time measurements (panel data)
Panel data analysis is an advanced form of regression analysis that requires the data to be organized differently than for ordinary regression. A panel dataset is one in which each unit has a value for every included variable at each of a specified number of measurement dates. This has two advantages: the time component can be included in the analysis, and the dataset contains many more observations, which often improves the quality of the analysis.
There are several panel data analysis techniques, which differ in the assumptions they make about how the variables vary over time. Common variants are fixed-effects and random-effects analysis. These are reviewed in section 5.9.
Data to be used in panel data analysis must be imported as follows:
create-dataset <dataset>
import-panel <variable list> <measurement date list> as <alias>
Example: Data matrix produced by import-panel (3 variables, 3 measurement dates)
ID | Date | Variable 1 | Variable 2 | Variable 3 |
---|---|---|---|---|
123456 | 2000-01-01 | 1 | 200000 | 0301 |
123456 | 2001-01-01 | 1 | 210000 | 0301 |
123456 | 2002-01-01 | 2 | 215000 | 1201 |
135791 | 2000-01-01 | 2 | 305011 | 1101 |
135791 | 2001-01-01 | 2 | 301000 | 1101 |
135791 | 2002-01-01 | 3 | 299000 | 0301 |
147036 | 2000-01-01 | 1 | 150000 | 2030 |
147036 | 2001-01-01 | 1 | 159000 | 2030 |
147036 | 2002-01-01 | 3 | 199000 | 0301 |
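
The layout of the data matrix above (one row per unit per measurement date) can be sketched in Python. This is only an illustration of the structure, not microdata.no code; the helper names `dates_per_unit` and `is_balanced` are assumptions made for this example, and the rows are copied from the table.

```python
from collections import defaultdict

# Rows copied from the example table:
# (unit ID, measurement date, variable 1, variable 2, variable 3)
rows = [
    ("123456", "2000-01-01", 1, 200000, "0301"),
    ("123456", "2001-01-01", 1, 210000, "0301"),
    ("123456", "2002-01-01", 2, 215000, "1201"),
    ("135791", "2000-01-01", 2, 305011, "1101"),
    ("135791", "2001-01-01", 2, 301000, "1101"),
    ("135791", "2002-01-01", 3, 299000, "0301"),
    ("147036", "2000-01-01", 1, 150000, "2030"),
    ("147036", "2001-01-01", 1, 159000, "2030"),
    ("147036", "2002-01-01", 3, 199000, "0301"),
]


def dates_per_unit(rows):
    """Group the measurement dates by unit ID."""
    grouped = defaultdict(list)
    for unit_id, date, *_ in rows:
        grouped[unit_id].append(date)
    return dict(grouped)


def is_balanced(rows):
    """Return True if every unit is measured on exactly the same set of dates."""
    date_sets = {frozenset(dates) for dates in dates_per_unit(rows).values()}
    return len(date_sets) == 1


print(dates_per_unit(rows))
print(is_balanced(rows))
```

Each unit ID appears once per measurement date, so the unit ID and the date together identify a row.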
Panel datasets quickly become very large, since every unit in the dataset is measured T times, where T is the number of measurement dates. This is especially true if you also import many variables.
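The growth in size can be worked out directly: a panel with N units and T measurement dates has N × T rows, and each imported variable adds one value per row. The following Python sketch shows the arithmetic; the function names and the population size of 100,000 are hypothetical and chosen only for illustration.

```python
def panel_rows(n_units, n_dates):
    """Number of rows in a panel dataset: one row per unit per measurement date."""
    return n_units * n_dates


def panel_values(n_units, n_dates, n_variables):
    """Number of stored variable values: one per row per imported variable."""
    return panel_rows(n_units, n_dates) * n_variables


# The example data matrix: 3 units, 3 measurement dates, 3 variables.
print(panel_rows(3, 3))
print(panel_values(3, 3, 3))

# A hypothetical population of 100,000 units, 4 measurement dates, 3 variables.
print(panel_rows(100_000, 4))
print(panel_values(100_000, 4, 3))
```

Limiting the population before importing therefore reduces the size of the panel dataset by a factor of T for every unit removed.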
A good practice when creating panel datasets is to first create a population of appropriate size, then copy its units into a new, empty dataset, and finally import the panel data into that new dataset.
Example: Create a population, clone its units into a new dataset, and import panel data for that population (residents of Oslo as of January 1, 2010, aged 18-39)
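// Connect to the data source and give it the alias db
require no.ssb.fdb:23 as db
// Build the population: residents of Oslo (0301) on 2010-01-01, aged 18-39
create-dataset population
import db/BOSATTEFDT_BOSTED 2010-01-01 as residence
import db/BEFOLKNING_FOEDSELS_AAR_MND as birth_year_month
generate age = 2010 - int(birth_year_month/100)
keep if age >= 18 & age < 40 & residence == '0301'
// Copy the units of the population into a new, empty dataset
clone-units population paneldata
use paneldata
// Import three variables at four measurement dates in a single command
import-panel db/INNTEKT_WLONN db/SIVSTANDFDT_SIVSTAND db/BOSATTEFDT_BOSTED 2011-12-31 2012-12-31 2013-12-31 2014-12-31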
A panel dataset is created with a single import-panel command. It is not possible to import several batches into the same panel dataset, nor to mix cross-sectional or event-based data with panel data. However, you can add variables that contain fixed information (gender, date of birth, country of birth, etc.) using the merge command.
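Conceptually, adding fixed information to a panel dataset means that each unit's time-invariant value is repeated on every one of its rows. The Python sketch below illustrates that idea only; it does not show how the merge command is implemented. The function `attach_fixed`, the field names, and the birth years are hypothetical.

```python
# Two measurement dates for each of two units (income values from the example table).
panel = [
    {"id": "123456", "date": "2000-01-01", "income": 200000},
    {"id": "123456", "date": "2001-01-01", "income": 210000},
    {"id": "135791", "date": "2000-01-01", "income": 305011},
    {"id": "135791", "date": "2001-01-01", "income": 301000},
]

# Fixed (time-invariant) information, one entry per unit. Hypothetical values.
fixed = {
    "123456": {"birth_year": 1985},
    "135791": {"birth_year": 1979},
}


def attach_fixed(panel, fixed):
    """Return new rows where each unit's fixed attributes are repeated on all its rows.

    Rows for units without fixed information are returned unchanged.
    """
    return [{**row, **fixed.get(row["id"], {})} for row in panel]


merged = attach_fixed(panel, fixed)
for row in merged:
    print(row)
```

The number of rows is unchanged; only the number of columns grows.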
It is also possible to create a panel dataset by converting an existing cross-sectional dataset into panel (long) format using the reshape-to-panel command. See section 2.9.1 for a review of this command.
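The general idea of converting from cross-sectional (wide) format to panel (long) format can be sketched in Python as follows. This is a generic illustration and makes no claim about the exact behavior or syntax of reshape-to-panel; the function `wide_to_long` and column names such as `income_2000` are assumptions made for this example.

```python
def wide_to_long(wide_rows, stub, periods):
    """Turn one row per unit (columns stub_<period>) into one row per unit per period."""
    long_rows = []
    for row in wide_rows:
        for period in periods:
            long_rows.append(
                {"id": row["id"], "period": period, stub: row[f"{stub}_{period}"]}
            )
    return long_rows


# Wide format: one row per unit, one column per measurement year
# (values taken from Variable 2 in the example data matrix).
wide = [
    {"id": "123456", "income_2000": 200000, "income_2001": 210000, "income_2002": 215000},
    {"id": "135791", "income_2000": 305011, "income_2001": 301000, "income_2002": 299000},
]

long_format = wide_to_long(wide, "income", [2000, 2001, 2002])
for row in long_format:
    print(row)
```

Two wide rows with three yearly columns become six long rows, one per unit per year.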