2.9.1 Restructuring from cross-sectional data to panel data
Statistics and analyses in microdata.no normally use datasets created with the import command. These are datasets of the "wide" type, where the information about each unit in a population is structured horizontally, at the variable level. The reshape-to-panel command changes the data structure to long format (panel format), where the information about each unit (individual) is structured vertically, at the observation/record level.
Variables that are measured at several points in time, and that you want in long/panel format, must be specified in reshape-to-panel by their prefix, i.e. the letters that precede the suffix in the original variable names in the wide dataset. Other variables, for which no prefix is specified (typically information that is measured only once, such as gender or country of birth), are automatically treated as fixed information, and their values are repeated for all sublevels of each unit.
The suffixes of the original wide variables with repeated measurements must consist of integers. These form the sublevel of the long/panel dataset. Typical suffixes are two- or four-digit years, or other time indications that also identify the month or quarter, e.g. 201901, 201902 etc. You are free to choose other types of suffixes as long as they consist of digits [1]. Suffixes such as 1, 2, 3, 4 etc. are also allowed.
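The naming rule above can be sketched in Python. This is only an illustration of the rule as described in this section, not microdata.no code: the function name split_wide_name is hypothetical, and the handling of "_" follows the footnote (underscores are accepted in the suffix and dropped from the sublevel).

```python
import re

def split_wide_name(name, prefix):
    """Split a wide variable name into (prefix, sublevel), or return None.

    Assumption for this sketch: the suffix is everything after the prefix,
    must start with a digit, and may contain only digits and underscores.
    """
    if not name.startswith(prefix):
        return None
    suffix = name[len(prefix):]
    if not re.fullmatch(r"[0-9][0-9_]*", suffix):
        return None
    # Underscores are removed, so "2019_01_01" becomes "20190101".
    return prefix, suffix.replace("_", "")

two_digit_year = split_wide_name("lønn18", "lønn")
with_underscores = split_wide_name("sivstatus2019_01_01", "sivstatus")
not_numeric = split_wide_name("lønnA", "lønn")
```

Here two_digit_year is ("lønn", "18"), with_underscores is ("sivstatus", "20190101"), and not_numeric is None because the suffix is not made of digits.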
The illustration below shows how the restructuring works under the hood. The example uses a wide-format dataset that contains the variables sivstand18-sivstand20, lønn18-lønn20, and kjønn. Marital status (sivstand) and wage (lønn) are thus measured for the years 2018-2020, while gender (kjønn) is fixed information that is measured only once. The dataset is converted to long format with the command reshape-to-panel sivstand lønn. The variable date@panel is created automatically and contains the sublevel, which in this case is a two-digit year.
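As a rough analogy, the same transformation can be written in plain Python on a list of records. This is a sketch of the idea, not how microdata.no implements it: the function below merely borrows the command's name, the id column is a hypothetical unit identifier, and all values are invented for illustration.

```python
def reshape_to_panel(wide_rows, prefixes):
    """Illustrative wide-to-long reshape on plain dicts (not microdata.no code)."""
    long_rows = []
    for row in wide_rows:
        fixed = {}        # variables without a listed prefix
        by_sublevel = {}  # {sublevel: {prefix: value}}
        for name, value in row.items():
            # Find a listed prefix that is followed by a purely numeric suffix.
            prefix = next(
                (p for p in prefixes
                 if name.startswith(p) and name[len(p):].isdigit()),
                None,
            )
            if prefix is None:
                fixed[name] = value
            else:
                sublevel = int(name[len(prefix):])
                by_sublevel.setdefault(sublevel, {})[prefix] = value
        # One long record per sublevel; fixed values are repeated on each.
        for sublevel in sorted(by_sublevel):
            measured = {p: by_sublevel[sublevel].get(p) for p in prefixes}
            long_rows.append({**fixed, "date@panel": sublevel, **measured})
    return long_rows

# Invented example data: two units, measured 2018-2020.
wide = [
    {"id": 1, "kjønn": 1, "sivstand18": 1, "sivstand19": 2, "sivstand20": 2,
     "lønn18": 400000, "lønn19": 420000, "lønn20": 450000},
    {"id": 2, "kjønn": 2, "sivstand18": 1, "sivstand19": 1, "sivstand20": 1,
     "lønn18": 380000, "lønn19": 390000, "lønn20": 410000},
]
panel = reshape_to_panel(wide, ["sivstand", "lønn"])
```

Each unit now occupies three records, one per year, and kjønn is repeated on all of them.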
The reshape-to-panel command has several uses:
- A more flexible alternative to import-panel, which also creates panel datasets but has some limitations. Among other things, import-panel requires all variables to have valid measurement dates for all measurements, which can be challenging if cross-sectional variables are included in the dataset (variables that only have values on given annual, quarterly or monthly dates). The reshape-to-panel command allows all combinations of variables.
- Some analyses require a long format, and the support for this is now greatly improved. In addition, you have access to all the flexibility and functionality of wide datasets, and can do all the data preparation in that format before restructuring to long format afterwards. This is useful if you need to compare or perform operations on variable values across sublevels (over time), e.g. compare wages in 2020 with wages in 2019.
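The last point can be illustrated with a small Python sketch (plain dicts and invented values, not microdata.no code; the id column is a hypothetical unit identifier). In wide format, both wage values sit on the same record, so the comparison is a single expression. In long format, the values must first be collected across records.

```python
# Wide format: both years are variables on the same record.
wide_row = {"id": 1, "lønn19": 420000, "lønn20": 450000}
change_wide = wide_row["lønn20"] - wide_row["lønn19"]

# Long format: the same values are spread over two records.
long_rows = [
    {"id": 1, "date@panel": 19, "lønn": 420000},
    {"id": 1, "date@panel": 20, "lønn": 450000},
]
# Collect the unit's wages by sublevel before comparing them.
by_year = {r["date@panel"]: r["lønn"] for r in long_rows if r["id"] == 1}
change_long = by_year[20] - by_year[19]
```

Both approaches give the same difference, but the wide version needs no lookup across records, which is why it can be convenient to derive such variables before restructuring.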
[Example: Restructure datasets from wide to long format](i18n\en\docusaurus-plugin-content-docs\current\eksempel\Sammensatte operasjoner\Restrukturere datasett fra wide- til long-format.md)
Footnotes
1. The character "_" is also allowed in suffixes, e.g. "sivstatus2019_01_01". However, once the reshape operation is completed, the underscore is removed from the sublevels. For example, with the suffix "2019_01_01", the corresponding sublevel becomes "20190101" in the transformed dataset.