Now you can restructure datasets from wide to long format
The new reshape-to-panel command lets you restructure your data to long format, i.e. data where the information is organized vertically, as observations (records).
Statistics and analyses in microdata.no normally use datasets created through the import command. These are datasets of the “wide” type, where information about each unit in a population is structured horizontally, at the variable level. The new reshape-to-panel command makes it possible to change the data structure to long format (panel format), where information about each unit (individual) is structured vertically, at the observation (record) level.
Variables that are measured at several points in time, and that you want in long (panel) format, must be listed in the reshape-to-panel command by prefix, i.e. the leading letters of the original variable names in the wide dataset. Variables for which no prefix is specified, typically information that is only measured once (gender, country of birth, etc.), are automatically treated as fixed information, and their values are repeated for all sub-levels of each unit.
The illustration below shows how the restructuring takes place under the hood. The example shows a wide-format dataset that contains the variables sivstand18-sivstand20, lønn18-lønn20, and kjønn. Marital status (sivstand) and wage (lønn) are thus measured for the years 2018-2020, while gender (kjønn) is a fixed piece of information that is only measured once. The dataset is converted to long format using the command reshape-to-panel sivstand lønn. The variable date@panel is created automatically and contains the sub-level, which in this case is a two-digit year.
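The same mechanics can be sketched in a few lines of Python. This is only an illustration of the wide-to-long idea, not the microdata.no implementation: the function name reshape_wide_to_long, the unit_id column and the sample values are assumptions made for the example.

```python
import re

def reshape_wide_to_long(rows, prefixes):
    """Illustrative wide-to-long reshape (not the microdata.no implementation)."""
    # Match columns such as "lønn18": a listed prefix followed by a numeric suffix.
    alternatives = "|".join(re.escape(p) for p in prefixes)
    pattern = re.compile(rf"^({alternatives})(\d+)$")
    long_rows = []
    for row in rows:
        fixed = {}      # values measured once, repeated on every sub-level
        by_suffix = {}  # sub-level (e.g. 18) -> {prefix: value}
        for name, value in row.items():
            match = pattern.match(name)
            if match:
                prefix, suffix = match.groups()
                by_suffix.setdefault(int(suffix), {})[prefix] = value
            else:
                fixed[name] = value
        # Emit one long-format record per sub-level, in ascending order.
        for suffix in sorted(by_suffix):
            long_rows.append({**fixed, "date@panel": suffix, **by_suffix[suffix]})
    return long_rows

# Hypothetical sample data: one row per unit, one column per variable and year.
wide = [
    {"unit_id": 1, "sivstand18": 1, "sivstand19": 1, "sivstand20": 2,
     "lønn18": 400000, "lønn19": 420000, "lønn20": 450000, "kjønn": 1},
    {"unit_id": 2, "sivstand18": 2, "sivstand19": 2, "sivstand20": 2,
     "lønn18": 510000, "lønn19": 530000, "lønn20": 560000, "kjønn": 2},
]

long_rows = reshape_wide_to_long(wide, ["sivstand", "lønn"])
for record in long_rows:
    print(record)
```

Each unit ends up with three records, one per year, and kjønn is copied onto all of them because no prefix was given for it.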
The reshape-to-panel command has several uses:
- A more flexible alternative to the import-panel command, which also creates panel datasets but has some limitations. Among other things, all variables there must have valid measurement dates for all measurements, which can be challenging if cross-sectional variables are included in the dataset (variables that only have values on given annual, quarterly or monthly dates). The reshape-to-panel command allows all combinations of variables.
- Some analyses require long format, and the support for this is now greatly improved. In addition, you keep access to all the flexibility and functionality associated with wide datasets, and can do the entire adaptation in that format before restructuring to long format afterwards. This is useful if you need to compare and perform operations on variable values across sub-levels (over time), e.g. when creating a condition based on a comparison between wages in 2020 and 2019.
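The "adapt in wide format first, reshape afterwards" workflow can be illustrated with a small Python sketch. It is not microdata.no syntax: the unit_id column, the wage_increase flag and the sample values are hypothetical and chosen only for the example.

```python
# Hypothetical wide-format data: wages for 2019 and 2020 sit side by side per unit.
wide = [
    {"unit_id": 1, "lønn19": 420000, "lønn20": 450000},
    {"unit_id": 2, "lønn19": 530000, "lønn20": 500000},
]

# In wide format, comparing two years is a simple same-row operation.
for row in wide:
    row["wage_increase"] = int(row["lønn20"] > row["lønn19"])

# Minimal reshape for the single prefix "lønn". The derived flag has no prefix,
# so it is treated as fixed information and repeated on every sub-level.
long_rows = [
    {
        "unit_id": row["unit_id"],
        "wage_increase": row["wage_increase"],
        "date@panel": year,
        "lønn": row[f"lønn{year}"],
    }
    for row in wide
    for year in (19, 20)
]

for record in long_rows:
    print(record)
```

Doing the comparison after the reshape would instead require matching each record with the same unit's record for the previous year, which is why it is convenient to derive such conditions before restructuring.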