Reasons for discrepancies between results from microdata.no and Statistics Norway's statistics
Results produced in microdata.no should, as a general rule, be close to the figures published on Statistics Norway's statistics pages, including Statistikkbanken. The numbers are rarely completely identical, however. There are several reasons for this:
- The data source matters a great deal for the size of the deviations. Data imported from the various registers deviate to varying degrees from official Statistics Norway statistics.
- Different data sources give different figures. Statistics on the same topic can be built on different sources, so it is important to compare figures obtained from the same data source.
- Different measurement times: Some statistics use annual, quarterly or monthly figures, while others use status figures measured on given dates, e.g. as of 1/1, 31/12 or the reference week containing 16/11. It is therefore important to use the same measurement period or date when comparing figures.
- Different populations: Choices are often made about the population used for the statistics in question. This must be taken into account when creating statistics in microdata.no with the intention of reproducing official Statistics Norway figures. For example, there may be age restrictions, or a choice related to resident status (permanent residents only, or all residents including people with a D-number, dnr).
- Differences in production/preparation (despite the same data source): Even if you compare figures obtained from the same data source/register, there may be discrepancies because Statistics Norway carries out consistency processing/"cleaning" of the data. The files you import and work with in microdata.no are not necessarily these finished production files.
- Privacy filters in microdata.no add noise to frequencies and censor extreme values: Among the various privacy filters used in microdata.no, it is in particular the noise filter and the censoring of extreme values (winsorization) that can contribute to deviations from official statistics. Admittedly, the noise filter adds an uncertainty of only +/- 5 (+/- 10 when measuring the difference between two numbers), so for large numbers this matters little. Winsorization, however, can affect means, standard deviations and min/max values produced by, among others, the tabulate and summarize commands. The degree of deviation depends on the distribution of values: the closer the variable is to a normal distribution, the smaller the deviations.
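The effect of winsorization on a mean can be illustrated with a small Python sketch. The 1st/99th percentile cut-offs, the nearest-rank percentile rule and the example figures are assumptions made for illustration only; they are not a description of how microdata.no implements its filter.

```python
from statistics import mean


def winsorize(values, lower_pct=1, upper_pct=99):
    """Clamp values outside the given percentiles to the percentile values.

    The cut-offs and the nearest-rank percentile rule are illustrative
    assumptions, not the documented microdata.no procedure.
    """
    ordered = sorted(values)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile, computed with integer ceiling division.
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    low, high = percentile(lower_pct), percentile(upper_pct)
    return [min(max(v, low), high) for v in values]


# Hypothetical, evenly spread variable: the values 1..200.
symmetric = list(range(1, 201))

# Hypothetical, strongly skewed variable: 198 ordinary values and 2 extremes.
skewed = [400_000] * 198 + [50_000_000] * 2

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    print(name, "raw mean:", mean(data), "winsorized mean:", mean(winsorize(data)))
```

In this sketch the mean of the evenly spread variable barely moves, while the mean of the skewed variable drops sharply because the two extreme values are pulled down to the 99th-percentile value.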
Methods to find the cause of deviations
- Check "About the statistics" for the relevant Statistics Norway statistics. Such a description exists for all of Statistics Norway's statistics and states, among other things, which population is used, how the data has been processed for consistency, and whether other conditions can affect the figures. Relevant information is often also attached directly to the published statistics (e.g. in the table title and in the footnote).
- Check the variable description of the variable you use to create your statistics in microdata.no. There you will usually find a description of the population delimitation for this variable, among other things. It should match the delimitation used by Statistics Norway.
- Make sure in particular that you have chosen the same age limits, the same selection of permanent residents (everyone with a national identity number, fnr) vs. all residents (fnr and dnr), the same measurement time/period, and the same geographical delimitation.
- Do not compare "apples and oranges": Statistics Norway often has different versions of the same type of statistics, based on different data sources. Make sure that you compare the microdata.no figures with the corresponding Statistics Norway figures produced from the same type of data material.
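As a first step when investigating a deviation in a frequency, it can help to check whether it is small enough to be explained by the noise filter alone. The following is a minimal Python sketch, assuming the +/- 5 bound per noised count stated above; the function names and the example figures are hypothetical.

```python
def max_noise_deviation(noisy_counts=1, noise_per_count=5):
    """Largest deviation that the noise filter alone could explain.

    Assumes each noised count is off by at most `noise_per_count`, so a
    difference between two noised counts can be off by twice that.
    """
    return noisy_counts * noise_per_count


def explained_by_noise(microdata_value, official_value, noisy_counts=1):
    """Return True if the gap is within the assumed noise bound."""
    gap = abs(microdata_value - official_value)
    return gap <= max_noise_deviation(noisy_counts)


# Hypothetical figures: a single count compared with an official count.
print(explained_by_noise(5_367_583, 5_367_580))

# Hypothetical figures: a gap far larger than the noise bound.
print(explained_by_noise(5_391_369, 5_367_580))

# A difference between two noised counts tolerates +/- 10.
print(explained_by_noise(110, 102, noisy_counts=2))
```

If the gap exceeds the bound, noise cannot be the whole explanation, and the population, measurement time and data source should be checked as described in the list above.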
You will find more information about this topic in the course material for the themed course "Working with data you don't see" (second half):