
4. Descriptive variable statistics

Microdata.no provides various techniques for data exploration. The most basic and useful tools are frequency tabulations (one-way or cross tables) and summary statistics (for numerical/metric variables). Data can also be visualized through histograms, bar charts, pie charts or anonymised scatter plots (hexbin plots).

The microdata.no analysis system currently has the following commands available for the production of descriptive statistics:

  • tabulate

  • summarize

  • boxplot

  • hexbin

  • piechart

  • histogram

  • barchart

  • sankey

In addition, the following commands can be used on panel data (see section 5.9.1):

  • tabulate-panel

  • summarize-panel

  • transitions-panel

Through various options, alternative representations of the same distributions may be displayed, and specified units can also be filtered out from the tables/figures through if-conditions.


NOTE!

The values for mean, standard deviation and gini are affected by the fact that the statistical population is winsorized before the descriptive statistics are calculated. Winsorization means that extreme values are recoded to the limit value of the first and last percentile respectively, cf. the values for 1% and 99% in the summarize result. As a result, the calculated mean, standard deviation and gini are typically somewhat lower than the actual values. How much lower depends on how skewed the distribution of the variable is in the population concerned. For a normal distribution, winsorization will not have any particular effect.

Percentile, quartile, and median values are not affected by winsorization, but are displayed with three-digit precision.
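
The effect of winsorization on a skewed variable can be illustrated with a small Python sketch. This is not microdata.no's implementation, only a minimal illustration under stated assumptions: the data are simulated (a hypothetical log-normal "income" variable), the helper names `percentile` and `winsorize` are made up for this example, and the percentile interpolation rule is one common choice that may differ from the one the platform uses.

```python
import random
import statistics


def percentile(sorted_values, p):
    """Return the p-th percentile (0-100) of pre-sorted data, linear interpolation."""
    if not sorted_values:
        raise ValueError("no data")
    rank = (len(sorted_values) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def winsorize(values, lower_pct=1, upper_pct=99):
    """Set values outside the given percentiles to the percentile limits."""
    ordered = sorted(values)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    return [min(max(v, low), high) for v in values]


# Simulated right-skewed data (assumption: log-normal), not real register data.
rng = random.Random(42)
incomes = [rng.lognormvariate(13, 0.8) for _ in range(10_000)]
winsorized = winsorize(incomes)

raw_mean, win_mean = statistics.mean(incomes), statistics.mean(winsorized)
raw_sd, win_sd = statistics.pstdev(incomes), statistics.pstdev(winsorized)
raw_median, win_median = statistics.median(incomes), statistics.median(winsorized)

print(f"mean:   raw={raw_mean:.0f}  winsorized={win_mean:.0f}")
print(f"stdev:  raw={raw_sd:.0f}  winsorized={win_sd:.0f}")
print(f"median: raw={raw_median:.0f}  winsorized={win_median:.0f}")
```

For right-skewed data such as this, the winsorized mean and standard deviation come out lower than the raw ones, while the median is unchanged because only the values in the two tails are recoded.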

Graphical displays of numerical calculations through commands such as boxplot, barchart, histogram and hexbin are also affected by the mentioned privacy measures.

Regression analyses mainly return estimates and only to a small extent personally identifiable information. They are therefore not subject to the measures mentioned above. Documentation of the available regression analyses can be found in chapter 5.

More information about winsorization and other privacy measures can be found here