3.3 Use of functions
In addition to the basic mathematical operators =, +, -, /, *, (, ), ^, microdata.no gives access to a large number of functions that can be used to generate variables. A typical example is recoding residency from municipality level to county level. By default, residency data take alphanumeric values at municipality level: a four-digit code in which the first two digits represent the county and the last two digits identify the municipality within the county. The function substr() is therefore needed to retrieve the first two digits, which represent the county:
generate county = substr(residency,1,2)
The input parameters "1" and "2" inside the substr() expression refer to the starting position and the number of characters to read, respectively. The municipality of Bergen is represented by the value '1201'. Retrieving the first two digits gives the value '12', which represents the county of Hordaland.
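The position arguments are easy to misread, so the following Python sketch mimics the behaviour described above outside microdata.no. The helper `substr` is a hypothetical stand-in written for illustration, not the platform's implementation; it assumes positions are counted from 1, as in the Bergen example.

```python
def substr(value: str, start: int, length: int) -> str:
    """Return `length` characters of `value`, beginning at 1-based position `start`."""
    # Python slices are 0-based, so shift the start position by one.
    return value[start - 1 : start - 1 + length]


residency = "1201"                       # municipality code for Bergen, as in the text
county = substr(residency, 1, 2)         # first two characters: the county
municipality = substr(residency, 3, 2)   # last two characters: municipality within the county
print(county, municipality)
```

Because the code is handled as text, leading zeros are preserved, which would not be the case if the value were treated as a number.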
Another typical use case for substr() is when information on educational level is needed at a more aggregated level than the default 6-digit code. Educational classifications at the 1- or 2-digit level are very common, and substr() is well suited to producing them.
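As a rough illustration of this kind of aggregation, the Python sketch below shortens a few 6-digit codes to their 1- and 2-digit prefixes and counts how many fall into each 1-digit group. The codes are made-up placeholders, not actual education codes, and the variable names are assumptions chosen for the example.

```python
from collections import Counter

# Made-up 6-digit codes used purely for illustration.
education_codes = ["401234", "402345", "612345", "623456", "701234"]

# Keep only the first one or two characters of each code.
level_1digit = [code[:1] for code in education_codes]
level_2digit = [code[:2] for code in education_codes]

# Count how many codes fall into each 1-digit group.
counts = Counter(level_1digit)
print(level_2digit)
print(dict(counts))
```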
Other important functions are round() and int(), or alternatively floor(). These are useful for transforming decimal numbers into integers or for retrieving parts of a value. round() rounds decimal numbers in the regular way, while int() and floor() round downwards. For example, if the birth year needs to be retrieved from the numerical variable yearmonth (year and month in the format YYYYMM), the following expression can be used:
generate birthyear = int( yearmonth / 100)
This expression generates the birth year by dividing by 100 and keeping the integer part (discarding the decimal digits). In practice, this operation retrieves the first four digits of a numerical 6-digit value. For example, to retrieve the value 2010 from the numerical value 201006 and calculate age as of 2013, the following expression can be used:
generate age = 2013 - int( yearmonth / 100)
If the birth date is represented by an 8-digit number (YYYYMMDD), the expression needs to be adjusted as follows:
generate age = 2013 - int( birthdate / 10000)
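The arithmetic behind these expressions can be checked with a short Python sketch. It uses Python's own `math.floor` and `round` as stand-ins for the functions named above, which is an assumption about equivalent behaviour for positive values. The 8-digit date and the decimal number 2.7 are invented sample values.

```python
import math

yearmonth = 201006       # YYYYMM, the example value from the text
birthdate = 20100615     # YYYYMMDD, an invented sample value

# Dividing by 100 moves the month behind the decimal point; rounding down drops it.
birthyear = math.floor(yearmonth / 100)   # 2010.06 becomes 2010
age = 2013 - birthyear

# For an 8-digit date the divisor is 10000, so both month and day are dropped.
age_from_date = 2013 - math.floor(birthdate / 10000)

# Regular rounding versus rounding down on the same decimal number.
rounded = round(2.7)        # nearest integer
floored = math.floor(2.7)   # always downwards
print(birthyear, age, age_from_date, rounded, floored)
```

Note that the resulting age is the difference between calendar years, so it does not take into account whether the birthday has already passed in 2013.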
Appendix B presents a list of all available functions. Note that each function has requirements regarding the variable formats it is suited for, e.g. substr() requires alphanumeric values only.
Examples of facilitating variables and use of functions
Examples of functions used for individual variable calculations