New command for random population sub-selections
Random population sub-selections are now possible through the new command sample.
For users working with large populations or datasets, data adaptations, codings and analyses can become resource-demanding and time-consuming to execute. In such cases, the sample command can be used to trim your population so that executions run more smoothly. This makes it possible to test your command scripts on a smaller sample before performing the final executions on the total population.
The sample command may also be useful for various testing purposes, such as testing statistical methods or other statistics.
The command expects two input parameters, in this order: sample size and seed number. If the sample size is given as a decimal number (0.0 – 1.0), your dataset is trimmed down to that share of the population, drawn at random. If it is given as an integer > 1000, your random sample will consist of that number of individuals.
The seed number is a positive integer of your choice which ensures that the random sample is identical across consecutive executions of the same script. If another seed number is specified, a new random sample is drawn, different from the previous one.
Example of a 10% random sample sub-selection using the seed 1234:
sample 0.1 1234
Example of a random sample consisting of 10 000 individuals, using the seed 1234:
sample 10000 1234
Example of a new 10% random sample sub-selection different from the first example, using another seed:
sample 0.1 5678
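For readers who want to reason about these semantics outside the platform, the following Python sketch mimics the behaviour described above. It is an illustration only and not the platform's implementation: the function name sample_population, the rounding of the share, and the use of Python's own random generator are all assumptions, so the individuals drawn here will not match those drawn by the sample command for the same seed.

```python
import random


def sample_population(ids, size, seed):
    """Hypothetical helper mimicking the two ways the size argument is read.

    A float in the range 0.0 to 1.0 is treated as a share of the population;
    an integer is treated as the number of individuals to draw.
    """
    ids = list(ids)
    if isinstance(size, float):
        if not 0.0 <= size <= 1.0:
            raise ValueError("a share must lie between 0.0 and 1.0")
        k = round(len(ids) * size)  # assumption: the share is rounded
    else:
        k = int(size)
        if not 0 <= k <= len(ids):
            raise ValueError("cannot draw more individuals than exist")
    # A generator seeded with the same number always makes the same draw.
    rng = random.Random(seed)
    return sorted(rng.sample(ids, k))


population = range(100_000)
first = sample_population(population, 0.1, 1234)     # 10% share
again = sample_population(population, 0.1, 1234)     # same seed, same sample
other = sample_population(population, 0.1, 5678)     # new seed, new sample
fixed = sample_population(population, 10_000, 1234)  # fixed number of individuals
```

In this sketch, first and again contain exactly the same individuals, while other is a different draw of the same size.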