New command for random population sub-selections

By Trond Pedersen

Random population sub-selections are now possible through the new command sample.

For users working with large populations or datasets, data preparation, recoding and analysis can become resource-intensive and slow to run. In such cases, the sample command can be used to trim your population so that your commands execute faster. This makes it possible to test your scripts on a smaller sample before running them on the full population.

sample can also be useful for other testing purposes, such as trying out statistical methods or other statistics.

The command expects two input parameters, in this order: sample size and seed number. If you specify a decimal number (0.0 - 1.0), your dataset will be trimmed down to a random sample of that share. If you specify an integer > 1000, your random sample will consist of that number of individuals.

The seed number is a positive integer of your choice. It ensures that the random sample is identical across consecutive executions of the same script. If another seed number is specified, a new random sample is drawn, different from the previous one.
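To illustrate how a seed makes a random sample reproducible, here is a minimal Python sketch. It is not how the sample command is implemented, only a conceptual model: the function name sample_population and the test population are hypothetical, and the sketch does not model any minimum-size rule of the platform.

```python
import random


def sample_population(population, size, seed):
    """Return a reproducible random subset of `population` (hypothetical helper).

    size: a float in (0.0, 1.0] is read as a share of the population,
          an int is read as an absolute number of individuals.
    seed: positive integer; the same seed always yields the same subset.
    """
    if isinstance(size, float):
        if not 0.0 < size <= 1.0:
            raise ValueError("share must be in (0.0, 1.0]")
        count = round(len(population) * size)
    else:
        count = size
    if count > len(population):
        raise ValueError("sample size exceeds population size")
    # A private generator seeded with `seed` makes the draw deterministic.
    rng = random.Random(seed)
    return rng.sample(population, count)


# Hypothetical population of 100 000 individual IDs.
population = list(range(100_000))

first = sample_population(population, 0.1, 1234)    # 10% share, seed 1234
again = sample_population(population, 0.1, 1234)    # same seed: same sample
other = sample_population(population, 0.1, 5678)    # new seed: new sample
fixed = sample_population(population, 10000, 1234)  # absolute count
```

In this sketch, first and again contain exactly the same individuals, while other is a different draw of the same size.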

Example of a 10% random sample sub-selection using the seed 1234:

sample 0.1 1234

Example of a random sample consisting of 10 000 individuals, using the seed 1234:

sample 10000 1234

Example of a new 10% random sample sub-selection different from the first example, using another seed:

sample 0.1 5678
