How to ensure your simple random sampling is really
random?
What is a random sampling process?
When we carry out a discrete sampling exercise, we want to select a
predetermined number of objects from a much larger population. The
sampling methodology depends on the type of statistical analysis being
performed, but here it suffices to consider simple random sampling or
systematic sampling methods.
Simple random sampling is the process of selecting a random sample from a
finite or infinite population. As the word “random” suggests in statistics, we
must collect the samples from the population without any definite aim
or pattern. It has two important properties that distinguish it from
other methods:
- Unbiased: each unit has the same chance of being chosen
- Independent: the selection of one unit has no influence on the selection of
other units
If there is a finite population of n units and we want to draw a sample of
r units each time, there are ${}^{n}C_{r}$ different possible samples that
can be drawn from these n units.
By mathematical definition, a combination is a selection of all or part of a set
of objects, without regard to the order in which objects are selected. The
number of combinations of n objects taken r at a time is given by the
following formula:
$$
{}^{n}C_{r} = \frac{{}^{n}P_{r}}{r!} = \frac{n(n-1)(n-2)\cdots(n-r+1)}{r!} = \frac{n!}{r!\,(n-r)!} \qquad \text{… Eq [1]}
$$
In mathematics, the factorial of a non-negative integer n, denoted by n!, is the
product of all positive integers less than or equal to n. For example, if n = 5, then
5! = 5 x 4 x 3 x 2 x 1 = 120.
Equation [1] also tells us that each random sample has an equal
probability, $1/{}^{n}C_{r}$, of being selected.
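The equal-probability property can be checked empirically on a small case. The following is a minimal Python sketch, not part of the original procedure: the function name `sample_frequencies`, the toy population of 5 units, the sample size of 2, and the fixed seed are all assumptions chosen for illustration. It draws many simple random samples and compares how often each distinct sample appears with the value 1/nCr.

```python
import random
from collections import Counter
from itertools import combinations
from math import comb


def sample_frequencies(n, r, draws, seed=0):
    """Draw `draws` simple random samples of r units from n and tally each distinct sample."""
    rng = random.Random(seed)  # fixed seed only so that the run is repeatable
    population = range(1, n + 1)
    counts = Counter()
    for _ in range(draws):
        # rng.sample draws without replacement; sorting makes the order irrelevant
        counts[tuple(sorted(rng.sample(population, r)))] += 1
    return counts


n, r, draws = 5, 2, 100_000  # hypothetical toy values
counts = sample_frequencies(n, r, draws)
expected = 1 / comb(n, r)  # 1/nCr, here 1/10

for combo in combinations(range(1, n + 1), r):
    print(combo, round(counts[combo] / draws, 4), "expected", expected)
```

With enough draws, every one of the 10 possible samples should appear with a relative frequency close to 0.1; a procedure that favoured some units would show up as uneven frequencies.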
Suppose, hypothetically, that there are 100 drums of chemicals in a shipment.
In how many ways can 4 different drums be randomly selected from them to
form a sample for testing?
The answer is: we can have ${}^{100}C_{4}$, or
$\frac{100 \times 99 \times 98 \times 97}{4 \times 3 \times 2 \times 1}$, or
3 921 225 ways!
Therefore, statistically speaking, if we can devise a procedure for selecting a
sample of 4 drums such that each of these nearly 4 million samples has an
equal probability (i.e. equal to 1/3 921 225) of being selected, then the
sample selected would be a random sample.
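The drum count can be reproduced in a few lines. This is a small Python sketch of the factorial form of Eq [1]; the helper name `n_choose_r` is hypothetical, and `math.comb` from the standard library (Python 3.8 or later) is used only as a cross-check.

```python
from math import comb, factorial


def n_choose_r(n, r):
    """Number of combinations from Eq [1]: n! / (r! (n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))


n_drums, sample_size = 100, 4
ways = n_choose_r(n_drums, sample_size)

# Cross-check the hand-written formula against the standard library.
assert ways == comb(n_drums, sample_size)

print(ways)      # 3921225
print(1 / ways)  # probability of any one particular sample of 4 drums
```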
Random sampling relies on the experience of the person doing the sampling,
but he or she can also introduce strong subjective distortions, such as:
- preferred sampling at easily accessible locations
- intuitive selection of items in the population that are obviously darker or
lighter in colour
- a tendency towards an intuitively regular distribution of sampling points
Using a random number generator
In random sampling, each item in the population for laboratory analysis has
an equal chance of being selected through a methodology that is not biased.
The following are some ways to consider:
a. Use of a random number table
We can select the required random samples from a random number
table generated by a random number generator. A 4-digit random number
table caters for a population of up to 10 000 items. An example of a random
number table with 5-digit numbers is shown in Figure 1 below.
Figure 1: Part of a table of random numbers
Label all the items of your targeted population sequentially with a number.
For example, if the population has 200 items labelled 1 to 200, randomly
select a starting point on the table and move down the columns, taking each
number whose last three digits fall between 001 and 200 as the label of a
selected item. Skip any label that has already been drawn, and continue
until you have the required number of samples.
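The table walk can be sketched in Python as follows. This is an illustration under stated assumptions rather than a prescribed procedure: the function name `pick_from_table` is hypothetical, the population is assumed to hold 200 items labelled 1 to 200, repeated labels are skipped, and the 5-digit entries in `column` are made-up values standing in for one column of a random number table (they are not taken from Figure 1).

```python
def pick_from_table(table_numbers, population_size, sample_size, start=0):
    """Walk down a column of random numbers and return the selected item labels."""
    selected = []
    for number in table_numbers[start:]:
        label = number % 1000  # keep only the last three digits
        # Accept the label only if it names an item and has not been drawn before.
        if 1 <= label <= population_size and label not in selected:
            selected.append(label)
        if len(selected) == sample_size:
            return selected
    raise ValueError("table exhausted before the sample was complete")


# Hypothetical 5-digit entries, invented for this example.
column = [10480, 22130, 24167, 42130, 37570, 77085, 99011, 96301, 89199]

print(pick_from_table(column, population_size=200, sample_size=4))
# [130, 167, 85, 11]
```

Here 10480 and 37570 are rejected because 480 and 570 exceed 200, and the second occurrence of 130 is skipped as a repeat. Taking the last three digits only works for populations of fewer than 1000 items; a larger population would need more digits.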
b. Generation of random numbers from MS Excel spreadsheet
Again, label your population items sequentially, assign each item a
random number generated by the MS Excel RAND() function, and then rank
the random numbers in ascending order with the Excel function RANK(). Select
the targeted number of samples according to their ranking order. Figure 2
shows an example of 12 items in a population randomized and ranked.
Figure 2: Randomization and ranking of 12 items in a population
      A         B          C
1     Item #    Random     Rank
2     A(1)      =RAND()    =RANK(B2,$B$2:$B$13,1)
3     A(2)      =RAND()    =RANK(B3,$B$2:$B$13,1)
4     A(3)      =RAND()    =RANK(B4,$B$2:$B$13,1)
~~    ~~        ~~         ~~
11    A(10)     =RAND()    =RANK(B11,$B$2:$B$13,1)
12    A(11)     =RAND()    =RANK(B12,$B$2:$B$13,1)
13    A(12)     =RAND()    =RANK(B13,$B$2:$B$13,1)
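The same randomize-and-rank idea can be expressed outside a spreadsheet. The sketch below is a Python analogue, not a translation of Excel's internals: the function name `rank_and_pick`, the sample size of 4, and the seed are assumptions for illustration. Each item receives a random key (the counterpart of column B), the items are sorted by that key in ascending order (the counterpart of column C), and the first few are taken as the sample.

```python
import random


def rank_and_pick(items, sample_size, seed=None):
    """Attach a random key to each item, sort by key, and keep the lowest-ranked items."""
    rng = random.Random(seed)
    keyed = [(rng.random(), item) for item in items]  # one random number per item
    keyed.sort()                                      # ascending order of the random numbers
    return [item for _, item in keyed[:sample_size]]


items = [f"A({i})" for i in range(1, 13)]  # the 12 items A(1) ... A(12)
chosen = rank_and_pick(items, sample_size=4, seed=1)
print(chosen)
```

Passing a seed makes the selection repeatable, which can help when the sampling plan has to be documented; leaving it as `None` gives a fresh selection on every run.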
c. Use of statistical software such as R
Those who are familiar with the open source statistical software R will find
selecting random samples very easy. For example, after labelling all the
100 items of a population sequentially, we write in R:
> C <- sample(100, 10)  # Randomly draw 10 of the 100 labelled items, without replacement
> C
[1] 19 57 27 60 11 86 50 36 29 9