|
A sampling algorithm is a procedure that allows us to select randomly a subset of units (a sample) from a population without enumerating all the possible samples of the population.
More precisely, let be a finite population and a sample or a subset of . A sampling design is a probability distribution on the set of all the subsets i.e. and
The inclusion probability of a unit is its probability of being selected in the sample . The sum of the inclusion probabilities is equal to the expectation of the sample size .
In many sampling problems, the number of possible samples is generally very large. For instance, if and , the number of possible samples already equals 10,272,278,170. The selection of a sample by enumerating all the possible samples is generally impossible. A sampling algorithm is a method that allows bypassing this enumeration. There exists several class of methods:
-
Sequential algorithms. In this case, there is only one reading of the population file. Each unit is successively examined and the decision of selection is irremediably taken.
-
One by one algorithms. At each step, a unit is selected from the population until the fixed sample size is obtained.
-
Eliminatory algorithms. At each step, a unit is removed from the population until the fixed sample size is obtained.
-
Rejective methods For instance, samples with replacement are generated until a sample without replacement is obtained. Rejective methods can be interesting if there exists a more general sampling design that is simpler than the design we want to implement.
-
Splitting methods. This method described in Deville and Tillé (1998) starts with a vector of inclusion probabilities. At each step, this vector is randomly replaced by another vector until a vector containing only zeros and ones (i.e. a sample) is obtained.
The same sampling design can generally be implemented by using different methods. For instance, Tillé (2006) gives sequential, one by one, eliminatory algorithms for several sampling designs like simple random sampling with and without replacement and multinomial sampling.
The main difficulties however appear when the sample is selected with unequal inclusion probabilities without replacement and fixed sample size. The first proposed method was systematic sampling with unequal inclusion probabilities (Madow, 1949). For this sequential algorithm, first compute the cumulated inclusion probabilities . Next units such that
are selected, where is a uniform continuous random variable in [0,1) and is the sample size.
An interesting rejective procedure was proposed by Sampford (1967). Samples are selected with replacement. The first unit is selected with probability , the other units are selected with probability
The sample is accepted if distinct units are selected, otherwise another sample is selected.
Chen et al (1994) discussed the sampling design without replacement and fixed sample size that maximizes the entropy given by
They gave a procedure for selecting a sample according this sampling design. Several other efficient algorithms that implement this sampling design are given in Tillé (2006).
Other methods have been proposed by Brewer (1975), Deville and Tillé (1998). A review is given in Brewer and Hanif (1983) and Tillé (2006). Other sampling algorithms allows us to solve more complex problems. For instance, the cube method (Deville and Tillé, 2004) allows selecting balanced samples in the sense that the Horvitz-Thompson estimator are equal or approximately equal to the population totals for a set of control variables.
Reprinted with permission from Lovric, Miodrag (2011), International Encyclopedia of Statistical Science. Heidelberg: Springer Science +Business Media, LLC
- Brewer, 1975
- Brewer, K. (1975).
A simple procedure for pswor.
Australian Journal of Statistics, 17:166-172.
- Brewer and Hanif, 1983
- Brewer, K. and Hanif, M. (1983).
Sampling with Unequal Probabilities.
Spinger-Verlag, New York.
- Chen et al., 1994
- Chen, S., Dempster, A., and Liu, J. (1994).
Weighted finite population sampling to maximize entropy.
Biometrika, 81:457-469.
- Deville and Tillé, 1998
- Deville, J.-C. and Tillé, Y. (1998).
Unequal probability sampling without replacement through a splitting method.
Biometrika, 85:89-101.
- Deville and Tillé, 2004
- Deville, J.-C. and Tillé, Y. (2004).
Efficient : The cube method.
Biometrika, 91:893-912.
- Madow, 1949
- Madow, W. (1949).
On the theory of systematic sampling, II.
Annals of Mathematical Statistics, 20:333-354.
- Sampford, 1967
- Sampford, M. (1967).
On sampling without replacement with unequal probabilities of selection.
Biometrika, 54:499-513.
- Tillé, 2006
- Tillé, Y. (2006).
Sampling algorithms.
Springer-Verlag, New York.
|