Balanced Sampling

Select balanced and spatially balanced probability samples in multi-dimensional spaces with any prescribed inclusion probabilities. The package includes the local pivotal method and spatially correlated Poisson sampling (for spatially balanced sampling), as well as the cube method (for balanced sampling) and the local cube method (for doubly balanced sampling).

Current version: 1.5.4

Source package

Compile the package (.tar.gz)


News. Version 1.5.4 soon available on CRAN!

See changelog.

It's unbelievably fast!

Yves Tillé, Professor, Institut de statistique, Neuchâtel, Switzerland.

I installed it immediately and the results in terms of execution times are exciting!

Roberto Benedetti, Professor, Department of Economic Studies, University of Chieti-Pescara, Italy.

I can't survive without it! It solved all my problems in terms of time of computation.

Maria M. Dickson, PhD Student in Economic Statistics, Department of Social Science - University "Sapienza", Rome

It's really fast. It's C++

The implementations are written in C++ and make use of the excellent Rcpp package.

# How fast is it?
# Let's check with N = 100 000
# and 5 balancing variables
library(BalancedSampling);
N = 100000;        # population size
n = 100;           # sample size
p = rep(n/N,N);    # inclusion probabilities
# matrix of 5 auxiliary variables
X = cbind(p,runif(N),runif(N),runif(N),runif(N)); 
system.time(cube(p,X));  
   user  system elapsed 
  0.228   0.008   0.237
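The timing above says nothing about how well the selected sample is balanced. As a rough check, and assuming that `cube(p, X)` returns the indices of the selected units as in the other examples on this page, the Horvitz-Thompson estimates of the totals of the balancing variables can be compared with the true totals. The names `ht_totals`, `pop_totals` and `rel_dev` are only illustrative, and a smaller population is used to keep the run short.

```r
library(BalancedSampling)  # provides cube()

set.seed(1)
N = 1000                           # population size (smaller than above)
n = 100                            # sample size
p = rep(n/N, N)                    # equal inclusion probabilities
X = cbind(p, runif(N), runif(N))   # balancing variables, p in the first column
s = cube(p, X)                     # assumed to return the selected indices

# Horvitz-Thompson estimates of the totals of the balancing variables
ht_totals  = colSums(X[s, , drop = FALSE] / p[s])
pop_totals = colSums(X)            # true population totals

# relative deviations; values near zero indicate a well-balanced sample
rel_dev = (ht_totals - pop_totals) / pop_totals
print(round(rel_dev, 4))
```

Because the balancing equations can usually only be met approximately once the sample is rounded to whole units, small non-zero deviations are to be expected.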

N = 100000;                      # A population of size 100 000
n = 50;                          # Sample size
p = rep(n/N,N);                  # Inclusion probabilities
X = cbind(runif(N),runif(N));    # Auxiliary variables

system.time(lpm2(p,X));
   user  system elapsed 
 45.605   0.048  45.659 
 
system.time(lpm2_kdtree(p,X));
   user  system elapsed 
  0.372   0.004   0.375 

# Without ties (equidistant neighbours), both give identical results!
  
set.seed(12345);
s1 = lpm2(p,X);

set.seed(12345);
s2 = lpm2_kdtree(p,X);

print(sum(s1!=s2))
[1] 0

New kd-tree implementation of LPM2! It's much faster.

Thanks to Jonathan Lisic for providing this new implementation of lpm2! It is now available in the R package SamplingBigData (imported by BalancedSampling).

Spatially balanced sample. It's representative.

If the inclusion probabilities are equal, a well-spread sample is representative with respect to the auxiliary variables, i.e. the distribution in the sample is very close to the distribution in the population. The plot shows a spatially balanced sample (red dots) selected with the local pivotal method.

set.seed(1234567);
N = 1000;                     # population size
n = 100;                      # sample size
p = rep(n/N,N);               # inclusion probabilities
X = cbind(runif(N),runif(N)); # matrix of auxiliary variables
s = lpm1(p,X);                # select sample

plot(X[,1],X[,2],xlab="x",ylab="y",pch=19,col="gray");
points(X[s,1],X[s,2],pch=19,col="red");
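For readers who want to see the idea behind the local pivotal method without the package, here is a minimal base-R sketch. It is an illustration only, not the package's C++ implementation: the function name `lpm_sketch` and the tolerance `eps` are made up for this example, and the neighbour search is a plain linear scan. A random undecided unit competes with its nearest undecided neighbour, and the pivotal rule moves probability mass between the two so that the inclusion probabilities are preserved in expectation.

```r
# Illustrative sketch (hypothetical helper, not part of BalancedSampling)
lpm_sketch = function(p, X, eps = 1e-9) {
  X = as.matrix(X)
  p = as.numeric(p)
  repeat {
    open = which(p > eps & p < 1 - eps)       # units not yet decided
    if (length(open) < 2) break
    i = open[sample.int(length(open), 1)]     # random undecided unit
    others = setdiff(open, i)
    # squared Euclidean distances from unit i to the other undecided units
    d = colSums((t(X[others, , drop = FALSE]) - X[i, ])^2)
    j = others[which.min(d)]                  # nearest undecided neighbour
    tot = p[i] + p[j]
    if (tot < 1) {
      # one unit takes all the mass, the other is excluded
      if (runif(1) < p[j] / tot) { p[i] = 0; p[j] = tot } else { p[i] = tot; p[j] = 0 }
    } else {
      # one unit is included, the other keeps the remainder
      if (runif(1) < (1 - p[j]) / (2 - tot)) { p[i] = 1; p[j] = tot - 1 } else { p[i] = tot - 1; p[j] = 1 }
    }
  }
  # a single leftover unit, if any, is decided by a Bernoulli draw
  open = which(p > eps & p < 1 - eps)
  if (length(open) == 1) p[open] = as.numeric(runif(1) < p[open])
  which(p > 1 - eps)                          # indices of the selected units
}

set.seed(1234567)
N = 1000                        # population size
n = 100                         # sample size
p = rep(n/N, N)                 # inclusion probabilities
X = cbind(runif(N), runif(N))   # matrix of auxiliary variables
s = lpm_sketch(p, X)

# representativeness check: sample means versus population means
print(rbind(population = colMeans(X), sample = colMeans(X[s, ])))
```

The two rows of the printed table should be close to each other, which is the sense in which a well-spread sample is representative here. The sketch costs on the order of N operations per step, so it is only meant for small populations.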

Spatially balanced sampling is efficient! For any smooth relationship.

set.seed(12345);
N = 200;           # population size
n = 40;            # sample size
p = rep(n/N,N);    # inclusion probabilities
x = runif(N,0,10); # auxiliary variable
# target variable related to x
y = 4 + 2 * x + 2 * x^2 + rnorm(N,0,1);

nrs = 1000;
ht1 = ht2 = rep(0,nrs);
for(i in 1:nrs){
  # spatially balanced
  s = lpm2(p,cbind(x));
  ht1[i] = sum(y[s]/p[s]);
  # simple random
  s = sample(N,n);
  ht2[i] = sum(y[s]/p[s]);
}
res = cbind(ht1,ht2);
colnames(res) = c("Spatially balanced sampling","Simple random sampling");
boxplot(res);
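The quantity computed by `sum(y[s]/p[s])` in the loop above is the Horvitz-Thompson estimator of the population total. Written out in standard survey-sampling notation (the symbols are chosen for this note and do not come from the package documentation):

```latex
\hat{Y}_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i},
\qquad \pi_i = \frac{n}{N} \ \text{for all } i,
\qquad Y = \sum_{i=1}^{N} y_i .
```

Since both designs use the same inclusion probabilities, both estimators are unbiased for Y, and the boxplot compares only their spread around the same target.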

How to cite

Grafström, A. and Lisic, J. (2018). BalancedSampling: Balanced and Spatially Balanced Sampling. R package version 1.5.4. http://www.antongrafstrom.se/balancedsampling

Who cites this package?

Benedetti, R., Piersimoni, F., and Postiglione, P. (2015). Spatial Sampling Designs. In Sampling Spatial Units for Agricultural Surveys (pp. 149-196). Springer Berlin Heidelberg.
Brus, D. J. (2015). Balanced sampling: A versatile sampling approach for statistical soil surveys. Geoderma, 253, 111-121.
Dickson, M. M., Benedetti, R., Giuliani, D., and Espa, G. (2014). The Use of Spatial Sampling Designs in Business Surveys. Open Journal of Statistics, 2014.
Dickson, M. M. and Tillé, Y. (2015). Ordered Spatial Sampling by Means of the Traveling Salesman Problem, DEM discussion paper.
Grafström, A., Saarela, S., and Ene, L. T. (2014). Efficient sampling strategies for forest inventories by spreading the sample in auxiliary space. Canadian Journal of Forest Research, 44(10), 1156-1164.
Schnell, S. (2015). Integrating trees outside forests into national forest inventories. PhD thesis.

Use in real surveys

It is used for the national seashore inventory in Sweden, 2015.
It is used for the national forest inventory in Sweden, 2018.
It is used for the national forest and pasture inventory in Albania, 2018.

Report a bug

anton.grafstrom(at)gmail.com

Changelog

2018-09-03: Release of version 1.5.4. Added hierarchical local pivotal method. lpm2_kdtree moved to new package SamplingBigData.

2016-01-27: Release of version 1.5.1. Added a faster kd-tree implementation of lpm2 by Jonathan Lisic!

2014-04-14: Release of version 1.4. Fixed a numerical stability issue in the cube-based methods.

2014-04-07: Release of version 1.3. Fixed a serious bug in cubestratified and lcubestratified.

2014-03-31: Release of version 1.2. Added stratified balanced sampling (cubestratified) and stratified doubly balanced sampling (lcubestratified).

2014-03-26: Release of version 1.1. Added separate functions for the flight phase and the landing phase of the cube method and the local cube method. Renamed inclusionprobabilities to probabilities to avoid a conflict with the package "sampling". Improved the speed of scps.

2014-03-19: First release of version 1.0.