Select balanced and spatially balanced probability samples in multi-dimensional spaces with any prescribed inclusion probabilities. The package includes the local pivotal method and spatially correlated Poisson sampling (for spatially balanced sampling), as well as the cube method (for balanced sampling) and the local cube method (for doubly balanced sampling).
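The following minimal base-R sketch, in the spirit of lpm2, gives a feel for how the local pivotal method works. It makes some simplifying assumptions: Euclidean distance, nearest-neighbour ties broken by the lowest index rather than at random, and a brute-force neighbour search at every step. The function name `lpm2_sketch` is hypothetical, and this is not the package's C++ implementation. Each pivotal step keeps p[i] + p[j] fixed and pushes one of the two probabilities to exactly 0 or 1. This preserves the inclusion probabilities in expectation, and because neighbours compete with each other, nearby units tend not to be selected together.

```r
# Hypothetical sketch of the local pivotal method (lpm2-style).
# Returns the indices of the selected units.
lpm2_sketch = function(p, X, eps = 1e-9) {
  X = as.matrix(X)
  undecided = which(p > eps & p < 1 - eps)
  while (length(undecided) > 1) {
    # pick a random undecided unit i
    i = undecided[sample.int(length(undecided), 1)]
    # find its nearest undecided neighbour j (squared Euclidean distance)
    others = undecided[undecided != i]
    d = colSums((t(X[others, , drop = FALSE]) - X[i, ])^2)
    j = others[which.min(d)]
    # pivotal step: p[i] + p[j] is kept, one of them becomes 0 or 1
    s = p[i] + p[j]
    if (s < 1) {
      if (runif(1) < p[j] / s) { p[i] = 0; p[j] = s } else { p[i] = s; p[j] = 0 }
    } else {
      if (runif(1) < (1 - p[j]) / (2 - s)) { p[i] = 1; p[j] = s - 1 } else { p[i] = s - 1; p[j] = 1 }
    }
    undecided = which(p > eps & p < 1 - eps)
  }
  # a single leftover unit (non-integer sum of p) is decided by a Bernoulli draw
  if (length(undecided) == 1) p[undecided] = as.numeric(runif(1) < p[undecided])
  which(p > 1 - eps)
}

set.seed(1)
X = cbind(runif(200), runif(200))
s = lpm2_sketch(rep(0.1, 200), X)
length(s)  # 20: the inclusion probabilities sum to 20
```

The O(N) neighbour search per step is what makes a naive implementation slow for large N. The timings further down show how much a kd-tree search helps.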
Current version: 1.5.4
It's unbelievably fast!
I installed it immediately and the results in terms of execution times are exciting!
I can't survive without it! It solved all my problems in terms of time of computation.
The implementations are written in C++ and make use of the excellent Rcpp package.
```r
library(BalancedSampling)
# How fast is it?
# Let's check with N = 100 000
# and 5 balancing variables
N = 100000  # population size
n = 100  # sample size
p = rep(n/N,N)  # inclusion probabilities
# matrix of 5 auxiliary variables
X = cbind(p,runif(N),runif(N),runif(N),runif(N))
system.time(cube(p,X))
#    user  system elapsed
#   0.228   0.008   0.237
```
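Balancing on a variable means that the Horvitz-Thompson estimator of its total equals, or after the landing phase nearly equals, the true population total. This is why `p` itself is the first column of `X` above. The Horvitz-Thompson estimate of sum(p) is just the number of sampled units, so balancing on `p` fixes the sample size at n. The helper `balance_check` below is a hypothetical base-R function, not part of BalancedSampling. It compares estimated and true totals for a given sample `s`, which could, for example, be the result of `cube(p,X)`.

```r
# Hypothetical helper: Horvitz-Thompson estimates vs. true totals of the
# columns of X for a sample s (vector of selected unit indices).
balance_check = function(p, X, s) {
  X = as.matrix(X)
  ht = colSums(X[s, , drop = FALSE] / p[s])  # HT estimates of the totals
  true = colSums(X)                           # population totals
  cbind(ht = ht, true = true, rel_diff = (ht - true) / true)
}

# Toy check: s = c(1, 4) is balanced on both p and x
p = rep(0.5, 4)
X = cbind(p = p, x = c(1, 2, 3, 4))
balance_check(p, X, s = c(1, 4))  # rel_diff is 0 for both columns
```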
The kd-tree implementation of lpm2 is much faster for large populations:

```r
library(BalancedSampling)
library(SamplingBigData)  # lpm2_kdtree is provided by SamplingBigData
N = 100000  # A population of size 100 000
n = 50  # Sample size
p = rep(n/N,N)  # Inclusion probabilities
X = cbind(runif(N),runif(N))  # Auxiliary variables
system.time(lpm2(p,X))
#    user  system elapsed
#  45.605   0.048  45.659
system.time(lpm2_kdtree(p,X))
#    user  system elapsed
#   0.372   0.004   0.375
# With no ties in the distances, both give identical samples!
set.seed(12345); s1 = lpm2(p,X)
set.seed(12345); s2 = lpm2_kdtree(p,X)
print(sum(s1!=s2))
# [1] 0
```
Thanks to Jonathan Lisic for providing this new implementation of lpm2! It is now available in the R package SamplingBigData, which BalancedSampling imports.
If the inclusion probabilities are equal, a well-spread sample is representative of the population in the auxiliary variables, i.e., the distribution in the sample is very close to the distribution in the population. The code below selects a spatially balanced sample with the local pivotal method (lpm1) and plots it, with the sampled units as red dots.
```r
library(BalancedSampling)
set.seed(1234567)
N = 1000  # population size
n = 100  # sample size
p = rep(n/N,N)  # inclusion probabilities
X = cbind(runif(N),runif(N))  # matrix of auxiliary variables
s = lpm1(p,X)  # select sample
plot(X[,1],X[,2],xlab="x",ylab="y",pch=19,col="gray")
points(X[s,1],X[s,2],pch=19,col="red")
```
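A Voronoi-based measure is one way to put a number on how well spread a sample is. It assumes Euclidean distance in the auxiliary variables. Each population unit is assigned to its nearest sample unit, v_i is the sum of the inclusion probabilities assigned to sample unit i, and the measure is the average of (v_i - 1)^2. For a perfectly spread sample every v_i equals 1 and the measure is 0. The function `spatial_balance` below is a hypothetical brute-force base-R sketch of this idea, not a function of BalancedSampling.

```r
# Hypothetical Voronoi-based spatial balance measure (0 = perfectly spread).
# p: inclusion probabilities, X: auxiliary variables, s: selected indices.
spatial_balance = function(p, X, s) {
  X = as.matrix(X)
  S = t(X[s, , drop = FALSE])  # sample coordinates, one column per unit
  # nearest sample unit for every population unit (ties -> first one)
  nearest = apply(X, 1, function(xk) which.min(colSums((S - xk)^2)))
  # v[i]: total inclusion probability in the Voronoi cell of sample unit i
  v = as.numeric(tapply(p, factor(nearest, levels = seq_along(s)), sum))
  v[is.na(v)] = 0
  mean((v - 1)^2)
}

# Toy 1-d population of 4 units with equal p = 0.5
spatial_balance(rep(0.5, 4), cbind(1:4), c(1, 3))  # 0: evenly spread
spatial_balance(rep(0.5, 4), cbind(1:4), c(1, 2))  # 0.25: clustered
```

With `p`, `X` and `s` from the plot example above, `spatial_balance(p, X, s)` can be compared with the same measure for a simple random sample such as `sample(N, n)`.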
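The following simulation compares the Horvitz-Thompson estimator of the total of y under spatially balanced sampling (lpm2, spreading the sample in x) and under simple random sampling, over 1000 repeated samples:

```r
library(BalancedSampling)
set.seed(12345)
N = 200  # population size
n = 40  # sample size
p = rep(n/N,N)  # inclusion probabilities
x = runif(N,0,10)  # auxiliary variable
# target variable related to x
y = 4 + 2 * x + 2 * x^2 + rnorm(N,0,1)
nrs = 1000  # number of repeated samples
ht1 = ht2 = rep(0,nrs)
for(i in 1:nrs){
  # spatially balanced
  s = lpm2(p,cbind(x))
  ht1[i] = sum(y[s]/p[s])
  # simple random
  s = sample(N,n)
  ht2[i] = sum(y[s]/p[s])
}
res = cbind(ht1,ht2)
colnames(res) = c("Spatially balanced sampling","Simple random sampling")
boxplot(res)
```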
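Both designs use the Horvitz-Thompson estimator sum(y[s]/p[s]) of the total of y. To summarise the boxplot in a single number, one can compare the empirical variances of the two columns of `res`. The helpers `ht_total` and `rel_eff` below are hypothetical names introduced for illustration. `rel_eff` reports how many times larger the variance of the estimates under design B is than under design A.

```r
# Horvitz-Thompson estimator of the population total of y
ht_total = function(y, p, s) sum(y[s] / p[s])

# Relative efficiency: empirical variance of design-B estimates divided by
# that of design-A estimates (a value > 1 favours design A)
rel_eff = function(est_a, est_b) var(est_b) / var(est_a)

# Toy checks
ht_total(y = c(1, 2, 3, 4), p = rep(0.5, 4), s = c(1, 2))  # (1 + 2) / 0.5 = 6
rel_eff(c(1, 3), c(0, 4))  # variance 8 / variance 2 = 4
```

With `res` from the simulation above, `rel_eff(res[,1], res[,2])` gives the variance under simple random sampling relative to spatially balanced sampling, and `colMeans(res) - sum(y)` gives the empirical bias of each design.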
Grafström, A. and Lisic, J. (2018). BalancedSampling: Balanced and Spatially Balanced Sampling. R package version 1.5.4. http://www.antongrafstrom.se/balancedsampling
anton.grafstrom(at)gmail.com
2018-09-03: Release of version 1.5.4. Added the hierarchical local pivotal method. Moved lpm2_kdtree to the new package SamplingBigData.
2016-01-27: Release of version 1.5.1. Added a faster kd-tree implementation of lpm2 by Jonathan Lisic!
2014-04-14: Release of version 1.4. Fixed a numerical stability issue in the cube-based methods.
2014-04-07: Release of version 1.3. Fixed a serious bug in cubestratified and lcubestratified.
2014-03-31: Release of version 1.2. Added stratified balanced sampling (cubestratified) and stratified doubly balanced sampling (lcubestratified).
2014-03-26: Release of version 1.1. Added separate functions for the flight phase and the landing phase of the cube method and the local cube method. Renamed inclusionprobabilities to probabilities to avoid a conflict with the package "sampling". Improved the speed of scps.
2014-03-19: First release (version 1.0).