assign.N.sample {pmclust} | R Documentation |
This utility function randomly samples rows of X.spmd to form a relatively
small subset of the original data. Running the EM algorithm on this smaller
subset is typically fast and still captures the rough structure of the
entire dataset.
assign.N.sample(total.sample = 5000, N.org.spmd)
total.sample | the total number of samples to be selected from the original data |
N.org.spmd | the original data size on the given processor, i.e. the number of rows of X.spmd |
This utility function performs simple random sampling without replacement
on the original dataset X.spmd. Different random seeds should be set on
each processor before calling this function.
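To make the sampling scheme concrete, the following serial R sketch simulates S processors in a single session, with no MPI. It assumes the scheme is one simple random sample without replacement drawn over all rows of the distributed data, which then determines how many rows each processor contributes; the allocation pmclust uses internally may differ. The function name sketch.assign.N.sample and its argument N.org.allspmds (the vector of local data sizes) are hypothetical.

```r
# Hypothetical serial sketch: simulates S processors in one R session.
# Assumption: a single global simple random sample without replacement.
sketch.assign.N.sample <- function(total.sample = 5000, N.org.allspmds) {
  S <- length(N.org.allspmds)
  N.org <- sum(N.org.allspmds)
  N <- min(total.sample, N.org)  # cannot draw more rows than exist

  # Simple random sample without replacement over all N.org rows.
  global.id <- sample.int(N.org, N)

  # Map each global row index to its owning processor and local row index.
  owner <- rep(seq_len(S), times = N.org.allspmds)[global.id]
  local.id <- sequence(N.org.allspmds)[global.id]
  N.allspmds <- tabulate(owner, nbins = S)

  # One list per simulated processor, shaped like the documented return value.
  lapply(seq_len(S), function(s) {
    list(N = N,
         N.spmd = N.allspmds[s],
         N.allspmds = N.allspmds,
         ID.spmd = sort(local.id[owner == s]))
  })
}

set.seed(123)
ret <- sketch.assign.N.sample(total.sample = 5000,
                              N.org.allspmds = 5000 + sample(1:1000, 4))
sapply(ret, function(r) r$N.spmd)  # per-processor sample sizes; they sum to 5000
```

Drawing one global sample and then mapping indices back to processors keeps every row of the distributed data equally likely to be selected, which is what simple random sampling without replacement requires.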
A list is returned, containing:
N | total sample size across all S processors |
N.spmd | sample size on the given processor |
N.allspmds | sample sizes of all S processors |
ID.spmd | indices of the selected samples, ranging from 1 to N.org.spmd |
Note that N and N.allspmds are the same across all S processors, but
N.spmd and ID.spmd most likely differ from processor to processor. The
lengths of these elements are 1 for N and N.spmd, S for N.allspmds, and
N.spmd for ID.spmd.
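The documented structure of the returned list can be checked mechanically. The sketch below uses a hypothetical helper, check.ret.spmd, and a hand-built list with made-up values standing in for the output on one of S = 2 processors. The check sum(N.allspmds) == N follows from the definitions of N and N.allspmds above; using ID.spmd as row indices into X.spmd to extract the local subset is an assumption consistent with its range of 1 to N.org.spmd.

```r
# Hypothetical helper: checks a list shaped like the documented return value
# of assign.N.sample() on one processor out of S.
check.ret.spmd <- function(ret, N.org.spmd, S) {
  stopifnot(length(ret$N) == 1,
            length(ret$N.spmd) == 1,
            length(ret$N.allspmds) == S,
            length(ret$ID.spmd) == ret$N.spmd,
            sum(ret$N.allspmds) == ret$N,
            all(ret$ID.spmd >= 1 & ret$ID.spmd <= N.org.spmd))
  invisible(TRUE)
}

# Toy values for one processor when S = 2 (made up for illustration).
N.org.spmd <- 10
X.spmd <- matrix(rnorm(N.org.spmd * 2), nrow = N.org.spmd, ncol = 2)
ret.spmd <- list(N = 5, N.spmd = 3, N.allspmds = c(3, 2), ID.spmd = c(2, 5, 9))

check.ret.spmd(ret.spmd, N.org.spmd, S = 2)

# Assumption: ID.spmd indexes rows of X.spmd, giving the local subset.
X.sub.spmd <- X.spmd[ret.spmd$ID.spmd, , drop = FALSE]
dim(X.sub.spmd)  # 3 2
```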
Wei-Chen Chen wccsnow@gmail.com and George Ostrouchov.
High Performance Statistical Computing (HPSC) Website: http://thirteen-01.stat.iastate.edu/snoweye/hpsc/
Programming with Big Data in R Website: http://r-pbd.org/
## Not run: 
# Save code in a file "demo.r" and run on 4 processors by
# > mpiexec -np 4 Rscript demo.r

### Setup environment.
library(pmclust, quietly = TRUE)
comm.set.seed(123, diff = TRUE)   # a different random stream on each processor

### Generate example data sizes.
N.org.spmd <- 5000 + sample(1:1000, 1)
ret.spmd <- assign.N.sample(total.sample = 5000, N.org.spmd)
cat("Rank:", comm.rank(), " Size:", ret.spmd$N.spmd, "\n", sep = "")

### Quit.
finalize()
## End(Not run)