generate.MixSim {pmclust} | R Documentation |
This function utilizes MixSim to generate sets of data for testing algorithms.
generate.MixSim(N, p, K, MixSim.obj = NULL, MaxOmega = NULL, BarOmega = NULL, PiLow = 1.0, sph = FALSE, hom = FALSE)
N | total sample size across all S processors, i.e. sum over N.spmd. |
p | dimension of data X.spmd. |
K | number of clusters. |
MixSim.obj | an object returned from MixSim. |
MaxOmega | maximum overlap as in MixSim. |
BarOmega | averaged overlap as in MixSim. |
PiLow | lower bound of mixture proportion as in MixSim. |
sph | sph as in MixSim. |
hom | hom as in MixSim. |
If MixSim.obj is NULL, then BarOmega and MaxOmega will be used in MixSim to obtain a new MixSim.obj.
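The rule above leaves two ways to call the function: let it build the model from an overlap target, or pass in a model that was built beforehand. The sketch below illustrates the second route. It assumes the MixSim and pbdMPI packages are installed alongside pmclust. The helper name make.shared.data and the choice to build the model on rank 0 and broadcast it are illustrative assumptions, not part of pmclust.

```r
# Hypothetical helper (not part of pmclust): build one mixture model on
# rank 0, share it with every rank, then generate data from it.
make.shared.data <- function(N, p, K, BarOmega = 0.01) {
  obj <- NULL
  if (pbdMPI::comm.rank() == 0) {
    # MixSim() draws a random K-component, p-dimensional Gaussian mixture
    # that targets the requested average overlap.
    obj <- MixSim::MixSim(BarOmega = BarOmega, K = K, p = p)
  }
  # Every rank must call bcast(); afterwards all ranks hold rank 0's model.
  obj <- pbdMPI::bcast(obj, rank.source = 0)
  # MixSim.obj is supplied here, so BarOmega and MaxOmega are not needed.
  pmclust::generate.MixSim(N, p, K, MixSim.obj = obj)
}
```

In a script launched with mpiexec, after library(pmclust), a call such as data.spmd <- make.shared.data(5000, 2, 2) could stand in for a direct generate.MixSim(N, p, K, BarOmega = 0.01) call. Building the model once and broadcasting it is one way to make sure every processor samples from the same mixture.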
A set of simulated data and information will be returned in a list variable including:
K | number of clusters, as the input |
p | dimension of data X.spmd, as the input |
N | total sample size, as the input |
N.allspmds | a collection of sample sizes for all S processors, as the input |
N.spmd | total sample size of given processor, as the input |
X.spmd | generated data set with dimension N.spmd * p |
CLASS.spmd | true id of each observation, a vector of length N.spmd with values from 1 to K |
N.CLASS.spmd | true sample size of each cluster, a vector of length K |
MixSim.obj | the true model from which data X.spmd were generated |
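The documented shapes of the returned components can be turned into a quick sanity check. The sketch below uses base R only. check.generated is a hypothetical helper, not a pmclust function; it assumes X.spmd is a matrix, and the small hand-built list merely stands in for a real return value.

```r
# Hypothetical helper: verify that a list follows the layout documented above.
check.generated <- function(d) {
  stopifnot(
    is.matrix(d$X.spmd),
    nrow(d$X.spmd) == d$N.spmd,          # one row per local observation
    ncol(d$X.spmd) == d$p,               # one column per dimension
    length(d$CLASS.spmd) == d$N.spmd,    # one true id per observation
    all(d$CLASS.spmd %in% seq_len(d$K)), # ids run from 1 to K
    length(d$N.CLASS.spmd) == d$K,       # one count per cluster
    sum(d$N.allspmds) == d$N             # local sizes add up to N
  )
  invisible(TRUE)
}

# Hand-built stand-in for one processor's result (values are made up).
toy <- list(K = 2, p = 2, N = 10, N.allspmds = c(6, 4), N.spmd = 6,
            X.spmd = matrix(0, nrow = 6, ncol = 2),
            CLASS.spmd = c(1, 1, 2, 2, 2, 1),
            N.CLASS.spmd = c(3, 3))
check.generated(toy)
```

On real output, check.generated(data.spmd) would be called on each processor after generate.MixSim returns; a violated condition stops with an error naming the failed expression.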
Wei-Chen Chen wccsnow@gmail.com and George Ostrouchov.
Melnykov, V., Chen, W.-C. and Maitra, R. (2012) “MixSim: Simulating Data to Study Performance of Clustering Algorithms”, Journal of Statistical Software, (accepted).
High Performance Statistical Computing (HPSC) Website: http://thirteen-01.stat.iastate.edu/snoweye/hpsc/
Programming with Big Data in R Website: http://r-pbd.org/
## Not run:
# Save code in a file "demo.r" and run in 4 processors by
# > mpiexec -np 4 Rscript demo.r

### Setup environment.
library(pmclust, quiet = TRUE)

### Generate an example data.
N <- 5000
p <- 2
K <- 2
data.spmd <- generate.MixSim(N, p, K, BarOmega = 0.01)
X.spmd <- data.spmd$X.spmd

### Run clustering.
PARAM.org <- set.global(K = K)          # Set global storages.
# PARAM.org <- initial.em(PARAM.org)    # One initial.
PARAM.org <- initial.RndEM(PARAM.org)   # Ten initials by default.
PARAM.new <- apecma.step(PARAM.org)     # Run APECMa.
em.update.class()                       # Get classification.

### Get results.
N.CLASS <- get.N.CLASS(K)
comm.cat("# of class:", N.CLASS, "\n")
comm.cat("# of class (true):", data.spmd$N.CLASS.spmd, "\n")

### Quit.
finalize()
## End(Not run)