pmclust and pkmeans {pmclust} | R Documentation |
Parallel Model-Based Clustering and Parallel K-means Algorithm
pmclust(X = NULL, K = 2, MU = NULL,
  algorithm = .PMC.CT$algorithm, RndEM.iter = .PMC.CT$RndEM.iter,
  CONTROL = .PMC.CT$CONTROL, method.own.X = .PMC.CT$method.own.X,
  rank.own.X = .pbd_env$SPMD.CT$rank.source, comm = .pbd_env$SPMD.CT$comm)

pkmeans(X = NULL, K = 2, MU = NULL,
  algorithm = c("kmeans", "kmeans.dmat"),
  CONTROL = .PMC.CT$CONTROL, method.own.X = .PMC.CT$method.own.X,
  rank.own.X = .pbd_env$SPMD.CT$rank.source, comm = .pbd_env$SPMD.CT$comm)
X |
a GBD row-major matrix or a |
K |
number of clusters. |
MU |
pre-specified centers. |
algorithm |
types of EM algorithms. |
RndEM.iter |
number of Rand-EM iterations. |
CONTROL |
a control for algorithms, see |
method.own.X |
how |
rank.own.X |
who own |
comm |
MPI communicator. |
These are high-level functions that wrap several lower-level functions in pmclust, including data distribution, setting up the global environment .pmclustEnv, initialization, algorithm selection, etc.
The input X is given either as a ddmatrix or in gbd format. It will be converted to GBD row-major format and copied into .pmclustEnv for computation. By default, pmclust uses a GBD row-major format (gbdr). In contrast, common means that X is identical on all processors, and single means that X exists only on one processor, rank.own.X.
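The three layouts described above can be pictured with a serial sketch in base R that needs no MPI. It is only an illustration: split_rows() is a hypothetical helper, not part of pmclust or pbdMPI, and the list positions 1 to 4 stand in for processors (MPI ranks themselves are numbered from 0). The block sizes chosen here are an assumption and need not match what the package actually assigns.

```r
# Hypothetical helper (not a pmclust/pbdMPI function): assign consecutive
# row indices to n_ranks simulated processors in nearly equal blocks.
split_rows <- function(n, n_ranks) {
  block <- ceiling(n / n_ranks)
  unname(split(seq_len(n), rep(seq_len(n_ranks), each = block, length.out = n)))
}

X <- as.matrix(iris[, -5])
n_ranks <- 4
idx <- split_rows(nrow(X), n_ranks)

# gbdr-like layout: every simulated rank holds its own block of rows.
gbd <- lapply(idx, function(i) X[i, , drop = FALSE])

# common-like layout: every simulated rank holds an identical full copy.
common <- replicate(n_ranks, X, simplify = FALSE)

# single-like layout: only one simulated rank (the owner) holds the data.
owner <- 1  # position in the list, assumed for illustration
single <- lapply(seq_len(n_ranks), function(r) if (r == owner) X else NULL)

# Rows held per simulated rank in the gbdr-like layout.
print(sapply(gbd, nrow))
```

Stacking the blocks in gbd back together with rbind() recovers the original matrix, which is the property the row-major distributed format relies on.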
These functions return a list with class pmclust or pkmeans. See the help page of PARAM or PARAM.org for details.
Wei-Chen Chen wccsnow@gmail.com and George Ostrouchov.
High Performance Statistical Computing (HPSC) Website: http://thirteen-01.stat.iastate.edu/snoweye/hpsc/
Programming with Big Data in R Website: http://r-pbd.org/
set.global, e.step, m.step.
set.global.dmat, e.step.dmat, m.step.dmat.
## Not run: 
# Save code in a file "demo.r" and run in 4 processors by
# > mpiexec -np 4 Rscript demo.r

### Setup environment.
library(pmclust, quietly = TRUE)

### Load data
X <- as.matrix(iris[, -5])

### Distribute data: each rank keeps only its own rows
jid <- get.jid(nrow(X))
X.gbd <- X[jid, ]

### Standardized: global column means and standard deviations via allreduce
N <- allreduce(nrow(X.gbd))
p <- ncol(X.gbd)
mu <- allreduce(colSums(X.gbd / N))
X.std <- sweep(X.gbd, 2, mu, FUN = "-")
std <- sqrt(allreduce(colSums(X.std^2 / (N - 1))))
X.std <- sweep(X.std, 2, std, FUN = "/")

### Clustering
comm.set.seed(123, diff = TRUE)

ret.mb1 <- pmclust(X.std, K = 3)
comm.print(ret.mb1)

ret.kms <- pkmeans(X.std, K = 3)
comm.print(ret.kms)

### Finish
finalize()
## End(Not run)
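The standardization step in the example above can be checked without MPI. The following is a serial sketch under stated assumptions: four row blocks stand in for four processors, and sum_over_ranks() is a hypothetical stand-in for a sum-type allreduce(), not a pbdMPI function. It suggests that summing per-block column sums gives the same result as standardizing the full matrix with scale().

```r
X <- as.matrix(iris[, -5])

# Simulate 4 processors by splitting rows into consecutive blocks.
chunks <- split(seq_len(nrow(X)), rep(1:4, each = 38, length.out = nrow(X)))
X_parts <- lapply(unname(chunks), function(i) X[i, , drop = FALSE])

# Hypothetical stand-in for a sum reduction over all simulated ranks.
sum_over_ranks <- function(parts) Reduce(`+`, parts)

# Global sample size and column means from per-block contributions.
N <- sum_over_ranks(lapply(X_parts, nrow))
mu <- sum_over_ranks(lapply(X_parts, function(part) colSums(part / N)))

# Center each block, then build global standard deviations the same way.
centered <- lapply(X_parts, function(part) sweep(part, 2, mu, FUN = "-"))
std <- sqrt(sum_over_ranks(
  lapply(centered, function(part) colSums(part^2 / (N - 1)))
))
X_std_parts <- lapply(centered, function(part) sweep(part, 2, std, FUN = "/"))

# Reassemble the blocks and compare with the usual serial standardization.
X_std <- do.call(rbind, X_std_parts)
print(all.equal(X_std, scale(X), check.attributes = FALSE))
```

Because only column sums cross block boundaries, each simulated processor touches its own rows only, which is why the distributed version needs nothing more than a reduction.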