R: Parallel Model-Based Clustering and Parallel K-means...

pmclust and pkmeans {pmclust}

R Documentation

Parallel Model-Based Clustering and Parallel K-means Algorithm

Description

Parallel Model-Based Clustering and Parallel K-means Algorithm

Usage

  pmclust(X = NULL, K = 2, MU = NULL,
    algorithm = .PMC.CT$algorithm, RndEM.iter = .PMC.CT$RndEM.iter,
    CONTROL = .PMC.CT$CONTROL, method.own.X = .PMC.CT$method.own.X,
    rank.own.X = .pbd_env$SPMD.CT$rank.source, comm = .pbd_env$SPMD.CT$comm)

  pkmeans(X = NULL, K = 2, MU = NULL,
    algorithm = c("kmeans", "kmeans.dmat"),
    CONTROL = .PMC.CT$CONTROL, method.own.X = .PMC.CT$method.own.X,
    rank.own.X = .pbd_env$SPMD.CT$rank.source, comm = .pbd_env$SPMD.CT$comm)

Arguments

`X`	a GBD row-major matrix or a `ddmatrix`.
`K`	number of clusters.
`MU`	pre-specified centers.
`algorithm`	types of EM algorithms.
`RndEM.iter`	number of Rand-EM iterations.
`CONTROL`	a control for algorithms, see `CONTROL` for details.
`method.own.X`	how `X` is distributed.
`rank.own.X`	who own `X` if `method.own.X = "single"`.
`comm`	MPI communicator.

Details

These are high-level functions for several functions in pmclust including: data distribution, setting global environment .pmclustEnv, initializations, algorithm selection, etc.

The input X is either in ddmatrix or gbd. It will be converted in gbd row-major format and copied into .pmclustEnv for computation. By default, pmclust uses a GBD row-major format (gbdr). While common means that X is identical on all processors, and single means that X only exist on one processor rank.own.X.

Value

These functions return a list with class pmclust or pkmeans.

See the help page of PARAM or PARAM.org for details.

Author(s)

Wei-Chen Chen wccsnow@gmail.com and George Ostrouchov.

References

High Performance Statistical Computing (HPSC) Website: http://thirteen-01.stat.iastate.edu/snoweye/hpsc/

Programming with Big Data in R Website: http://r-pbd.org/

Examples

## Not run: 
# Save code in a file "demo.r" and run in 4 processors by
# > mpiexec -np 4 Rscript demo.r

### Setup environment.
library(pmclust, quiet = TRUE)

### Load data
X <- as.matrix(iris[, -5])

### Distribute data
jid <- get.jid(nrow(X))
X.gbd <- X[jid,]

### Standardized
N <- allreduce(nrow(X.gbd))
p <- ncol(X.gbd)
mu <- allreduce(colSums(X.gbd / N))
X.std <- sweep(X.gbd, 2, mu, FUN = "-")
std <- sqrt(allreduce(colSums(X.std^2 / (N - 1))))
X.std <- sweep(X.std, 2, std, FUN = "/")

### Clustering
library(pmclust, quiet = TRUE)
comm.set.seed(123, diff = TRUE)

ret.mb1 <- pmclust(X.std, K = 3)
comm.print(ret.mb1)

ret.kms <- pkmeans(X.std, K = 3)
comm.print(ret.kms)

### Finish
finalize()

## End(Not run)

[Package pmclust version 0.1-7 Index]

Parallel Model-Based Clustering and Parallel K-means Algorithm

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples