R: EM-like Steps for GBD

EM-like algorithms {pmclust}

R Documentation

EM-like Steps for GBD

Description

The EM-like algorithm for model-based clustering of finite mixture Gaussian models with unstructured dispersions.

*.dmat's are ddmatrix versions.

Usage

  em.step(PARAM.org)
  aecm.step(PARAM.org)
  apecm.step(PARAM.org)
  apecma.step(PARAM.org)
  kmeans.step(PARAM.org)

  em.step.dmat(PARAM.org)
  kmeans.step.dmat(PARAM.org)

Arguments

PARAM.org

an original set of parameters generated by set.global.

Details

A global variable called X.spmd should exist in the .pmclustEnv environment, usually the working environment. The X.spmd is the data matrix to be clustered, and this matrix has a dimension N.spmd by p.

A PARAM.org will be a local variable inside all EM-linke functions em.step, aecm.step, apecm.step, apecma.step, and kmeans.step, This variable is a list containing all parameters related to models. This function also updates in the parameters by the EM-like algorithms, and return the convergent results. The details of list elements are initially generated by set.global.

Value

A convergent results will be returned the other list variable containing all new parameters which represent the components of models. See the help page of PARAM or PARAM.org for details.

Author(s)

Wei-Chen Chen wccsnow@gmail.com and George Ostrouchov.

References

High Performance Statistical Computing (HPSC) Website: http://thirteen-01.stat.iastate.edu/snoweye/hpsc/

Programming with Big Data in R Website: http://r-pbd.org/

Chen, W.-C. and Maitra, R. (2011) “Model-based clustering of regression time series data via APECM – an AECM algorithm sung to an even faster beat”, Statistical Analysis and Data Mining, 4, 567-578.

Chen, W.-C., Ostrouchov, G., Pugmire, D., Prabhat, M., and Wehner, M. (2013) “A Parallel EM Algorithm for Model-Based Clustering with Application to Explore Large Spatio-Temporal Data”, Technometrics, (revision).

Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) “Maximum Likelihood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society Series B, 39, 1-38.

Lloyd., S. P. (1982) “Least squares quantization in PCM”, IEEE Transactions on Information Theory, 28, 129-137.

Meng, X.-L. and Van Dyk, D. (1997) “The EM Algorithm.an Old Folk-song Sung to a Fast New Tune”, Journal of the Royal Statistical Society Series B, 59, 511-567.

Examples

## Not run: 
# Save code in a file "demo.r" and run in 4 processors by
# > mpiexec -np 4 Rscript demo.r

### Setup environment.
library(pmclust, quiet = TRUE)
comm.set.seed(123)

### Generate an example data.
N.allspmds <- rep(5000, comm.size())
N.spmd <- 5000
N.K.spmd <- c(2000, 3000)
N <- 5000 * comm.size()
p <- 2
K <- 2
data.spmd <- generate.basic(N.allspmds, N.spmd, N.K.spmd, N, p, K)
X.spmd <- data.spmd$X.spmd

### Run clustering.
PARAM.org <- set.global(K = K)          # Set global storages.
# PARAM.org <- initial.em(PARAM.org)    # One initial.
PARAM.org <- initial.RndEM(PARAM.org)   # Ten initials by default.
PARAM.new <- apecma.step(PARAM.org)     # Run APECMa.
em.update.class()                       # Get classification.

### Get results.
N.CLASS <- get.N.CLASS(K)
comm.cat("# of class:", N.CLASS, "\n")

### Quit.
finalize()

## End(Not run)

[Package pmclust version 0.1-7 Index]