R: Generate MixSim Examples for Testing

generate.MixSim {pmclust}

R Documentation

Generate MixSim Examples for Testing

Description

This function uses MixSim to generate data sets for testing clustering algorithms.

Usage

  generate.MixSim(N, p, K, MixSim.obj = NULL, MaxOmega = NULL,
                  BarOmega = NULL, PiLow = 1.0, sph = FALSE, hom = FALSE)

Arguments

`N`	total sample size across all S processors, i.e. the sum of `N.spmd` over all processors is `N`.
`p`	dimension of data `X.spmd`, i.e. `ncol(X.spmd)`.
`K`	number of clusters.
`MixSim.obj`	an object returned from `MixSim`.
`MaxOmega`	maximum overlap as in `MixSim`.
`BarOmega`	averaged overlap as in `MixSim`.
`PiLow`	lower bound of mixture proportion as in `MixSim`.
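`sph`	covariance structure as in `MixSim` (`TRUE` for spherical, `FALSE` for non-spherical).
`hom`	cluster homogeneity as in `MixSim` (`TRUE` for homogeneous, `FALSE` for heterogeneous).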

Details

If `MixSim.obj` is `NULL`, then `BarOmega` and `MaxOmega` are passed to `MixSim` to obtain a new `MixSim.obj`.
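
The rule above can be sketched as a small helper. This is only an illustration: `resolve.mixsim` is a hypothetical name that does not exist in pmclust, and the package's internal logic may differ. It assumes the `MixSim` function from the MixSim package, which is called only when no object is supplied.

```r
# Hypothetical helper (not part of pmclust): reuse a supplied model,
# otherwise ask MixSim for a new one with the requested overlap.
resolve.mixsim <- function(p, K, MixSim.obj = NULL, MaxOmega = NULL,
                           BarOmega = NULL, PiLow = 1.0,
                           sph = FALSE, hom = FALSE) {
  if (is.null(MixSim.obj)) {
    MixSim.obj <- MixSim::MixSim(BarOmega = BarOmega, MaxOmega = MaxOmega,
                                 K = K, p = p, sph = sph, hom = hom,
                                 PiLow = PiLow)
  }
  MixSim.obj
}

# A supplied object is returned untouched; the overlap arguments are ignored.
fake.obj <- list(Pi = c(0.5, 0.5))      # stand-in, not a real MixSim result
reused <- resolve.mixsim(p = 2, K = 2, MixSim.obj = fake.obj, BarOmega = 0.01)

# This branch only runs when the MixSim package is installed.
if (requireNamespace("MixSim", quietly = TRUE)) {
  fresh <- resolve.mixsim(p = 2, K = 2, BarOmega = 0.01)
  print(fresh$Pi)                       # mixing proportions of the new model
}
```

Supplying the same `MixSim.obj` across runs keeps the true model fixed, so repeated calls draw new data from the same mixture.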

Value

A set of simulated data and information will be returned in a list variable including:

`K`	number of clusters, as the input
`p`	dimension of data `X.spmd`, as the input
`N`	total sample size, as the input
`N.allspmds`	a collection of sample sizes for all S processors
`N.spmd`	sample size of the given processor
`X.spmd`	generated data set with dimension `N.spmd * p`
`CLASS.spmd`	true cluster id of each observation, a vector of length `N.spmd` with values from 1 to `K`
`N.CLASS.spmd`	true sample size of each cluster, a vector of length `K`
`MixSim.obj`	the true model from which the data `X.spmd` were generated
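
To make the shapes of these components concrete, the following plain R mock builds a list with the same field names for one of S simulated processors, without MPI or MixSim. Everything in it is an assumption made for illustration: the even split of `N`, the stand-in parameters `Pi` and `Mu`, the unit-variance noise, and the reading of `N.CLASS.spmd` as per-processor counts. It is not the package's implementation.

```r
# Illustrative mock only: mimics the layout of the returned list on one rank.
set.seed(1)
N <- 5000; p <- 2; K <- 2; S <- 4

# Assumed splitting rule: equal shares, any remainder goes to the first ranks.
N.allspmds <- rep(N %/% S, S) + (seq_len(S) <= N %% S)
my.rank <- 1                            # pretend to be the first processor
N.spmd <- N.allspmds[my.rank]

# Hypothetical mixture parameters standing in for MixSim.obj.
Pi <- rep(1 / K, K)
Mu <- matrix(c(0, 0, 3, 3), nrow = K, ncol = p, byrow = TRUE)

# Draw true labels, then observations around the matching cluster mean.
CLASS.spmd <- sample.int(K, N.spmd, replace = TRUE, prob = Pi)
X.spmd <- Mu[CLASS.spmd, , drop = FALSE] +
  matrix(rnorm(N.spmd * p), nrow = N.spmd, ncol = p)
N.CLASS.spmd <- tabulate(CLASS.spmd, nbins = K)   # counts on this rank

mock <- list(K = K, p = p, N = N, N.allspmds = N.allspmds, N.spmd = N.spmd,
             X.spmd = X.spmd, CLASS.spmd = CLASS.spmd,
             N.CLASS.spmd = N.CLASS.spmd)
str(mock, max.level = 1)
```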

Author(s)

Wei-Chen Chen wccsnow@gmail.com and George Ostrouchov.

References

Melnykov, V., Chen, W.-C. and Maitra, R. (2012) “MixSim: Simulating Data to Study Performance of Clustering Algorithms”, Journal of Statistical Software, (accepted).

High Performance Statistical Computing (HPSC) Website: http://thirteen-01.stat.iastate.edu/snoweye/hpsc/

Programming with Big Data in R Website: http://r-pbd.org/

Examples

## Not run: 
# Save code in a file "demo.r" and run in 4 processors by
# > mpiexec -np 4 Rscript demo.r

### Setup environment.
library(pmclust, quietly = TRUE)

### Generate an example data set.
N <- 5000
p <- 2
K <- 2
data.spmd <- generate.MixSim(N, p, K, BarOmega = 0.01)
X.spmd <- data.spmd$X.spmd

### Run clustering.
PARAM.org <- set.global(K = K)          # Set global storages.
# PARAM.org <- initial.em(PARAM.org)    # One initial.
PARAM.org <- initial.RndEM(PARAM.org)   # Ten initials by default.
PARAM.new <- apecma.step(PARAM.org)     # Run APECMa.
em.update.class()                       # Get classification.

### Get results.
N.CLASS <- get.N.CLASS(K)
comm.cat("# of class:", N.CLASS, "\n")
comm.cat("# of class (true):", data.spmd$N.CLASS.spmd, "\n")

### Quit.
finalize()

## End(Not run)

[Package pmclust version 0.1-7 Index]