assign.N.sample {pmclust}

R Documentation

Obtain a Set of Random Samples for X.spmd

Description

This utility function randomly samples data from X.spmd to form a relatively small subset of the original data. Running the EM algorithm on this smaller subset is typically fast and captures the rough structure of the entire dataset.

Usage

  assign.N.sample(total.sample = 5000, N.org.spmd)

Arguments

`total.sample`	the total number of samples to be selected from the original data `X.spmd`.
`N.org.spmd`	the size of the original data on this processor, i.e., `nrow(X.spmd)`.

Details

This utility function performs simple random sampling without replacement on the original dataset X.spmd. Different random seeds should be set before calling this function.
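To illustrate simple random sampling without replacement spread over several processors, here is a serial sketch in base R that simulates S processors in one session. It assumes, purely for illustration, that `total.sample` is split across processors in proportion to their local data sizes (largest-remainder rounding) and is then drawn without replacement on each one. `sample_ids_serial()` is a hypothetical helper, not part of pmclust, and the package's own allocation rule may differ.

```r
# Hypothetical serial sketch (not pmclust code): simulate S processors.
# ASSUMPTION: total.sample is allocated in proportion to local sizes.
sample_ids_serial <- function(total.sample, N.org.allspmds) {
  N.org <- sum(N.org.allspmds)
  if (total.sample > N.org) stop("total.sample exceeds the data size")
  # Proportional shares, floored, then leftover units go to the
  # processors with the largest fractional remainders.
  share <- total.sample * N.org.allspmds / N.org
  N.allspmds <- floor(share)
  left <- total.sample - sum(N.allspmds)
  if (left > 0) {
    extra <- order(share - N.allspmds, decreasing = TRUE)[seq_len(left)]
    N.allspmds[extra] <- N.allspmds[extra] + 1
  }
  # Simple random sampling without replacement within each processor;
  # indices range from 1 to that processor's local data size.
  ID.allspmds <- lapply(seq_along(N.org.allspmds), function(i)
    sort(sample.int(N.org.allspmds[i], N.allspmds[i])))
  list(N = sum(N.allspmds), N.allspmds = N.allspmds,
       ID.allspmds = ID.allspmds)
}

set.seed(123)
ret <- sample_ids_serial(5000, c(5500, 5800, 6000, 5200))
ret$N           # 5000 by construction
ret$N.allspmds  # 1222 1289 1333 1156
```

Largest-remainder rounding is used here only so that the per-processor sizes always add up exactly to `total.sample`.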

Value

A list is returned containing:

`N`	the total sample size across all S processors
`N.spmd`	the sample size on the given processor
`N.allspmds`	a vector of the sample sizes on all S processors
`ID.spmd`	indices of the selected samples, ranging from 1 to `N.org.spmd`

Note that N and N.allspmds are identical on all S processors, whereas N.spmd and ID.spmd will most likely differ between processors. The lengths of these elements are 1 for N and N.spmd, S for N.allspmds, and N.spmd for ID.spmd.
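The length and range properties above can be checked on each processor with a small helper such as the one below. `check_sample_list()` is hypothetical (not a pmclust function), the check that `N.allspmds[rank + 1]` equals `N.spmd` assumes N.allspmds is ordered by 0-based rank, and the list used in the usage line is made up.

```r
# Hypothetical helper (not part of pmclust): verify the documented
# structure of one processor's return value. 'rank' is 0-based.
check_sample_list <- function(ret, N.org.spmd, rank, S) {
  stopifnot(
    length(ret$N) == 1,
    length(ret$N.spmd) == 1,
    length(ret$N.allspmds) == S,
    length(ret$ID.spmd) == ret$N.spmd,
    # N is the total over all processors.
    sum(ret$N.allspmds) == ret$N,
    # ASSUMPTION: N.allspmds is ordered by rank.
    ret$N.allspmds[rank + 1] == ret$N.spmd,
    # Indices lie in 1..N.org.spmd and, drawn without replacement, are unique.
    all(ret$ID.spmd >= 1 & ret$ID.spmd <= N.org.spmd),
    !anyDuplicated(ret$ID.spmd)
  )
  invisible(TRUE)
}

# Made-up list mimicking rank 1 of S = 2 processors.
ret <- list(N = 5L, N.spmd = 3L, N.allspmds = c(2L, 3L),
            ID.spmd = c(4L, 9L, 17L))
check_sample_list(ret, N.org.spmd = 20, rank = 1, S = 2)
```

In the SPMD example, the same check could be run as `check_sample_list(ret.spmd, N.org.spmd, comm.rank(), comm.size())`.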

Author(s)

Wei-Chen Chen wccsnow@gmail.com and George Ostrouchov.

References

High Performance Statistical Computing (HPSC) Website: http://thirteen-01.stat.iastate.edu/snoweye/hpsc/

Programming with Big Data in R Website: http://r-pbd.org/

Examples

## Not run: 
# Save the code in a file "demo.r" and run it on 4 processors with
# > mpiexec -np 4 Rscript demo.r

### Setup environment.
library(pmclust, quietly = TRUE)
comm.set.seed(123)

### Generate an example local data size and draw samples.
N.org.spmd <- 5000 + sample(1:1000, 1)
ret.spmd <- assign.N.sample(total.sample = 5000, N.org.spmd)
cat("Rank:", comm.rank(), " Size:", ret.spmd$N.spmd,
    "\n", sep = "")

### Quit.
finalize()

## End(Not run)

[Package pmclust version 0.1-7 Index]