R: Parallel Apply and Lapply Functions

apply and lapply {pbdMPI}

R Documentation

Parallel Apply and Lapply Functions

Description

These functions are parallel versions of apply(), lapply(), and sapply().

Usage

pbdApply(X, MARGIN, FUN, ..., pbd.mode = c("mw", "spmd", "dist"),
         rank.source = .pbd_env$SPMD.CT$rank.root,
         comm = .pbd_env$SPMD.CT$comm,
         barrier = TRUE)
pbdLapply(X, FUN, ..., pbd.mode = c("mw", "spmd", "dist"),
          rank.source = .pbd_env$SPMD.CT$rank.root,
          comm = .pbd_env$SPMD.CT$comm,
          bcast = FALSE, barrier = TRUE)
pbdSapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE,
          pbd.mode = c("mw", "spmd", "dist"),
          rank.source = .pbd_env$SPMD.CT$rank.root,
          comm = .pbd_env$SPMD.CT$comm,
          bcast = FALSE, barrier = TRUE)

Arguments

`X`	a matrix or array for `pbdApply()`, or a list for `pbdLapply()` and `pbdSapply()`.
`MARGIN`	as in `apply()`.
`FUN`	as in `apply()`.
`...`	optional arguments to `FUN`.
`simplify`	as in `sapply()`.
`USE.NAMES`	as in `sapply()`.
`pbd.mode`	how the data `X` is distributed: `"mw"`, `"spmd"`, or `"dist"`.
`rank.source`	the rank of the source from which `X` is broadcast.
`comm`	a communicator number.
`bcast`	whether to broadcast to all ranks.
`barrier`	whether to call a barrier for all ranks.

Details

All functions are mainly intended to be called in manager/workers mode (pbd.mode = "mw"), where they work the same way as their serial versions.

If pbd.mode = "mw", the X on rank.source (the manager) is redistributed to the processors (the workers), FUN is applied to the redistributed data, and the results are gathered to rank.source. In SPMD, the manager is also one of the workers. The optional arguments in ... are also sent from rank.source with scatter().
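The redistribute, apply, and gather steps described above can be sketched in plain serial R. This is only an illustration and not pbdMPI's implementation: no MPI calls are made, `n_workers` is a hypothetical number of ranks, and the contiguous block split of rows is an assumption (the package may partition X differently).

```r
# Serial stand-in for the "mw" pattern; no MPI is used here.
n_workers <- 2                      # hypothetical number of ranks
x <- matrix(1:100, ncol = 10)       # X as held on rank.source

# "Scatter": assign a contiguous block of rows to each simulated worker.
# (Assumed split rule, chosen for illustration only.)
worker_id <- rep(seq_len(n_workers),
                 each = ceiling(nrow(x) / n_workers),
                 length.out = nrow(x))
row_blocks <- split(seq_len(nrow(x)), worker_id)

# Each simulated worker applies FUN to its own block of rows.
partial <- lapply(row_blocks, function(rows) {
  apply(x[rows, , drop = FALSE], 1, sum)
})

# "Gather": concatenate the partial results in worker order.
y_mw <- unlist(partial, use.names = FALSE)

# Compare with the serial apply() on the full matrix.
y_serial <- apply(x, 1, sum)
print(identical(y_mw, y_serial))
```

The point of the sketch is that, in this mode, the caller on rank.source sees a result shaped like the one from the serial function, which is why the "mw" mode needs no manual aggregation.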

If pbd.mode = "spmd", an identical copy of X is assumed to exist on every processor, and the original apply(), lapply(), or sapply() operates on only a part of X on each processor. A call to allgather() or gather() is required to aggregate the results manually.

If pbd.mode = "dist", a different X is assumed to exist on each processor, i.e. a ‘distinct or distributed’ X, and the original apply(), lapply(), or sapply() operates on all of the local X. A call to allgather() or gather() is required to aggregate the results manually.
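The difference between the "spmd" and "dist" modes can be illustrated with a serial stand-in for two ranks. This is a hedged sketch rather than the package's code: `n_ranks`, the `share` assignment of list elements to ranks, and the use of unlist() in place of a real allgather() are all assumptions made for illustration.

```r
# Serial stand-in for two ranks; no MPI is used here.
n_ranks <- 2                                   # hypothetical communicator size

## "spmd": every rank holds the same X and works on its own share of it.
x_same <- split(1:100, rep(1:10, each = 10))   # identical list on each rank
share <- rep(seq_len(n_ranks),                 # assumed element-to-rank map
             each = ceiling(length(x_same) / n_ranks),
             length.out = length(x_same))
spmd_local <- lapply(seq_len(n_ranks), function(rank) {
  lapply(x_same[share == rank], sum)           # only this rank's elements
})

## "dist": every rank holds a different X and works on all of it.
dist_local <- lapply(seq_len(n_ranks), function(rank) {
  x_own <- split((1:100) + 100L * (rank - 1L), rep(1:10, each = 10))
  lapply(x_own, sum)                           # the whole local X
})

## Stand-in for the manual allgather(): concatenate the per-rank results.
spmd_all <- unlist(spmd_local)
dist_all <- unlist(dist_local)

print(length(spmd_all))   # one result per element of the shared X
print(length(dist_all))   # one result per element of every local X
```

In both modes each rank ends up holding only its local results, so the final concatenation step is the part that a real program would perform with allgather() or gather().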

In SPMD, it is better to split the data into pieces so that X is a local matrix on each processor. In that case, the original apply() should be sufficient.

Value

A list or matrix will be returned.

Author(s)

Wei-Chen Chen wccsnow@gmail.com, George Ostrouchov, Drew Schmidt, Pragneshkumar Patel, and Hao Yu.

References

Programming with Big Data in R Website: http://r-pbd.org/

Examples

## Not run: 
### Save code in a file "demo.r" and run with 2 processors by
### SHELL> mpiexec -np 2 Rscript demo.r

### Initialize.
suppressMessages(library(pbdMPI, quietly = TRUE))
init()
.comm.size <- comm.size()
.comm.rank <- comm.rank()

### Example for pbdApply.
N <- 100
x <- matrix((1:N) + N * .comm.rank, ncol = 10)
y <- pbdApply(x, 1, sum, pbd.mode = "mw")
comm.print(y)

y <- pbdApply(x, 1, sum, pbd.mode = "spmd")
comm.print(y)

y <- pbdApply(x, 1, sum, pbd.mode = "dist")
comm.print(y)


### Example for pbdApply for 3D array.
N <- 60
x <- array((1:N) + N * .comm.rank, c(3, 4, 5))
dimnames(x) <- list(lat = paste("lat", 1:3, sep = ""),
                    lon = paste("lon", 1:4, sep = ""),
                    time = paste("time", 1:5, sep = ""))
comm.print(x[,, 1:2])

y <- pbdApply(x, c(1, 2), sum, pbd.mode = "mw")
comm.print(y)

y <- pbdApply(x, c(1, 2), sum, pbd.mode = "spmd")
comm.print(y)

y <- pbdApply(x, c(1, 2), sum, pbd.mode = "dist")
comm.print(y)


### Example for pbdLapply.
N <- 100
x <- split((1:N) + N * .comm.rank, rep(1:10, each = 10))
y <- pbdLapply(x, sum, pbd.mode = "mw")
comm.print(unlist(y))

y <- pbdLapply(x, sum, pbd.mode = "spmd")
comm.print(unlist(y))

y <- pbdLapply(x, sum, pbd.mode = "dist")
comm.print(unlist(y))

### Finish.
finalize()

## End(Not run)

[Package pbdMPI version 0.3-1 Index]