Overview

The "Programming with Big Data in R" project (pbdR) is a set of highly scalable R packages for distributed computing and profiling in data science.

Our packages include high-performance, high-level interfaces to MPI, ZeroMQ, ScaLAPACK, NetCDF4, PAPI, and more. While these libraries shine brightest on large distributed systems, they also work well on small clusters and, perhaps surprisingly, usually even on a laptop with only two cores.

pbdR is the winner of the Oak Ridge National Laboratory 2016 Significant Event Award for "Harnessing HPC Capability at OLCF with the R Language for Deep Data Science." OLCF is the Oak Ridge Leadership Computing Facility, which currently includes Summit, the second most powerful computer system in the world.

Contact

Authors and Citation

Please cite individual packages used as well as this web page:

@ONLINE{pbdR2012,
  author = {Ostrouchov, G. and Chen, W.-C. and Schmidt, D. and Patel, P.},
  title = {Programming with Big Data in R},
  year = {2012},
  organization = {Oak Ridge National Laboratory and University of Tennessee},
  url = {http://r-pbd.org/}
}


## Cite individual packages by running:

citation("package")

Funding

This project, including software, documentation, talks, and tutorials, is/has been supported in part by the following:

Acknowledgements

We thank everyone who has submitted a bug report for the pbdR project. We also thank the members of CRAN for their help and suggestions with the pbdR packages, as well as for their tireless efforts to develop and support R and its extensions.