## Overview

The "Programming with Big Data in [R](http://www.r-project.org/)" project (pbdR) is a set of [highly scalable R](https://www.hpcwire.com/2016/07/06/olcf-researchers-scale-r-tackle-big-science-data-sets/) packages for distributed computing and profiling in data science. Our packages include high-performance, high-level interfaces to [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface), [ZeroMQ](http://zeromq.org/), [ScaLAPACK](https://en.wikipedia.org/wiki/ScaLAPACK), [NetCDF4](https://en.wikipedia.org/wiki/NetCDF), [PAPI](http://icl.utk.edu/papi/), and more. While these libraries [shine brightest on large distributed systems](https://www.olcf.ornl.gov/2016/07/05/olcf-expands-data-analytics-capability-with-popular-programming-language/), they also [work rather well on small clusters](https://www.vldb.org/pvldb/vol11/p2168-thomas.pdf#search="pbdR") and, perhaps surprisingly, usually even on a laptop with only two cores.

Winner of the Oak Ridge National Laboratory 2016 Significant Event Award for "Harnessing HPC Capability at OLCF with the R Language for Deep Data Science." OLCF is the Oak Ridge Leadership Computing Facility, which currently includes Summit, the second most powerful computer system in the world.

## Contact

* **Discussion group**: [RBigDataProgramming](https://groups.google.com/forum/?fromgroups#!forum/rbigdataprogramming) (preferred)
* **Email**: RBigData AT gmail

## Authors and Citation

* [Wei-Chen Chen](http://snoweye.github.io/)
* [George Ostrouchov](http://www.csm.ornl.gov/%7Eost/)
* Pragneshkumar Patel
* [Drew Schmidt](http://wrathematics.github.io/)

Please cite individual packages used as well as this web page:

```
@ONLINE{pbdR2012,
  author = {Ostrouchov, G. and Chen, W.-C. and Schmidt, D. and Patel, P.},
  title = {Programming with Big Data in R},
  year = {2012},
  organization = {Oak Ridge National Laboratory and University of Tennessee},
  url = {http://r-pbd.org/}
}

## Cite individual packages by running:
citation("package")
```

## Funding

This project, including software, documentation, talks, and tutorials, is or has been supported in part by the following:

- Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.
- Division of Mathematical Sciences, National Science Foundation, Award No. [1418195](http://www.nsf.gov/awardsearch/showAward?AWD_ID=1418195), 2014-2019.
- The National Institute for Mathematical and Biological Synthesis, under Award Nos. EF-0832858 and DBI-1300426, 2013-2014.
- The Division of Molecular and Cellular Biosciences, National Science Foundation, Award MCB-1120370, 2013-2014.
- The Office of Cyberinfrastructure of the U.S. National Science Foundation under Award No. ARRA-NSF-OCI-0906324 for NICS-RDAV center, 2012-2013.
- U.S. Department of Energy Office of Science under Contract No. DE-AC05-00OR22725, 2011-2013.

## Acknowledgements

We thank everyone who has submitted a bug report for the pbdR project. We also thank the members of [CRAN](http://cran.r-project.org/) for their help and suggestions with pbdR packages, as well as their tireless efforts to develop and support R and its extensions.