An Intro to Rcpp
Higher-level languages such as R and Python are quick and easy to write, but slow to execute [1]. Lower-level languages [2], such as C and C++, take more time to write but compile to binaries that run considerably faster. As a result, high-level languages are often used to prototype an idea, and once the design is settled it is reimplemented in a lower-level language.
1. Sumall
Let's look at the performance of the sumall function across C++, R, and Python. Given an integer argument x, the function sums all the integers from 1 to x.
\[sumall(x) = \sum_{i = 1}^x{i}\]
We'll implement this function using recursion below.
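Written as a recurrence, the definition that the recursive implementations follow looks like this, with a small worked case for x = 4:

\[sumall(x) = \begin{cases} 1 & \text{if } x = 1 \\ x + sumall(x - 1) & \text{if } x > 1 \end{cases}\]

\[sumall(4) = 4 + sumall(3) = 4 + 3 + sumall(2) = 4 + 3 + 2 + sumall(1) = 4 + 3 + 2 + 1 = 10\]

Each call has to wait for the call below it to return, so computing sumall(x) this way needs x nested calls.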
1.1. C++
// Timing structure adapted from:
// https://stackoverflow.com/questions/22387586/measuring-execution-time-of-a-function-in-c
#include <chrono>
#include <iostream>

// Recursively sum the integers from 1 to x.
int sumall(int x) {
    if (x == 1) {
        return 1;
    } else {
        return x + sumall(x - 1);
    }
}

int main() {
    using std::chrono::high_resolution_clock;
    using std::chrono::duration;

    auto t1 = high_resolution_clock::now();
    int answer = sumall(99999);
    auto t2 = high_resolution_clock::now();

    // Elapsed time in milliseconds, as a double.
    duration<double, std::milli> ms_double = t2 - t1;

    std::cout << "The answer is " << answer << "\n";
    std::cout << "The calculation took " << ms_double.count() << "ms\n";
    return 0;
}
The answer is 704982704
The calculation took 1.14802ms
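So the calculation of sumall(99999) takes about 1.15 milliseconds. Note that the true sum, 4999950000, is too large for a 32-bit int, so the printed answer 704982704 is the result of integer overflow; a 64-bit type such as long long would hold the correct value.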
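As a sanity check on the printed answer 704982704, the sketch below computes the exact sum with the closed form x(x + 1)/2 and then simulates what a 32-bit signed integer would hold. The helper names `sumall_closed_form` and `wrap_int32` are made up for this illustration, and the wrap-around model is an assumption: it matches the common case where int is 32 bits and overflow wraps, although C++ does not formally guarantee that behaviour.

```python
def sumall_closed_form(x: int) -> int:
    """Exact value of 1 + 2 + ... + x, using x * (x + 1) / 2."""
    return x * (x + 1) // 2


def wrap_int32(value: int) -> int:
    """Simulate storing value in a 32-bit signed int that wraps on overflow.

    Assumption: two's-complement wrap-around, as seen on typical platforms.
    """
    wrapped = value % 2**32
    return wrapped - 2**32 if wrapped >= 2**31 else wrapped


exact = sumall_closed_form(99999)
print(exact)              # 4999950000
print(wrap_int32(exact))  # 704982704
```

The wrapped value matches the C++ output, which suggests the program is reporting an overflowed int and not the true sum.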
1.2. R
start.time <- Sys.time()

sumall <- function(x) {
  if (x == 1) {
    return(1)
  } else {
    return(x + sumall(x - 1))
  }
}

sumall(99999)

end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Execution halted
So sumall(99999) cannot be computed in R, at least with our recursive method and R's default settings: the recursion nests too deeply, even though the same calculation runs in C++.
import time

start = time.time()

def sumall(x):
    if x == 1:
        return 1
    else:
        return x + sumall(x - 1)

sumall(99999)

end = time.time()
print(end - start)
RecursionError: maximum recursion depth exceeded in comparison
Again, sumall(99999) cannot be computed in Python, because the recursion exceeds the maximum depth. So here we see a calculation that is not just slower in Python and R than in C++, but that fails outright under their default recursion limits.
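The failure comes from the recursive formulation, not from the sum itself. For comparison, here is a minimal sketch of a loop-based version in Python; `sumall_iter` is a hypothetical name that does not appear in the original code, and it is shown only to illustrate that the depth limit applies to nested calls.

```python
def sumall_iter(x: int) -> int:
    """Sum the integers 1..x with a loop instead of recursion."""
    total = 0
    for i in range(1, x + 1):
        total += i
    return total


print(sumall_iter(99999))  # 4999950000
```

Because no call is nested inside another, this version never approaches the recursion limit, and Python's arbitrary-precision integers hold the full result.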
1.3. Rcpp
Now let's implement sumall in Rcpp.
library(Rcpp)

start.time <- Sys.time()

# Compile the C++ function and expose it to R as sumall().
cppFunction('
int sumall(int x) {
  if (x == 1) {
    return 1;
  } else {
    return x + sumall(x - 1);
  }
}
')

print(paste0("The value of sumall(99999) is ", sumall(99999)))

end.time <- Sys.time()
time.taken <- end.time - start.time
print(paste0("And it took ", round(time.taken, 2), " seconds."))
[1] "The value of sumall(99999) is 704982704"
[1] "And it took 3.9 seconds."
And there we go: we can now call sumall from R, and because it is implemented in C++ it does not fail due to excessive recursion. Note that the measured time is much slower than the pure C++ run, but the timer here also covers cppFunction compiling the C++ code, so the margin will not always be so large. And in most real-world cases, you get to take advantage of the ease of data cleaning in R instead of doing it in C++.