An Intro to Rcpp

Table of Contents

Higher level languages such as R and Python are easy and quick to write, though they are slow when executed1. Lower level languages2, such as C and C++ take more time to write, but create binaries which are considerably faster when executed. So often high level languages are used to prototype an idea, but once you know what you want to do it is implemented in a lower level language.

1. Sumall

Let's look at the performance of the sumall function across C++, R, and Python. Given an integer argument x, the function sums all the integers from 1 to x.

\[sumall(x) = \sum_{i = 1}^x{i}\]

We'll implement this function using recursion below.

1.1. C++

// timing structure from here: https://stackoverflow.com/questions/22387586/measuring-execution-time-of-a-function-in-c
#include <chrono>

/* Only needed for the sake of this example. */
#include <iostream>
#include <thread>

int sumall(int x)
{
  if(x == 1)
    {return 1;}
  else
    {return x + sumall(x - 1);}
}

int main()
{
    using std::chrono::high_resolution_clock;
    using std::chrono::duration_cast;
    using std::chrono::duration;
    using std::chrono::milliseconds;

    auto t1 = high_resolution_clock::now();
    int answer = sumall(99999);
    auto t2 = high_resolution_clock::now();

    /* Getting number of milliseconds as a double. */
    duration<double, std::milli> ms_double = t2 - t1;

    std::cout << "The answer is " << answer << "\n";
    std::cout << "The calculation took " << ms_double.count() << "ms\n";
    return 0;
}
The answer is 704982704
The calculation took 1.14802ms

So the calculation of sumall(999999) takes 1.23 milliseconds.

1.2. R

start.time <- Sys.time()
sumall <- function(x){
if (x == 1)
{return(1)}
else
{return(x + sumall(x - 1))}
}
sumall(99999)
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

Error: evaluation nested too deeply: infinite recursion / options(expressions=)? Execution halted

So the calculation of sumall(99999) is not possible in R, at least using our recursive method. The recursion nests too deeply. Though the calculation is possible in C++.

import time

start = time.time()
def sumall(x):
    if x == 1:
        return(1)
    else:
        return(x + sumall(x - 1))

sumall(99999)
end = time.time()
print(end - start)

RecursionError: maximum recursion depth exceeded in comparison

Again, the calculation of sumall(99999) is not possible in Python, due to the recursion exceeding the maximum depth. So here we see an example of a calculation that is not just slower in Python/R relative to C++, but cannot be done.

1.3. RCpp

Now let's implement sumall in RCpp.

library(Rcpp)

start.time <- Sys.time()
cppFunction('int sumall(int x)
{
  if(x == 1)
    {return 1;}
  else
    {return x + sumall(x - 1);}
}
')

print(paste0("The value of sumall(99999) is ", sumall(99999)))

end.time <- Sys.time()
time.taken <- end.time - start.time
print(paste0("And it took ", round(time.taken, 2), " seconds."))

[1] "The value of sumall(99999) is 704982704"
[1] "And it took 3.9 seconds."

And there we go, we can now call the sumall function in R, and because it was implemented in C++ we don't fail due to excessive recursion. Note, the result is slower than pure C++, but the margin will not always be so large. And in most real-world cases, you'll get to take advantage of the ease of data cleaning in R instead of C++.

Footnotes:

1

Often writing the code is the major speed bottleneck, and so in this sense high level languages can be very 'fast'.

2

High and low level means how far the code is from the machine. Think of the machine as the base. Higher level languages are farther from the machine than lower level languages.

Author: Matt Brigida, Ph.D.

Created: 2022-06-24 Fri 11:03

Validate