Jan Gorecki edited this page Apr 8, 2020 · 5 revisions

This wiki page collects information that is useful for working efficiently with the R C API, which itself is not very well documented.

finding C source of base R function

The pryr package does this very well.

#install.packages("pryr")

print(sum) # take the body and paste into pryr::show_c_source
pryr::show_c_source(.Primitive("sum"))

difference of length and truelength (by @mattdowle)

truelength is the allocated length; length is the amount in use. truelength was an unused field in R until recently. Now, finally, truelength is used as Ross originally intended (allocated length).

releasing memory after setting new truelength (by @mattdowle)

You can't set a new truelength. It is the actual allocation, either on R's heap or obtained by R through malloc (R can do both, depending on the size of the vector and how R has been configured/compiled). If a length is set smaller than truelength, though, which we do in data.table (e.g. at the end of fread), then the memory leak can be solved. I was told by an R core member that there is a new 'growable' bit that can be set. When growable is set, gc() releases truelength rather than length, so the workarounds at the top of assign.c can be removed. It should have been like that in R in the first place, but for whatever reason they didn't use truelength at all.
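The distinction between the two sizes can be illustrated with a small standalone model. This is only a sketch in plain C: `toy_vec` and its helper functions are hypothetical names invented for illustration, and the struct does not reflect how R lays out vectors internally. It shows that shrinking the used length leaves the allocation untouched, and that whatever releases the memory has to account for the allocated size rather than the used size.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical toy vector, NOT R's internal vector layout: it only
   mirrors the two notions of size discussed above. */
typedef struct {
  double *data;
  size_t length;      /* elements in use */
  size_t truelength;  /* elements allocated */
} toy_vec;

/* Allocate room for `alloc` elements, none of them in use yet.
   Returns 0 on success, -1 if the allocation failed. */
static int toy_alloc(toy_vec *v, size_t alloc) {
  v->data = malloc(alloc * sizeof *v->data);
  if (v->data == NULL && alloc > 0) return -1;
  v->length = 0;
  v->truelength = alloc;
  return 0;
}

/* Change the used length only; the allocation itself is untouched.
   Returns -1 if the request exceeds what was allocated. */
static int toy_set_length(toy_vec *v, size_t n) {
  if (n > v->truelength) return -1;
  v->length = n;
  return 0;
}

/* Bytes that must be accounted for on release: this depends on
   truelength, not on length. */
static size_t toy_bytes_allocated(const toy_vec *v) {
  return v->truelength * sizeof *v->data;
}

/* Release the allocation and reset both sizes. */
static void toy_free(toy_vec *v) {
  free(v->data);
  v->data = NULL;
  v->length = 0;
  v->truelength = 0;
}
```

In this model, over-allocating 100 elements and then calling `toy_set_length(&v, 10)` leaves `toy_bytes_allocated(&v)` at the size of 100 doubles, which is the bookkeeping mismatch the text above is concerned with.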

lazy evaluation handling in C (by @2005m)

A very good example is the code contributed in fcaseR by @2005m; the related lines are https://github.com/Rdatatable/data.table/pull/4021/files#diff-25cd0b0c089d5976de15097388ff5683R153-R162

when should I NOT use restrict when declaring C pointer? (by @mattdowle)

We should not use restrict when two threads update a shared variable, for example from within an atomic or critical section, if I understand correctly. I also found a claim online that combining const with restrict can be beneficial too.
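As background for this advice, here is a minimal sketch of the general aliasing rule that restrict expresses. It is not data.table code, and the function names `add_vectors` and `scale_into` are hypothetical. The first function may use restrict because its callers are required to pass non-overlapping buffers; the second deliberately omits it because it is meant to be callable in place, with the output and input pointing at the same memory.

```c
#include <stddef.h>

/* restrict promises the compiler that, for the duration of the call,
   the memory written through `out` is not also reached through `x`
   or `y`. Callers must pass non-overlapping buffers. */
void add_vectors(double *restrict out, const double *restrict x,
                 const double *restrict y, size_t n) {
  for (size_t i = 0; i < n; i++) {
    out[i] = x[i] + y[i];
  }
}

/* No restrict here: `out` and `x` may be the same buffer, so the
   function can legitimately be used in place. Adding restrict would
   make such a call undefined behaviour. */
void scale_into(double *out, const double *x, double k, size_t n) {
  for (size_t i = 0; i < n; i++) {
    out[i] = k * x[i];
  }
}
```

The same reasoning applies to a variable shared between threads: it is by definition reachable through more than one path, so it does not satisfy the promise that restrict makes.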

printing and raising exceptions from openmp parallel region (by @jangorecki)

We use OpenMP to parallelize many routines. Special care has to be taken inside regions that use OpenMP. One of the restrictions is that you must not print to the console or raise exceptions there. One way to deal with this is to defer them and emit them outside the parallel region. If an exception occurs, set your own flag variable, then use that flag to skip all further computation (in all threads). Once outside the parallel region, raise the exception or emit the output.

In data.table we have a dedicated structure that is meant to carry the results of a computation together with console output, messages, warnings and errors. This makes it easy to pass all of that information between functions as a single object. The structure, named ans_t and defined in src/types.h, is used in the rolling functions and the NA fill function. If you would like to use ans_t, please see how it is used in those functions.
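The deferral pattern described above can be sketched as follows. This is an illustration under stated assumptions, not data.table's implementation: `result_t` and `reciprocal_all` are hypothetical names, and `result_t` is a simplified stand-in that does not reproduce the actual fields of ans_t. Inside the loop the error is only recorded; nothing is printed or raised until the parallel region has finished. If the file is compiled without OpenMP support, the pragmas are ignored and the loop simply runs serially.

```c
#include <stdio.h>

#define MSG_SIZE 256

/* Hypothetical result carrier for this sketch; it is NOT the
   definition of ans_t from src/types.h. */
typedef struct {
  int status;              /* 0 = ok, 1 = error */
  char message[MSG_SIZE];  /* text to emit after the parallel region */
} result_t;

/* Compute 1/x[i] in parallel. A zero input is treated as an error,
   which is recorded in `res` instead of being raised in the loop. */
void reciprocal_all(const double *x, double *out, int n, result_t *res) {
  res->status = 0;
  res->message[0] = '\0';
  int failed = 0;  /* shared flag, read and written atomically */

  #pragma omp parallel for shared(failed)
  for (int i = 0; i < n; i++) {
    int stop;
    #pragma omp atomic read
    stop = failed;
    /* `break` is not allowed in an OpenMP loop, so remaining
       iterations are skipped with `continue` instead. */
    if (stop) continue;

    if (x[i] == 0.0) {
      #pragma omp critical
      {
        if (!res->status) {  /* keep only the first recorded error */
          res->status = 1;
          snprintf(res->message, MSG_SIZE,
                   "division by zero at index %d", i);
        }
      }
      #pragma omp atomic write
      failed = 1;
      continue;
    }
    out[i] = 1.0 / x[i];
  }
  /* Back in serial code: the caller can now inspect res->status and
     raise the error or print the message safely. */
}
```

In an R package the caller would check `res.status` after the function returns and only then hand `res.message` to R's `error()` or `Rprintf()`.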

measure time (by @st-pasha)

An easy way to measure time in a platform-independent way is to use the OpenMP omp_get_wtime function.

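// requires #include <omp.h> for omp_get_wtime()
const bool verbose = GetVerbose();
// initialized so that neither variable is read uninitialized when verbose is false
double tic = 0.0, toc = 0.0;
if (verbose)
  tic = omp_get_wtime();  // wall-clock time in seconds
/* my processing */
if (verbose) {
  toc = omp_get_wtime();
  Rprintf("My processing took %.3fs\n", toc - tic);
}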