We ask all the time how to make R go faster. There are a few basic tips and clear worked examples, but beyond that it is confusing and we just have to test. That is not unique to R; it is the same in C, C++, and Java programming.
A post on the R-devel email list made me think I should try harder to remember the good tips I have found.
1. Basics. Vectorize. This is a lead-pipe cinch. Not debatable.
Avoid accessing elements one at a time with constructs like x[i]. The problem is not for loops per se, but rather the frequent use of element-by-element access inside them; that is how for loops got a bad rap. There's a comment about that in Chambers, Software for Data Analysis.
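A minimal sketch of the point (the toy data and sizes are made up for illustration):

```r
n <- 1e6
x <- runif(n)

## Slow: a loop that touches x[i] one element at a time
slow_sq <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i]^2
  out
}

## Fast: one vectorized call; the loop runs in compiled code
fast_sq <- function(x) x^2

system.time(slow_sq(x))
system.time(fast_sq(x))
all.equal(slow_sq(x), fast_sq(x))  ## same answer either way
```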
2. SEEK ALGORITHMIC IMPROVEMENTS! This is a big lesson I learned in agent-based modeling.
Poor memory usage and repeated growing of objects is a disaster. Repeated use of cbind or rbind inside loops is a killer. We have all done it as novices, but we must stop! There's a full worked example I did called stackListItems-01.R. I beat the hell out of that one, I'd say.
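To see why it is a killer: growing a matrix with rbind() inside a loop copies the whole object on every iteration. A rough sketch of the alternatives (sizes are arbitrary):

```r
n <- 5000

## Bad: rbind() copies 'out' on every pass
grow_it <- function(n) {
  out <- NULL
  for (i in seq_len(n)) out <- rbind(out, c(i, i^2))
  out
}

## Better: allocate once, fill in place
prealloc_it <- function(n) {
  out <- matrix(0, nrow = n, ncol = 2)
  for (i in seq_len(n)) out[i, ] <- c(i, i^2)
  out
}

## Also fine: collect pieces in a list, stack once at the end
list_it <- function(n) {
  rows <- vector("list", n)
  for (i in seq_len(n)) rows[[i]] <- c(i, i^2)
  do.call(rbind, rows)
}

system.time(grow_it(n))
system.time(prealloc_it(n))
system.time(list_it(n))
```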
3. See Simon Wood, Generalized Additive Models: An Introduction with R. This is now my favorite book ever! p. 331, Appendix A, "A1 Basic computational efficiency." There is a ridiculous speedup from grouping matrix multiplications.
A and B are n x n matrices. y is a vector.
It is HORRIBLE AWFUL SLOW to do this:
A %*% B %*% y
Much faster to do
A %*% (B %*% y)
With the second grouping, B %*% y creates an n x 1 vector first, so A %*% (B %*% y) is just two matrix-vector products and is HUGELY accelerated. Note that multiplying an n x n matrix A by an n x n matrix B requires about n * n^2 = n^3 calculations, while a matrix-vector product needs only about n^2. Avoid forming the full matrix product when a vector is coming anyway!
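A quick way to convince yourself (n is arbitrary; both groupings give the same answer up to floating-point rounding):

```r
n <- 1000
A <- matrix(rnorm(n * n), n, n)
B <- matrix(rnorm(n * n), n, n)
y <- rnorm(n)

system.time(r1 <- A %*% B %*% y)    ## forms the full n x n product A %*% B first
system.time(r2 <- A %*% (B %*% y))  ## only two matrix-vector products
all.equal(r1, r2)
```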
I need a bigger list of "definitely stupid things to do" with matrices.
4. There are some R functions that are optimized for matrix calculations, such as rowSums and crossprod.
1. Want X'X? Don't do t(X) %*% X, do crossprod(X).
2. Don't do t(X) %*% y, do crossprod(X, y).
3. Don't do X %*% t(y), do tcrossprod(X, y).
4. Don't do crossprod(v, t(m)), do tcrossprod(v, m).
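The identities are easy to check on made-up data (dimensions here are arbitrary, chosen so everything conforms):

```r
X <- matrix(rnorm(200 * 5), 200, 5)
y <- matrix(rnorm(200 * 5), 200, 5)

all.equal(crossprod(X),     t(X) %*% X)  ## X'X
all.equal(crossprod(X, y),  t(X) %*% y)  ## X'y
all.equal(tcrossprod(X, y), X %*% t(y))  ## Xy'

## And the timing difference on the single-argument form:
Xbig <- matrix(rnorm(2000 * 500), 2000, 500)
system.time(t(Xbig) %*% Xbig)
system.time(crossprod(Xbig))
```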
One explanation I just found in a vignette for RcppEigen. This is the first clear explanation I've found for the speedup we might obtain:
"As shown in the last example, the R function crossprod calculates the product of the transpose
of its first argument with its second argument. The single argument form, crossprod(X),
evaluates X'X. One could, of course, calculate this product as
t(X) %*% X
but crossprod(X) is roughly twice as fast because the result is known to be symmetric and
only one triangle needs to be calculated. The function tcrossprod evaluates crossprod(t(X))
without actually forming the transpose." (Douglas Bates and Dirk Eddelbuettel, "Fast and Elegant Numerical Linear Algebra Using the RcppEigen Package," p. 6 (supplied with RcppEigen version 0.3.1.2)).
There's a GREAT overview of several competing approaches summarized here:
Bates, Douglas, (June 2004) "Least Squares Calculations in R: Timing Different Approaches", Rnews, 4(1): 17
Note, after that, Bates went on to adopt the Eigen library for fast matrix algebra, so don't take the conclusion in the 2004 article as a final answer. What you are supposed to marvel at is that it is possible to make calculations go WAY FASTER by being smarter about algorithms and implementation.
5. Try cmpfun() from the compiler package. Try to get an intuition for when it really helps. Perhaps it helps most on functions that are badly written (using element-by-element vector access a lot), but it may help others too.
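A sketch of the usage, on a deliberately interpreter-heavy toy function. (Note: recent R versions JIT-compile functions automatically, so the gap you measure may be small.)

```r
library(compiler)

## A badly written function: lots of element access in an interpreted loop
naive_sum <- function(x) {
  s <- 0
  for (i in seq_along(x)) s <- s + x[i]
  s
}

## Byte-compile it
fast_sum <- cmpfun(naive_sum)

x <- runif(1e6)
system.time(naive_sum(x))
system.time(fast_sum(x))
all.equal(naive_sum(x), fast_sum(x))
```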
6. Don't forget that accessing and copying memory structures is a slowdown.
A. R has copy-on-modify semantics; knowledge of that may help.
B. Pre-allocating vectors and matrices may help.
C. I suspect, but have no proof, that creating and accessing data structures "anonymously" may help.
Idea: if one does
X <- someFunction(...)
Y <- otherFunction(X, ...)
Z <- anotherFunction(Y)
we may see a slowdown, possibly because we are naming and storing X and Y. I wish somebody would do a comprehensive analysis of this construct for me. Is it faster not to name X and Y?
Z <- anotherFunction(otherFunction(someFunction(...)))
When I worked in Java, those were called anonymous functions
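A rough way to test the question. The functions below are made-up stand-ins; note that merely binding a name to a result does not itself copy the object in R (copies happen on modification), so for simple pipelines like this any gap is likely small.

```r
someFunction    <- function(n) rnorm(n)
otherFunction   <- function(x) x * 2
anotherFunction <- function(x) sum(x)

## Named intermediates
named <- function(n) {
  X <- someFunction(n)
  Y <- otherFunction(X)
  anotherFunction(Y)
}

## Anonymous, nested calls
nested <- function(n) anotherFunction(otherFunction(someFunction(n)))

system.time(replicate(100, named(1e5)))
system.time(replicate(100, nested(1e5)))
```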
7. Especially nice reviews to refer people to.
Chapter-length treatment in Norman Matloff, The Art of R Programming. Focuses on vectorization and similar ideas. Good examples. Also demonstrates cmpfun().
Slideshow: Ross Ihaka, Duncan Temple Lang, Brendan McArdle "Writing Efficient Programs in R (and Beyond)"
Very good slide survey:
Duncan Temple Lang, "Efficiency in R: Simple rules, Garbage Collection, Profiling & algorithms" http://www.ms.uky.edu/~mai/sta705/Profiling.pdf. Obviously not the original source; somebody reposted DTL's notes. But still good :)
This is a complete worked example:
John Nash, http://rwiki.sciviews.org/doku.php?id=tips:rqcasestudy, but the eventual link to the document is: http://macnash.telfer.uottawa.ca/~nashjc/RQtimes.pdf
Nice notes, rather like this list I'm compiling now (wish I had found them first):
Amy F. Szczepanski, "R with High Performance Computing: Parallel processing and large memory"
Things I need to do.
1. Get better at dissecting the Rprof output. It is now EASY to get a profile:
Rprof()
## run R code here
Rprof(NULL)
summaryRprof()
HOWEVER, to me that output is not nearly as informative as a C profile, which shows branching. In R you currently see how much time is spent in each function, but not HOW MANY TIMES each is called, so it is obscure whether there is a speedup to be found. See if there is a full, detailed call-graph profiler for R.
2. Use the foreign function interface. Decide whether I'd rather write in C, C++, or Fortran.
Test Rcpp and examples from Dirk Eddelbuettel. See his post in R-devel December 8, 2012. It discusses a clear working example.
Doug Bates's announcement of RcppEigen to the Eigen email list, Jun 28, 2011.
I worry students and I will "lose contact" with what we can understand by relying on simplifying libraries and sleek packaging. I'm in favor of sleek, but when something goes wrong, we feel helpless, unable to debug. Here's some grounding!
Søren Højsgaard (2012-09-08) "Compiling Rcpp, RcppArmadillo and RcppEigen code using SHLIB (Old habits die hard)"
Barr, Stephen J. "Blog: Using RInside With RcppEigen"