This evening I was reading Norman Matloff’s excellent book, The Art of R Programming. He mentions,

If you are adding rows or columns one at a time within a loop, and the matrix will eventually become large, it’s better to allocate a large matrix in the first place.

The reason is that each time the matrix grows, R must allocate memory for a new, larger matrix and copy the old contents into it, and this can be expensive. But just how much of a performance concern is it? To answer that question I wrote some simple example code.

```r
# TBind grows the matrix one row at a time with rbind().
# Each call allocates a brand-new matrix and copies every
# existing row into it, which makes this very inefficient.
TBind <- function(n) {
  t1 <- Sys.time()
  x <- rnorm(10)
  for (i in 2:n) {
    x <- rbind(x, rnorm(10))
  }
  t2 <- Sys.time()
  return(difftime(t2, t1))
}
```
```r
# TAllocate instead preallocates the entire matrix up front,
# then goes back and fills it in one row at a time.
TAllocate <- function(n) {
  t1 <- Sys.time()
  x <- matrix(nrow = n, ncol = 10)
  for (i in 1:n) {
    x[i, ] <- rnorm(10)
  }
  t2 <- Sys.time()
  return(difftime(t2, t1))
}
```

Check out the results when building a matrix with 10 columns and one hundred thousand rows: about 1 second versus 22 minutes!

```
> TAllocate(1e5)
Time difference of 1.148581 secs
> TBind(1e5)
Time difference of 22.04133 mins
```

And how long does it take to create this matrix the idiomatic way, with a single vectorized call instead of a for loop?

```r
# TEff builds the whole matrix in one vectorized call:
# generate all 10 * n values at once and reshape them.
# (Note it draws from runif() rather than rnorm(); either
# way, the generation cost is comparable.)
TEff <- function(n) {
  t1 <- Sys.time()
  x <- matrix(runif(10 * n), ncol = 10)
  t2 <- Sys.time()
  return(difftime(t2, t1))
}
```

Which gives the following result:

```
> TEff(1e5)
Time difference of 0.206775 secs
```
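As an aside, base R also ships `system.time()`, which wraps the manual `Sys.time()` bookkeeping used above; a minimal sketch:

```r
# system.time() evaluates an expression and reports user,
# system, and elapsed (wall-clock) time in seconds.
timing <- system.time({
  x <- matrix(rnorm(10 * 1e5), ncol = 10)
})
print(timing["elapsed"])
```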

Thanks to Matt Leonawicz for the blog post on how to use Sys.time().
