# Object size in R

EDIT: For a more thorough treatment of this subject, please see Hadley Wickham’s post.

Today I was thinking about factors in R. They should be more memory efficient, right? But how much more memory efficient are they compared to other classes? Here’s the scoop:

```
> x <- sample(1:3, 1000, replace = TRUE)

> class(x)
[1] "integer"

> object.size(x)
4040 bytes
```

Assuming 40 bytes are for overhead, we see that each integer is stored in 4 bytes, or 32 bits per integer. If one bit stores the sign, then the maximum integer is $2^{31} - 1$.

```
> as.integer(2^31 - 1)
[1] 2147483647
> as.integer(2^31)
[1] NA
```
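As a hedged aside, the same limit can be read from R itself rather than derived by hand. This is a minimal sketch assuming a standard R build with 32-bit integers; the variable names are only illustrative.

```r
# Largest representable integer, as reported by R itself
int_max <- .Machine$integer.max
int_max == 2^31 - 1                      # compare with the hand-derived limit

# One step past the limit cannot be represented; R returns NA (with a warning)
too_big <- suppressWarnings(as.integer(2^31))
is.na(too_big)

# The most negative 32-bit pattern is reserved for NA, so the range is symmetric
most_negative <- as.integer(-(2^31 - 1))
below_range <- suppressWarnings(as.integer(-2^31))
c(most_negative, below_range)
```

The symmetric range is why the negative limit sits at $-(2^{31} - 1)$ rather than $-2^{31}$.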

Sure enough. Back to the original train of thought:

```
> object.size(as.numeric(x))
8040 bytes
```

This means that each double precision number is stored as 8 bytes = 64 bits, as expected.
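The per-element cost can also be estimated without guessing the overhead, by differencing the sizes of two vectors of different lengths so the fixed header cancels. The helper below, bytes_per_element, is a hypothetical name introduced for this sketch, not something from base R.

```r
# Bytes per element, estimated by differencing two vector lengths so that
# the fixed header cancels out (no assumption about its size is needed)
bytes_per_element <- function(make_vector, n = 1000) {
  small <- as.numeric(object.size(make_vector(n)))
  large <- as.numeric(object.size(make_vector(2 * n)))
  (large - small) / n
}

int_bytes <- bytes_per_element(integer)   # integer(n): n zeros stored as integers
dbl_bytes <- bytes_per_element(numeric)   # numeric(n): n zeros stored as doubles
c(integer = int_bytes, double = dbl_bytes)
```

This approach should give the same answer even if the header size differs between R versions or platforms.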

```
> object.size(as.factor(x))
4576 bytes
```

Factors have more overhead than integers, but their values are stored as the same 32-bit integers. The savings could be much larger if each value were a long character string.
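To see where the extra overhead comes from, it helps to look inside a factor: integer codes plus a levels attribute that stores each label once. The sketch below regenerates a sample like x, with an arbitrary seed added only for repeatability.

```r
set.seed(1)                                # arbitrary seed, for repeatability
x <- sample(1:3, 1000, replace = TRUE)
f <- as.factor(x)

typeof(f)            # underlying storage type of the factor
levels(f)            # the labels, stored once as an attribute
codes <- unclass(f)  # the integer codes, with the class removed
head(codes)

# Extra bytes the factor carries compared with the bare integer vector
extra <- as.numeric(object.size(f)) - as.numeric(object.size(x))
extra
```

The extra bytes belong to the attributes (levels and class), so they should not grow with the length of the vector.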

```
> object.size(as.character(x))
8184 bytes
```

This one is a little more mysterious. Why would a single character take up 8 bytes? I don’t have an answer. Remember x was nothing but a sample of 1:3.
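One likely explanation, offered as my understanding of R's internals rather than something measured in this post: a character vector holds one pointer per element (8 bytes on a 64-bit build), and the strings themselves are stored once each and shared. If that is right, size should depend on how many distinct strings there are, which the following sketch checks.

```r
n <- 1000
repeated <- rep("a", n)               # one distinct string, repeated n times
distinct <- paste0("s", seq_len(n))   # n distinct strings

size_repeated <- as.numeric(object.size(repeated))
size_distinct <- as.numeric(object.size(distinct))

# Each element needs at least one pointer (8 bytes on a 64-bit build)
pointer_bytes <- .Machine$sizeof.pointer
size_repeated >= pointer_bytes * n

# Distinct strings add their own storage on top of the pointers
size_distinct > size_repeated
```

With only three distinct values in x, almost all of the reported size would then be pointers plus the vector header.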

```
> y <- as.character(x)
> y[y == 1] <- "Here is some long string"
> y[y == 2] <- "And another bunch of letters"
> y[y == 3] <- "Make it even bigger"

> head(y)
[1] "Here is some long string"     "Make it even bigger"
[3] "Here is some long string"     "And another bunch of letters"
[5] "And another bunch of letters" "And another bunch of letters"

> object.size(y)
8256 bytes
```

So even though we went from strings with 1 character to strings with around 20 characters, the size of the object hardly changed. It’s worth noting that a coercion happened in the comparisons such as y == 1: y is a character vector, so the number 1 was converted to the string "1" before comparing.
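The coercion in those comparisons can be checked on a small stand-in vector. This is only an illustrative sketch with made-up values, not the y from above.

```r
y <- c("1", "2", "3", "1")

# Comparing character with numeric converts the number to a string first
hits <- y == 1
hits

# The same comparison with the conversion written out explicitly
identical(hits, y == as.character(1))

# The vector itself stays character after the replacement
y[y == 1] <- "Here is some long string"
class(y)
```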

```
> object.size(as.factor(y))
4648 bytes
```

Reassuringly, when converted to a factor it’s consistent with having 32-bit integers as values, plus some overhead. But what if I check the size of a random string of 20 characters?

```
> y2 <- sapply(1:1000, function(x) paste(sample(letters, 20), collapse = ""))