Using the tapply function in R

There are lots of apply functions in R (apply, lapply, sapply, etc.), and they are often used instead of loops. I find tapply particularly useful: it splits a vector into groups and applies a function such as sum, mean or length to each group.

There are three components, tapply(1, 2, 3), where:

  1. the vector of data you want to apply the function to
  2. the way to break the data up (this can be a single variable or a list of them)
  3. the function (such as length or mean) that you wish to apply to the data
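
As a small sketch of how the three components map onto the call, the arguments can also be written by name. X, INDEX and FUN are tapply's own argument names; the scores and groups vectors are made-up toy data for illustration, not something from the datasets used below.

```r
# Hypothetical toy data, made up for illustration only
scores <- c(10, 20, 30, 40, 50, 60)
groups <- c("a", "b", "a", "b", "a", "b")

# 1 = the data, 2 = how to break it up, 3 = the function to apply
result <- tapply(X = scores, INDEX = groups, FUN = sum)
print(result)  # group a sums to 90, group b sums to 120
```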

#load up some data (iris and mtcars ship with R, so this step is optional)
data(iris)
data(mtcars)

#example – obtain mean petal length per species
tapply(iris$Petal.Length, iris$Species, mean)

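#Find the mean petal length for each species, using only flowers where the petal width is 1.4
#na.rm=TRUE is passed on to mean; a species with no matching flowers comes back as NA
tapply(iris$Petal.Length[iris$Petal.Width == 1.4], iris$Species[iris$Petal.Width == 1.4], mean, na.rm=TRUE)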

#If you have problems with this, check that the two vectors are the same length
length(iris$Petal.Length[iris$Petal.Width == 1.4])
length(iris$Species[iris$Petal.Width == 1.4])
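
One way to sidestep the length problem altogether, sketched here as an alternative rather than a fix to the call above, is to subset the data frame once and take both vectors from the subset. The names narrow, means and means_present are hypothetical, chosen only for this example; droplevels() is a base R function that removes factor levels with no rows left.

```r
# Keep only the rows where petal width is 1.4, then group as before.
# Both vectors now come from the same data frame, so their lengths must match.
narrow <- iris[iris$Petal.Width == 1.4, ]
means <- tapply(narrow$Petal.Length, narrow$Species, mean)
print(means)

# Species is a factor, so setosa is still a group even though it has
# no rows left, and tapply reports NA for it.
# droplevels() removes the unused levels so only species with data remain.
means_present <- tapply(narrow$Petal.Length, droplevels(narrow$Species), mean)
print(means_present)
```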

# example – find the mean mpg for every combination of two grouping variables (cylinders and transmission) by passing them as a list
tapply(mtcars$mpg, list(mtcars$cyl, mtcars$am), mean)
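
With two grouping variables the result is a matrix, one row per value of the first variable and one column per value of the second. A minimal sketch of making that output easier to read: naming the list elements (the labels cyl and am here, and the variable by_cyl_am, are my own choices) labels the two dimensions, and single cells can then be picked out by label.

```r
# Naming the list elements labels the rows and columns of the result
by_cyl_am <- tapply(mtcars$mpg, list(cyl = mtcars$cyl, am = mtcars$am), mean)
print(by_cyl_am)

# Rows are cylinder counts, columns are transmission type
# (in mtcars, am is 0 for automatic and 1 for manual).
# Pick out one cell by its labels: 4-cylinder cars with a manual transmission
by_cyl_am["4", "1"]
```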

