Using count data with the subsample() function of the divDyn package.


divDyn iNEXT R

No matching items

Although the subsample() function was developed for estimating turnover rates and diversity changes over multiple time intervals, it was adapted to execute subsampling on single samples.

Let’s say that we have a sample of 20 species, where species have the following number of specimens:

# counts of specimens
counts <- c(35,19,13,9,6,4,2,2,2,2,2,2,2,1,1,1,1,1,1,1)

Preparing data

The current version of divDyn (Kocsis et al. 2019) can only accept extended formats for ecological samples. This means that every specimen (identity) has to be present as a separate object. To make the count data above compatible with subsample(), we can use the rep() funciton from the base package to repeat placeholders (i.e. letters) that symbolize the specimens that belong to different species.

# one letter that represents each species
species <- letters[1:length(counts)]

# extended format of the sample
specimens <- rep(species, counts)

Running the table() function confirms that the structure we created is indeed matching the count data above.

 a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t 
35 19 13  9  6  4  2  2  2  2  2  2  2  1  1  1  1  1  1  1 

To make the vector of specimens compatible with the subsample() function, we have to make it a data.frame.

samp <- data.frame(specimens, stringsAsFactors=FALSE)

As factors are more difficult to work with (unless you intentionally want to), it is easier to write more reliable analyses if you stick with characters.

Richness function

By default, the subsample() function uses the divDyn() function as the applied procedure in the subsampling loops. We have to override this by another function that calculates diversity (e.g. richness) estimates from the long format of the sample. For richness, it is easiest to write your own function that just counts the number of unique elements in the vector.

richness <- function(x) length(unique(x))
[1] 20


The important thing to note is, that the argument of this function has to be x. In simple cases where the data.frame which is fed to the subsample function has only one column, this represents a vector of specimens.

Now, all that remains is to feed our data to the subsampling. By default, subsample will execute classical rarefaction (Sanders 1968).

# Richness subsampled to 10 individuals, number of iterations is 1000
subsample(samp, tax="specimens", q=10, FUN=richness, iter=1000, counter=F)
[1] 5.88

As the target quota is quite low, you need to increase the number of iterations to make the estimate stabilize.

To run Shareholder Quorum Subsampling (Alroy 2010), you have to specify the type and the q arguments appropriately.

# Richness subsampled to 10 individuals, number of iterations is 1000
subsample(samp, tax="specimens", q=0.5, type="sqs", FUN=richness, iter=1000, counter=F)
[1] 3.531

You can contrast this with the analytical solution (coverage-based rarefaction) (Chao and Jost 2012) implemented in the iNEXT-package

# download with install.packages(iNEXT)

# estimate Hill numbers for 0.5 coverage
estimateD(counts, base="coverage", level=0.5)
  Assemblage        m      Method Order.q        SC       qD   qD.LCL   qD.UCL
1       data 5.044756 Rarefaction       0 0.4999992 3.767626 2.613546 4.921705
2       data 5.044756 Rarefaction       1 0.4999992 3.428733 2.409730 4.447735
3       data 5.044756 Rarefaction       2 0.4999992 3.060668 2.185915 3.935420

Value we are interested in is qD at the order of 0, which is somewhat higher than the solution suggested by SQS, but the offset is really not high, concerning that we are talking about fractions of species. The true advantage of using subsample() is that you can supply any function as FUN and do not have to stick with Hill-numbers.


Thanks to Erin Saupe for asking about the example and indirectly to Seth Finnegan for his original course material.


Alroy, John. 2010. “The Shifting Balance of Diversity Among Major Marine Animal Groups.” Science 329: 1191–94.
Chao, Anne, and Lou Jost. 2012. “Coverage-Based Rarefaction and Extrapolation: Standardizing Samples by Completeness Rather Than Size.” Ecology 93 (12): 2533–47.
Kocsis, Ádám T., Carl J. Reddin, John Alroy, and Wolfgang Kiessling. 2019. “The R Package divDyn for Quantifying Diversity Dynamics Using Fossil Sampling Data.” Methods in Ecology and Evolution 10 (5): 735–43.
Sanders, Howard L. 1968. “Marine Benthic Diversity: A Comparative Study.” The American Naturalist 102 (925): 243–82.