# counts of specimens
<- c(35,19,13,9,6,4,2,2,2,2,2,2,2,1,1,1,1,1,1,1) counts
Using count data with the subsample()
function of the divDyn package.
Although the subsample()
function was developed for estimating turnover rates and diversity changes over multiple time intervals, it was adapted to execute subsampling on single samples.
Let’s say that we have a sample of 20 species, where species have the following number of specimens:
Preparing data
The current version of divDyn (Kocsis et al. 2019) can only accept extended formats for ecological samples. This means that every specimen (identity) has to be present as a separate object. To make the count data above compatible with subsample()
, we can use the rep()
funciton from the base package to repeat placeholders (i.e. letters) that symbolize the specimens that belong to different species.
# one letter that represents each species
<- letters[1:length(counts)]
species
# extended format of the sample
<- rep(species, counts) specimens
Running the table()
function confirms that the structure we created is indeed matching the count data above.
table(specimens)
specimens
a b c d e f g h i j k l m n o p q r s t
35 19 13 9 6 4 2 2 2 2 2 2 2 1 1 1 1 1 1 1
To make the vector of specimens compatible with the subsample()
function, we have to make it a data.frame
.
<- data.frame(specimens, stringsAsFactors=FALSE) samp
As factors are more difficult to work with (unless you intentionally want to), it is easier to write more reliable analyses if you stick with characters.
Richness function
By default, the subsample()
function uses the divDyn()
function as the applied procedure in the subsampling loops. We have to override this by another function that calculates diversity (e.g. richness) estimates from the long format of the sample. For richness, it is easiest to write your own function that just counts the number of unique elements in the vector.
<- function(x) length(unique(x))
richness richness(specimens)
[1] 20
Subsampling
The important thing to note is, that the argument of this function has to be x
. In simple cases where the data.frame
which is fed to the subsample
function has only one column, this represents a vector of specimens.
Now, all that remains is to feed our data to the subsampling. By default, subsample will execute classical rarefaction (Sanders 1968).
library(divDyn)
# Richness subsampled to 10 individuals, number of iterations is 1000
subsample(samp, tax="specimens", q=10, FUN=richness, iter=1000, counter=F)
[1] 5.933
As the target quota is quite low, you need to increase the number of iterations to make the estimate stabilize.
To run Shareholder Quorum Subsampling (Alroy 2010), you have to specify the type
and the q
arguments appropriately.
# Richness subsampled to 10 individuals, number of iterations is 1000
subsample(samp, tax="specimens", q=0.5, type="sqs", FUN=richness, iter=1000, counter=F)
[1] 3.514
You can contrast this with the analytical solution (coverage-based rarefaction) (Chao and Jost 2012) implemented in the iNEXT-package
# download with install.packages(iNEXT)
library(iNEXT)
# estimate Hill numbers for 0.5 coverage
estimateD(counts, base="coverage", level=0.5)
Assemblage m Method Order.q SC qD qD.LCL qD.UCL
1 data 5.044756 Rarefaction 0 0.4999992 3.767626 2.730383 4.804868
2 data 5.044756 Rarefaction 1 0.4999992 3.428733 2.526175 4.331290
3 data 5.044756 Rarefaction 2 0.4999992 3.060668 2.301103 3.820233
Value we are interested in is qD
at the order of 0
, which is somewhat higher than the solution suggested by SQS, but the offset is really not high, concerning that we are talking about fractions of species. The true advantage of using subsample()
is that you can supply any function as FUN
and do not have to stick with Hill-numbers.