install.packages("chronosphere")
Basic diversity dynamics of scleractinian corals with divDyn
Goals and introduction
The purpose of this vignette is to guide new users through data acquisition and analysis, as well as to demonstrate the basic capabilities of the divDyn R (Kocsis et al. 2019) package. This tutorial will use Scleractininan corals as an example group which was selected for data quality reasons - and as we have readily available additional information about the coral taxa with which to customize the analyses.
Occurrence data download
To run analyses, first the occurrence data have to be downloaded and processed. You can download all PBDB occurrences using the chronosphere
R package. The chronosphere
is an initiative we have started at GeoZentrum Nordbayern in Erlangen, to facilitate the use, tracability, portability and version controlling of data in scientific analyses.
The package is available from the CRAN repositories and can be installed with the following line of code (you will probably have to select your preferred download mirror).
After the package is installed, it is ready for use, you can attach it with:
library(chronosphere)
Chronosphere - Evolving Earth System Variables
Important: never fetch data as a superuser / with admin. privileges!
Note that the package was split for efficient maintenance and development:
- Plate tectonic calculations -> package 'rgplates'
- Arrays of raster and vector spatials -> package 'via'
# omit 'ver="20240506"' for most up-to-date data!
<- fetch(src="pbdb", ver="20240506") dat
------------------------------------------------------------
Accessing chronosphere registry tables.
------------------------------------------------------------
Downloading registry tables.
------------------------------------------------------------
Item no. 2445, src:pbdb, ser: occs3, ver: 20240506, res: species.
------------------------------------------------------------
------------------------------------------------------------
Downloading import code.
------------------------------------------------------------
------------------------------------------------------------
Downloading data file.
------------------------------------------------------------
MD5 checksum passed.
If you use the data in publications, please cite its
reference(s), as well as that of the 'chronosphere' project.
Please remember to acknowledge the Paleobiology Database (http://paleobiodb.org) in your publication.
Alternatively, you can use: The Paleobiology Database Community. (year). Denormalized occurrence table of the Paleobiology Database. Zenodo. https://doi.org/10.5281/zenodo.7808071
This is a pre-processed version of the PBDB that was downloaded with this API call:
attributes(dat)$chronosphere$API
[1] "https://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount&interval=Ediacaran,Holocene&show=class,classext,genus,subgenus,coll,coords,loc,paleoloc,strat,stratext,lith,env,ref,crmod,timebins,timecompare,refattr,entname,attr,ident,img,plant,abund,ecospace,taphonomy,etbasis,pres,prot,lithext,geo,methods,resgroup,refattr,ent"
You can copy and paste this URL to your browser and the PBDB would compile the current corresponding dataset for you. It normally takes about 15-30 minutes to get the dataset, another couple of minutes to download the .csv, which you still have to read in using read.table()
.
The PBDB API allows much more flexible download, but this denormalized version of the PBDB is enough for the overwhelming majority of the analyses. There are some cases, when the data you are interested in are not in this subset and you have to use the API yourself.
Data processing
Installing and loading divDyn
To process the stratigraphic information in the downloaded occurrence file, and do the analyses, you have to to install and attach the divDyn package.
The package is available from the CRAN repositories and can be installed with the following line of code (you will probably have to select your preferred download mirror).
install.packages("divDyn")
After the package is installed, it is ready for use, you can attach it with:
library(divDyn)
v0.8.2. See latest changes with 'news(package="divDyn")'. Default:
Time flows forward with 'bin', and backward with 'age' and 'slice()'.
Initial filtering
The downloaded dataset needs to be subsetted to represent the order of scleractinian corals.
<- dat[dat$order=="Scleractinia",] dat
The analyses will be conducted at the level of genera.
The entries that have no valid genus names in the PaleoDB can be omitted at this stage. They will not provide us any information about the evolution of the group.
# omit non-genus entries
<- dat[!dat$genus=="", ] dat
The dataset should be cleaned as much as possible at this stage, for didactic purposes and for keeping things simple, this step is skipped in this tutorial.
Stratigraphic resolution
Every collection in the the PaleoDB has a user entered 'early_interval'
and 'late_interval'
field. These are fields are used to place the collection in stratigraphy/geological time. The 'early_interval'
collection is mandatory for every entry while the 'late_interval'
is only necessary if the taxon does not fit in the interval specified as 'early_interval'
. Otherwise the 'late_interval'
field is blank (empty quotes: ""
).
The divDyn package uses two frequently used, discrete time scales that we use as the finest stratigraphic resolution of our analyses: the stage level time scale (stages
, 95 bins) and the ten million year time scale (tens
, 49 bins). These can be attached with the data()
function:
data(stages)
data(tens)
These will be useful for plotting as they contain the age estimates for the intervals, but they are not too useful for finding which interval the 'early_interval'
and 'late_interval'
fields point to. This can be found in the keys
object.
data(keys)
This list is designed to be used with the categorize()
function of the package, that combines the large number of entries and replaces them with the name of the group they belong to. This has to be run on both the 'early_interval'
and the 'late_interval'
fields.
# the 'stg' entries (lookup)
<- categorize(dat[ ,"early_interval"], keys$stgInt)
stgMin <- categorize(dat[ ,"late_interval"], keys$stgInt) stgMax
The output of this function is either an interval number that corresponds to the stages
timescale, -1
, indicating that the value was empty quotes, or NA
, if the interval is either not in the lookup table or is too broad.
Because this scheme is designed for general use, the output of this function is a vector of character values. These have to be demoted to numeric values, otherwise their order would not be correct.
# convert to numeric
<- as.numeric(stgMin)
stgMin <- as.numeric(stgMax) stgMax
Now these two vectors have to be combined. There are more solutions to solve the problem of stratigraphic uncertainty, but for the sake of simplicity, let’s just omit every occurrence that cannot be assigned to a single stage in the stages
timescale. For that, we will create an empty container, check whether an the interval entry for the occurrence satisfies the condition above, and if so, save the interval ID.
# empty container
$stg <- rep(NA, nrow(dat))
dat
# select entries, where
<- c(
stgCondition # the early and late interval fields indicate the same stg
which(stgMax==stgMin),
# or the late_intervar field is empty
which(stgMax==-1))
# in these entries, use the stg indicated by the early_interval
$stg[stgCondition] <- stgMin[stgCondition] dat
Now every occurrence entry either has a single integer number in the stg
column, or NA
as its interval identifier. Note that the code above cannot be used for Cambrian and Ordovician occurrences due to stratigraphic problems. If you are interested in assigning these to stage-level intervals, we recommend to take a look at the vignette we wrote that describes global marine, Phanerozoic scale analyses:
https://github.com/divDyn/ddPhanero/blob/master/doc/1.0.1/dd_phanero.pdf
The assignments can be repeated for the 10-million year bins with the following commands:
# a. categorize interval names to bin numbers
# categorize is the new function of the package
<-categorize(dat[,"early_interval"],keys$tenInt)
binMin<-categorize(dat[,"late_interval"],keys$tenInt)
binMax# convert to numeric
<-as.numeric(binMin)
binMin<-as.numeric(binMax)
binMax# b. resolve max-min interval uncertainty
# final variable (empty)
$bin <- rep(NA, nrow(dat))
dat# use entries, where
<- c(
binCondition # the early and late interval fields indicate the same bin
which(binMax==binMin),
# or the late_interval field is empty
which(binMax==-1))
# in these entries, use the bin indicated by the early_interval
$bin[binCondition] <- binMin[binCondition] dat
Sampling assessment
You can assess how well the stratigraphic resolution turned out by running the table()
function on the interval-identifier vector:
table(dat$stg)
48 49 54 55 56 57 58 59 60 61 62 63 64 65 66 67
1 5 114 72 448 910 602 101 86 248 80 87 493 383 231 1954
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
1149 988 157 288 207 483 1023 390 536 128 71 158 298 742 396 289
84 85 86 87 88 89 90 91 92 93 94 95
296 295 466 652 519 1193 1772 1194 1465 1903 7829 1201
Summing it tells us how many of the overall occurrences we can assign to stratigraphic stages.
sum(table(dat$stg))
[1] 31903
# which is a
sum(table(dat$stg))/nrow(dat)
[1] 0.8511779
# proportion of the total data.
As we cannot use unresolved occurrences in any way, coding will be easier if we omit them at this stage.
# omit unbinned
<- dat[!is.na(dat$stg),] dats
If you take a look at the stage object, you can notice that besides the few entries in stages 37
, 49
and 50
, Scleractinian become important fossils in the Anisian stage.
# omit Paleozoic
<- dats[dats$stg>52,] dats
You can get a more organized view of sampling parameters by running the binstat()
function in divDyn, that calculates the occurrence, collection, and reference counts in a single line. This is the general use of the high-level function of the package: you state the occurrence data.frame
as the first, and then the column names as additional arguments. The column tax
stands for the taxon names, bin
for the discrete time bins, coll
for collection identifiers and ref
for reference identifiers.
<- binstat(dats, tax="genus", bin="stg",
bsFull coll="collection_no", ref="reference_no")
The database contains duplicate occurrences (multiple species/genus).
$occs bsFull
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[16] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[31] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[46] NA NA NA NA NA NA NA NA 114 72 448 910 602 101 86
[61] 248 80 87 493 383 231 1954 1149 988 157 288 207 483 1023 390
[76] 536 128 71 158 298 742 396 289 296 295 466 652 519 1193 1772
[91] 1194 1465 1903 7829 1201
This output is organizes so that the index of the values in the vector match up the bin identifier (e.g. 60th value is for stg 60
). The default setting of the function will output a message about duplicate occurrences. This warns us that there are collections with more than one genus entries in a collection (i.e. more than one species/genus). If you are interested in genus-level analyses, it is probably better to count these as one, which you can do with duplicates=FALSE
option.
<- binstat(dats, tax="genus", bin="stg",
bs coll="collection_no", ref="reference_no", duplicates=FALSE)
$occs bs
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[16] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[31] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[46] NA NA NA NA NA NA NA NA 99 66 357 769 444 77 72
[61] 228 77 74 413 292 206 1665 906 701 126 243 190 387 822 349
[76] 452 119 63 146 228 629 360 252 256 226 411 508 410 941 1192
[91] 964 1119 1474 6157 972
Plotting
Plotting these variables is probably better then just looking at the numbers. The package includes a powerful time-scale plotting function that lets you visualize the how time is broken down to discrete intervals. This highly-customizable function is built on the basic plot()
function, and most of its arguments are inherited. The following function call will draw plot with in the past 250 million years, with series-level shading and system-level boxes at the bottom:
tsplot(stages, boxes="sys", boxes.col="systemCol",
shading="series", xlim=c(250, 0), ylim=c(0,2000))
For the sake of simplicity it is best to combine the output of binstat()
wth that timescale object. Since the stg
column indicates that same time slices in both objects, this step is easiest to accomplish with a simple merge()
function call. This step should take care of any temporal mismatches between the two objects and will ensure that the correct values are plotted on th the geological timescale.
<- merge(stages, bs, by="stg")
bsMerged
# inspect object
str(bsMerged)
'data.frame': 95 obs. of 25 variables:
$ stg : int 1 2 3 4 5 6 7 8 9 10 ...
$ sys : chr "E" "E" "E" "Cm" ...
$ system : chr "Ediacaran" "Ediacaran" "Ediacaran" "Cambrian" ...
$ series : chr "Upper Ediacaran" "Upper Ediacaran" "Upper Ediacaran" "Terreneuvian" ...
$ stage : chr "Avalon Assemblage" "White Sea Assemblage" "Nama Assemblage" "Fortunian" ...
$ short : chr "Ava" "Whs" "Nam" "For" ...
$ bottom : num 580 560 550 539 529 ...
$ mid : num 570 555 544 534 525 ...
$ top : num 560 550 539 529 521 ...
$ dur : num 20 10 11.2 9.8 8 6.5 5.5 4.5 4 3.5 ...
$ systemCol: chr "#fddd88" "#fddd88" "#fddd88" "#94ac72" ...
$ seriesCol: chr "#fddd88" "#fddd88" "#fddd88" "#9db989" ...
$ col : chr "#fcd589" "#fdd587" "#fed583" "#a9be93" ...
$ occs : num NA NA NA NA NA NA NA NA NA NA ...
$ colls : num NA NA NA NA NA NA NA NA NA NA ...
$ refs : num NA NA NA NA NA NA NA NA NA NA ...
$ SIBs : num NA NA NA NA NA NA NA NA NA NA ...
$ occ1 : num NA NA NA NA NA NA NA NA NA NA ...
$ occ2 : num NA NA NA NA NA NA NA NA NA NA ...
$ ref1 : num NA NA NA NA NA NA NA NA NA NA ...
$ ref2 : num NA NA NA NA NA NA NA NA NA NA ...
$ u : num NA NA NA NA NA NA NA NA NA NA ...
$ chao1occ : num NA NA NA NA NA NA NA NA NA NA ...
$ uPrime : num NA NA NA NA NA NA NA NA NA NA ...
$ chao1ref : num NA NA NA NA NA NA NA NA NA NA ...
Then you can draw the number of occurrences with lines()
.
tsplot(stages, boxes="sys", boxes.col="systemCol",
shading="series", xlim=c(250, 0), ylim=c(0,2000), ylab="Number occurrences")
lines(bsMerged$mid, bsMerged$occs)
We will mostly use the same basic function call for plotting, and typing all these arguments is time consuming. But we can spare some time, if we defined a wrapper function around the plotting that executes the plotting call the with the same arguments that we want to keep the same, while allowing us to change what we would like to change. This is fairly straightforward with the ...
argument that allows you to pass arguments through function calls.
<- function(...) tsplot(stages, boxes="sys", boxes.col="systemCol",
tp shading="series", xlim=52:95, ...)
Now, every time you call the newly defined function tp()
it will have the same colour, shading and x-interval, while you can change other arguments of the tsplot()
function. For instance if you wanted see how the number of collections change over time, you can draw a similar plot as above, with:
tp(ylim=c(0,350), ylab="Number of collections")
lines(bsMerged$mid, bsMerged$colls)
The nicest feature of this function is that you do not have to use the built-in data, you can create your own timescale, as long as you use the same structure as in stages and specify the top and bottom boundaries of the intervals (top and bottom columns).
It is not surprising that occurrences and collection numbers have the same overall trajectory. As you can see, the sampling of corals is highly volatile over time, with a couple of marked peaks in the late Triassic, Late Jurassic, mid and Late Creteacoues and the Neogene, that we have to take into consideration in some form, when describing the evolution of the group.
Richness through time
Now that we know how sampling changed over time, we can calculate a lot of diversity, extinction and origination rate series from the dataset with the basic divDyn()
function. This function requires an occurrence dataset with a taxon (tax
) and a discrete time (bin
) columns.
<- divDyn(dats, tax="genus", bin="stg") dd
The output of this function resembles that of the binstat() function. Variables are organized with their names. - Names starting with t- (e.g. t3) are taxon counts - Names starting with div- are diversity estimators. - Names starting with ori- and ext- are origination and extinction rate estimators, respectively. - Names starting with samp- are measures of sampling completeness. The abbreviations are resolved in the help file that can be accessed with
?divDyn
This file also includes all equations that are used to calculate the variables.
It is recommended to merge the output of the divDyn()
function with that of the timescale, similar to how we did it earlier.
<- merge(stages, dd, by="stg") ddMerged
The most basic way to count richness over time is with range-through diversities (divRT
). This is simply just the count of taxa that was supposed to be living in an interval (assuming presence if it was found before and after it, if it is not sampled). You can plot these with:
# simple diversity
tp(ylim=c(0,300), ylab="richness")
lines(ddMerged$mid, ddMerged$divRT, lwd=2)
This is the metric that can be applied to Sepkoski’s compendium (Sepkoski 2002). Note that diversity in the Cretaceous is comparable to those that corals demonstrated in the Neogene, with a slight decrease in the Cretaecous. The sharp drop in the last bin is also worthy of notion. This is an edge effect and is also partially present because the fossil information in the Holocene is usually worse in the PaleoDB. However, adding all recent taxa to the dataset would also create an artifact known as the Pull of the Recent. The complete preservation would spuriously increase range-based diversity estimates in the last few bins.
Subsampling
Ultimately, all time series of richness suffer from the heterogeneity of sampling in space and time, as well as a number of other factors. For instance, the temporal bias is clear, as there is a high correlation between sampling and richness through time.
cor.test(dd$divRT, bs$occs, method="spearman")
Warning in cor.test.default(dd$divRT, bs$occs, method = "spearman"): Cannot
compute exact p-value with ties
Spearman's rank correlation rho
data: dd$divRT and bs$occs
S = 6390.6, p-value = 0.001223
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.4821689
Subsampling aims at estimating what values the focus variables would take, if sampling were uniform over time. This methodology was developed for richness estimation (in an analytical context first, i.e. algegra and equations and then with Monte Carlo simulations), but can be generalized to other variables. You are invited to check out this generalization in the package Handout that you can access with the
vignette("handout")
command.
Classical Rarefaction
As all intervals have sampling at least 50 occurrences, let’s start with rarefying every stage to 50
occurrences. You can accomplish this by running:
<- subsample(dats, tax="genus", bin="stg",
cr50 q=50, coll="collection_no", duplicates=FALSE, iter=300)
This function call will run the default Classical Rarefaction method on every interval 300
times, calculate the entire divDyn output matrix in every trial and averages the results. With specifying coll=collection_no
and duplicates=FALSE
, we instruct the function to treat the multiple occurrences of the same genus in a collection as one. It’s structure is the same as that of divDyn()
’s, so you can access the variables the same way.
# to ensure alignment, the result is merged with the timescale
<- merge(stages, cr50, by="stg")
cr50Merged
# and actual plot
tp(ylim=c(0,100), ylab="Subsampled richness")
lines(cr50Merged$mid, cr50Merged$divRT, lwd=2)
That is a different overall picture. Note the much higher peak in the Cretaceous then before.You can also note the low variation of the series. This is an artifact of using classical rarefaction on the data: as the quota drops the curces get flatter and flatter. This is the primary reason why Shareholder Quorum Subsamping was developed (Alroy, 2010).
Turnover rates
We can get more information about what guided the taxon’s history through time by looking at the temporal trajectories of turnover. The main divDyn()
function output includes turnover these rates. You can plot the per capita extinction and origination rates of Foote (Foote 1999) with the following lines of code:
# turnover rates
tp(ylim=c(0,1), ylab="per capita turnover rates")
lines(ddMerged$mid, ddMerged$extPC, lwd=2, col="red")
lines(ddMerged$mid, ddMerged$oriPC, lwd=2, col="green")
legend("topright", legend=c("extinction", "origination"),
col=c("red", "green"), lwd=2)
Note huge peaks at the ends of the series (origination rates peak at the beginning and extinction rates at the recent end). These are edge effects that are force due to the dataset structure. You can note the effects of the end-Triassic, Pliensbachian-Toarcian and the end-Cretaceous crises on the rates. Subsampling typically has a much lower effect on turnover rates, but it is helpful to validate patterns that you see with the raw data. Although this has has to be assessed in more detail J. Alroy recommended to use Classical Rarefaction for the estimation of turnover rates.
Symbiotic status’ effect on diversity dynamics
Story gets more interesting when the difference between different symbiotic types are considered. Zooxanthellate (Z) and Azooxanthellate (AZ) corals are about equally diverse in today oceans. The two tend to differ in morphology (Z are usually colonials, while most AZ corals are solitary) and typically occupy different environments (Z corals prefer shallower reef settings, whereas AZ corals tend to occurr in deeper waters). How the distinction emerged and what the reasons for this could be is the topic of one our papers in Paleobiology (Kiessling and Kocsis 2015), where these snippets of code come from.
Determining the symbotic status (whether or not they host photosymbionts) of extant corals is straightforward, but not so for fossil ones. Although the photosymbionts do not fossilize, some morphological traits and geogchemical properties correlate with the corals’ ability to host symbionts. W. Kiessling compiled a list of most likely photosymiotic status for coral genera occurring in the Paleobiology Database. This list is available from the chronosphere
and can be loaded directly with this line of code.
<- fetch(src="SOM-kiessling-coralgenera", ver="0.54") sym
------------------------------------------------------------
Accessing chronosphere registry tables.
------------------------------------------------------------
Downloading registry tables.
------------------------------------------------------------
Item no. 678, src:SOM-kiessling-coralgenera, ser: list, ver: 0.54, res: NA.
------------------------------------------------------------
------------------------------------------------------------
Downloading import code.
------------------------------------------------------------
------------------------------------------------------------
Downloading data file.
------------------------------------------------------------
MD5 checksum passed.
If you use the data in publications, please cite its
reference(s), as well as that of the 'chronosphere' project.
Kiessling, W., and Kocsis, Á. T. (2015). Biodiversity dynamics and environmental occupancy of fossil azooxanthellate and zooxanthellate scleractinian corals. Paleobiology, 41(3), 402–414.
Kiessling, W., and Kocsis, A. T. (2020). List of scleractinian genera in the Paleobiology Database and their inferred symbiotic mode (Version 0.54). Zenodo. https://doi.org/10.5281/zenodo.8054526
This table also contains information about corallite integrity, coloniality, and whether the taxon is extant or extinct.
Every row in this table represent a coral genus in the PBDB, which is designated with in the column genus.subgenus
. These entries were assigned to valid genus names (genus.proper
) by Wolfgang Kiessling. It is very easy to match the two tables with the merge()
function. Stating all.x=TRUE
forces the keeping of all rows in the first argument, which is in our case the PaleoDB. This is handy, as we’d like to assess how many fossil genera do get phototsymbiotic tags.
# merge the two tables
<- merge(dats, sym, by.x="genus", by.y="genus.subgenus", all.x=TRUE)
dats
# which are not merged?
# occurrences
sum(is.na(dats$ECOLOGY))/nrow(dats)
[1] 0.04508261
# genera
<-unique(dats[, c("genus", "ECOLOGY")])
genTabsum(is.na(genTab$ECOLOGY))/nrow(genTab)
[1] 0.0621118
The table also contains useful information about whether the taxa are extant or extinct. As there are some
You can calculate the diversity dynamics of these groups separately by subsetting the occurrence data to two subsets and then running the divDyn()
function on them separately on them. As their ecology tend to be similar, the Azoxanthellate and Apozooxanthellate (non- and facultatively symbiotic) corals are analyzed together.
# subset
<- dats[dats$ECOLOGY=="z", ]
zdat <- dats[dats$ECOLOGY=="az" | dats$ECOLOGY=="ap", ]
azdat
# divDyn
<- divDyn(zdat, tax="genus", bin="stg")
zDD <- divDyn(azdat, tax="genus", bin="stg")
azDD
# merge with timescale for plotting
<- merge(stages, zDD, by="stg")
zDDMerged <- merge(stages, azDD, by="stg") azDDMerged
The difference in the diversity trajectories of the two group is striking, even with the raw patterns. You are invited to check the difference with subsampling, but as the preservation potential is similar, it is likely that subsampling won’t affect relative numbers.
tp(ylim=c(0,200), ylab="Raw richness")
lines(zDDMerged$mid, zDDMerged$divRT, lwd=2, col="blue")
lines(azDDMerged$mid, azDDMerged$divRT, lwd=2, col="orange")
# legend function
<- function(...)
legZAZ legend(legend=c("z", "az"), col=c("blue", "orange"),
lty=c(1,1), bg="white", cex=1.3, ...)
legZAZ("topleft")
But what about turnover? You can contrast the extinction rates of the two groups with the already calculated per-capita rates.
# plot
tp(ylim=c(0,1), ylab="Per capita extinctions")
# lines
lines(zDDMerged$mid, zDDMerged$extPC, lwd=2, col="blue")
lines(azDDMerged$mid, azDDMerged$extPC, lwd=2, col="orange")
# legend
legZAZ("topleft")
Overall, the two series are quite similar. Z corals appear to be somewhat more affected at the end-Cretaceous. How about originations?
# plot
tp(ylim=c(0,1), ylab="Per capita originations")
# lines
lines(zDDMerged$mid, zDDMerged$oriPC, lwd=2, col="blue")
lines(azDDMerged$mid, azDDMerged$oriPC, lwd=2, col="orange")
# legend
legZAZ("topleft")
But how do we know that differences are meaningful? The difference in the rates of the two group can come from two sources: random estimation error, or group membership. Estimation error will depend on the number of sampled taxa. One way to figure this out is to use the priciniples of model selection: we can compare the likelihood support of two models: whether the observed difference comes from the estimation error of a single model or there are actually two separate models that better describe the observed patterns. This principle is implemented in the fucntion ratesplit()
that compares these two models in a single call. For this, the ECOLOGY variable has to be cleaned:
$ecol <- "z"
zdat$ecol <- "az" azdat
Then the function can be called using the new variable:
<-ratesplit(rbind(zdat, azdat), sel="ecol", tax="genus", bin="stg") rs
This function compares the per capita rates that we have been working with, and the output shows the bin numbers where difference is meaningful.
rs
$ext
[1] 66 94
$ori
[1] 75 81
You can visualize these with dots. The results for origination rates indicate that azoxantellate corals showed bursts of evolution in some intervals.
tp(ylim=c(0,1), ylab="Per capita originations")
lines(zDDMerged$mid, zDDMerged$oriPC, lwd=2, col="blue")
lines(azDDMerged$mid, azDDMerged$oriPC, lwd=2, col="orange")
legZAZ("topleft")
# display selectivity with points
# select the higher rates
<-cbind(zDDMerged$oriPC[rs$ori], azDDMerged$oriPC[rs$ori])
selIntervals<-apply(selIntervals, 1, function(x) x[1]<x[2])
groupSelector
# draw the points
points(azDDMerged$mid[rs$ori[groupSelector]], azDDMerged$oriPC[rs$ori[groupSelector]],
pch=16, col="orange", cex=2)
points(zDDMerged$mid[rs$ori[!groupSelector]], zDDMerged$oriPC[rs$ori[!groupSelector]],
pch=16, col="blue", cex=2)
Extinction rate results are:
tp(ylim=c(0,1), ylab="Per capita extinctions")
lines(zDDMerged$mid, zDDMerged$extPC, lwd=2, col="blue")
lines(azDDMerged$mid, azDDMerged$extPC, lwd=2, col="orange")
legZAZ("topleft")
# display selectivity with points
# select the higher rates
<-cbind(zDDMerged$extPC[rs$ext], azDDMerged$extPC[rs$ext])
selIntervals<-apply(selIntervals, 1, function(x) x[1]<x[2])
groupSelector
# draw the points
points(azDDMerged$mid[rs$ext[groupSelector]], azDDMerged$extPC[rs$ext[groupSelector]],
pch=16, col="orange", cex=2)
points(zDDMerged$mid[rs$ext[!groupSelector]], zDDMerged$extPC[rs$ext[!groupSelector]],
pch=16, col="blue", cex=2)
Note that despite the apparently higher extinction rates of Z corals, the difference in the end-Createcous mass extinction is not supported by the method. The end-Pleistocene value is likely a result of the data treatmen around the recent, but the observed difference in the late Jurassic was not significant in our prevoius study (Kiessling and Kocsis 2015). This points to the importance of using clean data, and importance of the quick reproduction/reanalysis that the divDyn package now provides.