← Back to syllabus

W7_L14

Divergence times analyses

Practical Notes

divergence time analyses

Maximum Likelihood is a good alternative to the computational burden of Bayesian Inference in the era of phylogenomics and, as almost everything, we can carry on this analyses again with IQ-TREE! It implements the least square dating (LSD2) method to build a time tree when you have date information for tips or ancestral nodes. See here for the original paper.

  • node dating is used when you want toimplement informations from:

    • fossils specimens
    • biogeografical events
  • tips dating is commonly used in two different scenarios:

    • when analyzing viruses
    • when your have a tree encompassing extant and extinct taxa.

node dating

As you may remember we already inferrred a small Mollusck phylogeny. How can we chose which nodes to calibrates and how to calibrate them? A straightforward way is to just to use the divergence times allready inferred for certain nodes in the litterature. Here we will take Kocot et al. 2020 as a reference. Let'shave a look at some of their divergence time estimations:

NodeDivergence time (Mya)
Mollusca545.449
Gastropoda424.426

So now let's start our dating. First of all, we have to compile a txt files with informations about nodes that we want to constrain (called date file). It should be something like this:

taxon3,taxon4 -100
taxon1,taxon2,taxon5 -50

For example, here the most recent common ancestor (MRCA) of taxon1, taxon2, and taxon5 existed 50 mya (million year ago) and the MRCA of taxon3 and taxon4 existed 100 mya. Note that no empty space should be added to the comma-separated list of taxa, as empty space is used as a separator between taxon list and dates.

After that we need to run IQ-TREE specifing our precomputed ML tree, the partitioning scheme and the date file. Moreover, when dealing with divergence time estimations, it's always better to also specify the outgroup. Indeed time trees, contrary to ML trees, are rooted by definition!

iqtree -s data/example_dating/dating_nodes/mollusks.aa.aln -spp data/example_dating/dating_nodes/partitions.txt --date data/example_dating/dating_nodes/calibrations.txt --date-tip 0 -o "H_robusta" -te data/example_dating/dating_nodes/mollusks.treefile --prefix data/example_dating/dating_nodes/node_dating --date-ci 100

The main flags here are:

  • --date which specifies the file with the calibrations
  • --date-tip if 0 specifies that you have only tips sampled at present
  • -o specifies the outgroup
  • -te specifies a precomputed phylogeny
  • --date-ci will resample branch lengths a certain amount of times to infer the confidence intervals.

After having run a command, you should habe a timetree.nexus file. The nexus format is commonly used in divergence time estimation since it is able to store much more information than newick - e.g. confidence intervals.

Now open the nexus file with figtree and take a look at the tree - remember to display also the node ages...

Discuss!

Why some nodes do not have a Confidence Interval? Why there are some very short branches? Is it a signature of the fast radiation of Mollusks at the begginning of the Cambrian era?


tips dating

This is a common approach e.g. in virus datasets, where you have sampling time but no information on ancestral nodes. Here we will use a toy dataset consisting of some Dengue (viral) samples.

As for node dating it is not required to have dates for all tips and the date information present in the date file might be frammentary and uncertain.

D4PRico 2019
D4SLanka    2022-09-09
D4Philip    2019-02-01:2019-12-31
D4ElSal1    2019-12-31:NA
D4Tahiti1    2019
D4Mexico    2024-10
D4Thai3  NA:2019-01-06
D4Indon1 2007

For the only sample we have from Puerto rico we just know was sampled in 2019, while for the sample in Sirilanka we now exactly the day it was sampled, and the sample from the Philippines was sampled between 1st Feb 2019 and 31st Dec 2019. For some data range you can use “NA” to mean that the lower or upper bound is missing.

Now we can execute our divergence time analysis, including also the inference of a tree, just by not specifying any topology:

iqtree -s data/example_dating/dating_tips/dengue.nt.aln --date data/example_dating/dating_tips/calibrations.txt -te data/example_dating/dating_tips/dengue.treefile -redo

Discuss!

What can you understand from this tree? From where did this strain of dengue start to spread?


main