← Back to syllabus

W4_L10

Support metrics

Practical Notes

support metrics

These are the branch support metrics which we are going to explore:

  • parametric & nonparametric Bootstrap Proportions (BPs) + trasfer bootstrap expectations (TBE)
  • parametric & nonparametric jacknife
  • SH-like approximate Likelihood Ratio Test (SHaLRT)
  • genes & sites Concordance Factors (gCFs & sCFs)
  • Posterior Probabilities (PP)

We can start by inferring a tree specifying a model to skip the model selection step:

iqtree -s data/example_alignements/nt/Rhabdomeric1.nt.aln -m GTR --prefix data/example_alignements/nt/Rhabdomeric1.nt

Among the many files generated, there is also a tree: data/example_alignements/nt/Rhabdomeric1.nt.nwk.treefile.


Nonparametric bootstrap:

The standard nonparametric bootstrap is invoked by the -b option, which also specifies the number of bootstrap replicates - 100 is the minimum recommended number. We can use the --tree-fix to fix the toplogy and the -t flags to specify the tree we just inferred:

iqtree -s data/example_alignements/nt/Rhabdomeric1.nt.aln -m GTR -b 100 --prefix data/example_alignements/nt/nonparamet_bootstrap  -t data/example_alignements/nt/Rhabdomeric1.nt.treefile --tree-fix

Then explore the result in the .iqtree file.


Parametric bootstrap:

iqtree implements Ultra Fast Bootstrap 2 (UFB2; described in Hoang et al. 2018). The -B flag specifies the number of replicates - 1000 is the minimum number recommended. Perform the parametric bootstrap support using:

iqtree -s data/example_alignements/nt/Rhabdomeric1.nt.aln -m GTR -B 1000 --prefix data/example_alignements/nt/parametric_bootstrap

NB: due to the functioning of the parametric bootstrap a tree inference has to be performed and BPs can not be calculated on a preexisting tree.


Question(s):

Take a look at the parametric_bootstrap.splits.nex file ... does it remember anything to you?!

By default, sites within partitions are resampled. However, it is also possible to resample whole partitions (Nei et al. 2001). This can be done with -bsam GENE. Moreover, it is also possible to resample partitions and then sites within resampled partitions (Gadagkar et al. 2005). This can be done with -bsam GENESITE. These approaches may help to reduce false positives (i.e. wrong branch receiving 100% support).

How to implement the TBE? Read the help page in the appropriate section! Is the resul different with and withou the use of TBE?! Think in which kind of dataset you expect TBE to result in a strong difference.


Jacknife:

The non-replacement version of the boostrap can also be performed in a parametric and non-parametric fashion using the -j and -J flags:

iqtree -s data/example_alignements/nt/Rhabdomeric1.nt.aln -m GTR -j 100 --prefix data/example_alignements/nt/nonparamet_jacknife

and

iqtree -s data/example_alignements/nt/Rhabdomeric1.nt.aln -m GTR -J 1000 --prefix data/example_alignements/nt/parametric_jacknife

Other than the number of replicate, the subsampling proportion for jackknife cna be specified using the flag --jack-prop.


SH-like approximate likelihood ratio test:

Among single branch tests, iqtree implements a non-parametric approximate likelihood ratio test based on a Shimodaira-Hasegawa-like procedure via the flag -alrt:

iqtree -s data/example_alignements/nt/Rhabdomeric1.nt.aln -m GTR -alrt 1000 --prefix data/example_alignements/nt/alrt

You can also perform both SH-aLRT and the ultrafast bootstrap on the "best" Maximum Likelilhood phylogeny. It's then easier to observe wether the different support metrics are giving contrasting results through our phylogenies.

iqtree -s data/example_alignements/nt/Rhabdomeric1.nt.aln -m GTR -B 1000 -alrt 1000 --prefix data/example_alignements/nt/cobined_supports

Let's take a look at the .iqtree among the other output files, bearing in mind that the values related to different metrics should be treated differently. With the non-parametric bootstrap and SH-aLRT you should start to believe in a clade if it has >= 80% support, while with the parametric bootstrap it should be >= 95%.

Question(s):

SHaLRT is not the only approach to test the support of single branches! Explore the --abayes option!


Concordance Factors

CFs are two measures for quantifying genealogical concordance in phylogenomic datasets (Minh et al. 2020). These are gene concordance factor (gCF) and the site concordance factor (sCF). For every branch of a reference tree:

  • gCF is defined as the percentage of “decisive” gene trees containing that branch.

  • sCF is defined as the percentage of "decisive" alignment sites supporting a branch in the reference tree.

Let's start by inferring a species tree:

iqtree2 -p data/example_alignements/aa --prefix concat

Then we can infer a tree for each locus:

iqtree2 -S data/example_alignements/aa --prefix loci

Now let's perform CFs calculation with the following command:

iqtree -p data/example_alignements/aa -t concat.treefile --gcf loci.treefile --scf 100 --prefix concord

and take a look at the resulting nodal support values on figtree.

Question(s):

How do the two metrics compare to each other? And what about parametric and non-parametri BPs?!


Posterior Probabilities

PPs are calculated as the fraction of sampled trees that include that clade, for each node. This was done using either bpcomp command of phylobayes or treeannotator, but it can be done using iqtree as well:

iqtree -con mytrees -minsup 0.5 -bi 100

Here, -minsup sets the minimum support threshold to actually include a clade (if the criteria is not met then a politomy will appear) and -bi specifies a burn-in (the exact number of initial trees to ignore ).


further reading:

A couple of nice papers on how the nodal support can be overestimated by BP and PP.

Resources on concordance factors: paper & tutorial.

Here you'll find the manual for IQ-TREE, it's quite user-friendly and exhaustive. Many tutorials are based on those described here.

A cool paper where I found many definition of support metrics: Guindon et al. 2009.


main