W2_L06

Distance-based vs character-based algorithms

Practical Notes


intro:

Before starting this practical, set the appropriate working directory and load the necessary libraries in R:

Libraries:

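setwd("~/.../RPHY/")   # adjust to your own course folder
library(Biostrings)    # sequence import (readDNAStringSet)
library(msa)           # multiple sequence alignment (msa)
library(ape)           # core phylogenetics: nj(), tree plotting
library(seqinr)        # sequence utilities
library(phangorn)      # phyDat, dist.p, upgma, pratchet, fitch
library(phytools)      # additional tree tools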

Phylogeny inference methods fall into two categories:

  • distance-based
  • character-based

Distance methods calculate a genetic distance between every pair of species (based on a comparison of their aligned sequences) and then use the resulting distance matrix iteratively to construct a tree. Character-based methods, which include maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI), instead evaluate candidate trees directly against each aligned site (character), without first reducing the alignment to pairwise distances.


distance-based phylogenies:

Let's start by reading the nucleotide sequences and aligning them as we learned during the last lesson:

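sequences <- readDNAStringSet("sequences/COX1.nt.fa")       # unaligned COX1 nucleotide sequences (FASTA)
alignment <- as.phyDat(msa(sequences, method = "Muscle"))   # align with MUSCLE, convert to phangorn's phyDat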

Next, we calculate a distance matrix based on p distances. The p distance is the proportion (p) of nucleotide sites at which the two sequences being compared differ: the number of nucleotide differences divided by the total number of nucleotides compared. It makes no correction for multiple substitutions at the same site, for substitution rate biases (for example, unequal transition and transversion rates), or for differences in evolutionary rate among sites.
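To make the definition concrete, here is a minimal base-R sketch that computes p distances by hand for a small, made-up alignment. The sequences s1 to s3 and the helper `p_distance()` are hypothetical and only for illustration; the sketch assumes equal-length sequences without gaps or ambiguity codes, which real alignments often contain.

```r
# Hypothetical toy alignment: equal-length, gap-free sequences
aln <- c(s1 = "ACGTACGTAC",
         s2 = "ACGTTCGTAC",
         s3 = "ACTTTCGAAC")

# p distance = number of differing sites / number of sites compared
p_distance <- function(x, y) {
  a <- strsplit(x, "")[[1]]
  b <- strsplit(y, "")[[1]]
  stopifnot(length(a) == length(b))
  sum(a != b) / length(a)
}

# Fill the symmetric pairwise distance matrix
n <- length(aln)
p_mat <- matrix(0, n, n, dimnames = list(names(aln), names(aln)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    p_mat[i, j] <- p_distance(aln[[i]], aln[[j]])
  }
}
print(p_mat)   # s1-s2: 0.1, s1-s3: 0.3, s2-s3: 0.2
```

Each entry is simply a count of mismatches scaled by alignment length, so two sequences that differ at 3 of 10 sites get p = 0.3 regardless of which kind of substitution occurred.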

dm <- dist.p(alignment)   # pairwise p distances between all sequences
dm                        # print the distance matrix
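Because the p distance ignores multiple substitutions at the same site, it increasingly underestimates divergence as sequences drift apart. As a point of comparison (not a step of this practical), the Jukes-Cantor (JC69) model corrects p as d = -3/4 ln(1 - 4p/3). The base-R sketch below, with a hypothetical helper `jc69()`, shows how the two diverge; if you want model-corrected distances on the real alignment, phangorn's `dist.ml()` accepts a `model` argument such as "JC69" (check its help page).

```r
# Jukes-Cantor (JC69) correction of a p distance:
#   d = -3/4 * log(1 - 4/3 * p)
# The formula is undefined for p >= 0.75 (saturation), so Inf is returned there.
jc69 <- function(p) {
  d <- rep(Inf, length(p))
  ok <- p < 0.75
  d[ok] <- -0.75 * log(1 - (4 / 3) * p[ok])
  d
}

p <- c(0, 0.05, 0.10, 0.30, 0.50, 0.70)
comparison <- data.frame(p = p, jc69 = jc69(p))
# The gap between p and the corrected distance widens as p grows
print(round(comparison, 3))
```

For closely related sequences the two values are almost identical; the correction only matters once a sizeable fraction of sites differ.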

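After constructing the distance matrix, we infer trees from it with UPGMA and neighbor joining (NJ). UPGMA assumes a constant rate of evolution (a molecular clock) and returns a rooted, ultrametric tree; NJ does not make that assumption and returns an unrooted tree: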

treeUPGMA <- upgma(dm)   # phangorn: UPGMA tree
plot(treeUPGMA)
treeNJ <- nj(dm)         # ape: neighbor-joining tree
plot(treeNJ)
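The two methods use the distance matrix differently. UPGMA repeatedly merges the closest pair of clusters and replaces them by their size-weighted average distance to every other cluster, placing each new node at half the merge distance. NJ instead joins the pair that minimises the Q criterion, Q(i,j) = (n - 2) d(i,j) - r(i) - r(j), where r(i) is the row sum of taxon i, which does not require equal rates. The base-R sketch below implements UPGMA in full and only the first NJ join; the toy matrices and the helpers `upgma_merges()` and `nj_first_join()` are hypothetical, not the code used by `upgma()` or `nj()`. The UPGMA merge distances are checked against base R's `hclust(method = "average")`, which applies the same average-linkage rule.

```r
# ---- UPGMA: repeatedly merge the closest pair of clusters ----
upgma_merges <- function(d) {
  d <- as.matrix(d)
  labels <- rownames(d)
  size <- rep(1, nrow(d))                 # number of taxa in each cluster
  merges <- character(0)
  heights <- numeric(0)
  repeat {
    dd <- d
    diag(dd) <- Inf                       # ignore self-distances
    ij <- sort(which(dd == min(dd), arr.ind = TRUE)[1, ])
    i <- ij[[1]]; j <- ij[[2]]
    merges <- c(merges, paste0("(", labels[i], ",", labels[j], ")"))
    heights <- c(heights, d[i, j] / 2)    # node height = half the distance
    keep <- setdiff(seq_len(nrow(d)), c(i, j))
    if (length(keep) == 0) break          # the last two clusters were joined
    # Distance from the merged cluster = size-weighted mean of its parts
    new_row <- (size[i] * d[i, ] + size[j] * d[j, ]) / (size[i] + size[j])
    d <- rbind(cbind(d[keep, keep, drop = FALSE], new_row[keep]),
               c(new_row[keep], 0))
    labels <- c(labels[keep], merges[length(merges)])
    size <- c(size[keep], size[i] + size[j])
  }
  list(merges = merges, heights = heights)
}

d_upgma <- matrix(c(0, 2, 4, 6,
                    2, 0, 4, 6,
                    4, 4, 0, 6,
                    6, 6, 6, 0), nrow = 4,
                  dimnames = list(LETTERS[1:4], LETTERS[1:4]))
res <- upgma_merges(d_upgma)
print(res$merges)    # "(A,B)" "(C,(A,B))" "(D,(C,(A,B)))"
print(res$heights)   # 1 2 3
# hclust reports the full merge distance, i.e. twice the UPGMA node height
print(hclust(as.dist(d_upgma), method = "average")$height)   # 2 4 6

# ---- NJ: first join chosen by the Q criterion ----
nj_first_join <- function(d) {
  d <- as.matrix(d)
  n <- nrow(d)
  r <- rowSums(d)
  q <- (n - 2) * d - outer(r, r, "+")
  diag(q) <- Inf
  ij <- sort(which(q == min(q), arr.ind = TRUE)[1, ])
  i <- ij[[1]]; j <- ij[[2]]
  # Branch lengths from i and j to their new common node
  li <- d[i, j] / 2 + (r[[i]] - r[[j]]) / (2 * (n - 2))
  list(pair = rownames(d)[c(i, j)], q = q[i, j], branches = c(li, d[i, j] - li))
}

d_nj <- matrix(c(0,  5,  9,  9, 8,
                 5,  0, 10, 10, 9,
                 9, 10,  0,  8, 7,
                 9, 10,  8,  0, 3,
                 8,  9,  7,  3, 0), nrow = 5,
               dimnames = list(letters[1:5], letters[1:5]))
first <- nj_first_join(d_nj)
print(first)   # pair "a" "b", q = -50, branch lengths 2 and 3
```

A full NJ run would replace a and b by a new node u with d(u,k) = (d(a,k) + d(b,k) - d(a,b)) / 2 and repeat until three nodes remain. Note that NJ joins a and b even though d and e are the closest pair (distance 3): the Q criterion corrects for taxa that are far from everything else.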

parsimony:

To infer phylogenies with maximum parsimony (MP) we will use the parsimony ratchet (Nixon 1999). The ratchet lets the search escape local optima, so it can find better (shorter) trees than performing NNI (nearest neighbor interchange) or SPR (subtree pruning and regrafting) rearrangements alone.

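treePARSIMONY <- pratchet(alignment,
                          maxit = 200,  # maximum number of ratchet iterations
                          minit = 50,   # minimum number of iterations
                          k = 50,       # stop once the same best tree is found in k iterations
                          all = TRUE)   # return all equally parsimonious trees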

You can also refine the result with another tree topology search strategy, for example NNI rearrangements:

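if (inherits(treePARSIMONY, "multiPhylo")) treePARSIMONY <- treePARSIMONY[[1]]  # all = TRUE may return several equally parsimonious trees; optimise one
treePARSIMONY <- optim.parsimony(treePARSIMONY, alignment, rearrangements = "NNI")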
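NNI, the rearrangement requested in the `optim.parsimony()` call above, acts on one internal branch at a time. If that branch separates four subtrees as (t1,t2) | (t3,t4), swapping one subtree from each side gives the only two alternative topologies around it. The base-R sketch below enumerates them on Newick strings; `nni_neighbours()` and the four-taxon example are hypothetical illustrations, not phangorn code. A search such as `optim.parsimony()` applies moves of this kind across the tree and keeps those that lower the parsimony score.

```r
# NNI around one internal branch that splits four subtrees as (t1,t2) | (t3,t4).
# Swapping one subtree from each side gives the only two alternative topologies.
# Subtrees are Newick strings, so they can be single taxa or clades like "(E,F)".
nni_neighbours <- function(t1, t2, t3, t4) {
  c(swap_t2_t3 = sprintf("((%s,%s),(%s,%s));", t1, t3, t2, t4),
    swap_t2_t4 = sprintf("((%s,%s),(%s,%s));", t1, t4, t3, t2))
}

start <- "((A,B),(C,D));"
neighbours <- nni_neighbours("A", "B", "C", "D")
print(start)
print(neighbours)   # "((A,C),(B,D));" "((A,D),(C,B));"
```

For four taxa these three topologies are all the unrooted trees there are, which is why NNI alone is a very local move and can get stuck on larger trees.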

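Finally, we compare the parsimony score of each tree, i.e. the minimum number of substitutions the tree requires to explain the alignment (lower is better), computed by fitch() with the Fitch algorithm: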

fitch(treeUPGMA, alignment)
fitch(treeNJ, alignment)
fitch(treePARSIMONY, alignment)
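To see what the Fitch algorithm counts, here is a minimal base-R sketch for single sites on a fixed rooted binary tree, written as nested lists. Each leaf holds its observed base; at an internal node, if the children's state sets intersect, the node keeps the intersection, otherwise it takes the union and one change is added. The helpers `fitch_site()`, `as_tree1()` and `as_tree2()` and the three sites are hypothetical; the sketch ignores gaps and ambiguity codes, and summing per-site counts mirrors what a whole-alignment score does.

```r
# Fitch small-parsimony count for ONE site on a rooted binary tree.
# A tree is either a leaf (a single base, e.g. "A") or list(left, right).
fitch_site <- function(node) {
  if (is.character(node)) {
    return(list(set = node, changes = 0))
  }
  l <- fitch_site(node[[1]])
  r <- fitch_site(node[[2]])
  common <- intersect(l$set, r$set)
  if (length(common) > 0) {
    list(set = common, changes = l$changes + r$changes)
  } else {
    list(set = union(l$set, r$set), changes = l$changes + r$changes + 1)
  }
}

# Bases of taxa s1..s4 at three hypothetical alignment sites
site_states <- list(c("A", "A", "G", "T"),
                    c("C", "C", "C", "C"),   # constant site: never costs a change
                    c("A", "G", "A", "G"))

# Two competing topologies: ((s1,s2),(s3,s4)) and ((s1,s3),(s2,s4))
as_tree1 <- function(x) list(list(x[1], x[2]), list(x[3], x[4]))
as_tree2 <- function(x) list(list(x[1], x[3]), list(x[2], x[4]))

score1 <- sum(sapply(site_states, function(x) fitch_site(as_tree1(x))$changes))
score2 <- sum(sapply(site_states, function(x) fitch_site(as_tree2(x))$changes))
print(c(tree1 = score1, tree2 = score2))   # tree1 = 4, tree2 = 3
```

The same data cost a different number of changes on different topologies, which is exactly why the UPGMA, NJ and ratchet trees above can receive different fitch() scores.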