← Back to syllabus

W8_L16

Modelling trait evolution on phylogenies

Practical Notes

modelling trait evolution on phylogenies

Before starting this practical set the appropriate working directory and load the necessary libraries in R:

Libraries:

setwd("~/Desktop/MP25/data/example_PCM/")
install.packages('ape')
install.packages('phytools')
install.packages('geiger')
library(ape)
library(phytools)
library(geiger)

As always in phylogenetic comparative methods, two fundamental components are required:

  • a phylogeny - but ultrametric tree is much better
  • either discrete or continuous trait(s)

Here we will leverage a Octopodiformes phylogeny and a discrete classification of the maximum depth at which they have been observed:

  • P - pelagic
  • E - euphotic, up to 100 meters in depth
  • D1 - disphotic 1, up to 400 meters in depth
  • D2 - disphotic 2, up to 400 meters in depth
  • N - no-photic, more than 800 meters in depth

Let's start by importing the tree and the data:

tree <- read.tree("tree_octopodiformes.nwk")
tree <- force.ultrametric(tree,"extend")
data <- read.table("traits_octopodiformes.csv", sep = ",", row.names = 1, header = T)
data <- setNames(as.factor(data[,"discrete"]),rownames(data))

Then, as seen in other instances, we have to find an model appropriate for our data. Also here, the central component is a rates matrix, not dissimilar from the Q matrix we have seen for amino acids / nucleotides models. Let's fit three models to the data, which are:

  • a timetree
  • character states

The three models are:

  • ER - Equal Rates
  • SYM - SYMmetrical
  • ARD - All Rates Different
ER <- fitDiscrete(tree, data, model="ER")
SYM <- fitDiscrete(tree,data,model="SYM")
ARD <- fitDiscrete(tree,data,model="ARD")

Here is a graphical visualisation of the models.

alt text alt text alt text

By recalling the object you can also see the Q matrix! WHich model do you think this one is?

                  D1            D2             E             N             P
    D1 -2.195643e+00  1.247291e+00  6.094590e-01  3.388934e-01  1.600471e-20
    D2  1.310616e+00 -1.385760e+00  5.926930e-19  3.286915e-24  7.514486e-02
    E   2.149612e-01  1.223017e-40 -2.149612e-01  6.065468e-19  7.414863e-18
    N   1.205376e-24  2.497221e-01  3.362381e-16 -2.497221e-01  5.080608e-26
    P   4.972750e-62  7.598279e-16  1.330707e-29  2.225052e-59 -7.598279e-16

Then we will asses how good their fit is, using the Akaike weights - the smaller the better!

aicc<-setNames(c(ER$opt$aicc,
                 ARD$opt$aicc,
                 SYM$opt$aicc),
               c("Equal Rates",
                 "All Rates Different",
                 "Symmetrical"))
aicw(aicc)

We can see that ARD is the better fit!

                         fit     delta            w
Equal Rates         123.9865  0.000000 9.893373e-01
All Rates Different 180.3405 56.354017 5.730930e-13
Symmetrical         133.0470  9.060562 1.066273e-02

Then we can use the best-fit model to perform the ancestral state reconstruction:

fitARD<-ace(data,tree,model="ARD",type="discrete")

Then we need to specify a color for each state:

levels(data)
cols <- c("red","purple","orange","blue","yellow")

trait_cols <- setNames(c(
                 levels(data)[1],
                 levels(data)[2],
                 levels(data)[3],
                 levels(data)[4],
                 levels(data)[5]),
               c("red",
                 "purple",
                 "orange",
                 "blue",
                 "yellow"))

trait_cols

And then plot the ASR on the phylo:

plot.phylo(tree,lwd=7.5, edge.width = 7.5, label.offset = .2,show.tip.label = T,no.margin = TRUE, 
           edge.color = "lightgray")
nodelabels(node=1:tree$Nnode+Ntip(tree),pie=fitARD$lik.anc,cex=0.42, piecol = cols)
tiplabels(pie=to.matrix(data[tree$tip.label],levels(data)),cex=0.42, piecol = cols)

alt text

So ... do Octopodiformes come from the photic zone ... or they originated deep down in the sea where the light is extremely scarce?


modelling trait evolution on phylogenies: continuous traits

Before starting this practical set the appropriate working directory and load the necessary libraries in R:

Libraries:

setwd("~/Desktop/MP25/data/example_PCM/")
install.packages('ape')
install.packages('phytools')
install.packages('geiger')
library(ape)
library(phytools)
library(geiger)

a proof of concept on simulated data:

Let's simulate a tree! Everybody should change a bit these numbers:

  • the random seed (set to 42)
  • the number of tips (set to 100)
  • the birth rate (set to 1.0)
  • the death rate (set to 0.6)
  • include only extant - and not extinct - species (set to TRUE)
set.seed(42)
tree<-NULL
while(is.null(tree)) 
  tree<-pbtree(n=100,b=1,d=0.6,extant.only=T)
plotTree(tree,ftype="off")

Then we can simulate uncorrelated (Brownian) evolution:

x <- fastBM(tree)
y <- fastBM(tree)

and then plot the resulting trees!

par(mfrow = c(1,2))
obj <- contMap(tree,x,plot=T, ftype="off", lwd=5, outline = FALSE)
obj <- contMap(tree,y,plot=T, ftype="off", lwd=5, outline = FALSE)

alt text

Now, I want you to perform a linear regression on the two traits:

  • a first time completely disregarding the phylogeny:
fit <- lm(x~y)
summary(fit)
  • a second time adjusting the data based on the phylogeny:
x_PIC<-pic(x,tree)
y_PIC<-pic(y,tree)
fit.pic<-lm(x_PIC~y_PIC)
summary(fit.pic)

Question!


main