
W1_L02

Phylogenetics 101 — Part A


Phylogenetics 101 - trees

Phylogenetic trees can be represented in multiple formats, but they all share a few basic numerical properties.

If N is the number of terminal nodes (e.g. species), an unrooted, fully bifurcating tree has:

  • N terminal branches
  • 2N-3 total branches
  • N-3 internal branches
  • N-2 internal nodes
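The counts above can be sketched as a small Python helper. The function name `unrooted_binary_tree_counts` is made up for this illustration, and the formulas assume an unrooted, fully bifurcating tree with at least three leaves.

```python
def unrooted_binary_tree_counts(n_leaves: int) -> dict:
    """Branch and node counts for an unrooted, fully bifurcating tree."""
    if n_leaves < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 leaves")
    return {
        "terminal_branches": n_leaves,        # one branch per leaf
        "internal_branches": n_leaves - 3,    # branches joining two internal nodes
        "total_branches": 2 * n_leaves - 3,   # terminal + internal
        "internal_nodes": n_leaves - 2,       # each joins exactly three branches
    }


counts = unrooted_binary_tree_counts(4)
print(counts)
```

For the four-leaf tree (A,B,(C,D)); this gives 4 terminal branches, 1 internal branch, 5 branches in total and 2 internal nodes.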

Regarding the number of fully bifurcating trees that can possibly describe the relationships between a given number of terminal nodes (e.g. species), I will just say that it goes up very quickly. This is even more true for rooted trees: each unrooted tree can be rooted on any of its 2N-3 branches, so there are 2N-3 times as many rooted trees as unrooted ones. Here are some numbers:

Leaves    Unrooted trees    Rooted trees
3         1                 3
5         15                105
7         945               10,395
9         135,135           2,027,025
10        2,027,025         34,459,425
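The numbers in the table can be reproduced with a short Python sketch. It assumes fully bifurcating trees and uses the standard double-factorial counts: (2N-5)!! unrooted trees and (2N-3)!! rooted trees for N leaves. The function names are chosen for this example only.

```python
def odd_double_factorial(k: int) -> int:
    """Return k * (k-2) * ... * 3 * 1 for odd k, and 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted_trees(n_leaves: int) -> int:
    """Number of unrooted bifurcating trees: (2N-5)!! for N >= 3."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    return odd_double_factorial(2 * n_leaves - 5)


def n_rooted_trees(n_leaves: int) -> int:
    """Number of rooted bifurcating trees: (2N-3)!!, i.e. (2N-3) roots per unrooted tree."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    return (2 * n_leaves - 3) * n_unrooted_trees(n_leaves)


for n in (3, 5, 7, 9, 10):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

Computing the rooted count as (2N-3) times the unrooted count mirrors the argument in the text: every branch of an unrooted tree is a possible position for the root.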

The Newick format:

The Newick format was adopted in 1986 by an informal committee of phylogenetics software authors, and takes its name from Newick's restaurant in Dover, New Hampshire, where the committee met. Its simplicity quickly made it popular in the phylogenetics community and a de facto standard for exchanging tree data. A Newick string is a combination of parentheses, punctuation, numbers and letters. Let's take a look at some of the possibilities the .nwk format offers:

(,,(,));                         	      no nodes are named

(A,B,(C,D));                     	      leaf nodes are named

(A,B,(C,D)E)F;                     	      all nodes are named

(:0.1,:0.2,(:0.3,:0.4):0.5);    	      all but root node have a distance to parent

(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;     	  all have a distance to parent

(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);    	  distances and leaf names (popular)

(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F; 	      distances and all names

((B:0.2,(C:0.3,D:0.4)E:0.5)A:0.1)F; 	  a tree rooted on a leaf node (rare)
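To make the "name:distance" pattern in these examples concrete, here is a minimal Python sketch that pulls every named node and its distance to parent out of a Newick string with a regular expression. It is only an illustration: it assumes unquoted names and plain decimal or scientific-notation distances, and the names `LABEL_LENGTH` and `named_branch_lengths` are made up here. For real work a dedicated tree library is a better choice.

```python
import re

# A label (anything except Newick punctuation or whitespace), a colon, then a number.
LABEL_LENGTH = re.compile(r"([^(),:;\s]+):(\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)")


def named_branch_lengths(newick: str) -> dict:
    """Map each named node to its distance to parent (unnamed nodes are skipped)."""
    return {name: float(length) for name, length in LABEL_LENGTH.findall(newick)}


lengths = named_branch_lengths("(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;")
print(lengths)
```

Note that the root F does not appear in the result because it has no distance to a parent, and that the internal node E is reported alongside the leaves.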

Last but not least, remember that .nwk trees are not unique representations: the same relationships between terminals can be written in several different ways. For example, the following four strings all describe exactly the same tree topology:

(A,B,(C,D));
(B,A,(C,D));
((C,D),A,B);
(A,B,(D,C));
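One way to check that strings like these describe the same unrooted topology is to compare the bipartitions (splits) of the leaf set induced by the internal branches. The sketch below is a deliberately small Python illustration, not a full Newick parser: it assumes unquoted, unique leaf names, ignores branch lengths and internal labels, and all function names are invented for this example.

```python
def parse_newick(text: str):
    """Parse a simple Newick string into nested tuples of leaf names."""
    s = "".join(text.split())  # drop whitespace (names are assumed unquoted)
    if not s.endswith(";"):
        raise ValueError("missing terminating semicolon")
    pos = 0

    def read_label() -> str:
        nonlocal pos
        start = pos
        while s[pos] not in "(),;":
            pos += 1
        return s[start:pos].split(":")[0]  # discard ":length" if present

    def read_node():
        nonlocal pos
        if s[pos] != "(":
            return read_label()  # a leaf
        pos += 1  # consume "("
        children = [read_node()]
        while s[pos] == ",":
            pos += 1
            children.append(read_node())
        if s[pos] != ")":
            raise ValueError("unbalanced parentheses")
        pos += 1  # consume ")"
        read_label()  # internal label and length are ignored here
        return tuple(children)

    tree = read_node()
    if s[pos] != ";":
        raise ValueError("unexpected text before the semicolon")
    return tree


def leaf_set(node) -> frozenset:
    """All leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def splits(tree) -> set:
    """Non-trivial bipartitions, each stored as the side without a fixed anchor leaf."""
    everything = leaf_set(tree)
    anchor = min(everything)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        for child in node:
            side = leaf_set(child)
            if anchor in side:
                side = everything - side
            if 1 < len(side) < len(everything) - 1:
                found.add(side)
            visit(child)

    visit(tree)
    return found


def same_unrooted_topology(a: str, b: str) -> bool:
    tree_a, tree_b = parse_newick(a), parse_newick(b)
    return leaf_set(tree_a) == leaf_set(tree_b) and splits(tree_a) == splits(tree_b)


variants = ["(A,B,(C,D));", "(B,A,(C,D));", "((C,D),A,B);", "(A,B,(D,C));"]
print(all(same_unrooted_topology(variants[0], v) for v in variants))
print(same_unrooted_topology("(A,B,(C,D));", "(A,C,(B,D));"))
```

Each of the strings listed above yields the single split {C, D} versus {A, B}, whereas (A,C,(B,D)); groups B with D and is therefore a different topology.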

Common pitfalls:

  • Always end your Newick string with a semicolon ;.
  • Unbalanced parentheses are one of the most frequent sources of problems - opening and closing parentheses must match.
  • Either include branch lengths for every branch or omit them entirely (the root is the one node that commonly goes without a length, as in the examples above).
  • Special characters (spaces, colons, commas, parentheses) inside taxon names must be quoted, for example: 'Homo sapiens'.
  • Whitespace (spaces, tabs, newlines) is generally ignored and you may encounter trees formatted across multiple lines.

The Nexus format:

Originally developed in the early 1990s by researchers including Wayne Maddison and David Maddison, Nexus was created to address the need for a flexible format capable of holding various types of biological data. It can store complex datasets together with the instructions needed by different analytical software. This makes it especially useful for studies that integrate multiple data types, such as molecular sequences and morphological traits. Let's take a look at its structure and syntax.

Every Nexus file begins with a header line that indicates that the file follows the Nexus conventions:

#NEXUS

Nexus files are organized into one or more "blocks". Each block is a section that groups related data or commands. A block starts with the keyword BEGIN followed by the block name, and ends with END;.

The TAXA block contains definitions of taxa (e.g., species or sequences).

BEGIN TAXA;
  DIMENSIONS NTAX=4;
  TAXLABELS A B C D;
END;

The CHARACTERS (or DATA) block stores character matrices or sequence data. This block can include details about the number of characters, formatting, and missing data symbols.

BEGIN CHARACTERS;
  DIMENSIONS NCHAR=100;
  FORMAT DATATYPE=DNA MISSING=? GAP=-;
  MATRIX
    A ATGCT...
    B ATG-T...
  ;
END;

The TREES block contains one or more phylogenetic trees - in Newick format! Trees can be annotated with branch lengths, support values, or other metadata.

BEGIN TREES;
  TREE tree1 = ((A,B),(C,D));
END;
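The block structure described above is regular enough to be pulled apart with a couple of regular expressions. The Python sketch below splits a Nexus text into its blocks and extracts the Newick strings from the TREES block. It is an illustration under simplifying assumptions: no bracketed comments, no taxon literally named END, and every block closed with END;. The function names and the example text are made up for this purpose.

```python
import re

BLOCK = re.compile(r"\bBEGIN\s+(\w+)\s*;(.*?)\bEND\s*;", re.IGNORECASE | re.DOTALL)
TREE = re.compile(r"\bTREE\s+(\S+)\s*=\s*(.*?;)", re.IGNORECASE | re.DOTALL)


def nexus_blocks(text: str) -> dict:
    """Map each block name (upper-cased) to the raw text between BEGIN <name>; and END;."""
    if not text.lstrip().upper().startswith("#NEXUS"):
        raise ValueError("not a Nexus file: the #NEXUS header is missing")
    return {name.upper(): body.strip() for name, body in BLOCK.findall(text)}


def nexus_trees(text: str) -> dict:
    """Map each tree name in the TREES block to its Newick string."""
    body = nexus_blocks(text).get("TREES", "")
    return {name: newick for name, newick in TREE.findall(body)}


example = """#NEXUS
BEGIN TAXA;
  DIMENSIONS NTAX=4;
  TAXLABELS A B C D;
END;
BEGIN TREES;
  TREE tree1 = ((A,B),(C,D));
END;
"""

print(sorted(nexus_blocks(example)))
print(nexus_trees(example))
```

Because each tree in a TREES block is an ordinary Newick string, anything that reads Newick can be applied to the values returned by `nexus_trees`.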

We can visualize .nwk and .nexus files either online or with software such as FigTree. Take a look at some examples here and discuss them.

