Comparative Genomics

Data available

Gene trees are constructed using the longest protein for every gene in Ensembl: proteins are clustered using hcluster_sg based on wublastp scores, and each cluster of proteins is aligned using MCoffee. Finally, TreeBeST is used to produce a gene tree from each multiple alignment, reconciling it with the species tree to call duplication events. Homologues are deduced from these trees. More information→
Families are constructed by MCL clustering of all Ensembl proteins, i.e. not only the longest protein for each gene. Metazoan proteins from UniProtKB/Swiss-Prot and SPTrEMBL are added to extend the protein set. More information→
Whole genome alignments are performed either pairwise between two species, using BlastZ-net or translated Blat analysis, or as multiple alignments across several species. More information→
Ancestral sequences are calculated from multi-species whole genome alignments. More information→
Conservation scores and constrained elements are calculated from the whole genome multiple alignments. More information→
Syntenies are calculated from the pairwise alignments. More information→
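
The reconciliation step described for gene trees above can be illustrated with a minimal sketch. It is not TreeBeST's implementation; it shows a textbook last-common-ancestor (LCA) reconciliation, in which each gene tree node is mapped to the lowest species tree node covering all of its species, and a node is called a duplication when it maps to the same species tree node as one of its children. The names Node, index_species_tree, lca and call_events, as well as the toy species and gene names, are hypothetical and chosen only for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node; gene-tree leaves also record the species they belong to."""
    name: str
    children: list["Node"] = field(default_factory=list)
    species: str | None = None  # set on gene-tree leaves only


def index_species_tree(root):
    """Build parent and depth lookups keyed by species-tree node name."""
    parent, depth = {root.name: None}, {root.name: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            parent[child.name] = node.name
            depth[child.name] = depth[node.name] + 1
            stack.append(child)
    return parent, depth


def lca(a, b, parent, depth):
    """Last common ancestor of two species-tree nodes, by climbing parents."""
    while a != b:
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a


def call_events(gene_root, species_root):
    """Label each internal gene-tree node as 'duplication' or 'speciation'."""
    parent, depth = index_species_tree(species_root)
    mapping, events = {}, {}

    def visit(node):
        if not node.children:
            mapping[node.name] = node.species  # a leaf maps to its own species
            return
        for child in node.children:
            visit(child)
        mapped = mapping[node.children[0].name]
        for child in node.children[1:]:
            mapped = lca(mapped, mapping[child.name], parent, depth)
        mapping[node.name] = mapped
        # Duplication: the node maps to the same place as one of its children.
        is_dup = any(mapping[c.name] == mapped for c in node.children)
        events[node.name] = "duplication" if is_dup else "speciation"

    visit(gene_root)
    return events


# Toy species tree (assumed for illustration): ((human, mouse), chicken)
species_tree = Node("Amniota", [
    Node("Euarchontoglires", [Node("human"), Node("mouse")]),
    Node("chicken"),
])

# Toy gene tree: two gene copies that both predate the chicken split
gene_tree = Node("root", [
    Node("n1", [Node("h1", species="human"), Node("m1", species="mouse")]),
    Node("n2", [Node("h2", species="human"), Node("c1", species="chicken")]),
])

events = call_events(gene_tree, species_tree)
print(events)
```

In this toy case the root is called a duplication because it maps to the same ancestral species as its child n2, while n1 and n2 are speciations. Homologues separated by a speciation node would then be orthologues, and those separated by a duplication node paralogues.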

Access

Data can be accessed using the Compara Perl API, BioMart, or the comparative genomics pages on the browser. Gene trees can be viewed from any 'Gene' page on the browser. They can be exported via the control panel and via the Jalview plug-in, which is available in the pop-ups that appear when clicking on any part of the tree.

The external Java-based tool PhyloWidget can also be used to visualise phylogenetic trees of Compara species. The Compara team has created an example that includes all the current species for the main Ensembl website, plus a few additional mammalian species of interest:

Ensembl species tree (requires Java)