
Genomic alignments

BlastZ-net Pairwise Alignment Analysis

BlastZ-net (Schwartz S et al., Genome Res., 2003;13(1):103-7; Kent WJ et al., Proc Natl Acad Sci U S A., 2003;100(20):11484-9) alignments are provided for closely related pairs of species. They are produced by post-processing the raw BlastZ results in two steps. First, the original alignment blocks are chained according to their location in both genomes. The netting step then chooses, for the reference species, the best sub-chain in each region. In the lists below, each reference species is given first, followed by the species aligned to it:

Human (Homo sapiens)

Alpaca (Vicugna pacos)
Armadillo (Dasypus novemcinctus)
Bushbaby (Otolemur garnettii)
Cat (Felis catus)
Chicken (Gallus gallus)
Chimpanzee (Pan troglodytes)
Cow (Bos taurus)
Dog (Canis familiaris)
Dolphin (Tursiops truncatus)
Elephant (Loxodonta africana)
Gorilla (Gorilla gorilla)
Guinea Pig (Cavia porcellus)
Hedgehog (Erinaceus europaeus)
Horse (Equus caballus)
Hyrax (Procavia capensis)
Kangaroo rat (Dipodomys ordii)
Lesser hedgehog tenrec (Echinops telfairi)
Macaque (Macaca mulatta)
Megabat (Pteropus vampyrus)
Microbat (Myotis lucifugus)
Mouse (Mus musculus)
Mouse Lemur (Microcebus murinus)
Opossum (Monodelphis domestica)
Orangutan (Pongo pygmaeus)
Pika (Ochotona princeps)
Platypus (Ornithorhynchus anatinus)
Rabbit (Oryctolagus cuniculus)
Rat (Rattus norvegicus)
Shrew (Sorex araneus)
Squirrel (Spermophilus tridecemlineatus)
Tarsier (Tarsius syrichta)
Tree Shrew (Tupaia belangeri)

Mouse (Mus musculus)

Dog (Canis familiaris)
Human (Homo sapiens)
Platypus (Ornithorhynchus anatinus)
Rat (Rattus norvegicus)

Medaka (Oryzias latipes)

Stickleback (Gasterosteus aculeatus)

C.intestinalis (Ciona intestinalis)

C.savignyi (Ciona savignyi)
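
The chaining and netting steps described at the start of this section can be illustrated with a toy sketch. It is not the code used to build the BlastZ-net alignments: the `Block` type, the `best_chain` and `net` functions, the half-open coordinates and the simple additive scoring are all assumptions made for illustration. Chaining is shown as a dynamic programme that picks the highest-scoring set of blocks appearing in the same order in both genomes, and netting as a greedy pass that gives each reference region to the best-scoring chain covering it.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """Hypothetical gap-free alignment block (half-open coordinates)."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    score: int


def best_chain(blocks: List[Block]) -> List[Block]:
    """Return the highest-scoring set of blocks that is colinear in both genomes."""
    if not blocks:
        return []
    blocks = sorted(blocks, key=lambda b: (b.ref_start, b.qry_start))
    best = [b.score for b in blocks]   # best chain score ending at block i
    prev = [-1] * len(blocks)          # back-pointer to the previous block
    for i, b in enumerate(blocks):
        for j in range(i):
            a = blocks[j]
            # a may precede b only if it ends before b starts in both genomes
            if a.ref_end <= b.ref_start and a.qry_end <= b.qry_start:
                if best[j] + b.score > best[i]:
                    best[i] = best[j] + b.score
                    prev[i] = j
    i = max(range(len(blocks)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(blocks[i])
        i = prev[i]
    return chain[::-1]


def net(chains: List[Tuple[int, int, int]]) -> List[Tuple[int, int, int]]:
    """Greedy netting on the reference genome.

    chains holds (ref_start, ref_end, score) tuples. Chains are visited from
    best to worst, and each keeps only the reference intervals that no
    better chain has already claimed.
    """
    kept: List[Tuple[int, int, int]] = []
    for start, end, score in sorted(chains, key=lambda c: -c[2]):
        pieces = [(start, end)]
        for k_start, k_end, _ in kept:
            remaining = []
            for s, e in pieces:
                if k_end <= s or e <= k_start:      # no overlap
                    remaining.append((s, e))
                else:                               # trim the overlapping part
                    if s < k_start:
                        remaining.append((s, k_start))
                    if k_end < e:
                        remaining.append((k_end, e))
            pieces = remaining
        kept.extend((s, e, score) for s, e in pieces)
    return sorted(kept)
```

In this sketch a lower-scoring chain survives only where it covers reference sequence that no better chain covers, which is one simple way to obtain a single best alignment per reference region.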

Translated Blat Pairwise Alignment Analysis

Translated blat (Kent W, Genome Res., 2002;12(4):656-64) is used to look for homologous regions between more distantly related pairs of species. We expect to find homologies mainly in coding regions. There are two sets of translated blat analyses: a newer set, in which the raw results were passed through a chaining and netting procedure similar to that used for the BlastZ-net analyses to produce the best sub-chain for the reference species (Translated Blat Net), and a set for a few species that have not yet been reanalysed (Translated Blat).

Translated Blat Net

Homo sapiens            H.sap
Mus musculus            -     M.mus
Rattus norvegicus       -     -     R.nor
Gallus gallus           YES   YES   -     G.gal
Xenopus tropicalis      YES   YES   -     YES   X.tro
Tetraodon nigroviridis  YES   YES   YES   -     YES   T.nig
Takifugu rubripes       YES   YES   -     -     -     -     T.rub
Oryzias latipes         YES   -     -     -     -     -     -     O.lat
Gasterosteus aculeatus  YES   -     -     -     -     -     -     -     G.acu
Danio rerio             YES   YES   YES   YES   YES   YES   YES   YES   YES   D.rer
Ciona savignyi          YES   -     -     YES   -     -     -     -     -     YES   C.sav
Ciona intestinalis      YES   -     -     YES   -     -     -     -     -     YES   -     C.int
Taeniopygia guttata     YES   -     -     YES   -     -     -     -     -     -     -     -     T.gut
Anolis carolinensis     YES   -     -     YES   -     -     -     -     -     -     -     -     -     A.car
                        H.sap M.mus R.nor G.gal X.tro T.nig T.rub O.lat G.acu D.rer C.sav C.int T.gut A.car

Translated Blat

Anopheles gambiae        A.gam
Aedes aegypti            YES   A.aeg
Drosophila melanogaster  YES   YES   D.mel
                         A.gam A.aeg D.mel
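
Translated blat finds homologies mainly in coding regions because it compares the sequences at the protein level: both genomes are translated in all six reading frames before matching. The sketch below only illustrates that translation step and is not part of blat itself. The function names are hypothetical, the standard genetic code is assumed, and any codon containing a base other than A, C, G or T is written as X.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from the start of dna; '*' marks a stop."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict:
    """Translate a DNA sequence in the three forward and three reverse frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

Comparing translated frames lets a match survive synonymous changes in the DNA, which is why this kind of search remains useful between distantly related species.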

PECAN Multiple Alignment Analysis

Pecan is used to provide global multiple genomic alignments. First, Mercator is used to build a synteny map between the genomes and then Pecan builds alignments in these syntenic regions.

Pecan is a global multiple sequence alignment program that makes the probabilistic consistency methodology practical for significant numbers of sequences of practically arbitrary length. As input it takes a set of sequences and a phylogenetic tree. The parameters and heuristics it employs are highly user-configurable. It is written entirely in Java and requires the installation of Exonerate.

12 amniota vertebrates Pecan

Human (Homo sapiens)
Chimpanzee (Pan troglodytes)
Orangutan (Pongo pygmaeus)
Macaque (Macaca mulatta)
Mouse (Mus musculus)
Rat (Rattus norvegicus)
Dog (Canis familiaris)
Cow (Bos taurus)
Horse (Equus caballus)
Opossum (Monodelphis domestica)
Platypus (Ornithorhynchus anatinus)
Chicken (Gallus gallus)

EPO Multiple Alignment Analysis

The new EPO (Enredo, Pecan, Ortheus) pipeline is a three-step pipeline for whole-genome multiple alignments. Enredo produces colinear segments from extant genomes, handling rearrangements, deletions and duplications. Pecan, as described above, is used to align these segments. Finally, Ortheus is used to create genome-wide ancestral sequence reconstructions. Further details on these methods can be found at:

The 4-way catarrhini-specific alignments and 9-way eutherian mammal alignments were generated using the new EPO (Enredo Pecan Ortheus) pipeline. The 9-way alignment contains all the high-coverage mammalian genomes.

The 31-way eutherian mammal alignments were not generated using the EPO pipeline because of difficulties with running Ortheus on the low-coverage genomes. Instead, the 22 low-coverage genomes (including guinea pig) were projected onto the 9-way eutherian mammal alignments using the BlastZ-net alignments.

4 catarrhini primates EPO

Human (Homo sapiens)
Chimpanzee (Pan troglodytes)
Orangutan (Pongo pygmaeus)
Macaque (Macaca mulatta)

9 eutherian mammals EPO

Human (Homo sapiens)
Chimpanzee (Pan troglodytes)
Orangutan (Pongo pygmaeus)
Macaque (Macaca mulatta)
Mouse (Mus musculus)
Rat (Rattus norvegicus)
Dog (Canis familiaris)
Cow (Bos taurus)
Horse (Equus caballus)

31 eutherian mammals EPO+low coverage projection

Human (Homo sapiens)
Chimpanzee (Pan troglodytes)
Gorilla (Gorilla gorilla)
Orangutan (Pongo pygmaeus)
Macaque (Macaca mulatta)
Tarsier (Tarsius syrichta)
Mouse Lemur (Microcebus murinus)
Bushbaby (Otolemur garnettii)
Mouse (Mus musculus)
Rat (Rattus norvegicus)
Squirrel (Spermophilus tridecemlineatus)
Kangaroo rat (Dipodomys ordii)
Guinea Pig (Cavia porcellus)
Rabbit (Oryctolagus cuniculus)
Pika (Ochotona princeps)
Tree Shrew (Tupaia belangeri)
Dog (Canis familiaris)
Cat (Felis catus)
Cow (Bos taurus)
Alpaca (Vicugna pacos)
Dolphin (Tursiops truncatus)
Microbat (Myotis lucifugus)
Megabat (Pteropus vampyrus)
Horse (Equus caballus)
Hedgehog (Erinaceus europaeus)
Shrew (Sorex araneus)
Hyrax (Procavia capensis)
Elephant (Loxodonta africana)
Lesser hedgehog tenrec (Echinops telfairi)
Armadillo (Dasypus novemcinctus)
Sloth (Choloepus hoffmanni)

Ancestral sequences are inferred from the EPO multiple alignments using Ortheus. Ortheus is a probabilistic method for the inference of ancestor alignments, also known as tree alignments. Its main contribution is the use of a phylogenetic model that incorporates gaps to infer insertion and deletion events. An ancestral sequence is predicted for each node of the phylogenetic tree that relates the sequences. Each ancestral sequence is named after the extant species derived from it. For example, a sequence named Hsap, Ptro, Mmul corresponds to the ancestor of the Homo sapiens, Pan troglodytes, and Macaca mulatta genomes.
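
The naming convention for ancestral sequences can be sketched as follows. This is an illustration only: the nested-tuple tree, the `abbreviate` and `name_ancestors` functions, the leaf ordering and the ", " separator are assumptions, chosen to reproduce the Hsap, Ptro, Mmul example above.

```python
def abbreviate(binomial: str) -> str:
    """'Homo sapiens' -> 'Hsap': genus initial plus first three letters of the species."""
    genus, species = binomial.split()[:2]
    return genus[0].upper() + species[:3].lower()


def name_ancestors(tree) -> list:
    """Name every internal node of a tree after the extant species below it.

    tree is a nested tuple whose leaves are species names (strings).
    Names are returned in post-order, so the root ancestor comes last.
    """
    names = []

    def collect(node):
        if isinstance(node, str):                  # leaf: an extant species
            return [abbreviate(node)]
        found = [abbr for child in node for abbr in collect(child)]
        names.append(", ".join(found))             # internal node: an ancestor
        return found

    collect(tree)
    return names


# Hypothetical three-species tree: (human, chimpanzee) with macaque as outgroup.
primates = (("Homo sapiens", "Pan troglodytes"), "Macaca mulatta")
ancestor_names = name_ancestors(primates)
```

For this tree, `ancestor_names` holds one name for the human-chimpanzee ancestor and one for the ancestor of all three species.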

Conservation Analysis

In addition, we use GERP (Cooper GM et al., Genome Res., 2005;15:901-913) to calculate conservation scores and call constrained elements on the 31-way and 12-way multiple alignments. Conservation scores are estimated column by column. Constrained elements are stretches of the multiple alignment in which the sequences are highly conserved according to these scores.
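
The relationship between per-column scores and constrained elements can be shown with a deliberately simplified sketch. It is not GERP's element-calling procedure: the fixed `threshold`, the `min_length` cutoff and the function name are assumptions, and the code only reports runs of consecutive alignment columns whose score stays at or above the threshold.

```python
from typing import Iterable, List, Tuple


def constrained_elements(
    scores: Iterable[float],
    threshold: float = 2.0,   # hypothetical cutoff, not a GERP default
    min_length: int = 3,      # hypothetical minimum element length in columns
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges of sustained high scores."""
    elements = []
    start = None
    # A sentinel below any threshold closes a run that reaches the last column.
    for i, score in enumerate(list(scores) + [float("-inf")]):
        if score >= threshold:
            if start is None:
                start = i                  # a new run of conserved columns begins
        elif start is not None:
            if i - start >= min_length:
                elements.append((start, i))
            start = None
    return elements


column_scores = [0.1, 2.5, 3.0, 2.2, 0.0, 2.9, 2.8, -1.0, 3.1, 3.3, 3.5]
elements = constrained_elements(column_scores)
```

With these toy scores, the two-column run at columns 5 to 6 is discarded as too short, while the runs starting at columns 1 and 8 are kept.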

Synteny Analysis

We calculate syntenic regions from the BlastZ-net alignments by looking for stretches where the alignment blocks are in synteny. The search runs in two phases. In the first, syntenic alignments that are less than 200 kbp apart are grouped. In the second, groups that are in synteny are linked, provided that no more than 2 non-syntenic groups lie between them and they are less than 3 Mbp apart.

Homo sapiens           Hsap
Pan troglodytes        YES  Ptro
Pongo pygmaeus         YES  -    Ppyg
Macaca mulatta         YES  -    -    Mmul
Mus musculus           YES  -    -    -    Mmus
Rattus norvegicus      YES  -    -    -    YES  Rnor
Canis familiaris       YES  -    -    -    YES  -    Cfam
Bos taurus             YES  -    -    -    -    -    -    Btau
Equus caballus         YES  -    -    -    -    -    -    -    Ecab
Monodelphis domestica  YES  -    -    -    -    -    -    -    -    Mdom
Gallus gallus          YES  -    -    -    -    -    -    -    -    -    Ggal
                       Hsap Ptro Ppyg Mmul Mmus Rnor Cfam Btau Ecab Mdom Ggal
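
The two-phase synteny search described at the start of this section can be sketched for a single reference chromosome as follows. This is a simplified illustration under stated assumptions, not the production code: the `Aln` type and both functions are hypothetical, two alignments are treated as "in synteny" when they hit the same chromosome of the other species, and distances are measured on the reference genome only.

```python
from typing import List, NamedTuple


class Aln(NamedTuple):
    """Hypothetical alignment block or group on one reference chromosome."""
    ref_start: int
    ref_end: int
    qry_chr: str


MAX_GROUP_GAP = 200_000     # phase 1: group alignments closer than 200 kbp
MAX_LINK_GAP = 3_000_000    # phase 2: link groups less than 3 Mbp apart
MAX_SKIPPED = 2             # phase 2: at most 2 non-syntenic groups in between


def group_alignments(alns: List[Aln]) -> List[Aln]:
    """Phase 1: merge neighbouring alignments that are in synteny and close."""
    groups: List[Aln] = []
    for a in sorted(alns):
        last = groups[-1] if groups else None
        if (last and last.qry_chr == a.qry_chr
                and a.ref_start - last.ref_end < MAX_GROUP_GAP):
            groups[-1] = Aln(last.ref_start, max(last.ref_end, a.ref_end), a.qry_chr)
        else:
            groups.append(a)
    return groups


def link_groups(groups: List[Aln]) -> List[Aln]:
    """Phase 2: link syntenic groups across a few non-syntenic ones.

    Non-syntenic groups that are skipped over are absorbed into the linked
    region and do not appear in the output.
    """
    regions: List[Aln] = []
    i = 0
    while i < len(groups):
        cur = groups[i]
        j = i + 1
        while True:
            link = None
            # Look past at most MAX_SKIPPED groups for one in synteny with cur.
            for k in range(j, min(j + MAX_SKIPPED + 1, len(groups))):
                g = groups[k]
                if g.qry_chr == cur.qry_chr and g.ref_start - cur.ref_end < MAX_LINK_GAP:
                    link = k
                    break
            if link is None:
                break
            cur = Aln(cur.ref_start, max(cur.ref_end, groups[link].ref_end), cur.qry_chr)
            j = link + 1
        regions.append(cur)
        i = j
    return regions


alignments = [
    Aln(0, 50_000, "chr5"), Aln(100_000, 150_000, "chr5"),
    Aln(400_000, 450_000, "chr9"), Aln(1_000_000, 1_100_000, "chr5"),
    Aln(9_000_000, 9_100_000, "chr5"),
]
groups = group_alignments(alignments)
regions = link_groups(groups)
```

In this toy input the first two chr5 alignments are grouped in phase 1, the chr5 groups on either side of the isolated chr9 hit are linked in phase 2, and the last alignment stays separate because it lies more than 3 Mbp away.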