The Ensembl Annotation Process
Main genebuild
Genes are annotated by an automated pipeline, the genebuild. Its transcript models are based on mRNA and protein sequences from public scientific databases. Transcripts from this pipeline are then combined with a reviewed, manually curated set from the Vega project and with reviewed protein-coding transcripts from the CCDS project.
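The paragraph above does not say how the three transcript sets are reconciled. Purely as an illustration of one way such sets could be combined, the sketch below unions them and, when two sources report an identical exon structure, keeps a single copy from the higher-priority source. The `Transcript` class, the `merge_transcripts` function and the priority order (Vega, then CCDS, then genebuild) are all hypothetical assumptions, not Ensembl's actual merge rules, and the sketch assumes every transcript lies on the same sequence region and strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    source: str  # hypothetical source labels: "vega", "ccds" or "genebuild"
    exons: tuple  # ((start, end), ...) on one region/strand (assumption)


# Hypothetical priority: a lower number wins when structures are identical.
PRIORITY = {"vega": 0, "ccds": 1, "genebuild": 2}


def merge_transcripts(*transcript_sets):
    """Union several transcript sets, de-duplicating identical exon structures."""
    best = {}
    for transcripts in transcript_sets:
        for t in transcripts:
            key = t.exons  # identical exon chain => treated as the same model
            kept = best.get(key)
            if kept is None or PRIORITY[t.source] < PRIORITY[kept.source]:
                best[key] = t
    # Return in a stable order (by exon coordinates, then name).
    return sorted(best.values(), key=lambda t: (t.exons, t.name))
```

With a genebuild set and a Vega set that share one exon structure, the merged result keeps the Vega copy of the shared model plus the genebuild-only model.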
See the annotation article for more about the Ensembl pipeline, gene names and annotation; the article also describes EST genes.
More genes
Ensembl also includes automatically annotated immunoglobulin (Ig) gene segments and non-coding RNAs.
Annotation methods
- Exonerate is a tool for pairwise sequence comparison, used in the Ensembl genebuild to align mRNA to the genome assembly. It is a generic tool that supports many alignment models, using either exhaustive dynamic programming or a variety of heuristics.
- Wise2, also used in the genebuild, compares biopolymers, most commonly DNA against protein sequence. Its algorithms include genewise (protein against genomic DNA) and estwise (protein against EST or cDNA sequence).
- Low-coverage genomes are annotated with a modified pipeline that attempts to locate genes split across multiple scaffolds.
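Exonerate's exhaustive mode rests on dynamic programming. As a rough illustration of that idea only, the sketch below computes a global alignment score with the textbook Needleman-Wunsch recurrence and a linear gap penalty. It is not Exonerate's code: Exonerate offers richer models (for example spliced and protein-to-DNA alignments), and the function name and scoring values (match +1, mismatch -1, gap -2) are hypothetical choices made for this example.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Scoring values are illustrative assumptions, not Exonerate defaults.
    """
    n, m = len(a), len(b)
    # prev[j] = best score aligning the first i-1 characters of a with b[:j]
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        # curr[0]: the first i characters of a aligned entirely to gaps
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[m]
```

For example, `global_alignment_score("ACGT", "AGT")` aligns three matching bases and pays for one gap, giving 1. Filling every cell of the matrix is what makes the method exhaustive; heuristic modes avoid that full cost by restricting the search.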