
Annotation of Non-Coding RNAs

Non-coding RNA Overview

Non-coding RNAs (ncRNAs) are involved in many biological processes, and their importance is increasingly recognised. As with proteins, it is the overall structure of the molecule that imparts function. However, whereas similar protein structures are often reflected in a conserved amino acid sequence, the nucleotide sequences underlying a conserved RNA secondary structure can vary widely. This makes ncRNAs difficult to detect from sequence alone.

Because of this, we use a combination of techniques to detect ncRNAs. First, sensitive BLAST searches are used to identify likely targets. A covariance model search is then used to assess how likely it is that each target can fold into the required secondary structure. Other ncRNAs are added as part of the raw compute stage.

The following non-coding RNA gene types are annotated, along with pseudogenes of these types:

tRNA: transfer RNA
Mt-tRNA: transfer RNA located in the mitochondrial genome
rRNA: ribosomal RNA
scRNA: small cytoplasmic RNA
snRNA: small nuclear RNA
snoRNA: small nucleolar RNA
miRNA: microRNA precursors
misc_RNA: miscellaneous other RNA

Annotation Details

Most ncRNAs are annotated by aligning genomic sequence against Rfam using BLASTN. The BLAST hits are clustered and filtered by E-value, and the surviving hits are used to seed Infernal searches of each locus with the corresponding Rfam covariance model. This reduces the search space: scanning the entire genome with every Rfam covariance model would be extremely CPU-intensive. The BLAST hits are also kept as supporting evidence for the resulting ncRNA genes.
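The filter-and-cluster step that turns raw BLAST hits into search windows can be sketched as follows. This is a minimal illustration, not the Ensembl code: the record types `BlastHit` and `Locus`, the function `cluster_hits`, the merge rule (overlapping hits from the same family on the same sequence region) and the flanking `padding` are all hypothetical choices. The E-value cutoff and padding are left as required parameters because the real values are not given here. Each returned locus is the kind of window that would then be searched with Infernal (for example `cmsearch`) using the matching covariance model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlastHit:
    """One BLASTN hit of genomic sequence against an Rfam family (hypothetical record)."""
    seq_region: str  # chromosome or scaffold name
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    family: str      # Rfam family accession the hit came from
    evalue: float


@dataclass
class Locus:
    """A padded genomic window to search with one family's covariance model."""
    seq_region: str
    start: int
    end: int
    family: str
    best_evalue: float


def cluster_hits(hits: List[BlastHit], max_evalue: float, padding: int) -> List[Locus]:
    # 1. Filter: drop hits whose E-value is above the cutoff.
    kept = [h for h in hits if h.evalue <= max_evalue]
    # 2. Sort so hits from the same region and family are adjacent, ordered by start.
    kept.sort(key=lambda h: (h.seq_region, h.family, h.start))
    # 3. Cluster: merge overlapping hits of the same family on the same region.
    loci: List[Locus] = []
    for h in kept:
        last = loci[-1] if loci else None
        if (last is not None and last.seq_region == h.seq_region
                and last.family == h.family and h.start <= last.end):
            last.end = max(last.end, h.end)
            last.best_evalue = min(last.best_evalue, h.evalue)
        else:
            loci.append(Locus(h.seq_region, h.start, h.end, h.family, h.evalue))
    # 4. Pad each cluster with flanking sequence so the covariance model can align
    #    a full-length structure. The end is not clipped to the region length here.
    for locus in loci:
        locus.start = max(1, locus.start - padding)
        locus.end += padding
    return loci


# Toy hits: coordinates and E-values are made up for illustration.
example_hits = [
    BlastHit("1", 1000, 1080, "RF00001", 1e-20),
    BlastHit("1", 1050, 1120, "RF00001", 1e-8),
    BlastHit("1", 5000, 5070, "RF00001", 0.5),
    BlastHit("2", 300, 400, "RF00004", 1e-12),
]
for locus in cluster_hits(example_hits, max_evalue=1e-5, padding=100):
    print(locus.seq_region, locus.start, locus.end, locus.family, locus.best_evalue)
```

With a cutoff of 1e-5 and 100 bp of padding, the weak hit at 1:5000-5070 is discarded, the two overlapping hits on region 1 merge into the window 1:900-1220, and the region 2 hit becomes 2:200-500. Only these windows, rather than the whole genome, would be scanned with the covariance models.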

miRNAs are predicted by BLASTN searches of genomic sequence slices against miRBase sequences. The BLAST hits are clustered and filtered by E-value, and the aligned genomic sequence is then checked for potential secondary structure using RNAfold. If there is evidence that the genomic sequence could form a stable hairpin structure, the locus is used to create a miRNA gene model, and the BLAST hit is kept as supporting evidence for the miRNA gene.
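The idea behind the hairpin check can be illustrated with a small standard-library sketch. RNAfold predicts a minimum free energy structure. To stay self-contained, the code below swaps in the simpler Nussinov algorithm, which only maximises the number of base pairs, and then applies a crude "single stem-loop" test to the resulting dot-bracket string. The functions `nussinov` and `is_hairpin`, the pairing rules, the minimum loop length and the `min_pairs` threshold are illustrative assumptions, not the criteria used in the Ensembl pipeline.

```python
# Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> str:
    """Dot-bracket structure maximising base pairs (Nussinov algorithm, not MFE).

    Every hairpin loop keeps at least `min_loop` unpaired bases.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA input by reading T as U
    n = len(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j] = max pairs within seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: base j left unpaired
            for k in range(i, j - min_loop):  # case: base j paired with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    # Traceback: recover one structure that achieves the maximum.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + dp[k + 1][j - 1] + 1:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


def is_hairpin(structure: str, min_pairs: int) -> bool:
    """True if the structure is one unbranched stem-loop with enough pairs.

    Every '(' must come before every ')', so bulges and internal loops are
    allowed but multibranch structures are rejected.
    """
    if structure.count("(") < min_pairs:
        return False
    return structure.rfind("(") < structure.find(")")


print(nussinov("GGGGAAAACCCC"))  # a single 4 bp stem closing a 4 nt loop
```

For a toy sequence such as `GGGGAAAACCCC` the predicted structure is a single stem-loop, so `is_hairpin(..., min_pairs=4)` accepts it, whereas two such hairpins in tandem are rejected as branched. A real miRNA screen would instead judge stability from the free energy that RNAfold reports.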

tRNAs are annotated as part of the raw compute process using tRNAscan-SE.