Annotation of Non-Coding RNAs
Non-coding RNA Overview
Non-coding RNAs (ncRNAs) take part in many biological processes, including translation, splicing and gene regulation, and are increasingly seen as important. As with proteins, it is the overall structure of the molecule that imparts function. However, whereas similar protein structures are usually reflected in a conserved amino acid sequence, RNA secondary structure can be conserved even when the underlying sequence is highly variable, because compensatory changes on both sides of a base pair preserve the pairing. This makes ncRNAs difficult to detect from sequence alone.
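To illustrate why sequence alone is a weak signal, here is a minimal sketch that checks whether two sequences are each compatible with the same dot-bracket hairpin, counting Watson-Crick and G-U wobble pairs. The two sequences are toy examples invented for illustration, not taken from RFAM, and the check covers base-pairing compatibility only, not thermodynamic stability.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_table(structure: str) -> dict[int, int]:
    """Map each paired position in a balanced dot-bracket string to its partner."""
    stack, partners = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partners[i], partners[j] = j, i
    return partners


def compatible(seq: str, structure: str) -> bool:
    """True if every base pair in the structure is a canonical or G-U pair."""
    partners = pair_table(structure)
    return all((seq[i], seq[j]) in PAIRS for i, j in partners.items() if i < j)


def identity(a: str, b: str) -> float:
    """Fraction of aligned (equal-length) positions carrying the same nucleotide."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical toy hairpin: 4-bp stem closing a 4-nt loop.
structure = "((((....))))"
seq_a = "GGACUUCGGUCC"
seq_b = "CUGAUUCGUCAG"  # every stem base differs from seq_a, loop is identical

for s in (seq_a, seq_b):
    print(s, compatible(s, structure))
print(f"identity: {identity(seq_a, seq_b):.2f}")  # identity: 0.33
```

Both sequences form the same stem-loop even though only the loop bases match, which is the kind of conservation a plain sequence comparison tends to miss.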
Because of this, we use a combination of techniques to detect ncRNAs. First, sensitive BLAST searches identify likely candidates; a covariance model search then assesses whether each candidate can fold into the structure characteristic of its RNA family. Other ncRNAs, such as tRNAs, are added as part of the raw compute stage.
The following non-coding RNA gene types are annotated, along with pseudogenes:
- tRNA: transfer RNA
- Mt-tRNA: transfer RNA located in the mitochondrial genome
- rRNA: ribosomal RNA
- scRNA: small cytoplasmic RNA
- snRNA: small nuclear RNA
- snoRNA: small nucleolar RNA
- miRNA: microRNA precursors
- misc_RNA: miscellaneous other RNA
Annotation Details
Most ncRNAs are annotated by aligning genomic sequence against RFAM using BLASTN. The BLAST hits are clustered and filtered by E-value, and the remaining clusters are used to seed Infernal searches of each locus with the corresponding RFAM covariance model. This restricts the search space: scanning the entire genome with every RFAM covariance model would be extremely CPU-intensive. The resulting BLAST hits are then used as supporting evidence for the ncRNA genes.
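The clustering and seeding step can be sketched as follows. This is a minimal illustration, not the pipeline's code: the `Hit` record, the E-value cutoff of 1e-3 and the 50 bp padding are hypothetical choices, and this sketch filters individual hits before clustering, whereas the real pipeline may order these steps differently. Hits are grouped per sequence region, strand and RFAM family so that each padded window would be searched only with that family's covariance model (for example with Infernal's `cmsearch`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical BLASTN hit record; field names are illustrative."""
    seq_region: str  # chromosome or scaffold name
    strand: int      # +1 or -1
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    family: str      # RFAM family of the query sequence
    evalue: float


def seed_regions(hits, max_evalue=1e-3, pad=50):
    """Drop weak hits, merge overlapping hits per (region, strand, family),
    and return padded windows to scan with the family's covariance model."""
    kept = sorted(
        (h for h in hits if h.evalue <= max_evalue),
        key=lambda h: (h.seq_region, h.strand, h.family, h.start),
    )
    clusters = []  # each entry: [key, cluster_start, cluster_end]
    for h in kept:
        key = (h.seq_region, h.strand, h.family)
        if clusters and clusters[-1][0] == key and h.start <= clusters[-1][2]:
            # Overlaps the current cluster: extend its end coordinate.
            clusters[-1][2] = max(clusters[-1][2], h.end)
        else:
            clusters.append([key, h.start, h.end])
    # Pad each cluster; clipping to the region's true length is omitted here.
    return [
        (seq_region, strand, family, max(1, start - pad), end + pad)
        for (seq_region, strand, family), start, end in clusters
    ]


# Made-up example hits (coordinates and E-values are invented).
hits = [
    Hit("1", 1, 1000, 1100, "RF00001", 1e-10),
    Hit("1", 1, 1050, 1150, "RF00001", 1e-5),
    Hit("1", 1, 5000, 5100, "RF00001", 0.5),  # fails the E-value cutoff
    Hit("1", -1, 1000, 1100, "RF00001", 1e-8),
]
for window in seed_regions(hits):
    print(window)
```

Only the padded windows, rather than the whole genome, would then be scanned with the covariance model, which is where the CPU saving comes from.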
miRNAs are predicted by BLASTN searches of genomic sequence slices against miRBase sequences. The BLAST hits are clustered and filtered by E-value, and the aligned genomic sequence is then checked for potential secondary structure using RNAfold. If the genomic sequence appears able to form a stable hairpin, the locus is used to create a miRNA gene model, and the BLAST hit is used as supporting evidence for the miRNA gene.
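RNAfold predicts a minimum free energy structure using a nearest-neighbour energy model. The sketch below swaps in a much cruder technique, Nussinov base-pair maximization, purely to illustrate the "can this slice fold into a hairpin?" test. The `looks_like_hairpin` rule (a single unbranched stem-loop with at least `min_pairs` pairs) and its default threshold are hypothetical and are not the pipeline's actual criteria.

```python
# Canonical Watson-Crick pairs plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # minimum number of unpaired bases enclosed by any pair


def fold_pairs(seq: str) -> str:
    """Nussinov base-pair maximization; returns one optimal dot-bracket string."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j] = max pairs within seq[i..j]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent halves
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    structure = ["."] * n

    def trace(i: int, j: int) -> None:
        """Recover one optimal structure by retracing the recurrence."""
        if j - i <= MIN_LOOP:
            return
        if (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            trace(i + 1, j - 1)
        elif dp[i][j] == dp[i + 1][j]:
            trace(i + 1, j)
        elif dp[i][j] == dp[i][j - 1]:
            trace(i, j - 1)
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    trace(i, k)
                    trace(k + 1, j)
                    return

    trace(0, n - 1)
    return "".join(structure)


def looks_like_hairpin(seq: str, min_pairs: int = 18) -> bool:
    """Hypothetical rule: one stem-loop (bulges allowed, no branching)
    with at least min_pairs base pairs."""
    brackets = fold_pairs(seq).replace(".", "")
    stem = len(brackets) // 2
    return stem >= min_pairs and brackets == "(" * stem + ")" * stem


print(fold_pairs("GGGGGAAAACCCCC"))  # (((((....)))))
print(looks_like_hairpin("GGGGGAAAACCCCC", min_pairs=5))  # True
print(looks_like_hairpin("GGGGAAAACCCCGGGGAAAACCCC", min_pairs=5))  # False: two stem-loops
```

Nussinov runs in O(n³) time, which is acceptable for precursor-length slices of around a hundred nucleotides, but because it ignores stacking energies it can accept structures that RNAfold would not consider stable.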
tRNAs are annotated as part of the raw compute process using tRNAscan-SE.