Annotation of Ig/TcR Segment Genes
Ensembl Ig segment genes are annotations of the immunoglobulin (Ig) and T-cell receptor (TcR) gene segments present in germline genomic DNA. During B-cell and T-cell maturation, these segments are rearranged and joined by V-D-J recombination to form functional immunoglobulin and T-cell receptor genes.
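As a rough illustration of the joining step, the Python sketch below concatenates one V, an optional D and one J segment from a hypothetical toy locus. `GERMLINE`, `recombine` and `all_rearrangements` are illustrative names, and the sequences are made up. The sketch deliberately ignores junctional trimming and N/P-nucleotide addition. It also leaves out the C segment, which is joined to the rearranged V-D-J exon by RNA splicing rather than by DNA recombination.

```python
from itertools import product
from typing import Dict, List, Optional

# Hypothetical toy germline locus with made-up sequences; real loci carry
# many more segments and much longer sequences.
GERMLINE: Dict[str, Dict[str, str]] = {
    "V": {"V1": "ATGGCT", "V2": "ATGAGC"},
    "D": {"D1": "GGTA", "D2": "TTAC"},
    "J": {"J1": "TGGGGC"},
}


def recombine(v: str, j: str, d: Optional[str] = None) -> str:
    """Join one V, an optional D and one J segment in germline order (V-D-J).

    Simplification: the intervening DNA is simply dropped, and the junctional
    trimming and N/P-nucleotide addition that make real junctions variable
    are ignored.
    """
    d_seq = GERMLINE["D"][d] if d else ""
    return GERMLINE["V"][v] + d_seq + GERMLINE["J"][j]


def all_rearrangements(use_d: bool = True) -> List[str]:
    """Enumerate every V-(D)-J combination the toy locus can produce."""
    d_choices: List[Optional[str]] = list(GERMLINE["D"]) if use_d else [None]
    return [
        recombine(v, j, d)
        for v, d, j in product(GERMLINE["V"], d_choices, GERMLINE["J"])
    ]
```

Even this tiny locus shows the combinatorial effect. Two V, two D and one J segment give four distinct V-D-J joins, or two V-J joins when no D segment is used, as in light chains.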
The protein and nucleotide databases contain many sequences of mature immunoglobulin and T-cell receptor genes. Inferring gene structures by aligning these sequences to the genome with the standard Ensembl genebuild pipeline gives unsatisfactory results, for two main reasons:
- The non-coding DNA that separates gene segments joined by V-D-J recombination is not an intron. Segment boundaries are marked by recombination signal sequences rather than by splice-site signals, so spliced alignment programs such as GeneWise would need retuning to predict these gene structures correctly.
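- Some gene segments themselves comprise more than one exon (a V gene, for example, has a separate upstream leader exon), so the genomic alignment of a mature sequence mixes true introns with recombination junctions. Delineating such a "spliced" alignment into its constituent segments is therefore not straightforward.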
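To make the difference in boundary signals concrete, here is a minimal Python sketch that scans one strand for a recombination signal sequence. Such a signal consists of a heptamer, then a 12- or 23-bp spacer, then a nonamer. The consensus sequences CACAGTG and ACAAAAACC are the commonly cited ones. The mismatch thresholds and function names are illustrative assumptions. This is neither how GeneWise models signals nor how Ensembl annotates these genes; it only shows the kind of motif that a GT...AG splice-site model has no term for.

```python
from typing import List, Tuple

HEPTAMER = "CACAGTG"   # consensus RSS heptamer
NONAMER = "ACAAAAACC"  # consensus RSS nonamer
SPACERS = (12, 23)     # allowed spacer lengths (the "12/23 rule")


def mismatches(a: str, b: str) -> int:
    """Count position-wise differences between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_rss(seq: str, max_hept_mm: int = 1,
             max_non_mm: int = 2) -> List[Tuple[int, int]]:
    """Return (heptamer_start, spacer_length) for each heptamer-spacer-nonamer
    match on the given strand. Thresholds are illustrative assumptions."""
    seq = seq.upper()
    hits: List[Tuple[int, int]] = []
    for i in range(len(seq) - len(HEPTAMER) + 1):
        if mismatches(seq[i:i + len(HEPTAMER)], HEPTAMER) > max_hept_mm:
            continue
        for spacer in SPACERS:
            n_start = i + len(HEPTAMER) + spacer
            nonamer = seq[n_start:n_start + len(NONAMER)]
            # Skip candidates whose nonamer would run off the end of seq.
            if len(nonamer) == len(NONAMER) and mismatches(nonamer, NONAMER) <= max_non_mm:
                hits.append((i, spacer))
    return hits
```

The scan reports only the heptamer-spacer-nonamer orientation that lies downstream of V segments. The signal upstream of J segments is the reverse complement and would need a second scan.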
For this reason, immunoglobulin and T-cell receptor gene segments are annotated differently from other Ensembl genes. Ensembl makes use of the IMGT database, which provides annotation of individual gene segments on reference sequences. Ensembl extracts the sequence of each annotated segment and aligns it to the genome using the Exonerate alignment tool.
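The extraction-and-alignment step can be sketched in Python as two pieces: building an Exonerate command line and parsing the "vulgar" summary lines it prints. The source does not say which Exonerate model or options Ensembl used, so the `est2genome` model, `--bestn 1` and the helper names below are assumptions. The parser follows Exonerate's vulgar layout: query ID, start, end and strand, then target ID, start, end and strand, then the score, then a series of label/query-length/target-length triplets.

```python
from dataclasses import dataclass
from typing import List, Tuple


def exonerate_command(query_fa: str, target_fa: str,
                      model: str = "est2genome") -> List[str]:
    """Build an Exonerate command line.

    The model and options are assumptions, not Ensembl's documented settings.
    """
    return [
        "exonerate", "--model", model, "--bestn", "1",
        "--showalignment", "no", "--showvulgar", "yes",
        query_fa, target_fa,
    ]


@dataclass
class VulgarHit:
    query_id: str
    query_start: int
    query_end: int
    query_strand: str
    target_id: str
    target_start: int
    target_end: int
    target_strand: str
    score: int
    ops: List[Tuple[str, int, int]]  # (label, query_length, target_length)


def parse_vulgar(line: str) -> VulgarHit:
    """Parse one 'vulgar:' line into its fixed fields and operation triplets."""
    fields = line.split()
    if len(fields) < 10 or fields[0] != "vulgar:":
        raise ValueError("not a vulgar line")
    f = fields[1:]
    raw = f[9:]
    if len(raw) % 3:
        raise ValueError("malformed operation triplets")
    ops = [(raw[i], int(raw[i + 1]), int(raw[i + 2])) for i in range(0, len(raw), 3)]
    return VulgarHit(f[0], int(f[1]), int(f[2]), f[3], f[4],
                     int(f[5]), int(f[6]), f[7], int(f[8]), ops)


def intron_count(hit: VulgarHit) -> int:
    """Count introns; a spliced hit encodes each intron as '5 ... I ... 3'."""
    return sum(1 for label, _, _ in hit.ops if label == "I")
```

Running the command requires a local Exonerate install, for example via `subprocess.run(cmd, capture_output=True, text=True)`. The parser can then be applied to each output line that starts with `vulgar:`.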
To reflect their role in the V-D-J recombination process, segments are annotated as belonging to one of four classes: V (variable), D (diversity), J (joining) and C (constant) segments. At present, Ensembl Ig gene annotations are available only for human and mouse.
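IMGT gene names encode the locus (IGH, IGK, IGL, TRA, TRB, TRG, TRD) followed by the segment type, which suggests a simple way to assign these classes. The Python sketch below is a simplified, hypothetical classifier built on that naming convention; it is not Ensembl's procedure. It has to special-case the heavy-chain constant genes (IGHM, IGHD, IGHG, IGHA, IGHE). In particular, `IGHD` on its own names the delta constant gene, while names such as `IGHD1-1` are D segments.

```python
from enum import Enum


class SegmentClass(Enum):
    V = "V segment"
    D = "D segment"
    J = "J segment"
    C = "C segment"


LOCI = {"IGH", "IGK", "IGL", "TRA", "TRB", "TRG", "TRD"}
# Heavy-chain constant genes: mu, gamma, alpha, epsilon (delta is handled separately).
IGH_CONSTANT_PREFIXES = ("IGHM", "IGHG", "IGHA", "IGHE")


def classify(gene_name: str) -> SegmentClass:
    """Assign a segment class from an IMGT-style gene name (simplified heuristic)."""
    name = gene_name.split("*")[0].upper()  # drop an allele suffix such as "*01"
    locus, letter = name[:3], name[3:4]
    if locus not in LOCI or not letter:
        raise ValueError(f"unrecognised gene name: {gene_name}")
    if name == "IGHD":  # bare IGHD is the delta constant gene, not a D segment
        return SegmentClass.C
    if name.startswith(IGH_CONSTANT_PREFIXES):
        return SegmentClass.C
    if letter in ("V", "D", "J", "C"):
        return SegmentClass[letter]
    raise ValueError(f"unrecognised gene name: {gene_name}")
```

For example, `classify("IGHV1-2*01")` returns `SegmentClass.V`, and `classify("IGHD")` returns `SegmentClass.C`.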
References
- IMGT, the international ImMunoGeneTics information system. Nucl. Acids Res. 2005, 33:D593-D597.
- Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 2005, 6:31.