Does Ensembl have promoters or regulatory regions?

Canonical promoters are not available in Ensembl. However, you can find promoter-associated Regulatory Features for human, which have been generated by the Ensembl Regulatory Build. These are displayed in the browser via the Regulatory Features track, or can be accessed from the functional genomics database. Other regulatory data are also available, such as CTCF binding sites and motifs from the cisRED database.
These can be found in the Region in detail and Gene Regulation views; the help for these views gives more information.

How do I convert IDs? I have ENSG... IDs and I would like HGNC symbols and EntrezGene IDs along with matching Affymetrix platform HC G110 probes.

This can be done using BioMart. We outline the protocol using Ensembl genes ENSG00000162367 and ENSG00000187048. We will enter the list of genes and export IDs from multiple databases.

Database: Ensembl genes

Dataset: Homo sapiens genes

Filters: in the GENE section, use the ID list limit box: select Ensembl Gene ID(s) as the header and enter the gene IDs.

Attributes: under EXTERNAL: External References, select HGNC symbol and EntrezGene ID. Scroll down to EXTERNAL: Microarray Attributes and select Affy HC G110.

Click Results at the top.

See BioMart FAQs for more.
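The same selection can also be written as a BioMart XML query, which is handy for repeating the export. The sketch below only builds the query text for the two example genes. The internal dataset, filter and attribute names (hsapiens_gene_ensembl, ensembl_gene_id, hgnc_symbol, entrezgene_id, affy_hc_g110) are assumptions that may differ between releases, so check them in the BioMart interface before use. Sending the query to a BioMart service is not shown.

```python
import xml.etree.ElementTree as ET

# Assumed BioMart internal names (not given in this FAQ); verify them in the
# BioMart interface, as they can change between releases.
DATASET = "hsapiens_gene_ensembl"
FILTER_NAME = "ensembl_gene_id"
ATTRIBUTES = ["ensembl_gene_id", "hgnc_symbol", "entrezgene_id", "affy_hc_g110"]


def build_biomart_query(gene_ids):
    """Return a BioMart XML query string for the given Ensembl gene IDs."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    dataset = ET.SubElement(query, "Dataset",
                            {"name": DATASET, "interface": "default"})
    # One filter holding the comma-separated ID list (the ID list limit box).
    ET.SubElement(dataset, "Filter",
                  {"name": FILTER_NAME, "value": ",".join(gene_ids)})
    # One Attribute element per column to export.
    for name in ATTRIBUTES:
        ET.SubElement(dataset, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


query_xml = build_biomart_query(["ENSG00000162367", "ENSG00000187048"])
print(query_xml)
```

Each Attribute element corresponds to one column ticked in the Attributes panel, and the Filter element plays the role of the ID list limit box.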

Can I view exons, introns, and flanking sequence for a transcript?

Yes! For a colour-coded sequence (previously known as ExonView) click on any transcript. From the transcript tab, click on the Exons link at the left. Click on the Configure this page link at the left to customise your view. Alternatively, use BioMart for sequence export.

I think my gene is wrongly annotated, or missing transcripts.

Ensembl determines genes using automatic annotation, which combines computational and biological expertise to produce an entire gene set. This is the Ensembl genebuild. Initial alignment of proteins/mRNAs leads to our transcript set, so all genes in Ensembl link back to protein/mRNA evidence, termed the supporting evidence. For an example see this page. Transcript information must be present in public biological databases such as EMBL-Bank, UniProt and NCBI RefSeq in order to be used to determine Ensembl genes. Click on External References from a gene page or General identifiers from a transcript page to see matching sequences across databases. Consider submitting your sequences to EMBL-Bank. For more transcripts, turn on Vega/Havana genes in the Region in Detail page. Please report any confusing gene annotation to our helpdesk.

How can I export data?

Export individual sequences or alignments using the Export link at the left of the gene, transcript, or location page. Alternatively, export in batch using BioMart. Perl programmers can use our API to access all Ensembl data. See here for more.

What is the difference between Ensembl, Havana and Merged transcripts? And what does known and novel mean?

For human and mouse, Ensembl shows not only transcripts that are annotated automatically using the Ensembl genebuild pipeline, but also transcripts that are manually annotated by the Havana team. If the Ensembl and Havana annotations agree with each other, the transcripts are combined into a Merged transcript. When a transcript is annotated only by Ensembl or only by Havana, it is named an Ensembl or Havana transcript, respectively. Transcripts that match a species-specific entry in the UniProtKB/Swiss-Prot or RefSeq databases are categorised as known, those that do not as novel. For more detailed information, please have a look at our genebuild documentation.
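The two labelling rules above can be summarised as a small sketch. The function and argument names are hypothetical and chosen for illustration only; they are not part of the Ensembl API.

```python
def transcript_source(annotated_by_ensembl, annotated_by_havana):
    """Label a transcript by the annotation(s) that produced it.

    Both flags being True stands for the case where the Ensembl and
    Havana annotations agree on the transcript.
    """
    if annotated_by_ensembl and annotated_by_havana:
        return "Merged"
    if annotated_by_ensembl:
        return "Ensembl"
    if annotated_by_havana:
        return "Havana"
    raise ValueError("a transcript needs at least one annotation source")


def transcript_status(matches_species_specific_entry):
    """Return 'known' if the transcript matches a species-specific
    UniProtKB/Swiss-Prot or RefSeq entry, otherwise 'novel'."""
    return "known" if matches_species_specific_entry else "novel"


print(transcript_source(True, True), transcript_status(False))
```

The source label and the known/novel status are independent of each other, which is why they are kept as two separate functions.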

How do I view and order clones?

About clones

Ensembl does not have clones for sale; however, BAC clones can be displayed on our browser. BAC clone numbers can be obtained by looking up accession numbers in the EMBL database. Clicking on a clone from the 'Region in Detail' view should provide this number. To turn on the clone tracks, click on Configure this page at the left and select Misc. regions for human, or External data for other species. Select one or more sets of clones, then click SAVE and close.

The international BAC clone nomenclature is described here.

Ordering clones

Try the clone registry for ordering clones. Individual libraries can be found here. Clones can also be ordered from imaGenes, C.H.O.R.I., and Geneservice.

bMQ clones for mouse

See this contact page for the bMQ library 129/AB2.2 BACs. Some BAC end sequences have been deposited in the Trace Server.

Where is the MICER resource for mouse?

The MICER clone set is available from any mouse location tab, in the region in detail. Turn on the MICER track as follows: click on Configure this page at the left of the region in detail view. Click on External data in the left menu of the panel. Turn on the DAS track named MICER clones. SAVE and close the panel. The region in detail view should now reload with the new track displayed.

Where are older or archive sites?

Click on the View in archive site link at the bottom of any page. Or, go to

How do I see multi-species comparisons?

Click on the Genomic alignments link from any gene page to view whole genome alignments for that region. Links at the left for gene trees, orthologues, paralogues and protein families offer sequence alignments at the gene level. Graphical views of the alignments, formerly known as the multicontigview and alignsliceview pages, are coming soon. These views are still available from our archive sites.

I have a list of old Ensembl IDs from a previous release. How can I find their IDs in the current version?

The gene IDs might be the same in the current version. Search for the gene ID in the browser, or in BioMart. A gene ID can change if the gene structure changes dramatically, for example if a gene is split into two or two genes are merged into one. Coming soon: our ID Tracker, a quick way to batch-convert Ensembl stable IDs.

Or, view our older, archive sites.

I am looking for a clone that contains my gene or region of interest

The 'Region in Detail' view displays clones along the chromosome, along with genes. Turn on clones from various clonesets in the Control Panel. Look for more information by clicking on a clone in the display. Note: Ensembl does not sell clones, only displays positional information.

How does Ensembl determine homology relationships?

For detailed documentation about the homology prediction pipeline, have a look at this article. Orthologues and paralogues are listed in the gene tab or viewable in the gene trees. Click on any node in the tree to export an alignment.

Trees are downloadable from our ftp site.

Please see the following reference for more: Vilella et al., EnsemblCompara GeneTrees: analysis of complete, duplication aware phylogenetic trees in vertebrates.

How do I get alignments of homologous proteins? Can I get the CDS (coding sequence) alignments as well?

Yes, both can be obtained: see this example script.

How can I obtain the conserved elements calculated across multiple species?

These are the constrained elements. See this example script to obtain them using the ensembl-compara API.

Can I view syntenic regions in Ensembl?

Click on the 'Synteny' link from any Region tab in Ensembl to view conserved blocks of sequences, for example here. Syntenic regions are calculated from the pairwise alignments.

I would like a list of homologues to my gene. Should I look at the gene trees or the families?

Although there is overlap, the EnsemblCompara MCL Families and Gene Trees are two different, complementary data sets.

To construct the Gene Trees, only the longest translation of each gene is included, and only species represented in Ensembl are used. However, the methodology has been specifically constructed to find homology relationships.
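As an illustration of the "longest translation of each gene" rule, here is a minimal sketch. The input layout (tuples of gene ID, translation ID and length) and the tie-breaking behaviour are assumptions made for the example, and the identifiers are made up.

```python
def longest_translation_per_gene(translations):
    """Map each gene ID to the ID of its longest translation.

    `translations` is an iterable of (gene_id, translation_id, length)
    tuples. On a tie, the translation seen first is kept (an assumption).
    """
    best = {}
    for gene_id, translation_id, length in translations:
        if gene_id not in best or length > best[gene_id][1]:
            best[gene_id] = (translation_id, length)
    return {gene_id: tid for gene_id, (tid, _) in best.items()}


# Made-up identifiers and lengths, for illustration only.
example = [
    ("geneA", "pepA1", 310),
    ("geneA", "pepA2", 452),
    ("geneB", "pepB1", 120),
]
print(longest_translation_per_gene(example))
```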

The families include all Ensembl transcripts plus the Uniprot/Swissprot and Uniprot/SPTREMBL peptides for all the metazoans, which doubles the total number of peptides compared with those represented in the gene trees. These families are clustered using a Markov Clustering method, MCL.

You can view both using the gene tree, orthologues or paralogues, or protein family links from the Gene tab in the browser, or access both using the Compara-API.

BioMart can be used to export homologues calculated from the gene trees.

What are the blast and MCL options used to determine the EnsemblCompara MCL Families?

The families are calculated with the following parameters.

For v50 and later versions, the blastall options are:

blastall -d $fastadb -i $qy_file -p blastp -e 0.00001 -v 250 -b 0

For the MCL clustering, the parameters are:

-I 2.1 -tf 'gq(50)' -scheme 6

For v48 and earlier versions, the MCL parameters were:

-I 2.1 -P 10000 -S 1000 -R 1260 -pct 90
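To give a feel for what the inflation value (-I 2.1) controls, here is a toy, pure-Python sketch of Markov clustering on a small similarity matrix. It is not the mcl program and does not model the -tf, -scheme or pruning options listed above. The function names, the added self-loops, the convergence tolerance and the toy graph are all assumptions made for illustration.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(matrix):
    """Expansion step: multiply the matrix by itself."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def inflate(matrix, inflation):
    """Inflation step: raise entries to a power, then renormalise columns."""
    return normalize_columns([[x ** inflation for x in row] for row in matrix])


def mcl(similarity, inflation=2.1, max_iterations=100, tolerance=1e-9):
    """Cluster nodes of a non-negative, symmetric similarity matrix."""
    n = len(similarity)
    # Self-loops are added before normalising (a common MCL convention).
    matrix = normalize_columns(
        [[float(similarity[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)])
    for _ in range(max_iterations):
        updated = inflate(expand(matrix), inflation)
        change = max(abs(updated[i][j] - matrix[i][j])
                     for i in range(n) for j in range(n))
        matrix = updated
        if change < tolerance:
            break
    # Each row that still carries weight defines a cluster of its columns.
    clusters = set()
    for row in matrix:
        members = frozenset(j for j, value in enumerate(row) if value > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
toy_graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(toy_graph, inflation=2.1)
print(clusters)
```

Higher inflation values generally sharpen the contrast between strong and weak connections and so tend to give more, smaller clusters; lower values give fewer, larger ones.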

I am looking for MeDIP data. These are the human, tissue-specific DNA methylation profiles discussed in Genome Research [Rakyan et al., Sept 2008].

From any human location tab, a region in detail view such as this example can display this information. To turn on the MeDIP tracks, select Configure this page from the left-hand menu. Select Functional genomics and switch on one or more MeDIP tracks. Click SAVE and close. The region in detail view should reload with the new tracks added.

If you have any other questions about Ensembl, please do not hesitate to contact our HelpDesk. You may also like to subscribe to the developers' mailing list.