What's New in Release 54

New assemblies and genebuilds

New zebrafish assembly (Zebrafish)

Ensembl Zebrafish now uses the latest assembly, Zv8, from the Zebrafish Sequencing Project at the Sanger Institute. This incorporates a WGS assembly from a single zebrafish, which has made it possible to remove some haplotypic data, and an improved clone path order. More information on the assembly is available from the project website.

Data updates

Variation updates (all species)

There is a new variation database for zebrafish, as this is a new assembly. Old data has been mapped to the new assembly.

Zebra finch (Taeniopygia guttata), introduced to Ensembl in release 53, now has a variation database. SNPs were called using the ssaha2/pileup method on 454 sequencing reads deposited in the Trace Archive.

Mouse consequence types have been recalculated to pick up regulatory features from the new mouse regulatory build.

Species-wide changes

  • The source and sample tables have been altered to add description/display columns for web display.
  • Variations that have no mappings in the variation_feature table have had their validation_status set to 'failed' and have been removed from the tables that contain them.
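
The second change above can be pictured as a query for variations with no row in variation_feature. The following is a minimal sketch using an in-memory SQLite database, not the script used for the release: the two toy tables carry only the columns needed for the illustration, the function name and example rows are hypothetical, and the subsequent removal step is not shown.

```python
import sqlite3


def flag_unmapped_variations(conn):
    """Set validation_status to 'failed' for variations that have no
    row in variation_feature, and return their ids (hypothetical helper)."""
    rows = conn.execute(
        "SELECT v.variation_id FROM variation v "
        "LEFT JOIN variation_feature vf ON vf.variation_id = v.variation_id "
        "WHERE vf.variation_id IS NULL "
        "ORDER BY v.variation_id"
    ).fetchall()
    ids = [row[0] for row in rows]
    conn.executemany(
        "UPDATE variation SET validation_status = 'failed' WHERE variation_id = ?",
        [(variation_id,) for variation_id in ids],
    )
    return ids


# Toy schema and data, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variation (
    variation_id INTEGER PRIMARY KEY,
    name TEXT,
    validation_status TEXT
);
CREATE TABLE variation_feature (
    variation_feature_id INTEGER PRIMARY KEY,
    variation_id INTEGER
);
INSERT INTO variation VALUES (1, 'example_mapped', NULL);
INSERT INTO variation VALUES (2, 'example_unmapped', NULL);
INSERT INTO variation_feature VALUES (10, 1);
""")

failed_ids = flag_unmapped_variations(conn)
```

Here only the variation without a variation_feature row is flagged; the mapped one is left untouched.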

Ensembl-vega updates (Mouse)

There are new ensembl-vega datasets in both human and mouse.

Gene patch for horse (Horse)

Some split genes in Horse have been eliminated using evidence from human-mouse 1:1 homologues.

Homologous genes in human and mouse were aligned to the horse genome using Exonerate. Alignments were compared to the set of predicted genes in horse to patch the horse gene set. Horse genes which have only been partially predicted were extended by additional exons. Single genes which were mis-annotated as two distinct neighbouring genes were merged. Missing homologues in horse were also recovered.

New ncRNAs for low-coverage species (multiple species)

A selection of low-coverage species have had their ncRNAs computed, including sloth, armadillo, kangaroo rat, elephant, hyrax, megabat, tarsier, dolphin and alpaca.

Mouse DNAse1 data (Mouse)

Ensembl Mouse has a new set of DNAse1 Hypersensitivity sites.

eFG Array mapping (Fly, Human, Mouse)

Ensembl Human, Mouse and Fly include new array mapping data utilising the new eFG array mapping environment. These include both genomic and transcript mapping for the following formats where available:

  • AFFY IVT/UTR
  • AFFY ST
  • ILLUMINA WG (Whole Genome)

Zebrafish Array Mapping (Zebrafish)

Array mapping has been done for the new Zebrafish assembly, including:

  • AFFY (Genome + Transcript Xrefs)
  • LEIDEN (Genome Only)
  • AGILENT (Genome Only)

Mart Builds (all species)

Ensembl mart 54 includes:

  • New assembly for Zebrafish (Zv8) plus corrected assembly names for various other species.
  • Corrected Zebrafish attribute Agilent g2519f internal naming error reported by BioConductor user
  • Removed the % Identity and % Coverage from the Homolog attributes as advised by Compara. These are attributes from the WUBLASTP step in the homology inference pipeline, but they are not measures of homology confidence.
  • "Source description" has been added to the Variations attribute section.

In addition, Vega mart 35 is a new mart build, and Ensembl variation mart 54 now includes a new species, Zebra finch.

Gene name and xref projections (all species)

Gene names and GO terms have been projected between species based on Compara homologies.

Amazon public datasets (all species)

Amazon public datasets have been updated with the new e54 data.

cDNA update of human and mouse (all species)

The regular per-release updates of human and mouse cDNAs are included in this release.

Homologies and families (all species)

  • 49-way GeneTrees and Homologies have been created, with new/updated genebuilds and assemblies:
    • Clustering by hcluster_sg
    • Multiple Sequence Alignments with consistency-based MCoffee meta-aligner (mafftgins+muscle+kalign+probcons)
    • Pairwise gene-based dN/dS calculations for high coverage species pairs
  • MCL families have been updated, including all Ensembl transcript isoforms and newest Uniprot Metazoa:
    • Clustering by MCL
    • Multiple Sequence Alignments with MAFFT

Stable IDs have been created for both sets.

Pairwise alignments (all species)

We have updated the pairwise alignments for zebrafish (Danio rerio):

  • human-zebrafish translated BLAT-NET
  • mouse-zebrafish translated BLAT-NET
  • rat-zebrafish translated BLAT-NET
  • chicken-zebrafish translated BLAT-NET
  • frog-zebrafish translated BLAT-NET
  • tetraodon-zebrafish translated BLAT-NET
  • fugu-zebrafish translated BLAT-NET
  • medaka-zebrafish translated BLAT-NET
  • stickleback-zebrafish translated BLAT-NET
  • Ciona savignyi-zebrafish translated BLAT-NET
  • Ciona intestinalis-zebrafish translated BLAT-NET

We have also added new alignments for medaka:

  • human-medaka BLASTZ-NET (imported from UCSC)
  • mouse-medaka BLASTZ-NET (imported from UCSC)

Compara dumps (all species)

EMF dumps are available for gene trees and EPO and PECAN multiple alignments, and BED files for 31-way and 12-way GERP constrained elements.

Web features

New compara views (all species)

Two new comparative genomics views have been added to Ensembl 54: a text-based Genomic Alignments view, and a "marked-up sequence" view of Phylogenetic Context on variation data. Below are some sample links to these views in Human:

Genomic Alignments, showing a constrained element

To access this view:

  1. from Location View, click on "Genomic Alignments" (left-hand menu)
  2. from Compara alignments and constrained elements, click on the feature and follow the "View alignments" link.

Context for variation features:

  1. Example 1 - simple case
  2. Example 2 - a case where dbSNP's ancestral allele seems wrong
  3. Example 3 - variation across species

To access this page, from Variation View click on "Phylogenetic Context" (left-hand menu)

User annotation of genes and transcripts (all species)

As part of our process of reinstating features that were omitted from preliminary releases of the new web code, we are pleased to include user annotation of both genes and transcripts in release 54. Gene annotations created prior to the site makeover should now be visible in your user account, and will be shown in the 'Gene' section of the appropriate species - see the 'Personal annotation' link in the left-hand menu.

API and schema changes

Mouse regulatory build (Mouse)

We would like to announce the release of the first mouse "Regulatory Build", expanding our species coverage from human. Generated using the Ensembl functional genomics regulatory build pipeline, these RegulatoryFeatures provide a set of "best guess" regulatory elements, integrating both published and pre-publication data sets. A "focus" set of DNase1 Hypersensitivity sites from embryonic stem cells was produced by a collaboration between Ensembl, David Adams (Wellcome Trust Sanger Institute), and Greg Crawford (Duke University). Analysing this focus set in conjunction with supporting data from 4 different cell lines, including 4 specific histone modifications, RNA PolII sites and generalised Histone 3 methylation, provides 140609 regulatory annotations across the mouse genome. For more details see the Regulatory Build page.

Change to default behaviour of TranscriptAdaptor (all species)

When a Transcript object is created using Bio::EnsEMBL::Transcript->new(...), the 'edits_enabled' attribute for the transcript is set to '1' by default, but this was not the case for transcripts created by Bio::EnsEMBL::DBSQL::TranscriptAdaptor. This change sets the 'edits_enabled' attribute to '1' for transcripts created by the TranscriptAdaptor as well. See the existing documentation for edits_enabled() in the Transcript module for further information.

Schema patches (all species)

  1. the meta table has updates to the schema version, and we have also changed the version in Registry.pm. (patch_53_54_a.sql)
  2. the size of the logic_name column in the analysis table is increased to 128 characters to better support the longer logic names required by the user upload functionality. The size of the db_name column in external_db has also increased to 100 characters, and the name column in oligo_probe to 40 characters, to accommodate longer probe names and for consistency with eFG. (patch_53_54_b.sql)
  3. the analysis_id column has been moved from identity_xref to object_xref, so that all object_xrefs can have an associated analysis, not just those that have been assigned by sequence matching. (patch_53_54_c.sql)
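
As an illustration of the third patch, the sketch below moves analysis_id from identity_xref to object_xref on a toy in-memory SQLite database. It is not the content of patch_53_54_c.sql: the tables are reduced to the columns needed here, the join on object_xref_id and the example rows are assumptions made for the example, and dropping the old column is omitted.

```python
import sqlite3

# Toy schema and data, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object_xref (
    object_xref_id INTEGER PRIMARY KEY,
    xref_id INTEGER
);
CREATE TABLE identity_xref (
    object_xref_id INTEGER PRIMARY KEY,
    analysis_id INTEGER
);
INSERT INTO object_xref VALUES (1, 100);
INSERT INTO object_xref VALUES (2, 101);
INSERT INTO identity_xref VALUES (1, 7);
""")


def move_analysis_id(conn):
    """Add analysis_id to object_xref and copy existing values across
    from identity_xref (hypothetical helper)."""
    conn.execute("ALTER TABLE object_xref ADD COLUMN analysis_id INTEGER")
    conn.execute(
        "UPDATE object_xref SET analysis_id = ("
        "  SELECT ix.analysis_id FROM identity_xref ix"
        "  WHERE ix.object_xref_id = object_xref.object_xref_id"
        ")"
    )


move_analysis_id(conn)
rows = conn.execute(
    "SELECT object_xref_id, analysis_id FROM object_xref ORDER BY object_xref_id"
).fetchall()
```

After the move, an object_xref that had no identity_xref row simply has an empty analysis_id, which can then be filled in for xrefs that were not assigned by sequence matching.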

Taxonomy in core meta table (all species)

Entries such as "Homo/Pan/Gorilla group" have been removed from species.classification entries in the meta table, as they are not part of the standard taxonomy hierarchy and interfere with the web code's automated parsing and grouping of Ensembl species into a tree.
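
For code that still has to handle older databases, a filter along the following lines could drop such entries before building a tree. This is only a sketch: the rule of matching names that end in " group" is an assumption based on the single example above, and the classification list is illustrative and incomplete.

```python
def strip_group_entries(classification):
    """Return the classification without informal '... group' entries.

    Assumes (hypothetically) that such entries always end in ' group'.
    """
    return [name for name in classification if not name.endswith(" group")]


# Illustrative, incomplete lineage; not copied from a real meta table.
example = ["Homo", "Homo/Pan/Gorilla group", "Hominidae", "Primates"]
cleaned = strip_group_entries(example)
```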

Translation attribs modified (all species)

Two of the translation_attribs introduced a while ago (starts_met and stop_codon to indicate whether the translation starts with methionine and there is a stop_codon at the end) have been removed from the core databases.
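
Code that relied on these attributes could, as one possible substitute, derive the same information from the sequences themselves. The sketch below assumes the standard genetic code and that the caller already has the peptide and coding sequences as plain strings; the function names are hypothetical and not part of the Ensembl API.

```python
# Stop codons of the standard genetic code (other codes, such as the
# vertebrate mitochondrial code, differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def starts_with_met(peptide):
    """True if the peptide sequence begins with methionine (M)."""
    return peptide.upper().startswith("M")


def ends_with_stop(cds):
    """True if the coding sequence is a whole number of codons and
    its final codon is a stop codon."""
    cds = cds.upper()
    return len(cds) >= 3 and len(cds) % 3 == 0 and cds[-3:] in STOP_CODONS


# Made-up sequences for illustration.
complete = ends_with_stop("ATGGCCTAA")
truncated = ends_with_stop("ATGGCC")
has_met = starts_with_met("MA")
```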

eFG Array Mapping Environment (all species)

A new array-mapping environment is now available, utilising the Ensembl functional genomics DB and API. The new strategy involves both genomic and direct-to-transcript sequence alignments, revised transcript xref rules and broader format support, including:

  • AFFY IVT/UTR
  • AFFY ST
  • ILLUMINA WG
  • CODELINK
  • PHALANX
  • AGILENT

The new environment also includes support for multi-species databases.