
The Ensembl FTP Server

If required, entire databases can be downloaded from our FTP site in a variety of formats, from flat files to MySQL dumps. Please be aware that these files can run to many gigabytes of data.

To facilitate storage and download, all databases are GNU Zip (gzip, *.gz) compressed.
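Because the dumps are gzip-compressed and can be very large, it is usually better to decompress them as a stream than to read them into memory in one go. The sketch below uses only Python's standard `gzip` module. The function name `gunzip_file` and the 1 MiB chunk size are illustrative choices, not part of any Ensembl tooling.

```python
import gzip
from pathlib import Path


def gunzip_file(src: Path, dest: Path, chunk_size: int = 1 << 20) -> int:
    """Decompress a .gz file to dest in fixed-size chunks and return the bytes written.

    Reading chunk by chunk keeps memory use flat even for multi-gigabyte dumps.
    """
    written = 0
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:  # end of the compressed stream
                break
            fout.write(chunk)
            written += len(chunk)
    return written
```

For example, `gunzip_file(Path("dump.fa.gz"), Path("dump.fa"))` (hypothetical file names) has the same effect as running `gunzip -c dump.fa.gz > dump.fa`.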

Please note: Ensembl supports downloading of many correlation tables via the highly customisable BioMart data mining tool. You may find exploring this web-based data mining tool easier than extracting information from our database dumps.

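Species Files
Columns: Species | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | EMF | FUNCGEN | BED. A dash (-) marks a format that is not available for that entry.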
Aedes aegypti (Aedes) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Anolis carolinensis (Anole Lizard) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Anopheles gambiae (Anopheles) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Bos taurus (Cow) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Caenorhabditis elegans (C.elegans) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Canis familiaris (Dog) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Cavia porcellus (Guinea Pig) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Choloepus hoffmanni (Sloth) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Ciona intestinalis (C.intestinalis) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Ciona savignyi (C.savignyi) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Danio rerio (Zebrafish) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Dasypus novemcinctus (Armadillo) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Dipodomys ordii (Kangaroo rat) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Drosophila melanogaster (Fruitfly) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Echinops telfairi (Lesser hedgehog tenrec) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Equus caballus (Horse) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Erinaceus europaeus (Hedgehog) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Felis catus (Cat) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Gallus gallus (Chicken) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Gasterosteus aculeatus (Stickleback) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Gorilla gorilla (Gorilla) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Homo sapiens (Human) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF EMF FUNCGEN -
Loxodonta africana (Elephant) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Macaca mulatta (Macaque) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Microcebus murinus (Mouse Lemur) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Monodelphis domestica (Opossum) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Mus musculus (Mouse) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF EMF FUNCGEN -
Myotis lucifugus (Microbat) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Ochotona princeps (Pika) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Ornithorhynchus anatinus (Platypus) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Oryctolagus cuniculus (Rabbit) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Oryzias latipes (Medaka) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Otolemur garnettii (Bushbaby) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Pan troglodytes (Chimpanzee) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Pongo pygmaeus (Orangutan) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Procavia capensis (Hyrax) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Pteropus vampyrus (Megabat) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Rattus norvegicus (Rat) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF EMF - -
Saccharomyces cerevisiae (S.cerevisiae) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Sorex araneus (Shrew) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Spermophilus tridecemlineatus (Squirrel) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Taeniopygia guttata (Zebra Finch) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Takifugu rubripes (Fugu) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Tarsius syrichta (Tarsier) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Tetraodon nigroviridis (Tetraodon) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Tupaia belangeri (Tree Shrew) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Tursiops truncatus (Dolphin) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Vicugna pacos (Alpaca) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Xenopus tropicalis (X.tropicalis) FASTA (DNA) FASTA (cDNA) FASTA (protein) EMBL GenBank MySQL GTF - - -
Multi-species - - - - - MySQL - EMF - BED
Ensembl Mart - - - - - MySQL - - - -

About the data

The following types of data dumps are available on the FTP site.

FASTA
FASTA sequence databases of Ensembl gene, transcript and protein model predictions. Since the FASTA format does not permit sequence annotation, these database files are mainly intended for use with local sequence similarity search algorithms. Each directory has a README file with a detailed description of the header line format and the file naming conventions.
DNA
Masked and unmasked genome sequences associated with the assembly (contigs, chromosomes etc.).
cDNA
cDNA sequences for Ensembl or ab initio predicted genes.
Peptides
Protein sequences for Ensembl or ab initio predicted genes.
RNA
Non-coding RNA gene predictions.
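The FASTA files can be read directly in their compressed form. The following is a minimal, generic FASTA reader in Python, not an Ensembl tool: it splits the file into records but does not interpret the header fields, whose layout is documented in each directory's README. The `soft_masked_fraction` helper is an illustration that assumes a masked DNA file uses soft masking (repeats written in lowercase); if a file is hard-masked instead, repeats appear as `N` and this helper does not apply.

```python
import gzip
from typing import Iterator, TextIO, Tuple


def open_text(path: str) -> TextIO:
    """Open a file as text, decompressing on the fly if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header is returned without the leading '>'."""
    header = None
    parts = []
    with open_text(path) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(parts)
                header, parts = line[1:], []
            elif header is not None and line:
                parts.append(line)
    # Emit the final record, which has no following '>' line to trigger it.
    if header is not None:
        yield header, "".join(parts)


def soft_masked_fraction(seq: str) -> float:
    """Fraction of bases written in lowercase (assumed to be soft-masked repeats)."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base.islower()) / len(seq)
```

Records are produced one at a time, but each record's full sequence is held in memory, so reading a whole chromosome needs roughly as many bytes as the chromosome has bases.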
Flatfile
Flat files allow more extensive sequence annotation by means of feature tables, and thus contain the genome sequence as annotated by the automated Ensembl genome annotation pipeline. Each nucleotide sequence record in a flat file represents a 1 Mb slice of the genome sequence. Flat files are broken into chunks of 1000 sequence records for easier downloading.
EMBL
Ensembl database dumps in EMBL nucleotide sequence database format.
GenBank
Ensembl database dumps in GenBank nucleotide sequence database format.
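The slicing rules above make it straightforward to estimate how many records and chunk files a sequence produces. The sketch below assumes that "1 Mb" means 1,000,000 bp, that each sequence is sliced independently with a shorter final slice, and that records are then grouped 1000 per file; the actual file naming and chunk boundaries should be checked in the README.

```python
from typing import Tuple

SLICE_BP = 1_000_000       # assumed size of one flat-file record ("1 Mb")
RECORDS_PER_CHUNK = 1000   # sequence records per downloadable chunk


def ceil_div(n: int, d: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return (n + d - 1) // d


def flatfile_counts(length_bp: int) -> Tuple[int, int]:
    """Return (records, chunks) for one sequence of the given length in bp."""
    records = ceil_div(length_bp, SLICE_BP)
    chunks = ceil_div(records, RECORDS_PER_CHUNK)
    return records, chunks
```

For example, a 150,000,000 bp sequence gives `(150, 1)`, while a 1,500,000,001 bp sequence gives `(1501, 2)`, because its final base spills into a 1501st record and pushes the total past 1000 records per chunk.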
MySQL
All Ensembl MySQL databases are available in text format, together with the SQL table definition files. These can be imported into any SQL database for a local installation of a mirror site. Generally, the FTP directory tree contains one directory per database. For more information about these databases and their Application Programming Interfaces (APIs), see the API section.
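Setting up a local copy of one database involves creating it, loading its table definitions, and then importing the tab-separated table data. The sketch below only builds the shell commands so they can be reviewed before anything is run. It assumes a hypothetical directory layout in which `<db_name>.sql.gz` holds the table definitions and each `<table>.txt.gz` holds one table's rows; confirm the real file names against the directory listing. `mysql` and `mysqlimport` are the standard MySQL client programs, and `mysqlimport` takes the target table name from the data file name with its extension removed. Using `--local` may require the server's `local_infile` setting to be enabled.

```python
from typing import List


def import_commands(db_name: str, files: List[str], user: str = "USER") -> List[str]:
    """Build shell commands that load one downloaded database directory into MySQL.

    Hypothetical layout: '<db_name>.sql.gz' holds the table definitions and
    each '<table>.txt.gz' holds the tab-separated rows of one table.
    """
    commands = [f'mysql -u {user} -p -e "CREATE DATABASE {db_name}"']
    schema = f"{db_name}.sql.gz"
    if schema in files:
        # Stream the decompressed table definitions straight into the new database.
        commands.append(f"gunzip -c {schema} | mysql -u {user} -p {db_name}")
    for name in sorted(files):
        if name.endswith(".txt.gz"):
            data_file = name[: -len(".gz")]
            commands.append(f"gunzip {name}")
            # mysqlimport loads data_file into the table named after it (minus '.txt').
            commands.append(f"mysqlimport -u {user} -p --local {db_name} {data_file}")
    return commands
```

`USER` is a placeholder for a real MySQL account name; the `-p` flag makes each command prompt for that account's password.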
GTF
Gene sets for each species. These files include annotations of both coding and non-coding genes. This file format is described here.
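GTF is a tab-separated format with nine columns, the last of which holds `key "value";` attribute pairs such as `gene_id` and `transcript_id`. A minimal Python parser for a single feature line might look like the following. It assumes attribute values are double-quoted, keeps only the last value when a key repeats, and treats start and end as 1-based inclusive coordinates, as the GTF format defines them.

```python
import re
from typing import Dict, NamedTuple

# Matches one attribute pair such as: gene_id "G1"
ATTR_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"')


class GtfRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    score: str
    strand: str
    frame: str
    attributes: Dict[str, str]


def parse_gtf_line(line: str) -> GtfRecord:
    """Parse one non-comment GTF line into its nine fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    attributes = dict(ATTR_PATTERN.findall(fields[8]))
    return GtfRecord(fields[0], fields[1], fields[2], int(fields[3]), int(fields[4]),
                     fields[5], fields[6], fields[7], attributes)
```

When reading a whole file, lines beginning with `#` are comments or headers and should be skipped before calling `parse_gtf_line`.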
EMF flatfile dumps (variation and comparative data)
Alignments of resequencing data are available for several species as Ensembl Multi Format (EMF) flatfile dumps. The accompanying README file describes the file format.

The same format is also used to dump whole-genome multiple alignments, as well as the gene-based multiple alignments and phylogenetic trees used to infer Ensembl orthologues and paralogues. These files are available under the ensembl_compara database, which is found in the multi_species directory.

BED format files (comparative data)
Constrained elements calculated using GERP are available in BED format. For more information see the accompanying README file.

BED is a simple line-based format. The first three columns are mandatory:

  • chromosome name, prefixed with "chr" for compatibility with the UCSC Genome Browser. Example: chrX
  • start position. This is a 0-based position.
  • end position. This position is not included in the interval (BED intervals are half-open).
More information on the BED file format is available here.
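Because BED starts are 0-based and ends are exclusive, while Ensembl and GTF coordinates are 1-based and inclusive, converting a BED interval only shifts the start by one. The sketch below also drops the `chr` prefix to approximate Ensembl-style chromosome names; that name mapping is an assumption and does not handle special cases such as UCSC `chrM` versus Ensembl `MT`.

```python
from typing import Tuple


def bed_to_one_based(chrom: str, start: int, end: int) -> Tuple[str, int, int]:
    """Convert a BED interval to 1-based inclusive coordinates without the 'chr' prefix.

    The BED start is 0-based and the end is exclusive, so only the start shifts.
    """
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return name, start + 1, end


def parse_bed_line(line: str) -> Tuple[str, int, int]:
    """Read the three mandatory BED columns and convert them."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return bed_to_one_based(chrom, int(start), int(end))
```

The interval length is preserved: a BED record `chrX 0 100` covers 100 bases, and so does the converted interval X:1-100.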

Each directory on ftp.ensembl.org contains a README file. This additional document explains the FTP directory structure.