The Ensembl FTP Server
If required, entire databases can be downloaded from our FTP site in a variety of formats, from flat files to MySQL dumps. Please be aware that these files can run to many gigabytes of data.
To facilitate storage and download, all databases are compressed with GNU Zip (gzip, *.gz).
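Because these files can run to many gigabytes, it is usually better to stream them than to hold them in memory. The following is a minimal sketch using only the Python standard library (`ftplib`, `gzip`). It assumes anonymous FTP access to ftp.ensembl.org. The remote path, local file names and the helper names `ftp_download` and `gunzip_stream` are hypothetical placeholders, not actual paths or tools on the FTP site.

```python
import gzip
from ftplib import FTP
from pathlib import Path


def ftp_download(host: str, remote_path: str, dest: Path) -> None:
    """Fetch one file over anonymous FTP in binary mode (hypothetical helper)."""
    with FTP(host) as ftp:
        ftp.login()  # anonymous login
        with open(dest, "wb") as fh:
            # RETR streams the remote file; each received block is written to fh
            ftp.retrbinary(f"RETR {remote_path}", fh.write)


def gunzip_stream(src: Path, dest: Path, chunk_size: int = 1 << 20) -> int:
    """Decompress a *.gz file into dest chunk by chunk; return bytes written."""
    written = 0
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)  # at most chunk_size decompressed bytes
            if not chunk:
                break
            fout.write(chunk)
            written += len(chunk)
    return written
```

For example, after `ftp_download("ftp.ensembl.org", "<path from the README>", Path("file.gz"))`, calling `gunzip_stream(Path("file.gz"), Path("file"))` decompresses the file using roughly `chunk_size` bytes of memory, whatever the file size.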
Please note: many correlation tables can also be downloaded through the highly customisable BioMart data mining tool. You may find this web-based tool easier to use than extracting information from our database dumps.
Species | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | EMF | FUNCGEN | BED |
---|---|---|---|---|---|---|---|---|---|---|
Aedes aegypti (Aedes) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Anolis carolinensis (Anole Lizard) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Anopheles gambiae (Anopheles) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Bos taurus (Cow) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Caenorhabditis elegans (C.elegans) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Canis familiaris (Dog) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Cavia porcellus (Guinea Pig) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Choloepus hoffmanni (Sloth) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Ciona intestinalis (C.intestinalis) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Ciona savignyi (C.savignyi) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Danio rerio (Zebrafish) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Dasypus novemcinctus (Armadillo) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Dipodomys ordii (Kangaroo rat) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Drosophila melanogaster (Fruitfly) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Echinops telfairi (Lesser hedgehog tenrec) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Equus caballus (Horse) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Erinaceus europaeus (Hedgehog) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Felis catus (Cat) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Gallus gallus (Chicken) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Gasterosteus aculeatus (Stickleback) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Gorilla gorilla (Gorilla) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Homo sapiens (Human) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | EMF | FUNCGEN | - |
Loxodonta africana (Elephant) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Macaca mulatta (Macaque) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Microcebus murinus (Mouse Lemur) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Monodelphis domestica (Opossum) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Mus musculus (Mouse) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | EMF | FUNCGEN | - |
Myotis lucifugus (Microbat) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Ochotona princeps (Pika) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Ornithorhynchus anatinus (Platypus) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Oryctolagus cuniculus (Rabbit) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Oryzias latipes (Medaka) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Otolemur garnettii (Bushbaby) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Pan troglodytes (Chimpanzee) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Pongo pygmaeus (Orangutan) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Procavia capensis (Hyrax) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Pteropus vampyrus (Megabat) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Rattus norvegicus (Rat) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | EMF | - | - |
Saccharomyces cerevisiae (S.cerevisiae) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Sorex araneus (Shrew) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Spermophilus tridecemlineatus (Squirrel) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Taeniopygia guttata (Zebra Finch) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Takifugu rubripes (Fugu) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Tarsius syrichta (Tarsier) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Tetraodon nigroviridis (Tetraodon) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Tupaia belangeri (Tree Shrew) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Tursiops truncatus (Dolphin) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Vicugna pacos (Alpaca) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Xenopus tropicalis (X.tropicalis) | FASTA (DNA) | FASTA (cDNA) | FASTA (protein) | EMBL | GenBank | MySQL | GTF | - | - | - |
Multi-species | - | - | - | - | - | MySQL | - | EMF | - | BED |
Ensembl Mart | - | - | - | - | - | MySQL | - | - | - | - |
About the data
The following types of data dumps are available on the FTP site.
- FASTA
- FASTA sequence databases of Ensembl gene, transcript and protein model predictions. Since the FASTA format does not permit sequence annotation, these database files are mainly intended for use with local sequence similarity search algorithms. Each directory has a README file with a detailed description of the header line format and the file naming conventions.
- DNA
- Masked and unmasked genome sequences associated with the assembly (contigs, chromosomes etc.).
- cDNA
- cDNA sequences for Ensembl or ab initio predicted genes.
- Peptides
- Protein sequences for Ensembl or ab initio predicted genes.
- RNA
- Non-coding RNA gene predictions.
- Flatfile
- Flat files allow more extensive sequence annotation by means of feature tables, and thus contain the genome sequence as annotated by the automated Ensembl genome annotation pipeline. Each nucleotide sequence record in a flat file represents a 1 Mb slice of the genome sequence. Flat files are broken into chunks of 1000 sequence records for easier downloading.
- EMBL
- Ensembl database dumps in EMBL nucleotide sequence database format.
- GenBank
- Ensembl database dumps in GenBank nucleotide sequence database format.
- MySQL
- All Ensembl MySQL databases are available in text format, as are the SQL table definition files. These can be imported into any SQL database to set up a local mirror site. In general, the FTP directory tree contains one directory per database. For more information about these databases and their Application Programming Interfaces (APIs), see the API section.
- GTF
- Gene sets for each species. These files include annotations of both coding and non-coding genes. This file format is described here.
- EMF flatfile dumps (variation and comparative data)
- Alignments of resequencing data are available for several species as Ensembl Multi Format (EMF) flatfile dumps. The accompanying README file describes the file format.
The same format is also used to dump whole-genome multiple alignments, as well as the gene-based multiple alignments and phylogenetic trees used to infer Ensembl orthologues and paralogues. These files belong to the ensembl_compara database, which is located in the multi_species directory.
- BED format files (comparative data)
- Constrained elements calculated using GERP are available in BED format. For more information, see the accompanying README file.
BED format is a simple line-based format. The first three columns are mandatory:
- chromosome name, prefixed with "chr" for compatibility with the UCSC Genome Browser (e.g. chrX)
- start position (0-based)
- end position (exclusive, so a feature's length is end minus start)
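The FASTA files described above are plain text. Each record is a `>` header line followed by one or more sequence lines, which makes them easy to stream. The following is a minimal Python sketch of a reader that can open the gzip-compressed files directly. It keeps the whole header line as an opaque string and does not interpret Ensembl's header fields, whose layout is documented in each directory's README. The function names are hypothetical.

```python
import gzip
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header: Optional[str] = None
    parts: List[str] = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)  # sequence may be wrapped over many lines
    if header is not None:
        yield header, "".join(parts)


def open_maybe_gzip(path: str) -> TextIO:
    """Open a plain or gzip-compressed (*.gz) text file transparently."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")
```

Usage: `with open_maybe_gzip(path) as fh: for header, seq in read_fasta(fh): ...` processes one record at a time, so memory use is bounded by the longest single sequence.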
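The flat-file layout (1 Mb per sequence record, 1000 records per file) makes it possible to estimate how many files a species will have. The following is a worked sketch under stated assumptions. It takes "1Mb" to mean 1,000,000 bp, and it assumes that each sequence region (for example, a chromosome) is sliced separately, so a final partial slice still counts as a full record. Neither assumption is confirmed by this page, and the helper names are hypothetical.

```python
SLICE_BP = 1_000_000      # assumption: "1Mb" read as 10**6 bp
RECORDS_PER_FILE = 1000   # from the text: 1000 sequence records per file


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return (a + b - 1) // b


def count_slices(region_lengths, slice_bp: int = SLICE_BP) -> int:
    """Records needed if each sequence region is cut into slice_bp pieces."""
    return sum(ceil_div(length, slice_bp) for length in region_lengths)


def count_files(n_records: int, per_file: int = RECORDS_PER_FILE) -> int:
    """Flat-file chunks needed to hold n_records records."""
    return ceil_div(n_records, per_file)
```

For instance, a hypothetical genome assembled as a single 3,000 Mb sequence would give 3,000 records and therefore 3 files. Regions of 2.5 Mb, 1 Mb and 10 bp would give 3 + 1 + 1 = 5 records.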
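The MySQL dumps described above pair SQL table definitions with text data files. The following Python sketch builds the shell commands to load one dump directory into a local MySQL server with the standard `mysql` and `mysqlimport` clients. It makes several assumptions that you should check against the directory's README. It assumes the files have already been decompressed with gunzip. It also assumes the directory holds one `<database>.sql` schema file and one `<table>.txt` file per table. The function name is hypothetical. The sketch relies on documented `mysqlimport` behaviour: the tool strips the extension from each file name and uses the result as the table name, so `gene.txt` is loaded into table `gene`.

```python
import shlex
from pathlib import Path
from typing import List


def import_commands(database: str, dump_dir: Path, user: str) -> List[str]:
    """Build shell commands that create, define and fill one database.

    Assumed (hypothetical) layout: dump_dir/<database>.sql plus one
    dump_dir/<table>.txt per table, all already decompressed.
    """
    db = shlex.quote(database)
    u = shlex.quote(user)
    schema = shlex.quote(str(dump_dir / f"{database}.sql"))
    cmds = [
        # create the empty database, then load the table definitions
        f"mysql --user={u} --password -e {shlex.quote('CREATE DATABASE ' + database)}",
        f"mysql --user={u} --password {db} < {schema}",
    ]
    for txt in sorted(dump_dir.glob("*.txt")):
        # --local reads the file from the client side; table name = file stem
        cmds.append(f"mysqlimport --user={u} --password --local {db} {shlex.quote(str(txt))}")
    return cmds
```

Each generated command prompts for the password. The commands are returned rather than executed so that you can review them before running them in a shell.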
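The GTF gene sets described above use nine tab-separated columns. The ninth column holds `key "value";` attribute pairs such as `gene_id` and `transcript_id`. The following is a minimal Python sketch of a line parser. It assumes well-formed lines and skips `#` comment lines. When a key repeats within one line, it keeps only the last value. It is an illustration, not a complete GTF validator, and the function name is hypothetical.

```python
import re
from typing import Dict, Optional

# Matches attribute pairs of the form: key "value"
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')

FIELDS = ("seqname", "source", "feature", "start", "end",
          "score", "strand", "frame", "attributes")


def parse_gtf_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one GTF line into a dict; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    record: Dict[str, object] = dict(zip(FIELDS, cols))
    record["start"] = int(cols[3])  # GTF coordinates are 1-based, inclusive
    record["end"] = int(cols[4])
    record["attributes"] = dict(_ATTR_RE.findall(cols[8]))  # last repeat wins
    return record
```

Filtering on `record["feature"] == "exon"` and grouping by `record["attributes"]["transcript_id"]`, for example, reconstructs the exon structure of each transcript.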
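Because the BED start is 0-based and the BED end is exclusive, a BED interval converts to 1-based inclusive coordinates (as used by Ensembl and GTF) by adding 1 to the start only. The following is a minimal Python sketch of that conversion for the three mandatory columns. It assumes tab-separated input. Optionally, it strips the `chr` prefix to recover an Ensembl-style name (`chrX` to `X`). That mapping is an assumption and does not hold for every sequence; for example, UCSC's `chrM` corresponds to Ensembl's `MT`. The names `Interval`, `bed_to_one_based` and `bed_length` are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def bed_to_one_based(line: str, strip_chr: bool = True) -> Interval:
    """Convert the first three BED columns to 1-based, fully closed coordinates."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    if strip_chr and chrom.startswith("chr"):
        chrom = chrom[3:]  # chrX -> X (assumed naming convention)
    # The 0-based start shifts by one; the exclusive 0-based end already
    # equals the inclusive 1-based end, so it is unchanged.
    return Interval(chrom, int(start) + 1, int(end))


def bed_length(line: str) -> int:
    """Feature length in bp: end - start for half-open BED intervals."""
    _, start, end = line.rstrip("\n").split("\t")[:3]
    return int(end) - int(start)
```

Forgetting this shift is a classic off-by-one error. The BED line `chrX 0 100` covers the first 100 bases, which are positions 1 to 100 in 1-based terms.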
Each directory on ftp.ensembl.org contains a README file. A separate document explains the overall FTP directory structure.