
Variation API Tutorial

Introduction

This tutorial is an introduction to the Ensembl Variation API. Knowledge of the Ensembl Core API and of the concepts and conventions in the Ensembl Core API tutorial is assumed. Documentation about the Variation database schema is available at http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/schema/?root=ensembl , and while not necessary for this tutorial, an understanding of the database tables may help as many of the adaptor modules are table-specific.

Code Conventions (and unconventions)

Refer to the Ensembl core tutorial for a good description of the coding conventions normally used in Ensembl. Please note that there may be exceptions to these rules in variation.

Connecting to an Ensembl variation database

Connecting to an Ensembl variation database is made simple by using the Bio::EnsEMBL::Registry module:

use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);
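
Note that the line assigning 'Bio::EnsEMBL::Registry' to $registry does not create an object: it stores the class name in a scalar, and Perl then resolves $registry->method(...) as a class-method call on that package. The sketch below illustrates the idiom on its own, using a hypothetical My::Registry package (an assumption for illustration, not part of the Ensembl API) so that it runs without a database connection:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a registry-style class; NOT part of the Ensembl API.
package My::Registry;

my %settings;    # class-level state shared by every caller

# Class method: the invocant ($class) is the package name itself.
sub load {
    my ($class, %args) = @_;
    %settings = %args;
    return $class;
}

# Class method: look up one stored setting by key.
sub get_setting {
    my ($class, $key) = @_;
    return $settings{$key};
}

package main;

# Store the class name in a scalar, as the tutorial does with the real registry.
my $registry = 'My::Registry';

# Perl resolves this call as My::Registry->load(...).
$registry->load(-host => 'ensembldb.ensembl.org', -user => 'anonymous');

print "host: ", $registry->get_setting('-host'), "\n";
```

Because the scalar only holds a name, every script that uses the same class name shares the same underlying registry state.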

Using the registry ensures that you load the correct versions of the Ensembl databases for the software release it can find on a database instance. Using the registry object, you can then create any number of database adaptors. Each of these adaptors is responsible for generating objects of one type. The Ensembl variation API uses a number of object types that relate to the data stored in the database. For example, in order to generate variation objects, you should first create a variation adaptor:

use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);

my $variation_adaptor = $registry->get_adaptor(
	'human',	# species
	'variation',	# database
	'variation'	# object type
);

my $variation = $variation_adaptor->fetch_by_name('rs1333049');

The get_adaptor method will automatically create a connection to the relevant database; in the example above, a connection will be made to the variation database for human. The three parameters passed specify the species, database and object type you require. Below is a non-exhaustive list of the most commonly used Ensembl variation adaptors:

  • IndividualAdaptor to fetch Bio::EnsEMBL::Variation::Individual objects
  • LDFeatureContainerAdaptor to fetch Bio::EnsEMBL::Variation::LDFeatureContainer objects
  • PopulationAdaptor to fetch Bio::EnsEMBL::Variation::Population objects
  • ReadCoverageAdaptor to fetch Bio::EnsEMBL::Variation::ReadCoverage objects
  • TranscriptVariationAdaptor to fetch Bio::EnsEMBL::Variation::TranscriptVariation objects
  • VariationAdaptor to fetch Bio::EnsEMBL::Variation::Variation objects
  • VariationFeatureAdaptor to fetch Bio::EnsEMBL::Variation::VariationFeature objects

Only some of these adaptors are used in this tutorial, where they are illustrated through commented Perl scripts.

Variations in the genome

One of the most important uses of the variation database is to get all variations in a given region of the genome. Below is a simple commented Perl script that gets all variations on chromosome 25 in zebrafish.

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);

my $slice_adaptor = $registry->get_adaptor('danio_rerio', 'core', 'slice'); #get the database adaptor for Slice objects
my $slice = $slice_adaptor->fetch_by_region('chromosome',25); #get chromosome 25 in zebrafish

my $vf_adaptor = $registry->get_adaptor('danio_rerio', 'variation', 'variationfeature'); #get adaptor to VariationFeature object
my $vfs = $vf_adaptor->fetch_all_by_Slice($slice); #return ALL variations defined in $slice

foreach my $vf (@{$vfs}){
  print "Variation: ", $vf->variation_name, " with alleles ", $vf->allele_string, 
        " in chromosome ", $slice->seq_region_name, " and position ", $vf->start,"-",$vf->end,"\n";
}
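
The allele string printed by the script is a slash-separated list such as A/G. If you need the individual alleles, it can be split on the slash. The following is a minimal sketch that runs without a database; the allele strings are made-up stand-ins for values returned by $vf->allele_string, the helper name split_alleles is hypothetical, and treating the first allele as the reference is an assumption you should check for your data:

```perl
use strict;
use warnings;

# Split a slash-separated allele string into its first allele and the rest.
# Assumption: the first allele listed is the reference allele.
sub split_alleles {
    my ($allele_string) = @_;
    my ($first, @others) = split m{/}, $allele_string;
    return ($first, \@others);
}

# Made-up allele strings, standing in for $vf->allele_string values.
foreach my $allele_string ('A/G', 'C/G/T', '-/AT') {
    my ($first, $others) = split_alleles($allele_string);
    print "$allele_string: first allele $first, other alleles ",
        join(',', @{$others}), "\n";
}
```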

Consequence type of variations

Another common use of the variation database is to retrieve the effects that variations have on a transcript. The example below shows how to get all variations in a particular chicken transcript and the effect each one has on that transcript.

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);

my $stable_id = 'ENSGALT00000007843'; #this is the stable_id of a chicken transcript
my $transcript_adaptor = $registry->get_adaptor('gallus_gallus', 'core', 'transcript'); #get the adaptor to get the Transcript from the database
my $transcript = $transcript_adaptor->fetch_by_stable_id($stable_id); #get the Transcript object

my $trv_adaptor = $registry->get_adaptor('gallus_gallus', 'variation', 'transcriptvariation'); #get the adaptor to get TranscriptVariation objects
my $trvs = $trv_adaptor->fetch_all_by_Transcripts([$transcript]); #get ALL effects of Variations in the Transcript

foreach my $tv (@{$trvs}){
  print "SNP :",$tv->variation_feature->variation_name, " has a consequence/s ", 
    join(",",@{$tv->consequence_type}), " in transcript ", $stable_id, "\n";
  #print the name of the variation and the effect (consequence_type) of the variation in the Transcript
}
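
Since each TranscriptVariation can carry several consequence types, it is often useful to count how many times each one occurs in a transcript. The sketch below shows one way to do that tally in plain Perl. It runs without a database: the array references stand in for the values returned by $tv->consequence_type, the labels are illustrative only (the labels in your release may differ), and tally_consequences is a hypothetical helper name:

```perl
use strict;
use warnings;

# Count how often each consequence label occurs across a list of array references.
sub tally_consequences {
    my (@consequence_lists) = @_;
    my %count;
    foreach my $list (@consequence_lists) {
        $count{$_}++ for @{$list};
    }
    return \%count;
}

# Illustrative labels only; each array reference mimics one $tv->consequence_type.
my $tally = tally_consequences(
    ['INTRONIC'],
    ['NON_SYNONYMOUS_CODING', 'SPLICE_SITE'],
    ['INTRONIC'],
);

# Print one line per consequence label, in alphabetical order.
foreach my $type (sort keys %{$tally}) {
    print "$type\t$tally->{$type}\n";
}
```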

Variations, Flanking sequences and Genes

Below is a complete example of how to use the variation API to retrieve different data from the database. In this particular example, we want to get, for a list of variation names, information about alleles, flanking sequences, locations, effects of variations in transcripts, position in the transcript (in case it has a coding effect) and genes containing the transcripts.

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);

my $va_adaptor = $registry->get_adaptor('human', 'variation', 'variation'); #get the different adaptors for the different objects needed
my $vf_adaptor = $registry->get_adaptor('human', 'variation', 'variationfeature');
my $gene_adaptor = $registry->get_adaptor('human', 'core', 'gene');

my @rsIds = qw(rs1367827 rs1367830);
foreach my $id (@rsIds){
# get Variation object
  my $var = $va_adaptor->fetch_by_name($id); #get the Variation from the database using the name
  &get_VariationFeatures($var);
}

sub get_VariationFeatures{
  my $var = shift;
  # get all VariationFeature objects: might be more than 1 !!!
  foreach my $vf (@{$vf_adaptor->fetch_all_by_Variation($var)}){
      print $vf->variation_name(),","; # print rsID
      print $vf->allele_string(),","; # print alleles
      print join(",",@{$vf->get_consequence_type()}),","; # print consequenceType
      print substr($var->five_prime_flanking_seq,-10) , "[",$vf->allele_string,"]"; # print the last 10 bases of the 5' flank, then the alleles in brackets
      print substr($var->three_prime_flanking_seq,0,10), ","; # print the first 10 bases of the 3' flank
      print $vf->seq_region_name, ":", $vf->start,"-",$vf->end; # print position in Ref in format Chr:start-end
      &get_TranscriptVariations($vf); # get Transcript information
  }
}

sub get_TranscriptVariations{
  my $vf = shift; 
  
  # get all TranscriptVariation objects: might be more than 1 !!!
  my $transcript_variations = $vf->get_all_TranscriptVariations; #get ALL the effects of the variation in 
                                                                    # different Transcripts
  if (defined $transcript_variations){
    foreach my $tv (@{$transcript_variations}){
      print ",", $tv->pep_allele_string if (defined $tv->pep_allele_string);
                                              # the AA change, but only if it is in a coding region
      my $gene = $gene_adaptor->fetch_by_transcript_id($tv->transcript->dbID);
      print ",",$gene->stable_id if (defined $gene->external_name); # and the stable ID of the gene, but only if the gene has an external name
    }
  }
  print "\n";
}
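
The script shows the sequence context of each variation by taking the last 10 bases of the 5' flanking sequence and the first 10 bases of the 3' flanking sequence, with the alleles in square brackets between them. The sketch below isolates that string handling so it can be tried without a database. The sequences are made up, and format_context is a hypothetical helper, not an API method:

```perl
use strict;
use warnings;

# Build "flank[alleles]flank" from the last $width bases of the 5' flank
# and the first $width bases of the 3' flank.
sub format_context {
    my ($five_prime, $three_prime, $alleles, $width) = @_;

    # A negative offset counts back from the end of the string.
    my $left = length($five_prime) > $width
        ? substr($five_prime, -$width)
        : $five_prime;

    # substr returns the whole string if it is shorter than $width.
    my $right = substr($three_prime, 0, $width);

    return $left . '[' . $alleles . ']' . $right;
}

# Made-up flanking sequences and alleles, for illustration only.
print format_context('ACGTACGTACGTTTGA', 'CCATGGATCCGGAATT', 'A/G', 10), "\n";
```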

LD calculation

To use the LD calculation, you need to compile the C source code and install the IPC::Run module. There is more information on how to do this in "Use LD calculation". The example below calculates the LD in a region of human chromosome 6 for a HapMap population, but prints only the pairs of variations in high LD.

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);

my $chr = 6;  #defining the region in chromosome 6
my $start = 25_834_000;
my $end = 25_854_000;
my $population_name = 'CSHL-HAPMAP:HapMap-CEU'; #we only want LD in this population

my $slice_adaptor = $registry->get_adaptor('human', 'core', 'slice'); #get adaptor for Slice object
my $slice = $slice_adaptor->fetch_by_region('chromosome',$chr,$start,$end); #get slice of the region


my $population_adaptor = $registry->get_adaptor('human', 'variation', 'population'); #get adaptor for Population object
my $population = $population_adaptor->fetch_by_name($population_name); #get population object from database

my $ldFeatureContainerAdaptor = $registry->get_adaptor('human', 'variation', 'ldfeaturecontainer'); #get adaptor for LDFeatureContainer object
my $ldFeatureContainer = $ldFeatureContainerAdaptor->fetch_by_Slice($slice,$population); #retrieve all LD values in the region

foreach my $r_square (@{$ldFeatureContainer->get_all_r_square_values}){
  if ($r_square->{r2} > 0.8){ #only print high LD, where high is defined as r2 > 0.8
    print "High LD between variations ", $r_square->{variation1}->variation_name,"-", $r_square->{variation2}->variation_name, "\n";
  }
}
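
The loop above applies a fixed r2 threshold to a list of hash references. The same filtering can be written with grep and combined with a sort, so that the strongest pairs come first. The sketch below runs without a database and uses mock data: in the real container the variation1 and variation2 values are objects, whereas here they are placeholder strings, and high_ld_pairs is a hypothetical helper name:

```perl
use strict;
use warnings;

# Return the pairs whose r2 exceeds the threshold, strongest first.
# Call this in list context.
sub high_ld_pairs {
    my ($pairs, $threshold) = @_;
    return sort { $b->{r2} <=> $a->{r2} }
           grep { $_->{r2} > $threshold } @{$pairs};
}

# Mock LD values with placeholder names; not real data.
my @pairs = (
    { variation1 => 'var_a', variation2 => 'var_b', r2 => 0.85 },
    { variation1 => 'var_a', variation2 => 'var_c', r2 => 0.40 },
    { variation1 => 'var_b', variation2 => 'var_c', r2 => 0.97 },
);

foreach my $pair (high_ld_pairs(\@pairs, 0.8)) {
    printf "High LD between variations %s-%s (r2 = %.2f)\n",
        $pair->{variation1}, $pair->{variation2}, $pair->{r2};
}
```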

Specific strain information

With the advent of new technologies, the variation API now makes it possible to work with a specific strain as if it were the reference, and to compare it against others. In the example, we create a StrainSlice object for a region of Craig Venter's sequence and compare it against the reference sequence.

use strict;
use warnings;

use Bio::EnsEMBL::Registry;

my $reg = 'Bio::EnsEMBL::Registry';
my $host= 'ensembldb.ensembl.org';
my $user= 'anonymous';

$reg->load_registry_from_db(
    -host => $host,
    -user => $user
);

# get slice adaptor from core
my $sa = $reg->get_adaptor("human", "core", "slice");

my $slice = $sa->fetch_by_region('chromosome', 8, 9213000, 9216000);

# get strainSlice from the slice
my $venter = $slice->get_by_strain("Venter");

my @differences = @{$venter->get_all_AlleleFeatures_Slice()};

foreach my $diff (@differences){
  print "Locus ", $diff->seq_region_start, "-", $diff->seq_region_end, ", Venter's alleles: ",$diff->allele_string, "\n";
}

Further help

For additional information or help, mail the ensembl-dev mailing list. You will need to subscribe to this mailing list to use it. More information on subscribing to any Ensembl mailing list is available from the Ensembl Contacts page.