Bio::EnsEMBL::Funcgen::DBSQL
CoordSystemAdaptor
Summary
Bio::EnsEMBL::Funcgen::DBSQL::CoordSystemAdaptor
Package variables
No package variables defined.
Synopsis
my $db = Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor->new(...);
my $csa = $db->get_CoordSystemAdaptor();
#
# Fetch by name, schema_build and version(opt).
#
my $cs = $csa->fetch_by_name_schema_build_version('chromosome', '39_36a', 'NCBI36');
#As this is a multi-assembly DB, we have to accommodate the idea of schema versions, which
#enable a mapping from the feature table back to the assembly/core DB of origin.
#Some of the old core methods may not work, as they assume there is only one default version,
#whereas there may be multiple default versions, one for each assembly/schema_build.
#
# Get all coord systems in the database:
#
foreach my $cs (@{$csa->fetch_all()}) {
  print $cs->name, ' ', $cs->version, "\n";
}
#
# Fetching by name:
#
#use the default version of coord_system 'chromosome' (e.g. NCBI33):
$cs = $csa->fetch_by_name('chromosome');
#get an explicit version of coord_system 'chromosome':
$cs = $csa->fetch_by_name('chromosome', 'NCBI34');
#get all coord_systems of name 'chromosome':
foreach $cs (@{$csa->fetch_all_by_name('chromosome')}) {
  print $cs->name, ' ', $cs->version, "\n";
}
#
# Fetching by rank:
#
$cs = $csa->fetch_by_rank(2);
#
# Fetching the pseudo coord system 'toplevel'
#
#Get the default top_level coord system:
$cs = $csa->fetch_top_level();
#can also use an alias in fetch_by_name:
$cs = $csa->fetch_by_name('toplevel');
#can also request toplevel using rank=0
$cs = $csa->fetch_by_rank(0);
#
# Fetching by sequence level:
#
#Get the coord system which is used to store sequence:
$cs = $csa->fetch_sequence_level();
#can also use an alias in fetch_by_name:
$cs = $csa->fetch_by_name('seqlevel');
#
# Fetching by id
#
$cs = $csa->fetch_by_dbID(1);
Description
The Funcgen CoordSystemAdaptor works slightly differently from the core version.
Because the Funcgen DB stores features mapped to multiple core/dna DBs, the schema
and data versions (i.e. the last part of the DB name, such as '39_36a') have to be
stored. This maintains a link between the seq_region_id stored in the Funcgen DB and
the seq_region and assembly tables stored in the core DB on which the features were
originally built.
Default versions and ranking have not yet been tested; in the code listed below,
fetch_all, fetch_by_rank, fetch_sequence_level and fetch_top_level currently throw
'not implemented' exceptions.
This adaptor allows coordinate system information to be queried from the
coord_system table.
Note that many coordinate systems do not have a concept of a version
for the entire coordinate system (though they may have a per-sequence version).
The 'chromosome' coordinate system usually has a version (i.e. the
assembly version) but the 'clone' coordinate system does not (despite having
individual sequence versions). Where a coordinate system does
not have a version, an empty string ('') is used instead.
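The name and version conventions above can be illustrated with a minimal, self-contained sketch. It does not use the Ensembl API: %name_cache and find_coord_system are hypothetical stand-ins for the adaptor's internal name cache and lookup. Unlike fetch_by_name, which falls back to the dnadb assembly version when no version is given, this sketch simply treats a missing version as the empty string.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the adaptor's name cache:
# lower-cased name => list of { name, version } records.
my %name_cache = (
    chromosome => [
        { name => 'chromosome', version => 'NCBI36' },
        { name => 'chromosome', version => 'NCBI35' },
    ],
    clone => [
        { name => 'clone', version => '' },    # unversioned coordinate system
    ],
);

# Look up a record by name and optional version, ignoring case.
# A missing version is treated as '' (assumption made for this sketch only).
sub find_coord_system {
    my ($name, $version) = @_;
    $version = '' if !defined $version;
    my $candidates = $name_cache{ lc($name) } or return;
    for my $cs (@$candidates) {
        return $cs if lc($cs->{version}) eq lc($version);
    }
    return;
}

my $chr   = find_coord_system('Chromosome', 'ncbi36');
my $clone = find_coord_system('clone');
print "$chr->{name}:$chr->{version}\n";
print "clone has ", ($clone->{version} eq '' ? 'no version' : $clone->{version}), "\n";
```

Both the name and the version are compared case-insensitively here, mirroring the lc() calls in fetch_by_name.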
Methods
Methods description
fetch_all
  Arg [1]    : none
  Example    : foreach my $cs (@{$csa->fetch_all()}) {
                 print $cs->name(), ' ', $cs->version(), "\n";
               }
  Description: Retrieves every coordinate system defined in the DB.
               These will be returned in ascending order of rank, i.e.
               the highest coordinate system, with rank=1, would be first
               in the array.
  Returntype : listref of Bio::EnsEMBL::Funcgen::CoordSystem objects
  Exceptions : currently always throws, as the rank cache is not implemented
  Caller     : general
  Status     : At risk
fetch_all_by_name
  Arg [1]    : string $name
               The name of the coordinate system to retrieve. This can be
               the name of an actual coordinate system or an alias for a
               coordinate system. Valid aliases are 'toplevel' and 'seqlevel'.
  Example    : foreach my $cs (@{$csa->fetch_all_by_name('chromosome')}) {
                 print $cs->name(), ' ', $cs->version();
               }
  Description: Retrieves all coordinate systems of a particular name
  Returntype : listref of Bio::EnsEMBL::Funcgen::CoordSystem objects
  Exceptions : throw if no name argument provided
  Caller     : general
  Status     : Medium
fetch_by_dbID
  Arg [1]    : int dbID
  Example    : $cs = $csa->fetch_by_dbID(4);
  Description: Retrieves a coord_system via its internal identifier, or
               undef if no coordinate system with the provided id exists.
  Returntype : Bio::EnsEMBL::Funcgen::CoordSystem or undef
  Exceptions : throw if no dbID argument provided
  Caller     : general
  Status     : Stable
fetch_by_name
  Arg [1]    : string $name
               The name of the coordinate system to retrieve. Alternatively
               this may be an alias for a real coordinate system. Valid
               aliases are 'toplevel' and 'seqlevel'.
  Arg [2]    : optional - string $version
               The version of the coordinate system to retrieve. If not
               specified the default version for the appropriate
               schema_build will be used.
  Example    : $coord_sys = $csa->fetch_by_name('chromosome', 'NCBI36');
               # toplevel is a pseudo coord system representing the highest
               # coord system in a given region,
               # such as the chromosome coordinate system
               $coord_sys = $csa->fetch_by_name('toplevel');
               # seqlevel is an alias for the sequence level coordinate system,
               # such as the clone or contig coordinate system
               $coord_sys = $csa->fetch_by_name('seqlevel');
  Description: Retrieves a coordinate system by its name
  Returntype : Bio::EnsEMBL::Funcgen::CoordSystem
  Exceptions : throw if no name argument provided
               warning if no version provided and default does not exist
  Caller     : general
  Status     : At risk
fetch_by_rank
  Arg [1]    : int $rank
  Example    : my $cs = $coord_sys_adaptor->fetch_by_rank(1);
  Description: Retrieves a CoordSystem via its rank. 0 is a special rank
               reserved for the pseudo coordinate system 'toplevel'.
               undef is returned if no coordinate system of the specified
               rank exists.
  Returntype : Bio::EnsEMBL::Funcgen::CoordSystem
  Exceptions : currently always throws, as the rank cache is not implemented
  Caller     : general
  Status     : At risk
fetch_sequence_level
  Arg [1]    : none
  Example    : $cs = $csa->fetch_sequence_level();
  Description: Retrieves the coordinate system in which sequence is stored.
  Returntype : Bio::EnsEMBL::Funcgen::CoordSystem
  Exceptions : throw if no sequence_level coord system exists at all
               throw if multiple sequence_level coord systems exist
  Caller     : general
  Status     : At risk
fetch_top_level
  Arg [1]    : none
  Example    : $cs = $csa->fetch_top_level();
  Description: Retrieves the toplevel pseudo coordinate system.
  Returntype : a Bio::EnsEMBL::Funcgen::CoordSystem object
  Exceptions : none
  Caller     : general
  Status     : At risk
get_mapping_path
  Arg [1]    : Bio::EnsEMBL::CoordSystem $cs1
  Arg [2]    : Bio::EnsEMBL::CoordSystem $cs2
  Example    : my @path = @{$csa->get_mapping_path($cs1, $cs2)};
  Description: Given two coordinate systems this will return a mapping path
               between them if one has been defined. Allowed mapping paths
               are explicitly defined in the meta table. The following is
               an example:
mysql> select * from meta where meta_key = 'assembly.mapping';
+---------+------------------+--------------------------------------+
| meta_id | meta_key         | meta_value                           |
+---------+------------------+--------------------------------------+
|      20 | assembly.mapping | chromosome:NCBI34|contig             |
|      21 | assembly.mapping | clone|contig                         |
|      22 | assembly.mapping | supercontig|contig                   |
|      23 | assembly.mapping | chromosome:NCBI34|contig|clone       |
|      24 | assembly.mapping | chromosome:NCBI34|contig|supercontig |
|      25 | assembly.mapping | supercontig|contig|clone             |
+---------+------------------+--------------------------------------+
For a one-step mapping path to be valid there needs to be
a relationship between the two coordinate systems defined in
the assembly table. Two step mapping paths work by building
on the one-step mapping paths which are already defined.
The first coordinate system in a one step mapping path must
be the assembled coordinate system and the second must be
the component.
Example of use:
  my $cs1 = $cs_adaptor->fetch_by_name('contig');
  my $cs2 = $cs_adaptor->fetch_by_name('chromosome');
  my @path = @{$cs_adaptor->get_mapping_path($cs1,$cs2)};
  if(!@path) {
    print "No mapping path.\n";
  }
  elsif(@path == 2) {
    # Two coordinate systems in the path means a one-step mapping.
    print "One-step mapping path.\n";
    print "Assembled = " . $path[0]->name() . "\n";
    print "Component = " . $path[1]->name() . "\n";
  } else {
    print "Multi step mapping path\n";
  }
  Returntype : reference to a list of Bio::EnsEMBL::CoordSystem objects
  Exceptions : throw if either argument is not a Bio::EnsEMBL::CoordSystem
  Caller     : general
  Status     : At risk
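The mapping-path lookup described above can be sketched without the Ensembl API. This is a simplified illustration, not the adaptor's implementation: it works on plain 'name:version' strings instead of CoordSystem objects, and cs_key, %mapping_paths and mapping_path are hypothetical names. The meta_value strings are the one-step entries from the table above.

```perl
use strict;
use warnings;

# One-step meta_value strings copied from the 'assembly.mapping' example.
my @meta_values = (
    'chromosome:NCBI34|contig',
    'clone|contig',
    'supercontig|contig',
);

# Normalise 'name' or 'name:version' to a 'name:version' key,
# using an empty version for unversioned coordinate systems.
sub cs_key {
    my ($spec) = @_;
    my ($name, $version) = split /:/, $spec, 2;
    $version = '' if !defined $version;
    return "$name:$version";
}

# Index every explicit path by its "first|last" key.
my %mapping_paths;
for my $value (@meta_values) {
    my @keys = map { cs_key($_) } split /\|/, $value;
    $mapping_paths{"$keys[0]|$keys[-1]"} = \@keys;
}

# Return an explicit path in either direction, or build an implicit
# two-step path through a shared middle system; [] if none exists.
sub mapping_path {
    my ($key1, $key2) = @_;
    my $path = $mapping_paths{"$key1|$key2"} || $mapping_paths{"$key2|$key1"};
    return $path if $path;

    my (%mid1, %mid2);
    for my $p (grep { @$_ == 2 } values %mapping_paths) {
        my ($asm, $cmp) = @$p;
        $mid1{$cmp} = 1 if $asm eq $key1;
        $mid1{$asm} = 1 if $cmp eq $key1;
        $mid2{$cmp} = 1 if $asm eq $key2;
        $mid2{$asm} = 1 if $cmp eq $key2;
    }
    for my $mid (sort keys %mid1) {
        return [$key1, $mid, $key2] if $mid2{$mid};
    }
    return [];
}

my $explicit = mapping_path(cs_key('contig'), cs_key('chromosome:NCBI34'));
my $implicit = mapping_path(cs_key('chromosome:NCBI34'), cs_key('clone'));
print join('|', @$explicit), "\n";
print join('|', @$implicit), "\n";
```

The explicit lookup returns the stored path in assembled-then-component order whichever way the arguments are given. The implicit case joins two one-step paths that share a middle coordinate system, which is the situation in which get_mapping_path warns that an explicit 'assembly.mapping' entry should be added.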
new
  Arg [1]    : See BaseAdaptor for arguments (none specific to this
               subclass)
  Example    : $csa = $db->get_CoordSystemAdaptor(); #better than new()
  Description: Creates a new CoordSystem adaptor and caches the contents
               of the coord_system table in memory.
  Returntype : Bio::EnsEMBL::Funcgen::DBSQL::CoordSystemAdaptor
  Exceptions : none
  Caller     :
  Status     : At risk
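store
  Arg [1]    : Bio::EnsEMBL::Funcgen::CoordSystem
  Example    : $csa->store($coord_system);
  Description: Stores a CoordSystem object in the database. Schema builds
               already marked as stored are skipped.
  Returntype : Bio::EnsEMBL::Funcgen::CoordSystem (the stored object)
  Exceptions : throw if the argument is not a Funcgen CoordSystem
               throw if the name is empty, 'toplevel' or 'seqlevel'
               throw if the rank is not a positive integer
  Caller     : none
  Status     : At risk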
validate_and_store_coord_system
  Arg [1]    : Bio::EnsEMBL::CoordSystem (could also be Funcgen::CoordSystem)
  Example    : my $funcgen_cs = $csa->validate_and_store_coord_system($core_cs);
  Description: Given a CoordSystem, retrieves the corresponding Funcgen
               CoordSystem or generates a new one
  Returntype : Bio::EnsEMBL::Funcgen::CoordSystem
  Exceptions : throw if arg not valid and stored
  Caller     : general
  Status     : At risk - just have validate and let DBAdaptor store totally new CSs?
Methods code
sub _fetch_all_by_attrib {
  my $self   = shift;
  my $attrib = shift;

  # Collect every cached coordinate system flagged with the given attribute.
  my @coord_systems = ();
  foreach my $dbID (keys %{$self->{"_is_$attrib"}}) {
    push @coord_systems, $self->{"_dbID_cache"}->{$dbID};
  }
  return \@coord_systems;
}
sub _fetch_by_attrib {
  my $self    = shift;
  my $attrib  = shift;
  my $version = shift;
  $version = lc($version) if($version);

  my @dbIDs = keys %{$self->{"_is_$attrib"}};
  throw("No $attrib coordinate system defined") if(!@dbIDs);

  foreach my $dbID (@dbIDs) {
    my $cs = $self->{'_dbID_cache'}->{$dbID};
    if($version) {
      # Compare case-insensitively; $version is already lower-cased.
      return $cs if($version eq lc($cs->version()));
    } elsif($self->{'_is_default_version'}->{$dbID}) {
      return $cs;
    }
  }

  if($version) {
    throw("$attrib coord_system with version [$version] does not exist");
  }

  # No default version is flagged, so fall back to an arbitrary one.
  my $dbID = shift @dbIDs;
  my $cs   = $self->{'_dbID_cache'}->{$dbID};
  my $v    = $cs->version();
  warning("No default version for $attrib coord_system exists. " .
          "Using version [$v] arbitrarily");
  return $cs;
}
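sub fetch_all {
  my $self = shift;

  # The rank cache is not populated by new(), so this method currently
  # always throws; the code below is kept for when ranking is implemented.
  throw('Rank cache not implemented');

  my @coord_systems;
  foreach my $rank (sort {$a <=> $b} keys %{$self->{'_rank_cache'}}) {
    push @coord_systems, $self->{'_rank_cache'}->{$rank};
  }
  return \@coord_systems;
}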
sub fetch_all_by_name {
  my $self = shift;
  my $name = lc(shift);    # case-insensitive matching

  throw('Name argument is required') if(!$name);

  if($name eq 'seqlevel') {
    return [$self->fetch_sequence_level()];
  } elsif($name eq 'toplevel') {
    return [$self->fetch_top_level()];
  }
  return $self->{'_name_cache'}->{$name} || [];
}
sub fetch_by_dbID {
  my $self = shift;
  my $dbID = shift;

  throw('dbID argument is required') if(!$dbID);

  my $cs = $self->{'_dbID_cache'}->{$dbID};
  return undef if(!$cs);
  return $cs;
}
sub fetch_by_name {
  my $self    = shift;
  my $name    = lc(shift);
  my $version = lc(shift);
  my ($cs, $found_cs);

  throw('Mandatory argument \'name\'') if(! $name);

  my $sbuild = $self->db->_get_schema_build($self->db->dnadb());
  # Assumed fix: take the default assembly version from the core (dnadb)
  # adaptor. The original called fetch_by_name on this adaptor again, which
  # would recurse without end.
  my $assembly = $self->db->dnadb->get_CoordSystemAdaptor->fetch_by_name('chromosome')->version();

  warn "Using dnadb(".$sbuild.") to acquire $name" if($name =~ /level/);
  if($name eq 'seqlevel') {
    return $self->fetch_sequence_level_by_schema_build($sbuild);
  } elsif($name eq 'toplevel') {
    return $self->fetch_top_level_by_schema_build($sbuild);
  }

  if(! exists($self->{'_name_cache'}->{$name})) {
    if($name =~ /top/) {
      warn("Did you mean 'toplevel' coord system instead of '$name'?");
    } elsif($name =~ /seq/) {
      warn("Did you mean 'seqlevel' coord system instead of '$name'?");
    }
    return undef;
  }

  my @coord_systems = @{$self->{'_name_cache'}->{$name}};
  foreach $cs (@coord_systems) {
    if($version) {
      if(lc($cs->version()) eq $version){
        $found_cs = $cs;
        last;
      }
    } elsif($cs->version eq $assembly){
      $found_cs = $cs;
      last;
    }
  }

  if(! $found_cs){
    if($version) {
      warn "No coord system found for $sbuild version '$version'";
    } else {
      warn "Could not find default CoordSystem for '$sbuild', use next ranking?";
    }
    return undef;
  }
  return $found_cs;
}
sub fetch_by_rank {
  my $self = shift;
  my $rank = shift;

  # The rank cache is not populated yet, so this method currently always throws.
  throw('Rank cache not implemented yet');

  throw("Rank argument must be defined.") if(!defined($rank));
  throw("Rank argument must be a non-negative integer.") if($rank !~ /^\d+$/);

  if($rank == 0) {
    return $self->fetch_top_level();
  }
  return $self->{'_rank_cache'}->{$rank};
}
sub fetch_sequence_level {
  my $self = shift;

  # Not yet schema_build aware, so this method currently always throws.
  throw("Not yet implemented with schema_build");

  my @dbIDs = keys %{$self->{'_is_sequence_level'}};
  throw('No sequence_level coord_system is defined') if(!@dbIDs);

  if(@dbIDs > 1) {
    throw('Multiple sequence_level coord_systems are defined. ' .
          'Only one is currently supported');
  }
  return $self->{'_dbID_cache'}->{$dbIDs[0]};
}
sub fetch_top_level {
  my $self = shift;

  # Not yet schema_build aware, so this method currently always throws.
  throw("Not yet implemented with schema_build");

  return $self->{'_top_level'};
}
sub get_mapping_path {
my $self = shift;
my $cs1 = shift;
my $cs2 = shift;
if(!ref($cs1) || !ref($cs2) ||
!$cs1->isa('Bio::EnsEMBL::CoordSystem') ||
!$cs2->isa('Bio::EnsEMBL::CoordSystem')) {
throw('Two Bio::EnsEMBL::CoordSystem arguments expected.');
}
my $key1 = $cs1->name() . ":" . $cs1->version();
my $key2 = $cs2->name() . ":" . $cs2->version();
my $path = $self->{'_mapping_paths'}->{"$key1|$key2"};
return $path if($path);
$path = $self->{'_mapping_paths'}->{"$key2|$key1"};
if(!$path) {
my %mid1;
my %mid2;
foreach my $path (values(%{$self->{'_mapping_paths'}})) {
next if(@$path != 2);
my $match = undef;
if($path->[0]->equals($cs1)) {
$match = 1;
} elsif($path->[1]->equals($cs1)) {
$match = 0;
}
if(defined($match)) {
my $mid = $path->[$match];
my $midkey = $mid->name() . ':' . $mid->version();
if($mid2{$midkey}) {
my $path = [$cs1,$mid,$cs2];
$self->{'_mapping_paths'}->{"$key1|$key2"} = $path;
$key1 =~ s/\:$//;
$key2 =~ s/\:$//;
$midkey =~ s/\:$//;
warning("Using implicit mapping path between '$key1' and '$key2' " .
"coord systems.\n" .
"An explicit 'assembly.mapping' entry should be added " .
"to the meta table.\nExample: " .
"'$key1|$midkey|$key2'\n");
return $path;
} else {
$mid1{$midkey} = $mid;
}
}
$match = undef;
if($path->[0]->equals($cs2)) {
$match = 1;
} elsif($path->[1]->equals($cs2)) {
$match = 0;
}
if(defined($match)) {
my $mid = $path->[$match];
my $midkey = $mid->name() . ':' . $mid->version();
if($mid1{$midkey}) {
my $path = [$cs2,$mid,$cs1];
$self->{'_mapping_paths'}->{"$key2|$key1"} = $path;
$key1 =~ s/\:$//;
$key2 =~ s/\:$//;
$midkey =~ s/\:$//;
warning("Using implicit mapping path between '$key1' and '$key2' " .
"coord systems.\n" .
"An explicit 'assembly.mapping' entry should be added " .
"to the meta table.\nExample: " .
"'$key1|$midkey|$key2'\n");
return $path;
} else {
$mid2{$midkey} = $mid;
}
}
}
}
return $path || [];
}
sub new {
my $caller = shift;
my $class = ref($caller) || $caller;
my $self = $class->SUPER::new(@_);
$self->{'_name_cache'} = {};
$self->{'_dbID_cache'} = {};
$self->{'_is_sequence_level'} = {};
$self->{'_is_default_version'} = {};
my $sql = 'SELECT coord_system_id, name, rank, version, attrib, schema_build, core_coord_system_id FROM coord_system';
my @args;
if($self->is_multispecies()) {
$sql.=' where species_id =?';
push(@args, $self->species_id());
}
$sql.=' order by coord_system_id';
my $sth = $self->prepare($sql);
$sth->execute(@args);
my ($dbID, $name, $rank, $version, $attrib, $sbuild, $ccs_id, $cs);
$sth->bind_columns(\$dbID, \$name, \$rank, \$version, \$attrib, \$sbuild, \$ccs_id);
while($sth->fetch()) {
my $seq_lvl = 0;
my $default = 0;
if($attrib) {
foreach my $attrib (split(',', $attrib)) {
$self->{"_is_$attrib"}->{$dbID} = 1;
if($attrib eq 'sequence_level') {
$seq_lvl = 1;
} elsif($attrib eq 'default_version') {
$default = 1;
}
}
}
if(! $cs || ($dbID != $cs->dbID())){
if($cs){
$self->{'_dbID_cache'}->{$cs->dbID()} = $cs;
$self->{'_name_cache'}->{lc($cs->name())} ||= [];
push @{$self->{'_name_cache'}->{lc($cs->name())}}, $cs;
}
$cs = Bio::EnsEMBL::Funcgen::CoordSystem->new
(-DBID => $dbID,
-ADAPTOR => $self,
-NAME => $name,
-VERSION => $version,
);
}
$cs->add_core_coord_system_info(
-RANK => $rank,
-SEQUENCE_LEVEL => $seq_lvl,
-DEFAULT => $default,
-SCHEMA_BUILD => $sbuild,
-CORE_COORD_SYSTEM_ID => $ccs_id,
-IS_STORED => 1,
);
}
if($cs){
$self->{'_dbID_cache'}->{$cs->dbID()} = $cs;
push @{$self->{'_name_cache'}->{lc($cs->name())}}, $cs;
}
$sth->finish();
return $self;
}
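The attribute handling in new() can be shown in isolation. The following is a minimal sketch with made-up rows, not the adaptor itself: @rows and %flags are hypothetical, and the attrib values mirror the comma-separated column that new() splits into per-attribute dbID lookups.

```perl
use strict;
use warnings;

# Hypothetical rows as (coord_system_id, name, attrib) tuples.
my @rows = (
    [1, 'chromosome', 'default_version'],
    [2, 'contig',     'default_version,sequence_level'],
    [3, 'clone',      undef],
);

# "_is_<attrib>" => { dbID => 1 }, as cached on the adaptor.
my %flags;
for my $row (@rows) {
    my ($dbID, undef, $attrib) = @$row;
    next if !$attrib;    # rows without attributes set no flags
    $flags{"_is_$_"}{$dbID} = 1 for split /,/, $attrib;
}

my @seq_level_ids = sort { $a <=> $b } keys %{ $flags{'_is_sequence_level'}  || {} };
my @default_ids   = sort { $a <=> $b } keys %{ $flags{'_is_default_version'} || {} };
print "sequence_level: @seq_level_ids\n";
print "default_version: @default_ids\n";
```

Keying each flag by dbID is what lets _fetch_by_attrib and _fetch_all_by_attrib find matching coordinate systems with a plain hash lookup.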
sub store {
my $self = shift;
my $cs = shift;
if(!$cs || !ref($cs) || !$cs->isa('Bio::EnsEMBL::Funcgen::CoordSystem')) {
throw('CoordSystem argument expected.');
}
my $sth;
my $db = $self->db();
my $name = $cs->name();
my $version = $cs->version();
if($name eq 'toplevel' || $name eq 'seqlevel' || !$name) {
throw("[$name] is not a valid name for a storable CoordSystem.");
}
foreach my $sbuild(keys %{$cs->{'core_cache'}}){
my $rank = $cs->{'core_cache'}->{$sbuild}->{'RANK'};
my $seqlevel = $cs->{'core_cache'}->{$sbuild}->{'SEQUENCE_LEVEL'};
my $default = $cs->{'core_cache'}->{$sbuild}->{'DEFAULT'};
my $ccs_id = $cs->{'core_cache'}->{$sbuild}->{'CORE_COORD_SYSTEM_ID'};
if($cs->{'core_cache'}->{$sbuild}->{'IS_STORED'}) {
next;
}
if($rank !~ /^\d+$/) {
throw("Rank attribute must be a positive integer not [$rank]");
}
if($rank == 0) {
throw("Only toplevel CoordSystem may have rank of 0.");
}
my @attrib;
push @attrib, 'default_version' if($default);
push @attrib, 'sequence_level' if($seqlevel);
my $attrib_str = (@attrib) ? join(',', @attrib) : undef;
if(! $cs->dbID()){
$sth = $self->prepare('insert into coord_system (name, version, attrib, rank, schema_build, core_coord_system_id, species_id) values (?,?,?,?,?,?,?)');
$sth->bind_param(1, $name, SQL_VARCHAR);
$sth->bind_param(2, $version, SQL_VARCHAR);
$sth->bind_param(3, $attrib_str, SQL_VARCHAR);
$sth->bind_param(4, $rank, SQL_INTEGER);
$sth->bind_param(5, $sbuild, SQL_VARCHAR);
$sth->bind_param(6, $ccs_id, SQL_INTEGER);
$sth->bind_param(7, $self->species_id(), SQL_INTEGER);
$sth->execute();
my $dbID = $sth->{'mysql_insertid'};
$sth->finish();
if(!$dbID) {
throw("Did not get dbID from store of CoordSystem.");
}
$cs->dbID($dbID);
$cs->adaptor($self);
}else{
my $sql = 'insert into coord_system (coord_system_id, name, version, attrib, rank, schema_build, core_coord_system_id, species_id) values (?,?,?,?,?,?,?,?)';
$sth = $db->dbc->prepare($sql);
$sth->bind_param(1, $cs->dbID(), SQL_INTEGER);
$sth->bind_param(2, $name, SQL_VARCHAR);
$sth->bind_param(3, $version, SQL_VARCHAR);
$sth->bind_param(4, $attrib_str, SQL_VARCHAR);
$sth->bind_param(5, $rank, SQL_INTEGER);
$sth->bind_param(6, $sbuild, SQL_VARCHAR);
$sth->bind_param(7, $ccs_id, SQL_INTEGER);
$sth->bind_param(8, $self->species_id(), SQL_INTEGER);
$sth->execute();
$sth->finish();
}
$cs->{'core_cache'}{$sbuild}{'IS_STORED'} = 1;
}
$self->{'_name_cache'}->{lc($name)} ||= [];
$self->{'_dbID_cache'}->{$cs->dbID()} = $cs;
my $push = 1;
foreach my $name_cs(@{$self->{'_name_cache'}->{lc($name)}}){
if($name_cs->version() eq $cs->version()){
$push = 0;
$name_cs = $cs;
}
}
push @{$self->{'_name_cache'}->{lc($name)}}, $cs if $push;
return $cs;
}
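The rank checks and attrib string construction in store() can be tried on their own. This is a sketch under the assumption that plain die stands in for throw; build_attrib is a hypothetical helper and not part of the adaptor.

```perl
use strict;
use warnings;

# Validate a rank and build the attrib string in the same order store() uses.
# Returns the attrib string, or undef when no flag is set; dies on a bad rank.
sub build_attrib {
    my ($rank, $default, $seqlevel) = @_;
    die "Rank attribute must be a positive integer not [$rank]\n"
        if $rank !~ /^\d+$/;
    die "Only toplevel CoordSystem may have rank of 0.\n" if $rank == 0;

    my @attrib;
    push @attrib, 'default_version' if $default;
    push @attrib, 'sequence_level'  if $seqlevel;
    return @attrib ? join(',', @attrib) : undef;
}

print build_attrib(1, 1, 1), "\n";
```

Returning undef rather than an empty string matches store(), which binds an undefined attrib value when neither flag is set.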
sub validate_and_store_coord_system {
my ($self, $cs) = @_;
if(! (ref($cs) && $cs->isa('Bio::EnsEMBL::CoordSystem') && $cs->dbID())){
throw('Must provide a valid stored Bio::EnsEMBL::CoordSystem');
}
my $sbuild = $self->db->_get_schema_build($cs->adaptor->db());
my $fg_cs = $self->fetch_by_name($cs->name(), $cs->version());
my $version;
if(! $fg_cs){
if($cs->name ne 'clone' && (! $cs->version)){
my $tmp_cs = $cs->adaptor->fetch_by_name('chromosome');
$version = $tmp_cs->version;
}
$fg_cs = Bio::EnsEMBL::Funcgen::CoordSystem->new(
-NAME => $cs->name(),
-VERSION => $version || $cs->version(),
);
warn "Created new CoordSystem:\t".$fg_cs->name().":".$fg_cs->version()."\n";
}
if(! $fg_cs->contains_schema_build($sbuild)){
$fg_cs->add_core_coord_system_info(
-RANK => $cs->rank(),
-SEQUENCE_LEVEL => $cs->is_sequence_level(),
-DEFAULT => $cs->is_default(),
-SCHEMA_BUILD => $sbuild,
-CORE_COORD_SYSTEM_ID => $cs->dbID(),
-IS_STORED => 0,
);
eval { $fg_cs = $self->store($fg_cs) };
if($@){
warning("$@\nYou do not have permission to store the CoordSystem for schema_build $sbuild\n".
"Using comparable CoordSystem:\t".$fg_cs->name.':'.$fg_cs->version."\n");
}
}
return $fg_cs;
}
1;
General documentation
This module was written by Nathan Johnson, based on the core CoordSystemAdaptor
written by Graham McVicker.