$uni = Bio::DB::Universal->new();
# by default connects to web databases. We can also
# substitute local databases
$embl = Bio::Index::EMBL->new( -filename => '/some/index/filename/locally/stored');
$uni->use_database('embl',$embl);
# treat it like a normal database. Recognises strings
# like gb|XXXXXX and embl:YYYYYY
$seq1 = $uni->get_Seq_by_id("embl:HSHNRNPA");
$seq2 = $uni->get_Seq_by_acc("gb|A000012");
# with no separator, tries to guess database. In this case the
# _ is considered to be indicative of swissprot
$seq3 = $uni->get_Seq_by_id('ROA1_HUMAN');
An artificial database that delegates to specific databases, with a
"smart" (well, smartish) routine for guessing which database an id
belongs to. No doubt the smart routine can be made smarter.
The hope is that you can make this database and just throw ids at it -
for most easy cases it will sort you out. Personally, I would be
making sure I knew where each id came from and putting it into its own
database first - but this is a quick and dirty solution.
By default this connects to web-oriented databases, with all the
reliability and network bandwidth costs this implies. However, you can
substitute your own local databases - they could be Bio::Index
databases (DBM file and flat file) or bioperl-db based (MySQL based)
or biocorba-based (whatever you like behind the corba interface).
Internally the tags for the databases are:
  genbank - NCBI DNA database
  embl    - EBI's DNA database (these two share accession number space)
  swiss   - swissprot + sptrembl (EBI's protein database)
We should extend this for RefSeq and other sequence databases which
are out there... ;)
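As a rough illustration of how a prefixed id is resolved to one of these tags, here is a minimal standalone sketch. The helper names tag_for_prefix and split_prefixed_id are hypothetical and not part of the module; the patterns are copied from guess_id, and the script needs no BioPerl installation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DB::Universal): map a database
# prefix to an internal tag using the same patterns as guess_id.
# Returns undef for an unrecognised prefix.
sub tag_for_prefix {
    my ($db) = @_;
    return 'genbank'
        if $db =~ /gb/i || $db =~ /genbank/i || $db =~ /ncbi/i;
    return 'embl'
        if $db =~ /embl/i || $db =~ /emblbank/ || $db =~ /^em/i;
    return 'swiss'
        if $db =~ /swiss/i || $db =~ /^sw/i || $db =~ /sptr/;
    return undef;
}

# Hypothetical helper: split "prefix<sep>id" on the separators that
# guess_id accepts (: | / ;). Returns an empty list if none is found.
sub split_prefixed_id {
    my ($str) = @_;
    if ( $str =~ /(\S+)[:|\/;](\w+)/ ) {
        my ( $db, $id ) = ( $1, $2 );
        return ( tag_for_prefix($db), $id );
    }
    return;
}

for my $str ( 'embl:HSHNRNPA', 'gb|A000012', 'sw/ROA1_HUMAN' ) {
    my ( $tag, $id ) = split_prefixed_id($str);
    print "$str -> tag=$tag id=$id\n";
}
```

Note that the genbank patterns are tested first and are not anchored, so any prefix containing "gb" anywhere resolves to genbank before the other tags are considered.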
Inspired by Lincoln Stein, written by Ewan Birney.
sub get_Seq_by_acc {
    my ($self,$str) = @_;
    my ($tag,$id) = $self->guess_id($str);
    return $self->{'db_hash'}->{$tag}->get_Seq_by_acc($id);
}
sub get_Seq_by_id {
    my ($self,$str) = @_;
    my ($tag,$id) = $self->guess_id($str);
    return $self->{'db_hash'}->{$tag}->get_Seq_by_id($id);
}
sub guess_id {
    my ($self,$str) = @_;
    if( $str =~ /(\S+)[:|\/;](\w+)/ ) {
        my $tag;
        my $db = $1;
        my $id = $2;
        if( $db =~ /gb/i || $db =~ /genbank/i || $db =~ /ncbi/i ) {
            $tag = 'genbank';
        } elsif ( $db =~ /embl/i || $db =~ /emblbank/ || $db =~ /^em/i ) {
            $tag = 'embl';
        } elsif ( $db =~ /swiss/i || $db =~ /^sw/i || $db =~ /sptr/ ) {
            $tag = 'swiss';
        } else {
            $self->throw("Could not guess database type $db from $str");
        }
        return ($tag,$id);
    } else {
        my $tag;
        if( $str =~ /_/ ) {
            $tag = 'swiss';
        } elsif ( $str =~ /^[QPR]\w+\d$/ ) {
            $tag = 'swiss';
        } elsif ( $str =~ /[A-Z]\d+/ ) {
            $tag = 'genbank';
        } else {
            $tag = 'genbank';
        }
        return ($tag,$str);
    }
}
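The no-separator branch of guess_id is the part most likely to surprise, so here is a small standalone sketch of it. The function name guess_unprefixed_tag is hypothetical, and R00001 is a made-up id used only to show the fall-through behaviour; the patterns are taken from guess_id.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the no-separator branch of guess_id.
sub guess_unprefixed_tag {
    my ($str) = @_;
    return 'swiss' if $str =~ /_/;              # entry names such as ROA1_HUMAN
    return 'swiss' if $str =~ /^[QPR]\w+\d$/;   # starts with Q, P or R, ends in a digit
    return 'genbank';                           # everything else
}

for my $id (qw(ROA1_HUMAN P09651 A000012 HSHNRNPA R00001)) {
    printf "%-12s %s\n", $id, guess_unprefixed_tag($id);
}
```

Any unprefixed id that starts with Q, P or R and ends in a digit is routed to swiss, even if it was meant as a nucleotide accession. When the origin of an id is known, passing an explicit prefix such as "gb|" avoids the guess.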
sub new {
    my ($class) = @_;
    my $self = {};
    bless $self,$class;
    $self->{'db_hash'} = {};
    # default (web-based) databases
    $self->use_database('embl',Bio::DB::EMBL->new);
    $self->use_database('genbank',Bio::DB::GenBank->new);
    # protein ids go to SwissProt (requires Bio::DB::SwissProt to be loaded)
    $self->use_database('swiss',Bio::DB::SwissProt->new);
    return $self;
}
sub use_database {
    my ($self,$name,$database) = @_;
    $self->{'db_hash'}->{$name} = $database;
}
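Because use_database simply stores the object under its tag, anything that answers get_Seq_by_id and get_Seq_by_acc can stand in for a real database. The sketch below shows that delegation pattern with stub objects. My::StubDB and My::Registry are hypothetical packages written for this illustration, not BioPerl classes, and the registry takes the tag explicitly rather than guessing it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real database object.
package My::StubDB;

sub new {
    my ( $class, %seqs ) = @_;
    my $self = { %seqs };
    return bless $self, $class;
}
sub get_Seq_by_id  { my ( $self, $id )  = @_; return $self->{$id} }
sub get_Seq_by_acc { my ( $self, $acc ) = @_; return $self->{$acc} }

# Hypothetical minimal registry with the same storage shape as
# use_database above: a hash of tag => database object.
package My::Registry;

sub new {
    my ($class) = @_;
    return bless { db_hash => {} }, $class;
}

sub use_database {
    my ( $self, $name, $database ) = @_;
    $self->{'db_hash'}->{$name} = $database;
    return $database;
}

sub get_Seq_by_id {
    my ( $self, $tag, $id ) = @_;
    # Fail with a clear message instead of calling a method on undef.
    my $db = $self->{'db_hash'}->{$tag}
        or die "No database registered for tag '$tag'\n";
    return $db->get_Seq_by_id($id);
}

package main;

my $reg = My::Registry->new;
$reg->use_database( 'embl', My::StubDB->new( HSHNRNPA => 'first stub' ) );
# Registering the same tag again replaces the earlier object.
$reg->use_database( 'embl', My::StubDB->new( HSHNRNPA => 'second stub' ) );
print $reg->get_Seq_by_id( 'embl', 'HSHNRNPA' ), "\n";
```

The check for an unregistered tag is an addition in this sketch; the method bodies shown above call straight through to whatever is stored under the tag.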
User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.
bioperl-l@bio.perl.org
Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via email
or the web:
bioperl-bugs@bio.perl.org
http://bugzilla.bioperl.org/
The rest of the documentation details each of the object methods. Internal methods are usually preceded by an underscore (_).