Bio::LiveSeq::IO
SRS
Toolbar
Summary
Bio::LiveSeq::IO::SRS - Loader for LiveSeq from EMBL entries with SRS
Package variables
Globals (from "use vars" definitions)
$VERSION = 2.4
Included modules
Carp qw ( cluck croak carp )
lib $ENV { SRSEXE }
srsperl
Inherit
Synopsis
my $db="EMBL";
my $acc_id="M20132";
my $query="embl-acc:$acc_id";
my $loader=Bio::LiveSeq::IO::SRS->load(-db=>"EMBL", -query=>"$query");
my @translationobjects=$loader->entry2liveseq();
my $gene="AR";
my $gene=$loader->gene2liveseq("gene");
NOTE: The only -db now supported is EMBL. Hence it defaults to EMBL.
Description
This package uses the SRS (Sequence Retrieval System) to fetch a sequence
database entry, analyse it and create LiveSeq objects out of it.
An embl-acc ID has to be passed to this package which will return references
to all translation objects created from the EMBL entry.
References to Transcription, DNA and Exon objects can all be retrieved departing
from these.
Alternatively, a specific "gene" name can be specified, together with the
embl-acc ID. This will create a LiveSeq::Gene object with all relevant gene
features attached/created.
Methods
Methods description
Title : embl2hash Function: retrieves with SRS an EMBL entry, parses it and creates a hash that contains all the information. Returns : a reference to a hash Errorcode: 0 Args : an SRS query resulting in one fetched EMBL entry i.e. a string in a form like "embl-acc:accession_number" two array references to skip features and qualifiers (for performance) Example: @valid_features=qw(CDS exon prim_transcript mRNA); @valid_qualifiers=qw(gene codon_start db_xref product rpt_family); $hashref=&embl2hash("$query",\@valid_features,\@valid_qualifiers); |
Title : get_swisshash Usage : $loader->get_swisshash(); Example : $swisshash=$loader->swissprot2hash("SWISS-PROT:P10275") Function: retrieves with SRS a SwissProt entry, parses it and creates a hash that contains all the information. Returns : a reference to a hash Errorcode: 0 Args : the db_xref qualifier's value from an EMBL CDS Feature i.e. a string in the form "SWISS-PROT:accession_number" Note : this can be modified (adding a second argument) to retrieve and parse SWTREMBL, SWALL... entries |
Title : load Usage : my $acc_id="M20132"; my $query="embl-acc:$acc_id"; $loader=Bio::LiveSeq::IO::SRS->load(-db=>"EMBL", -query=>"$query");
Function: loads an entry with SRS from a database into a hash
Returns : reference to a new object of class IO::SRS holding an entry
Errorcode 0
Args : an SRS query resulting in one fetched EMBL (by default) entry |
Title : novelaasequence2gene Usage : $gene=Bio::LiveSeq::IO::SRS->novelaasequence2gene(-aasequence => "MGLAAPTRS*"); : $gene=Bio::LiveSeq::IO::SRS->novelaasequence2gene(-aasequence => "MGLAAPTRS*", -genome => "Homo sapiens"); : $gene=Bio::LiveSeq::IO::SRS->novelaasequence2gene(-aasequence => "MGLAAPTRS*", -genome => "Mitochondrion Homo sapiens", -gene_name => "tyr-kinase");
Function: creates LiveSeq objects from a novel amino acid sequence,
using codon usage database to choose codons according to
relative frequencies.
If a genome latin name is not specified, the default is to use
'Homo sapiens' (taxonomy ID 9606).
Returns : reference to a Gene object containing references to LiveSeq objects
Errorcode 0
Args : string containing an amino acid sequence
string (optional) with a species/genome latin name
string specifying a gene name
Note : SRS access to TAXON and CODONUSAGE databases is required |
Title : swissprot2hash Usage : $loader->swissprot2hash(); Example : $swisshash=$loader->swissprot2hash("swissprot-acc:P10275") Function: retrieves with SRS a SwissProt entry, parses it and creates a hash that contains all the information. Returns : a reference to a hash Errorcode: 0 Args : an SRS query resulting in one entry from SwissProt database i.e. a string in the form "swissprot-acc:accession_number" Note : this can be modified (adding a second argument) to retrieve and parse SWTREMBL, SWALL... entries |
Methods code
cdslocation2transcript | description | prev | next | Top |
sub cdslocation2transcript
{ my ($location)=@_;
my @exonlocs;
my $exonloc;
my @exon;
my @transcript=();
$location =~ s/^join\(//;
$location =~ s/\)$//;
@exonlocs = split (/\,/,$location);
for $exonloc (@exonlocs) {
my @exon=location2range($exonloc);
push (@transcript,\@exon);
}
return (@transcript);
}
} |
sub embl2hash
{ my ($i,$j);
my $query=$_[0];
my %valid_features; my %valid_names;
if ($_[1]) {
%valid_features = map {$_, 1} @{$_[1]}; }
if ($_[2]) {
%valid_names = map {$_, 1} @{$_[2]}; }
my $sess = new Session;
my $set = $sess->query("[$query]", "");
my $numEntries=$set->size();
if ($numEntries < 1) {
carp "No sequence entry found for the query $query";
return (0);
} elsif ($numEntries > 1) {
carp "Too many entries found for the input given";
return (0);
} else {
my $entry = $set->getEntry(0);
my ($entry_ID,$entry_AccNumber,$entry_Molecule,$entry_Description,$entry_Organism,$entry_SeqLength,$entry_Sequence,$entry_Division);
$entry_ID = $entry->fieldToString("id","","","");
$entry_AccNumber = $entry->fieldToString("acc","","","");
$entry_Molecule = $entry->fieldToString("mol","","","");
$entry_Division = $entry->fieldToString("div","","","");
$entry_Description = $entry->fieldToString("des","","","");
$entry_Description =~ s/\n/ /g;
$entry_Organism = $entry->fieldToString("org","","","");
$entry_SeqLength = $entry->fieldToString("sl","","","");
my $loadedentry = $entry->load("EMBL");
$entry_Sequence = $loadedentry->attrStr("sequence");
$entry_Sequence =~ s/\n//g;
my %entryhash;
$entryhash{ID}=$entry_ID;
$entryhash{AccNumber}=$entry_AccNumber;
$entryhash{Molecule}=$entry_Molecule;
$entryhash{Division}=$entry_Division;
$entryhash{Description}=$entry_Description;
$entryhash{Organism}=$entry_Organism;
$entryhash{Sequence}=$entry_Sequence;
$entryhash{SeqLength}=$entry_SeqLength;
my $features = $loadedentry->attrObjList("features");
my $featuresnumber= $features->size();
$entryhash{FeaturesNumber}=$featuresnumber;
my ($feature,$feature_name,$feature_location);
my ($feature_qual_names,$feature_qual_values);
my ($feature_qual_name,$feature_qual_value);
my ($feature_qual_number,$feature_qual_number2);
my @features;
for $i (0..$featuresnumber-1) {
my %feature;
$feature = $features->get($i);
$feature_name = $feature->attrStr("FtKey");
unless ($valid_features{$feature_name}) {
next;
}
$feature_location = $feature->attrStr("FtLocation");
$feature_location =~ s/\n//g;
$feature_qual_names= $feature->attrStrList("FtQualNames");
$feature_qual_values= $feature->attrStrList("FtQualValues");
$feature_qual_number= $feature_qual_names->size();
$feature_qual_number2= $feature_qual_values->size(); if ($feature_qual_number > $feature_qual_number2) {
carp ("Warning with Feature at position $i ($feature_name): There are MORE qualifier names than qualifier values.");
} elsif ($feature_qual_number < $feature_qual_number2) {
carp ("Error with Feature at position $i ($feature_name): There are LESS qualifier names than qualifier values. Stopped");
return (0);
}
$feature{position}=$i;
$feature{name}=$feature_name;
my @range;
if ($feature_name eq "CDS") {
@range=cdslocation2transcript($feature_location);
$feature{locationtype}="joined";
} else {
if (index($feature_location,"join") == -1) {
@range=location2range($feature_location);
} else { @range=joinedlocation2range($feature_location);
$feature{locationtype}="joined";
}
}
$feature{range}=\@range;
$feature{location}="deprecated";
my %feature_qualifiers;
for $j (0..$feature_qual_number-1) {
$feature_qual_name=$feature_qual_names->get($j);
$feature_qual_name =~ s/^\///;
unless ($valid_names{$feature_qual_name}) {
next;
}
$feature_qual_value=$feature_qual_values->get($j);
$feature_qual_value =~ s/^"//; $feature_qual_value =~ s/"$//; $feature_qual_value =~ s/\n//g; $feature_qualifiers{$feature_qual_name}=$feature_qual_value;
}
$feature{qualifiers}=\%feature_qualifiers;
push (@features,\%feature); }
$entryhash{Features}=\@features; my @cds; for $i (0..$#features) {
if ($features[$i]->{'name'} eq "CDS") {
push(@cds,$features[$i]);
}
}
$entryhash{CDS}=\@cds; return (\%entryhash);
}
}
} |
sub get_swisshash
{ my ($self,$query)=@_;
if (index($query,"SWISS-PROT") == -1) {
return (0);
}
$query =~ s/SWISS-PROT/swissprot-acc/;
my $hashref;
eval {
$hashref=&swissprot2hash("$query");
};
my $errormsg;
if ($@) {
foreach $errormsg (split(/\n/,$@)) {
if (index($errormsg,"in cleanup") == -1) {
carp "SRS Swissprot Loader: $@";
}
}
}
unless ($hashref) {
return (0);
} else {
return ($hashref);
} } |
sub joinedlocation2range
{ &cdslocation2transcript; } |
sub load
{ my ($thing, %args) = @_;
my $class = ref($thing) || $thing;
my ($obj,%loader);
my ($db,$query)=($args{-db},$args{-query});
if (defined($db)) {
unless ($db eq "EMBL") {
carp "Note: only EMBL now supported!";
return(0);
}
} else {
$db="EMBL";
}
my $hashref;
if ($db eq "EMBL") {
my $test_transl=0;
my @embl_valid_feature_names=qw(CDS exon prim_transcript intron repeat_unit repeat_region mRNA);
my @embl_valid_qual_names=qw(gene codon_start db_xref product note number rpt_family transl_table);
if ($test_transl) {
push (@embl_valid_qual_names,"translation"); }
eval {
$hashref=&embl2hash("$query",\@embl_valid_feature_names,\@embl_valid_qual_names);
};
my $errormsg;
if ($@) {
foreach $errormsg (split(/\n/,$@)) {
if (index($errormsg,"in cleanup") == -1) {
carp "SRS EMBL Loader: $@";
}
}
}
}
unless ($hashref) { return (0); }
%loader = (db => $db, query => $query, hash => $hashref);
$obj =\% loader;
$obj = bless $obj, $class;
return $obj; } |
sub location2range
{ my ($location)=@_;
my ($start,$end,$strand);
if (index($location,"complement") == -1) { $strand=1;
} else { $location =~ s/complement\(//g;
$location =~ s/\)//g;
$strand=-1;
}
$location =~ s/\<//g;
$location =~ s/\>//g;
my @range=split(/\.\./,$location);
if (scalar(@range) == 1) { $start=$end=$range[0];
} else {
if ($strand == 1) {
($start,$end)=@range;
} else { ($end,$start)=@range;
}
}
return ($start,$end,$strand);
}
} |
sub novelaasequence2gene
{ my ($self, %args) = @_;
my ($gene_name,$species_name,$aasequence)=($args{-gene_name},$args{-genome},$args{-aasequence});
unless ($aasequence) {
carp "aasequence not given";
return (0);
}
unless ($gene_name) {
$gene_name="Novel Unknown";
}
unless ($species_name) {
$species_name="Homo sapiens";
}
my $sess = new Session;
my ($e,$numEntries,$set);
my $codonusage_usage;
my @species_codon_usage;
$set = $sess->query("[codonusage:'$species_name']", "");
$numEntries = $set->size();
if ($numEntries > 0) {
$e = $set->getEntry(0);
$species_name = $e->fieldToString("id","","","");
$codonusage_usage = $e->fieldToString("usage","","","");
@species_codon_usage=split(/\s/,$codonusage_usage); if ($numEntries > 1) {
carp "Search in Codon Usage DB resulted in $numEntries results. Taking the first one: $species_name";
}
} else {
carp "Genome not found in codon usage DB.";
return (0);
}
my $mito_flag = 0;
my $species_origin;
if ($species_name =~ /^Mitochondrion /) {
$mito_flag = 1;
$species_origin = $species_name;
$species_origin =~ s/^Mitochondrion //;
$set = $sess->query("[taxonomy-species:'$species_origin']", "");
} elsif ($species_name =~ /^Chloroplast |^Kinetoplast |^Chromoplast /) {
$species_origin = $species_name;
$species_origin =~ s/^Chromoplast //;
$species_origin =~ s/^Kinetoplast //;
$species_origin =~ s/^Chloroplast //;
$set = $sess->query("[taxonomy-species:'$species_origin']", "");
} else {
$set = $sess->query("[taxonomy-species:'$species_name']", "");
}
$numEntries = $set->size();
my ($taxonomy_id,$taxonomy_gc,$taxonomy_mgc,$taxonomy_name);
if ($numEntries > 0) {
$e = $set->getEntry(0);
$taxonomy_id = $e->fieldToString("id","","","");
$taxonomy_name = $e->fieldToString("species","","","");
$taxonomy_gc = $e->fieldToString("gc","","","");
$taxonomy_mgc = $e->fieldToString("mgc","","","");
if ($numEntries > 1) {
carp "Note! More than one entry found in taxonomy DB for the genome query given. Using the first one: $taxonomy_name ($taxonomy_id)";
}
} else {
carp "Genome not found in taxonomy DB.";
return (0);
}
my $ttabid;
if ($mito_flag) {
$ttabid = $taxonomy_mgc;
} else {
$ttabid = $taxonomy_gc;
}
my $gene=Bio::LiveSeq::IO::Loader::_common_novelaasequence2gene(\@species_codon_usage,$ttabid,$aasequence,$gene_name);
return ($gene);
}
1; } |
sub swissprot2hash
{ my ($i,$j);
my $query=$_[0];
my $sess = new Session;
my $set = $sess->query("[$query]", "");
my $numEntries = $set->size();
if ($numEntries < 1) {
carp "No sequence entry found for the query $query";
return (0);
} elsif ($numEntries > 1) {
carp "Too many entries found for the input given";
return (0);
} else {
my $entry = $set->getEntry(0);
my ($entry_ID,$entry_AccNumber,$entry_Molecule,$entry_Description,$entry_Organism,$entry_SeqLength,$entry_Sequence,$entry_Gene);
$entry_ID = $entry->fieldToString("id","","","");
$entry_AccNumber = $entry->fieldToString("acc","","","");
$entry_Gene = $entry->fieldToString("gen","","","");
$entry_Gene =~ s/\n/ /g;
$entry_Description = $entry->fieldToString("des","","","");
$entry_Description =~ s/\n/ /g;
$entry_Organism = $entry->fieldToString("org","","","");
chop $entry_Organism;
$entry_SeqLength = $entry->fieldToString("sl","","","");
my $loadedentry = $entry->load("Swissprot");
$entry_Sequence = $loadedentry->attrStr("Sequence");
$entry_Sequence =~ s/\n//g;
my %entryhash;
$entryhash{ID}=$entry_ID;
$entryhash{AccNumber}=$entry_AccNumber;
$entryhash{Molecule}=$entry_Molecule;
$entryhash{Gene}=$entry_Gene;
$entryhash{Description}=$entry_Description;
$entryhash{Organism}=$entry_Organism;
$entryhash{Sequence}=$entry_Sequence;
$entryhash{SeqLength}=$entry_SeqLength;
my $features = $loadedentry->attrObjList("Features");
my $featuresnumber= $features->size();
$entryhash{FeaturesNumber}=$featuresnumber;
my ($feature,$feature_name,$feature_description,$feature_location);
my @features;
for $i (0..$featuresnumber-1) {
my %feature;
$feature = $features->get($i);
$feature_name = $feature->attrStr("FtKey");
$feature_location = $feature->attrStr("FtLocation");
$feature_location =~ s/ +/ /g;
$feature_description = $feature->attrStr("FtDescription");
chop $feature_description;
$feature_description =~ s/\nFT / /g;
$feature{position}=$i;
$feature{name}=$feature_name;
$feature{location}=$feature_location;
$feature{description}=$feature_description;
push (@features,\%feature); }
$entryhash{Features}=\@features; return (\%entryhash);
} } |
General documentation
AUTHOR - Joseph A.L. Insana | Top |
Email:
Insana@ebi.ac.uk,
jinsana@gmx.netAddress:
EMBL Outstation, European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambs. CB10 1SD, United Kingdom
The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _