Summary
BioMart::DatasetI
Package variables
Typeglobs (from "local" definitions)
$Storable::Deparse = 1
Privates (from "my" definitions)
$logger = Log::Log4perl->get_logger(__PACKAGE__)
Included modules
Cwd
DBI
Data::Dumper
Digest::MD5
Log::Log4perl
Storable qw ( store retrieve freeze nfreeze thaw )
XML::Simple qw ( :strict )
Inherit
Synopsis
BioMart::DatasetI objects provide a flexible framework to
allow almost any data source to be made into a BioMart dataset.
This is done by mapping the various attributes and filters into
a ConfigurationTree object, and creating any exportables and
importables that Link a given dataset to other datasets.
Description
The BioMart::DatasetI interface allows any data source
with attributes and filters to be made into a BioMart
dataset, regardless of its underlying access mode (RDBMS,
file system, etc). Each implementation must provide access
to a BioMart::Configuration::ConfigurationTree object
describing its attributes and filters (see perlpod of
BioMart::Configuration::ConfigurationTree object for more
details).
Implementations can also export attributes to other datasets
to be used as filters in those systems, using the Links API.
This provides a way of functionally linking two datasets
together using a common name. Exporting datasets should link
this name to a BioMart::Configuration::AttributeList object
containing one or more BioMart::Configuration::Attribute objects
representing the 'Exportable' for this dataset.
The AttributeList is added to the query targeted for the
exporting subsystem by the QueryRunner to satisfy the Link
requirements. Importing datasets should link this name to
a BioMart::Configuration::FilterList object representing the
'Importable' for the importing dataset.
The QueryRunner will set the FilterList Table to the
ResultTable from the exporting dataset, and add it to the
query targeted to the importing dataset. This allows two
datasets to implicitly define ways to chain queries between
them.
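The export/import chain described above can be sketched in miniature. This is a self-contained illustration, not BioMart code: the dataset names, rows and the one-column "exportable" are all hypothetical, standing in for the AttributeList/FilterList plumbing the QueryRunner performs.

```perl
use strict;
use warnings;

# Rows "exported" by the first dataset; the first column plays the role
# of the exportable attribute named by the link.
my @exported_rows = (
    [ 'gene1', 'chr1' ],
    [ 'gene2', 'chr2' ],
);

# The importing dataset treats the exported column as a set of filter
# values, much as the QueryRunner does when it hands the exporting
# dataset's ResultTable to the importable FilterList.
my %filter_values = map { $_->[0] => 1 } @exported_rows;

my @second_dataset = (
    [ 'gene1', 'proteinA' ],
    [ 'gene3', 'proteinC' ],
);
my @joined = grep { $filter_values{ $_->[0] } } @second_dataset;
printf "%d row(s) pass the imported filter\n", scalar @joined;
```

Only rows whose key appears in the exported set survive, which is how two datasets implicitly chain queries between them.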
Methods
Methods description
Usage      : my $dataset = BioMart::Dataset::ImpName->new(
                 'name'              => $name,
                 'display_name'      => $display_name,
                 'configurator'      => $configurator,
                 'initial_batchsize' => 100,
                 'max_batchsize'     => 100000,
                 'visible'           => $visible_boolean,
                 'version'           => $version,
                 'database'          => $instanceName,
                 'schema'            => $schema,
                 'virtualSchema'     => $virtualSchema,
                 'serverType'        => $serverType,
                 'interfaces'        => $interfaces,
                 'modified'          => $modified );
Description: Creates a new instance of a BioMart::Dataset
implementation object named by ImpName. Requires
a name used as the key to this dataset in
the BioMart::Registry, and a display_name used
in any User Interfaces. Also requires a
reference to a BioMart::Configurator object.
Initial_batchsize must be set to determine the
number of rows to return in all initial batches
unless explicitly overridden with a batch_size
on getResultTable. Max_batchsize must
be set to limit the number of rows returned in any
batch of a batched query. This should be large
enough so that the system can initiate a query
with a small batch_size, allowing quick response,
while ResultTable steps its subsequent request
batch_size up after each call to get a batch.
The visible flag instructs Graphical Interfaces
to present (or not) the dataset as a choice to
users. Non visible datasets are loaded in the
background, transparent to the user. The version
is used to present visible datasets to the
user (not required for non-visible datasets).
Dataset creation and management is done by the
BioMart::Configurator object. Users should not
need to create DatasetI implementing objects
manually.
Returntype : new BioMart::Dataset implementing object.
Exceptions : Missing or invalid Required Parameters.
Implementation specific Exceptions.
Caller : BioMart::Configurator |
Description : Private interface method to reset any inter-batch state variables to their initial settings. Also sets the exhausted flag to false. Called at the beginning of every new Query received (e.g., when no BioMart::ResultTable object has been passed into the call to getResultTable). This method is only required for those DatasetI implementations which maintain state over the extent of a batched query. Implementations needing to reset their state should implement a '__processNewQuery()' method. If it is not implemented, no unimplemented-method exception is thrown. |
Usage : set the displayName: $ds->displayName($name);
get the displayName:
$ds->displayName;
Description: sets or gets the displayName of the given dataset.
Returntype : scalar $dispname
Exceptions : none
Caller : caller |
Usage      : if ($dset->exportableFrom) { ... }
Description: Determine if this Dataset can export to other Datasets
Returntype : boolean
Exceptions : na
Caller     : BioMart::QueryRunner |
Usage      : set the forceHash setting: $ds->forceHash($name);
             get the forceHash setting: $ds->forceHash;
Description: sets or gets the forceHash of the given dataset.
             This is used when linking placeholder attributes
             from a first visible dataset into a second visible
             dataset. As a natural link does not exist, the
             hashing and merging of attributes has to be forced.
Returntype : scalar $forceHash
Exceptions : none
Caller     : caller |
Usage      : $confTrees = $subsys->getAllConfigurationTrees(
                 'interface' => $interface, 'martUser' => $martUser);
             (both parameters optional, defaulting to 'default')
Description: Returns a reference to an array of
             BioMart::Configuration::ConfigurationTree
             objects with all attributes and filters that are
             supported by the given dataset for the given
             interface type and mart user.
Returntype : array_ref of BioMart::Configuration::ConfigurationTree objects
Exceptions : none
Caller : caller |
Usage : $confTree = $subsys->getConfigurationTree($interface, $dsCounter);
Description: Returns a
BioMart::Configuration::ConfigurationTree
object with all attributes and filters that are
supported by the given dataset for the given interface type.
Returntype : BioMart::Configuration::ConfigurationTree
Exceptions : none
Caller : caller |
Usage : my $count = $subsys->getCount( 'query' => $query, );
Description: Executes a BioMart::Query and returns
a count.
Each Implementation must implement a
_getCount method. It must take
the same parameters as DatasetI->getCount
itself.
Currently, queries involving importables from
other visible datasets which are batched are
not countable. These will return -1 as the count.
Returntype : String $count
Exceptions : Missing or invalid Query object. Unsupported
Attribute/Filter request, invalid Link requests,
Implementation specific Exceptions.
Caller : caller |
Usage      : $self->getDirPath();
             $self->setDirPath('/abc/def/'); # to set
Description: gets the path to the folder that contains the registry
             file, where the confTrees, _portables and XML
             directories live
Returntype : A string
Exceptions : none
Caller     : caller |
Usage      : my $exportables = $subsys->getExportables;
             my $exportable  = $subsys->getExportables($linkName, $interface);
Description: Returns an array_ref of BioMart::Configuration::AttributeList
             objects representing the exportables for the given
             dataset across all Links, or the single exportable
             for a given $linkName and $interface.
Returntype : array_ref of BioMart::Configuration::AttributeList
Exceptions : none
Caller : caller |
Usage      : my $importables = $subsys->getImportables;
             my $importable  = $subsys->getImportables($linkName, $interface);
Description: Returns an array_ref of BioMart::Configuration::FilterList
             objects representing the importables of the given
             dataset across all Links, or the single importable
             for a given $linkName and $interface.
Returntype : array_ref of BioMart::Configuration::FilterList objects
Exceptions : none
Caller : caller |
Usage      : $DS->getMode();
             $DS->setMode('LAZYLOAD'); # to set
Description: gets the mode; defaults to 'MEMORY'
Returntype : A string
Exceptions : none
Caller : caller |
Usage      : fully specified, retrieving a batch of 1000 records starting with record 100 (i.e. records 100 - 1099) into an existing ResultTable:
my $rTable = $subsys->getResultTable(
'query' => $query,
'batch_start' => 100,
'batch_size' => 1000,
'table' => $rtable
);
minimal, returns all results for a given query as
a BioMart::ResultTable object:
my $rTable = $subsys->getResultTable(
'query' => $query
);
get all rows, starting from record 150
as a BioMart::ResultTable:
my $rTable = $subsys->getResultTable(
'query' => $query,
'batch_start' => 150
);
get only the first 1000 rows (same as
batch_start = 0, batch_size = 1000)
into an existing BioMart::AttributeTable:
my $rTable = $subsys->getResultTable(
'query' => $query,
'batch_size' => 1000,
'table' => $atable
);
Description: Executes a BioMart::Query and returns
a BioMart::ResultTable object.
If a reference to an existing
BioMart::ResultTable (or BioMart::AttributeTable)
is passed in the 'table' parameter,
it is modified and returned, otherwise
a new BioMart::ResultTable object is
created and returned.
Each Implementation must implement a
_getResultTable method. It must take
the same parameters as DatasetI->getResultTable
itself, although it will always receive a 'table'
parameter, so it will never need to create a new
ResultTable object.
Returntype : BioMart::ResultTable, or undef if Dataset is exhausted for a
batched query
Exceptions : Missing or invalid Query object. Unsupported
Attribute/Filter request, invalid Link requests,
Implementation specific Exceptions.
Caller : caller |
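The batching contract above (keep passing the table back in until the dataset is exhausted and undef is returned) can be sketched with a stand-in dataset. MockDataset is a hypothetical mock: the real method returns BioMart::ResultTable objects, not plain array refs.

```perl
use strict;
use warnings;

# Minimal stand-in for a DatasetI implementation: returns two batches,
# then undef once exhausted, mirroring getResultTable's contract.
package MockDataset;
sub new { bless { batches => [ [ [1], [2] ], [ [3] ] ] }, shift }
sub getResultTable {
    my ( $self, %param ) = @_;
    return shift @{ $self->{batches} };    # undef when no batches remain
}

package main;
my $dataset = MockDataset->new;
my $total   = 0;

# The caller's loop: pass the table back on every call after the first.
my $rtable = $dataset->getResultTable( 'query' => 'dummy_query' );
while ( defined $rtable ) {
    $total += scalar @$rtable;
    $rtable = $dataset->getResultTable(
        'query' => 'dummy_query',
        'table' => $rtable,
    );
}
print "fetched $total rows\n";
```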
Usage      : if ($dset->importableTo) { ... }
Description: Determine if this Dataset can import from other Datasets
Returntype : boolean
Exceptions : na
Caller     : BioMart::QueryRunner |
Usage      : my $init_batchsize = $dset->initialBatchSize;
Description: Sets/gets the initialBatchSize on the Dataset
Returntype : initialBatchSize or null
Exceptions : na
Caller     : BioMart::QueryRunner |
Usage      : set the interfaces: $ds->interfaces($name);
             get the interfaces: $ds->interfaces;
Description: sets or gets the interfaces of the given dataset.
Returntype : scalar $interfaces
Exceptions : none
Caller     : caller |
Usage      : if ($dset->locationDisplayName) { ... }
Description: Sets/gets BioMart::Location's display name for a dataset
Returntype : scalar $locationDisplayName
Exceptions : na
Caller     : BioMart::QueryRunner |
Usage      : if ($dset->locationName) { ... }
Description: Sets/gets BioMart::Location's name for a dataset
Returntype : scalar $locationName
Exceptions : na
Caller     : BioMart::QueryRunner |
Usage      : my $max_batchsize = $dset->maxBatchSize;
Description: Sets/gets the maxBatchSize on the Dataset
Returntype : maxBatchSize or null
Exceptions : na
Caller     : BioMart::QueryRunner |
Usage : set the modified: $ds->modified($name);
get the modified:
$ds->modified;
Description: sets or gets the modified date time of the given dataset.
Returntype : scalar $modified TIME DATE STAMP
Exceptions : none
Caller : caller |
Usage : set the name: $subsys->name($name);
get the name:
$subsys->name;
Description: sets or gets the name of the given dataset.
Returntype : scalar $name
Exceptions : none
Caller : caller |
Usage      : set the database schema of a Dataset: $ds->schema($name);
             get the database schema of a Dataset: $ds->schema();
Description: sets or gets the database schema of the given dataset.
Returntype : scalar $schema
Exceptions : none
Caller     : caller |
Usage      : set the serverType: $ds->serverType($name);
             get the serverType: $ds->serverType;
Description: sets or gets the serverType of the given dataset.
Returntype : scalar $server_type
Exceptions : none
Caller     : caller |
Usage      : $subsys->setConfigurationTree($interface, $flag);
Description: Stores to disk the
BioMart::Configuration::ConfigurationTree
object with all attributes and filters that are
supported by the given dataset for the given interface type.
Returntype : none
Exceptions : none
Caller : caller |
Usage      : $self->setDirPath('/abc/def/');
Description: sets the path to the folder that contains the registry
             file, where the confTrees, _portables and XML
             directories live
Returntype : none
Exceptions : none
Caller     : caller |
Usage : $DS->setImportables($interface, 'LAZYLOAD')
Description: Stores the importables associated with the dataset
             to disk, and sets their in-memory value to 'LAZYLOAD'
Returntype : none
Exceptions : none
Caller : caller |
Usage      : $DS->setMode('LAZYLOAD');
Description: sets the mode (e.g. 'MEMORY' or 'LAZYLOAD')
Returntype : none
Exceptions : none
Caller : caller |
Usage      : my $string = $self->toString($row);
Description: Helper routine for array comparison; converts the
             array to a string first, since comparing arrays as
             "@A" eq "@B" produces a flood of warnings.
Returntype : String
Exceptions : none
Caller : caller |
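The same trick can be shown standalone. This sketch mirrors toString's behaviour (it skips any false field, so undef, '' and 0 are all dropped); the row values are made up for illustration.

```perl
use strict;
use warnings;

# Concatenate the truthy fields of a row so two rows can be compared as
# strings, without the "uninitialized value" warnings "@A" eq "@B" emits
# when rows contain undef.
sub row_to_string {
    my ($row) = @_;
    my $string = '';
    foreach my $field ( @{$row} ) {
        $string .= $field if $field;    # skips undef, '' and 0, like toString
    }
    return $string;
}

my @a = ( 'ENSG001', undef, 'chr1' );
my @b = ( 'ENSG001', '',    'chr1' );
print "rows match\n" if row_to_string( \@a ) eq row_to_string( \@b );
```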
Usage      : if ($dset->version) { ... }
Description: Sets/gets version for a dataset
Returntype : version
Exceptions : na
Caller     : BioMart::QueryRunner |
Usage : set the virtualSchema: $ds->virtualSchema($name);
get the virtualSchema:
$ds->virtualSchema;
Description: sets or gets the virtualSchema of the given dataset.
Returntype : scalar $virtualSchema or undef
Exceptions : none
Caller : caller |
Usage      : if ($dset->visible) { ... }
Description: Sets/gets visible for a dataset
Returntype : boolean, true if visible, false otherwise
Exceptions : na
Caller     : BioMart::QueryRunner |
Methods code
sub GenomicMAlignHack
{ my ($self, $val) = @_;
if ($val) {
$self->set('GenomicMAlignHack', $val);
}
return $self->get('GenomicMAlignHack');
} |
sub _attributeMerge
{ my ($self,$rtable,$importable_size,$linkName, $query) = @_;
$logger->debug("Importable size: $importable_size");
$logger->debug("Link name: $linkName");
my $sequenceType = 'none';
my %prev_dset_hash = %{$self->get('attributeHash')->{$linkName}};
my %this_dset_hash;
my $rows = $rtable->getRows();
HASHROW:foreach my $row(@{$rows}){
my $key_string = '';
my $pKey = '';
for (my $i = 0; $i < $importable_size; $i++)
{
next if (!$$row[$i]);
$logger->debug("Appending ".$$row[$i]);
$key_string .= $$row[$i];
$pKey = $$row[$i] if (!$pKey);
}
$logger->debug("Final key string is: ".$key_string);
next if ($key_string eq "" );
my $hashed_rows;
$hashed_rows = $this_dset_hash{$pKey}{$key_string} if (!$self->GenomicMAlignHack());
$hashed_rows = $this_dset_hash{$key_string}{$key_string} if ($self->GenomicMAlignHack());
my $row_to_add = [@{$row}[$importable_size..@{$row}-1]];
push @$hashed_rows, $row_to_add;
$this_dset_hash{$pKey}{$key_string} = $hashed_rows if (!$self->GenomicMAlignHack());
$this_dset_hash{$key_string}{$key_string} = $hashed_rows if ($self->GenomicMAlignHack());
}
if ($self->isa("BioMart::Dataset::GenomicSequence")
&& ($self->lastDS() == 1 || $self->lastDS() == 2)
&& ($query->getAllAttributes($self->name)->[0]->name()) )
{
$sequenceType = $query->getAllAttributes($self->name)->[0]->name();
$sequenceType = 'ok';
if ($self->get('rowsFromLastBatch'))
{
my $lastBatchRows = $self->get('rowsFromLastBatch');
my $doWriteBack =0;
foreach my $firstkey( keys %$lastBatchRows) {
if (exists $this_dset_hash{$firstkey}) {
foreach my $secondkey (keys %{$lastBatchRows->{$firstkey}}) {
foreach my $rows (@{$lastBatchRows->{$firstkey}{$secondkey}})
{
push @{$prev_dset_hash{$firstkey}{$secondkey}}, $rows;
}
}
delete $lastBatchRows->{$firstkey};
$doWriteBack = 1;
}
}
$self->set('rowsFromLastBatch', $lastBatchRows) if ($doWriteBack);
}
my $saveForNextBatch = undef;
foreach my $firstkey (keys %prev_dset_hash) {
if (!exists $this_dset_hash{$firstkey}) {
$saveForNextBatch ||= $self->get('rowsFromLastBatch');
foreach my $secondkey (keys %{$prev_dset_hash{$firstkey}}) {
foreach my $rows (@{$prev_dset_hash{$firstkey}{$secondkey}})
{
push @{$saveForNextBatch->{$firstkey}{$secondkey}}, $rows;
}
}
}
}
$self->set('rowsFromLastBatch', $saveForNextBatch);
}
my @new_rows;
foreach my $prkey(keys %this_dset_hash)
{
foreach my $key(keys %{$this_dset_hash{$prkey}})
{
my $this_dset_rows = $this_dset_hash{$prkey}{$key};
my $pKey = $prkey;
$logger->debug("Processing key: ".$key);
$logger->debug("Rows for this key: ".scalar(@$this_dset_rows));
foreach my $this_dset_row(@$this_dset_rows)
{
my $prev_dset_rows = $prev_dset_hash{$prkey}; if(!$prev_dset_rows) { $prev_dset_rows = $prev_dset_hash{lc($prkey)}; }
if(!$prev_dset_rows) { $prev_dset_rows = $prev_dset_hash{uc($prkey)}; }
if ($prev_dset_rows)
{
my @allRows;
if (defined $prev_dset_hash{$pKey})
{
foreach my $key_string (keys %{$prev_dset_hash{$pKey}}) {
push @allRows, $prev_dset_hash{$pKey}{$key_string} ;
}
}
if (!@allRows && exists $prev_dset_hash{lc($pKey)})
{
foreach my $key_string (keys %{$prev_dset_hash{lc($pKey)}}) {
push @allRows, $prev_dset_hash{lc($pKey)}{$key_string} ;
}
}
if (!@allRows && exists $prev_dset_hash{uc($pKey)})
{
foreach my $key_string (keys %{$prev_dset_hash{uc($pKey)}}) {
push @allRows, $prev_dset_hash{uc($pKey)}{$key_string} ;
}
}
if ($sequenceType eq 'ok')
{
my ($finalRow, $avoidRepeats) = ();
foreach my $subrow (@allRows) {
if($subrow) {
foreach my $row (@$subrow) {
for (my $i = 0; $i < scalar(@{$row}); $i++) {
if ( $row->[$i] ) {
if (!$finalRow->[$i]) {
$finalRow->[$i] = $row->[$i];
}
else {
$finalRow->[$i] .= ';'.$row->[$i] if (!$avoidRepeats->{$i}->{$row->[$i]});
}
$avoidRepeats->{$i}->{$row->[$i]} = 1;
}
}
}
}
}
push @new_rows, [@$this_dset_row,@$finalRow] if ($finalRow);
push @new_rows, [@$this_dset_row,""] if (!$finalRow);
}
else
{
my %avoidDuplication = ();
foreach my $subrow (@allRows) {
if($subrow) {
NEXTROW: foreach my $row (@$subrow) {
my $rowAsString = $self->toString($row);
next NEXTROW if (!$rowAsString || exists $avoidDuplication{$rowAsString} ) ;
$avoidDuplication{$rowAsString} = '';
push @new_rows, [@$this_dset_row,@$row];
}
}
}
}
}
else
{
$logger->debug("There were NO previous rows");
}
}
}
}
$logger->debug("Finished with rows: ".scalar(@new_rows));
$rtable->setRows(\@new_rows);
return $rtable; } |
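The core of _attributeMerge and _hashAttributes is the same key-string grouping: concatenate the importable (or exportable) columns into a key, and bucket the remaining columns under it. A self-contained sketch, with made-up rows and an $importable_size of 2:

```perl
use strict;
use warnings;

# Group rows by a key built from the first $importable_size columns,
# keeping only the trailing columns in each bucket, as the merge/hash
# helpers do.
my $importable_size = 2;
my @rows = (
    [ 'gene1', 'chr1', 'exon1' ],
    [ 'gene1', 'chr1', 'exon2' ],
    [ 'gene2', 'chr2', 'exon9' ],
);

my %hash;
for my $row (@rows) {
    # grep { $_ } skips empty/undef key columns, as the originals do
    my $key_string = join '', grep { $_ } @{$row}[ 0 .. $importable_size - 1 ];
    next if $key_string eq '';
    push @{ $hash{$key_string} }, [ @{$row}[ $importable_size .. $#$row ] ];
}
printf "%d rows share key gene1chr1\n", scalar @{ $hash{'gene1chr1'} };
```

Rows with an identical key collapse into one bucket, which is what lets the importing dataset glue its columns onto every matching exported row.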
sub _checkValidParams
{ my $self = shift;
my $conf = $self->getParam(CONFIGURATOR);
unless ($conf->isa("BioMart::Configurator")) {
BioMart::Exception::Configuration->throw("Dataset objects require a BioMart::Configurator object parameter\nReceived object ".$conf."\n");
} } |
sub _getDBH
{ my $self = shift;
my $location=$self->getParam('configurator')->get('location');
$location->openConnection();
return $location->dbh()
|| BioMart::Exception::Database->throw("no dbh database handle available from mart location $location"); } |
sub _hashAttributes
{ my ($self,$tempTable,$exportable_size) = @_;
my %datasetAttributeHash;
my @new_rows;
my @order_of_rows;
my %groupSameKeyRows;
my $rows = $tempTable->getRows();
if ($self->forceHash){
HASHROW1:foreach my $row(@{$rows}){
my $new_row = [@{$row}[@{$row}-$exportable_size..@{$row}-1]];
no warnings 'uninitialized';
if (! exists $groupSameKeyRows{$new_row->[0]})
{
push @order_of_rows, $new_row->[0];
}
push @{$groupSameKeyRows{$new_row->[0]}}, $new_row;
my $key_string = '';
my $pKey = '';
for (my $i = @{$row} - $exportable_size; $i < @{$row}; $i++){
next if (!$$row[$i]);
$key_string .= $$row[$i];
$pKey = $$row[$i] if (!$pKey);
}
next if ($key_string eq "");
my $hashed_rows;
$hashed_rows = $datasetAttributeHash{$pKey}{$key_string} if (!$self->GenomicMAlignHack());
$hashed_rows = $datasetAttributeHash{$key_string}{$key_string} if ($self->GenomicMAlignHack());
my $row_to_add = [@{$row}[0..@{$row}-1-$exportable_size]];
if ($hashed_rows){
foreach my $prev_row (@{$hashed_rows}){
next HASHROW1 if ( (($prev_row && $row_to_add) && ($self->toString($prev_row) eq $self->toString($row_to_add)))
|| (!$prev_row && !$row_to_add) );
}
}
push @$hashed_rows,$row_to_add;
$datasetAttributeHash{$pKey}{$key_string} = $hashed_rows if (!$self->GenomicMAlignHack());
$datasetAttributeHash{$key_string}{$key_string} = $hashed_rows if ($self->GenomicMAlignHack());
}
}
else{
HASHROW:foreach my $row(@{$rows}){
my $new_row = [@{$row}[0..$exportable_size-1]];
no warnings 'uninitialized';
if (! exists $groupSameKeyRows{$new_row->[0]})
{
push @order_of_rows, $new_row->[0];
}
push @{$groupSameKeyRows{$new_row->[0]}}, $new_row;
my $key_string = '';
my $pKey = '';
for (my $i = 0; $i < $exportable_size; $i++){
next if (!$$row[$i]);
$key_string .= $$row[$i];
$pKey = $$row[$i] if (!$pKey);
}
next if ($key_string eq "");
my $hashed_rows;
$hashed_rows = $datasetAttributeHash{$pKey}{$key_string} if (!$self->GenomicMAlignHack());
$hashed_rows = $datasetAttributeHash{$key_string}{$key_string} if ($self->GenomicMAlignHack());
my $row_to_add = [@{$row}[$exportable_size..@{$row}-1]];
if ($hashed_rows){ foreach my $prev_row (@{$hashed_rows}){
next HASHROW if ( (($prev_row && $row_to_add) && ($self->toString($prev_row) eq $self->toString($row_to_add)))
|| (!$prev_row && !$row_to_add) );
}
}
push @$hashed_rows,$row_to_add;
$datasetAttributeHash{$pKey}{$key_string} = $hashed_rows if (!$self->GenomicMAlignHack());
$datasetAttributeHash{$key_string}{$key_string} = $hashed_rows if ($self->GenomicMAlignHack());
}
}
foreach my $rowKey (@order_of_rows)
{
foreach my $row (@{$groupSameKeyRows{$rowKey}})
{
push @new_rows, $row;
}
}
$tempTable->setRows(\@new_rows);
$tempTable->hashedResults(\%datasetAttributeHash) if (%datasetAttributeHash);
if (!%datasetAttributeHash && !$tempTable->hashedResults)
{
$tempTable->hashedResults(\%datasetAttributeHash);
}
return $tempTable; } |
sub _new
{ my ($self, @param) = @_;
$self->SUPER::_new(@param);
$self->addParams(TITLES, @param);
$self->checkRequiredParams(TITLES);
$self->_checkValidParams;
$self->attr('supportedSchemas', undef);
$self->attr('exportables', {}); $self->attr('importables', {}); $self->attr('configurationTrees', {});
$self->attr('exhausted', undef); $self->attr('explicit_batching', undef);
$self->attr('links', []);
$self->attr('cluster', undef);
$self->attr('pathHash', {});
$self->attr('schema',undef);
$self->attr('serverType', undef);
$self->attr('attributeHash', {});
$self->attr('forceHash', undef);
$self->attr('mode', 'MEMORY'); $self->attr('dirPath', undef); $self->attr('LastDS', undef);
$self->attr('GenomicMAlignHack', undef);
} |
sub _processNewQuery
{ my ($self, $query) = @_;
$self->set('exhausted', undef);
$self->set('explicit_batching', undef);
if ($self->can('__processNewQuery')) {
$self->__processNewQuery($query);
} } |
sub _setExhausted
{ my ($self, $exhausted) = @_;
$self->set('exhausted', $exhausted); } |
sub displayName
{ my ($self, $newname) = @_;
if ($newname) {
$self->setParam(DISPLAYNAME, $newname);
}
return $self->getParam(DISPLAYNAME); } |
sub exportableFrom
{ my $self = shift;
my $exportableFrom = ( @{$self->getExportables} > 0 );
return $exportableFrom; } |
sub forceHash
{ my ($self, $forceHash) = @_;
if ($forceHash) {
$self->set('forceHash', $forceHash);
}
return $self->get('forceHash'); } |
sub getAllConfigurationTrees
{ my ($self, @params) = @_;
my $allConfigTrees;
my (%params) = @params;
my $martUser = $params{'martUser'} || 'default';
my $required_interface = $params{'interface'} || 'default';
my $interfacesList = $self->interfaces(); my @interfacesArray = split /,/,$interfacesList;
foreach my $interface(@interfacesArray)
{
if ($required_interface eq $interface) {
my $confTree = $self->getConfigurationTree($interface);
my $martUsersList = $confTree->mart_Users();
my @allusers = split /,/,$martUsersList;
foreach my $user (@allusers)
{
if ($user eq $martUser)
{
push @{$allConfigTrees}, $confTree;
}
}
}
}
return $allConfigTrees;
} |
sub getConfigurationTree
{ my ($self,$interface,$dsCounter)=@_;
my %configurationTrees = %{$self->get('configurationTrees')};
my $cacheFile = $self->getDirPath();
$cacheFile .= $self->virtualSchema()."/";
$cacheFile .= "confTrees/";
my $createLinks = undef;
if(($dsCounter) && ($dsCounter eq 'CREATE_ALL_LINKS'))
{
$dsCounter = undef; $createLinks = 1;
}
if ($configurationTrees{$interface})
{
if ($configurationTrees{$interface} eq 'LAZYLOAD') {
$cacheFile .= $self->locationName().".".$self->name().".".$interface;
if(-e $cacheFile) {
my $configurationTreeFromFile;
eval{$configurationTreeFromFile = retrieve($cacheFile)};
if ($createLinks) {
$configurationTrees{$interface} = $configurationTreeFromFile;
$self->set('configurationTrees',\%configurationTrees);
}
return $configurationTreeFromFile;
}
}
return $configurationTrees{$interface};
}
if ($self->can('_getConfigurationTree'))
{
my $configurationTree = $self->_getConfigurationTree($interface, $dsCounter);
$configurationTrees{$interface} = $configurationTree;
$self->set('configurationTrees',\%configurationTrees);
if($self->getMode() eq 'LAZYLOAD')
{
$self->setConfigurationTree($interface, 'LAZYLOAD');
}
return $configurationTree;
}
$self->unimplemented_method();
} |
sub getConfigurator
{ my $self = shift;
return $self->getParam(CONFIGURATOR); } |
sub getCount
{ my ($self, @param) = @_;
local($^W) = 0; my(%param) = @param;
my $query = $param{'query'};
unless ($query->isa("BioMart::Query")) {
BioMart::Exception::Query->throw("getCount requires a valid BioMart::Query object\nReceived object ".$query."\n");
}
if ($self->can('_getCount')) {
return $self->_getCount(%param);
}
$self->unimplemented_method(); } |
sub getDirPath
{
my ($self) = @_;
return $self->get('dirPath');
} |
sub getExportables
{ my ($self, $linkName, $interface) = @_;
my $exportables = $self->get('exportables');
if ($exportables eq 'LAZYLOAD')
{
$interface ||='default';
my $cacheFile = $self->getDirPath();
$cacheFile .= $self->virtualSchema()."/";
$cacheFile .= "_portables/";
$cacheFile .= $self->locationName().".".$self->name().".".$interface.".exportables";
$exportables = retrieve($cacheFile);
}
if ($linkName && $interface) { return $exportables->{$linkName}->{$interface};
}
my $ref = [];
foreach (values %{$exportables}){
push @{$ref}, values %{$_};
}
return $ref; } |
sub getImportables
{ my ($self, $linkName, $interface) = @_;
my $importables = $self->get('importables');
if ($importables eq 'LAZYLOAD')
{
$interface ||='default';
my $cacheFile = $self->getDirPath();
$cacheFile .= $self->virtualSchema()."/";
$cacheFile .= "_portables/";
$cacheFile .= $self->locationName().".".$self->name().".".$interface.".importables";
$importables = retrieve($cacheFile);
}
if ($linkName && $interface) { return $importables->{$linkName}->{$interface};
}
my $ref = [];
foreach (values %{$importables}){
push @{$ref}, values %{$_};
}
return $ref; } |
sub getMode
{
my ($self) = @_;
return $self->get('mode');
} |
sub getResultTable
{ my ($self, @param) = @_;
local($^W) = 0; my(%param) = @param;
my $query = $param{'query'};
unless ($query->isa("BioMart::Query")) {
BioMart::Exception::Query->throw("getResultTable requires a valid BioMart::Query object\nReceived object ".$query."\n");
}
my $table = $param{'table'};
my $firstbatch;
unless ($table) {
$firstbatch = 1;
$self->_processNewQuery($query); if ($param{'batch_size'}) {
$self->set('explicit_batching',1);
}
else {
$param{'batch_size'} = $self->getParam(INIT_BATCHSIZE);
}
$table = BioMart::ResultTable->new('query' => $query,
'target_dataset' => $self,
'max_batchsize' => $self->getParam(MAX_BATCHSIZE),
'initial_batchsize' => $param{'batch_size'});
if ($param{'web_origin'} && $param{'web_origin'} == 1){
$table->webOrigin(1);
}
$param{'table'} = $table;
}
return undef if ($self->get('exhausted'));
if ($self->can('_getResultTable'))
{
my ($importable_size,$exportable_size,$linkName,$to_hash,$importable);
my $filters = $query->getAllFilters;
foreach my $filter (@$filters)
{
if ($filter->isa("BioMart::Configuration::FilterList")
&& $filter->batching)
{
$importable = $filter;
$importable_size = @{$importable->getAllFilters};
$linkName = $importable->linkName;
my $attribute_table = $importable->getTable;
my $attributeHash = $self->get('attributeHash');
$attributeHash->{$linkName} = $attribute_table->hashedResults;
$self->set('attributeHash',$attributeHash);
if ($self->get('attributeHash')->{$linkName} &&
!$query->getAttributeListByName($linkName))
{
my $alist = BioMart::Configuration::AttributeList->new(
'name' => $linkName,
'dataSetName' => $self->name ,
'interface' => $query->getInterfaceForDataset($self->name));
my $attribute_string = '';
my $comma = '';
my $list_filters = $importable->getAllFilters;
foreach (@$list_filters){
$attribute_string = $attribute_string.$comma.
$_->attribute->name;
$comma = ',';
$alist->addAttribute($_->attribute);
}
$alist->attributeString($attribute_string);
$query->addAttributeListFirst($alist);
}
last;
}
}
my $has_data = $self->_getResultTable(%param);
$logger->debug("Got results") if $has_data;
$logger->debug("Got no results") unless $has_data;
if ($importable){
my $attribute_table = $importable->getTable;
my $attributeHash = $self->get('attributeHash');
$attributeHash->{$linkName} = $attribute_table->hashedResults;
$self->set('attributeHash',$attributeHash);
}
if ($has_data && $has_data > 0 && $linkName &&
$self->get('attributeHash')->{$linkName}){
$logger->debug("Attribute merge using linkName: $linkName");
$logger->debug("Before merge: ".scalar(@{$has_data->get('columns')}));
$table = $self->_attributeMerge($table,$importable_size,$linkName, $query);
$logger->debug("After merge: ".scalar(@{$has_data->get('columns')}));
}
if ($self->forceHash){
$to_hash = 1;
$exportable_size = $self->forceHash;
}
elsif ($query->getAllAttributeLists && $query->getAllAttributes){
foreach my $exportable(@{$query->getAllAttributeLists}){
if ($exportable->linkName){
$to_hash = 1;
$exportable_size = @{$exportable->getAllAttributes};
last;
}
}
}
else {
foreach my $exportable(@{$query->getAllAttributeLists}){
if ($exportable->linkName){
$exportable_size = @{$exportable->getAllAttributes};
last;
}
}
my $first_row = ${$table->getRows()}[0];
my $col_number;
$col_number = @{$first_row} if ($first_row);
if ($col_number && $exportable_size &&
$col_number > $exportable_size){
$to_hash = 1;
}
if (!$col_number && $exportable_size)
{
$to_hash = 1;
}
}
if ($to_hash){
$logger->debug("Attribute hash");
$logger->debug("Before hash: ".scalar(@{$table->get('columns')}));
$table = $self->_hashAttributes($table,$exportable_size);
$logger->debug("After hash: ".scalar(@{$table->get('columns')}));
}
$logger->debug("Returning defined has_data") if $has_data;
return $has_data if ($has_data);
$logger->debug("Returning table") if $firstbatch;
return $table if ($firstbatch);
$logger->debug("Returning undefined has_data");
return $has_data; }
$self->unimplemented_method(); } |
sub importableTo
{ my $self = shift;
my $importableTo = ( @{$self->getImportables} > 0 );
return $importableTo; } |
sub initialBatchSize
{ my ($self, $initialBatchSize) = @_;
if ($initialBatchSize) {
$self->setParam(INIT_BATCHSIZE, $initialBatchSize);
}
return $self->getParam(INIT_BATCHSIZE); } |
sub interfaces
{ my ($self, $interfaces) = @_;
if ($interfaces) {
$self->set(INTERFACES, $interfaces);
}
return $self->getParam(INTERFACES); } |
sub lastDS
{ my ($self, $val) = @_;
if ($val) {
$self->set('LastDS', $val);
}
return $self->get('LastDS'); } |
sub locationDisplayName
{ my ($self, $database) = @_;
if ($database) {
$self->setParam(LOCATIONDISPLAY, $database);
}
return $self->getParam(LOCATIONDISPLAY); } |
sub locationName
{ my ($self, $schema) = @_;
if ($schema) {
$self->setParam(LOCATIONNAME, $schema);
}
return $self->getParam(LOCATIONNAME);
} |
sub maxBatchSize
{ my ($self, $maxBatchSize) = @_;
if ($maxBatchSize) {
$self->setParam(MAX_BATCHSIZE, $maxBatchSize);
}
return $self->getParam(MAX_BATCHSIZE); } |
sub modified
{ my ($self, $modified) = @_;
if ($modified) {
$self->setParam(MODIFIED, $modified);
}
return $self->getParam(MODIFIED); } |
sub name
{ my ($self, $newname) = @_;
if ($newname) {
$self->setParam(NAME, $newname);
}
return $self->getParam(NAME); } |
sub schema
{ my ($self, $schema) = @_;
if ($schema) {
$self->set('schema', $schema);
}
return $self->get('schema'); } |
sub serverType
{ my ($self, $newname) = @_;
if ($newname) {
$self->set(SERVERTYPE, $newname);
}
return $self->get(SERVERTYPE); } |
sub setConfigurationTree
{
my ($self, $interface, $flag) = @_;
my %configurationTrees = %{$self->get('configurationTrees')};
my $cacheFile = $self->getDirPath();
$cacheFile .= $self->virtualSchema()."/";
$cacheFile .= "confTrees/";
$cacheFile .= $self->locationName().".".$self->name().".".$interface;
if(-e $cacheFile) {
unlink $cacheFile;
}
store($configurationTrees{$interface},$cacheFile);
if ($flag eq 'LAZYLOAD')
{
$configurationTrees{$interface} = 'LAZYLOAD';
$self->set('configurationTrees',\%configurationTrees);
} } |
sub setDirPath
{
my ($self, $val) = @_;
if ($val)
{
$self->set('dirPath', $val);
}
} |
sub setExportables
{ my ($self, $interface ,$val) = @_;
my $cacheFile = $self->getDirPath();
$cacheFile .= $self->virtualSchema()."/";
$cacheFile .= "_portables/";
$cacheFile .= $self->locationName().".".$self->name().".".$interface.".exportables";
if(-e $cacheFile) {
unlink $cacheFile;
}
store($self->get('exportables'),$cacheFile);
$self->set('exportables', $val);
} |
sub setImportables
{ my ($self, $interface, $val) = @_;
my $cacheFile = $self->getDirPath();
$cacheFile .= $self->virtualSchema()."/";
$cacheFile .= "_portables/";
$cacheFile .= $self->locationName().".".$self->name().".".$interface.".importables";
if(-e $cacheFile) {
unlink $cacheFile;
}
store($self->get('importables'),$cacheFile);
$self->set('importables', $val);
} |
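The LAZYLOAD scheme used by setImportables/getImportables can be shown in miniature with Storable alone: serialise a structure to a cache file, replace the in-memory value with the string 'LAZYLOAD', and restore it from disk on first access. The cache file name and the link/filter names here are hypothetical.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

my $dir       = tempdir( CLEANUP => 1 );
my $cacheFile = "$dir/default.mydataset.default.importables";

# Persist the structure and swap the in-memory value for the marker,
# as setImportables does with its 'LAZYLOAD' flag.
my $importables = { myLink => { default => [ 'filter_a', 'filter_b' ] } };
store( $importables, $cacheFile );
my $in_memory = 'LAZYLOAD';

# Later, on first access, getImportables-style lazy restore:
$in_memory = retrieve($cacheFile)
    if !ref $in_memory && $in_memory eq 'LAZYLOAD';
print scalar @{ $in_memory->{myLink}{default} }, " filters restored\n";
```

This keeps large configuration structures out of memory until a dataset is actually queried.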
sub setMode
{
my ($self, $val) = @_;
if ($val)
{
$self->set('mode', $val);
}
} |
sub toString
{
my ($self, $curRow) = @_;
my $string;
foreach (@{$curRow})
{
$string .= $_ if ($_);
}
return $string; } |
sub version
{ my ($self, $version) = @_;
if ($version) {
$self->setParam(VERSION, $version);
}
return $self->getParam(VERSION); } |
sub virtualSchema
{ my ($self, $newname) = @_;
if ($newname) {
$self->setParam(VIRTSCHEMA, $newname);
}
return $self->getParam(VIRTSCHEMA); } |
sub visible
{ my ($self, $visible) = @_;
if ($visible) {
$self->setParam(VISIBLE, $visible);
}
return $self->getParam(VISIBLE); } |
General documentation
AUTHOR - Arek Kasprzyk, Syed Haider, Richard Holland, Darin London, Damian Smedley