Summary
Bio::EnsEMBL::Funcgen::Importer
Package variables
No package variables defined.
Included modules
Synopsis
my $imp = Bio::EnsEMBL::Funcgen::Importer->new(%params);
$imp->register_experiment();
Description
This is the main class coordinating the import of arrays and experimental data.
It uses several underlying definition classes specific to the array vendor, array class and
experimental group.
Methods
Methods description
Example : $self->R_norm(@logic_names); Description: Performs R normalisations for given logic names Returntype : none Exceptions : Throws if R exits with error code or if data is not valid for analysis Caller : general Status : At risk |
Arg [1] : Bio::EnsEMBL::Funcgen::Array Example : $self->add_Array($array); Description: Setter for array elements Returntype : none Exceptions : throws if passed non Array or if more than one Array set Caller : Importer Status : At risk - Implement multiple arrays? Move to Experiment? |
Example : my $array_file = $imp->array_file(); Description: Getter/Setter for sanger/design array file Arg [1] : optional - path to adf or gff array definition/mapping file Returntype : string Exceptions : none Caller : general Status : at risk |
Example : my $array_name = $imp->array_name(); Description: Getter/Setter for array name Arg [1] : optional string - name of array Returntype : string Exceptions : none Caller : general Status : at risk |
Example : $imp->array_set(1); Description: Getter/Setter for array set flag Arg [1] : optional boolean - treat all array chips as the same array Returntype : boolean Exceptions : none Caller : general Status : at risk |
Example : foreach my $array(@{$imp->arrays}){ ...do an array of things ...}; Description: Getter for the arrays attribute Returntype : ARRAYREF Exceptions : none Caller : general Status : at risk |
Arg [0] : mandatory - probe name Arg [1] : mandatory - probe dbID Arg [2] : optional int - x coord of probe on array Arg [3] : optional int - y coord of probe on array Example : $self->cache_probe_info("Probe1", $probe->dbID()); Or for result files which do not have X & Y, we need to cache X & Y from the design files: $self->cache_probe_info('Probe2', $probe->dbID(), $x, $y); Description: Setter for probe cache values Returntype : none Exceptions : throws if cache conflict encountered Caller : self Status : At risk - merge with following? |
Arg [0] : string - region_name e.g. X Arg [1] : optional - coordinate system name e.g. supercontig, defaults to chromosome Example : my $slice = $self->cache_slice(12); Description: Caches or retrieves from cache a given slice Returntype : Bio::EnsEMBL::Slice Exceptions : throws if no region name specified Caller : self Status : At risk |
Example : $imp->cell_type($ctype); Description: Getter/Setter for Experiment CellType Arg [1] : optional - Bio::EnsEMBL::Funcgen::CellType Returntype : Bio::EnsEMBL::Funcgen::CellType Exceptions : Throws if arg is not valid or stored Caller : general Status : at risk |
Example : my $contact = $imp->contact(); Description: Getter/Setter for the group contact Arg [1] : optional - contact name/email/address Returntype : string Exceptions : none Caller : general Status : Stable |
Example : $self->create_output_dirs(); Description: Does what it says on the tin, creates dirs in the root output dir foreach @dirnames, also set paths in self Arg [1] : mandatory - list of dir names Returntype : none Exceptions : none Caller : general Status : Medium - add throw? |
Example : $imp->db($funcgen_db); Description: Getter/Setter for the db element Arg [1] : optional - Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor Returntype : Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor Exceptions : throws if arg is not a DBAdaptor Caller : general Status : Stable |
Example : my $dbname = $imp->dbname(); Description: Deprecated getter for the db name Returntype : string Exceptions : none Caller : general Status : At risk - to be removed, use db->dbc->dbname |
Example : $imp->description("Human chrX H3 Lys 9 methylation"); Description: Getter/Setter for the experiment description Arg [1] : optional - experiment description Returntype : string Exceptions : none Caller : general Status : Stable |
Example : $self->design_type("binding_site_identification") Description: Getter/Setter for experimental design type Arg [1] : optional - design type Returntype : string Exceptions : none Caller : general Status : At risk |
Example : if($self->dump_fasta()){...do fasta dump...} Description: Getter/Setter for the dump_fasta flag Arg [1] : optional - 0 or 1 Returntype : boolean Exceptions : none Caller : self Status : Stable |
Example : my $exp = $imp->experiment(); Description: Getter/Setter for the Experiment element Arg [1] : optional - Bio::EnsEMBL::Funcgen::Experiment Returntype : Bio::EnsEMBL::Funcgen::Experiment Exceptions : throws if arg is not an Experiment Caller : general Status : Stable |
Example : $imp->experiment_date('2006-11-02'); Description: Getter/Setter for the experiment date Arg [1] : optional - date string in yyyy-mm-dd Returntype : string Exceptions : none Caller : general Status : At risk |
Example : my $esset_name = $imp->experimental_set_name(); Description: Getter/Setter for experimental_set_name Arg [1] : optional - ExperimentalSet name Returntype : string Exceptions : none Caller : general Status : Stable |
Arg [1] : Boolean Example : $importer->farm(1); Description: Flag to turn farm submission on Returntype : Boolean Exceptions : Throws if argument is not a boolean Caller : general Status : At risk |
Example : $imp->feature_analysis($fanal); Description: Getter/Setter for Analysis used for creating the imported Features Arg [1] : optional - Bio::EnsEMBL::Analysis Returntype : Bio::EnsEMBL::Analysis Exceptions : Throws if arg is not valid or stored Caller : general Status : at risk |
Example : $imp->feature_set_description("ExperimentalSet description"); Description: Getter/Setter for the FeatureSet description for an ExperimentalSet import e.g. preprocessed GFF/Bed data Arg [1] : optional - string feature set description Returntype : string Exceptions : none Caller : general Status : At risk |
Example : $imp->feature_type($ftype); Description: Getter/Setter for Experiment FeatureType Arg [1] : optional - Bio::EnsEMBL::Funcgen::FeatureType Returntype : Bio::EnsEMBL::Funcgen::FeatureType Exceptions : Throws if arg is not valid or stored Caller : general Status : at risk |
Example : $imp->format("Tiled"); Description: Getter/Setter for the array format Arg [1] : optional - array format Returntype : string Exceptions : none Caller : general Status : Stable |
Example : $seq_region_id = $self->get_chr_seq_region_id('X'); Description: Returns the seq_region_id of the slice fetched for the given chromosome region Arg [1] : mandatory - chromosome name Arg [2] : optional - start value Arg [3] : optional - end value Returntype : int Exceptions : none Caller : self Status : At risk |
Arg [1] : mandatory - name of the data element to retrieve from the config hash Example : %dye_freqs = %{$imp->get_config('dye_freqs')}; Description: returns data from the definitions hash Returntype : various Exceptions : none Caller : Importer Status : at risk - replace with direct calls in the inherited Defs class? |
Example : $imp->get_dir("import"); Description: Retrieves full path for given directory Arg [1] : mandatory - dir name Returntype : string Exceptions : none Caller : general Status : at risk - move to Helper? |
Arg[1] : Bio::EnsEMBL::Funcgen::Array Arg[2] : boolean - from db flag, only to be used by Importer->resolve_probe_data ! Example : $self->get_probe_cache_by_Array(); Description: Gets the probe info cache which is an array tied to a file Returntype : Boolean - True if cache has been generated and set successfully Exceptions : none Caller : general Status : At risk |
Arg [1] : mandatory - probe name Arg [2] : mandatory - Bio::EnsEMBL::Funcgen::Array Example : $pid = $self->get_probe_id_by_name_Array($pname, $array); Description: Getter for probe cache values Returntype : int Exceptions : throws if the probe name is not found in the cache Caller : self Status : At risk - merge with previous, move to importer? |
Example : my $exp_group = $imp->group(); Description: Getter/Setter for the group name Arg [1] : optional - group name Returntype : string Exceptions : none Caller : general Status : Stable |
Example : $self->init_import(); Description: Initialises import by creating working directories and by storing the Experiment Returntype : none Exceptions : warns and throws depending on recover and Experiment status Caller : general Status : at risk - merge with register_array_design |
Example : $self->init_import(); Description: Initialises import by creating working directories and by storing the Experiment Returntype : none Exceptions : warns and throws depending on recover and Experiment status Caller : general Status : at risk - merge with register experiment |
Example : $imp->location("Hinxton"); Description: Getter/Setter for group location Arg [1] : optional - location Returntype : string Exceptions : none Caller : general Status : Stable |
Example : $imp->name('Experiment1'); Description: Getter/Setter for the experiment name Arg [1] : optional - experiment name Returntype : string Exceptions : none Caller : general Status : Stable |
Description : Constructor method
Arg [1] : hash containing optional attributes:
-name Name of Experiment(dir)
-format of array e.g. Tiled(default)
-vendor name of array vendor
-description of the experiment
-pass DB password
-host DB host
-user DB user
-port DB port
-registry_host Host to load registry from
-registry_port Port for registry host
-registry_user User for registry host
-registry_pass Password for registry user
-ssh Flag to set connection over ssh via forwarded port to localhost (default = 0); remove?
-group name of experimental/research group
-location of experimental/research group
-contact email address of primary contact for experimental group
-species
-assembly Genome assembly version e.g. 36 for NCBI36
-recover Recovery flag (default = 0)
-data_dir Root data directory (default = $ENV{'EFG_DATA'})
-output_dir review these dirs ???????
-input_dir ?????????
-import_dir ???????
-norm_dir ??????
-fasta dump FASTA flag (default =0)
-array_set Flag to treat all chip designs as part of same array (default = 0)
-array_name Name for array set
-array_file Path of array file to import for sanger ENCODE array
-result_set_name Name to give the raw and normalised result sets (default uses experiment and analysis name)
-norm_method Normalisation method (Nimblegen default = VSN_GLOG or $ENV{'NORM_METHOD'})
-dbname Override for autogeneration of funcgen dbname
-reg_config path to local registry config file (default = ~/ensembl.init || undef)
-design_type MGED term (default = binding_site_identification) get from meta/MAGE?
-verbose
ReturnType : Bio::EnsEMBL::Funcgen::Importer
Example : my $Exp = Bio::EnsEMBL::Funcgen::Importer->new(%params);
Exceptions : throws if mandatory params are not set or DB connect fails
Caller : General
Status : Medium - potential for %params names to change, remove %attrdata? |
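A minimal sketch of assembling the %params hash for the constructor, using only option names from the list above. Every value (experiment name, host, user, password, group, contact) is a placeholder invented for illustration, and which options are mandatory is not stated here, so the constructor call is left commented out; it also needs the Ensembl Funcgen API and a reachable database.

```perl
use strict;
use warnings;

# Placeholder values throughout; only the option names come from the list above.
my %params = (
    -name     => 'Experiment1',
    -format   => 'Tiled',
    -vendor   => 'NimbleGen',
    -species  => 'homo_sapiens',
    -host     => 'localhost',            # hypothetical DB host
    -port     => 3306,
    -user     => 'user_name',            # hypothetical DB user
    -pass     => 'not_a_real_password',  # hypothetical DB password
    -group    => 'example_group',        # hypothetical group name
    -location => 'Hinxton',
    -contact  => 'contact@example.org',  # hypothetical contact
    -recover  => 0,
    -verbose  => 1,
);

# Needs the Ensembl Funcgen API and a database, so not executed here:
# my $imp = Bio::EnsEMBL::Funcgen::Importer->new(%params);
# $imp->register_experiment();

# Show what would be passed to the constructor.
print join("\n", map { "$_ => $params{$_}" } sort keys %params), "\n";
```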
Example : $imp->norm_analysis($anal); Description: Getter/Setter for the normalisation analysis Arg [1] : optional - Bio::EnsEMBL::Analysis Returntype : Bio::EnsEMBL::Analysis Exceptions : Throws if arg is not valid or stored Caller : general Status : at risk |
Example : my $norm_method = $imp->norm_method() Description: Getter/Setter for normalisation method Arg [1] : mandatory - method name Returntype : string Exceptions : none ? throw if no analysis with logic name Caller : general Status : At risk - restrict to logic_name and validate against DB, allow multiple |
Example : $imp->host("hostname"); Description: Getter/Setter for the db hostname Arg [1] : optional - db hostname Returntype : string Exceptions : none Caller : general Status : Stable |
Example : $imp->port(3306); Description: Getter/Setter for the db port number Arg [1] : optional - db port number Returntype : int Exceptions : none Caller : general Status : Stable |
Example : $self->read_data("probe") Description: Calls each method in data_type array from config hash Arg [1] : mandatory - data type Returntype : none Exceptions : none Caller : self Status : At risk |
Example : if($imp->recovery()){ ....do recovery code...} Description: Getter/Setter for the recovery flag Arg [1] : optional - 0 or 1 Returntype : boolean Exceptions : none Caller : self Status : Medium - Most recovery now dynamic using status table |
Example : $imp->register_experiment() Description: General control method, performs all data import and normalisations Arg [1] : optional - dnadb DBAdaptor Returntype : none Exceptions : throws if arg is not Bio::EnsEMBL::DBSQL::DBAdaptor Caller : general Status : Medium |
Example : my $reg_host = $imp->registry_host; Description: Accessor for registry host attribute Returntype : string e.g. ensembldb.ensembl.org Exceptions : None Caller : general Status : at risk |
Example : my $reg_pass = $imp->registry_pass; Description: Accessor for registry pass attribute Returntype : string Exceptions : None Caller : general Status : at risk |
Example : my $reg_port = $imp->registry_port; Description: Accessor for registry port attribute Returntype : string Exceptions : None Caller : general Status : at risk |
Example : my $reg_user = $imp->registry_user; Description: Accessor for registry user attribute Returntype : string Exceptions : None Caller : general Status : at risk |
Example : $self->resolve_probe_data(); Description: Resolves DB probe duplicates and builds local probe cache Returntype : none Exceptions : ???? Caller : general Status : At risk |
Example : $imp->result_files(\@files); Description: Getter/Setter for the result file paths Arg [1] : Listref of file paths Returntype : Listref Exceptions : none Caller : general Status : At risk |
Example : $imp->species("homo_sapiens"); Description: Getter/Setter for species Arg [1] : optional - species name(alias?) Returntype : string Exceptions : none ? throw if no alias found? Caller : general Status : Medium - may move reg alias look up to this method |
Arg [1] : mandatory - array chip id Arg [2] : optional - Bio::EnsEMBL::Funcgen::ProbeSet Arg [3] : mandatory - hashref of keys probe id, values are hash of probe/features with values Bio::EnsEMBL::Funcgen::Probe/Features for a given probe set if defined. Example : $self->store_set_probes_features($ac->dbID(), $ops, \%pfs); Description: Stores probe set, probes and probe features Returntype : none Exceptions : none Caller : self Status : Medium |
Example : $start += 1 if $self->ucsc_coords; Description: Getter for UCSC coordinate usage flag Returntype : boolean Exceptions : none Caller : general Status : at risk |
Example : $imp->user("user_name"); Description: Getter/Setter for the db user name Arg [1] : optional - db user name Returntype : string Exceptions : none Caller : general Status : Stable |
Example : $self->validate_group(); Description: Validates group details Returntype : none Exceptions : throws if insufficient info is defined to store a new Group and it is not already present Caller : general Status : Medium - check location and contact i.e. group name clash? |
Example : $imp->vendor("NimbleGen"); Description: Getter/Setter for array vendor Arg [1] : optional - vendor name Returntype : string Exceptions : none Caller : general Status : Stable |
Example : $imp->verbose(1); Description: Getter/Setter for the verbose flag Arg [1] : optional - 0 or 1 Returntype : int Exceptions : none Caller : general Status : Stable |
Example : $self->vsn_norm(); Description: Convenience/Wrapper method for vsn R normalisation Returntype : none Exceptions : none Caller : general Status : At risk |
Methods code
sub R_norm
{ my ($self, @logic_names) = @_;
my $aa = $self->db->get_AnalysisAdaptor();
my $rset_adaptor = $self->db->get_ResultSetAdaptor();
my $ra_id = $aa->fetch_by_logic_name("RawValue")->dbID();
my %r_config = (
"VSN_GLOG" => {( libs => ['vsn'],
)},
"T.Biweight" => {(
libs => ['affy'],
)},
);
foreach my $logic_name (@logic_names) {
my $norm_anal = $aa->fetch_by_logic_name($logic_name);
my $rset = $self->get_import_ResultSet($norm_anal, 'experimental_chip');
my @chips = ();
if (! $rset) {
$self->log("All ExperimentalChips already have status:\t${logic_name}");
} else { my @dbids;
my $R_file = $self->get_dir("norm")."/${logic_name}.R";
my $job_name = $self->experiment->name()."_${logic_name}";
my $resultfile = $self->get_dir("norm")."/result.${logic_name}.txt";
my $outfile = $self->get_dir("norm")."/${logic_name}.out";
my $errfile = $self->get_dir("norm")."/${logic_name}.err";
my $cmdline = "$ENV{'R_PATH'} --no-save < $R_file";
my $bsub = "bsub -K -J $job_name ".$ENV{'R_BSUB_OPTIONS'}.
" -e $errfile -o $outfile $ENV{'R_FARM_PATH'} CMD BATCH $R_file";
my $r_cmd = (! $self->farm()) ? "$cmdline >$outfile 2>&1" : $bsub;
$self->backup_file($resultfile);
my $query = "options(scipen=20);library(RMySQL);library(Ringo);";
foreach my $lib (@{$r_config{$logic_name}{'libs'}}) {
$query .= "library($lib);";
}
$query .= "con<-dbConnect(dbDriver(\"MySQL\"), host=\"".$self->db->dbc->host()."\", port=\"".$self->db->dbc->port()."\", dbname=\"".$self->db->dbc->dbname()."\", user=\"".$self->db->dbc->username()."\"";
$query .= ", pass=\"".$self->db->dbc->password."\")\n";
foreach my $echip (@{$self->experiment->get_ExperimentalChips()}) {
if ($echip->has_status($logic_name)) {
$self->log("ExperimentalChip ".$echip->unique_id()." already has status:\t$logic_name");
} else {
push @chips, $echip;
my $cc_id = $rset->get_chip_channel_id($echip->dbID());
$self->log("Building $logic_name R cmd for ".$echip->unique_id());
@dbids = ();
foreach my $chan (@{$echip->get_Channels()}) {
if ($chan->type() eq "EXPERIMENTAL") {
push @dbids, $chan->dbID();
} else {
unshift @dbids, $chan->dbID();
}
}
throw("vsn does not accommodate more than 2 channels") if (scalar(@dbids) > 2 && $logic_name eq "VSN_GLOG");
$query .= "c1<-dbGetQuery(con, 'select r.probe_id as PROBE_ID, r.score as CONTROL_score, r.X, r.Y from result r, chip_channel c, result_set rs where c.table_name=\"channel\" and c.table_id=${dbids[0]} and c.result_set_id=rs.result_set_id and rs.analysis_id=${ra_id} and c.chip_channel_id=r.chip_channel_id')\n";
$query .= "c2<-dbGetQuery(con, 'select r.probe_id as PROBE_ID, r.score as EXPERIMENTAL_score, r.X, r.Y from result r, chip_channel c, result_set rs where c.table_name=\"channel\" and c.table_id=${dbids[1]} and c.result_set_id=rs.result_set_id and rs.analysis_id=${ra_id} and c.chip_channel_id=r.chip_channel_id')\n";
$query .= "R<-as.matrix(c1['CONTROL_score'])\nG<-as.matrix(c2['EXPERIMENTAL_score'])\n";
$query .= "genes<-cbind(c1['PROBE_ID'], c1['X'], c1['Y'])\n";
$query .= "testRG<-new('RGList', list(R=R, G=G, genes=genes))\n";
$query .= "pdf('".$self->get_dir('norm').'/'.$echip->unique_id."_QC.pdf', paper='a4', height = 15, width = 9)\n";
$query .= "par(mfrow = c(2,2), font.lab = 2)\n";
$query .= "plotDensities(testRG)\n";
$query .= 'meanLogA<-((log(testRG$R, base=exp(2)) + log(testRG$G, base=exp(2)))/2)'."\n";
$query .= 'logIntRatioM<-(log(testRG$R, base=exp(2)) - log(testRG$G, base=exp(2)))'."\n";
$query .= "yMin<-min(logIntRatioM)\n";
$query .= "yMax<-max(logIntRatioM)\n";
$query .= "infCount<-0\n";
$query .= "if( yMax == Inf){; sortedM<-sort(logIntRatioM); lengthM<-length(logIntRatioM); indexM<-lengthM\n"
."while (yMax == Inf){; indexM<-(indexM-1); yMax<-sortedM[indexM];}; infCount<-(lengthM-indexM);}\n";
$query .= "if(infCount == 0){\n";
$query .= 'plot(meanLogA, logIntRatioM, xlab="A - Average Log Ratio",ylab="M - Log Ratio",pch=".",ylim=c(yMin,yMax), main="'.$echip->unique_id.'")'."\n";
$query .= "} else {\n";
$query .= 'plot(meanLogA, logIntRatioM, xlab="A - Average Log Ratio",ylab="M - Log Ratio",pch=".",ylim=c(yMin,yMax), main="'.$echip->unique_id.'", sub=paste(infCount, " Inf values not plotted"));'."}\n";
$query .= 'image(testRG, 1, channel = "green", mycols = c("black", "green4", "springgreen"))'."\n";
$query .= 'image(testRG, 1, channel = "red", mycols = c("black", "green4", "springgreen"))'."\n";
$query .= "dev.off()\n";
if($logic_name eq 'T.Biweight'){
$query .= 'lr_df<-cbind((log(c2["EXPERIMENTAL_score"], base=exp(2)) - log(c1["CONTROL_score"], base=exp(2))))'."\n";
$query .= 'norm_df<-(lr_df["EXPERIMENTAL_score"]-tukey.biweight(as.matrix(lr_df)))'."\n";
$query .= 'formatted_df<-cbind(rep.int(0, length(c1["PROBE_ID"])), c1["PROBE_ID"], sprintf("%.3f", norm_df[,1]), rep.int('.$cc_id.', length(c1["PROBE_ID"])), c1["X"], c1["Y"])'."\n";
}
elsif($logic_name eq 'VSN_GLOG'){
$query .= "raw_df<-cbind(c1[\"CONTROL_score\"], c2[\"EXPERIMENTAL_score\"])\n";
$query .= "norm_df<-vsn(raw_df)\n";
$query .= 'formatted_df<-cbind(rep.int(0, length(c1["PROBE_ID"])), c1["PROBE_ID"], sprintf("%.3f", (exprs(norm_df[,2]) - exprs(norm_df[,1]))), rep.int('.$cc_id.', length(c1["PROBE_ID"])), c1["X"], c1["Y"])'."\n";
}
$query .= "write.table(formatted_df, file=\"${resultfile}\", sep=\"\\t\", col.names=FALSE, row.names=FALSE, quote=FALSE, append=TRUE)\n";
}
}
$query .= "q();";
open(RFILE, ">$R_file") || throw("Cannot open $R_file for writing");
print RFILE $query;
close(RFILE);
my $submit_text = "Submitting $logic_name job";
$submit_text .= ' to farm' if $self->farm;
$self->log("${submit_text}:\t".localtime());
run_system_cmd($r_cmd);
$self->log("Finished $logic_name job:\t".localtime());
$self->log('See '.$self->get_dir('norm').' for ExperimentalChip QC files');
$self->log("Importing:\t$resultfile");
$self->db->load_table_data("result", $resultfile);
$self->log("Finished importing:\t$resultfile");
foreach my $echip(@chips){
$echip->adaptor->store_status($logic_name, $echip);
}
my $rset_a = $self->db->get_ResultSetAdaptor();
my %seen_rsets;
foreach my $anal_rset(@{$rset_a->fetch_all_by_Experiment($self->experiment)}){
next if($anal_rset->name =~ /_IMPORT$/o);
next if(exists $seen_rsets{$anal_rset->name});
next if $anal_rset->analysis->logic_name eq $norm_anal->logic_name;
$seen_rsets{$anal_rset->name} = 1;
$anal_rset->analysis($norm_anal);
$anal_rset->{'dbID'} = undef;
$anal_rset->{'adaptor'} = undef;
foreach my $table_id(@{$anal_rset->table_ids}){
$anal_rset->{'table_id_hash'}{$table_id} = $rset->get_chip_channel_id($table_id);
}
$self->log('Adding new ResultSet '.$anal_rset->name.' with analysis '.$norm_anal->logic_name);
$rset_a->store($anal_rset);
}
}
}
return;
} |
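The channel ordering that R_norm relies on (control dbID first, experimental dbID second) can be sketched in isolation. This is an illustration only: the channel records are plain hashes rather than Channel objects, and the 'TOTAL' type and the dbID values are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical channel records; in the real code these are Channel objects.
my @channels = (
    { dbID => 12, type => 'EXPERIMENTAL' },
    { dbID => 11, type => 'TOTAL' },          # assumed name for the control type
);

my @dbids;
for my $chan (@channels) {
    if ($chan->{type} eq 'EXPERIMENTAL') {
        push @dbids, $chan->{dbID};       # experimental channel goes last
    } else {
        unshift @dbids, $chan->{dbID};    # any other channel goes first
    }
}

# Count the elements first, then compare; scalar(@dbids > 2) would
# evaluate the comparison inside scalar() instead.
die "more than 2 channels\n" if scalar(@dbids) > 2;

print "control=$dbids[0] experimental=$dbids[1]\n";
```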
sub add_Array
{ my $self = shift;
# Check ref() first so an undefined or non-object argument throws cleanly
if (! (ref($_[0]) && $_[0]->isa('Bio::EnsEMBL::Funcgen::Array'))) {
throw("Must supply a Bio::EnsEMBL::Funcgen::Array");
} else {
push @{$self->{'arrays'}}, @_;
}
throw("Does not yet support multiple array imports") if(scalar (@{$self->{'arrays'}}) > 1);
return; } |
sub array_file
{ my ($self) = shift;
$self->{'array_file'} = shift if(@_);
return $self->{'array_file'}; } |
sub array_name
{ my ($self) = shift;
$self->{'array_name'} = shift if(@_);
return $self->{'array_name'}; } |
sub array_set
{ my ($self) = shift;
$self->{'array_set'} = shift if(@_);
return $self->{'array_set'}; } |
sub arrays
{ my $self = shift;
if(! defined $self->{'arrays'}){
$self->{'arrays'} = $self->db->get_ArrayAdaptor->fetch_all_by_Experiment($self->experiment());
}
return $self->{'arrays'}; } |
sub cache_probe_info
{ my ($self, $pname, $pid, $x, $y) = @_;
throw('Deprecated, too memory expensive, now resolving DB duplicates and using Tied File cache');
throw("Must provide a probe name and id") if (! defined $pname || ! defined $pid);
$self->{'_probe_cache'}->{$pname} = (defined $x && defined $y) ? [$pid, $x, $y] : [$pid];
return; } |
sub cache_slice
{ my ($self, $region_name, $cs_name) = @_;
throw("Need to define a region_name to cache a slice from") if ! $region_name;
$cs_name ||= 'chromosome';
$self->{'slice_cache'} ||= {};
$region_name =~ s/^chr//; # strip a leading UCSC-style 'chr' prefix only
$region_name = "MT" if $region_name eq "M";
if (! exists $self->{'slice_cache'}->{$region_name}) {
$self->{'slice_cache'}->{$region_name} = $self->slice_adaptor->fetch_by_region($cs_name, $region_name);
warn("-- Could not generate a slice for ${cs_name}:$region_name\n") if ! defined $self->{'slice_cache'}->{$region_name};
}
return $self->{'slice_cache'}->{$region_name}; } |
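The name normalisation and caching in cache_slice can be tried without a database. The sketch below is an assumption-laden stand-in: fetch_region is a hypothetical replacement for the slice adaptor call and returns a plain string instead of a Bio::EnsEMBL::Slice, and the cache is a file-scoped hash rather than an attribute of the Importer.

```perl
use strict;
use warnings;

my %slice_cache;
my $fetch_count = 0;

# Hypothetical stand-in for the slice adaptor lookup; returns a string.
sub fetch_region {
    my ($cs_name, $region_name) = @_;
    $fetch_count++;
    return "$cs_name:$region_name";
}

sub cached_region {
    my ($region_name, $cs_name) = @_;
    die "Need a region name\n" if ! defined $region_name || $region_name eq '';
    $cs_name ||= 'chromosome';
    $region_name =~ s/^chr//;                      # UCSC-style chrX becomes X
    $region_name = 'MT' if $region_name eq 'M';    # chrM becomes MT
    # Only call the fetch when nothing is cached for this region name.
    $slice_cache{$region_name} //= fetch_region($cs_name, $region_name);
    return $slice_cache{$region_name};
}

print cached_region('chrX'), "\n";
print cached_region('X'), "\n";       # served from the cache
print cached_region('chrM'), "\n";
print "fetches: $fetch_count\n";
```

As in the method above, the cache is keyed on the region name alone, so the same name requested under two coordinate systems would share one entry.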
sub cell_type
{ my ($self) = shift;
if (@_) {
my $ctype = shift;
if (! ($ctype->isa('Bio::EnsEMBL::Funcgen::CellType') && $ctype->dbID())) {
throw("Must pass a valid stored Bio::EnsEMBL::Funcgen::CellType");
}
$self->{'cell_type'} = $ctype;
}
return $self->{'cell_type'}; } |
sub contact
{ my ($self) = shift;
$self->{'contact'} = shift if(@_);
return $self->{'contact'}; } |
sub create_output_dirs
{ my ($self, @dirnames) = @_;
foreach my $name (@dirnames) {
if($name eq 'caches'){
$self->{"${name}_dir"} = $ENV{'EFG_DATA'}."/${name}/".$self->db->dbc->dbname() if(! defined $self->{"${name}_dir"});
}
elsif($name eq 'fastas'){
$self->{"${name}_dir"} = $ENV{'EFG_DATA'}."/${name}/" if(! defined $self->{"${name}_dir"});
}
else{
$self->{"${name}_dir"} = $self->get_dir('output')."/${name}" if(! defined $self->{"${name}_dir"});
}
if(! (-d $self->get_dir($name) || (-l $self->get_dir($name)))){
$self->log("Creating directory:\t".$self->get_dir($name));
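# Parenthesise the call: without parens, || binds to the get_dir() argument, so the throw could never fire
mkpath($self->get_dir($name));
throw('Failed to create directory: '. $self->get_dir($name)) if(! -d $self->get_dir($name));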
chmod 0744, $self->get_dir($name);
}
}
return;
} |
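A minimal sketch of creating an output directory and verifying the result, run inside a temporary directory. ensure_dir is a hypothetical helper, plain die stands in for throw, and File::Path's make_path is used in place of the older mkpath interface.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper: create a directory tree if it is missing and
# fail loudly if it still does not exist afterwards.
sub ensure_dir {
    my ($dir) = @_;
    return $dir if -d $dir;
    # Parenthesised call, so nothing after it can bind to the argument.
    make_path($dir);
    -d $dir or die "Failed to create directory: $dir\n";
    return $dir;
}

my $root     = tempdir(CLEANUP => 1);
my $norm_dir = ensure_dir("$root/output/norm");
print "Created: $norm_dir\n";
```

Checking with -d after the call avoids relying on the return value of the creation function, which reports the directories it created rather than a simple success flag.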
sub db
{ my $self = shift;
if (defined $_[0] && $_[0]->isa("Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor")) {
$self->{'db'} = shift;
} elsif (defined $_[0]) {
throw("Need to pass a valid Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor");
}
return $self->{'db'}; } |
sub dbname
{ my ($self) = shift;
deprecate('Use $imp->db->dbc->dbname');
return $self->db->dbc->dbname;
} |
sub description
{ my $self = shift;
if (@_) {
$self->{'description'} = shift;
}
return $self->{'description'}; } |
sub design_type
{ my $self = shift;
return $self->{'design_type'}; } |
sub dump_fasta
{ my $self = shift;
$self->{'_dump_fasta'} = shift if @_;
return $self->{'_dump_fasta'}; } |
sub experiment
{ my ($self) = shift;
if (@_) {
if (! $_[0]->isa('Bio::EnsEMBL::Funcgen::Experiment')) {
throw("Must pass a Bio::EnsEMBL::Funcgen::Experiment object");
}
$self->{'experiment'} = shift;
}
return $self->{'experiment'}; } |
sub experiment_date
{ my ($self) = shift;
if (@_) {
my $date = shift;
if ($date !~ /^[0-9]{4}-[0-9]{2}-[0-9]{2}$/o) {
throw('Parameter -experiment_date needs to be in the format: YYYY-MM-DD');
}
$self->{'experiment_date'} = $date;
} elsif ($self->vendor() eq "nimblegen" && ! defined $self->{'experiment_date'}) {
$self->{'experiment_date'} = &get_date("date", $self->get_config("chip_file"));
}
return $self->{'experiment_date'}; } |
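The YYYY-MM-DD format check can be exercised on its own. The sketch below assumes a hypothetical stand-alone helper, is_valid_date_format, which only checks the shape of the string and does not verify that the date exists in the calendar.

```perl
use strict;
use warnings;

# Hypothetical stand-alone validator for YYYY-MM-DD strings.
sub is_valid_date_format {
    my ($date) = @_;
    return 0 if ! defined $date;
    # Anchor both ends so extra characters are rejected.
    return ($date =~ /\A[0-9]{4}-[0-9]{2}-[0-9]{2}\z/) ? 1 : 0;
}

for my $date ('2006-11-02', '2006-1102', '06-11-02') {
    printf "%-12s %s\n", $date, is_valid_date_format($date) ? 'ok' : 'rejected';
}
```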
sub experimental_set_name
{ my $self = shift;
$self->{'experimental_set_name'} = shift if @_;
return $self->{'experimental_set_name'}; } |
sub farm
{ my ($self, $farm) = @_;
$self->{'farm'} ||= undef;
if (defined $farm) {
throw("Argument to farm must be a boolean 1 or 0") if(! ($farm == 1 || $farm == 0));
$self->{'farm'} = $farm;
}
return $self->{'farm'}; } |
sub feature_analysis
{ my ($self) = shift;
if (@_) {
my $fanal = shift;
if (! (ref ($fanal) && $fanal->isa('Bio::EnsEMBL::Analysis') && $fanal->dbID())) {
throw("Must pass a valid stored Bio::EnsEMBL::Analysis");
}
$self->{'feature_analysis'} = $fanal;
}
return $self->{'feature_analysis'}; } |
sub feature_set_description
{ my $self = shift;
$self->{'feature_set_description'} = shift if @_;
return $self->{'feature_set_description'}; } |
sub feature_type
{ my ($self) = shift;
if (@_) {
my $ftype = shift;
if (! ($ftype->isa('Bio::EnsEMBL::Funcgen::FeatureType') && $ftype->dbID())) {
throw("Must pass a valid stored Bio::EnsEMBL::Funcgen::FeatureType");
}
$self->{'feature_type'} = $ftype;
}
return $self->{'feature_type'}; } |
sub format
{ my ($self) = shift;
$self->{'format'} = shift if(@_);
return $self->{'format'}; } |
sub get_chr_seq_region_id
{ my ($self, $chr, $start, $end) = @_;
return $self->slice_adaptor->fetch_by_region("chromosome", $chr, $start, $end)->get_seq_region_id();
} |
sub get_config
{ my ($self, $data_name) = @_;
return $self->get_data('config', $data_name);
} |
sub get_dir
{ my ($self, $dirname) = @_;
return $self->get_data("${dirname}_dir"); } |
sub get_import_ResultSet
{ my ($self, $anal, $table_name) = @_;
if (!($anal && $anal->isa("Bio::EnsEMBL::Analysis") && $anal->dbID())) {
throw("Must provide a valid stored Bio::EnsEMBL::Analysis");
}
$self->log("Getting import $table_name ResultSet for analysis:\t".$anal->logic_name());
my ($rset, @new_chip_channels);
my $result_adaptor = $self->db->get_ResultSetAdaptor();
my $logic_name = $anal->logic_name;
my $status = ($logic_name eq "RawValue") ? "IMPORTED" : $logic_name;
if(($logic_name) eq 'RawValue' && ($table_name eq 'experimental_chip')){
throw("Cannot have an ExperimentalChip ResultSet with a RawValue analysis, either specify 'channel' or another analysis");
}
foreach my $echip (@{$self->experiment->get_ExperimentalChips()}) {
if($table_name eq 'experimental_chip'){
if ($echip->has_status($status)) { $self->log("ExperimentalChip(".$echip->unique_id().") already has status:\t".$status);
}
else {
$self->log("Found ExperimentalChip(".$echip->unique_id().") without status $status");
push @new_chip_channels, $echip;
}
}else{
foreach my $chan(@{$echip->get_Channels()}){
if ($chan->has_status($status)) { $self->log("Channel(".$echip->unique_id()."_".$self->get_config('dye_freqs')->{$chan->dye()}.") already has status:\t".$status);
}
else {
$self->log("Found Channel(".$echip->unique_id()."_".$self->get_config('dye_freqs')->{$chan->dye()}.") without status $status");
push @new_chip_channels, $chan;
}
}
}
if (( ! $rset) && @new_chip_channels) {
my(@tmp) = @{$result_adaptor->fetch_all_by_name_Analysis($self->name()."_IMPORT", $anal)};
if(scalar(@tmp) > 1){
throw('Found more than one IMPORT ResultSet for '.$self->name().'_IMPORT with analysis '.$logic_name);
}
$rset = shift @tmp;
warn("Warning: Could not find recovery ResultSet for analysis ".$logic_name) if ! $rset;
if (! $rset) {
$self->log("Generating new ResultSet for analysis ".$logic_name);
$rset = Bio::EnsEMBL::Funcgen::ResultSet->new
(
-analysis => $anal,
-table_name => $table_name,
-name => $self->name()."_IMPORT",
-feature_type => $self->feature_type(),
-cell_type => $self->cell_type(),
);
($rset) = @{$result_adaptor->store($rset)};
}
}
}
if ($self->recovery()) {
my $ec_adaptor = $self->db->get_ExperimentalChipAdaptor();
foreach my $cc(@new_chip_channels){
if($rset->contains($cc) && $rset->get_chip_channel_id($cc->dbID())){
if($table_name eq 'channel'){
my $chan_name = $ec_adaptor->fetch_by_dbID($cc->experimental_chip_id())->unique_id()."_".
$self->get_config('dye_freqs')->{$cc->dye()};
$self->log("Rolling back results for $table_name:\t".$chan_name);
}else{
$self->log("Rolling back results for $table_name:\t".$cc->unique_id);
}
$self->rollback_results([$rset->get_chip_channel_id($cc->dbID())]);
}
}
}
if ($rset) {
foreach my $cc(@new_chip_channels){
$rset->add_table_id($cc->dbID()) if(! $rset->contains($cc));
}
}
if ($rset) {
$result_adaptor->store_chip_channels($rset);
} else {
$self->log("All ExperimentalChips have status:\t$status");
}
return $rset; } |
sub get_probe_cache_by_Array
{ my ($self, $array, $from_db) = @_;
# Validate the array before calling any methods on it
if(! ($array && $array->isa('Bio::EnsEMBL::Funcgen::Array') && $array->dbID())){
throw('Must provide a valid stored Bio::EnsEMBL::Funcgen::Array object');
}
my $msg = "Getting probe cache for ".$array->name();
$msg .= " from DB" if $from_db;
$self->log($msg);
my $set = 0;
my $cache_file = $self->get_dir('caches').'/'.$array->name().'.probe_cache';
if($from_db){
$cache_file .= '.unresolved';
if(exists $self->{'_probe_cache'}{$array->name()}){
$self->log('Rebuilding probe_cache from DB for '.$array->name(), 1);
delete $self->{'_probe_cache'}{$array->name()}; $self->log('Deleted old cache', 1);
}else{
$self->log('Building probe_cache from DB for '.$array->name(), 1);
}
my $cmd = 'SELECT name, probe_id from probe WHERE array_chip_id IN ('.join(',', @{$array->get_array_chip_ids()}).') ORDER by name, probe_id';
$cmd = 'mysql '.$self->db->connect_string()." -e\" $cmd\" >".$cache_file;
run_system_cmd($cmd);
}
if(-f $cache_file){
$self->log('MD5 check here?',1);
$self->{'_probe_cache'}{$array->name()}{'current_line'} = undef;
$self->{'_probe_cache'}{$array->name()}{'handle'} = open_file($cache_file);
$set = 1;
}
else{
warn 'Failed to get probe cache for array:'.$array->name();
}
return $set;
} |
sub get_probe_id_by_name_Array
{ my ($self, $name, $array) = @_;
$self->resolve_probe_data() if(! exists $self->{'_probe_cache'}{$array->name()});
my ($pid, $line);
if($line = $self->{'_probe_cache'}{$array->name()}{'current_line'}){
if($line =~ /^\Q${name}\E\t/){
$pid = (split/\t/o, $line)[1];
}
}
if(! $pid){
while($line = $self->{'_probe_cache'}{$array->name()}{'handle'}->getline()){
if($line =~ /^\Q${name}\E\t/){
$pid = (split/\t/o, $line)[1];
$self->{'_probe_cache'}{$array->name()}{'current_line'} = $line;
last;
}
}
}
if(! $pid){
throw("Did not find probe name ($name) in cache, cache may need rebuilding, results may need sorting, or do you have an anomalous probe?")
}else{
chomp $pid;
}
return $pid; } |
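The lookup above relies on the cache file being sorted by probe name, so that names can be resolved with a single forward pass over the file. A minimal, self-contained sketch of that idea follows. It uses an in-memory filehandle in place of the real cache file, and the names `lookup_probe_id`, `$cache_text` and the probe names are hypothetical, not part of the Importer API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the on-disk probe cache:
# "name\tprobe_id" lines, sorted by name.
my $cache_text = "probeA\t11\nprobeB\t12\nprobeC\t13\n";
open(my $handle, '<', \$cache_text) or die "Cannot open in-memory cache: $!";

# Last matched line, so repeated lookups of one name need no further reads.
my $current_line;

# Returns the probe ID for $name, or undef if the name is not found
# at or ahead of the current read position.
sub lookup_probe_id {
  my ($name) = @_;

  if (defined $current_line && $current_line =~ /^\Q$name\E\t/) {
    return (split /\t/, $current_line)[1];
  }

  while (my $line = <$handle>) {
    chomp $line;
    if ($line =~ /^\Q$name\E\t/) {
      $current_line = $line;
      return (split /\t/, $line)[1];
    }
  }

  return;
}

my @ids = map { lookup_probe_id($_) } qw(probeA probeA probeC);
print join(',', @ids), "\n";
```

Because the handle only moves forward, a name that sorts before the current position (here `probeB` after `probeC` has been read) is not found. This is presumably why the error message above suggests that results may need sorting.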
sub group
{ my ($self) = shift;
$self->{'group'} = shift if(@_);
return $self->{'group'}; } |
sub host
{ my $self = shift;
$self->{'host'} = shift if(@_);
return $self->{'host'}; } |
sub hybridisation_fields
{ my $self = shift;
return ['File[raw]', 'Array[accession]', 'Array[serial]',
(map 'Protocol['.$_.']', (sort (keys %{$self->get_config('protocols')}))),
'BioSource', 'Sample', 'Extract', 'LabeledExtract', 'Immunoprecipitate', 'Hybridization',
'BioSourceMaterial', 'SampleMaterial', 'ExtractMaterial', 'LabeledExtractMaterial',
'Dye', 'BioMaterialCharacteristics[Organism]', 'BioMaterialCharacteristics[BioSourceType]',
'BioMaterialCharacteristics[StrainOrLine]', 'BioMaterialCharacteristics[CellType]',
'BioMaterialCharacteristics[Sex]', 'FactorValue[StrainOrLine]', 'FactorValue[Immunoprecipitate]']; } |
sub init_array_import
{
my ($self) = shift;
$self->create_output_dirs('caches', 'fastas'); } |
sub init_experiment_import
{ my ($self) = shift;
foreach my $tmp ("group", "data_dir") {
throw("Mandatory arg $tmp has not been defined") if (! defined $self->{$tmp});
}
$self->create_output_dirs('raw', 'norm', 'caches', 'fastas');
throw("No result_files defined.") if (! defined $self->result_files());
if (@{$self->result_files()}) {
$self->log("Found result files arguments:\n\t".join("\n\t", @{$self->result_files()}));
}
if($self->norm_method){
my $norm_anal = $self->db->get_AnalysisAdaptor->fetch_by_logic_name($self->norm_method);
throw($self->norm_method.' is not a valid analysis') if ! $norm_anal;
$self->norm_analysis($norm_anal);
}else{
$self->log('WARNING: No normalisation analysis specified');
}
$self->validate_group();
my $exp_adaptor = $self->db->get_ExperimentAdaptor();
my $exp = $exp_adaptor->fetch_by_name($self->name());
my $xml = $exp_adaptor->fetch_mage_xml_by_experiment_name($self->name());
if( ! $self->{'no_mage'}){
if($self->{'write_mage'} || !( -f $self->get_config('tab2mage_file') || $xml)){
$self->{'write_mage'} = 1;
$self->backup_file($self->get_config('tab2mage_file'));
}
elsif( -f $self->get_config('tab2mage_file')){
$self->backup_file($self->get_config('mage_xml_file'));
my $cmd = 'tab2mage.pl -e '.$self->get_config('tab2mage_file').' -k -t '.$self->get_dir('output').
' -c -d '.$self->get_dir('results');
$self->log('Reading tab2mage file');
my $t2m_exit_code = run_system_cmd($cmd, 1);
warn "tab2mage exit code is $t2m_exit_code";
# Fail on any exit code outside the range 0..254
if(! (($t2m_exit_code > -1) && ($t2m_exit_code < 255))){
throw("tab2mage failed. Please check and correct:\t".$self->get_config('tab2mage_file')."\n...and try again");
}
$self->{'recover'} = 1;
}
}
if ($self->recovery() && ($exp)) {
$self->log("Using previously stored Experiment:\t".$exp->name);
} elsif ((! $self->recovery()) && $exp) {
throw("Your experiment name is already registered in the database, please choose a different \"name\" (this will require renaming your input directory), or specify -recover if you are working with a failed/partial import.");
} else {
$exp = Bio::EnsEMBL::Funcgen::Experiment->new(
-GROUP => $self->group(),
-NAME => $self->name(),
-DATE => $self->experiment_date(),
-PRIMARY_DESIGN_TYPE => $self->design_type(),
-DESCRIPTION => $self->description(),
-ADAPTOR => $self->db->get_ExperimentAdaptor(),
);
($exp) = @{$exp_adaptor->store($exp)};
}
$self->experiment($exp);
return; } |
sub init_tab2mage_export
{ my $self = shift;
$self->backup_file($self->get_config('tab2mage_file')) if(-f $self->get_config('tab2mage_file'));
my $t2m_file = open_file($self->get_config('tab2mage_file'), '>');
my $exp_section = "experiment section\ndomain\t".(split/@/, $self->contact())[1]."\naccession\t\n".
"quality_control\tbiological_replicate\nexperiment_design_type\tbinding_site_identification\n".
"name\t".$self->name()."\nrelease_date\t\nsubmission_date\t\nsubmitter\t???\n".
"submitter_email\t???\ninvestigator\t???\ninvestigator_email\t???\norganization\t???\naddress\t".
"???\npublication_title\t\nauthors\t\njournal\t\nvolume\t\nissue\t\npages\t\nyear\t\npubmed_id\t\n";
my $protocol_section = "Protocol section\naccession\tname\ttext\tparameters\n";
foreach my $protocol(sort (keys %{$self->get_config('protocols')})){
$protocol_section .= $self->get_config('protocols')->{$protocol}->{'accession'}.
"\t".$self->get_config('protocols')->{$protocol}->{'name'}.
"\t".$self->get_config('protocols')->{$protocol}->{'text'}."\t";
$protocol_section .= (defined $self->get_config('protocols')->{$protocol}->{'parameters'}) ?
$self->get_config('protocols')->{$protocol}->{'parameters'}."\t\n" : "\t\n";
}
my $hyb_header = "\nHybridization section\n".join("\t", @{$self->hybridisation_fields()});
print $t2m_file $exp_section."\n".$protocol_section."\n".$hyb_header."\n";
return $t2m_file;
} |
sub location
{ my ($self) = shift;
$self->{'location'} = shift if(@_);
return $self->{'location'}; } |
sub name
{ my ($self) = shift;
$self->{'name'} = shift if(@_);
return $self->{'name'}; } |
sub new
{ my ($caller) = shift;
my $reg = "Bio::EnsEMBL::Registry";
my $class = ref($caller) || $caller;
my ($name, $format, $vendor, $group, $location, $contact, $species,
$array_name, $array_set, $array_file, $data_dir, $result_files,
$ftype_name, $ctype_name, $exp_date, $desc, $user, $host, $port,
$pass, $dbname, $db, $assm_version, $design_type, $output_dir, $input_dir,
$farm, $ssh, $fasta, $recover, $reg_config, $write_mage, $no_mage, $eset_name,
$norm_method, $old_dvd_format, $feature_analysis, $reg_db, $parser_type,
$ucsc_coords, $verbose, $fset_desc, $release, $reg_host, $reg_port, $reg_user, $reg_pass)
= rearrange(['NAME', 'FORMAT', 'VENDOR', 'GROUP', 'LOCATION', 'CONTACT', 'SPECIES',
'ARRAY_NAME', 'ARRAY_SET', 'ARRAY_FILE', 'DATA_DIR', 'RESULT_FILES',
'FEATURE_TYPE_NAME', 'CELL_TYPE_NAME', 'EXPERIMENT_DATE', 'DESCRIPTION',
'USER', 'HOST', 'PORT', 'PASS', 'DBNAME', 'DB', 'ASSEMBLY', 'DESIGN_TYPE',
'OUTPUT_DIR', 'INPUT_DIR', 'FARM', 'SSH', 'DUMP_FASTA', 'RECOVER', 'REG_CONFIG', 'WRITE_MAGE',
'NO_MAGE', 'EXPERIMENTAL_SET_NAME', 'NORM_METHOD', 'OLD_DVD_FORMAT',
'FEATURE_ANALYSIS', 'REGISTRY_DB', 'PARSER', 'UCSC_COORDS', 'VERBOSE',
'FEATURE_SET_DESCRIPTION', 'RELEASE', 'REGISTRY_HOST', 'REGISTRY_PORT',
'REGISTRY_USER', 'REGISTRY_PASS'], @_);
throw("Mandatory argument -vendor not defined") if ! defined $vendor;
my $parser_error;
my $vendor_parser = ucfirst(lc($vendor));
eval {require "Bio/EnsEMBL/Funcgen/Parsers/${vendor_parser}.pm";};
if($@){
$parser_error .= "There is no valid parser for the vendor you have specified:\t".$vendor.
"\nMaybe this is a typo or maybe you want to specify a default import format using the -parser option\n".$@;
}
if(defined $parser_type){
eval {require "Bio/EnsEMBL/Funcgen/Parsers/${parser_type}.pm";};
if($@){
$parser_type = ucfirst(lc($parser_type));
eval {require "Bio/EnsEMBL/Funcgen/Parsers/${parser_type}.pm";};
if($@){
my $txt = "There is no valid parser for the -parser format you have specified:\t".$parser_type."\n";
if(! $parser_error){
$txt .= "Maybe this is a typo or maybe you want to run with the default $vendor_parser parser\n";
}
throw($txt.$@);
}
if(! $parser_error){
warn("WARNING\t::\tYou are over-riding the default ".$vendor." parser with -parser ".$parser_type);
}
}
}
else{
throw($parser_error) if $parser_error;
$parser_type = $vendor_parser;
}
unshift @ISA, 'Bio::EnsEMBL::Funcgen::Parsers::'.$parser_type;
my $self = $class->SUPER::new(@_);
$self->{'name'} = $name || throw('Mandatory param -name not met');
$self->{'user'} = $user || $ENV{'EFG_WRITE_USER'};
$self->vendor(uc($vendor));
$self->{'format'} = uc($format) || 'TILED';
$self->group($group) if $group;
$self->location($location) if $location;
$self->contact($contact) if $contact;
$species || throw('Mandatory param -species not met');
$self->array_name($array_name) if $array_name;
$self->array_set($array_set) if $array_set;
$self->array_file($array_file) if $array_file;
$self->{'data_dir'} = $data_dir || $ENV{'EFG_DATA'};
$self->result_files($result_files) if $result_files;
$self->experiment_date($exp_date) if $exp_date;
$self->description($desc) if $desc;
$self->feature_set_description($fset_desc) if $fset_desc;
$assm_version || throw('Mandatory param -assembly not met');
$self->{'design_type'} = $design_type || 'binding_site_identification';
$self->{'output_dir'} = $output_dir if $output_dir;
$self->{'input_dir'} = $input_dir if $input_dir;
$self->farm($farm) if $farm;
$self->{'ssh'} = $ssh || 0;
$self->{'_dump_fasta'} = $fasta || 0;
$self->{'recover'} = $recover || 0;
$self->{'reg_config'} = $reg_config || ((-f "$ENV{'HOME'}/.ensembl_init") ? "$ENV{'HOME'}/.ensembl_init" : undef);
$self->{'write_mage'} = $write_mage || 0;
$self->{'no_mage'} = $no_mage || 0;
$self->{'experimental_set_name'} = $eset_name if $eset_name;
$self->{'old_dvd_format'} = $old_dvd_format || 0;
$self->{'ucsc_coords'} = $ucsc_coords || 0;
$self->{'verbose'} = $verbose || 0;
$self->{'release'} = $release;
if($reg_host && $self->{'reg_config'}){
warn "You have specified registry parameters and a config file:\t".$self->{'reg_config'}.
"\nOver-riding config file with specified parameters:\t${reg_user}@${reg_host}:$reg_port";
}
warn "Need to fully implement norm_method in validate_mage, remove ENV NORM_METHOD?";
$self->{'norm_method'} = $norm_method || $ENV{'NORM_METHOD'};
if ($self->vendor ne 'NIMBLEGEN'){
$self->{'no_mage'} = 1;
warn "Hardcoding no_mage for non-NIMBLEGEN imports";
}
if($self->{'no_mage'} && $self->{'write_mage'}){
throw('-no_mage and -write_mage options are mutually exclusive, please select just one');
}
# The config path is stored under 'reg_config' above, so test and use the same key here
if ($reg_host || ! defined $self->{'reg_config'}) {
$reg_host ||= 'ensembldb.ensembl.org';
$reg_user ||= 'anonymous';
if(! $reg_port && $reg_host eq 'ensdb-archive'){
$reg_port = 5304;
}
my $version_text= ($self->{'release'}) ? 'version '.$self->{'release'} : 'current version';
$self->log("Loading $version_text registry from $reg_user".'@'.$reg_host);
$reg->load_registry_from_db(
-host => $reg_host,
-user => $reg_user,
-port => $reg_port,
-pass => $reg_pass,
-db_version => $self->{'release'},
-verbose => $self->verbose,
);
throw('Not sensible to set the import DB as the default eFG DB from ensembldb, please define db params') if ((! $dbname) && (! $db));
}
else{
$self->log("Loading registry from:\t".$self->{'reg_config'});
$reg->load_all($self->{'reg_config'}, 1);
}
my $alias = $reg->get_alias($species) || throw("Could not find valid species alias for $species\nYou might want to clean up:\t".$self->get_dir('output'));
$self->species($alias);
$self->{'param_species'} = $species;
if($db){
if(! (ref($db) && $db->isa('Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor'))){
throw('-db must be a valid Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor');
}
}
else{
if($reg_db){
$self->log('WARNING: Loading eFG DB from Registry');
$db = $reg->get_DBAdaptor($self->species(), 'funcgen');
throw("Unable to retrieve ".$self->species." funcgen DB from the registry") if ! $db;
}
else{
$dbname || throw('Must provide a -dbname if not using default custom registry config');
$pass || throw('Must provide a -pass parameter');
if(! defined $host){
$self->log('WARNING: Defaulting to localhost');
$host = 'localhost';
}
$port ||= 3306;
my $host_ip = '127.0.0.1';
if ($self->{'ssh'}) {
$host = `host localhost`;
if (! (exists $ENV{'EFG_HOST_IP'})) {
warn "Environment variable EFG_HOST_IP not set for ssh mode, defaulting to $host_ip for $host";
} else {
$host_ip = $ENV{'EFG_HOST_IP'};
}
if ($self->host() ne 'localhost') {
warn "Overriding host ".$self->host()." for ssh connection via localhost($host_ip)";
}
}
my $dbhost = ($self->{'ssh'}) ? $host_ip : $host;
$db = $reg->reset_DBAdaptor($self->species(), 'funcgen', $dbname, $dbhost, $port, $self->user, $pass,
{
-dnadb_host => $reg_host,
-dnadb_port => $reg_port,
-dnadb_assembly => $assm_version,
-dnadb_user => $reg_user,
-dnadb_pass => $reg_pass,
});
}
}
$self->db($db);
$db->dbc->db_handle;
$db->dnadb->dbc->db_handle;
$db->dbc->disconnect_when_inactive(1);
$db->dnadb->dbc->disconnect_when_inactive(1);
if($feature_analysis){
my $fanal = $self->db->get_AnalysisAdaptor->fetch_by_logic_name($feature_analysis);
throw("The Feature Analysis $feature_analysis does not exist in the database") if(!$fanal);
$self->feature_analysis($fanal);
}
if($ctype_name){
my $ctype = $self->db->get_CellTypeAdaptor->fetch_by_name($ctype_name);
throw("The CellType $ctype_name does not exist in the database") if(!$ctype);
$self->cell_type($ctype);
}
if ($ftype_name) {
my $ftype = $self->db->get_FeatureTypeAdaptor->fetch_by_name($ftype_name);
throw("The FeatureType $ftype_name does not exist in the database") if(!$ftype);
$self->feature_type($ftype);
}
$self->{'input_dir'} ||= $self->get_dir("data").'/input/'.$self->{'param_species'}.'/'.$self->vendor().'/'.$self->name();
throw('input_dir is not defined or does not exist ('.
$self->get_dir('input').')') if(! -d $self->get_dir('input'));
$self->set_config();
$self->debug(2, "Importer class instance created.");
$self->debug_hash(3, \$self);
return ($self); } |
sub norm_analysis
{ my ($self) = shift;
if (@_) {
my $anal = shift;
if (! (ref($anal) && $anal->isa('Bio::EnsEMBL::Analysis') && $anal->dbID())) {
throw("Must pass a valid stored Bio::EnsEMBL::Analysis");
}
$self->{'norm_analysis'} = $anal;
}
return $self->{'norm_analysis'}; } |
sub norm_method
{ my $self = shift;
if (@_) {
$self->{'norm_method'} = shift;
} elsif (! defined $self->{'norm_method'}) {
$self->{'norm_method'}= $self->get_config('norm_method');
}
return $self->{'norm_method'}; } |
sub pass
{ my $self = shift;
$self->{'pass'} = shift if(@_);
return $self->{'pass'}; } |
sub port
{ my $self = shift;
$self->{'port'} = shift if(@_);
return $self->{'port'}; } |
sub read_data
{ my($self, $data_type) = @_;
foreach my $type (@{$self->get_config("${data_type}_data")}) {
my $method = "read_${type}_data";
$self->$method();
}
return; } |
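read_data builds a method name from each entry in a config list and calls it on the importer. The sketch below shows the same dispatch pattern in isolation. The package `Demo::Reader`, its config contents and its `read_*_data` methods are hypothetical stand-ins, and the `can` check is an added safeguard that the method above does not perform.

```perl
use strict;
use warnings;

package Demo::Reader;    # hypothetical class, not part of the Funcgen API

sub new {
  my $class = shift;
  return bless({ calls => [] }, $class);
}

# Stand-in for get_config(): maps "<type>_data" to a list of reader names.
sub get_config {
  my ($self, $key) = @_;
  my %config = (results_data => ['results', 'summary']);
  return $config{$key};
}

sub read_results_data { my $self = shift; push @{ $self->{calls} }, 'results'; return; }
sub read_summary_data { my $self = shift; push @{ $self->{calls} }, 'summary'; return; }

# Calls read_<name>_data for every name configured for $data_type.
sub read_data {
  my ($self, $data_type) = @_;

  foreach my $type (@{ $self->get_config("${data_type}_data") }) {
    my $method = "read_${type}_data";
    $self->can($method) or die "No reader method $method";
    $self->$method();
  }

  return;
}

package main;

my $reader = Demo::Reader->new();
$reader->read_data('results');
print join(',', @{ $reader->{calls} }), "\n";
```

Calling a method through a string held in a scalar is allowed under `use strict`, so a vendor-specific parser can add a new reader simply by defining the method and listing its name in the config.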
sub recovery
{ my $self = shift;
$self->{'recover'} = shift if(@_);
return $self->{'recover'}; } |
sub register_experiment
{ my ($self) = shift;
if (@_) {
if ( ! $_[0]->isa("Bio::EnsEMBL::DBSQL::DBAdaptor")) {
throw("You need to pass a valid dnadb adaptor to register the experiment");
}
$self->db->dnadb($_[0]);
} elsif ( ! $self->db) {
throw("You need to pass/set a DBAdaptor with a DNADB attached of the relevant data version");
}
$self->init_experiment_import();
if($self->{'write_mage'} || $self->{'no_mage'}){
$self->read_data("array");
if(! $self->{'no_mage'}){
$self->log("PLEASE CHECK AND EDIT AUTOGENERATED TAB2MAGE FILE:\t".$self->get_config('tab2mage_file'));
$self->log('REMEMBER TO REMOVE -write_mage FLAG BEFORE UPDATING');
exit;
}
}
elsif(! $self->{'no_mage'}){
$self->validate_mage() if (! $self->{'skip_validate'});
}
$self->read_data("probe");
$self->read_data("results");
my $norm_method = $self->norm_method();
if (defined $norm_method) {
$self->R_norm($norm_method);
}
return; } |
sub registry_host
{ return $_[0]->{'reg_host'}; } |
sub registry_pass
{ return $_[0]->{'reg_pass'};
} |
sub registry_port
{ return $_[0]->{'reg_port'}; } |
sub registry_user
{ return $_[0]->{'reg_user'}; } |
sub resolve_probe_data
{ my $self = shift;
$self->log("Resolving probe data", 1);
warn "Probe cache resolution needs to accommodate probesets too!";
foreach my $array(@{$self->arrays()}){
my $resolve = 0;
if($self->get_probe_cache_by_Array($array)){
foreach my $achip(@{$array->get_ArrayChips()}){
if($achip->has_status('RESOLVED')){
$self->log("ArrayChip has RESOLVED status:\t".$achip->design_id());
next;
}else{
$self->log("Found un-RESOLVED ArrayChip:\t".$achip->design_id());
$resolve = 1;
last;
}
}
}else{
$resolve = 1;
$self->log('No probe cache found for array '.$array->name());
}
if($resolve){
$self->log('Resolving array duplicates('.$array->name().') and rebuilding probe cache.', 1);
$self->get_probe_cache_by_Array($array, 1);
my ($line, $name, $pid, @pids);
my $tmp_name = '';
my $tmp_id = '';
while ($line = $self->{'_probe_cache'}{$array->name}{'handle'}->getline()){
($name, $pid) = split/\t/o, $line;
if($name eq $tmp_name){
if($pid != $tmp_id){
push @pids, $pid;
}
}
elsif($name ne $tmp_name){
$self->tidy_duplicates(\@pids) if(scalar(@pids) > 1);
$tmp_name = $name;
$tmp_id = $pid;
@pids = ($pid);
}
}
$self->tidy_duplicates(\@pids) if(scalar(@pids) > 1);
my $cmd = 'mv '.$self->get_dir('caches').'/'.$array->name().'.probe_cache.unresolved '.
$self->get_dir('caches').'/'.$array->name().'.probe_cache';
run_system_cmd($cmd);
$self->get_probe_cache_by_Array($array);
foreach my $achip(@{$array->get_ArrayChips()}){
if(! $achip->has_status('RESOLVED')){
$self->log("Updating ArrayChip to RESOLVED status:\t".$achip->design_id());
$achip->adaptor->store_status('RESOLVED', $achip);
}
}
$self->log('Finished building probe cache for '.$array->name(), 1);
}
}
$self->log('Finished resolving probe data', 1);
return; } |
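The duplicate detection in resolve_probe_data depends on the unresolved cache being ordered by name, so that all IDs for one probe name arrive on consecutive lines. A minimal sketch of that grouping step, under the assumption of tab-separated "name, probe_id" input, is shown here. The function name `duplicate_id_groups` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Takes "name\tprobe_id" lines sorted by name and returns one array ref
# per probe name that maps to more than one probe ID.
sub duplicate_id_groups {
  my (@lines) = @_;
  my (@groups, @pids);
  my $last_name = '';

  foreach my $line (@lines) {
    chomp(my $copy = $line);
    my ($name, $pid) = split /\t/, $copy;

    if ($name eq $last_name) {
      # Same name as the previous line: collect any ID differing from the first.
      push @pids, $pid if $pid != $pids[0];
    }
    else {
      # Name changed: flush the previous group if it held duplicates.
      push @groups, [@pids] if @pids > 1;
      ($last_name, @pids) = ($name, $pid);
    }
  }

  # Flush the final group, which no later name change would trigger.
  push @groups, [@pids] if @pids > 1;

  return @groups;
}

my @groups = duplicate_id_groups(
  "p1\t1\n", "p1\t2\n", "p2\t3\n", "p3\t4\n", "p3\t5\n", "p3\t6\n",
);
print join(' | ', map { join(',', @$_) } @groups), "\n";
```

In the method above, each such group is handed to tidy_duplicates, with the first ID in the group treated as the probe to keep.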
sub result_files
{ my ($self) = shift;
$self->{'result_files'} = shift if(@_);
return $self->{'result_files'}; } |
sub slice_adaptor
{ my $self = shift;
if (! defined $self->{'slice_adaptor'}) {
$self->{'slice_adaptor'} = $self->db->get_SliceAdaptor();
}
return $self->{'slice_adaptor'}; } |
sub species
{ my $self = shift;
$self->{'species'} = shift if(@_);
return $self->{'species'}; } |
sub store_set_probes_features
{ my ($self, $ac_id, $pf_hash, $ops) = @_;
if ($ops) {
$ops->size(scalar(keys %$pf_hash));
($ops) = $self->db->get_ProbeSetAdaptor->store($ops);
}
for my $probe_id (keys %$pf_hash) {
my $probe = $pf_hash->{$probe_id}->{'probe'};
$probe->probeset($ops) if $ops;
($probe) = @{$self->db->get_ProbeAdaptor->store($probe)};
foreach my $feature (@{$pf_hash->{$probe_id}->{'features'}}) {
$feature->probe($probe);
($feature) = @{$self->db->get_ProbeFeatureAdaptor->store($feature)};
}
}
undef $ops;
undef %{$pf_hash};
return; } |
sub tidy_duplicates
{ my ($self, $pids) = @_;
my $pfa = $self->db->get_ProbeFeatureAdaptor();
my ($feature, %features);
foreach my $dup_id(@$pids){
foreach $feature(@{$pfa->fetch_all_by_Probe_id($dup_id)}){
push @{$features{$feature->seq_region_name().':'.$feature->start()}}, $feature;
}
}
my (@reassign_ids, @delete_ids);
foreach my $seq_start_key(keys %features){
my $reassign_features = 1;
foreach $feature(@{$features{$seq_start_key}}){
if($feature->probe_id() == $pids->[0]){
$reassign_features = 0;
}else{
push @delete_ids, $feature->dbID();
}
}
if($reassign_features){
my $new_fid = pop @delete_ids;
push @reassign_ids, $new_fid;
}
}
$pfa->reassign_features_to_probe(\@reassign_ids, $pids->[0]) if @reassign_ids;
$pfa->delete_features(\@delete_ids) if @delete_ids;
return;
} |
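tidy_duplicates groups the features of all duplicate probes by location, then decides per location whether a feature is reassigned to the surviving probe or deleted. The following sketch reproduces that decision on plain hashes. The function `plan_tidy` and the hash keys `id`, `probe_id`, `region` and `start` are hypothetical stand-ins for ProbeFeature objects, and no database is touched.

```perl
use strict;
use warnings;

# $keep_id is the probe ID that survives.
# Returns (\@reassign_ids, \@delete_ids) of feature IDs.
sub plan_tidy {
  my ($keep_id, @features) = @_;

  # Group features by "region:start", as the method above does.
  my %by_location;
  push @{ $by_location{"$_->{region}:$_->{start}"} }, $_ foreach @features;

  my (@reassign, @delete);

  foreach my $key (sort keys %by_location) {
    my $kept_probe_present = 0;
    my @candidates;

    foreach my $feature (@{ $by_location{$key} }) {
      if ($feature->{probe_id} == $keep_id) {
        $kept_probe_present = 1;
      }
      else {
        push @candidates, $feature->{id};
      }
    }

    # If the surviving probe has no feature here, rescue one for reassignment.
    if (! $kept_probe_present && @candidates) {
      push @reassign, pop(@candidates);
    }

    # Everything left at this location is redundant.
    push @delete, @candidates;
  }

  return (\@reassign, \@delete);
}

my ($reassign, $delete) = plan_tidy(
  1,
  { id => 10, probe_id => 1, region => 'chr1', start => 100 },
  { id => 11, probe_id => 2, region => 'chr1', start => 100 },
  { id => 12, probe_id => 2, region => 'chr1', start => 500 },
  { id => 13, probe_id => 3, region => 'chr1', start => 500 },
);
print 'reassign: ', join(',', @$reassign), '; delete: ', join(',', @$delete), "\n";
```

The locations are sorted here only to make the output order deterministic; the method above iterates over the hash keys in whatever order Perl returns them.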
sub ucsc_coords
{ my $self = shift;
return $self->{'ucsc_coords'}; } |
sub user
{ my $self = shift;
$self->{'user'} = shift if(@_);
return $self->{'user'}; } |
sub validate_group
{ my ($self) = shift;
my $group_ref = $self->db->fetch_group_details($self->group());
if (! $group_ref) {
if ($self->location() && $self->contact()) {
$self->db->import_group($self->group(), $self->location, $self->contact());
} else {
throw("Group ".$self->group()." does not exist, please specify a location and contact to register the group");
}
}
return; } |
sub validate_mage
{ my ($self, $mage_xml, $update) = @_;
$self->log("Validating mage file:\t".$self->get_config('mage_xml_file'));
my (%echips, @log);
my $rset_adaptor = $self->db->get_ResultSetAdaptor;
my $chan_anal = $self->db->get_AnalysisAdaptor->fetch_by_logic_name('RawValue');
my $chip_anal = $self->db->get_AnalysisAdaptor->fetch_by_logic_name($self->norm_method());
my $chan_rset = $self->get_import_ResultSet($chan_anal, 'channel');
my $rset = $self->get_import_ResultSet($chip_anal, 'experimental_chip');
if(! $rset){
if($chan_rset){
$self->log('Identified partial Channel only import, updating MAGE-XML');
}
else{
($chan_rset) = @{$rset_adaptor->fetch_all_by_name_Analysis($self->experiment->name.'_IMPORT', $chan_anal)};
$self->log('All ExperimentalChips imported, updating MAGE-XML only');
}
($rset) = @{$rset_adaptor->fetch_all_by_name_Analysis($self->experiment->name.'_IMPORT', $chip_anal)};
}
if(! $rset){
throw('Cannot find ResultSet, are you trying to import a new experiment which already has a tab2mage file present? Try removing the file, or specifying the -write_mage flag to parse_and_import.pl');
}
if(! -l $self->get_dir('output').'/MAGE-ML.dtd'){
system('ln -s '.$ENV{'EFG_DATA'}.'/MAGE-ML.dtd '.$self->get_dir('output').'/MAGE-ML.dtd') == 0 ||
throw('Failed to link MAGE-ML.dtd');
}
$self->log('VALIDATING MAGE XML');
my $reader = Bio::MAGE::XML::Reader->new();
$mage_xml ||= $self->get_config('mage_xml_file');
$self->{'mage'} = $reader->read($mage_xml);
foreach my $mage_exp(@{$self->{'mage'}->getExperiment_package->getExperiment_list()}){
if($mage_exp->getName() ne $self->name()){
$self->log('MAGE experiment name ('.$mage_exp->getName().') does not match import name ('.$self->name().')');
}
foreach my $assay (@{$mage_exp->getBioAssays()}){
if($assay->isa('Bio::MAGE::BioAssay::PhysicalBioAssay')){
$self->log('Validating PhysicalBioAssay "'.$assay->getName()."\"\n");
my $bioassc = $assay->getBioAssayCreation();
my $array = $bioassc->getArray();
my $design_id = $array->getArrayDesign->getIdentifier();
my $chip_uid = $array->getArrayIdentifier();
foreach my $echip(@{$rset->get_ExperimentalChips()}){
if($echip->unique_id() eq $chip_uid){
$self->log("Found ExperimentalChip:\t".$chip_uid);
if(! exists $echips{$chip_uid}){
$echips{$chip_uid} = {(
total_biorep => undef,
total_biotechrep => undef,
experimental_biorep => undef,
experimental_biotechrep => undef,
total_dye => undef,
experimental_dye => undef,
cell_type => undef,
feature_type => undef,
)};
}
my ($achip) = @{$self->db->get_ArrayChipAdaptor->fetch_all_by_ExperimentalChips([$echip])};
if($achip->design_id() ne $design_id){
push @log, "ArrayDesign Identifier (${design_id}) does not match ArrayChip design ID (".
$achip->design_id().")\n\tSkipping channel and replicate validation";
}
else {
foreach my $src_biomat (@{$bioassc->getSourceBioMaterialMeasurements()}) {
my $biomat = $src_biomat->getBioMaterial();
foreach my $treat (@{$biomat->getTreatments()}) {
foreach my $ssrc_biomat (@{$treat->getSourceBioMaterialMeasurements()}) {
my $sbiomat = $ssrc_biomat->getBioMaterial();
if ($sbiomat->getName() =~ /BR[0-9]+_TR[0-9]+$/o) {
if (! defined $echips{$chip_uid}{'total_biotechrep'}) {
$echips{$chip_uid}{'total_biotechrep'} = $sbiomat->getName();
}
else{
push @log, "Found two TOTAL Channels on same chip with biotechreps:\t".$sbiomat->getName().
" and ".$echips{$chip_uid}{'total_biotechrep'};
}
}else{
my $fv_ref = $assay->getBioAssayFactorValues();
if(! defined $fv_ref){
throw('No FactorValues found, you must populate the "Immunoprecipitate" field. Maybe you forgot to specify -feature_type?');
}
my ($feature_type);
foreach my $fvalue(@{$fv_ref}){
if($fvalue->getValue()->getCategory() eq 'Immunoprecipitate'){
$feature_type = $fvalue->getName();
$feature_type =~ s/anti\s*-\s*//;
$feature_type =~ s/\s*antibody\s*//;
}
}
$echips{$chip_uid}{'feature_type'} = $feature_type;
}
foreach my $ttreat (@{$sbiomat->getTreatments()}) {
foreach my $tsrc_biomat (@{$ttreat->getSourceBioMaterialMeasurements()}) {
my $tbiomat = $tsrc_biomat->getBioMaterial();
if ($tbiomat->getName() =~ /BR[0-9]+_TR[0-9]+$/o) {
if (! defined $echips{$chip_uid}{'experimental_biotechrep'}) {
$echips{$chip_uid}{'experimental_biotechrep'} = $tbiomat->getName();
}
else{
push @log, "Found two EXPERIMENTAL Channels on same chip with biotechreps:\t".$tbiomat->getName().
" and ".$echips{$chip_uid}{'experimental_biotechrep'};
}
my $dye = $biomat->getLabels()->[0]->getName();
foreach my $chan (@{$echip->get_Channels()}) {
if ($chan->type() eq 'EXPERIMENTAL') {
if (uc($dye) ne uc($chan->dye())) {
push @log, "EXPERIMENTAL channel dye mismatch:\tMAGE = ".uc($dye).' vs DB '.uc($chan->dye);
} else {
$echips{$chip_uid}{'experimental_dye'} = uc($dye);
}
}
}
}
else {
if (! defined $echips{$chip_uid}{'total_biorep'}) {
$echips{$chip_uid}{'total_biorep'} = $tbiomat->getName();
}
else{
push @log, "Found two TOTAL Channels on same chip with bioreps:\t".$tbiomat->getName().
" and ".$echips{$chip_uid}{'total_biorep'};
}
my $dye = $biomat->getLabels()->[0]->getName();
foreach my $chan (@{$echip->get_Channels()}) {
if ($chan->type() eq 'TOTAL') {
if (uc($dye) ne uc($chan->dye())) {
push @log, "TOTAL channel dye mismatch:\tMAGE = ".uc($dye).' vs DB '.uc($chan->dye);
}
else {
$echips{$chip_uid}{'total_dye'} = uc($dye);
}
}
}
}
foreach my $ftreat (@{$tbiomat->getTreatments()}) {
foreach my $fsrc_biomat (@{$ftreat->getSourceBioMaterialMeasurements()}) {
my $fbiomat = $fsrc_biomat->getBioMaterial();
my $cell_type;
if($fbiomat->getName() =~ /BR[0-9]+$/o){
if(! defined $echips{$chip_uid}{'experimental_biorep'}){
$echips{$chip_uid}{'experimental_biorep'} = $fbiomat->getName();
}
else{
push @log, "Found two Experimental Channels on same chip with bioreps:\t".$fbiomat->getName().
" and ".$echips{$chip_uid}{'experimental_biorep'};
}
foreach my $xtreat (@{$fbiomat->getTreatments()}) {
foreach my $xsrc_biomat (@{$xtreat->getSourceBioMaterialMeasurements()}) {
my $xbiomat = $xsrc_biomat->getBioMaterial();
foreach my $char(@{$xbiomat->getCharacteristics()}){
$cell_type = $char->getValue() if($char->getCategory() eq 'CellType');
}
}
}
}else{
foreach my $char(@{$fbiomat->getCharacteristics()}){
$cell_type = $char->getValue() if($char->getCategory() eq 'CellType');
}
}
if(! defined $echips{$chip_uid}{'cell_type'}){
$echips{$chip_uid}{'cell_type'} = $cell_type;
}
elsif( $echips{$chip_uid}{'cell_type'} ne $cell_type){
push @log, "Found Channels on same chip (${chip_uid}) with different cell types:\t".
$cell_type." and ".$echips{$chip_uid}{'cell_type'};
}
}
}
}
}
}
}
}
}
}
}
}
}
}
my (%bio_reps, %tech_reps);
my $ct_adaptor = $self->db->get_CellTypeAdaptor();
my $ft_adaptor = $self->db->get_FeatureTypeAdaptor();
foreach my $echip (@{$rset->get_ExperimentalChips()}) {
my ($biorep, $biotechrep);
if (! exists $echips{$echip->unique_id()}) {
push @log, "No MAGE entry found for ExperimentalChip:\t".$echip->unique_id();
}
else {
foreach my $chan_type('total', 'experimental'){
$biorep = $echips{$echip->unique_id()}{$chan_type.'_biorep'};
$biotechrep = $echips{$echip->unique_id()}{$chan_type.'_biotechrep'};
if (! defined $biotechrep) {
push @log, 'ExperimentalChip('.$echip->unique_id().') Extract field does not meet naming convention(SAMPLE_BRN_TRN)';
} elsif ($biotechrep !~ /\Q$biorep\E/) {
push @log, "Found Extract(techrep) vs Sample(biorep) naming mismatch\t${biotechrep}\tvs\t$biorep";
}
if ( ! $echips{$echip->unique_id()}{$chan_type.'_dye'}) {
push @log, "No ".uc($chan_type)." channel found for ExperimentalChip:\t".$echip->unique_id();
}
}
if($echips{$echip->unique_id()}{'experimental_biorep'} ne $echips{$echip->unique_id()}{'total_biorep'}){
push @log, "Found biorep mismatch between channels of ExperimentalChip ".$echip->unique_id().":\n".
"\tEXPERIMENTAL\t".$echips{$echip->unique_id()}{'experimental_biorep'}."\tTOTAL\t".
$echips{$echip->unique_id()}{'total_biorep'};
}
if($echips{$echip->unique_id()}{'experimental_biotechrep'} ne $echips{$echip->unique_id()}{'total_biotechrep'}){
push @log, "Found biotechrep mismatch between channels of ExperimentalChip ".$echip->unique_id().":\n".
"\tEXPERIMENTAL\t".$echips{$echip->unique_id()}{'experimental_biotechrep'}."\tTOTAL\t".
$echips{$echip->unique_id()}{'total_biotechrep'};
}
}
if(exists $bio_reps{$biorep}){
if(! defined $bio_reps{$biorep}{'cell_type'}){
push @log, "Found undefined CellType for biorep $biorep";
}
elsif($bio_reps{$biorep}{'cell_type'}->name() ne $echips{$echip->unique_id()}{'cell_type'}){
push @log, "Found CellType mismatch between $biorep and ExperimentalChip ".$echip->unique_id();
}
if(! defined $bio_reps{$biorep}{'feature_type'}){
push @log, "Found undefined FeatureType for biorep $biorep";
}
elsif($bio_reps{$biorep}{'feature_type'}->name() ne $echips{$echip->unique_id()}{'feature_type'}){
push @log, "Found FeatureType mismatch between $biorep and ExperimentalChip ".$echip->unique_id();
}
if(! exists $tech_reps{$biotechrep}){
$tech_reps{$biotechrep}{'cell_type'} = $bio_reps{$biorep}{'cell_type'};
$tech_reps{$biotechrep}{'feature_type'} = $bio_reps{$biorep}{'feature_type'};
}
}else{
if(defined $echips{$echip->unique_id()}{'cell_type'}){
my $cell_type = $ct_adaptor->fetch_by_name($echips{$echip->unique_id()}{'cell_type'});
if(! defined $cell_type){
push @log, 'CellType '.$echips{$echip->unique_id()}{'cell_type'}.' does not exist in the database, please use the import_type.pl script';
}else{
$bio_reps{$biorep}{'cell_type'} = $cell_type;
$tech_reps{$biotechrep}{'cell_type'} = $cell_type;
}
}else{
warn "No CellType specified for ExperimentalChip:\t".$echip->unique_id()."\n";
}
if(defined $echips{$echip->unique_id()}{'feature_type'}){
my $feature_type = $ft_adaptor->fetch_by_name($echips{$echip->unique_id()}{'feature_type'});
if(! defined $feature_type){
push @log, 'FeatureType '.$echips{$echip->unique_id()}{'feature_type'}.' does not exist in the database, please use the import_type.pl script';
}
else{
$bio_reps{$biorep}{'feature_type'} = $feature_type;
$tech_reps{$biotechrep}{'feature_type'} = $feature_type;
}
}else{
warn "No FeatureType specified for ExperimentalChip:\t".$echip->unique_id()."\n";
}
}
push @{$tech_reps{$biotechrep}{'echips'}}, $echip->unique_id();
push @{$bio_reps{$biorep}{'echips'}}, $echip->unique_id();
}
if (@log) {
$self->log("MAGE VALIDATION REPORT\n::\t".join("\n::\t", @log));
throw("MAGE VALIDATION FAILED\nPlease correct tab2mage file and try again:\t".$self->get_config('tab2mage_file'));
} else {
$self->log('MAGE VALIDATION SUCCEEDED');
}
my (%rsets);
my %types = (
feature => {},
cell => {},
);
my $eca = $self->db->get_ExperimentalChipAdaptor();
foreach my $echip (@{$rset->get_ExperimentalChips()}) {
my ($cell_type, $feature_type);
foreach my $biorep (keys %bio_reps){
foreach my $chip_uid(@{$bio_reps{$biorep}{'echips'}}){
if($chip_uid eq $echip->unique_id()){
$echip->biological_replicate($biorep);
$cell_type = $bio_reps{$biorep}{'cell_type'};
$feature_type = $bio_reps{$biorep}{'feature_type'};
if(! defined $rsets{$biorep}){
$rsets{$biorep} = Bio::EnsEMBL::Funcgen::ResultSet->new
(
-NAME => $biorep,
-ANALYSIS => $rset->analysis(),
-TABLE_NAME => 'experimental_chip',
-FEATURE_TYPE => $feature_type,
-CELL_TYPE => $cell_type,
);
$types{'feature'}{$feature_type->name()} = $feature_type;
$types{'cell'}{$cell_type->name()} = $cell_type;
$self->log("Created BioRep ResultSet:\t".$rsets{$biorep}->log_label);
}
$rsets{$biorep}->add_table_id($echip->dbID(), $rset->get_chip_channel_id($echip->dbID()));
}
}
}
$echip->feature_type($feature_type);
$echip->cell_type($cell_type);
foreach my $techrep(keys %tech_reps){
foreach my $chip_uid(@{$tech_reps{$techrep}{'echips'}}){
if($chip_uid eq $echip->unique_id()){
$echip->technical_replicate($techrep);
if(! defined $rsets{$techrep}){
$rsets{$techrep} = Bio::EnsEMBL::Funcgen::ResultSet->new
(
-NAME => $techrep,
-ANALYSIS => $rset->analysis(),
-TABLE_NAME => 'experimental_chip',
-FEATURE_TYPE => $tech_reps{$techrep}{'feature_type'},
-CELL_TYPE => $tech_reps{$techrep}{'cell_type'},
);
$self->log("Created TechRep ResultSet:\t".$rsets{$techrep}->log_label);
}
$rsets{$techrep}->add_table_id($echip->dbID(), $rset->get_chip_channel_id($echip->dbID()));
}
}
}
$echip->adaptor->update_replicate_types($echip); }
my $sql;
if(scalar keys %{$types{'feature'}} >1){
$self->log('Resetting IMPORT FeatureType to NULL for multi-FeatureType Experiment');
$sql = "UPDATE result_set set feature_type_id=NULL where result_set_id in (".$rset->dbID().', '.$chan_rset->dbID().')';
}else{
my ($ftype) = values %{$types{'feature'}};
if(! defined $rset->feature_type()){
$self->log('Updating IMPORT FeatureType to '.$ftype->name());
$sql = "UPDATE result_set set feature_type_id=".$ftype->dbID()." where result_set_id in (".$rset->dbID().', '.$chan_rset->dbID().')';
}
elsif($rset->feature_type->dbID ne $ftype->dbID()){
warn 'FeatureType mismatch between IMPORT sets('.$rset->feature_type->name().') vs meta sets('.$ftype->name.
")\nUpdating IMPORT to match meta";
$self->log('WARNING: FeatureType mismatch. Updating IMPORT FeatureType('.$rset->feature_type->name().') to match meta('.$ftype->name.')');
$sql = "UPDATE result_set set feature_type_id=".$ftype->dbID()." where result_set_id in (".$rset->dbID().', '.$chan_rset->dbID().')';
}
}
$self->db->dbc->do($sql) if $sql;
undef $sql;
if(scalar keys %{$types{'cell'}} >1){
$self->log('Resetting IMPORT CellType to NULL for multi-CellType Experiment');
$sql = "UPDATE result_set set cell_type_id=NULL where result_set_id in (".$rset->dbID().', '.$chan_rset->dbID().')';
}else{
my ($ctype) = values %{$types{'cell'}};
if(! defined $rset->cell_type()){
$self->log('Updating IMPORT CellType to '.$ctype->name());
$sql = "UPDATE result_set set cell_type_id=".$ctype->dbID()." where result_set_id in (".$rset->dbID().', '.$chan_rset->dbID().')';
}
elsif($rset->cell_type->dbID ne $ctype->dbID()){
warn 'CellType mismatch between IMPORT sets('.$rset->cell_type->name().') vs meta sets('.$ctype->name.
")\nUpdating IMPORT to match meta";
$self->log('WARNING: CellType mismatch. Updating IMPORT CellType('.$rset->cell_type->name().') to match meta('.$ctype->name.')');
$sql = "UPDATE result_set set cell_type_id=".$ctype->dbID()." where result_set_id in (".$rset->dbID().', '.$chan_rset->dbID().')';
}
}
$self->db->dbc->do($sql) if $sql;
my %toplevel_sets;
my $toplevel_cnt = 1;
foreach my $new_rset(values %rsets){
# Fall back to an empty string so an undefined type is not used as a hash key
my $ftype_name = (defined $new_rset->{'feature_type'}) ? $new_rset->{'feature_type'}->name() : '';
my $ctype_name = (defined $new_rset->{'cell_type'}) ? $new_rset->{'cell_type'}->name() : '';
if(! exists $toplevel_sets{$ftype_name}){
$toplevel_sets{$ftype_name} = {};
$toplevel_sets{$ftype_name}{'feature_type'} = $new_rset->{'feature_type'};
}
if(! exists $toplevel_sets{$ftype_name}{$ctype_name}){
$toplevel_sets{$ftype_name}{$ctype_name}{'cell_type'} = $new_rset->{'cell_type'};
$toplevel_sets{$ftype_name}{$ctype_name}{'rsets'} = [$new_rset];
}else{
push @{$toplevel_sets{$ftype_name}{$ctype_name}{'rsets'}}, $new_rset;
}
}
foreach my $ftype_name(keys %toplevel_sets){
foreach my $ctype_name(keys %{$toplevel_sets{$ftype_name}}){
next if $ctype_name eq 'feature_type';
$rsets{$self->experiment->name().'_'.$toplevel_cnt} = Bio::EnsEMBL::Funcgen::ResultSet->new
(
-NAME => $self->experiment->name(),
-ANALYSIS => $rset->analysis(),
-TABLE_NAME => 'experimental_chip',
-FEATURE_TYPE => $toplevel_sets{$ftype_name}{'feature_type'},
-CELL_TYPE => $toplevel_sets{$ftype_name}{$ctype_name}{'cell_type'},
);
$self->log("Created toplevel ResultSet for:\t". $rsets{$self->experiment->name().'_'.$toplevel_cnt}->log_label);
foreach my $new_rset(@{$toplevel_sets{$ftype_name}{$ctype_name}{'rsets'}}){
foreach my $ec_id(@{$new_rset->table_ids()}){
if(! $rsets{$self->experiment->name().'_'.$toplevel_cnt}->get_chip_channel_id($ec_id)){
$rsets{$self->experiment->name().'_'.$toplevel_cnt}->add_table_id($ec_id, $new_rset->get_chip_channel_id($ec_id));
}
}
}
$toplevel_cnt++;
}
}
my @previous_rep_sets;
my @supporting_rset_dsets;
push @previous_rep_sets, grep { $_->name !~ /_IMPORT$/ }
@{$rset_adaptor->fetch_all_by_Experiment_Analysis($self->experiment, $chip_anal)};
if(@previous_rep_sets){
$self->log('Found previously stored ResultSets');
foreach my $prev_rset(@previous_rep_sets){
my $rset_dset = $self->rollback_ResultSet($prev_rset);
push @supporting_rset_dsets, $rset_dset if @$rset_dset;
}
}
$self->log('Storing ResultSets');
foreach my $new_rset(values %rsets){
my $replace_txt;
foreach my $prs(@supporting_rset_dsets){
my ($pset, $dset) = @$prs;
if($pset->log_label eq $new_rset->log_label){
$self->log("Found update supporting ResultSet clash, renaming to:\tOLD_".$pset->log_label);
my $sql = 'UPDATE result_set set name="OLD_'.$pset->name.'" where result_set_id='.$pset->dbID;
$self->db->dbc->do($sql);
if($dset->product_FeatureSet){
$self->log('Associated DataSet('.$dset->name.') has already been processed. It is not wise to replace a supporting set without first rolling back the FeatureSet, as there may be additional supporting data');
warn 'Associated DataSet('.$dset->name.') has already been processed. It is not wise to replace a supporting set without first rolling back the FeatureSet, as there may be additional supporting data';
}
$replace_txt = 'Proposed ResultSet(dbID) replacement for DataSet('.$dset->name."):\t".$pset->dbID.' > ';
}
}
$new_rset->add_status('DAS_DISPLAYABLE');
my ($new_rset) = @{$rset_adaptor->store($new_rset)};
if(defined $replace_txt){
$self->log($replace_txt.$new_rset->dbID);
}
}
my $xml_file = open_file($self->get_config('mage_xml_file'));
$self->experiment->mage_xml(do{ local ($/); <$xml_file>});
close($xml_file);
$self->experiment($self->db->get_ExperimentAdaptor->update_mage_xml_by_Experiment($self->experiment()));
return; } |
sub vendor
{ my ($self) = shift;
if(@_){
$self->{'vendor'} = shift;
$self->{'vendor'} = uc($self->{'vendor'});
}
return $self->{'vendor'}; } |
sub verbose
{ my ($self) = shift;
$self->{'verbose'} = shift if(@_);
return $self->{'verbose'}; } |
sub vsn_norm
{ my $self = shift;
return $self->R_norm("VSN_GLOG"); } |
General documentation
Example : $imp->validate_mage() if(! $imp->{'write_mage'});
Description: Validates auto-generated and manually edited mage against
Experiment information, as well as checking replicate definitions.
Updates mage_xml table and replicate information accordingly.
Any other differences are logged or an error is thrown if the
difference is deemed critical.
Returntype : none
Exceptions : throws if ...?
Caller : Bio::EnsEMBL::Funcgen::Importer
Status : At risk