Archive Ensembl HomeArchive Ensembl Home
Home > Help & Documentation

Database Schema

This document refers to the schema of the Ensembl Funcgen version 51.


Introduction

The Ensembl Funcgen (eFG) DB can be thought of as containing three distinct types of inter-related data and a set of association data which connects these types. Alongside these there are also tables to facilitate the interaction between eFG and the core database to enable coordinate or 'seq_region' information to be stored about the various features withing eFG.

  • Experimental data - This includes meta data about a given experiment alongside the raw experimental data.

  • Platform data - Experimental platform technology meta data and the necessary information to interpret experimental data produced using a given platform. e.g. Array designs

  • Processed data - This consitutes any features which have been generated from experimental data, externally curated data or combined analyses of both.

  • Association data - Some key tables which provides links between experimental data and processed data.

  • Coordinate data - A registry of coordinate data to allow features to be stored on multiple coordinate systems originating from different version of the core DB.


List of Tables


General Tables

feature_type

Contains information about genomic features used in various tables within the schema e.g. the feature, set or experimental_chip tables

Table descriptionField information
+-----------------+---------------------------------------------+------+-----+---------+----------------+
| Field           | Type                                        | Null | Key | Default | Extra          |
+-----------------+---------------------------------------------+------+-----+---------+----------------+
| feature_type_id | int(10) unsigned                            | NO   | PRI | NULL    | auto_increment |
| name            | varchar(40)                                 | NO   | MUL |         |                |
| class           | enum('Insulator','DNA','Regulatory Feature',|      |     |         |                |
|                 | 'Histone','RNA','Polymerase',               |      |     |         |                |
|                 | 'Transcription Factor',                     |      |     |         |                | 
|                 | 'Transcription Factor Complex','Overlap',   |      |     |         |                |
|                 | 'Regulatory Motif','Region','Enhancer')     | YES  |     | NULL    |                |
| description     | varchar(255)                                | YES  |     | NULL    |                |
+-----------------+---------------------------------------------+------+-----+---------+----------------+


Common abbreviated name, nomeclatures/ontologies used where possible 
e.g. HGNC names, Brno histone nomenclatures etc.

General class of feature



Here is an exmaple of some histone feature types:

mysql> select * from feature_type where class='HISTONE' limit 10;
+-----------------+----------+---------+-------------------------------------+
| feature_type_id | name     | class   | description                         |
+-----------------+----------+---------+-------------------------------------+
|               2 | H4ac     | Histone | Histone 4 Acetylation               |
|               3 | H3ac     | Histone | Histone 3 Acetylation               |
|               1 | H3K9ac   | Histone | Histone 3 Lysine 9 Acetylation      |
|               4 | H3K4me3  | Histone | Histone 3 Lysine 4 Tri-Methylation  |
|               5 | H3K4me2  | Histone | Histone 3 Lysine 4 Di-Methylation   |
|               6 | H3K4me1  | Histone | Histone 3 Lysine 4 Mono-Methylation |
|              10 | H4K20me3 | Histone | Histone 4 Lysine 20 Tri-Methylation |
|              11 | H3K27me3 | Histone | Histone 3 Lysine 27 Tri-Methylation |
|              12 | H3K36me3 | Histone | Histone 3 Lysine 36 Tri-Methylation |
|              13 | H3K79me3 | Histone | Histone 3 Lysine 79 Tri-Methylation |
+-----------------+----------+---------+-------------------------------------+

associated_feature_type

This is a simple link table to capture many to many relationships between feature_type and any feature table.

Table descriptionField information
+-----------------+-------------------------------------------+------+-----+-----------+-------+
| Field           | Type                                      | Null | Key | Default   | Extra |
+-----------------+-------------------------------------------+------+-----+-----------+-------+
| feature_id      | int(10) unsigned                          | NO   | PRI |           |       | 
| feature_table   | enum('annotated','external','regulatory') | NO   | PRI | annotated |       | 
| feature_type_id | int(10) unsigned                          | NO   | PRI |           |       | 
+-----------------+-------------------------------------------+------+-----+-----------+-------+

cell_type

Contains information about cell/tissue types used in various tables within the schema e.g. the feature, set or experimental_chip tables

Table descriptionField information
+---------------+------------------+------+-----+---------+----------------+
| Field         | Type             | Null | Key | Default | Extra          |
+---------------+------------------+------+-----+---------+----------------+
| cell_type_id  | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| name          | varchar(120)     | NO   | UNI |         |                |
| display_label | varchar(20)      | YES  |     | NULL    |                |
| description   | varchar(40)      | YES  |     | NULL    |                |
+---------------+------------------+------+-----+---------+----------------+


Common abbreviated name

Here are a few examples of cell_type records:

mysql> select * from cell_type limit 5;
+--------------+---------+---------------+------------------------------------------+
| cell_type_id | name    | display_label | description                              |
+--------------+---------+---------------+------------------------------------------+
|            1 | HeLa    | NULL          | Human Epithelial Carcinoma Cells         |
|            2 | GM06990 | NULL          | Human B-Lymphocyte Cells                 |
|            3 | U2OS    | NULL          | Human Bone Osteosarcoma Epithelial Cells |
|            4 | CD4     | NULL          | Human CD4 T-Cells                        |
|            5 | IMR90   | NULL          | Human Fetal Lung Fibroblast              |
+--------------+---------+---------------+------------------------------------------+

status

A table for recording generic status entries for different table entries. e.g. experimental_chip, result_set.

Table descriptionField information
+----------------+------------------+------+-----+---------+-------+
| Field          | Type             | Null | Key | Default | Extra |
+----------------+------------------+------+-----+---------+-------+
| table_id       | int(10) unsigned | NO   | PRI | 0       |       |
| table_name     | varchar(20)      | NO   | PRI |         |       |
| status_name_id | int(10)          | NO   | PRI | 0       |       |
+----------------+------------------+------+-----+---------+-------+




e.g. experimental_chip, result_set.



status_name

Contains the list of valid status names which can be assigned to variosu table records.

Table description
+----------------+------------------+------+-----+---------+----------------+
| Field          | Type             | Null | Key | Default | Extra          |
+----------------+------------------+------+-----+---------+----------------+
| status_name_id | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| name           | varchar(20)      | YES  | UNI | NULL    |                |
+----------------+------------------+------+-----+---------+----------------+

The status and status_name tables allow multiple states to be assigned to an given record. The following table shows the current status to table associations.

mysql> select distinct(s.table_name), sn.name from status_name sn left join status s on s.status_name_id=sn.status_name_id;
+-------------------+-----------------+
| table_name        | name            |
+-------------------+-----------------+
| NULL              | DAS DISPLAYABLE |
| result_set        | DISPLAYABLE     |
| data_set          | DISPLAYABLE     |
| feature_set       | DISPLAYABLE     |
| experimental_chip | DISPLAYABLE     |
| array_chip        | IMPORTED        |
| channel           | IMPORTED        |
| array_chip        | RESOLVED        |
| experimental_chip | VSN_GLOG        |
| experimental_chip | Parzen          |
+-------------------+-----------------+


Used for automatic display of DAS tracks


Used for display in Ensembl web browser tracks

Provides tracking of import processes

Tracks resolution of array probe import
Tracks which analyses have been performed
Tracks which analyses have been performed

analysis

This table is identical to the core analysis table. It stores information about analyses which can be assigned to features or feature sets.

Table description
+-----------------+------------------+------+-----+---------------------+----------------+
| Field           | Type             | Null | Key | Default             | Extra          |
+-----------------+------------------+------+-----+---------------------+----------------+
| analysis_id     | int(10) unsigned | NO   | PRI | NULL                | auto_increment |
| created         | datetime         | NO   |     | 0000-00-00 00:00:00 |                |
| logic_name      | varchar(40)      | NO   | UNI |                     |                |
| db              | varchar(120)     | YES  |     | NULL                |                |
| db_version      | varchar(40)      | YES  |     | NULL                |                |
| db_file         | varchar(120)     | YES  |     | NULL                |                |
| program         | varchar(80)      | YES  |     | NULL                |                |
| program_version | varchar(40)      | YES  |     | NULL                |                |
| program_file    | varchar(80)      | YES  |     | NULL                |                |
| parameters      | varchar(255)     | YES  |     | NULL                |                |
| module          | varchar(80)      | YES  |     | NULL                |                |
| module_version  | varchar(40)      | YES  |     | NULL                |                |
| gff_source      | varchar(40)      | YES  |     | NULL                |                |
| gff_feature     | varchar(40)      | YES  |     | NULL                |                |
+-----------------+------------------+------+-----+---------------------+----------------+

See the core analysis documentation for more details.


analysis_description

This table is identical to the core analysis_description table. It stores descriptive information about analyses.

Table description
+---------------+------------------+------+-----+---------+-------+
| Field         | Type             | Null | Key | Default | Extra |
+---------------+------------------+------+-----+---------+-------+
| analysis_id   | int(10) unsigned | NO   | MUL | 0       |       |
| description   | text             | YES  |     | NULL    |       |
| display_label | varchar(255)     | YES  |     | NULL    |       |
| displayable   | tinyint(1)       | NO   |     | 1       |       |
| web_data      | text             | YES  |     | NULL    |       |
+---------------+------------------+------+-----+---------+-------+

See the core analysis_description documentation for more details.


Experimental Data Tables

egroup

The egroup or 'experimental group' table contains information about research groups which are associated with experiments.

Table descriptionField Information
+-----------+----------------------+------+-----+---------+----------------+
| Field     | Type                 | Null | Key | Default | Extra          |
+-----------+----------------------+------+-----+---------+----------------+
| egroup_id | smallint(6) unsigned | NO   | PRI | NULL    | auto_increment |
| name      | varchar(40)          | NO   |     |         |                |
| location  | varchar(120)         | YES  |     | NULL    |                |
| contact   | varchar(40)          | YES  |     | NULL    |                |
+-----------+----------------------+------+-----+---------+----------------+





group location
email of primary contact within the group

Here is an example using our group information:

mysql> select * from egroup;
+-----------+------+----------+--------------------+
| egroup_id | name | location | contact            |
+-----------+------+----------+--------------------+
|         1 | efg  | Hinxton  | njohnson@ebi.ac.uk |
+-----------+------+----------+--------------------+

experiment

Contains information about individual experiments.

Table descriptionField Information
+---------------------+----------------------+------+-----+------------+----------------+
| Field               | Type                 | Null | Key | Default    | Extra          |
+---------------------+----------------------+------+-----+------------+----------------+
| experiment_id       | int(10) unsigned     | NO   | PRI | NULL       | auto_increment |
| name                | varchar(30)          | YES  | UNI | NULL       |                |
| egroup_id           | smallint(6) unsigned | YES  | MUL | NULL       |                |
| date                | date                 | YES  |     | 0000-00-00 |                |
| primary_design_type | varchar(30)          | YES  | MUL | NULL       |                |
| description         | varchar(255)         | YES  |     | NULL       |                |
| mage_xml_id         | int(10) unsigned     | YES  |     | NULL       |                |
+---------------------+----------------------+------+-----+------------+----------------+






date of experiment
The primary design type(MGED term) e.g. binding_site_identification

Link to the mage_xml record

Here is an example using the 'ctcf_ren' experiment:

mysql> select * from experiment where name like 'ctcf_ren';
+---------------+----------+-----------------------+------+-----------------------------+-------------+-------------+
| experiment_id | name     | experimental_group_id | date | primary_design_type         | description | mage_xml_id |
+---------------+----------+-----------------------+------+-----------------------------+-------------+-------------+
|            12 | ctcf_ren |                     1 | NULL | binding_site_identification | NULL        |           2 |
+---------------+----------+-----------------------+------+-----------------------------+-------------+-------------+

mage_xml

Mage XML records for a given experiment.

Table descriptionField Information
+-------------+------------------+------+-----+---------+----------------+
| Field       | Type             | Null | Key | Default | Extra          |
+-------------+------------------+------+-----+---------+----------------+
| mage_xml_id | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| xml         | text             | YES  |     | NULL    |                |
+-------------+------------------+------+-----+---------+----------------+




XML stored as text


experimental_design_type

This is a generic link table between a design_type and any table where it might logically be applied e.g. experiment, experimental_chip etc.

Table descriptionField Information
+----------------+------------------+------+-----+---------+----------------+
| Field          | Type             | Null | Key | Default | Extra          |
+----------------+------------------+------+-----+---------+----------------+
| design_type_id | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| table_name     | varchar(40)      | NO   | PRI |         |                |
| table_id       | int(10) unsigned | NO   | PRI | 0       |                |
+----------------+------------------+------+-----+---------+----------------+



design type to be associated
table name of associated record 
table id of associated record


design_type

The design type table holds appropriate terms from ontologies such as MGED.

Table description
+----------------+------------------+------+-----+---------+----------------+
| Field          | Type             | Null | Key | Default | Extra          |
+----------------+------------------+------+-----+---------+----------------+
| design_type_id | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| name           | varchar(255)     | YES  | MUL | NULL    |                |
+----------------+------------------+------+-----+---------+----------------+

experimental_variable

The experimental_variable table is design to store any meta data which might be used for querying rather than simpluy storing in the mage_xml table.

Table description
+------------+------------------+------+-----+---------+-------+
| Field      | Type             | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| table_id   | int(10) unsigned | NO   | PRI | 0       |       |
| table_name | varchar(20)      | NO   | PRI |         |       |
| name       | varchar(40)      | YES  |     | NULL    |       |
| unit       | varchar(40)      | YES  |     | NULL    |       |
| value      | varchar(40)      | YES  |     | NULL    |       |
+------------+------------------+------+-----+---------+-------+

experimental_chip

Each record in this table represent the unique physical instance of an array_chip.

Table descriptionField Information
+----------------------+------------------+------+-----+---------+----------------+
| Field                | Type             | Null | Key | Default | Extra          |
+----------------------+------------------+------+-----+---------+----------------+
| experimental_chip_id | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| unique_id            | varchar(20)      | NO   | MUL | 0       |                |
| experiment_id        | int(10) unsigned | NO   | MUL | 0       |                |
| array_chip_id        | int(10) unsigned | NO   |     | 0       |                |
| feature_type_id      | int(10) unsigned | YES  | MUL | NULL    |                |
| cell_type_id         | int(10) unsigned | YES  |     | NULL    |                |
| biological_replicate | varchar(40)      | YES  |     | NULL    |                |
| technical_replicate  | varchar(40)      | YES  |     | NULL    |                |
+----------------------+------------------+------+-----+---------+----------------+




Unique id for the experimental_chip




The biological replicate denomination
The technical replicate denomination

The biological_replicate and technical_replicate fields are populated dependent on the tab2MAGE file generated during the import procedure. These values are then used dynamically to resolve individual results across replicates. Here we see the experimental_chips from the 'Stunnenberg_all_OID_1963' experiment representing two biological replicates:

mysql> select ec.* from experimental_chip ec, experiment e where e.name='Stunnenberg_all_OID_1963' and  ec.experiment_id=e.experiment_id;
+----------------------+-----------+---------------+---------------+-----------------+--------------+----------------------+---------------------+
| experimental_chip_id | unique_id | experiment_id | array_chip_id | feature_type_id | cell_type_id | biological_replicate | technical_replicate |
+----------------------+-----------+---------------+---------------+-----------------+--------------+----------------------+---------------------+
|                    1 | 46092     |             1 |             1 |               1 |            3 | 1                    | NULL                |
|                    2 | 46082     |             1 |             1 |               1 |            3 | 2                    | NULL                |
|                    3 | 46075     |             1 |             2 |               1 |            3 | 2                    | NULL                |
|                    4 | 46078     |             1 |             2 |               1 |            3 | 1                    | NULL                |
+----------------------+-----------+---------------+---------------+-----------------+--------------+----------------------+---------------------+

channel

Each record in this table represents a channel of an experimental_chip.

Table descriptionField Information
+----------------------+------------------+------+-----+---------+----------------+
| Field                | Type             | Null | Key | Default | Extra          |
+----------------------+------------------+------+-----+---------+----------------+
| channel_id           | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| experimental_chip_id | int(11) unsigned | YES  | MUL | NULL    |                |
| sample_id            | varchar(20)      | YES  |     | NULL    |                |
| dye                  | varchar(20)      | YES  |     | NULL    |                |
| type                 | varchar(20)      | YES  |     | NULL    |                |
| description          | varchar(255)     | YES  |     | NULL    |                |
+----------------------+------------------+------+-----+---------+----------------+






e.g. Cy5 or Cy5
e.g. 'EXPERIMENTAL' sample or 'TOTAL' or control sample


Here we see each channel from the Stunnenberg_all_OID_1963 experiment.

mysql> select c.* from experimental_chip ec, channel c, experiment e where e.name='Stunnenberg_all_OID_1963' and  ec.experiment_id=e.experiment_id and ec.experimental_chip_id=c.experimental_chip_id;
+------------+----------------------+-----------+------+--------------+-------------+
| channel_id | experimental_chip_id | sample_id | dye  | type         | description |
+------------+----------------------+-----------+------+--------------+-------------+
|          1 |                    1 | NULL      | Cy3  | TOTAL        | NULL        |
|          2 |                    1 | NULL      | Cy5  | EXPERIMENTAL | NULL        |
|          3 |                    2 | NULL      | Cy3  | TOTAL        | NULL        |
|          4 |                    2 | NULL      | Cy5  | EXPERIMENTAL | NULL        |
|          5 |                    3 | NULL      | Cy3  | TOTAL        | NULL        |
|          6 |                    3 | NULL      | Cy5  | EXPERIMENTAL | NULL        |
|          7 |                    4 | NULL      | Cy3  | TOTAL        | NULL        |
|          8 |                    4 | NULL      | Cy5  | EXPERIMENTAL | NULL        |
+------------+----------------------+-----------+------+--------------+-------------+

chip_channel

This table allows a unique table ID to be assigned to each experimental_chip or channel to enable linkage to the result table. The chip_channel table also records the members of each result set, as such this table may be non-redundant, as the same chip or channel may be present in more than one result_set.

Table descriptionField Information
+-----------------+------------------+------+-----+---------+----------------+
| Field           | Type             | Null | Key | Default | Extra          |
+-----------------+------------------+------+-----+---------+----------------+
| chip_channel_id | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| result_set_id   | int(10) unsigned | NO   | PRI | 0       |                |
| table_id        | int(10) unsigned | NO   |     | 0       |                |
| table_name      | varchar(20)      | NO   |     |         |                |
+-----------------+------------------+------+-----+---------+----------------+






Generic 'table' fields allows linkage to the experimental_chip or channel tables


Here we see channel and 'VSN' normalised experimental_chip level chip_channel entries:

mysql> select cc.*, a.logic_name from chip_channel cc, result_set rs, analysis a where rs.name like 'Stunnenberg_all_OID_1963%' and rs.analysis_id=a.analysis_id and rs.result_set_id=cc.result_set_id order by cc.result_set_id;
+-----------------+---------------+----------+-------------------+------------+
| chip_channel_id | result_set_id | table_id | table_name        | logic_name |
+-----------------+---------------+----------+-------------------+------------+
|               7 |            11 |        7 | channel           | RawValue   |
|               4 |            11 |        4 | channel           | RawValue   |
|               1 |            11 |        1 | channel           | RawValue   |
|               8 |            11 |        8 | channel           | RawValue   |
|               5 |            11 |        5 | channel           | RawValue   |
|               2 |            11 |        2 | channel           | RawValue   |
|               6 |            11 |        6 | channel           | RawValue   |
|               3 |            11 |        3 | channel           | RawValue   |
|               9 |            12 |        1 | experimental_chip | VSN_GLOG   |
|              12 |            12 |        4 | experimental_chip | VSN_GLOG   |
|              11 |            13 |        3 | experimental_chip | VSN_GLOG   |
|              10 |            13 |        2 | experimental_chip | VSN_GLOG   |
+-----------------+---------------+----------+-------------------+------------+

result

This table stores all raw and normalised probe level results.

Table descriptionField Information
+-----------------+----------------------+------+-----+---------+----------------+
| Field           | Type                 | Null | Key | Default | Extra          |
+-----------------+----------------------+------+-----+---------+----------------+
| result_id       | int(10) unsigned     | NO   | PRI | NULL    | auto_increment |
| probe_id        | int(10) unsigned     | YES  | MUL | NULL    |                |
| score           | double               | YES  |     | NULL    |                |
| chip_channel_id | int(10) unsigned     | NO   | MUL | 0       |                |
| X               | smallint(4) unsigned | YES  |     | NULL    |                |
| Y               | smallint(4) unsigned | YES  |     | NULL    |                |
+-----------------+----------------------+------+-----+---------+----------------+







X Y coords to resolve on plate replicates for the same 
probe and enable plate plotting and spatial normalisation

Here are a few example records:

mysql> select * from result where X is not NULL limit 10;
+-----------+----------+---------+-----------------+------+------+
| result_id | probe_id | score   | chip_channel_id | X    | Y    |
+-----------+----------+---------+-----------------+------+------+
|   5961427 | 11584206 | 1304.44 |             123 |  104 |  686 |
|   5961428 | 11394741 | 1185.56 |             123 |  108 |  184 |
|   5961429 | 11704268 | 2087.78 |             123 |  586 | 1004 |
|   5961430 | 11557658 |  937.56 |             123 |   80 |  616 |
|   5961431 | 11456495 | 1375.78 |             123 |  487 |  347 |
|   5961432 | 11569266 | 1495.44 |             123 |  464 |  646 |
|   5961433 | 11500438 |  672.89 |             123 |  335 |  463 |
|   5961434 | 11417312 |    2015 |             123 |  325 |  243 |
|   5961435 | 11531252 | 1923.33 |             123 |   16 |  546 |
|   5961436 | 11568810 |  639.11 |             123 |  317 |  645 |
+-----------+----------+---------+-----------------+------+------+

See also probe.

result_feature

This table stores a feature representation of results from a given result_set. The are stored in precomputed windows to enable faster display at greater levels of zoom.

Table descriptionField Information
+-------------------+----------------------+------+-----+---------+----------------+
| Field             | Type                 | Null | Key | Default | Extra          |
+-------------------+----------------------+------+-----+---------+----------------+
| result_feature_id | int(10) unsigned     | NO   | PRI | NULL    | auto_increment | 
| result_set_id     | int(10) unsigned     | NO   | MUL |         |                | 
| seq_region_id     | int(10) unsigned     | NO   |     |         |                | 
| seq_region_start  | int(10)              | NO   |     |         |                | 
| seq_region_end    | int(10)              | NO   |     |         |                | 
| seq_region_strand | tinyint(4)           | NO   |     |         |                | 
| window_size       | smallint(5) unsigned | NO   |     |         |                | 
| score             | double               | YES  |     | NULL    |                | 
+-------------------+----------------------+------+-----+---------+----------------+








Size of window in bp over which score is calculated



result_set

Each record represent a distinct set of result. One may be a superset of another, with biological replicates being split up into individual technical replicate result_sets. Names can be replicated as it is possible to have the same set of data with differing analyses e.g. the raw data and the normalised data for a full experimental import set.

Table descriptionField Information
+-----------------+------------------+------+-----+---------+----------------+
| Field           | Type             | Null | Key | Default | Extra          |
+-----------------+------------------+------+-----+---------+----------------+
| result_set_id   | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| analysis_id     | int(10) unsigned | YES  |     | NULL    |                |
| name            | varchar(40)      | YES  | MUL | NULL    |                |
| cell_type_id    | int(10) unsigned | YES  |     | NULL    |                |
| feature_type_id | int(10) unsigned | YES  |     | NULL    |                |
+-----------------+------------------+------+-----+---------+----------------+




Normalisation or RawValue analysis for experimental_chips or channels respectively




Here are a few result sets from experiments using the SangerPCR Encode array:

mysql> select rs.* from result_set rs, analysis a where a.logic_name='SangerPCR' and rs.analysis_id=a.analysis_id;
+---------------+-------------+------------------------+--------------+-----------------+
| result_set_id | analysis_id | name                   | cell_type_id | feature_type_id |
+---------------+-------------+------------------------+--------------+-----------------+
|             1 |           8 | H4ac-HeLa_IMPORT       |            1 |               2 |
|             2 |           8 | H3ac-HeLa_IMPORT       |            1 |               3 |
|             3 |           8 | H3K4Me3-HeLa_IMPORT    |            1 |               4 |
|             4 |           8 | H3K4Me2-HeLa_IMPORT    |            1 |               5 |
|             5 |           8 | H3K4Me1-HeLa_IMPORT    |            1 |               6 |
|             6 |           8 | H3K4Me1-GM06990_IMPORT |            2 |               6 |
|             7 |           8 | H3K4Me2-GM06990_IMPORT |            2 |               5 |
|             8 |           8 | H3ac-GM06990_IMPORT    |            2 |               3 |
|             9 |           8 | H4ac-GM06990_IMPORT    |            2 |               2 |
|            10 |           8 | H3K4Me3-GM06990_IMPORT |            2 |               4 |
+---------------+-------------+------------------------+--------------+-----------------+

experimental_set

The experimental_set and experimental_subset tables provides a generic import for pre-processed functional genomics data which does follow the micro-array technology paradigm. As technology moves fast in the field of functional genoics, this provides a convenient way to incorporate new data which would otherwise be outside the eFG DB and the analyses it supports. This has been very useful with the rapid adoption of the various 2nd generation sequencing platforms.

The experimental_set table can be thought of as analagous to the result_set table, defining technical and biological replicate sets within a given experiment.

Table descriptionField Information
+---------------------+------------------+------+-----+---------+----------------+
| Field               | Type             | Null | Key | Default | Extra          |
+---------------------+------------------+------+-----+---------+----------------+
| experimental_set_id | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| experiment_id       | int(10) unsigned | YES  | MUL | NULL    |                |
| feature_type_id     | int(10) unsigned | YES  | MUL | NULL    |                |
| cell_type_id        | int(10) unsigned | YES  | MUL | NULL    |                |
| format              | varchar(20)      | YES  |     | NULL    |                |
| vendor              | varchar(40)      | YES  |     | NULL    |                |
| name                | varchar(40)      | NO   | UNI | 0       |                |
+---------------------+------------------+------+-----+---------+----------------+








Format of experimental platform technology
Vendor of experimental platform technology


Here are a few example showing experimental_set records for ChIP-Seq experiments:

mysql> select * from experimental_set limit 10;
+---------------------+---------------+-----------------+--------------+------------+--------+---------------+
| experimental_set_id | experiment_id | feature_type_id | cell_type_id | format     | vendor | name          |
+---------------------+---------------+-----------------+--------------+------------+--------+---------------+
|                   1 |            13 |               8 |            4 | SEQUENCING | SOLEXA | CD4_DNASE     |
|                   2 |            14 |               8 |            2 | SEQUENCING | SOLEXA | GM06990_DNASE |
|                   3 |            19 |               9 |            4 | SEQUENCING | SOLEXA | CD4_CTCF      |
|                   4 |            16 |              23 |            4 | SEQUENCING | SOLEXA | CD4_H2AZ      |
|                   5 |            17 |              24 |            4 | SEQUENCING | SOLEXA | CD4_H2BK5me1  |
|                   6 |            18 |              25 |            4 | SEQUENCING | SOLEXA | CD4_H3K27me1  |
|                   7 |            20 |              26 |            4 | SEQUENCING | SOLEXA | CD4_H3K27me2  |
|                   8 |            21 |              11 |            4 | SEQUENCING | SOLEXA | CD4_H3K27me3  |
|                   9 |            22 |              27 |            4 | SEQUENCING | SOLEXA | CD4_H3K36me1  |
|                  10 |            23 |              12 |            4 | SEQUENCING | SOLEXA | CD4_H3K36me3  |
+---------------------+---------------+-----------------+--------------+------------+--------+---------------+

experimental_subset

The experimental_subset table simply records the members of the experimental_set. The name field is generic to allow different types subset to be accomodated. Currently the filenames of contributory files are used to keep some form of import tracking. This allows for incremental import to an experimental_set

Table descriptionField Information
+------------------------+------------------+------+-----+---------+----------------+
| Field                  | Type             | Null | Key | Default | Extra          |
+------------------------+------------------+------+-----+---------+----------------+
| experimental_subset_id | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| experimental_set_id    | int(10) unsigned | NO   | MUL | 0       |                |
| name                   | varchar(30)      | NO   |     | 0       |                |
+------------------------+------------------+------+-----+---------+----------------+





e.g. file name

Here we see the filenames for each experimental_set have been recorded:

mysql> select * from experimental_subset limit 10;
+------------------------+---------------------+--------------------------------+
| experimental_subset_id | experimental_set_id | name                           |
+------------------------+---------------------+--------------------------------+
|                      3 |                   3 | CD4_CTCF.clstr_gt1.bed         |
|                      4 |                   1 | CD4_parzen_02.scores.bed       |
|                      5 |                   2 | GM06990_parzen_0115.scores.bed |
|                      6 |                   4 | CD4_H2AZ.clstr_gt1.bed         |
|                      7 |                   5 | CD4_H2BK5me1.clstr_gt1.bed     |
|                      8 |                   6 | CD4_H3K27me1.clstr_gt1.bed     |
|                      9 |                   7 | CD4_H3K27me2.clstr_gt1.bed     |
|                     10 |                   8 | CD4_H3K27me3.clstr_gt1.bed     |
|                     11 |                   9 | CD4_H3K36me1.clstr_gt1.bed     |
|                     12 |                  10 | CD4_H3K36me3.clstr_gt1.bed     |
+------------------------+---------------------+--------------------------------+

Platform Data Tables

array

This table contains information about an array product.

Table descriptionField Information
+-------------+------------------+------+-----+---------+----------------+
| Field       | Type             | Null | Key | Default | Extra          |
+-------------+------------------+------+-----+---------+----------------+
| array_id    | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| name        | varchar(40)      | YES  |     | NULL    |                |
| format      | varchar(20)      | YES  |     | NULL    |                |
| vendor      | varchar(40)      | YES  | MUL | NULL    |                |
| description | varchar(255)     | YES  |     | NULL    |                |
| type        | varchar(20)      | YES  |     | NULL    |                |
+-------------+------------------+------+-----+---------+----------------+





e.g. Tiled, Expression, SNP, Custom
Array vendor e.g. Nimblegen

e.g. OLIGO, PCR

Here we see an entry for the latest version of the Sanger ENCODE PCR array:

mysql> select * from array where vendor='Sanger';
+----------+-------------+--------+--------+-------------------------------+------+
| array_id | name        | format | vendor | description                   | type |
+----------+-------------+--------+--------+-------------------------------+------+
|        2 | ENCODE3.1.1 | TILED  | SANGER | Sanger ENCODE PCR array 3.1.1 | PCR  |
+----------+-------------+--------+--------+-------------------------------+------+

array_chip

The array_chip table contains information about individual chips or slides which belong to a larger array set.

Table descriptionField Information
+---------------+------------------+------+-----+---------+----------------+
| Field         | Type             | Null | Key | Default | Extra          |
+---------------+------------------+------+-----+---------+----------------+
| array_chip_id | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| design_id     | varchar(20)      | YES  |     | NULL    |                |
| array_id      | int(10) unsigned | NO   | MUL | 0       |                |
| name          | varchar(40)      | YES  |     | NULL    |                |
+---------------+------------------+------+-----+---------+----------------+





This must be a unique id withing the array set.




Here we see the first 10 of 38 array chips which consitute the Nimblegen 2005-05-10_HG17Tiling_Set array:

mysql> select * from array_chip order by design_id limit 10;
+---------------+-----------+----------+-----------------------------+
| array_chip_id | design_id | array_id | name                        |
+---------------+-----------+----------+-----------------------------+
|             4 | 2031      |        1 | 2005-05-10_HG17Tiling_Set01 |
|             5 | 2032      |        1 | 2005-05-10_HG17Tiling_Set02 |
|             6 | 2033      |        1 | 2005-05-10_HG17Tiling_Set03 |
|             7 | 2034      |        1 | 2005-05-10_HG17Tiling_Set04 |
|             8 | 2035      |        1 | 2005-05-10_HG17Tiling_Set05 |
|             9 | 2036      |        1 | 2005-05-10_HG17Tiling_Set06 |
|            10 | 2037      |        1 | 2005-05-10_HG17Tiling_Set07 |
|            11 | 2038      |        1 | 2005-05-10_HG17Tiling_Set08 |
|            12 | 2039      |        1 | 2005-05-10_HG17Tiling_Set09 |
|            13 | 2040      |        1 | 2005-05-10_HG17Tiling_Set10 |
+---------------+-----------+----------+-----------------------------+

probe

The probe table contains information about unique probes. As some vendors refer to the same probe within a probeset by a different name, depending on what array it is on, this table can have multiple entries for a given probe_id to reflect each instance of that probe, on a different array_chip or with a different name.

Table descriptionField Information
+---------------+----------------------+------+-----+---------+----------------+
| Field         | Type                 | Null | Key | Default | Extra          |
+---------------+----------------------+------+-----+---------+----------------+
| probe_id      | int(10) unsigned     | NO   | PRI | NULL    | auto_increment |
| probe_set_id  | int(10) unsigned     | YES  | MUL | NULL    |                |
| name          | varchar(40)          | NO   | PRI |         |                |
| length        | smallint(6) unsigned | NO   |     | 0       |                |
| array_chip_id | int(10) unsigned     | NO   | MUL | 0       |                |
| class         | varchar(20)          | YES  |     | NULL    |                |
+---------------+----------------------+------+-----+---------+----------------+








e.g. EXPERIMENTAL, CONTROL, CLUSTER, SPIKE-IN etc.

Here we see a handful of probes from the Nimblegen 2005-05-10_HG17Tiling_Set array which have no probe_set associations:

mysql> select * from probe limit 10;
+----------+--------------+----------------+--------+---------------+--------------+
| probe_id | probe_set_id | name           | length | array_chip_id | class        |
+----------+--------------+----------------+--------+---------------+--------------+
|  4368959 |            0 | chrXP23688251  |     50 |             2 | EXPERIMENTAL |
|  4368960 |            0 | chr22P38797630 |     50 |             2 | EXPERIMENTAL |
|  4368961 |            0 | chrXP19006022  |     50 |             2 | EXPERIMENTAL |
|  4368962 |            0 | chr22P44857921 |     50 |             2 | EXPERIMENTAL |
|  4368963 |            0 | chrXP35916231  |     50 |             2 | EXPERIMENTAL |
|  4368964 |            0 | chrXP3578999   |     50 |             2 | EXPERIMENTAL |
|  4368965 |            0 | chrXP30674179  |     50 |             2 | EXPERIMENTAL |
|  4368966 |            0 | chrXP83863933  |     50 |             2 | EXPERIMENTAL |
|  4368967 |            0 | chr22P47957550 |     50 |             2 | EXPERIMENTAL |
|  4368968 |            0 | chrXP74276518  |     50 |             2 | EXPERIMENTAL |
+----------+--------------+----------------+--------+---------------+--------------+

probe_set

Information about probe sets.

Table descriptionField Information
+--------------+----------------------+------+-----+---------+----------------+
| Field        | Type                 | Null | Key | Default | Extra          |
+--------------+----------------------+------+-----+---------+----------------+
| probe_set_id | int(10) unsigned     | NO   | PRI | NULL    | auto_increment |
| name         | varchar(20)          | NO   |     |         |                |
| size         | smallint(6) unsigned | NO   |     | 0       |                |
| family       | varchar(20)          | YES  |     | NULL    |                |
+--------------+----------------------+------+-----+---------+----------------+





Number of probes in probe_set
Wider classification of probe_set


probe_design

The probe_design table allows for storage of probe specific design information when using the eFG DB to facilitate custom array design.

Table descriptionField Information
+-----------------+------------------+------+-----+---------+-------+
| Field           | Type             | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+-------+
| probe_id        | int(10) unsigned | NO   | PRI | 0       |       |
| analysis_id     | int(10) unsigned | NO   | PRI | 0       |       |
| score           | double           | YES  |     | NULL    |       |
| coord_system_id | int(10) unsigned | NO   | PRI | 0       |       |
+-----------------+------------------+------+-----+---------+-------+




Link to analysis which will contain design specfic paramters 
Design score derived from analysis e.g. uniqueness score
Required as designs are dependent on the genome assembly version


Processed Data Tables

feature_set

Each record represents a set of common features based on the output of an analysis.

Table descriptionField Information
+-----------------+-------------------------------------------+------+-----+---------+----------------+
| Field           | Type                                      | Null | Key | Default | Extra          |
+-----------------+-------------------------------------------+------+-----+---------+----------------+
| feature_set_id  | int(10) unsigned                          | NO   | PRI | NULL    | auto_increment |
| feature_type_id | int(10) unsigned                          | NO   | MUL | 0       |                |
| analysis_id     | int(10) unsigned                          | YES  |     | NULL    |                |
| cell_type_id    | int(10) unsigned                          | YES  |     | NULL    |                |
| name            | varchar(40)                               | YES  | UNI | NULL    |                |
| type            | enum('annotated','regulatory','external') | YES  |     | NULL    |                |
+-----------------+-------------------------------------------+------+-----+---------+----------------+




The analysis used to generate this set of features
Each feature should have the same feature_type
Each feature should have the same cell_type

Defines the type of feature contained by this feature_set

Here we can see the current 'external' feature_sets:

mysql> select * from feature_set where type='external';
+----------------+-----------------+-------------+--------------+-----------------------+----------+
| feature_set_id | feature_type_id | analysis_id | cell_type_id | name                  | type     |
+----------------+-----------------+-------------+--------------+-----------------------+----------+
|             66 |              48 |          17 |         NULL | miRanda miRNA         | external |
|             65 |              49 |          18 |         NULL | cisRED group motifs   | external |
|             64 |              48 |          18 |         NULL | cisRED search regions | external |
|             63 |              45 |          19 |         NULL | VISTA enhancer set    | external |
+----------------+-----------------+-------------+--------------+-----------------------+----------+

annotated_feature

Each record represents an annotation on the genome based on the output of an analysis.

Table descriptionField Information
+----------------------+------------------+------+-----+---------+----------------+
| Field                | Type             | Null | Key | Default | Extra          |
+----------------------+------------------+------+-----+---------+----------------+
| annotated_feature_id | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| seq_region_id        | int(10) unsigned | NO   | MUL | 0       |                |
| seq_region_start     | int(10) unsigned | NO   |     | 0       |                |
| seq_region_end       | int(10) unsigned | NO   |     | 0       |                |
| seq_region_strand    | tinyint(1)       | NO   |     | 0       |                |
| display_label        | varchar(60)      | YES  |     | NULL    |                |
| score                | double           | YES  |     | NULL    |                |
| feature_set_id       | int(10) unsigned | NO   | MUL | 0       |                |
+----------------------+------------------+------+-----+---------+----------------+






Coordinate information

Generated from an analysis
Links this feature to a wider set of features


Here we see a handful of annotated_features from the H4ac-HeLa_IMPORT feature_set:

mysql> select af.* from annotated_feature af, feature_set fs where af.feature_set_id=fs.feature_set_id and fs.name='H4ac-HeLa_IMPORT' limit 10;
+----------------------+---------------+------------------+----------------+-------------------+---------------+-------+----------------+
| annotated_feature_id | seq_region_id | seq_region_start | seq_region_end | seq_region_strand | display_label | score | feature_set_id |
+----------------------+---------------+------------------+----------------+-------------------+---------------+-------+----------------+
|                11777 |            73 |        115638286 |      115640172 |                 0 |               |  5.52 |              2 |
|                11778 |            73 |        115927096 |      115928972 |                 0 |               |  4.56 |              2 |
|                11779 |            73 |        116099885 |      116102142 |                 0 |               |   4.8 |              2 |
|                11780 |            73 |        116377509 |      116380481 |                 0 |               | 5.225 |              2 |
|                11781 |            73 |        116381790 |      116385824 |                 0 |               |  3.31 |              2 |
|                11782 |            73 |        116385805 |      116392279 |                 0 |               |  1.64 |              2 |
|                11783 |            73 |        116696459 |      116699296 |                 0 |               |  2.53 |              2 |
|                11784 |            45 |        131429336 |      131431902 |                 0 |               |  3.11 |              2 |
|                11785 |            45 |        131431900 |      131435369 |                 0 |               |  2.32 |              2 |
|                11786 |            45 |        131435349 |      131444027 |                 0 |               |  2.57 |              2 |
+----------------------+---------------+------------------+----------------+-------------------+---------------+-------+----------------+

external_feature

External features are analogous to annotated_features, but are sourced from and externally curated set of data e.g. The cisRED regulatory motif set.

Table description
+---------------------+------------------+------+-----+---------+----------------+
| Field               | Type             | Null | Key | Default | Extra          |
+---------------------+------------------+------+-----+---------+----------------+
| external_feature_id | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| seq_region_id       | int(10) unsigned | NO   | MUL | 0       |                |
| seq_region_start    | int(10) unsigned | NO   |     | 0       |                |
| seq_region_end      | int(10) unsigned | NO   |     | 0       |                |
| seq_region_strand   | tinyint(1)       | NO   |     | 0       |                |
| display_label       | varchar(60)      | YES  |     | NULL    |                |
| feature_type_id     | int(10) unsigned | YES  | MUL | NULL    |                |
| feature_set_id      | int(10) unsigned | NO   | MUL | 0       |                |
+---------------------+------------------+------+-----+---------+----------------+

regulatory_feature

Each record represents a regulatory annotation on the genome based on the output of the Ensmbl 'Regulatory Build'.

Table descriptionField Information
+-----------------------+-----------------------+------+-----+---------+----------------+
| Field                 | Type                  | Null | Key | Default | Extra          |
+-----------------------+-----------------------+------+-----+---------+----------------+
| regulatory_feature_id | int(10) unsigned      | NO   | PRI | NULL    | auto_increment |
| seq_region_id         | int(10) unsigned      | NO   | MUL | 0       |                |
| seq_region_start      | int(10) unsigned      | NO   |     | 0       |                |
| seq_region_end        | int(10) unsigned      | NO   |     | 0       |                |
| seq_region_strand     | tinyint(1)            | NO   |     | 0       |                |
| display_label         | varchar(60)           | YES  |     | NULL    |                |
| feature_type_id       | int(10) unsigned      | YES  | MUL | NULL    |                |
| feature_set_id        | int(10) unsigned      | YES  | MUL | NULL    |                |
| stable_id             | mediumint(8) unsigned | YES  | MUL | NULL    |                |
+-----------------------+-----------------------+------+-----+---------+----------------+





Coordinate information


Binary display string used to classify feature
Regulatory feature type assigned by classfication of binary string
Links this feature to a wider set of features
Integer representation of the stable identifier for this regulatory feature

Here we see some regulatory_feature recrods from the current regulatory build.

mysql> select rf.* from regulatory_feature rf, feature_set fs where rf.feature_Set_id=fs.feature_set_id and fs.name='RegulatoryFeatures' limit 10;
+-----------------------+---------------+------------------+----------------+-------------------+------------------------------------+-----------------+----------------+-----------+
| regulatory_feature_id | seq_region_id | seq_region_start | seq_region_end | seq_region_strand | display_label                      | feature_type_id | feature_set_id | stable_id |
+-----------------------+---------------+------------------+----------------+-------------------+------------------------------------+-----------------+----------------+-----------+
|                605239 |           137 |                2 |            363 |                 0 | 1001000100111111101101110000000011 |              44 |             57 |    113231 |
|                605240 |           137 |              371 |            543 |                 0 | 1000000100001000101100010000000011 |              44 |             57 |    113232 |
|                605241 |           137 |              560 |            641 |                 0 | 1000000100001000101100010000000011 |              44 |             57 |    113233 |
|                605242 |           137 |              700 |            781 |                 0 | 1000000100011001101100010000000011 |              44 |             57 |    113234 |
|                605243 |           137 |              792 |            885 |                 0 | 1000000100001000001000000000000011 |              44 |             57 |    113235 |
|                605244 |           137 |              909 |           1044 |                 0 | 1000000100001000100000000000000011 |              44 |             57 |    113236 |
|                605245 |           137 |             1263 |           1296 |                 0 | 1000000000000000000100000000000011 |              44 |             57 |    113237 |
|                605246 |           137 |             1377 |           1463 |                 0 | 1000000100001000100100110000000011 |              44 |             57 |    113238 |
|                605247 |           137 |             1515 |           1565 |                 0 | 1000000000000000000000000000000011 |              44 |             57 |    113239 |
|                605248 |           137 |             1691 |           1734 |                 0 | 1000000100011000100000000000000011 |              44 |             57 |    113240 |
+-----------------------+---------------+------------------+----------------+-------------------+------------------------------------+-----------------+----------------+-----------+

regulatory_attribute

Regulatory features are built using co-occurrence analysis of other types of feature or attribute features. The regulatory_attribute table store information about these underlying attribute features for each regulatory_feature.

Table descriptionField Information
+-------------------------+------------------------------+------+-----+-----------+-------+
| Field                   | Type                         | Null | Key | Default   | Extra |
+-------------------------+------------------------------+------+-----+-----------+-------+
| regulatory_feature_id   | int(10) unsigned             | NO   | PRI | 0         |       |
| attribute_feature_id    | int(10) unsigned             | NO   | PRI | 0         |       |
| attribute_feature_table | enum('annotated','external') | NO   | PRI | annotated |       |
+-------------------------+------------------------------+------+-----+-----------+-------+




The table id of the attribute feature
The source feature table type of the attribute feature

This table shows all the annotated_features which make up the regulatory_feature with stable_id ENSR00000113231, in this case 26 annotated_features.

mysql> select ra.* from regulatory_attribute ra, regulatory_feature rf where rf.stable_id=113231 and rf.regulatory_feature_id=ra.regulatory_feature_id;
+-----------------------+----------------------+-------------------------+
| regulatory_feature_id | attribute_feature_id | attribute_feature_table |
+-----------------------+----------------------+-------------------------+
|                605239 |             25853743 | annotated               |
|                605239 |             25853797 | annotated               |
|                605239 |             25853909 | annotated               |
|                605239 |             25853980 | annotated               |
|                605239 |             25854098 | annotated               |
|                605239 |             25854099 | annotated               |
|                605239 |             25854100 | annotated               |
|                605239 |             25854190 | annotated               |
|                605239 |             25854312 | annotated               |
|                605239 |             25854313 | annotated               |
|                605239 |             25854314 | annotated               |
|                605239 |             25854315 | annotated               |
|                605239 |             25854434 | annotated               |
|                605239 |             25854435 | annotated               |
|                605239 |             25854650 | annotated               |
|                605239 |             25854732 | annotated               |
|                605239 |             25854734 | annotated               |
|                605239 |             25854891 | annotated               |
|                605239 |             25855005 | annotated               |
|                605239 |             25855192 | annotated               |
|                605239 |             25855193 | annotated               |
|                605239 |             25855296 | annotated               |
|                605239 |             25855297 | annotated               |
|                605239 |             25855298 | annotated               |
|                605239 |             25855300 | annotated               |
|                605239 |             25855327 | annotated               |
+-----------------------+----------------------+-------------------------+

Association Data Tables

These tables provide a link between processed feature_sets and there underlying data.

data_set

This table represents the central entry point to a set of data and links to the product feature_set.

Table descriptionField Information
+----------------+------------------+------+-----+---------+----------------+
| Field          | Type             | Null | Key | Default | Extra          |
+----------------+------------------+------+-----+---------+----------------+
| data_set_id    | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| feature_set_id | int(10) unsigned | NO   | PRI | 0       |                |
| name           | varchar(40)      | YES  | UNI | NULL    |                |
+----------------+------------------+------+-----+---------+----------------+




The product feature_set.


Here we see some data_sets which represent H3K4me3 feature types:

mysql> select ds.* from data_set ds, feature_type ft, feature_set fs where ft.name='H3K4me3' and ds.feature_set_id=fs.feature_set_id and fs.feature_type_id=ft.feature_type_id;
+-------------+----------------+------------------------+
| data_set_id | feature_set_id | name                   |
+-------------+----------------+------------------------+
|           3 |              4 | H3K4Me3-HeLa_IMPORT    |
|          10 |             11 | H3K4Me3-GM06990_IMPORT |
|          15 |             15 | Wiggle_H3K4me3         |
|          46 |             49 | CD4_H3K4me3            |
+-------------+----------------+------------------------+

supporting_set

This table provides the association between a data_set and it's underlying supporting sets.

Table descriptionField Information
+-------------------+-----------------------------------------+------+-----+---------+-------+
| Field             | Type                                    | Null | Key | Default | Extra |
+-------------------+-----------------------------------------+------+-----+---------+-------+
| data_set_id       | int(10) unsigned                        | NO   | PRI | 0       |       |
| supporting_set_id | int(10) unsigned                        | NO   | PRI | 0       |       |
| type              | enum('result','feature','experimental') | YES  | MUL | NULL    |       |
+-------------------+-----------------------------------------+------+-----+---------+-------+




The table id of the supporting_set
The type of set

Here we see all the supporting sets for the current regulatory build:

mysql> select ss.* from supporting_set ss, data_set ds where ds.name='RegulatoryFeatures' and ds.data_Set_id=ss.data_Set_id;
+-------------+-------------------+---------+
| data_set_id | supporting_set_id | type    |
+-------------+-------------------+---------+
|          54 |                12 | feature |
|          54 |                13 | feature |
|          54 |                14 | feature |
|          54 |                15 | feature |
|          54 |                16 | feature |
|          54 |                17 | feature |
|          54 |                20 | feature |
|          54 |                21 | feature |
|          54 |                22 | feature |
|          54 |                29 | feature |
|          54 |                30 | feature |
|          54 |                31 | feature |
|          54 |                32 | feature |
|          54 |                33 | feature |
|          54 |                34 | feature |
|          54 |                35 | feature |
|          54 |                36 | feature |
|          54 |                37 | feature |
|          54 |                38 | feature |
|          54 |                39 | feature |
|          54 |                40 | feature |
|          54 |                41 | feature |
|          54 |                42 | feature |
|          54 |                43 | feature |
|          54 |                44 | feature |
|          54 |                45 | feature |
|          54 |                46 | feature |
|          54 |                47 | feature |
|          54 |                48 | feature |
|          54 |                49 | feature |
|          54 |                50 | feature |
|          54 |                51 | feature |
+-------------+-------------------+---------+

Here is the representation of a direct feature import from a ChIP-Seq experiment:

mysql> select ss.* from supporting_set ss, data_set ds where ds.name='CD4_H3K36me3' and ds.data_set_id=ss.data_set_id;
+-------------+-------------------+--------------+
| data_set_id | supporting_set_id | type         |
+-------------+-------------------+--------------+
|          48 |                10 | experimental |
+-------------+-------------------+--------------+

... and the supporting sets for the analysis performed on a ChIP-chip experiment:

mysql> select ss.* from supporting_set ss, data_set ds where ds.name='Nessie_NG_STD_2_ctcf_ren_BR1' and ds.data_set_id=ss.data_set_id;
+-------------+-------------------+--------+
| data_set_id | supporting_set_id | type   |
+-------------+-------------------+--------+
|          22 |                18 | result |
+-------------+-------------------+--------+

Coordinate Data Tables

coord_system

This mirrors and extends the core coord_system table, holding core informnation alongside eFG information provides a reference back to the core DB which any given feature may have been imported with. Each non-redundant identical core coord_system is given a unique ID within the eFG DB. This allows features on the same coord_system to be loaded using different core DBs. To access the data loaded using these different core DBs simply requires the use of the correct coord_system. This is essential to provide backwards compatibilty and to avoid having to update the coordinate information for every release of ensembl.

Table descriptionField Information
+----------------------+-----------------------------------------+------+-----+---------+----------------+
| Field                | Type                                    | Null | Key | Default | Extra          |
+----------------------+-----------------------------------------+------+-----+---------+----------------+
| coord_system_id      | int(10)                                 | NO   | PRI | NULL    | auto_increment |
| name                 | varchar(40)                             | NO   | MUL |         |                |
| version              | varchar(40)                             | YES  |     | NULL    |                |
| rank                 | int(11)                                 | NO   |     | 0       |                |
| attrib               | set('default_version','sequence_level') | YES  |     | NULL    |                |
| schema_build         | varchar(10)                             | NO   | PRI |         |                |
| core_coord_system_id | int(10)                                 | NO   | PRI | 0       |                |
+----------------------+-----------------------------------------+------+-----+---------+----------------+








core DB of origin
coord_system_id in core DB

Here are the current coord_systems stored in the human eFG DB:

mysql> select * from coord_system order by schema_build;
+-----------------+-------------+---------+------+-----------------+--------------+----------------------+
| coord_system_id | name        | version | rank | attrib          | schema_build | core_coord_system_id |
+-----------------+-------------+---------+------+-----------------+--------------+----------------------+
|            2458 | chromosome  | NCBI34  |    1 | default_version | 25_34e       |                    1 |
|               2 | chromosome  | NCBI35  |    1 | default_version | 36_35i       |                    1 |
|               1 | chromosome  | NCBI36  |    1 | default_version | 42_36d       |                   22 |
|               1 | chromosome  | NCBI36  |    1 | default_version | 43_36e       |                   17 |
|              23 | supercontig | NCBI36  |    2 | default_version | 44_36f       |                   15 |
|               1 | chromosome  | NCBI36  |    1 | default_version | 44_36f       |                   17 |
|               1 | chromosome  | NCBI36  |    1 | default_version | 45_36g       |                    1 |
|              23 | supercontig | NCBI36  |    2 | default_version | 45_36g       |                   15 |
|               1 | chromosome  | NCBI36  |    1 | default_version | 46_36h       |                   17 |
|              23 | supercontig | NCBI36  |    2 | default_version | 46_36h       |                   15 |
|               1 | chromosome  | NCBI36  |    1 | default_version | 47_36i       |                   17 |
|              23 | supercontig | NCBI36  |    2 | default_version | 47_36i       |                   15 |
|               1 | chromosome  | NCBI36  |    1 | default_version | 48_36j       |                   17 |
|              23 | supercontig | NCBI36  |    2 | default_version | 48_36j       |                   15 |
+-----------------+-------------+---------+------+-----------------+--------------+----------------------+

See the core coord_system documentation for more details.


seq_region

This table also mirrors and extends the core seq_region table, providing a reference table for an seq_region_id conversion which may be necessary when handling slice based queries using a slice generated from a given core DB. As internal seq_region_ids are not stable between releases using the same coord_systems, it is necessary to record their core DB of origin.

Table descriptionField Information
mysql> desc seq_region;
+--------------------+------------------+------+-----+---------+----------------+
| Field              | Type             | Null | Key | Default | Extra          |
+--------------------+------------------+------+-----+---------+----------------+
| seq_region_id      | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| name               | varchar(40)      | NO   | PRI |         |                |
| coord_system_id    | int(10) unsigned | NO   | MUL | 0       |                |
| core_seq_region_id | int(10) unsigned | NO   |     | 0       |                |
| schema_build       | varchar(10)      | NO   | PRI |         |                |
+--------------------+------------------+------+-----+---------+----------------+







core DB of origin
coord_system_id in core DB

Here see the differing core seq_regions for chromosome 12 and how they are represented in the eFG DB:

mysql> select * from seq_region where name=12;
+---------------+------+-----------------+--------------------+--------------+
| seq_region_id | name | coord_system_id | core_seq_region_id | schema_build |
+---------------+------+-----------------+--------------------+--------------+
|            61 | 12   |               1 |             226056 | 45_36g       |
|            61 | 12   |               1 |             226056 | 43_36e       |
|            61 | 12   |               1 |             226056 | 44_36f       |
|            61 | 12   |               1 |             226056 | 42_36d       |
|           113 | 12   |               2 |             965907 | 36_35i       |
|           283 | 12   |            2458 |             965488 | 25_34e       |
|            61 | 12   |               1 |             226056 | 46_36h       |
|            61 | 12   |               1 |             226056 | 47_36i       |
|            61 | 12   |               1 |             226056 | 48_36j       |
+---------------+------+-----------------+--------------------+--------------+

See the core seq_region documentation for more details.