BiologicalNetworks Database Architecture

Introduction

  1. The System Architecture
  2. Database Schema
  3. Representation of the Data
  4. Physical Storage Schema (DDL)

The System Architecture

The BiologicalNetworks system consists of six major parts: the Client Side Application, Graph Query Engines, Integration Client/Manager, Data Importer, Schema Mapping Tools, and Data Warehouse.

The Client Application implements all business logic and a significant part of the user interface. Two novel Graph Query Engines store and query molecular interaction networks and directed acyclic graphs such as ontologies and taxonomies, using specialized algorithms (see, for example, Chen et al. (2005)) customized for each kind of graph. The Data Importer accepts data from an external data source, validates it against the schema, and stores it in the warehouse. The Integration Client/Manager is used to register a new database that needs to be integrated into the system: the user provides the schema of the new source, and the schema is validated and stored in the Schema Library.
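The validate-then-store behavior of the Data Importer can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the schema is modeled as a mapping from field name to expected Python type, the warehouse as a plain list, and the function names validate and import_records are hypothetical, not part of BiologicalNetworks.

```python
def validate(record, schema):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for field_name, expected_type in schema.items():
        if field_name not in record:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(record[field_name], expected_type):
            problems.append(f"wrong type for field: {field_name}")
    return problems


def import_records(records, schema, warehouse):
    """Store valid records in the warehouse and return the rejected ones."""
    rejected = []
    for record in records:
        problems = validate(record, schema)
        if problems:
            rejected.append((record, problems))
        else:
            warehouse.append(record)
    return rejected


# Hypothetical schema and input data for an external source.
schema = {"name": str, "type": str}
warehouse = []
rejected = import_records(
    [{"name": "protein A", "type": "Protein"}, {"name": "protein B"}],
    schema,
    warehouse,
)
```

In this sketch the second record is rejected because it lacks the type field, so only the first record reaches the warehouse.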

Database Architecture

Database Schema

The BiologicalNetworks database was designed to represent generic network data.
The current implementation defines three classes of vertices: primary nodes (primary objects), connector nodes (events of interaction or regulation between primary objects), and graph nodes (complex objects, such as protein complexes and cell processes, that may themselves contain graphs). Connector nodes are identified by mechanism and effect type. Vertices are stored in the table Nodes. Nodes themselves can be of several types (Proteins, Small Molecules, Cell Processes, Expression Controls, Binding, Protein Modification, etc.), which are recorded in the table NodeType.
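One way to picture the relationship between the tables Nodes and NodeType is the following sketch, which uses Python's built-in sqlite3 module. Only the two table names come from the description above; the column names (id, name, node_class, type_id), the assignment of each type to a vertex class, and the sample rows are assumptions made for illustration and may differ from the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE NodeType (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    -- assumed column: which of the three vertex classes the type belongs to
    node_class TEXT NOT NULL
               CHECK (node_class IN ('primary', 'connector', 'graph'))
);
CREATE TABLE Nodes (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    type_id INTEGER NOT NULL REFERENCES NodeType(id)
);
""")

# Type-to-class assignments are inferred from the text, not taken from the schema.
conn.executemany(
    "INSERT INTO NodeType (name, node_class) VALUES (?, ?)",
    [
        ("Protein", "primary"),
        ("Small Molecule", "primary"),
        ("Binding", "connector"),
        ("Protein Modification", "connector"),
        ("Cell Process", "graph"),
    ],
)


def add_node(name, type_name):
    """Insert a node, looking up its type by name."""
    conn.execute(
        "INSERT INTO Nodes (name, type_id) "
        "SELECT ?, id FROM NodeType WHERE name = ?",
        (name, type_name),
    )


def nodes_of_class(node_class):
    """Return the names of all nodes whose type belongs to the given class."""
    rows = conn.execute(
        "SELECT n.name FROM Nodes n JOIN NodeType t ON t.id = n.type_id "
        "WHERE t.node_class = ? ORDER BY n.name",
        (node_class,),
    )
    return [row[0] for row in rows]


add_node("protein A", "Protein")
add_node("A binds B", "Binding")
add_node("apoptosis", "Cell Process")
```

Keeping the types in their own table means a new node type is added as a row rather than as a schema change, which matches the statement that the number of node types is not limited.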

The current implementation supports three classes of links between vertices: directed and non-directed (distinguished by the field Direction in the table Edges) and membership (indicated by the field Relation in the table Edges). Directed links describe the biological notion of regulation, such as “protein A activates protein B”; non-directed links describe binding events, such as “protein A binds protein B”; membership links describe containment, such as “protein complex P contains protein A”. The database structure does not limit the number of classes of nodes and edges, nor the number of node types or attribute types.
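The three link classes can be sketched with sqlite3 as follows. The table name Edges and the field names Direction and Relation come from the text; everything else is an assumption: the columns source_id and target_id, the encoding of Direction as 0 or 1, the Relation values 'link' and 'membership', and the choice to route regulation and binding through connector nodes named activation#1 and binding#1.

```python
import sqlite3


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Nodes (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE Edges (
        id        INTEGER PRIMARY KEY,
        source_id INTEGER NOT NULL REFERENCES Nodes(id),
        target_id INTEGER NOT NULL REFERENCES Nodes(id),
        Direction INTEGER NOT NULL CHECK (Direction IN (0, 1)),  -- 1 = directed
        Relation  TEXT NOT NULL CHECK (Relation IN ('link', 'membership'))
    );
    """)
    names = ["protein A", "protein B", "complex P", "activation#1", "binding#1"]
    conn.executemany("INSERT INTO Nodes (name) VALUES (?)", [(n,) for n in names])
    ids = dict(conn.execute("SELECT name, id FROM Nodes"))
    edges = [
        # "protein A activates protein B": directed, via a connector node
        ("protein A", "activation#1", 1, "link"),
        ("activation#1", "protein B", 1, "link"),
        # "protein A binds protein B": non-directed, via a connector node
        ("protein A", "binding#1", 0, "link"),
        ("protein B", "binding#1", 0, "link"),
        # "protein complex P contains protein A": membership
        ("complex P", "protein A", 1, "membership"),
    ]
    conn.executemany(
        "INSERT INTO Edges (source_id, target_id, Direction, Relation) "
        "VALUES (?, ?, ?, ?)",
        [(ids[s], ids[t], d, r) for s, t, d, r in edges],
    )
    return conn


def successors(conn, name):
    """Nodes one step away over 'link' edges, honoring Direction."""
    rows = conn.execute("""
        SELECT t.name FROM Edges e
          JOIN Nodes s ON s.id = e.source_id
          JOIN Nodes t ON t.id = e.target_id
         WHERE e.Relation = 'link' AND s.name = ?
        UNION
        SELECT s.name FROM Edges e
          JOIN Nodes s ON s.id = e.source_id
          JOIN Nodes t ON t.id = e.target_id
         WHERE e.Relation = 'link' AND e.Direction = 0 AND t.name = ?
    """, (name, name))
    return sorted(row[0] for row in rows)


def members(conn, container):
    """Nodes that the given container holds through membership edges."""
    rows = conn.execute("""
        SELECT t.name FROM Edges e
          JOIN Nodes s ON s.id = e.source_id
          JOIN Nodes t ON t.id = e.target_id
         WHERE e.Relation = 'membership' AND s.name = ?
    """, (container,))
    return sorted(row[0] for row in rows)


conn = build_demo()
```

In this sketch a non-directed edge can be traversed from either end, while a directed edge is followed only from source to target, so protein B reaches the binding event but not the activation event that targets it.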

The two-component networks that we promote (primary objects linked through connector nodes) are a natural extension of the data models accepted for metabolic databases, and will provide a unifying ground for such diverse phenomena as signaling pathways, protein interaction maps, and gene expression networks.

Any node can have an arbitrary number of associated attributes. Node attributes usually store information about functional class, localization, chemical structure, etc. These attributes, as well as the names of proteins and other objects, are searchable fields. Attribute values can be translated into color or shape coding of nodes during visualization. Connector node attributes store information about tissues, cell types, experimental conditions, and other biologically meaningful details; these fields are searchable as well.
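As a rough illustration of searchable attributes and attribute-driven color coding, consider the following Python sketch. The Node class, the search and node_color functions, the attribute names, and the color palette are all hypothetical; the actual client may implement searching and visual coding quite differently.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


def search(nodes, text):
    """Case-insensitive substring match against names and attribute values."""
    needle = text.lower()
    return [
        node for node in nodes
        if needle in node.name.lower()
        or any(needle in value.lower() for value in node.attributes.values())
    ]


# Hypothetical palette keyed by the value of the 'localization' attribute.
LOCALIZATION_COLORS = {"nucleus": "blue", "membrane": "green"}


def node_color(node, default="gray"):
    """Translate the localization attribute into a display color."""
    return LOCALIZATION_COLORS.get(node.attributes.get("localization"), default)


nodes = [
    Node("protein A", {"localization": "nucleus", "functional class": "kinase"}),
    Node("protein B", {"localization": "membrane"}),
]
```

A node without a localization attribute, or with a value missing from the palette, falls back to the default color.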

An attribute of an object can be assigned a single value or multiple values. Different properties can be grouped into sets to describe complex attributes, such as experimental conditions, which might include cell type, tissue, organism, species, p_values, etc.
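Multi-valued attributes and attribute sets could be modeled along these lines. This is a sketch under stated assumptions: the AttributeSet class and its methods are hypothetical, and the sample values (hepatocyte, liver, kidney, Mus musculus) are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AttributeSet:
    """A group of properties that together describe one complex attribute."""
    name: str
    properties: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, prop, value):
        # Each property keeps a list, so it can hold one value or many.
        self.properties.setdefault(prop, []).append(value)

    def values(self, prop):
        """Return all values of a property, or an empty list if it is unset."""
        return self.properties.get(prop, [])


conditions = AttributeSet("experimental conditions")
conditions.add("cell type", "hepatocyte")
conditions.add("tissue", "liver")
conditions.add("organism", "Mus musculus")
conditions.add("tissue", "kidney")  # a second value for the same property
```

Storing every property as a list keeps the single-value and multiple-value cases uniform: a single-valued attribute is simply a list of length one.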

Representation of the Data

Database tables

Physical Storage Schema (DDL)