On starting the Visualizer, you'll see a section under Search called Advanced Options. Among these are options for selecting "relation aggregation" configurations: settings intended to make networks easier to read and use. Default settings are provided, but as you become more familiar with the system, you may prefer to experiment with them and create your own. An explanation of these settings follows.
Introduction
Features
Aggregation Levels
Additional Options
Further Information
"Relation aggregation" refers to the way relations returned by a search are organized. Because the relations are extracted directly from biomedical text, different words can be used to express the same concepts. The result is that the same concept, for example, mdm2 inhibits apoptosis, can be shown as several different relations: mdm2 - inhibited - apoptosis, mdm2 - has been shown to inhibit - apoptosis, mdm2 gene - inhibit - apoptosis, etc. Aggregation addresses this problem by consolidating relations that "mean the same thing" in order to reduce noise in the network and better reveal the underlying structure of regulatory networks. The aim is to let researchers use the system more efficiently and effectively.
Determining when relations "mean the same thing," however, can differ depending on users’ needs. If a researcher is looking for high-level, general information on how mdm2 affects apoptosis, he would probably be satisfied with a relation like mdm2 - inhibit - apoptosis, in which there is no attempt to distinguish, for example, mdm2 gene and mdm2 protein. For this researcher’s purpose, these particulars are not important and ignoring them provides a less cluttered display and quicker access to the knowledge he needs. Another researcher, however, may be looking for information specific to mdm2 protein, and want to keep a distinction between a gene and its products. She would then not want mdm2 gene and mdm2 protein combined into the same object mdm2, but kept in different relations. The Advanced Options with respect to aggregation attempt to provide users with this flexibility.
When relation aggregation is used, relations that have been consolidated are shown in the AZGeNetScene table and network display as a single relation representing the underlying consolidated ones. While this simplifies the display, the underlying relations have not disappeared; for each aggregated relation, the set of consolidated relations that it represents may be viewed by rolling the mouse over the aggregated relation. Double-clicking an aggregated relation in either the table or network will also retrieve the abstract texts for the biomedical articles that the underlying relations were extracted from. In this way, aggregation can organize and reorganize relations pulled directly from text without information loss.
Several features have been identified by which relation objects and the connectors between these objects can be organized, or grouped. These features are represented by checkboxes in Advanced Options. Selecting one of these checkboxes indicates that before two relation objects or connectors are combined, they must share the same value for this feature. That is, their values for this feature must "match." In the previous scenario, for example, the first researcher is not interested in maintaining a distinction between genes and proteins; he does not require objects to have the same gene/protein feature value in order to combine them into the same relation. Consequently, he would leave this feature unselected before performing a search. The second researcher, however, is interested in the distinction, so she would select the feature before searching. Detailed descriptions of the different features are given below.
Aggregatable Substance refers to genes and gene products. All references to a particular gene or a particular protein share the same Aggregatable Substance value. For example, p53, tp53, and trp53 are associated with the same Aggregatable Substance, which is different from that shared by QM and tumor suppressor QM. When this feature is selected for aggregation, two relation objects must contain references to the same Aggregatable Substance before they are combined (if any Aggregatable Substance is identified at all).
Function refers to a biological process, such as apoptosis or angiogenesis, or an action performed on an aggregatable substance, such as phosphorylation or inhibition. An example: if Aggregatable Substance is selected for aggregation, but this feature is not, p53 inhibition and p53 activation will be combined into a relation entity p53. If this feature is selected, the two will be kept distinct, as they have different values for the Function feature.
Substance Type refers to the "type" of aggregatable substance. Currently, three types are recognized: gene, protein, and mRNA. If a Substance Type is identified for a relation object, that value must match the Substance Type of another object before the two are combined. For example, if Aggregatable Substance is selected, but this feature is not, then p53 protein and p53 gene are combined into a relation entity p53. If this feature is selected, the two are kept distinct, as they have different values for the Substance Type feature.
Mutation has only two values: mutated or not mutated. If Aggregatable Substance is selected and this feature is not, then wild-type p53 and mutated p53 are combined into p53. If this feature is selected, the two will be kept distinct, as wild-type has the Mutation value "no", while mutated has the Mutation value "yes".
Connector Associator refers, essentially, to verbs. This feature attempts to resolve verbs that occur in multiple forms, but have the same stem. For example, inhibit, inhibits, inhibited, and inhibiting all share the Connector Associator value "inhibit". These are all essentially the same verb, but without aggregation, the different morphological forms of the verb prevent the system from recognizing them as such and displaying them together.
Connector Type goes a step further than resolving morphological forms of the same verb. This feature groups verbs into one of four "types": Activates, Inhibits, Causal, and Association. Activates includes any connector that indicates up-regulation: activate, promote, induce, etc. Inhibits includes any connector that indicates down-regulation: inhibit, degrade, suppress, etc. Causal is the value assigned to a connector when it is known that the relationship between two objects is directional, though the exact nature of the relationship is not known. Association is assigned when neither the nature nor the direction of the relationship is known, but a connection exists. An effect of selecting this feature is that the same two objects can have no more than four links between them, which is useful for reducing large, cluttered networks.
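As a rough illustration of the two connector features, the sketch below uses small hand-written lookup tables. The verbs in the tables are the examples given above; the table names, the function names, the `directional` flag, and the verbs "affects" and "binds" are assumptions made for this example and are not taken from the AZGeNetScene implementation.

```python
# Hypothetical lookup: surface verb form -> Connector Associator (shared stem).
ASSOCIATOR = {
    "inhibit": "inhibit",
    "inhibits": "inhibit",
    "inhibited": "inhibit",
    "inhibiting": "inhibit",
}

# Hypothetical lookup: verb stem -> Connector Type, using the example verbs.
ACTIVATES = {"activate", "promote", "induce"}
INHIBITS = {"inhibit", "degrade", "suppress"}


def connector_associator(verb):
    """Return the shared stem for a verb form, or the verb itself if unknown."""
    return ASSOCIATOR.get(verb, verb)


def connector_type(stem, directional):
    """Classify a verb stem into one of the four Connector Types.

    `directional` is an assumed flag: True when the direction of the
    relationship is known even though its exact nature is not.
    """
    if stem in ACTIVATES:
        return "Activates"
    if stem in INHIBITS:
        return "Inhibits"
    return "Causal" if directional else "Association"


# Six connectors between the same two objects ("affects" and "binds" are
# made-up verbs for illustration) collapse to at most four distinct types.
connectors = [("inhibits", True), ("suppress", True), ("induce", True),
              ("affects", True), ("binds", False), ("inhibited", True)]
types = {connector_type(connector_associator(v), d) for v, d in connectors}
```

However many connectors are supplied, `types` can never hold more than four values, which is the reason this feature caps the number of links between any two objects.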
Residuals encompasses all the text of a relation object or connector that cannot be associated with one of the above features. Selecting this feature for aggregation means that in order for two objects or two connectors to be combined, all of the text of the objects or connectors that could not be assigned a feature value must match exactly. For example, if all features are selected for aggregation except Residuals, then the relation sequential transfer of wild-type p53 - efficiently induces - apoptosis is consolidated with wild-type p53 - frequently induces - apoptosis, since the objects and connectors in these two relations share the same values for all identified features (aggregatable substance, mutation, connector associator, and function). The string sequential transfer, however, was not assigned any feature value. Consequently, with Residuals selected for aggregation, these two relations would not be consolidated, as the residual value for the first object in the first relation (sequential transfer) does not match that for the first object in the second relation (which has an empty string for a residual value).
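One way to picture feature-based aggregation is as grouping relations by a key built only from the selected features. The following is a minimal sketch under that assumption, not a description of how AZGeNetScene is actually implemented. The feature names, the dictionary layout, and the hand-assigned feature values (including treating the adverbs "efficiently" and "frequently" as connector residuals) are all hypothetical. A feature with no identified value is simply treated as `None`, which is a simplification.

```python
from collections import defaultdict

# Hypothetical feature names; they mirror the checkboxes described above.
OBJECT_FEATURES = ("substance", "function", "substance_type", "mutation", "residual")
CONNECTOR_FEATURES = ("associator", "connector_type", "residual")


def part_key(part, feature_names, selected):
    """Key for one object or connector: values of the selected features only."""
    return tuple(part.get(name) for name in feature_names if name in selected)


def relation_key(relation, selected):
    """Key for a whole relation: first object, connector, second object."""
    first, connector, second = relation
    return (
        part_key(first, OBJECT_FEATURES, selected),
        part_key(connector, CONNECTOR_FEATURES, selected),
        part_key(second, OBJECT_FEATURES, selected),
    )


def aggregate(relations, selected):
    """Group relations whose selected feature values all match."""
    groups = defaultdict(list)
    for relation in relations:
        groups[relation_key(relation, selected)].append(relation)
    return list(groups.values())


# The two relations from the Residuals example, with hand-assigned values.
r1 = (
    {"substance": "p53", "mutation": "no", "residual": "sequential transfer"},
    {"associator": "induce", "residual": "efficiently"},
    {"function": "apoptosis", "residual": ""},
)
r2 = (
    {"substance": "p53", "mutation": "no", "residual": ""},
    {"associator": "induce", "residual": "frequently"},
    {"function": "apoptosis", "residual": ""},
)

all_but_residual = {"substance", "function", "substance_type", "mutation",
                    "associator", "connector_type"}
without_residuals = aggregate([r1, r2], all_but_residual)
with_residuals = aggregate([r1, r2], all_but_residual | {"residual"})
```

With Residuals left out of the selection, both relations fall into one group; adding it keeps them apart because their residual text differs. Each group still holds its original relations, so nothing is lost by grouping.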
The user may select any combination of features for aggregation, but some predetermined combinations are offered as "levels of aggregation". These are represented by a dropdown box in Advanced Options; selecting one of the levels in the dropdown automatically selects the associated features. Baseline is a special level in which there is no aggregation at all. At this level, only relations in which the exact text of both objects and the associated connector match are combined into one relation in the display. Otherwise each relation is shown separately. All predefined aggregation levels are described below.
1. Baseline - the full text of the entity or connector labels must match.
2. Feature Match
a) all identified entity features must match
b) morphological forms of the same connector verb are combined
3. Typed Substance
a) entities with different identifiable substance types are not matched
b) morphological forms of the same connector verb are combined
4. Aggregatable Substance
a) references to a gene and its gene products are matched
b) morphological forms of the same connector verb are combined
5. Simple Pathway
a) references to a gene and its gene products are matched
b) connector verbs are classified into one of 4 categories
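Because each level is a predetermined combination of features, the levels can be thought of as named presets. The sketch below is one assumed reading of the list above and not the system's actual configuration. The feature names are hypothetical, Function is included at every non-baseline level as a guess, and Residuals is omitted because the list does not say which levels select it.

```python
# Assumed feature names; one per checkbox described in the Features section.
ENTITY_FEATURES = frozenset({"substance", "function", "substance_type", "mutation"})

# Assumed presets. None stands for Baseline, where the full text must match.
LEVELS = {
    "Baseline": None,
    "Feature Match": ENTITY_FEATURES | {"associator"},
    "Typed Substance": frozenset(
        {"substance", "function", "substance_type", "associator"}
    ),
    "Aggregatable Substance": frozenset({"substance", "function", "associator"}),
    "Simple Pathway": frozenset({"substance", "function", "connector_type"}),
}


def selected_features(level):
    """Return the set of checked features for a level (None means exact text)."""
    return LEVELS[level]
```

Under this reading, each level from Feature Match down to Aggregatable Substance requires fewer features to match than the one before it, so it consolidates more. Simple Pathway swaps the verb-stem feature for the coarser four-way connector type.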
Baseline aggregation is used to provide the maximum amount of differentiation in a visualization. We expect that it is most useful as a baseline for comparison of aggregation system results. Baseline aggregation makes no attempt to combine equivalent objects unless they are labeled with exactly the same words. Thus neither the relation nor any of the elements of Mdm2 - inhibits - apoptosis would be matched with Mdm2 genes - are involved in - regulation of apoptosis. Baseline aggregation minimizes information loss but accomplishes very little consolidation.
Feature Match aggregation increases network consolidation by comparing feature values assigned to an entity. If a substance has been identified as mutated, or recognized as present in a particular tissue type or cellular domain, it is matched only with similarly identified items. For instance, mdm2 antisense oligodeoxynucleotide - induces - Apoptosis and anti-sense MDM2 - induces - apoptosis would be aggregated because the connectors and second entity (induces - apoptosis) match and both MDM2 entities can be identified as mutated (antisensed) forms of the substance MDM2.
Typed Substance aggregation differentiates between substance types. In a network aggregated at this level, relations involving protein MDM2 would not be combined with references to the MDM2 gene. In this case, Mdm2 oncoprotein - inhibits - apoptosis would be aggregated with MDM2 oncoprotein - has been shown to inhibit - apoptosis but not with MDM2 genes - are involved in - regulation of apoptosis, because the gene would not be considered equivalent to the protein.
Aggregatable Substance aggregation assigns equivalence to references to a gene and its related gene products. At this level of aggregation, no attempt is made to distinguish between interactions related to a particular gene and interactions for the protein that gene encodes. This is partly a practical matter. Across a set of abstracts, the exact same phrase is frequently used to refer to a gene and to the related protein, making it difficult to distinguish these references. Analysis of nearby words and other cues in the document can help address but not eliminate this ambiguity. Also, as a matter of application, a researcher studying effects of the gene TP53 might well be interested in references to the protein p53 because the presence of the protein is related to expression of the gene. Entities here may also be biological functions, and connectors at this level are combined when the same verb is expressed in a different morphological form (e.g., MDM2 - inhibits - apoptosis and MDM2 oncoprotein - has been shown to inhibit - apoptosis are considered to be equivalent relations).
Simple Pathway aggregation creates a high-level view of the information extracted from a text. This kind of relation can be used, for example, as input into a data mining algorithm. Connecting verbs would be classified as belonging to one of four categories: induce, inhibit, directional association, non-directional association. Relations at this level might be comparable to relations extracted by parsers that identify only a single semantic type of relation, such as parsers that extract only inhibition relations. Relations could be filtered so that only items with two recognizable substances and particular types of interaction are included. For a researcher, this simple representation of a pathway relation can be viewed as an outline or backbone of the regulatory network. For example, each of these relations,
MDM2 inhibits apoptosis
MDM2 oncoprotein abrogates apoptosis
Human MDM2 interferes with p53-mediated cell death
can be aggregated into a simple relation MDM2 inhibits apoptosis. Gene MDM2 and its product MDM2 oncoprotein are matched to each other as the aggregatable substance MDM2; the verbs inhibits, abrogates, and interferes with would all belong to the category inhibit; and apoptosis and cell death would be identified as equivalent functions.
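The consolidation of these three relations can be sketched with lookup tables that normalize each part of a relation. The tables below are hypothetical and cover only this example. How AZGeNetScene actually recognizes substances, verb categories, and equivalent functions is not shown here.

```python
# Hypothetical lookup tables covering only the three example relations.
SUBSTANCE = {
    "MDM2": "MDM2",
    "MDM2 oncoprotein": "MDM2",
    "Human MDM2": "MDM2",
}
CATEGORY = {
    "inhibits": "inhibit",
    "abrogates": "inhibit",
    "interferes with": "inhibit",
}
FUNCTION = {
    "apoptosis": "apoptosis",
    "p53-mediated cell death": "apoptosis",
}


def simple_pathway(relation):
    """Normalize (object, connector, object) to its Simple Pathway form."""
    first, verb, second = relation
    return (SUBSTANCE[first], CATEGORY[verb], FUNCTION[second])


relations = [
    ("MDM2", "inhibits", "apoptosis"),
    ("MDM2 oncoprotein", "abrogates", "apoptosis"),
    ("Human MDM2", "interferes with", "p53-mediated cell death"),
]

# A set keeps one entry per distinct normalized relation.
aggregated = {simple_pathway(r) for r in relations}
```

All three inputs normalize to the same triple, so `aggregated` contains a single relation.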
Two additional checkboxes appear in Advanced Options: Simple Pathway Filter and Single Link Mode. These features are not specifically associated with relation aggregation, but are provided for convenience in the relation display.
Simple Pathway Filter restricts the type of relations returned by a search. When this filter is selected, only relations determined to be part of a gene regulatory network are returned, that is, relations in which both objects have an identified value for either the Aggregatable Substance or Function features. This option reduces network noise, which is especially useful in large networks.
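The filter condition can be written as a simple predicate over a relation's two objects. This is a minimal sketch; the dictionary layout and key names are assumptions, and the second sample relation is invented to show something the filter would drop.

```python
def in_simple_pathway(relation):
    """True when both objects carry a substance value or a function value."""
    first, _connector, second = relation
    return all(
        obj.get("substance") is not None or obj.get("function") is not None
        for obj in (first, second)
    )


relations = [
    # Both objects have an identified feature: kept.
    ({"text": "mdm2", "substance": "mdm2"},
     "inhibits",
     {"text": "apoptosis", "function": "apoptosis"}),
    # Made-up relation whose first object has neither feature: dropped.
    ({"text": "these results"},
     "suggest",
     {"text": "apoptosis", "function": "apoptosis"}),
]

kept = [r for r in relations if in_simple_pathway(r)]
```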
Single Link Mode reduces all relations between the same two objects to one link. If there is any connection between two objects, Single Link Mode will indicate only that a connection exists, and not attempt to characterize the nature of the connection. This is again particularly convenient for large networks. As with the other types of relation consolidation, the individual connections making up a link in Single Link Mode can be retrieved by rolling the mouse over the relation, or double-clicking it to get all the source abstracts for the underlying relations.
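Single Link Mode can be pictured as grouping relations by the pair of objects they connect, while keeping the underlying relations available for lookup. The sketch below assumes the pair is unordered, since the mode does not characterize the connection. The function name and the sample verb "regulates" are illustrative only.

```python
from collections import defaultdict


def single_links(relations):
    """Map each unordered pair of objects to the relations it stands for."""
    links = defaultdict(list)
    for first, connector, second in relations:
        # frozenset ignores direction, so A-B and B-A share one link.
        links[frozenset((first, second))].append((first, connector, second))
    return dict(links)


relations = [
    ("MDM2", "inhibits", "apoptosis"),
    ("MDM2", "regulates", "apoptosis"),
    ("p53", "induces", "apoptosis"),
]

links = single_links(relations)
```

Here the two MDM2 relations share one link and the p53 relation gets its own. The list stored under each link plays the role of the underlying relations shown on mouse-over.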
Papers detailing the aggregation process are available from the UA Artificial Intelligence Lab. For more information, please contact the AZGeNetScene team or visit the AZGeNetScene Project home page.