Active Module Identification

Principle

Note

Active Module Identification (AMI) is performed using DOMINO [1]. The analysis runs on the DOMINO server [2].

DOMINO searches for active modules in a network, e.g. a protein-protein interaction (PPI) network (Fig. 1, middle part).

First, DOMINO defines target genes as active genes. Then DOMINO tries to identify active modules.

Active modules are subnetworks identified as relevant and composed of active genes (i.e. target genes) and other associated genes. Ideally, they will represent functional modules and can thereby reveal biological processes involved in a specific condition.

Finally, an overlap analysis is performed between each active module identified by DOMINO and the rare disease pathways.
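As a rough illustration of what an overlap between active modules and pathways looks like, the sketch below lists the genes shared by each module/pathway pair. It only computes set intersections and does not reproduce the statistics of the actual overlap analysis. The module names, pathway names and gene symbols are hypothetical placeholders, and `overlap_table` is a helper introduced here for illustration.

```python
# Hypothetical active modules and pathways, each given as a set of gene IDs.
active_modules = {
    "module_1": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    "module_2": {"GENE_E", "GENE_F"},
}
pathways = {
    "pathway_X": {"GENE_A", "GENE_B", "GENE_Z"},
    "pathway_Y": {"GENE_F", "GENE_Q"},
}


def overlap_table(modules, gene_sets):
    """Return (module, pathway, shared genes) for every pair sharing a gene."""
    rows = []
    for module_name, module_genes in sorted(modules.items()):
        for pathway_name, pathway_genes in sorted(gene_sets.items()):
            shared = module_genes & pathway_genes
            if shared:
                rows.append((module_name, pathway_name, sorted(shared)))
    return rows


rows = overlap_table(active_modules, pathways)
for module_name, pathway_name, shared in rows:
    print(f"{module_name}\t{pathway_name}\t{len(shared)}\t{';'.join(shared)}")
```

The intersection is only meaningful when modules and pathways use the same type of gene identifier.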

Overview of the DOMINO algorithm

Fig. 3 gives an overview of the DOMINO algorithm.


Fig. 3 : Schematic illustration of DOMINO (Fig. 3 from the DOMINO paper [1])

A - Step 0: The network is clustered into disjoint, highly connected subnetworks (slices) with the Louvain algorithm, which is based on modularity optimization.
B - Step 1: The relevant slices (where active genes are over-represented) are detected using the hypergeometric test. P-values are corrected with the FDR method.
C - Step 2a: The most active sub-slice is identified in each relevant slice.
D - Step 2b: The sub-slices are split into putative active modules using the Newman-Girvan modularity algorithm.
E - Step 3: The final set of active modules is identified (under a threshold of Bonferroni qval <= 0.05).

For more details, see the DOMINO paper [1].
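To make Step 1 more concrete, the sketch below tests whether active genes are over-represented in a slice with a one-sided hypergeometric test, then corrects the p-values. It is a minimal illustration and not DOMINO's implementation. It assumes that "the FDR method" refers to the Benjamini-Hochberg procedure and that 0.05 is the cut-off; the slice sizes and counts are hypothetical toy values.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N genes, K of which are active."""
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest to keep q-values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical toy network: 100 genes, 10 of them active, split into 3 slices.
network_size = 100
active_total = 10
slices = {  # slice name -> (slice size, number of active genes in the slice)
    "slice_a": (20, 6),
    "slice_b": (30, 3),
    "slice_c": (50, 1),
}

pvals = {
    name: hypergeom_sf(active, network_size, active_total, size)
    for name, (size, active) in slices.items()
}
qvals = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))
relevant = [name for name, q in qvals.items() if q <= 0.05]
print(relevant)
```

In this toy example only the slice holding 6 of the 10 active genes passes the threshold; the two other slices contain roughly the number of active genes expected by chance.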

Usage

By default, data are retrieved directly from databases using queries (Fig. 4: section data retrieved by queries). Chemical target genes are retrieved from the Comparative Toxicogenomics Database [3] (CTD) using the --chemicalsFile parameter. All rare disease pathways are retrieved automatically from WikiPathways [4]. The biological network is downloaded from the Network Data Exchange [5] (NDEx) using the --netUUID and --networkFile parameters.

You can provide your own target genes, pathways/processes of interest and biological network (Fig. 4: section data provided by user) using --targetGenesFile, --GMT, --backgroundFile and --networkFile.

The network file (--networkFile) is required, whereas --outputPath is optional.


Fig. 4 : Input and output of Active Modules Identification (AMI)

(Left part) - By default, chemical target genes, rare disease pathways and biological networks are retrieved using automatic queries. The user can also provide their own data. Required inputs are shown as boxes with solid pink or green borders, whereas optional inputs are shown as boxes with dashed borders. (Right part) - Output files shown in pink are created only if the input data are retrieved by queries.

Input parameters for the AMI

Warning

  • Gene IDs have to be consistent between input data (target genes, GMT and networks)

  • When data are retrieved by queries, HGNC IDs are used.
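Before running the analysis with your own data, it can help to check that the gene IDs really match across inputs. The sketch below collects node names from SIF lines and reports the target genes that are absent from the network. It assumes the basic SIF layout `source <interaction> target [target ...]`, tab- or whitespace-delimited, with no header line. The gene names, the `pp` interaction label and both helper functions are hypothetical and are not part of the tool.

```python
def sif_nodes(lines):
    """Collect node names from SIF lines: 'source <interaction> target [target ...]'."""
    nodes = set()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # SIF is tab-delimited when a tab is present, whitespace-delimited otherwise.
        fields = line.split("\t") if "\t" in line else line.split()
        nodes.add(fields[0])
        nodes.update(fields[2:])  # fields[1] is the interaction type, not a node
    return nodes


def missing_ids(genes, network_nodes):
    """Return the gene IDs that do not appear as nodes in the network."""
    return set(genes) - set(network_nodes)


# Hypothetical inputs: the last target gene uses a different identifier type.
network_lines = [
    "GENE_A\tpp\tGENE_B",
    "GENE_B\tpp\tGENE_C\tGENE_D",
]
target_genes = ["GENE_A", "GENE_D", "ENSG_PLACEHOLDER"]

nodes = sif_nodes(network_lines)
missing = missing_ids(target_genes, nodes)
print(f"{len(missing)} of {len(target_genes)} target genes are not in the network: {sorted(missing)}")
```

A large share of missing IDs usually points to mixed identifier types rather than to genes that are genuinely absent from the network.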

To use data retrieved from databases, see parameters on the Data retrieved by queries tab.
To provide your own data, see parameters on the Data provided by user tab.
-c, --chemicalsFile FILENAME

File that contains a list of chemicals, given as MeSH identifiers (e.g. D014801). Each line contains one or several chemical IDs, separated by “;” [FORMAT] [required]

--directAssociation BOOLEAN
TRUE: retrieve genes targeted by chemicals, from CTD
FALSE: retrieve genes targeted by chemicals and their descendant chemicals, from CTD
[default: True]
--nbPub INTEGER

Each interaction between a target gene and a chemical can be associated with publications. You can filter these interactions according to the number of associated publications by setting the minimum number of publications required to keep an association. [default: 2]

--netUUID TEXT

Network UUID to download biological network from NDEx (e.g. 079f4c66-3b77-11ec-b3be-0ac135e8bacf)

-n, --networkFile FILENAME

Name of the file that contains the network, or in which the downloaded network is saved. The file is in SIF format [required]

-o, --outputPath PATH

Folder name to save results. [default: OutputResults]
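The sketch below illustrates two of the parameters above: the layout expected for the chemicals file (one or several IDs per line, separated by ";") and the effect of a minimum publication count such as --nbPub. It is an assumption-laden illustration, not the tool's code. Interactions are modelled here as hypothetical `(chemical, gene, publication count)` tuples, both helper functions are introduced for this example only, and every ID other than D014801 is a placeholder.

```python
def parse_chemicals(lines):
    """Parse chemicals-file lines into lists of IDs (one list per non-empty line)."""
    groups = []
    for raw in lines:
        ids = [chem.strip() for chem in raw.split(";") if chem.strip()]
        if ids:
            groups.append(ids)
    return groups


def filter_by_publications(interactions, nb_pub=2):
    """Keep interactions supported by at least nb_pub publications."""
    return [(chem, gene, n) for chem, gene, n in interactions if n >= nb_pub]


# Hypothetical file content: one ID on the first line, two on the second.
chemical_lines = ["D014801\n", "D000001;D000002\n", "\n"]
groups = parse_chemicals(chemical_lines)
print(groups)

# Hypothetical chemical-gene interactions with their publication counts.
interactions = [
    ("D014801", "GENE_A", 3),
    ("D014801", "GENE_B", 1),
    ("D014801", "GENE_C", 2),
]
kept = filter_by_publications(interactions, nb_pub=2)
print([gene for _, gene, _ in kept])
```

With the default threshold of 2, the interaction supported by a single publication is dropped and the two others are kept.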

Use-cases command lines

Examples of command lines with Data retrieved by queries and Data provided by user.

odamnet domino  --chemicalsFile useCases/InputData/chemicalsFiles.csv \
                --directAssociation FALSE \
                --nbPub 2 \
                --networkFile useCases/InputData/PPI_HiUnion_LitBM_APID_gene_names_190123.sif \
                --netUUID bfac0486-cefe-11ed-a79c-005056ae23aa \
                --outputPath useCases/OutputResults_useCase1

References