Use-case 2: data are provided by the user

Context

In Ozisik et al., 2021 [1], an overlap analysis is performed between vitamin A and D target genes and processes related to Congenital Anomalies of the Kidney and Urinary Tract (CAKUT).

In this use-case, we illustrate how to use ODAMNet with data provided by the user. We use data from Ozisik et al., 2021 [1] and compare our results with those reported in that paper.

Vitamin A target genes come from Balmer and Blomhoff [2], and pathways/processes related to CAKUT come from WikiPathways [3], Reactome [4] and Gene Ontology (GO) [5], [6]. The biological networks used are presented in the Networks used page and come from the Network Data Exchange (NDEx) [7].

Input and output files are available in GitHub.

This section shows how to apply the three approaches proposed.

Overlap analysis

The Overlap analysis searches for genes shared between vitamin A target genes and genes involved in pathways/processes related to CAKUT. See Overlap analysis page for more details.
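As an illustration of what such an overlap test can look like, here is a minimal sketch that assumes a one-sided hypergeometric test restricted to the background genes. The function name and the toy gene identifiers are hypothetical, and this is not necessarily how ODAMNet computes its p-values internally.

```python
from math import comb


def overlap_pvalue(targets, pathway, background):
    """Return (intersection size, P(X >= k)) under a hypergeometric null.

    Assumption: both gene lists are first restricted to the background genes.
    """
    background = set(background)
    targets = set(targets) & background
    pathway = set(pathway) & background
    k = len(targets & pathway)  # observed intersection size
    N, K, n = len(background), len(pathway), len(targets)
    total = comb(N, n)
    # Upper tail: probability of drawing at least k pathway genes among n targets.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / total


# Toy data with hypothetical gene identifiers.
background = {f"G{i}" for i in range(20)}
targets = {"G0", "G1", "G2", "G3"}
pathway = {"G0", "G1", "G2", "G10", "G11"}
size, p = overlap_pvalue(targets, pathway, background)
print(size, p)
```

Restricting both lists to the background mirrors the requirement that each pathway source comes with its own background gene file.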

Running Overlap analysis with data provided by user

The target genes file is given using the --targetGenesFile parameter. It contains the list of vitamin A target genes [FORMAT].

Pathways/processes related to CAKUT are given using the --GMT parameter [FORMAT]. The background genes of each source of pathways/processes are required (--backgroundFile parameter). This file contains the list of background file names.
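To make the structure of a GMT-style file concrete, the sketch below reads one into a dictionary. It assumes the generic GMT layout (tab-separated: identifier, name, then gene symbols); the exact layout expected by ODAMNet is described in the [FORMAT] page and may differ, for example by including header lines. The function name and gene symbols are hypothetical.

```python
import io


def read_gmt(handle):
    """Return {pathway_id: (pathway_name, set_of_genes)} from a GMT-like file."""
    pathways = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        pathway_id, name, genes = fields[0], fields[1], fields[2:]
        pathways[pathway_id] = (name, {g for g in genes if g})
    return pathways


# In-memory example; gene symbols are placeholders, not real annotations.
example = io.StringIO(
    "GO:0001822\tkidney development\tGENE_A\tGENE_B\tGENE_C\n"
    "WP:WP4823\tGenes controlling nephrogenesis\tGENE_B\tGENE_D\n"
)
gmt = read_gmt(example)
print({pid: len(genes) for pid, (name, genes) in gmt.items()})
```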

Result files are saved into the useCases/OutputResults_useCase2/ folder.

odamnet overlap --targetGenesFile useCases/InputData/VitA-Balmer2002-Genes.txt \
                --GMT useCases/InputData/PathwaysOfInterest.gmt \
                --backgroundFile useCases/InputData/PathwaysOfInterestBackground.txt \
                --outputPath useCases/OutputResults_useCase2

The Overlap_genesList_withpathOfInterest.csv file is created. It contains the results of the overlap analysis between vitamin A target genes and CAKUT-related pathways/processes.

For more details about this output file, see Overlap analysis output files section.
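Once the result file exists, the significant rows can be extracted with a few lines of Python. This sketch assumes the column names shown in the results table of this page (PathwayIDs, PathwayNames, pAdjusted, IntersectionSize) and a semicolon delimiter; both are assumptions to check against the actual file. The last example row is hypothetical and only there to show the filter at work.

```python
import csv
import io


def significant_rows(handle, threshold=0.05, delimiter=";"):
    """Return rows with pAdjusted <= threshold, most significant first."""
    rows = [
        row
        for row in csv.DictReader(handle, delimiter=delimiter)
        if float(row["pAdjusted"]) <= threshold
    ]
    return sorted(rows, key=lambda row: float(row["pAdjusted"]))


# In practice, pass open("Overlap_genesList_withpathOfInterest.csv") instead.
example = io.StringIO(
    "PathwayIDs;PathwayNames;pAdjusted;IntersectionSize\n"
    "WP:WP4830;GDNF/RET signalling axis;1.54e-02;5\n"
    "GO:0072001;renal system development;8.64e-17;43\n"
    "EXAMPLE:0001;hypothetical non-significant term;0.42;1\n"
)
hits = significant_rows(example)
for row in hits:
    print(row["PathwayIDs"], row["pAdjusted"])
```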

Results of Overlap analysis with data provided by user

Data provided by user overview

Target genes come from Balmer and Blomhoff [2]. Pathways of interest come from Reactome [4] and WikiPathways [3], and processes of interest come from Biological Process (GO) [5], [6]. Data are presented in Table 13.

Table 13 - Overview of data provided by user

                                      Number
Vitamin A target genes                521
Pathways/processes related to CAKUT   13
Pathways/processes sources            3

Overlap analysis results

We performed an Overlap analysis between vitamin A target genes (521) and pathways/processes related to CAKUT (13). We obtained a significant overlap between target genes and 7 pathways/processes related to CAKUT (pAdjusted <= 0.05). Results are presented in Table 14.

Table 14 - Significant overlap results between vitamin A target genes and pathways/processes related to CAKUT

PathwayIDs              PathwayNames                           pAdjusted   IntersectionSize
GO:0072001              renal system development               8.64e-17    43
GO:0001822              kidney development                     5.14e-16    41
GO:0060993              kidney morphogenesis                   8.33e-12    20
WP:WP5053               Development of ureteric collection …   7.37e-08    15
WP:WP4823               Genes controlling nephrogenesis        2.88e-04    10
PMC5748921-PMC6115658   CAKUT causal genes                     2.30e-03    6
WP:WP4830               GDNF/RET signalling axis               1.54e-02    5

Ozisik et al. [1] identified 7 of the 13 pathways/processes as related to CAKUT. ODAMNet found the same 7 pathways/processes.
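The pAdjusted values reported above are p-values corrected for multiple testing. This page does not state which correction is applied, so the sketch below simply assumes a Benjamini-Hochberg style adjustment to show how such values can be derived from raw p-values. The function name and the input values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.001, 0.04, 0.03, 0.2]  # hypothetical raw p-values
adj = benjamini_hochberg(raw)
print(adj)
```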

Active Module Identification (AMI)

The Active Module Identification (AMI) approach identifies active modules that contain a high number of vitamin A target genes, using a protein-protein interaction (PPI) network. AMI is performed using DOMINO [8]. Then, an Overlap analysis is applied between the identified active modules and CAKUT pathways/processes. See Active Module Identification page for more details.

Running AMI with data provided by user

Warning

Results obtained with the DOMINO server cannot be reproduced identically. The DOMINO server does not allow the random seed to be set, so the seed changes with every new analysis.

The target genes file is given using the --targetGenesFile parameter. It contains the list of vitamin A target genes [FORMAT].

Pathways/processes related to CAKUT are given using the --GMT parameter [FORMAT]. The background genes of each source of pathways/processes are required (--backgroundFile parameter). This file contains the list of background file names.

We used a PPI network [SIF file] previously downloaded from NDEx [7]. The PPI network file is provided using the --networkFile parameter (see Network downloading section). The network file name must have the .sif extension.

Result files are saved into the useCases/OutputResults_useCase2/ folder.

odamnet domino  --targetGenesFile useCases/InputData/VitA-Balmer2002-Genes.txt \
                --GMT useCases/InputData/PathwaysOfInterest.gmt \
                --backgroundFile useCases/InputData/PathwaysOfInterestBackground.txt \
                --networkFile useCases/InputData/PPI_HiUnion_LitBM_APID_gene_names_190123.sif \
                --outputPath useCases/OutputResults_useCase2

Several files are generated:

  • DOMINO_inputGeneList_genesList.txt: vitamin A target genes list used for the active module identification.

  • Overlap_AM_*_genesList_withpathOfInterest.csv: results of the Overlap analysis between the genes of each identified active module and pathways/processes related to CAKUT. There is one file per active module.

  • DOMINO_genesList_activeModulesNetwork.txt, DOMINO_genesList_overlapAMresults4Cytoscape.txt, DOMINO_genesList_activeModules.txt, DOMINO_genesList_activeModulesMetrics.txt and DOMINO_genesList_signOverlap.txt: statistics calculated during the analysis. These files are useful for visualisation.

For more details about these files, see AMI output files section.

Results of AMI with data provided by user

Data provided by user overview

Target genes come from Balmer and Blomhoff [2]. Pathways of interest come from Reactome [4] and WikiPathways [3], and processes of interest come from Biological Process (GO) [5], [6]. Data are presented in Table 15.

Table 15 - Overview of data provided by user

                                      Number
Vitamin A target genes                521
Pathways/processes related to CAKUT   13
Pathways/processes sources            3

The PPI network is downloaded from NDEx (see Protein-Protein Interaction (PPI) network). It was built from 3 datasets: Lit-BM, Hi-Union and APID. It contains 15,390 nodes and 131,087 edges.
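Node and edge counts like those above can be checked directly from a SIF file. The sketch below assumes the usual SIF layout (source, interaction type, then one or more targets, separated by whitespace) and treats the network as undirected. The function name and gene symbols are hypothetical.

```python
import io


def read_sif(handle):
    """Return (nodes, edges) from 'source interaction target [target ...]' lines."""
    nodes, edges = set(), set()
    for line in handle:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and lines without an interaction
        source, targets = fields[0], fields[2:]
        for target in targets:
            nodes.update((source, target))
            edges.add(frozenset((source, target)))  # undirected, de-duplicated
    return nodes, edges


# In practice, pass the opened .sif file given to --networkFile.
example = io.StringIO("GENE_A\tpp\tGENE_B\nGENE_B\tpp\tGENE_C\nGENE_B\tpp\tGENE_A\n")
nodes, edges = read_sif(example)
print(len(nodes), "nodes,", len(edges), "edges")

# Target genes absent from the network cannot be used as active genes.
target_genes = {"GENE_A", "GENE_Z"}
in_network = target_genes & nodes
print(len(in_network), "of", len(target_genes), "target genes are in the network")
```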

AMI results

DOMINO defines vitamin A target genes as active genes and searches for active modules enriched in active genes. Of the 521 target genes, 468 are found in the PPI network and used as active genes by DOMINO. DOMINO identified 21 active modules enriched in vitamin A target genes (Table 16).

Table 16 - Composition of the active modules enriched in vitamin A target genes identified by DOMINO

               Min number   Max number
Edges          3            223
Nodes          4            99
Target genes   3            19

See DOMINO_genesList_activeModulesNetworkMetrics.txt file for more details.

Overlap analysis results

We then performed an Overlap analysis between the identified active modules (21) and pathways/processes related to CAKUT (13). We obtained a significant overlap between 6 identified active modules and 6 pathways/processes (pAdjusted <= 0.05). Results are presented in Table 17.

Table 17 - Significant overlaps between identified active modules and pathways/processes related to CAKUT

Pathway IDs          Pathway Names                               pAdjusted
GO:0072001           renal system development                    2.66e-03
GO:0001822           kidney development                          2.66e-03
GO:0060993           kidney morphogenesis                        2.66e-03
REAC:R-HSA-8853659   RET signaling                               5.19e-03
WP:WP5053            Development of ureteric collection system   3.21e-02
REAC:R-HSA-195721    Signaling by WNT                            3.47e-02

When the same pathway/process is found in several active modules, duplicates are removed and only the most significant result is kept.
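The duplicate-removal step can be sketched as follows, keeping the entry with the lowest pAdjusted for each pathway. The function name, the module numbers and the second p-value are hypothetical and only illustrate the rule.

```python
def keep_most_significant(results):
    """Keep, per pathway, the (module, pAdjusted) pair with the lowest pAdjusted.

    results: iterable of (module_id, pathway_id, p_adjusted) tuples.
    """
    best = {}
    for module_id, pathway_id, p_adjusted in results:
        current = best.get(pathway_id)
        if current is None or p_adjusted < current[1]:
            best[pathway_id] = (module_id, p_adjusted)
    return best


# Hypothetical per-module results: the same GO term is found in two modules.
per_module = [
    (4, "GO:0001822", 1.20e-02),
    (1, "GO:0001822", 2.66e-03),
    (1, "REAC:R-HSA-8853659", 5.19e-03),
]
best = keep_most_significant(per_module)
for pathway_id, (module_id, p_adjusted) in sorted(best.items()):
    print(pathway_id, module_id, p_adjusted)
```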

The RET signaling and Signaling by WNT Reactome pathways were identified neither with the overlap approach in the previous study [1] nor with the ODAMNet Overlap analysis.

See DOMINO_genesList_signOverlap.txt file for more details.

Visualisation of AMI results

We created a visualisation of AMI results (Fig. 23) using Cytoscape [9].

We found a significant overlap between 6 active modules and 6 pathways/processes related to CAKUT. For the sake of visualisation, we selected only three of these modules (Fig. 23). The entire visualisation is available in the Cytoscape project called AMI_visualisation.cys in GitHub.


Fig. 23 : Visualisation of 3 active modules and their associated pathways/processes related to CAKUT

Genes are represented by nodes. Grey nodes are target genes, white nodes are non-target genes. Overlap results between active modules and rare disease pathways are displayed as coloured donuts around nodes. Each colour corresponds to one rare disease pathway. Creation steps are explained in the Active module identification results visualisation section.

Topology differs between modules, and the associated pathways/processes related to CAKUT also vary (Fig. 23).

The first active module (left in Fig. 23) is highly connected and contains genes involved in several pathways and processes related to CAKUT. The association between vitamin A and the RET signaling Reactome pathway is indirect: the genes involved in this pathway are not targeted by vitamin A. This explains why we did not find this pathway with the Overlap analysis (see Use-case 2 overlap results).

The second active module (middle in Fig. 23) is also highly connected, but it contains genes involved in only one pathway/process related to CAKUT (Development of ureteric collection system).

The third active module (right in Fig. 23) is sparser. It contains target genes and non-target genes involved in kidney morphogenesis and Signaling by WNT.

Random Walk with Restart (RWR)

The Random Walk with Restart (RWR) approach measures proximities between vitamin A target genes and pathways/processes related to CAKUT. To calculate these proximities (RWR scores), we used multiXrank [10] and multilayer networks. See Random Walk with Restart page for more details.

The multilayer network is composed of three gene networks and one network of pathways/processes related to CAKUT. Gene nodes are connected to the pathway/process nodes in which they are involved. See Networks used page for more details.
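To give an intuition of what an RWR score is, here is a minimal single-layer sketch using plain power iteration. multiXrank works on multilayer networks and has additional parameters, so this is a simplification and not its implementation. The function name, the toy graph and the restart probability are chosen for illustration only.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return the stationary RWR probability of each node.

    adjacency: {node: set(neighbours)}, undirected, without isolated nodes.
    seeds: nodes the walker restarts from (uniform restart distribution).
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart`, jump back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        # Otherwise, move to a neighbour chosen uniformly at random.
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for neighbour in adjacency[n]:
                nxt[neighbour] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph A - B - C - D with a single seed.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(graph, seeds={"A"})
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, score)
```

Nodes closer to the seed receive higher scores, which is the property used to rank pathways/processes by proximity to the target genes.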

Running RWR with data provided by user

The target genes file is given using the --targetGenesFile parameter. It contains the list of vitamin A target genes [FORMAT].

multiXrank takes as input a configuration file (--configPath) that contains the paths to the networks and the analysis parameters. We used multiXrank with default parameters.

We provide a file name in which to save the vitamin A target genes (i.e. seeds) with --seedsFile useCases/InputData/seeds.txt, and a SIF file name (--sifFileName) in which to save the top nodes based on RWR score (--top 5).

Result files are saved into the useCases/OutputResults_useCase2/ folder.

odamnet multixrank      --targetGenesFile useCases/InputData/VitA-Balmer2002-Genes.txt \
                        --configPath useCases/InputData/config_minimal_useCase2.yml \
                        --networksPath useCases/InputData/ \
                        --seedsFile useCases/InputData/seeds.txt \
                        --sifFileName UseCase2_RWR_network.sif \
                        --top 5 \
                        --outputPath useCases/OutputResults_useCase2

Tip

- Downloading of multiplex network from NDEx: Networks used + Network downloading
- Creation of the pathways/processes network: Networks used + Network creation
- Configuration file explanation and example: Configuration file section

Several files are generated into RWR_genesList/ folder:

  • config_minimal_useCase2.yml and seeds.txt: copies of the input files

  • multiplex_1.tsv and multiplex_2.tsv: RWR scores for each network. File 1 contains the RWR scores of the genes multilayer network and file 2 contains the RWR scores of the pathways/processes related to CAKUT network.

  • UseCase2_RWR_network.sif: SIF file that contains the resulting network

  • RWR_topX.txt: top X pathways/processes related to CAKUT

For more details about these files, see RWR output files section.
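The ranking step can be reproduced by sorting a score file. This sketch assumes a tab-separated file with a header that includes node and score columns; the column names are assumptions to check against the real multiplex_2.tsv, and the scores below are hypothetical.

```python
import csv
import io


def top_nodes(handle, n=5, node_col="node", score_col="score"):
    """Return the n highest-scoring (node, score) pairs from a TSV score file."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    rows.sort(key=lambda row: float(row[score_col]), reverse=True)
    return [(row[node_col], float(row[score_col])) for row in rows[:n]]


# In practice, pass open("multiplex_2.tsv"); scores here are made up.
example = io.StringIO(
    "multiplex\tnode\tlayer\tscore\n"
    "2\tGO:0001822\tpathways\t0.0004\n"
    "2\tWP:WP4830\tpathways\t0.0001\n"
    "2\tGO:0072001\tpathways\t0.0009\n"
)
top = top_nodes(example, n=2)
print(top)
```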

Results of RWR with data provided by user

Data provided by user overview

Target genes are coming from Balmer and Blomhoff [2]. There are 521 vitamin A target genes.

RWR results

Extra : analysis with disease-disease similarity network

Tip

The command line is the same; only the configuration file needs to change.

We also propose running the RWR approach with a disease-disease similarity network. In this network, rare diseases are linked according to their phenotype similarity, whereas in the previous network they were not connected at all.

We used a multilayer network composed of three gene networks and one disease-disease similarity network (Fig. 18 - right, Genes multilayer network, Disease-disease similarity network).

multiXrank defines vitamin A target genes as seeds. Of the 521 target genes retrieved from CTD, 480 are found in the multilayer network and used as seeds. Using the RWR scores (i.e. proximity scores with the target genes), rare diseases are prioritized. We selected the top 20 and present the top 5 (Table 19).

Table 19 - Rare disease prioritization using RWR score. The top 5 is displayed.

Nodes (disease IDs)   Disease Names                             RWR scores
OMIM:178500           Pulmonary fibrosis, idiopathic            3.47e-04
OMIM:125853           Diabetes mellitus, noninsulin-dependent   3.13e-04
OMIM:215600           Cirrhosis, familial                       2.73e-04
OMIM:613659           Gastric cancer, somatic                   2.38e-04
OMIM:104300           Alzheimer disease                         2.34e-04

We created a visualisation of the results (Fig. 25) using Cytoscape [9]. It is available in the Cytoscape project called RWR_visualisation.cys in GitHub. Fig. 25 presents the top 5 rare diseases, ordered by RWR score.


Fig. 25 : Top 5 rare diseases prioritized by RWR score using the disease-disease similarity network

Rare diseases are shown as pink triangles. Target genes are in grey and non-target genes are in white. Creation steps are explained in the Random walk with restart results visualisation section.

Overlap, AMI and RWR results comparison

We compare the results obtained with the three approaches: Overlap analysis, Active Module Identification (AMI) and Random Walk with Restart (RWR). We used orsum [11], a Python package that filters and integrates enrichment results from several analyses. The main result is a heatmap, presented in Fig. 26.

orsum.py --gmt 00_Data/hsapiens_background.gmt \
         --files 00_Data/Overlap.4Orsum 00_Data/DOMINO.4Orsum 00_Data/RWR_top5.4Orsum \
         --fileAliases Overlap AMI RWR \
         --maxRepSize 0 \
         --outputFolder UseCase2_orsum

Fig. 26 : Overlap analysis (Overlap), Active module identification (AMI) and Random walk with restart (RWR) results integration and comparison using orsum

The --maxRepSize parameter is set to 0 so that each term is considered as its own representative term.

Some pathways/processes related to CAKUT, such as renal system development and kidney development, are found to be associated with vitamin A by all three approaches. Others are found by only one (RET signaling) or two (Signaling by WNT) approaches.

References