Input files

This page describes the input files of ODAMNet.

Target genes

Warning

Gene IDs have to be consistent across all input data (target genes, GMT files and networks).
When data are retrieved by queries, HGNC IDs are used.

Choose one of these input parameters according to your input data:

-c, --chemicalsFile FILENAME: Contains a list of chemicals, given as MeSH identifiers (e.g. D014801). Each line contains one or several chemical IDs, separated by “;”.

1. Chemicals file

By default, ODAMNet retrieves the list of chemical target genes from the Comparative Toxicogenomics Database [1] (CTD) using queries. This file contains a list of chemical IDs (MeSH, e.g. D014801). Each line contains one or several chemical IDs, separated by “;”.

D014801;D014807
D014212
C009166

ODAMNet approaches are applied to each line separately. If a line contains multiple chemicals, the target genes of each chemical are retrieved and merged into a single list of unique target genes.
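The line-by-line layout of the chemicals file can be illustrated with a minimal sketch. The function name `parse_chemicals_lines` is hypothetical and is not part of ODAMNet; it only shows how each line yields one group of chemical IDs.

```python
def parse_chemicals_lines(lines):
    """Return one list of chemical IDs per non-empty line.

    Hypothetical helper for illustration, not ODAMNet's own parser.
    """
    groups = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Several IDs on the same line are separated by ";"
        ids = [chem.strip() for chem in line.split(";") if chem.strip()]
        groups.append(ids)
    return groups


example = ["D014801;D014807", "D014212", "C009166"]
for ids in parse_chemicals_lines(example):
    print(ids)
```

Each printed list corresponds to one line of the file, that is, to one separate analysis.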

Chemical target genes are retrieved in HGNC format.

2. Target genes file

ODAMNet can also use input data provided by the user. This target genes file contains a list of genes, one gene per line.

AANAT
ABCB1
ABCC2
ABL1
ACADM

3. CTD file

This third way to retrieve target genes is well suited to reproducible analyses or to using a specific database version. The required file contains 9 columns:

Input: query input (e.g. chemical IDs from the chemicals file)
ChemicalName: name of the query input or its descendant chemicals
ChemicalId: MeSH ID of the query or its descendant chemicals
CasRN: CasRN ID of the query or its descendant chemicals
GeneSymbol: names of target genes that are connected to the query or its descendant chemicals
GeneId: target gene ID (HGNC)
Organism: organism name
OrganismId: organism ID
PubMedIds: PubMed IDs of the publications that report this connection

Input       ChemicalName    ChemicalId      CasRN   GeneSymbol      GeneId  Organism        OrganismId      PubMedIds
d014801     Tretinoin       D014212 302-79-4        ZYG11A  440590  Homo sapiens    9606    23724009|33167477
d014801     Tretinoin       D014212 302-79-4        ZYX     7791    Homo sapiens    9606    23724009
d014801     Tretinoin       D014212 302-79-4        ZZZ3    26009   Homo sapiens    9606    33167477
d014801     Vitamin A       D014801 11103-57-4      ACE2    59272   Homo sapiens    9606    32808185
d014801     Vitamin A       D014801 11103-57-4      AKR1B10 57016   Homo sapiens    9606    19014918

This kind of file is created as the query result when ODAMNet is used in query mode.
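As a rough sketch of how the nine columns relate to each other, the snippet below groups the gene symbols by the value of the Input column. It assumes the file is tab-separated and uses the column names listed above; the function name `genes_by_input` is hypothetical and not part of ODAMNet.

```python
import csv
import io
from collections import defaultdict


def genes_by_input(handle):
    """Map each value of the Input column to its sorted unique gene symbols.

    Assumes a tab-separated file with the header described above.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    genes = defaultdict(set)
    for row in reader:
        genes[row["Input"]].add(row["GeneSymbol"])
    return {query: sorted(symbols) for query, symbols in genes.items()}


header = ["Input", "ChemicalName", "ChemicalId", "CasRN", "GeneSymbol",
          "GeneId", "Organism", "OrganismId", "PubMedIds"]
rows = [
    ["d014801", "Tretinoin", "D014212", "302-79-4", "ZYX", "7791",
     "Homo sapiens", "9606", "23724009"],
    ["d014801", "Vitamin A", "D014801", "11103-57-4", "ACE2", "59272",
     "Homo sapiens", "9606", "32808185"],
]
text = "\n".join("\t".join(fields) for fields in [header] + rows) + "\n"
print(genes_by_input(io.StringIO(text)))
```

Genes linked to the query chemical and to its descendant chemicals end up in the same list, because both share the same Input value.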

Pathways/processes of interest

By default, ODAMNet retrieves all rare disease pathways and all human pathways from WikiPathways [2] using queries. Genes involved in rare disease pathways are retrieved in HGNC format.

The user can also provide their own pathways/processes of interest. In that case, ODAMNet requires two types of files:

--GMT FILENAME: A tab-delimited file that describes the gene sets of the pathways/processes of interest. Pathways can come from several sources. Each row represents a gene set.
--backgroundFile FILENAME: This file contains the list of background files for the different sources. They have to be listed in the same order as the corresponding pathways in the GMT file. Each background file is a GMT file (see above).

GMT file

This file contains the gene composition of the pathways/processes of interest. There are at least three columns:

pathwayIDs: the first column contains pathway IDs
pathways: the second column contains pathway names. The names are optional, but the column must be present: you can fill it with a dummy value
HGNC: all the other columns contain the genes of the pathway. The number of columns differs between pathways because it depends on the number of genes in each pathway.

The GMT file is organized as follows:

pathwayIDs  pathways        HGNC
WP5195      Disorders in ketolysis  ACAT1   HMGCS1  OXCT1   BDH1    ACAT2
WP5189      Copper metabolism       ATP7B   ATP7A   SLC11A2 SLC31A1
WP5190      Creatine pathway        GAMT    SLC6A8  GATM    OAT     CK

For more details, see the GMT file format webpage.

Warning

The GMT file must not contain empty columns.
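The layout and the empty-column rule can be checked with a short sketch. The function `parse_gmt` is hypothetical and not ODAMNet's own reader; it assumes tab-separated rows and, as in the example above, a header on the first line.

```python
def parse_gmt(lines, has_header=True):
    """Parse GMT rows into {pathway_id: (pathway_name, [genes])}.

    Hypothetical helper. Assumes tab-separated rows and, by default,
    a header on the first line as in the example above.
    """
    pathways = {}
    for number, line in enumerate(lines, start=1):
        if has_header and number == 1:
            continue  # skip the header row
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {number}: expected at least 3 columns")
        if any(field.strip() == "" for field in fields):
            raise ValueError(f"line {number}: empty column")
        pathway_id, name, *genes = fields
        pathways[pathway_id] = (name, genes)
    return pathways


rows = [
    "pathwayIDs\tpathways\tHGNC",
    "WP5189\tCopper metabolism\tATP7B\tATP7A\tSLC11A2\tSLC31A1",
]
print(parse_gmt(rows))
```

A row with two consecutive tabs, or with an empty pathway name, raises a `ValueError` instead of being silently accepted.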

Background file

In addition to the GMT file, ODAMNet needs another GMT file that provides the background genes for the statistical approaches. Several sets of background genes can be used at the same time. So, instead of taking the background GMT file directly, ODAMNet takes as input a file that lists the background file names.

hsapiens.GO-BP.name.gmt
hsapiens.REAC.name.gmt
hsapiens.REAC.name.gmt
hsapiens.GO-BP.name.gmt
hsapiens.WP.name.gmt

The background file contains the same number of lines as the GMT file, and the background file names are in the same order as the corresponding pathways in the GMT file.
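The correspondence between the two files can be sketched as a simple pairing by position. This is an illustration only: `pair_with_background` is a hypothetical name, and the sketch assumes, based on the examples in this page, that the header row of the GMT file is not counted when comparing line numbers.

```python
def pair_with_background(pathway_ids, background_names):
    """Pair each pathway with the background file listed at the same position.

    Hypothetical helper. pathway_ids are the IDs of the GMT rows
    (header excluded, which is an assumption), background_names are
    the lines of the background file.
    """
    if len(pathway_ids) != len(background_names):
        raise ValueError(
            f"{len(pathway_ids)} pathways but "
            f"{len(background_names)} background file names"
        )
    return list(zip(pathway_ids, background_names))


pairs = pair_with_background(
    ["GO:0072001", "REAC:R-HSA-8853659", "WP:WP4830"],
    ["hsapiens.GO-BP.name.gmt", "hsapiens.REAC.name.gmt", "hsapiens.WP.name.gmt"],
)
for pathway_id, background in pairs:
    print(pathway_id, background)
```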

Examples

Background and GMT files need to be in the same folder.

Three lines of WP background file

hsapiens.WP.name.gmt
hsapiens.WP.name.gmt
hsapiens.WP.name.gmt

Five lines of background files, in the same order as in the corresponding GMT file.

hsapiens.GO-BP.name.gmt
hsapiens.REAC.name.gmt
hsapiens.REAC.name.gmt
hsapiens.GO-BP.name.gmt
hsapiens.WP.name.gmt

Three lines of WP pathways

pathwayIDs  pathways        HGNC
WP5195      Disorders in ketolysis  ACAT1   HMGCS1  OXCT1   BDH1    ACAT2
WP5189      Copper metabolism       ATP7B   ATP7A   SLC11A2 SLC31A1
WP5190      Creatine pathway        GAMT    SLC6A8  GATM    OAT     CK

Five pathways of interest, in the same order as in the background file.

pathwayIDs  pathways        HGNC
GO:0072001  renal system development        CYP26B1 CFLAR   PLXND1  HOXA11  SOX8
REAC:R-HSA-8853659  RET signaling   GAB2    PIK3CB  PRKACA  RAP1GAP DOK5
REAC:R-HSA-157118   Signaling by NOTCH      PLXND1  CREBBP  PSMB1   PSMC4   MAMLD1
GO:0060993  kidney morphogenesis    HOXA11  SOX8    PKD1    WWTR1   FGF10
WP:WP4830   GDNF/RET signalling axis        IFT27   FOXC2   GFRA1   AGTR2   EYA1

Networks

ODAMNet uses two main network file formats:

Simple interaction file (SIF)
Graph file (GR)

SIF file

This network format is used in the Active Module Identification (AMI) approach. The SIF file is tab-separated, has a header, and contains three columns: source node, interaction type and target node.

node_1      link    node_2
AAMP        ppi     VPS52
AAMP        ppi     BHLHE40
AAMP        ppi     AEN
AAMP        ppi     C8orf33
AAMP        ppi     TK1

For more details, see the SIF file format webpage.

GR file

This network format is used in the Random Walk with Restart (RWR) approach. The GR format contains two columns: source node and target node, without header. It’s a tab-separated file.

NFYA        NFYB
NFYA        NFYC
NFYB        NFYC
BTRC        CUL1
BTRC        SKP1
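The difference between the two formats can be made concrete with a small sketch that turns SIF rows into GR rows by dropping the header and the interaction type column. This conversion is only an illustration of the two layouts, not a documented ODAMNet feature, and `sif_to_gr` is a hypothetical name.

```python
def sif_to_gr(sif_lines):
    """Convert tab-separated SIF rows (with header) into GR rows (no header).

    Hypothetical helper for illustration only.
    """
    gr_lines = []
    for number, line in enumerate(sif_lines, start=1):
        if number == 1:
            continue  # the SIF file starts with a header row
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        source, _link, target = line.split("\t")
        gr_lines.append(f"{source}\t{target}")
    return gr_lines


sif = [
    "node_1\tlink\tnode_2",
    "AAMP\tppi\tVPS52",
    "AAMP\tppi\tBHLHE40",
]
for row in sif_to_gr(sif):
    print(row)
```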

Configuration file

Warning

Follow the same folder tree as the one used in multiXrank.

To perform a RWR, multiXrank [3] needs a configuration file as input. This file contains the paths of the networks used. It can be short (see below) or very detailed, with parameters.

For more details about this file, see the multiXrank documentation: GitHub / Documentation.

The following are examples of short configuration files:

 multiplex:
     1:
         layers:
             - multiplex/1/Complexes_gene_names_190123.gr
             - multiplex/1/Pathways_reactome_gene_names_190123.gr
             - multiplex/1/PPI_HiUnion_LitBM_APID_gene_names_190123.gr
     2:
         layers:
             - multiplex/2/RareDiseasePathways_network_useCase1.gr
 bipartite:
     bipartite/Bipartite_RareDiseasePathways_geneSymbols_useCase1.gr:
         source: 2
         target: 1
 seed:
     seeds.txt

 multiplex:
     1:
         layers:
             - multiplex/1/Complexes_gene_names_190123.gr
             - multiplex/1/Pathways_reactome_gene_names_190123.gr
             - multiplex/1/PPI_HiUnion_LitBM_APID_gene_names_190123.gr
     2:
         layers:
             - multiplex/2/DiseaseSimilarity_network_2022_06_11.gr
 bipartite:
     bipartite/Bipartite_genes_to_OMIM_2022_09_27.gr:
         source: 2
         target: 1
 seed:
     seeds.txt

Tip

Whatever the networks used, the command line is the same. You only have to change the network names inside the configuration file.


References