Overlap analysis
Principle
The overlap analysis computes the overlap between chemical target genes and rare disease pathways. In other words, it looks for target genes that belong to rare disease pathways, i.e. a direct overlap (Fig. 1 - left part). This approach is presented in Ozisik et al. [1] for a specific use case.
First, the overlap between the target genes and each rare disease pathway is computed. Then, the statistical significance of each overlap is assessed with a hypergeometric test. Finally, a Benjamini-Hochberg (BH) correction is applied to adjust the p-values for multiple testing.
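The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ODAMNet's actual implementation: the function names, the toy gene names and the choice to restrict targets and pathways to the background genes are all hypothetical. It uses only the standard library.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k): probability of at least k pathway genes among N drawn
    genes, with M background genes of which n belong to the pathway."""
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def overlap_analysis(targets, pathways, background):
    """Assumption: targets and pathway genes outside the background are ignored."""
    targets_in_bg = targets & background
    rows = []
    for name, genes in pathways.items():
        pathway_genes = genes & background
        shared = targets_in_bg & pathway_genes
        pvalue = hypergeom_sf(len(shared), len(background),
                              len(pathway_genes), len(targets_in_bg))
        rows.append((name, sorted(shared), pvalue))
    padj = benjamini_hochberg([row[2] for row in rows])
    return [(name, shared, p, q) for (name, shared, p), q in zip(rows, padj)]


# Toy data (hypothetical gene names G1..G20).
background = {f"G{i}" for i in range(1, 21)}
targets = {"G1", "G2", "G3", "G4"}
pathways = {"pathwayA": {"G1", "G2", "G3", "G10"},
            "pathwayB": {"G15", "G16", "G17"}}
results = overlap_analysis(targets, pathways, background)
for name, shared, pvalue, padj in results:
    print(name, shared, pvalue, padj)
```

In this toy example, pathwayA shares three of its four genes with the target list and gets a small p-value, whereas pathwayB shares none and gets a p-value of 1.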
Usage
By default, data are retrieved directly from databases using queries (Fig. 2: section data retrieved
by queries). Chemical target genes are retrieved from the Comparative Toxicogenomics Database [2] (CTD) using the --chemicalsFile parameter.
All rare disease pathways are retrieved from WikiPathways [3] automatically.
Alternatively, the user can provide their own target genes and pathways/processes of interest
(Fig. 2: section data provided by user) using the --targetGenesFile, --GMT and
--backgroundFile parameters.
The --outputPath parameter applies regardless of how the data are retrieved.
Fig. 2: Input and output of the Overlap analysis
(Left part) - By default, chemical target genes and rare disease pathways are retrieved using automatic queries. The user can also provide their own data. Required inputs are shown as pink and green boxes with solid borders, whereas optional inputs are shown as boxes with dashed borders. (Right part) - Output files in pink are created only if the input data are retrieved by queries.
Input parameters for the Overlap analysis
Warning
Gene IDs have to be consistent across all input data (target genes, GMT and networks).
When data are retrieved by queries, HGNC IDs are used.
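When providing your own files, a quick check along the following lines can reveal identifier mismatches before running the analysis. It is a minimal sketch, not part of ODAMNet: the function name and the toy identifiers are hypothetical, and it only compares identifiers as plain strings.

```python
def report_unmatched(target_genes, gene_sets):
    """Split target genes into those found in at least one gene set
    and those found in none (a hint that ID types may differ)."""
    known = set()
    for genes in gene_sets.values():
        known |= set(genes)
    targets = set(target_genes)
    return sorted(targets & known), sorted(targets - known)


# Toy data: the last target uses a different, made-up identifier style.
gene_sets = {"pathway1": {"RARA", "RXRA"}, "pathway2": {"CYP26A1"}}
target_genes = ["RARA", "CYP26A1", "OTHER_NAMESPACE_ID"]

matched, unmatched = report_unmatched(target_genes, gene_sets)
print("matched:", matched)
print("unmatched:", unmatched)
```

A large share of unmatched target genes usually means the files use different identifier types rather than a genuine absence of overlap.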
Data retrieved by queries:
- -c, --chemicalsFile FILENAME
Contains a list of chemicals, given as MeSH identifiers (e.g. D014801). Each line contains one or several chemical IDs, separated by “;”. [FORMAT] [required]
- --directAssociation BOOLEAN
TRUE: retrieve genes targeted by the chemicals, from CTD
FALSE: retrieve genes targeted by the chemicals and their descendant chemicals, from CTD
[default: True]
- --nbPub INTEGER
Each interaction between a target gene and a chemical can be associated with publications. This parameter sets the minimum number of associated publications required to keep an interaction.
[default: 2]
Data provided by user:
- -t, --targetGenesFile FILENAME
Contains a list of target genes, one per line. [FORMAT] [required]
- --GMT FILENAME
Tab-delimited file that describes the gene sets of the pathways/processes of interest. Pathways/processes can come from several sources (e.g. WP and GO:BP). [FORMAT] [required]
- --backgroundFile FILENAME
List of background source file names. Each background gene source is a GMT file. The sources should be listed in the same order as in the GMT file. [FORMAT] [required]
- -o, --outputPath PATH
Folder name to save results.
[default: OutputResults]
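To make the input layouts described above more concrete, here is a small parsing sketch. It is not ODAMNet's own reader: the function names are hypothetical, the second MeSH ID is only illustrative, and the GMT layout assumed here is the common one (identifier, description, then genes, all tab-separated, with lines starting with "#" treated as headers). The pages linked under [FORMAT] remain the reference for the exact layout.

```python
def parse_chemicals_line(line):
    """Return the chemical IDs found on one line, split on ';'."""
    return [chem.strip() for chem in line.strip().split(";") if chem.strip()]


def parse_gmt(lines):
    """Map each gene set name to its set of genes.

    Assumed layout per line: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    Lines starting with '#' and lines with fewer than three fields are skipped.
    """
    gene_sets = {}
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        name, _description, *genes = fields
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


# Toy inputs (illustrative content only).
chemicals = parse_chemicals_line("D014801;D014807\n")
gmt = parse_gmt(["#id\tname\tgenes",
                 "WP1\tPathway one\tG1\tG2",
                 ""])
print(chemicals)
print(gmt)
```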
Use-cases command lines
Examples of command lines, first with data retrieved by queries, then with data provided by the user.
odamnet overlap --chemicalsFile useCases/InputData/chemicalsFile.csv \
                --directAssociation FALSE \
                --nbPub 2 \
                --outputPath useCases/OutputResults_useCase1/

odamnet overlap --targetGenesFile useCases/InputData/VitA-Balmer2002-Genes.txt \
                --GMT useCases/InputData/PathwaysOfInterest.gmt \
                --backgroundFile useCases/InputData/PathwaysOfInterestBackground.txt \
                --outputPath useCases/OutputResults_useCase2