Overlap analysis

Principle

The overlap analysis computes the overlap between chemical target genes and rare disease pathways. In other words, it looks for target genes that belong to rare disease pathways, i.e. a direct overlap (Fig. 1, left part). This approach is presented in Ozisik et al. [1] for a specific use case.

First, the overlap between the target genes and every rare disease pathway is computed. Then, the statistical significance of each overlap is assessed using a hypergeometric test. Finally, the p-values are adjusted for multiple testing using the Benjamini-Hochberg (BH) correction.
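The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function names, the use of a one-sided (upper-tail) hypergeometric test, and the restriction of both gene sets to a shared background are assumptions made for the example.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N genes from a background of M genes,
    n of which belong to the pathway (upper tail, assumed here)."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values, in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def overlap_analysis(targets, pathways, background):
    """Hypothetical helper: overlap, p-value and adjusted p-value per pathway."""
    background = set(background)
    targets = set(targets) & background
    rows = []
    for name, genes in pathways.items():
        genes = set(genes) & background
        shared = targets & genes
        p = hypergeom_sf(len(shared), len(background), len(genes), len(targets))
        rows.append((name, sorted(shared), p))
    adjusted = benjamini_hochberg([row[2] for row in rows])
    return [(name, shared, p, q) for (name, shared, p), q in zip(rows, adjusted)]
```

Calling `overlap_analysis(targets, pathways, background)` with a set of target genes, a dictionary mapping pathway names to gene sets, and a background gene set returns one tuple per pathway: its name, the shared genes, the raw p-value and the BH-adjusted p-value.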

Usage

By default, data are retrieved directly from databases using queries (Fig. 2: section data retrieved by queries). Chemical target genes are retrieved from the Comparative Toxicogenomics Database (CTD) [2] using the --chemicalsFile parameter. All rare disease pathways are retrieved automatically from WikiPathways [3].

Alternatively, the user can provide their own target genes and pathways/processes of interest (Fig. 2: section data provided by user) using --targetGenesFile, --GMT and --backgroundFile.

The --outputPath parameter is used regardless of how the data are retrieved.


Fig. 2 : Input and output of Overlap analysis

(Left part) - By default, chemical target genes and rare disease pathways are retrieved using automatic queries. The user can also provide their own data. Required inputs are shown in pink and green boxes with solid borders, whereas optional inputs are shown in boxes with dashed borders. (Right part) - Output files in pink are created only if the input data are retrieved by queries.

Input parameters for the Overlap analysis

Warning

  • Gene IDs have to be consistent between input data (target genes, GMT and networks)

  • When data are retrieved by queries, HGNC IDs are used.
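As an illustration of the first warning, a quick sanity check before running the analysis could look like the sketch below. The function name and the returned fields are hypothetical and are not part of the tool. A target gene that is absent from every pathway is normal, but an overlap of zero often indicates mismatched identifier types (for example, gene symbols in one file and numeric IDs in the other).

```python
def check_id_overlap(target_genes, pathway_genes):
    """Hypothetical helper: report how many target gene IDs also occur
    among the pathway gene IDs."""
    targets = set(target_genes)
    universe = set(pathway_genes)
    shared = targets & universe
    return {
        "n_targets": len(targets),
        "n_shared": len(shared),
        # IDs found in the target list but in no pathway
        "missing": sorted(targets - universe),
    }
```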

To use data retrieved from databases, see parameters on the Data retrieved by queries tab.
To provide your own data, see parameters on the Data provided by user tab.
-c, --chemicalsFile FILENAME

Contains a list of chemicals, given as MeSH identifiers (e.g. D014801). Each line contains one or several chemical IDs, separated by “;” [FORMAT] [required]

--directAssociation BOOLEAN
TRUE: retrieve genes targeted by chemicals, from CTD
FALSE: retrieve genes targeted by chemicals and their descendant chemicals, from CTD
[default: True]
--nbPub INTEGER

Each interaction between a target gene and a chemical can be associated with publications. You can filter these interactions according to the number of associated publications by setting the minimum number of publications required to keep an association [default: 2]

-o, --outputPath PATH

Folder name to save results. [default: OutputResults]
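To make the --chemicalsFile layout and the --nbPub threshold concrete, here is a minimal Python sketch. It is not the tool's code: the function names and the (chemical, gene, number of publications) tuple layout are assumptions chosen for illustration, and the sketch assumes that associations with exactly the minimum number of publications are kept.

```python
def parse_chemicals(lines):
    """Read chemical groups: each non-empty line holds one or several
    MeSH IDs separated by ';'."""
    groups = []
    for line in lines:
        ids = [item.strip() for item in line.split(";") if item.strip()]
        if ids:
            groups.append(ids)
    return groups


def filter_by_publications(associations, nb_pub=2):
    """Keep (chemical, gene, n_publications) tuples supported by at
    least nb_pub publications (hypothetical data layout)."""
    return [assoc for assoc in associations if assoc[2] >= nb_pub]
```

For example, `parse_chemicals(open(path))` yields one list of IDs per line of the file, and `filter_by_publications(associations, nb_pub=2)` discards associations supported by a single publication.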

Use-cases command lines

Examples of command lines with Data retrieved by queries and Data provided by user.

odamnet overlap --chemicalsFile useCases/InputData/chemicalsFile.csv \
                --directAssociation FALSE \
                --nbPub 2 \
                --outputPath useCases/OutputResults_useCase1/

References