greatpy

Implementation of GREAT in Python

Installation

Python 3.8 or newer is required. If Python is not installed on your system, we recommend installing Miniconda (https://docs.conda.io/en/latest/miniconda.html).

Options to install greatpy:

Install the latest release of greatpy from PyPI <https://pypi.org/project/greatpy/>:

 pip install greatpy

Install the latest development version:

 pip install git+https://github.com/theislab/greatpy.git@main

Notebooks

Information	link
Create regulatory domains file (regdom)	notebook
Enrichment test (binomial/hypergeometric)	notebook
Plotting of results	notebook
Comparisons with GREAT	notebook

Getting started

Please refer to

What is greatpy:

greatpy is a bioinformatics method that associates custom genomic regions with Gene Ontology (GO) terms by weighting genomic neighborhoods. It is based on and inspired by GREAT (Genomic Regions Enrichment of Annotations Tool).

GREAT figure, taken from the GREAT article

Usage:

1. Create regulatory domains from TSS

This step converts tab-separated files (.tsv or .bed format) into a regulatory domains file. Two input files are required:
1. Transcription start site (TSS) annotations, with the columns: chromosome_number \t position \t strand \t gene_name.
2. Chromosome sizes, with the columns: chromosome_number \t chromosome_size.

See data for input files
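As a quick check before calling create_regdom, the TSS file layout described above can be parsed with the standard library. This is a minimal sketch, not part of greatpy: the function name read_tss, the dictionary keys, and the toy rows are all illustrative assumptions.

```python
import csv
import io

# Illustrative field names for the four tab-separated columns (assumption).
TSS_FIELDS = ["chromosome", "position", "strand", "gene_name"]


def read_tss(handle):
    """Parse tab-separated TSS rows into dictionaries, converting position to int."""
    rows = []
    for record in csv.reader(handle, delimiter="\t"):
        if not record:
            continue  # skip blank lines
        if len(record) != len(TSS_FIELDS):
            raise ValueError(f"expected {len(TSS_FIELDS)} columns, got {len(record)}")
        row = dict(zip(TSS_FIELDS, record))
        row["position"] = int(row["position"])
        rows.append(row)
    return rows


# Toy input with made-up genes, only to show the expected shape of the file.
toy_file = io.StringIO("chr1\t1000\t+\tGENE_A\nchr1\t5000\t-\tGENE_B\n")
tss_rows = read_tss(toy_file)
```

A malformed row raises ValueError early, which is usually easier to debug than a failure inside the downstream analysis.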

regdom = greatpy.tl.create_regdom(
    tss_file=Input_TSS_path,  # eg : "../data/human/hg38/tss.bed"
    chr_sizes_file=Input_chromosome_size_path,  # eg : "../data/human/hg38/chr_size.bed"
    association_rule="Basalplusextention",
    out_path=path_save_output,
)

Allowed association rules are:

Basalplusextention
OneCloset
TwoCloset
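To give an intuition for what an association rule does, the sketch below follows the "single nearest gene" idea as described for GREAT: each gene's domain extends to the midpoint between its TSS and the neighbouring TSS, limited by a maximum extension. It is an illustration under stated assumptions, not greatpy's implementation; the function name, the single-chromosome input, and the default max_extension are hypothetical.

```python
def one_closest_domains(tss_positions, chr_size, max_extension=1_000_000):
    """Return (start, end) domains for the TSS positions of one chromosome.

    Each domain reaches the midpoint to the neighbouring TSS, but never
    further than max_extension from its own TSS or past the chromosome ends.
    """
    tss = sorted(tss_positions)
    domains = []
    for i, pos in enumerate(tss):
        # Midpoint to the previous TSS, or the chromosome start for the first gene.
        left = (tss[i - 1] + pos) // 2 if i > 0 else 0
        # Midpoint to the next TSS, or the chromosome end for the last gene.
        right = (pos + tss[i + 1]) // 2 if i < len(tss) - 1 else chr_size
        start = max(left, pos - max_extension, 0)
        end = min(right, pos + max_extension, chr_size)
        domains.append((start, end))
    return domains


# Toy chromosome of length 2000 with three TSS positions.
toy_domains = one_closest_domains([100, 300, 1000], chr_size=2000, max_extension=400)
```

In this toy case the last gene has no right-hand neighbour, so its domain is bounded by max_extension rather than by a midpoint.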

2. Get enrichment of GO terms in the test genomic regions

This step calculates the significance of a custom set of genomic annotations through peak-gene mapping, using distal cis-regulatory regions of the genome.
Input files:
test file, with the columns: chr \t chr_start \t chr_end.
regulatory domain file, with the columns: chr \t chr_start \t chr_end \t name \t tss \t strand.
chromosome size file, with the columns: chromosome_number \t chromosome_size.
annotation file, with the columns: ensembl \t id \t name \t ontology.group \t gene.name \t symbol.

See test cases for genomic input files.

res = greatpy.tl.enrichment(
    test_file=Input_path_or_df,  # eg : "../data/tests/test_data/input/10_MAX.bed"
    regdom_file=regdom_path_or_df,  # eg : "../data/human/hg38/regdom.bed"
    chr_size_file=chromosome_size_path_or_df,  # eg : "../data/human/hg38/chr_size.bed"
    annotation_file=annotation_path_or_df,  # eg : "../data/human/ontologies.csv"
)

The following tests are available in this function:

binom (default True): calculates the binomial p-value.
hypergeom (default True): calculates the hypergeometric p-value.
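The two statistics can be illustrated with upper-tail probabilities computed from the standard library. This is a sketch of the textbook formulas, not greatpy's code: the function names and parameter names are assumptions, with p standing for the fraction of the genome covered by the regulatory domains of a term, n for the number of test regions, and k for the number of regions (or genes) that hit the term.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): at least k of n regions hit the term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def hypergeometric_tail(k, total_genes, term_genes, hit_genes):
    """P(X >= k) when hit_genes genes are drawn without replacement
    from total_genes genes, of which term_genes carry the term."""
    upper = min(term_genes, hit_genes)
    favourable = sum(
        comb(term_genes, i) * comb(total_genes - term_genes, hit_genes - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(total_genes, hit_genes)


# Toy numbers: 8 of 100 regions fall in domains covering 2% of the genome.
toy_binom_p = binomial_tail(8, 100, 0.02)
# Toy numbers: 5 of the 40 genes hit carry a term annotated to 50 of 2000 genes.
toy_hypergeom_p = hypergeometric_tail(5, 2000, 50, 40)
```

The binomial version works at the level of regions and genome fractions, while the hypergeometric version works at the level of genes, which is why the two can disagree on the same input.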

Additionally, a Bonferroni and/or FDR correction can be applied to the resulting p-values:

res = greatpy.tl.set_fdr(res, alpha=0.05)
res = greatpy.tl.set_bonferroni(res, alpha=0.05)
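For readers unfamiliar with these corrections, the following sketch shows the standard Bonferroni and Benjamini-Hochberg adjustments on a plain list of p-values. It is not greatpy's implementation, and how set_fdr and set_bonferroni use alpha internally is not covered here; the function names are illustrative.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


toy_p = [0.01, 0.04, 0.03, 0.005]
toy_bonferroni = bonferroni(toy_p)
toy_fdr = benjamini_hochberg(toy_p)
```

Bonferroni controls the chance of any false positive and is therefore stricter; Benjamini-Hochberg controls the expected fraction of false positives among the reported terms.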

3. Plot

1 Distribution of custom genomic annotations in regulatory domains

Number of genetic associations per genomic region.
Distance to the associated gene TSS for each genomic region studied.
Absolute distance to the associated gene TSS for each genomic region studied.

import matplotlib.pyplot as plt
import greatpy

# One panel per plot: associations per peak, signed distance, absolute distance
fig, ax = plt.subplots(1, 3, figsize=(30, 8))
greatpy.pl.graph_nb_asso_per_peaks(
    Input_path_or_df,  # eg : "../data/tests/test_data/input/10_MAX.bed"
    regdom_path_or_df,  # eg : "../data/human/hg38/regdom.bed"
    ax[0],
)
greatpy.pl.graph_dist_tss(
    Input_path_or_df,  # eg : "../data/tests/test_data/input/10_MAX.bed"
    regdom_path_or_df,  # eg : "../data/human/hg38/regdom.bed"
    ax[1],
)
greatpy.pl.graph_absolute_dist_tss(
    Input_path_or_df,  # eg : "../data/tests/test_data/input/10_MAX.bed"
    regdom_path_or_df,  # eg : "../data/human/hg38/regdom.bed"
    ax[2],
)
plt.show()
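The first panel summarizes how many gene associations each genomic region has. A minimal sketch of that count is given below, assuming that a region is associated with every regulatory domain it overlaps and that intervals are half-open; both are assumptions made for illustration, and the function name and toy data are hypothetical rather than taken from greatpy.

```python
def count_associations(peaks, domains):
    """Count, for each peak, the regulatory domains it overlaps.

    peaks:   list of (chrom, start, end)
    domains: list of (chrom, start, end, gene_name)
    Intervals are treated as half-open: [start, end).
    """
    counts = []
    for chrom, start, end in peaks:
        n = sum(
            1
            for d_chrom, d_start, d_end, _gene in domains
            if d_chrom == chrom and start < d_end and end > d_start
        )
        counts.append(n)
    return counts


toy_peaks = [("chr1", 150, 250), ("chr1", 5000, 5100), ("chr2", 10, 20)]
toy_regdom = [("chr1", 0, 200, "A"), ("chr1", 200, 650, "B"), ("chr2", 0, 100, "C")]
toy_counts = count_associations(toy_peaks, toy_regdom)
```

A histogram of such counts gives the same kind of overview as the first panel, for example how many regions fall outside every regulatory domain.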

2 Enrichments by GO terms (dotplot) - one input

# enrichment_df: an enrichment table, e.g. the result of greatpy.tl.enrichment
plot = enrichment_df.rename(columns={"binom_p_value": "p_value", "go_term": "name"})
plt.figure(figsize=(10, 10))
greatpy.pl.plot_enrich(plot)

3 Enrichments by GO terms (dotplot) - multiple inputs

test = ["name_bindome_biosample_1", "name_bindome_biosample_2", "..."]
tmp_df = greatpy.tl.enrichment_multiple(
    tests=test,
    regdom_file="../data/human/hg38/regulatory_domain.bed",
    chr_size_file="../data/human/hg38/chr_size.bed",
    annotation_file="../data/human/ontologies.csv",
    binom=True,
    hypergeom=True,
)

Notes

Both the binomial and the hypergeometric test are susceptible to biases that must be kept in mind when interpreting the results. The binomial test reduces the hypergeometric bias by accounting for the exact size of each gene's regulatory domain, whereas the hypergeometric test compensates for the bias of the binomial test by counting each gene only once. The two tests are complementary, and we recommend analyzing them together.

Release notes

See the changelog.

Contact

For questions and help requests, you can reach out in the scverse discourse. If you found a bug, please use the issue tracker.

Citation

If greatpy is useful for your research, please consider citing it as:

@software{greatpy,
author = {Ibarra, Mauger-Birocheau},
doi = {},
month = {},
title = {{greatpy}},
url = {https://github.com/theislab/greatpy},
year = {2022}
}

References

@article{GREAT,
author   = {McLean, C.
            and Bristor, D.
            and Hiller, M. et al.},
title    = {GREAT improves functional interpretation of cis-regulatory regions},
journal  = {Nat Biotechnol},
year     = {2010},
month    = {May},
day      = {02},
volume   = {28},
number   = {5},
pages    = {495--501},
doi      = {10.1038/nbt.1630},
url      = {https://doi.org/10.1038/nbt.1630}
}

@Manual{rGREAT,
title = {rGREAT: GREAT Analysis - Functional Enrichment on Genomic Regions},
author = {Zuguang Gu},
year = {2022},
note = {https://github.com/jokergoo/rGREAT, http://great.stanford.edu/public/html/},
}
