Batch Effect Correction framework for metagenomic data
- About the project
- Installation
- Features
- Usage
- Documentation
- License
- Contributing
- Credits and acknowledgements
- Contact and feedback
Integrating metagenomic data from multiple studies and experimental conditions is essential for understanding the interactions between microbial communities in complex biological systems, but the inherent diversity and biological complexity of such data pose methodological challenges that require refined strategies for atlas-level integration. ABaCo is a family of generative models based on Variational Autoencoders (VAEs) combined with adversarial training. It aims to integrate metagenomic data from different studies by minimizing technical heterogeneity while conserving biological significance. The VAE encodes the data into a latent space, while a discriminator is trained to detect the provenance of the data, so that variability associated with data origin can be removed; concurrently, the data is modeled using distributions suitable for raw counts, and the latent space follows a clustering prior to ensure biological conservation.
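As a rough illustration of the ideas in this overview, the sketch below shows how a count-based reconstruction term and an adversarial batch term might be combined into a single encoder objective. It is not ABaCo's implementation: the negative binomial likelihood, the function names, and the `adv_weight` parameter are assumptions chosen for illustration, and the clustering prior on the latent space is left out entirely.

```python
import math


def nb_log_pmf(k, mean, dispersion):
    """Log-probability of a count k under a negative binomial distribution.

    Assumed parameterisation (illustrative only): `mean` > 0 is the expected
    count and `dispersion` > 0 is the inverse-dispersion (often written r).
    """
    r = dispersion
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mean))
        + k * math.log(mean / (r + mean))
    )


def batch_cross_entropy(batch_probs, true_batch):
    """Discriminator loss: negative log-probability assigned to the true batch."""
    return -math.log(batch_probs[true_batch])


def encoder_objective(counts, means, dispersion, batch_probs, true_batch,
                      adv_weight=1.0):
    """Hypothetical encoder objective for one sample (lower is better).

    The reconstruction term rewards decoded means that explain the raw counts.
    The adversarial term is subtracted, so the encoder is rewarded when the
    discriminator fails to recognise which batch the sample came from.
    """
    recon_nll = -sum(nb_log_pmf(k, m, dispersion) for k, m in zip(counts, means))
    disc_loss = batch_cross_entropy(batch_probs, true_batch)
    return recon_nll - adv_weight * disc_loss


if __name__ == "__main__":
    counts = [0, 3, 12]        # raw taxon counts for one sample (made-up values)
    means = [0.5, 2.5, 10.0]   # decoder output for the same taxa (made-up values)
    confused = encoder_objective(counts, means, 2.0, [0.5, 0.5], true_batch=0)
    confident = encoder_objective(counts, means, 2.0, [0.99, 0.01], true_batch=0)
    print(confused < confident)
```

When the discriminator cannot tell batches apart (probabilities close to uniform), its loss is large and the encoder objective is lower, which is the direction adversarial training pushes the latent representation.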
An overview of the ABaCo workflow is shown in the figure below:
Tip
It is recommended to install ABaCo inside a virtual environment to manage dependencies and avoid conflicts with existing packages. You can use the virtual environment manager of your choice, such as poetry, conda, or pipenv.
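If none of those managers is at hand, Python's standard-library `venv` module is another option. The sketch below is only one way to set up an isolated environment; the helper name `create_environment` and the `.venv` directory name are assumptions, not part of ABaCo.

```python
import venv
from pathlib import Path


def create_environment(env_dir, with_pip=True):
    """Create a virtual environment in env_dir and return its resolved path.

    `with_pip=True` bootstraps pip into the new environment so that packages
    can be installed into it afterwards.
    """
    env_path = Path(env_dir).resolve()
    venv.create(env_path, with_pip=with_pip)
    return env_path


# Example call (left as a comment so that nothing is created on import):
# create_environment(".venv")
```

The environment still has to be activated in your shell before running pip, so that the package is installed into it rather than into the system interpreter.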
ABaCo is available on PyPI and can be installed using pip:
pip install abaco
You can also install the package for development by cloning this repository and running the following command:
Warning
We assume you are in the root directory of the cloned repository when running this command. Otherwise, you need to specify the path to the abaco directory.
pip install -e .
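After either installation route, a quick check from Python can confirm that the package is visible in the active environment. This minimal sketch uses only the standard library and assumes the distribution is registered under the name `abaco`, as in the pip command above; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("abaco")
    if found:
        print(f"abaco {found} is installed")
    else:
        print("abaco is not installed in this environment")
```

If the check reports that the package is missing, the most likely cause is that pip ran under a different interpreter or environment than the one running this script.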
ABaCo's documentation is hosted on Read the Docs. It includes detailed examples, configuration options, and the API reference.
The code in this repository is licensed under the MIT License, allowing you to use, modify, and distribute it freely as long as you include the original copyright and license notice.
The documentation and other creative content are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License, meaning you are free to share and adapt it with proper attribution.
Full details for both licenses can be found in the LICENSE file.
ABaCo is an open-source project, and we welcome contributions of all kinds via GitHub issues and pull requests. You can report bugs, suggest improvements, propose new features, or implement changes. Please follow the guidelines in the CONTRIBUTING file to ensure that your contribution is easily integrated into the project.
- ABaCo was developed by the Multiomics Network Analytics Group (MoNA) at the Novo Nordisk Foundation Center for Biosustainability (DTU Biosustain).
We appreciate your feedback! If you have any comments, suggestions, or run into issues while using ABaCo, feel free to open an issue in this repository. Your input helps us make ABaCo better for everyone.