Welcome to Metagenomic-DeepFRI’s documentation!
Overview
Metagenomic-DeepFRI is a high-performance pipeline for annotating protein sequences with Gene Ontology (GO) terms using DeepFRI, a deep learning model for functional protein annotation.
Protein function prediction is increasingly important as sequencing technologies generate vast numbers of novel sequences. Metagenomic-DeepFRI combines:
Structure information from FoldComp databases (AlphaFold, ESMFold, PDB, etc.)
Sequence-based predictions using DeepFRI’s neural networks
Fast searches with MMseqs2 for database alignment
A 2-12× speedup over the standard DeepFRI implementation
Pipeline Stages
Search PDB and any supplied FoldComp databases for proteins similar to the query, using MMseqs2.
Find the best alignment among MMseqs2 hits using PyOpal.
Align the target protein's contact map to the query protein, whose structure is unknown.
Run DeepFRI with the structure if one was found in the database; otherwise, run DeepFRI with the sequence only.
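The last three stages can be illustrated with a minimal, self-contained sketch. It is not the package's implementation: the `Hit` class, the function names, and the `min_identity` cutoff are hypothetical, and the alignment rows are assumed to be precomputed (in the real pipeline they would come from PyOpal). The sketch only shows the idea of picking the best hit, transferring the target's contacts onto query residue indices through a gapped alignment, and falling back to sequence-only mode when no hit is usable.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Hypothetical container for one database hit (illustration only)."""
    target_id: str
    identity: float                         # sequence identity in [0, 1]
    query_aln: str                          # gapped query row of the alignment
    target_aln: str                         # gapped target row of the alignment
    target_contacts: set[tuple[int, int]]   # residue index pairs in contact


def best_hit(hits: list[Hit], min_identity: float = 0.5) -> Optional[Hit]:
    """Pick the hit with the highest identity above an assumed cutoff."""
    candidates = [h for h in hits if h.identity >= min_identity]
    return max(candidates, key=lambda h: h.identity, default=None)


def transfer_contacts(hit: Hit) -> set[tuple[int, int]]:
    """Map target contacts onto query indices via aligned columns."""
    target_to_query: dict[int, int] = {}
    qi = ti = 0
    for qc, tc in zip(hit.query_aln, hit.target_aln):
        if qc != "-" and tc != "-":
            target_to_query[ti] = qi        # both residues present: aligned pair
        if qc != "-":
            qi += 1
        if tc != "-":
            ti += 1
    # Contacts touching a target residue with no query partner are dropped.
    return {
        (target_to_query[i], target_to_query[j])
        for i, j in hit.target_contacts
        if i in target_to_query and j in target_to_query
    }


def plan_prediction(hits: list[Hit]):
    """Return ("structure", contacts) if a usable hit exists, else ("sequence", None)."""
    hit = best_hit(hits)
    if hit is None:
        return ("sequence", None)
    return ("structure", transfer_contacts(hit))


if __name__ == "__main__":
    example = Hit("t1", 0.9, "AC-DE", "ACGD-", {(0, 1), (1, 3), (2, 3)})
    print(plan_prediction([example]))
    print(plan_prediction([]))
```

In this toy example the contact involving the target's unaligned third residue is discarded, which is one simple way to handle gaps; the actual pipeline may treat them differently.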
Built With
Setup
pip install mdeepfri
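After installing, one way to confirm that the package is visible to the current Python environment is to query its installed version with the standard library. This is a small sketch rather than part of the package; the helper name `installed_version` is hypothetical, and the distribution name is taken from the `pip` command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "mdeepfri") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version is not None else "mdeepfri is not installed")
```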