Welcome to Metagenomic-DeepFRI’s documentation!#

Overview#

Metagenomic-DeepFRI is a high-performance pipeline for annotating protein sequences with Gene Ontology (GO) terms using DeepFRI, a deep learning model for functional protein annotation.

Protein function prediction is increasingly important as sequencing technologies generate vast numbers of novel sequences. Metagenomic-DeepFRI combines:

  • Structure information from FoldComp databases (AlphaFold, ESMFold, PDB, etc.)

  • Sequence-based predictions using DeepFRI’s neural networks

  • Fast searches with MMseqs2 for database alignment

  • A 2-12× speedup over the standard DeepFRI implementation

Pipeline Stages#

  1. Search for proteins similar to the query in PDB and user-supplied FoldComp databases using MMseqs2.

  2. Find the best alignment among MMseqs2 hits using PyOpal.

  3. Align the target protein's contact map to the query protein, whose structure is unknown.

  4. Run DeepFRI with the structure if one is found in the database; otherwise, run DeepFRI on the sequence alone.
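
Stage 3 can be pictured with a small sketch. The code below is only an illustration of the idea of carrying a target's contact map over to a query through a pairwise alignment; it is not Metagenomic-DeepFRI's actual implementation. The function name `project_contact_map`, the gapped-string input format, and the rule that chain neighbours are always in contact are all assumptions made for this example.

```python
def project_contact_map(query_aln, target_aln, target_map):
    """Transfer contacts from an aligned target onto the query.

    Hypothetical helper for illustration only.
    query_aln / target_aln: aligned sequences of equal length, '-' marks a gap.
    target_map: square 0/1 matrix (list of lists) over target residues.
    Returns a square 0/1 matrix over query residues.
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")

    # Map each query residue index to the target residue it is aligned with.
    q_to_t = {}
    qi = ti = 0
    for q_char, t_char in zip(query_aln, target_aln):
        if q_char != "-" and t_char != "-":
            q_to_t[qi] = ti
        if q_char != "-":
            qi += 1
        if t_char != "-":
            ti += 1

    n = qi  # number of residues in the ungapped query
    query_map = [[0] * n for _ in range(n)]

    # Assumption: every residue contacts itself and its chain neighbours,
    # so query positions without an aligned target residue still get a
    # minimal backbone pattern.
    for i in range(n):
        query_map[i][i] = 1
        if i + 1 < n:
            query_map[i][i + 1] = 1
            query_map[i + 1][i] = 1

    # Copy target contacts for every pair of aligned query residues.
    for i, t_i in q_to_t.items():
        for j, t_j in q_to_t.items():
            if target_map[t_i][t_j]:
                query_map[i][j] = 1

    return query_map


# Toy example: query "ACDE" aligned to target "ACGD".
target_contacts = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
]
query_contacts = project_contact_map("AC-DE", "ACGD-", target_contacts)
```

In this toy case the long-range contact between the first and last target residues is carried over to the query residues aligned with them, while the final query residue, which has no aligned partner, keeps only its self and neighbour contacts.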

Built With#

  • MMseqs2 - Fast sequence search

  • PyOpal - SIMD-accelerated pairwise alignment

  • DeepFRI - Deep learning protein function prediction

  • FoldComp - Protein structure compression

  • ONNX - Neural network inference

Setup#

pip install mdeepfri

Contents#