API Reference

Metagenomic-DeepFRI: High-performance protein function annotation pipeline.

This module provides a pipeline for annotating protein sequences with Gene Ontology (GO) terms using DeepFRI, a deep learning model for functional protein annotation. It combines structure information from FoldComp databases, sequence-based predictions using DeepFRI’s neural networks, and fast searches with MMseqs2 for database alignment.

Key Features:
  • Structure information from FoldComp databases (AlphaFold, ESMFold, PDB, etc.)

  • Sequence-based predictions using DeepFRI’s neural networks

  • Fast searches with MMseqs2 for database alignment

  • 2-12× speedup compared to standard DeepFRI

mDeepFRI.DEEPFRI_MODES

Dictionary mapping prediction modes to their descriptions:
  • “bp”: Gene Ontology Biological Process

  • “cc”: Gene Ontology Cellular Component

  • “mf”: Gene Ontology Molecular Function

  • “ec”: Enzyme Commission numbers

Type: dict
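
A mapping like this is typically used to validate a requested mode and to label output. The sketch below rebuilds the documented mapping in a local stand-in dictionary (`MODES`, a hypothetical name) so that it runs without mDeepFRI installed. The helper `describe_modes` is illustrative only and is not part of the library.

```python
# Stand-in copy of the documented mapping; in practice, use
# mDeepFRI.DEEPFRI_MODES instead of redefining it.
MODES = {
    "bp": "Gene Ontology Biological Process",
    "cc": "Gene Ontology Cellular Component",
    "mf": "Gene Ontology Molecular Function",
    "ec": "Enzyme Commission numbers",
}


def describe_modes(requested):
    """Return descriptions for the requested modes, rejecting unknown keys."""
    unknown = [mode for mode in requested if mode not in MODES]
    if unknown:
        raise ValueError(f"Unknown prediction mode(s): {', '.join(unknown)}")
    return [MODES[mode] for mode in requested]


print(describe_modes(["mf", "ec"]))
```

Checking the keys up front means a mistyped mode fails immediately rather than partway through a long run.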

Example

Basic usage of the pipeline:

from mDeepFRI.pipeline import hierarchical_database_search
from mDeepFRI.mmseqs import QueryFile

# Wrap the FASTA file that holds the query sequences.
query_file = QueryFile(filepath="proteins.fasta")

# Search the listed databases and write the results to ./results.
hierarchical_database_search(
    query_file=query_file,
    output_path="./results",
    databases=["path/to/database"],
    threads=4,
)

Pipeline

Main prediction pipeline for protein function annotation.

mDeepFRI.pipeline.predict_protein_function

Predict protein function using DeepFRI.

Database

Structure database handling and management.

mDeepFRI.database.Database

Container for storing database file paths and metadata.

Alignment

Sequence-structure alignment using PyOpal.

mDeepFRI.alignment.AlignmentResult

Container for pairwise protein alignment results and statistics.

mDeepFRI.alignment.align_mmseqs_results

Aligns MMseqs2 search results sequence-wise.

mDeepFRI.alignment.pairwise_against_database

Finds the best alignment of the query against the target.

mDeepFRI.alignment.align_pairwise

Aligns the query against the target and returns the alignment.
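
Alignment statistics such as sequence identity can be derived directly from a pair of gapped strings. The following is a minimal sketch, not the library's `AlignmentResult`: the class name `PairwiseAlignment`, its fields, and the definitions of identity and coverage used here are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PairwiseAlignment:
    """Hypothetical container for one gapped query/target alignment."""

    gapped_query: str
    gapped_target: str

    def __post_init__(self):
        if len(self.gapped_query) != len(self.gapped_target):
            raise ValueError("Gapped sequences must have equal length")

    @property
    def matches(self) -> int:
        # Columns where both rows carry the same residue (gaps never match).
        return sum(
            q == t and q != "-"
            for q, t in zip(self.gapped_query, self.gapped_target)
        )

    @property
    def identity(self) -> float:
        # Assumed definition: identical columns / total alignment columns.
        columns = len(self.gapped_query)
        return self.matches / columns if columns else 0.0

    @property
    def query_coverage(self) -> float:
        # Assumed definition: query residues paired with a target residue,
        # divided by the number of query residues in the alignment.
        query_len = sum(q != "-" for q in self.gapped_query)
        paired = sum(
            q != "-" and t != "-"
            for q, t in zip(self.gapped_query, self.gapped_target)
        )
        return paired / query_len if query_len else 0.0


aln = PairwiseAlignment("MKT-AYIA", "MKTQA-LA")
print(aln.matches, aln.identity, aln.query_coverage)
```

Other definitions of identity (for example, dividing by the shorter sequence length) are equally common, so the denominator should be stated whenever a cutoff is reported.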

MMseqs

MMseqs2 database search functionality.

mDeepFRI.mmseqs.QueryFile

Class for handling FASTA files containing sequences to query against an MMseqs2 database.
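
For orientation, reading FASTA records into memory can be as simple as the sketch below. It is not the `QueryFile` implementation: the function name `read_fasta` and its return type are hypothetical, and it assumes the first whitespace-delimited token of each header line is the sequence ID.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {sequence_id: sequence} dict."""
    chunks = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first token of the header as the identifier.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks[current_id] = []
        elif current_id is not None:
            # Sequence lines may wrap, so collect and join them later.
            chunks[current_id].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


example = """>prot1 hypothetical protein
MKTAYIAK
QRQISFVK
>prot2
MSDNE
""".splitlines()
print(read_fasta(example))
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example `with open("proteins.fasta") as handle: read_fasta(handle)`.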

Prediction

Cython-accelerated contact map alignment and prediction.

Utilities

General utility functions.

mDeepFRI.utils.download_file

Downloads a file from url and saves it to path.
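
A download helper of this kind can be written with the standard library alone. The sketch below is an assumption about the general approach, not the code behind `download_file`; the name `fetch` and the `chunk_size` parameter are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch(url, path, chunk_size=1024 * 1024):
    """Stream the resource at `url` into the file at `path`."""
    destination = Path(path)
    # Create missing parent directories so the write cannot fail on them.
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        # Copy in chunks so large files never sit fully in memory.
        shutil.copyfileobj(response, out, chunk_size)
    return destination
```

Streaming in fixed-size chunks matters for structure databases, which can be far larger than available RAM.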

Bio Utilities

Biological data processing utilities.

mDeepFRI.bio_utils.build_align_contact_map

Retrieve contact map for aligned sequences.
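
One way to picture this step: an alignment pairs query positions with target positions, and contacts are copied from the target's map wherever both residues of a pair are aligned. The sketch below illustrates that idea and is not the library's implementation. The function `transfer_contact_map`, its arguments, and the choice to give unaligned query residues only a self-contact are all hypothetical.

```python
def transfer_contact_map(gapped_query, gapped_target, target_cmap):
    """Project a target contact map onto query coordinates via an alignment."""
    if len(gapped_query) != len(gapped_target):
        raise ValueError("Gapped sequences must have equal length")

    # Map each query index to its aligned target index (if any).
    query_to_target = {}
    q_idx = t_idx = 0
    for q, t in zip(gapped_query, gapped_target):
        if q != "-" and t != "-":
            query_to_target[q_idx] = t_idx
        if q != "-":
            q_idx += 1
        if t != "-":
            t_idx += 1

    n = q_idx  # number of query residues
    query_cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        query_cmap[i][i] = 1  # assumed: every residue contacts itself
        for j in range(i + 1, n):
            if i in query_to_target and j in query_to_target:
                value = target_cmap[query_to_target[i]][query_to_target[j]]
                query_cmap[i][j] = query_cmap[j][i] = value
    return query_cmap


# Query residue C has no aligned target residue, so it keeps only its
# self-contact; A and D inherit the target's A-D contact.
target_cmap = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
print(transfer_contact_map("ACD-", "A-DE", target_cmap))
```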

mDeepFRI.bio_utils.calculate_contact_map

Calculate contact map from PDB string.
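
A contact map marks residue pairs whose atoms lie within a distance cutoff. The sketch below shows the general technique with the standard library only, under stated assumptions: it uses C-alpha atoms from `ATOM` records, a binary output, and an illustrative 6.0 Å threshold. The function names are hypothetical, and the library's own atom choice, cutoff, and return type may differ.

```python
import math


def ca_coordinates(pdb_string):
    """Extract C-alpha coordinates from the ATOM records of a PDB string."""
    coords = []
    for line in pdb_string.splitlines():
        # Fixed-width PDB columns: atom name 13-16, x 31-38, y 39-46, z 47-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def contact_map(coords, threshold=6.0):
    """Return a binary matrix: 1 where two atoms lie within `threshold` Å."""
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) <= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def atom_line(serial, name, resname, resseq, x, y, z):
    """Format a minimal ATOM record (helper for this example only)."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {resname:>3s} A{resseq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


pdb = "\n".join(
    [
        atom_line(1, " N  ", "MET", 1, -1.0, 0.0, 0.0),
        atom_line(2, " CA ", "MET", 1, 0.0, 0.0, 0.0),
        atom_line(3, " CA ", "LYS", 2, 3.8, 0.0, 0.0),
        atom_line(4, " CA ", "THR", 3, 20.0, 0.0, 0.0),
    ]
)
print(contact_map(ca_coordinates(pdb)))
```

The nested loop is quadratic in the number of residues, which is acceptable for a sketch; production code would normally compute the distance matrix with a vectorized or compiled routine.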