API Reference#
Metagenomic-DeepFRI: High-performance protein function annotation pipeline.
This module provides a pipeline for annotating protein sequences with Gene Ontology (GO) terms using DeepFRI, a deep learning model for functional protein annotation. It combines structure information from FoldComp databases, sequence-based predictions using DeepFRI’s neural networks, and fast searches with MMseqs2 for database alignment.
- Key Features:
  - Structure information from FoldComp databases (AlphaFold, ESMFold, PDB, etc.)
  - Sequence-based predictions using DeepFRI’s neural networks
  - Fast searches with MMseqs2 for database alignment
  - 2-12× speedup compared to standard DeepFRI
- mDeepFRI.DEEPFRI_MODES#
Dictionary mapping prediction modes to their descriptions:
  - "bp": Gene Ontology Biological Process
  - "cc": Gene Ontology Cellular Component
  - "mf": Gene Ontology Molecular Function
  - "ec": Enzyme Commission numbers
- Type: dict
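As an illustration of how a mode mapping like this might be used, the sketch below validates a list of requested modes against the four documented keys. The local `MODES` dictionary and the `validate_modes` helper are hypothetical stand-ins, not part of the package; real code would read the mapping from `mDeepFRI.DEEPFRI_MODES`, whose description strings may differ.

```python
# Local stand-in mirroring the documented keys of mDeepFRI.DEEPFRI_MODES
# (assumption: the real description strings may be worded differently).
MODES = {
    "bp": "Gene Ontology Biological Process",
    "cc": "Gene Ontology Cellular Component",
    "mf": "Gene Ontology Molecular Function",
    "ec": "Enzyme Commission numbers",
}


def validate_modes(requested, known=MODES):
    """Return the requested modes as a list, or raise ValueError on unknown keys."""
    unknown = [mode for mode in requested if mode not in known]
    if unknown:
        raise ValueError(f"Unknown prediction mode(s): {unknown}")
    return list(requested)


if __name__ == "__main__":
    for mode in validate_modes(["mf", "ec"]):
        print(f"{mode}: {MODES[mode]}")
```

Checking the keys up front means a typo such as "MF" fails before any expensive search or prediction step starts.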
Example
Basic usage of the pipeline:
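
    from mDeepFRI.pipeline import hierarchical_database_search
    from mDeepFRI.mmseqs import QueryFile

    # Wrap the input FASTA file so it can be searched against the databases.
    query_file = QueryFile(filepath="proteins.fasta")

    # Search the listed databases and write the results to the output path.
    hierarchical_database_search(
        query_file=query_file,
        output_path="./results",
        databases=["path/to/database"],
        threads=4,
    )
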
References
DeepFRI: flatironinstitute/DeepFRI
FoldComp: steineggerlab/foldcomp
MMseqs2: soedinglab/MMseqs2
Pipeline#
Main prediction pipeline for protein function annotation.
Predict protein function using DeepFRI.
Database#
Structure database handling and management.
Container for storing database file paths and metadata.
Alignment#
Sequence-structure alignment using PyOpal.
- Container for pairwise protein alignment results and statistics.
- Aligns MMseqs2 search results sequence-wise.
- Finds the best alignment of the query against the target.
- Aligns the query against the target and returns the alignment.
MMSeqs#
MMseqs2 database search functionality.
Class for handling FASTA files with sequences to query against MMseqs2 database.
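To make the idea of a FASTA query container concrete, here is a minimal sketch of reading FASTA records into a dictionary. The `read_fasta` function is hypothetical and is not the package's `QueryFile` implementation; it only illustrates the kind of input such a class handles.

```python
import io


def read_fasta(handle):
    """Parse FASTA text from a file-like object into {sequence_id: sequence}."""
    chunks_by_id = {}
    current_id = None
    for raw_line in handle:
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks_by_id[current_id] = []
        elif current_id is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            chunks_by_id[current_id].append(line)
    return {seq_id: "".join(chunks) for seq_id, chunks in chunks_by_id.items()}


if __name__ == "__main__":
    example = io.StringIO(">prot1 example protein\nMKT\nAYI\n>prot2\nGSH\n")
    print(read_fasta(example))
```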
Prediction#
Cython-accelerated contact map alignment and prediction.
Utilities#
General utility functions.
Downloads a file from url and saves it to path.
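A download helper of this kind could be sketched with the standard library alone, as below. The name `download_file` and its signature are assumptions made for illustration; the package's own function may be named differently and may add retries or progress reporting.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, path):
    """Stream the resource at `url` into the local file `path` and return the path."""
    destination = Path(path)
    # Create the parent directory so the write below cannot fail on a missing folder.
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        # Copy in chunks instead of loading the whole payload into memory.
        shutil.copyfileobj(response, out)
    return destination
```

Streaming with `shutil.copyfileobj` keeps memory use flat, which matters for multi-gigabyte structure databases.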
Bio Utilities#
Biological data processing utilities.
- Retrieve contact map for aligned sequences.
- Calculate contact map from PDB string.
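As a rough illustration of what calculating a contact map from a PDB string involves, the sketch below extracts C-alpha coordinates from fixed-column ATOM records and marks residue pairs closer than a distance threshold. The function names and the 6.0 Å default are assumptions chosen for the example, and the package's Cython implementation may differ in both threshold and atom selection.

```python
import math


def ca_coordinates(pdb_string):
    """Return (x, y, z) tuples for every C-alpha ATOM record in a PDB string."""
    coords = []
    for line in pdb_string.splitlines():
        # PDB is fixed-width: atom name in columns 13-16, x/y/z in columns 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def contact_map(coords, threshold=6.0):
    """Return a symmetric 0/1 matrix; 1 where two residues lie within `threshold` Å."""
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) < threshold else 0 for j in range(n)]
        for i in range(n)
    ]
```

The nested-list version is quadratic in the number of residues, which is fine for a sketch; a vectorized or compiled implementation would be the usual choice for large inputs.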