Examples

Basic Usage

Predict protein function with default settings:

mDeepFRI predict-function -i /path/to/protein/sequences \
  -d /path/to/foldcomp/database/ \
  -w /path/to/deepfri/weights/folder \
  -o /output_path > log.txt

MMseqs2 Search

Search a database programmatically with the mDeepFRI.mmseqs module:

from mDeepFRI.mmseqs import QueryFile

# Create a query file
query = QueryFile("sequences.fasta")
# Load the sequences so they can be filtered
query.load_sequences()
# Remove sequences shorter than 30 amino acids
query.filter_sequences(min_length=30)
# Search against an MMseqs2 database (note: 10e-3 is an e-value of 0.01)
result = query.search("mmseqs.db", eval=10e-3, mmseqs_sensitivity=4, threads=8)
# Save the results
result.save("output.tsv")
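
To make the length-filtering step concrete, here is a minimal standalone sketch in plain Python. It is not mDeepFRI's implementation: `parse_fasta` and `filter_by_length` are hypothetical helpers, the sequences are made-up placeholders, and the inclusive threshold (keep sequences with at least `min_length` residues) is an assumption based on the comment in the example above.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so append to the current record
            records[current_id] += line
    return records


def filter_by_length(records: Dict[str, str], min_length: int) -> Dict[str, str]:
    """Keep only sequences with at least `min_length` residues (assumed inclusive)."""
    return {rid: seq for rid, seq in records.items() if len(seq) >= min_length}


# Made-up placeholder sequences: one of 8 residues, one of 50 residues
fasta_text = """>short_peptide
MKTAYIAK
>longer_protein
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSG
"""

records = parse_fasta(fasta_text.splitlines())
kept = filter_by_length(records, min_length=30)
print(sorted(kept))
```

Only the 50-residue record survives the filter; the 8-residue peptide is dropped before any search would run.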

Hierarchical Database Search

Search multiple databases hierarchically by passing -d more than once (e.g., AlphaFold first, then ESMFold):

mDeepFRI predict-function -i /path/to/protein/sequences \
  -d /path/to/alphafold/database/ -d /path/to/esmfold/database/ \
  -w /path/to/deepfri/weights/folder -o /output_path
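
The idea behind a hierarchical search can be sketched in a few lines of Python. This is an illustration under an assumption, not mDeepFRI's code: it assumes that "hierarchical" means databases are tried in the order given and that only queries without a hit fall through to the next database. The function `hierarchical_search`, the dictionary-backed "databases", and all IDs are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical search callable: returns a hit ID for a query, or None
SearchFn = Callable[[str], Optional[str]]


def hierarchical_search(
    queries: Iterable[str],
    databases: List[Tuple[str, SearchFn]],
) -> Dict[str, Tuple[str, str]]:
    """Try each database in order; unmatched queries fall through to the next."""
    hits: Dict[str, Tuple[str, str]] = {}
    remaining = list(queries)
    for db_name, search in databases:
        still_unmatched = []
        for query in remaining:
            hit = search(query)
            if hit is not None:
                # The first database that yields a hit wins
                hits[query] = (db_name, hit)
            else:
                still_unmatched.append(query)
        remaining = still_unmatched
        if not remaining:
            break
    return hits


# Toy stand-ins for real structure databases (placeholder IDs)
alphafold = {"protA": "AF-hit-1"}
esmfold = {"protA": "ESM-hit-9", "protB": "ESM-hit-2"}

result = hierarchical_search(
    ["protA", "protB", "protC"],
    [("alphafold", alphafold.get), ("esmfold", esmfold.get)],
)
print(result)
```

In this toy run `protA` is resolved by the first database and never reaches the second, `protB` is resolved by the second, and `protC` has no hit in either.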

Selecting Prediction Modes

Predict only specific ontology categories by repeating -p (here molecular function, mf, and biological process, bp):

mDeepFRI predict-function -i /path/to/protein/sequences \
  -d /path/to/foldcomp/database/ -w /path/to/deepfri/weights/folder \
  -o /output_path -p mf -p bp

Skipping Prediction Matrices

Save disk space by skipping the large prediction matrix files with --skip-matrix:

mDeepFRI predict-function -i /path/to/protein/sequences \
  -d /path/to/foldcomp/database/ -w /path/to/deepfri/weights/folder \
  -o /output_path --skip-matrix
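
When the CLI is driven from a script, it can help to assemble the arguments as a list rather than a shell string. The sketch below uses only the flags that appear on this page (`-i`, `-d`, `-w`, `-o`, `-p`, `--skip-matrix`); `build_predict_command` is a hypothetical helper, not part of mDeepFRI, and the paths are the same placeholders used above.

```python
import shlex
from typing import List, Sequence


def build_predict_command(
    input_path: str,
    databases: Sequence[str],
    weights: str,
    output: str,
    modes: Sequence[str] = (),
    skip_matrix: bool = False,
) -> List[str]:
    """Assemble an `mDeepFRI predict-function` argument list."""
    cmd = ["mDeepFRI", "predict-function", "-i", input_path]
    for db in databases:
        # -d is repeated once per database, in search order
        cmd += ["-d", db]
    cmd += ["-w", weights, "-o", output]
    for mode in modes:
        # -p is repeated once per prediction mode
        cmd += ["-p", mode]
    if skip_matrix:
        cmd.append("--skip-matrix")
    return cmd


cmd = build_predict_command(
    "/path/to/protein/sequences",
    ["/path/to/alphafold/database/", "/path/to/esmfold/database/"],
    "/path/to/deepfri/weights/folder",
    "/output_path",
    modes=["mf", "bp"],
    skip_matrix=True,
)
# Print a shell-safe rendering of the command for inspection
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to execute it; the sketch only prints the command so it can be inspected first.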

Filtering Results

Load and filter prediction results in Python:

import pandas as pd

# Load results
df = pd.read_csv("results.tsv", sep="\t")

# Filter high-confidence structure-based predictions
filtered = df[
    (df["score"] >= 0.5) &
    (df["network_type"] == "gcn") &
    (df["aligned"] == True)
]

# Group by protein
for protein_id, group in filtered.groupby("protein"):
    print(f"\n{protein_id}:")
    for _, row in group.iterrows():
        print(f"  {row['go_term']}: {row['go_name']} (score: {row['score']:.3f})")
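
If pandas is not available, the same filter can be sketched with the standard-library `csv` module. The column names are taken from the pandas example above; the rows are made-up placeholders, and two details are assumptions: that the `aligned` column is written as the text `True`/`False`, and that `cnn` is a possible value of `network_type` alongside `gcn`.

```python
import csv
import io
from collections import defaultdict

# Made-up placeholder rows using the column names from the pandas example
sample_tsv = (
    "protein\tgo_term\tgo_name\tscore\tnetwork_type\taligned\n"
    "protA\tGO:0003824\tcatalytic activity\t0.91\tgcn\tTrue\n"
    "protA\tGO:0005515\tprotein binding\t0.32\tgcn\tTrue\n"
    "protB\tGO:0003824\tcatalytic activity\t0.77\tcnn\tFalse\n"
)

by_protein = defaultdict(list)
reader = csv.DictReader(io.StringIO(sample_tsv), delimiter="\t")
for row in reader:
    # csv yields strings, so convert the score and compare flags as text
    if (
        float(row["score"]) >= 0.5
        and row["network_type"] == "gcn"
        and row["aligned"] == "True"  # assumed text encoding of the boolean
    ):
        by_protein[row["protein"]].append(row)

for protein_id, rows in by_protein.items():
    print(f"\n{protein_id}:")
    for row in rows:
        print(f"  {row['go_term']}: {row['go_name']} (score: {float(row['score']):.3f})")
```

To read a real results file, replace `io.StringIO(sample_tsv)` with an open file handle, for example `open("results.tsv", newline="")`.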