Examples
========

Basic Usage
-----------

Predict protein function with default settings:

.. code-block:: bash

    mDeepFRI predict-function -i /path/to/protein/sequences \
    -d /path/to/foldcomp/database/ \
    -w /path/to/deepfri/weights/folder \
    -o /output_path > log.txt

MMseqs2 Search
--------------

Programmatic database search using the MMseqs module:

.. code-block:: python

    from mDeepFRI.mmseqs import QueryFile

    # Create a query file
    query = QueryFile("sequences.fasta")
    # Load sequences to manipulate on them
    query.load_sequences()
    # filter sequences under 30 amino acids
    query.filter_sequences(min_length = 30)
    # search against MMseqs2 database
    result = query.search("mmseqs.db", eval=10e-3, mmseqs_sensitivity=4, threads=8)
    # save results
    result.save("output.tsv")

Hierarchical Database Search
-----------------------------

Search multiple databases hierarchically (e.g., PDB first, then AlphaFold):

.. code-block:: bash

    mDeepFRI predict-function -i /path/to/protein/sequences \
      -d /path/to/alphafold/database/ -d /path/to/esmfold/database/ \
      -w /path/to/deepfri/weights/folder -o /output_path

Selecting Prediction Modes
---------------------------

Predict only specific ontology categories:

.. code-block:: bash

    mDeepFRI predict-function -i /path/to/protein/sequences \
      -d /path/to/foldcomp/database/ -w /path/to/deepfri/weights/folder \
      -o /output_path -p mf -p bp

Skipping Prediction Matrices
-----------------------------

Save disk space by skipping large matrix files:

.. code-block:: bash

    mDeepFRI predict-function -i /path/to/protein/sequences \
      -d /path/to/foldcomp/database/ -w /path/to/deepfri/weights/folder \
      -o /output_path --skip-matrix

Filtering Results
-----------------

Load and filter prediction results in Python:

.. code-block:: python

    import pandas as pd

    # Load results
    df = pd.read_csv("results.tsv", sep="\t")

    # Filter high-confidence structure-based predictions
    filtered = df[
        (df["score"] >= 0.5) &
        (df["network_type"] == "gcn") &
        (df["aligned"] == True)
    ]

    # Group by protein
    for protein_id, group in filtered.groupby("protein"):
        print(f"\n{protein_id}:")
        for _, row in group.iterrows():
            print(f"  {row['go_term']}: {row['go_name']} (score: {row['score']:.3f})")