Bio Utilities#

Functions#

mDeepFRI.bio_utils.build_align_contact_map(alignment: AlignmentResult, threshold: float = 6, generated_contacts: int = 2) → Tuple[AlignmentResult, ndarray]#

Retrieve contact map for aligned sequences.

Parameters:
  • alignment (AlignmentResult) – Alignment of query and target sequences.

  • database (str) – Path to FoldComp database. If empty, the structure will be retrieved from the PDB.

  • threshold (float) – Distance threshold, in Angstroms, used to decide whether two residues are in contact. Defaults to 6.

  • generated_contacts (int) – Number of generated contacts to add for gapped regions in the query alignment.

Returns:

Tuple[AlignmentResult, np.ndarray] – Tuple of alignment and contact map.
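
To make the role of generated_contacts concrete, here is a minimal sketch of alignment-guided contact transfer. It is not the mDeepFRI implementation: the function name transfer_contacts, the use of plain gapped strings in place of an AlignmentResult, and the rule that a query residue with no aligned target residue is linked to its neighbours within generated_contacts positions are all assumptions made for illustration.

```python
import numpy as np


def transfer_contacts(query_aln, target_aln, target_cmap, generated_contacts=2):
    """Hypothetical helper (not part of mDeepFRI): project a target contact
    map onto the query sequence using a gapped pairwise alignment."""
    n_query = sum(ch != "-" for ch in query_aln)
    query_cmap = np.zeros((n_query, n_query), dtype=np.int32)

    # Walk the alignment columns and record, for each query residue,
    # the index of the target residue it is paired with (None if unpaired).
    mapping = []
    t = 0
    for q_ch, t_ch in zip(query_aln, target_aln):
        if q_ch != "-":
            mapping.append(t if t_ch != "-" else None)
        if t_ch != "-":
            t += 1

    for i, ti in enumerate(mapping):
        if ti is None:
            # Assumed rule: a query residue without a structural counterpart
            # is linked to its sequence neighbours (and itself).
            lo = max(0, i - generated_contacts)
            hi = min(n_query, i + generated_contacts + 1)
            query_cmap[i, lo:hi] = 1
            query_cmap[lo:hi, i] = 1
            continue
        # Otherwise copy the contacts of the paired target residue.
        for j, tj in enumerate(mapping):
            if tj is not None and target_cmap[ti, tj]:
                query_cmap[i, j] = 1
    return query_cmap


# Three-residue target; the query has one extra residue (D) opposite a gap.
target_cmap = np.array([[1, 1, 0],
                        [1, 1, 1],
                        [0, 1, 1]])
query_cmap = transfer_contacts("ACDE", "AC-E", target_cmap, generated_contacts=1)
```

In this sketch the result is always square, with one row per query residue, so it can be used wherever a contact map for the query itself would be expected.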

mDeepFRI.bio_utils.calculate_contact_map(coordinates: ndarray, threshold=6.0, distance='sqeuclidean', mode='matrix') → ndarray#

Calculate contact map from an array of coordinates.

Parameters:
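  • coordinates (np.ndarray) – Array of atomic coordinates, for example the C-alpha coordinates of a structure.

  • threshold (float) – Distance threshold for contact map, in Angstroms. Defaults to 6.0.

  • distance (str) – Distance metric. Defaults to “sqeuclidean”.

  • mode (str) – Output mode. Either “matrix” or “sparse”. Defaults to “matrix”.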

Returns:

np.ndarray – Contact map.
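
The calculation can be illustrated with plain NumPy. The sketch below is not the library's code: sketch_contact_map is a hypothetical name, and the layout of the “sparse” output (one (i, j) index pair per contact) is an assumption. It shows why a squared Euclidean metric needs the threshold squared as well.

```python
import numpy as np


def sketch_contact_map(coordinates, threshold=6.0, mode="matrix"):
    """Hypothetical helper (not part of mDeepFRI): threshold pairwise
    distances between points to obtain a binary contact map."""
    coords = np.asarray(coordinates, dtype=float)
    # Pairwise difference vectors, shape (n, n, 3), built by broadcasting.
    diff = coords[:, None, :] - coords[None, :, :]
    # Squared Euclidean distances avoid a square root, so the threshold
    # is squared before the comparison.
    sq_dist = np.sum(diff ** 2, axis=-1)
    contacts = (sq_dist < threshold ** 2).astype(np.int32)
    if mode == "matrix":
        return contacts
    if mode == "sparse":
        # Assumed sparse layout: one (i, j) row per contact.
        return np.argwhere(contacts == 1)
    raise ValueError(f"unknown mode: {mode!r}")


# Four points on a line, 3.8 Angstroms apart.
coords = np.array([[0.0, 0.0, 0.0],
                   [3.8, 0.0, 0.0],
                   [7.6, 0.0, 0.0],
                   [11.4, 0.0, 0.0]])
cmap = sketch_contact_map(coords, threshold=6.0)
pairs = sketch_contact_map(coords, threshold=6.0, mode="sparse")
```

With a 6 Angstrom threshold, each point is in contact with itself and with its immediate neighbours only, so the matrix is tridiagonal and the sparse form lists the same contacts as index pairs.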

Description#

The bio_utils module provides utilities for structure processing and contact map generation. The contact maps it produces are inputs to structure-based protein function prediction with DeepFRI.

Key Features#

  • Structure Loading: Extract structures from specific database formats like FoldComp or standard PDB/CIF files

  • Coordinate Extraction: Isolate C-alpha coordinates crucial for backbone representation

  • Contact Map Generation: Create adjacency matrices representing protein residue interactions

  • Contact Alignment: Map structural contact information onto query sequences guided by alignments

Usage#

This module is primarily used internally by the pipeline to process structural data, but functions can be used independently for structural analysis tasks.

Example#

from mDeepFRI.bio_utils import get_calpha_coordinates, calculate_contact_map

# Assuming 'structure' is a loaded Biotite AtomArray
coords = get_calpha_coordinates(structure)
if coords is not None:
    # Calculate contact map with 6 Angstrom threshold
    cmap = calculate_contact_map(coords, threshold=6.0)