Alignment#
Classes#
- class mDeepFRI.alignment.AlignmentResult(query_name: str = '', query_sequence: str = '', target_name: str = '', target_sequence: str = '', alignment: str = '', query_identity: float | None = None, query_coverage: float | None = None, target_coverage: float | None = None, db_name: str | None = None, coords: ndarray | None = None)#
Bases: object
Container for pairwise protein alignment results and statistics.
This class stores the results of a protein sequence alignment, including the aligned sequences with gaps, alignment statistics (identity, coverage), and optional structural information (coordinates, contact maps).
- alignment#
Alignment string in CIGAR-like format. ‘M’ = match/mismatch, ‘I’ = insertion, ‘D’ = deletion.
- Type:
str
- query_identity#
Sequence identity as a fraction (0.0-1.0): the number of matching residues divided by the alignment length.
- Type:
float, optional
- target_coords#
C-alpha atom coordinates from target structure.
- Type:
np.ndarray, optional
- cmap#
Contact map of target structure.
- Type:
np.ndarray, optional
- aligned_cmap#
Contact map aligned to query sequence.
- Type:
np.ndarray, optional
Example
>>> result = AlignmentResult(
...     query_name="protein1",
...     query_sequence="MSKGEELFT",
...     target_name="1GFL_A",
...     target_sequence="MSKGEELFTGV",
...     alignment="MMMMMMMMMM",
...     query_identity=0.90,
...     query_coverage=0.82
... )
>>> print(result.gapped_sequence)
'MSKGEELFT'
- insert_gaps()#
Inserts gaps into query and target sequences.
- Returns:
AlignmentResult – The object with gapped sequences.
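To make the gap-insertion step concrete, here is a minimal sketch of how a CIGAR-like string can be expanded into two gapped sequences. It is not the mDeepFRI implementation: the name insert_gaps_sketch is hypothetical, and the sketch assumes that 'I' marks a residue present only in the query and 'D' a residue present only in the target.

```python
from typing import Tuple


def insert_gaps_sketch(query: str, target: str, alignment: str) -> Tuple[str, str]:
    """Expand two ungapped sequences into gapped strings of equal length.

    Hypothetical helper for illustration; the I/D semantics are an assumption.
    """
    gapped_query, gapped_target = [], []
    qi = ti = 0  # next unread position in query and target
    for op in alignment:
        if op == "M":  # match or mismatch: consume one residue from each
            gapped_query.append(query[qi])
            gapped_target.append(target[ti])
            qi += 1
            ti += 1
        elif op == "I":  # assumed: residue only in the query, gap in target
            gapped_query.append(query[qi])
            gapped_target.append("-")
            qi += 1
        elif op == "D":  # assumed: residue only in the target, gap in query
            gapped_query.append("-")
            gapped_target.append(target[ti])
            ti += 1
        else:
            raise ValueError(f"unexpected alignment operation: {op!r}")
    return "".join(gapped_query), "".join(gapped_target)


gq, gt = insert_gaps_sketch("MSKGELFT", "MSKGEELF", "MMMMMDMMI")
print(gq)
print(gt)
```

Both returned strings always have the same length as the alignment string, which is what makes column-wise statistics straightforward afterwards.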
Functions#
- mDeepFRI.alignment.insert_gaps(sequence: str, reference: str, alignment_string: str) → Tuple[str, str]#
Inserts gaps into query and target sequences.
- mDeepFRI.alignment.best_hit_database(query, target_sequences, gap_open: int = 10, gap_extend: int = 1, scoring_matrix: str = 'VTML80')#
Finds the best hit in the database and returns its index.
- mDeepFRI.alignment.align_mmseqs_results(best_matches_filepath: str, sequence_db: str, alignment_gap_open: int = 10, alignment_gap_extend: int = 1, threads: int = 1, scoring_matrix: str = 'VTML80')#
Aligns MMseqs2 search results sequence-wise.
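The idea behind best_hit_database (score the query against every target and return the index of the top scorer) can be sketched in pure Python. This is an illustration only, not the library's code: it swaps PyOpal for a plain Smith-Waterman local alignment with affine gaps, and it uses a toy match/mismatch score instead of VTML80. The names local_score and best_hit_index_sketch and the match and mismatch parameters are hypothetical.

```python
from typing import Sequence


def local_score(a: str, b: str, gap_open: int = 10, gap_extend: int = 1,
                match: int = 5, mismatch: int = -4) -> int:
    """Best local alignment score (Smith-Waterman with affine gaps).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    prev_h = [0] * (len(b) + 1)        # best scores ending at the previous row
    prev_f = [neg_inf] * (len(b) + 1)  # previous row, ending in a vertical gap
    best = 0
    for i in range(1, len(a) + 1):
        h = [0] * (len(b) + 1)
        f = [neg_inf] * (len(b) + 1)
        e = neg_inf  # current row, ending in a horizontal gap
        for j in range(1, len(b) + 1):
            e = max(e - gap_extend, h[j - 1] - gap_open)
            f[j] = max(prev_f[j] - gap_extend, prev_h[j] - gap_open)
            diag = prev_h[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[j] = max(0, diag, e, f[j])
            best = max(best, h[j])
        prev_h, prev_f = h, f
    return best


def best_hit_index_sketch(query: str, targets: Sequence[str]) -> int:
    """Return the index of the highest-scoring target (first one on ties)."""
    return max(range(len(targets)), key=lambda i: local_score(query, targets[i]))


targets = ["AAAAAAA", "MSKGEELFTGV", "MSKG"]
print(best_hit_index_sketch("MSKGEELFT", targets))
```

A SIMD library such as PyOpal computes the same kind of score far faster; the sketch only shows what "best hit" means.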
Description#
The alignment module provides sequence-structure alignment functionality using PyOpal, a fast SIMD-accelerated pairwise alignment library.
Key Features#
- PyOpal Integration: High-performance SIMD-accelerated alignment
- Custom Scoring Matrices: Support for BLOSUM and custom matrices
- Batch Processing: Efficiently align multiple query-target pairs
- Detailed Statistics: Provides identity, coverage, and alignment coordinates
Alignment Workflow#
1. Database Search: MMseqs2 identifies candidate structure templates
2. Sequence Alignment: PyOpal performs pairwise alignment
3. Statistics Calculation: Computes identity, coverage, and quality metrics
4. Coordinate Mapping: Maps aligned residues to structure coordinates
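The statistics and coordinate-mapping steps above can be sketched from a pair of gapped sequences. This is a hedged illustration rather than the library's code. Identity follows the definition given for query_identity (matches divided by alignment length). Coverage is assumed here to be the number of aligned residue pairs divided by the ungapped sequence length. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def alignment_stats(gapped_query: str, gapped_target: str) -> Dict[str, float]:
    """Identity and coverage as fractions in the range 0.0-1.0."""
    if len(gapped_query) != len(gapped_target) or not gapped_query:
        raise ValueError("gapped sequences must be non-empty and equally long")
    columns = list(zip(gapped_query, gapped_target))
    matches = sum(q == t and q != "-" for q, t in columns)
    aligned = sum(q != "-" and t != "-" for q, t in columns)
    query_len = sum(q != "-" for q, _ in columns)
    target_len = sum(t != "-" for _, t in columns)
    return {
        "identity": matches / len(columns),
        # Assumed coverage definition: aligned pairs / ungapped length
        "query_coverage": aligned / query_len if query_len else 0.0,
        "target_coverage": aligned / target_len if target_len else 0.0,
    }


def aligned_index_pairs(gapped_query: str, gapped_target: str) -> List[Tuple[int, int]]:
    """(query_index, target_index) for every column without a gap."""
    pairs = []
    qi = ti = 0
    for q, t in zip(gapped_query, gapped_target):
        if q != "-" and t != "-":
            pairs.append((qi, ti))
        qi += q != "-"  # advance only when a residue was consumed
        ti += t != "-"
    return pairs


stats = alignment_stats("MSKGE-LFT", "MSKGEELF-")
pairs = aligned_index_pairs("MSKGE-LFT", "MSKGEELF-")
print(stats)
print(pairs)
```

The target indices in pairs can then be used to select rows from a coordinate array of the target structure, so that each aligned query residue gets a position.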
Example#
from mDeepFRI.alignment import align_pairwise
from mDeepFRI.database import Database
# Initialize database
db = Database("pdb100.mmseqsDB")
# Perform alignment
result = align_pairwise(
    query_seq="MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKRQTLGQHDFSAGEGLYTHMKALRPDEDRLSPLHSVYVDQWDWERVMGDGERQFSTLKSTVEAIWAGIKATEAAVSEEFGLAPFLPDQIHFVHSQELLSRYPDLDAKGRERAIAKDLGAVFLVGIGGKLSDGHRHDVRAPDYDDWSTPSELGHAGLNGDILVWNPVLEDAFELSSMGIRVDADTLKHQLALTGDEDRLELEWHQALLRGEMPQTIGGGIGQSRLTMLLLQLPHIGQVQAGVWPAAVRESVPSLL",
    target_seq="MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKRQTLGQHDFSAGEGLYTHMKALRPDEDRLSPLHSVYVDQWDWERVMGDGERQFSTLKSTVEAIWAGIKATEAAVSEEFGLAPFLPDQIHFVHSQELLSRYPDLDAKGRERAIAKDLGAVFLVGIGGKLSDGHRHDVRAPDYDDWSTPSELGHAGLNGDILVWNPVLEDAFELSSMGIRVDADTLKHQLALTGDEDRLELEWHQALLRGEMPQTIGGGIGQSRLTMLLLQLPHIGQVQAGVWPAAVRESVPSLL",
    target_coords=db.get_structure("1abc_A").coords,
    scoring_matrix="BLOSUM62"
)
# Identity and coverage are stored as fractions (0.0-1.0); format them as percentages
print(f"Identity: {result.query_identity:.1%}")
print(f"Coverage: {result.query_coverage:.1%}")
print(f"Aligned coordinates: {len(result.coords)}")
Scoring Matrices#
Supported scoring matrices include:
- BLOSUM62: Balanced for diverse sequences
- BLOSUM45: More permissive for distant homologs
- BLOSUM80: More stringent for close homologs
- PAM250: Alternative evolutionary model
- VTML80: Default value of scoring_matrix in the best_hit_database and align_mmseqs_results signatures
Custom scoring matrices can be provided as dictionaries.
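As an illustration of a dictionary-based scoring matrix, the sketch below builds a simple substitution table and uses it to score an ungapped pair of sequences. The dictionary layout that mDeepFRI expects is not specified here, so keying by (residue, residue) tuples is an assumption, and make_simple_matrix and score_ungapped are hypothetical helpers.

```python
from itertools import product
from typing import Dict, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Matrix = Dict[Tuple[str, str], int]


def make_simple_matrix(match: int = 2, mismatch: int = -1) -> Matrix:
    """Build a symmetric match/mismatch matrix over the 20 standard residues."""
    return {
        (a, b): (match if a == b else mismatch)
        for a, b in product(AMINO_ACIDS, repeat=2)
    }


def score_ungapped(seq_a: str, seq_b: str, matrix: Matrix) -> int:
    """Sum the substitution scores of two equally long, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(matrix[(a, b)] for a, b in zip(seq_a, seq_b))


matrix = make_simple_matrix()
# Reward a conservative substitution; set both orders to keep the matrix symmetric
matrix[("I", "L")] = matrix[("L", "I")] = 1

print(score_ungapped("MSKI", "MSKL", matrix))
```

Keeping the table symmetric matters because the score of aligning residue a with b should not depend on which sequence is called the query.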