Database
- class mDeepFRI.database.Database(foldcomp_db: Path, sequence_db: Path, mmseqs_db: Path, mmseqs_result: Path | None = None)
Bases: object
Container for storing database file paths and metadata.
This dataclass maintains references to FoldComp structure databases and their corresponding MMseqs2 sequence databases. All paths are automatically converted to Path objects for consistency.
- foldcomp_db
Path to FoldComp database containing compressed structures.
Type: Path
- sequence_db
Path to extracted FASTA sequence database.
Type: Path
- mmseqs_db
Path to MMseqs2 index database for fast searches.
Type: Path
- mmseqs_result
Path to MMseqs2 search results (if available). Defaults to None.
Type: Path, optional
Example
>>> db = Database(
...     foldcomp_db=Path("afdb_swissprot"),
...     sequence_db=Path("afdb_swissprot.fasta.gz"),
...     mmseqs_db=Path("afdb_swissprot.mmseqsDB")
... )
>>> db.name
'afdb_swissprot'
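The path normalization and the optional mmseqs_result field described above can be sketched with a plain dataclass. This is a minimal illustration, not the mDeepFRI source: DatabaseSketch is a hypothetical stand-in, and deriving name from the stem of foldcomp_db is an assumption chosen to be consistent with the doctest output above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


@dataclass
class DatabaseSketch:
    """Hypothetical stand-in mirroring the documented fields of Database."""

    foldcomp_db: PathLike
    sequence_db: PathLike
    mmseqs_db: PathLike
    mmseqs_result: Optional[PathLike] = None

    def __post_init__(self) -> None:
        # Normalize every supplied path to a Path object.
        self.foldcomp_db = Path(self.foldcomp_db)
        self.sequence_db = Path(self.sequence_db)
        self.mmseqs_db = Path(self.mmseqs_db)
        # mmseqs_result stays None until search results are available.
        if self.mmseqs_result is not None:
            self.mmseqs_result = Path(self.mmseqs_result)

    @property
    def name(self) -> str:
        # Assumption: the name is taken from the FoldComp database file stem.
        return self.foldcomp_db.stem


# Plain strings are accepted and converted on construction.
db = DatabaseSketch(
    "afdb_swissprot",
    "afdb_swissprot.fasta.gz",
    "afdb_swissprot.mmseqsDB",
)
print(db.name, db.mmseqs_result)
```

Doing the conversion in __post_init__ means callers can pass either strings or Path objects, and downstream code can rely on Path methods without checking types.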
Description
The Database class handles structure database management for protein function prediction. It supports both FoldComp-compressed structure databases and standard PDB/mmCIF formats.
Key Features
- FoldComp Integration: Efficiently decompress and access protein structures
- MMseqs2 Compatibility: Works with MMseqs2 database search results
- Structure Caching: Optimizes repeated structure access
- Multi-format Support: Handles PDB, mmCIF, and FoldComp formats
Usage
The Database class is typically used internally by the pipeline to retrieve structure coordinates for alignment and contact map generation.
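Example
from mDeepFRI.database import Database
# Initialize the database; all three paths are required (placeholder file names)
db = Database(
    foldcomp_db="pdb100",
    sequence_db="pdb100.fasta.gz",
    mmseqs_db="pdb100.mmseqsDB",
)
# Access structure by ID
# (get_structure and get_chain_info are not listed in the class reference above)
structure = db.get_structure("1abc_A")
coords = structure.get_coordinates()
# Get structure information
chain_info = db.get_chain_info("1abc_A")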