Database#

class mDeepFRI.database.Database(foldcomp_db: Path, sequence_db: Path, mmseqs_db: Path, mmseqs_result: Path | None = None)#

Bases: object

Container for storing database file paths and metadata.

This dataclass maintains references to FoldComp structure databases and their corresponding MMseqs2 sequence databases. All paths are automatically converted to Path objects for consistency.
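
The path coercion described above can be sketched with a dataclass `__post_init__` hook. This is a minimal illustration and not the library's source: `PathBundle` is a hypothetical stand-in with the same four fields, and the file names are placeholders.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


@dataclass
class PathBundle:
    """Hypothetical stand-in for Database; not the mDeepFRI implementation."""

    foldcomp_db: PathLike
    sequence_db: PathLike
    mmseqs_db: PathLike
    mmseqs_result: Optional[PathLike] = None

    def __post_init__(self) -> None:
        # Coerce every required field to Path so str and Path inputs behave alike.
        self.foldcomp_db = Path(self.foldcomp_db)
        self.sequence_db = Path(self.sequence_db)
        self.mmseqs_db = Path(self.mmseqs_db)
        # The optional field stays None when no search result is available.
        if self.mmseqs_result is not None:
            self.mmseqs_result = Path(self.mmseqs_result)


# Plain strings are accepted and stored as Path objects.
bundle = PathBundle("afdb_swissprot", "afdb_swissprot.fasta.gz", "afdb_swissprot.mmseqsDB")
```

Coercing in `__post_init__` keeps the generated constructor intact while letting callers pass either strings or `Path` objects.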

foldcomp_db#

Path to FoldComp database containing compressed structures.

Type:

Path

sequence_db#

Path to extracted FASTA sequence database.

Type:

Path

mmseqs_db#

Path to MMseqs2 index database for fast searches.

Type:

Path

mmseqs_result#

Path to MMseqs2 search results (if available). Defaults to None.

Type:

Path, optional

name#

Database identifier derived from sequence_db stem.

Type:

str

Example

>>> from pathlib import Path
>>> from mDeepFRI.database import Database
>>> db = Database(
...     foldcomp_db=Path("afdb_swissprot"),
...     sequence_db=Path("afdb_swissprot.fasta.gz"),
...     mmseqs_db=Path("afdb_swissprot.mmseqsDB")
... )
>>> db.name
'afdb_swissprot'
foldcomp_db: Path#
mmseqs_db: Path#
mmseqs_result: Path | None = None#
sequence_db: Path#
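
Since `name` is described as derived from the `sequence_db` stem, it may help to recall how `pathlib` computes a stem: only the final suffix is removed. The snippet below shows standard-library behaviour only. How `Database` itself derives `name` is not shown in this reference, and `strip_all_suffixes` is a hypothetical helper introduced for illustration.

```python
from pathlib import Path

seq = Path("afdb_swissprot.fasta.gz")

# Path.stem drops only the last suffix (".gz" here).
single = seq.stem


def strip_all_suffixes(path: Path) -> str:
    """Hypothetical helper: remove every suffix from the file name."""
    all_suffixes = "".join(path.suffixes)
    return path.name[: len(path.name) - len(all_suffixes)]


bare = strip_all_suffixes(seq)
```

With a double extension such as `.fasta.gz`, `single` keeps the `.fasta` part while `bare` does not, which matters whenever an identifier is built from a compressed file name.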

Description#

The Database class handles structure database management for protein function prediction. It supports both FoldComp-compressed structure databases and standard PDB/mmCIF formats.

Key Features#

  • FoldComp Integration: Efficiently decompress and access protein structures

  • MMseqs2 Compatibility: Works with MMseqs2 database search results

  • Structure Caching: Optimizes repeated structure access

  • Multi-format Support: Handles PDB, mmCIF, and FoldComp formats
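
The caching idea in the list above can be illustrated with `functools.lru_cache` from the standard library. This is a sketch under assumptions: `load_structure` is a hypothetical loader that fakes the expensive decompression step, and it does not describe how the library itself caches structures.

```python
from functools import lru_cache

load_calls = 0  # counts how often the expensive path actually runs


@lru_cache(maxsize=128)
def load_structure(structure_id: str) -> tuple:
    """Hypothetical loader; a real one would decompress and parse a structure."""
    global load_calls
    load_calls += 1
    # Placeholder payload standing in for parsed coordinates.
    return (structure_id, ((0.0, 0.0, 0.0),))


first = load_structure("1abc_A")
second = load_structure("1abc_A")  # repeated ID is served from the cache
```

Bounding the cache with `maxsize` keeps memory use predictable when many distinct structures are requested.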

Usage#

The Database class is typically used internally by the pipeline to retrieve structure coordinates for alignment and contact map generation.
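
As a rough illustration of the contact map step mentioned above, the sketch below marks residue pairs whose coordinates lie within a distance cutoff. The function name `contact_map`, the toy coordinates, and the 6.0 Å cutoff are assumptions made for the example, not values taken from the pipeline.

```python
from math import dist
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(coords: Sequence[Coord], cutoff: float = 6.0) -> List[List[int]]:
    """Return a symmetric 0/1 matrix; 1 means the pair is within `cutoff` angstroms."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Three toy C-alpha positions spaced 3.8 angstroms apart along the x axis.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
cmap = contact_map(toy)
```

Neighbouring residues in the toy chain are in contact, while the first and third (7.6 Å apart) are not.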

Example#

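from pathlib import Path

from mDeepFRI.database import Database

# Initialize database with the three required paths
# (the pdb100 file names are illustrative)
db = Database(
    foldcomp_db=Path("pdb100"),
    sequence_db=Path("pdb100.fasta.gz"),
    mmseqs_db=Path("pdb100.mmseqsDB"),
)

# Access structure by ID
structure = db.get_structure("1abc_A")
coords = structure.get_coordinates()

# Get structure information
chain_info = db.get_chain_info("1abc_A")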