Commands (CLI)
The mDeepFRI command-line interface provides two main commands:
- get-models: Download pre-trained DeepFRI models (v1.0 or v1.1).
- predict-function: Run the prediction pipeline on protein sequences.
Key Features
Prediction Modes
Predictions cover the three Gene Ontology (GO) subontologies, plus Enzyme Commission (EC) numbers:
- Molecular Function (mf)
- Biological Process (bp)
- Cellular Component (cc)
- Enzyme Commission numbers (ec)
By default, predictions are made in all four categories. Use -p or --processing-modes
to restrict predictions to specific modes.
Hierarchical Database Search
Databases differ in the level of evidence behind their structures. For example, PDB structures
are experimentally determined and considered the highest quality. Pass -d or --db-path
multiple times to search several databases hierarchically.
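To make the idea of a hierarchical search concrete, here is a minimal Python sketch. It assumes (this page does not confirm it) that databases are tried in priority order, for example experimental PDB structures first, and that a query with an accepted hit is not searched against lower-priority databases. The search functions are toy dictionary lookups standing in for MMseqs2, and every name is hypothetical; this is not mDeepFRI's code.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical: a search takes query IDs and returns {query_id: hit_id}
# for the queries that found an acceptable hit in that database.
SearchFn = Callable[[Set[str]], Dict[str, str]]


def hierarchical_search(
    queries: Set[str], databases: List[Tuple[str, SearchFn]]
) -> Dict[str, Tuple[str, str]]:
    """Assign each query the hit from the first database (in priority order) that has one."""
    assigned: Dict[str, Tuple[str, str]] = {}
    remaining = set(queries)
    for db_name, search in databases:
        if not remaining:
            break  # every query already has a higher-priority hit
        hits = search(set(remaining))
        for query_id, hit_id in hits.items():
            if query_id in remaining:
                assigned[query_id] = (db_name, hit_id)
        # Assumption: queries resolved here are not searched in later databases.
        remaining.difference_update(hits)
    return assigned


def make_toy_search(table: Dict[str, str]) -> SearchFn:
    """Toy stand-in for an MMseqs2 search: look queries up in a dict."""
    return lambda ids: {q: table[q] for q in ids if q in table}


result = hierarchical_search(
    {"q1", "q2", "q3"},
    [
        ("pdb", make_toy_search({"q1": "pdb_hit_1"})),
        ("predicted", make_toy_search({"q1": "pred_hit_1", "q2": "pred_hit_2"})),
    ],
)
# q1 keeps its PDB hit, q2 falls through to the second database, q3 has no hit.
print(result)
```

The design choice illustrated is that a higher-evidence hit is never overwritten by a lower-evidence one.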
Performance Options
- --skip-matrix: Skip writing large prediction matrix files to save disk space.
- --threads: Parallelize alignment, contact map alignment, and annotation.
- GPU acceleration is used automatically if CUDA is available.
Command Reference
mDeepFRI
Usage
mDeepFRI [OPTIONS] COMMAND [ARGS]...
Options
- --debug, --no-debug
- --version
Show the version and exit.
generate-config
Generate a config file for mDeepFRI. This command is only needed when the model weights were downloaded manually.
Usage
mDeepFRI generate-config [OPTIONS]
Options
- -w, --weights_path <weights_path>
Required. Path to a folder containing model weights.
- -v, --version <version>
Required. Version of the model.
- Options:
1.0 | 1.1
get-models
Download model weights for mDeepFRI.
Usage
mDeepFRI get-models [OPTIONS]
Options
- -o, --output <output>
Required. Path to the folder where the model weights will be downloaded.
- -v, --version <version>
Required. Version of the model.
- Options:
1.0 | 1.1
make-cmaps
Compute CA contact maps for all PDB/mmCIF files in a directory.
Usage
mDeepFRI make-cmaps [OPTIONS]
Options
- -i, --input_dir <input_dir>
Required. Directory containing PDB or mmCIF files.
- -o, --output_dir <output_dir>
Required. Directory to save computed contact maps.
- -t, --threshold <threshold>
Distance threshold in Å for contact map.
- Default:
6.0
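For illustration, a CA contact map can be computed from an (L, 3) array of CA coordinates with NumPy. This sketch assumes a residue pair counts as a contact when the CA-CA distance is at most the threshold (6.0 Å mirrors the make-cmaps default) and that self-pairs on the diagonal count as contacts. PDB/mmCIF parsing is left out; it is not the make-cmaps implementation.

```python
import numpy as np


def ca_contact_map(ca_coords: np.ndarray, threshold: float = 6.0) -> np.ndarray:
    """Return an (L, L) boolean contact matrix from an (L, 3) array of CA coordinates.

    Assumption: contact means distance <= threshold (in Å), diagonal included.
    """
    coords = np.asarray(ca_coords, dtype=float)
    if coords.ndim != 2 or coords.shape[1] != 3:
        raise ValueError("expected an (L, 3) array of CA coordinates")
    # Pairwise difference vectors via broadcasting: shape (L, L, 3).
    diff = coords[:, None, :] - coords[None, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=-1))
    return distances <= threshold


# Toy example: three residues spaced 3.8 Å apart along the x axis.
coords = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])
print(ca_contact_map(coords).astype(int))
```

The first and third residues are 7.6 Å apart, so they fall outside the default threshold while neighbouring residues are in contact.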
predict-function
Predict protein function from sequence.
Usage
mDeepFRI predict-function [OPTIONS]
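When scripting the pipeline, it can help to build the command lines programmatically. The Python sketch below uses only flags documented on this page; the helper names are hypothetical, the paths are placeholders, and it assumes the folder given to get-models -o is the one later passed to predict-function -w. Repeating -p once per mode is an assumption (repeating -d is described under Hierarchical Database Search).

```python
import shlex
from typing import List, Sequence

VALID_MODES = {"bp", "cc", "ec", "mf"}


def get_models_cmd(output_dir: str, version: str) -> List[str]:
    """Build a hypothetical `mDeepFRI get-models` argv (version "1.0" or "1.1")."""
    if version not in {"1.0", "1.1"}:
        raise ValueError(f"unsupported model version: {version}")
    return ["mDeepFRI", "get-models", "-o", output_dir, "-v", version]


def predict_function_cmd(
    fasta: str,
    output: str,
    weights: str,
    db_paths: Sequence[str] = (),
    modes: Sequence[str] = (),
    threads: int = 1,
) -> List[str]:
    """Build a hypothetical `mDeepFRI predict-function` argv from documented flags."""
    cmd = ["mDeepFRI", "predict-function",
           "-i", fasta, "-o", output, "-w", weights, "-t", str(threads)]
    for db in db_paths:
        # One -d per FoldComp database, searched hierarchically.
        cmd += ["-d", db]
    for mode in modes:
        # Assumption: -p is repeated once per mode; leave modes empty for all four.
        if mode not in VALID_MODES:
            raise ValueError(f"unknown processing mode: {mode}")
        cmd += ["-p", mode]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; pass each list to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(get_models_cmd("weights/", "1.1")))
    print(shlex.join(predict_function_cmd(
        "proteins.fasta.gz", "results/", "weights/", modes=["mf", "ec"], threads=8)))
```

Building a list rather than a single string avoids shell-quoting problems with unusual file names.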
Options
- --tmpdir <tmpdir>
Path to a temporary directory. Required for very large searches.
- --skip-pdb
Skip PDB100 database search.
- -t, --threads <threads>
Number of threads to use.
- Default:
1
- --overwrite
Overwrite existing files.
- --top-k <top_k>
Number of top MMseqs2 hits to save.
- Default:
5
- --mmseqs-min-coverage <mmseqs_min_coverage>
Minimum coverage for MMseqs2 alignment for both query and target sequences.
- Default:
0.9
- --mmseqs-min-identity <mmseqs_min_identity>
Minimum identity for MMseqs2 alignment.
- Default:
0.5
- --mmseqs-max-evalue <mmseqs_max_evalue>
Maximum e-value for MMseqs2 alignment.
- Default:
0.001
- --mmseqs-min-bitscore <mmseqs_min_bitscore>
Minimum bitscore for MMseqs2 alignment.
- Default:
0
- --max-length <max_length>
Maximum length of the protein sequence.
- --min-length <min_length>
Minimum length of the protein sequence.
- -s, --mmseqs-sensitivity <mmseqs_sensitivity>
Sensitivity of the MMseqs2 search.
- Default:
5.7
- -d, --db-path <db_path>
Path to a structures database compressed with FoldComp.
- -o, --output <output>
Required. Path to output file or directory.
- -i, --input <input>
Required. Path to an input file of protein sequences (FASTA, may be gzipped).
- -w, --weights <weights>
Required. Path to a folder containing model weights.
- -p, --processing-modes <processing_modes>
Processing modes. Default is all (biological process, cellular component, enzyme commission, molecular function).
- Options:
bp | cc | ec | mf
- -a, --angstrom-contact-thresh <angstrom_contact_thresh>
Angstrom contact threshold. Default is 6.
- --generate-contacts <generate_contacts>
Gap fill threshold during contact map alignment.
- --alignment-gap-open <alignment_gap_open>
Gap open penalty for contact map alignment.
- --alignment-gap-extend <alignment_gap_extend>
Gap extend penalty for contact map alignment.
- --remove-intermediate
Remove intermediate files.
- --save-structures
Save structures of the top hits.
- --save-cmaps
Save contact maps of the top hits.
- --skip-matrix
Skip writing prediction matrix files (saves disk space).
- --scoring-matrix <scoring_matrix>
Scoring matrix for sequence alignment (e.g., VTML80, BLOSUM62).
- Default:
'VTML80'
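Several options above concern contact map alignment, where a template's contact map is carried over to the query. As a rough illustration of that general idea only, the Python sketch below maps contacts through a pairwise alignment given as two equal-length gapped strings. It is a hypothetical simplification, not mDeepFRI's implementation: it ignores the gap-fill (--generate-contacts) and gap-penalty options, and unaligned query residues simply get no contacts.

```python
import numpy as np


def transfer_contacts(query_aln: str, target_aln: str, target_cmap: np.ndarray) -> np.ndarray:
    """Map contacts of aligned target residues onto query residue indices.

    query_aln and target_aln are equal-length aligned strings with '-' for gaps.
    Hypothetical simplification: unaligned query residues keep all-zero rows/columns.
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    q_len = len(query_aln.replace("-", ""))
    t_len = len(target_aln.replace("-", ""))
    if target_cmap.shape != (t_len, t_len):
        raise ValueError("target contact map does not match target length")
    # Collect (query_index, target_index) for columns where both have a residue.
    q_idx, t_idx = [], []
    qi = ti = 0
    for q_char, t_char in zip(query_aln, target_aln):
        if q_char != "-" and t_char != "-":
            q_idx.append(qi)
            t_idx.append(ti)
        if q_char != "-":
            qi += 1
        if t_char != "-":
            ti += 1
    q_arr = np.array(q_idx, dtype=int)
    t_arr = np.array(t_idx, dtype=int)
    query_cmap = np.zeros((q_len, q_len), dtype=target_cmap.dtype)
    # Copy the sub-matrix of aligned target positions into the query frame.
    query_cmap[np.ix_(q_arr, q_arr)] = target_cmap[np.ix_(t_arr, t_arr)]
    return query_cmap


# Query residue C (index 1) is aligned to a gap, so its row and column stay zero.
print(transfer_contacts("ACD-", "A-DE", np.ones((3, 3), dtype=int)))
```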
search-databases
Hierarchically search FoldComp databases for similar proteins with MMseqs2. Based on the thresholds from https://doi.org/10.1038/s41586-023-06510-w.
Usage
mDeepFRI search-databases [OPTIONS]
Options
- --tmpdir <tmpdir>
Path to a temporary directory. Required for very large searches.
- --skip-pdb
Skip PDB100 database search.
- -t, --threads <threads>
Number of threads to use.
- Default:
1
- --overwrite
Overwrite existing files.
- --top-k <top_k>
Number of top MMseqs2 hits to save.
- Default:
5
- --mmseqs-min-coverage <mmseqs_min_coverage>
Minimum coverage for MMseqs2 alignment for both query and target sequences.
- Default:
0.9
- --mmseqs-min-identity <mmseqs_min_identity>
Minimum identity for MMseqs2 alignment.
- Default:
0.5
- --mmseqs-max-evalue <mmseqs_max_evalue>
Maximum e-value for MMseqs2 alignment.
- Default:
0.001
- --mmseqs-min-bitscore <mmseqs_min_bitscore>
Minimum bitscore for MMseqs2 alignment.
- Default:
0
- --max-length <max_length>
Maximum length of the protein sequence.
- --min-length <min_length>
Minimum length of the protein sequence.
- -s, --mmseqs-sensitivity <mmseqs_sensitivity>
Sensitivity of the MMseqs2 search.
- Default:
5.7
- -d, --db-path <db_path>
Path to a structures database compressed with FoldComp.
- -o, --output <output>
Required. Path to output file or directory.
- -i, --input <input>
Required. Path to an input file of protein sequences (FASTA, may be gzipped).
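The MMseqs2 options shared by search-databases and predict-function act as filters on alignment hits. The Python sketch that follows applies them with the documented defaults. It assumes identity and coverage are fractions in [0, 1], that every bound is inclusive, and that the top k hits per query are chosen by bitscore. The Hit record and its field names are hypothetical; this is not mDeepFRI's code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Hit:
    """One alignment row; field names are hypothetical."""
    query: str
    target: str
    identity: float    # fraction in [0, 1]
    query_cov: float   # fraction of the query covered
    target_cov: float  # fraction of the target covered
    evalue: float
    bitscore: float


def filter_hits(
    hits: Iterable[Hit],
    min_coverage: float = 0.9,
    min_identity: float = 0.5,
    max_evalue: float = 0.001,
    min_bitscore: float = 0.0,
    top_k: int = 5,
) -> Dict[str, List[Hit]]:
    """Keep hits passing every threshold, then the top_k per query by bitscore."""
    kept: Dict[str, List[Hit]] = {}
    for hit in hits:
        passes = (
            hit.query_cov >= min_coverage
            and hit.target_cov >= min_coverage  # coverage applies to both sides
            and hit.identity >= min_identity
            and hit.evalue <= max_evalue
            and hit.bitscore >= min_bitscore
        )
        if passes:
            kept.setdefault(hit.query, []).append(hit)
    return {
        query: sorted(group, key=lambda h: h.bitscore, reverse=True)[:top_k]
        for query, group in kept.items()
    }


hits = [
    Hit("q1", "t1", 0.80, 0.95, 0.92, 1e-10, 200.0),
    Hit("q1", "t2", 0.60, 0.95, 0.85, 1e-8, 150.0),   # target coverage too low
    Hit("q1", "t3", 0.55, 0.91, 0.93, 1e-5, 180.0),
    Hit("q2", "t4", 0.40, 0.99, 0.99, 1e-20, 300.0),  # identity too low
]
print({q: [h.target for h in hs] for q, hs in filter_hits(hits).items()})
# {'q1': ['t1', 't3']}
```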