# camlhmp-blast-thresholds
camlhmp-blast-thresholds is a command for determining the percent identity and coverage
thresholds at which BLAST+ hits stay specific to a single reference sequence. It starts at
100 percent identity and 100 percent coverage and works its way down (by default to 70
percent; see `--min-pident` and `--min-coverage`) until a reference sequence can no longer
be distinguished from the other reference sequences.
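The search described above can be sketched in Python. This is a minimal illustration under stated assumptions, not camlhmp's implementation. `Hit`, `Failure`, and `find_failure` are hypothetical names. Each reference is assumed to have already been searched against the full reference set, with its self-hit included. The loop order (percent identity outer, coverage inner) is also an assumption.

```python
from typing import Iterable, List, NamedTuple, Optional


class Hit(NamedTuple):
    """One BLAST+ hit of a query reference against the reference set (hypothetical shape)."""
    subject: str
    pident: float    # percent identity, 0-100
    coverage: float  # percent query coverage, 0-100


class Failure(NamedTuple):
    pident: int
    coverage: int
    hits: List[str]


def find_failure(
    hits: Iterable[Hit],
    min_pident: int = 70,
    min_coverage: int = 70,
    increment: int = 1,
) -> Optional[Failure]:
    """Step both thresholds down from 100 and return the first pair at which
    more than one reference passes, or None if the query stays specific."""
    if increment < 1:
        raise ValueError("increment must be a positive integer")
    hits = list(hits)  # expected to include the query's hit against itself
    # Assumed order: percent identity in the outer loop, coverage in the inner loop.
    for pident in range(100, min_pident - 1, -increment):
        for coverage in range(100, min_coverage - 1, -increment):
            passing = sorted(
                {h.subject for h in hits if h.pident >= pident and h.coverage >= coverage}
            )
            if len(passing) > 1:
                return Failure(pident, coverage, passing)
    return None
```

Consider a hypothetical query `refA` with a self-hit plus a second hit `refB` at 75.5% identity and 95% coverage. The sketch returns `Failure(pident=75, coverage=95, hits=['refA', 'refB'])`, which has the same shape as the "Detected failure" lines in the example output.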
## Usage
```
 Usage: camlhmp-blast-thresholds [OPTIONS]

 🐪 camlhmp-blast-thresholds 🐪 - Determine the specificity thresholds for a set of
 reference sequences

╭─ Options ────────────────────────────────────────────────────────────────────────────╮
│ *  --input         -i  TEXT                           Input file in FASTA format of  │
│                                                       reference sequences            │
│                                                       [required]                     │
│ *  --blast         -b  [blastn|blastp|blastx|tblastn  The blast algorithm to use     │
│                        |tblastx]                      [required]                     │
│    --outdir        -o  PATH                           Directory to write output      │
│                                                       [default:                      │
│                                                       ./camlhmp-blast-thresholds]    │
│    --prefix        -p  TEXT                           Prefix to use for output files │
│                                                       [default: camlhmp]             │
│    --min-pident        INTEGER                        Minimum percent identity to    │
│                                                       test                           │
│                                                       [default: 70]                  │
│    --min-coverage      INTEGER                        Minimum percent coverage to    │
│                                                       test                           │
│                                                       [default: 70]                  │
│    --increment         INTEGER                        The value to increment the     │
│                                                       thresholds by                  │
│                                                       [default: 1]                   │
│    --force                                            Overwrite existing reports     │
│    --verbose                                          Increase the verbosity of      │
│                                                       output                         │
│    --silent                                           Only critical errors will be   │
│                                                       printed                        │
│    --version                                          Print schema and camlhmp       │
│                                                       version                        │
│    --help                                             Show this message and exit.    │
╰──────────────────────────────────────────────────────────────────────────────────────╯
```
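If you run camlhmp-blast-thresholds from a Python pipeline, the options above map directly onto an argument list. `build_command` and `run_thresholds` are hypothetical helpers, not part of camlhmp. They use only the flags shown in the usage text and leave any unset option to the CLI's own default.

```python
import shlex
import subprocess
from typing import List, Optional

BLAST_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def build_command(
    input_fasta: str,
    blast: str,
    outdir: Optional[str] = None,
    prefix: Optional[str] = None,
    min_pident: Optional[int] = None,
    min_coverage: Optional[int] = None,
    increment: Optional[int] = None,
    force: bool = False,
) -> List[str]:
    """Assemble an argv list from the options documented in the usage text."""
    if blast not in BLAST_PROGRAMS:
        raise ValueError(f"--blast must be one of {sorted(BLAST_PROGRAMS)}, got {blast!r}")
    cmd = ["camlhmp-blast-thresholds", "--input", input_fasta, "--blast", blast]
    # Pass optional flags only when set, so the CLI's own defaults apply otherwise.
    optional = [
        ("--outdir", outdir),
        ("--prefix", prefix),
        ("--min-pident", min_pident),
        ("--min-coverage", min_coverage),
        ("--increment", increment),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    if force:
        cmd.append("--force")
    return cmd


def run_thresholds(**kwargs) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit."""
    cmd = build_command(**kwargs)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `build_command("refs.fasta", "blastn", min_pident=80, increment=5)` returns the argument list for `camlhmp-blast-thresholds --input refs.fasta --blast blastn --min-pident 80 --increment 5`. Here `refs.fasta` is a placeholder.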
## Example Usage
To run camlhmp-blast-thresholds, you need a FASTA file of your reference sequences.
Below is an example that uses the test data from the camlhmp repository, followed by its output:
```bash
camlhmp-blast-thresholds \
    --input tests/data/blast/targets/sccmec-partial.fasta \
    --blast blastn
```
```
Running camlhmp-blast-thresholds with following parameters:
    --input tests/data/blast/targets/sccmec-partial.fasta
    --blast blastn
    --outdir ./camlhmp-blast-thresholds
    --prefix camlhmp
    --min-pident 70
    --min-coverage 70
Gathering seqeuences from tests/data/blast/targets/sccmec-partial.fasta...
Writing reference seqeuences to ./camlhmp-blast-thresholds/reference_seqs...
Detecting failure for ccrA1
Detected failure for ccrA1 with pident=75 and coverage=100 - ['ccrA1', 'ccrA2']
Detecting failure for ccrA2
Detected failure for ccrA2 with pident=75 and coverage=100 - ['ccrA1', 'ccrA2']
Detecting failure for ccrA3
Detected failure for ccrA3 with pident=75 and coverage=95 - ['ccrA1', 'ccrA3']
Detecting failure for ccrB1
Detecting failure for ccrB2
Detecting failure for ccrB3
Detecting failure for IS1272
Detecting failure for mecI
Detecting failure for mecR1
Detecting failure for mecA
Detecting failure for IS431
Writing results to ./camlhmp-blast-thresholds/camlhmp.tsv...
Final Results...
                                                  Thresholds Detection
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ reference ┃ pident_failure ┃ coverage_failure ┃ hits        ┃ comment                                                ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ IS1272    │ -              │ -                │ -           │ no detection failures for pident>=70 and coverage>=70  │
│ IS431     │ -              │ -                │ -           │ no detection failures for pident>=70 and coverage>=70  │
│ ccrA1     │ 75             │ 100              │ ccrA1,ccrA2 │ Suspected overlap or containment with another target:  │
│ ccrA2     │ 75             │ 100              │ ccrA1,ccrA2 │ Suspected overlap or containment with another target:  │
│ ccrA3     │ 75             │ 95               │ ccrA1,ccrA3 │                                                        │
│ ccrB1     │ -              │ -                │ -           │ no detection failures for pident>=70 and coverage>=70  │
│ ccrB2     │ -              │ -                │ -           │ no detection failures for pident>=70 and coverage>=70  │
│ ccrB3     │ -              │ -                │ -           │ no detection failures for pident>=70 and coverage>=70  │
│ mecA      │ -              │ -                │ -           │ no detection failures for pident>=70 and coverage>=70  │
│ mecI      │ -              │ -                │ -           │ no detection failures for pident>=70 and coverage>=70  │
│ mecR1     │ -              │ -                │ -           │ no detection failures for pident>=70 and coverage>=70  │
└───────────┴────────────────┴──────────────────┴─────────────┴────────────────────────────────────────────────────────┘
Suggested thresholds for specificity: pident>75 and coverage>95
**NOTE** these are suggestions for a starting point
```
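The suggestion line uses strict inequalities (`pident>75 and coverage>95`). You may want to apply it to hits from your own BLAST+ searches. The sketch below parses the two numbers out of that line and keeps only hits that exceed both. The `Hit` tuple and both function names are hypothetical. The regular expression assumes the line keeps the format shown in the output above.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """A BLAST+ hit from your own search (hypothetical shape)."""
    subject: str
    pident: float
    coverage: float


SUGGESTION = re.compile(r"pident>(\d+(?:\.\d+)?) and coverage>(\d+(?:\.\d+)?)")


def parse_suggestion(line: str) -> Tuple[float, float]:
    """Extract (pident, coverage) from the 'Suggested thresholds' line."""
    match = SUGGESTION.search(line)
    if match is None:
        raise ValueError(f"no threshold suggestion found in: {line!r}")
    return float(match.group(1)), float(match.group(2))


def passing_hits(hits: Iterable[Hit], min_pident: float, min_coverage: float) -> List[Hit]:
    # Strict ">" comparisons, matching the wording of the suggestion line.
    return [h for h in hits if h.pident > min_pident and h.coverage > min_coverage]
```

Because the comparisons are strict, a hit at exactly 75% identity or exactly 95% coverage is rejected.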
> **Note**
> The final results suggest a starting point for the specificity thresholds; you should still validate these thresholds against your own data. You may also notice that shorter sequences tend to be more sensitive to the percent identity threshold, while longer sequences tend to be more sensitive to the coverage threshold.
## Output Files
By default, camlhmp-blast-thresholds writes its output to a directory named
`camlhmp-blast-thresholds` (use `--outdir` to choose another location). This directory
contains the individual reference sequences (in `reference_seqs`) and a final table of
suggested cutoffs.
| File Name | Description |
|---|---|
| {PREFIX}.tsv | A tab-delimited file listing the threshold failures (if any were observed) for each reference sequence |
### {PREFIX}.tsv
The {PREFIX}.tsv file (`camlhmp.tsv` with the default `--prefix`) is a tab-delimited file
listing the threshold failures, if any were observed, for each reference sequence. Its columns are:
| Column | Description |
|---|---|
| reference | The reference sequence name |
| pident_failure | The percent identity at which the reference sequence could no longer be distinguished from the other reference sequences (`-` if no failure was observed) |
| coverage_failure | The percent coverage at which the reference sequence could no longer be distinguished from the other reference sequences (`-` if no failure was observed) |
| hits | The reference sequences, including the reference itself, that had a hit at the failure thresholds |
| comment | A short comment about the result |
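To use the report programmatically, you could write a minimal reader with Python's `csv` module, as sketched below. It assumes the columns listed above, integer failure values as in the example that follows, and `-` meaning "no failure". `ThresholdRow` and `read_thresholds` are hypothetical names.

```python
import csv
from pathlib import Path
from typing import List, NamedTuple, Optional


class ThresholdRow(NamedTuple):
    reference: str
    pident_failure: Optional[int]    # None when no failure was observed ("-")
    coverage_failure: Optional[int]  # None when no failure was observed ("-")
    hits: List[str]
    comment: str


def _field(record: dict, name: str) -> str:
    # csv.DictReader fills missing trailing columns with None.
    return (record.get(name) or "").strip()


def _maybe_int(value: str) -> Optional[int]:
    return None if value in ("", "-") else int(value)


def read_thresholds(
    outdir: str = "./camlhmp-blast-thresholds", prefix: str = "camlhmp"
) -> List[ThresholdRow]:
    """Load {PREFIX}.tsv from the output directory (defaults mirror the CLI defaults)."""
    path = Path(outdir) / f"{prefix}.tsv"
    rows = []
    with path.open(newline="") as handle:
        for record in csv.DictReader(handle, delimiter="\t"):
            hits = _field(record, "hits")
            rows.append(
                ThresholdRow(
                    reference=_field(record, "reference"),
                    pident_failure=_maybe_int(_field(record, "pident_failure")),
                    coverage_failure=_maybe_int(_field(record, "coverage_failure")),
                    hits=[] if hits in ("", "-") else hits.split(","),
                    comment=_field(record, "comment"),
                )
            )
    return rows
```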
Below is an example of the {PREFIX}.tsv file:
```
reference   pident_failure  coverage_failure    hits    comment
IS1272  -   -   -   no detection failures for pident>=70 and coverage>=70
IS431   -   -   -   no detection failures for pident>=70 and coverage>=70
ccrA1   75  100 ccrA1,ccrA2 Suspected overlap or containment with another target:
ccrA2   75  100 ccrA1,ccrA2 Suspected overlap or containment with another target:
ccrA3   75  95  ccrA1,ccrA3
ccrB1   -   -   -   no detection failures for pident>=70 and coverage>=70
ccrB2   -   -   -   no detection failures for pident>=70 and coverage>=70
ccrB3   -   -   -   no detection failures for pident>=70 and coverage>=70
mecA    -   -   -   no detection failures for pident>=70 and coverage>=70
mecI    -   -   -   no detection failures for pident>=70 and coverage>=70
mecR1   -   -   -   no detection failures for pident>=70 and coverage>=70
```
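The `hits` column lists every reference that passed at the failure thresholds, including the reference itself. You can therefore use it to see which references are confused with one another. The hypothetical helper `confused_pairs` below collects each unordered pair. For the example report above it returns `[('ccrA1', 'ccrA2'), ('ccrA1', 'ccrA3')]`, with ccrA1 appearing in both pairs.

```python
import csv
import io
from typing import List, Set, Tuple


def confused_pairs(tsv_text: str) -> List[Tuple[str, str]]:
    """Return each unordered pair of references that hit one another at a failure threshold."""
    pairs: Set[Tuple[str, str]] = set()
    for record in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        reference = (record.get("reference") or "").strip()
        hits = (record.get("hits") or "").strip()
        if hits in ("", "-"):
            continue  # no detection failure for this reference
        for other in hits.split(","):
            other = other.strip()
            if other and other != reference:
                a, b = sorted((reference, other))
                pairs.add((a, b))
    return sorted(pairs)
```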