camlhmp-blast-targets¶
camlhmp-blast-targets is a command that allows users to type their samples using a provided
schema with BLAST algorithms. This command is useful when a schema is looking at full length
genes or proteins.
Usage¶
 Usage: camlhmp-blast-targets [OPTIONS]
 ๐ช camlhmp-blast-targets ๐ช - Classify assemblies using BLAST against individual
 genes or proteins
โญโ Options โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฎ
โ *  --input         -i  TEXT     Input file in FASTA format to classify [required]   โ
โ *  --yaml          -y  TEXT     YAML file documenting the targets and types         โ
โ                                 [required]                                          โ
โ *  --targets       -t  TEXT     Query targets in FASTA format [required]            โ
โ    --outdir        -o  PATH     Directory to write output [default: ./]             โ
โ    --prefix        -p  TEXT     Prefix to use for output files [default: camlhmp]   โ
โ    --min-pident        INTEGER  Minimum percent identity to count a hit             โ
โ                                 [default: 95]                                       โ
โ    --min-coverage      INTEGER  Minimum percent coverage to count a hit             โ
โ                                 [default: 95]                                       โ
โ    --force                      Overwrite existing reports                          โ
โ    --verbose                    Increase the verbosity of output                    โ
โ    --silent                     Only critical errors will be printed                โ
โ    --version                    Print schema and camlhmp version                    โ
โ    --help                       Show this message and exit.                         โ
โฐโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฏ
Example Usage¶
To run camlhmp-blast-targets, you will need a FASTA file of your input sequences, a YAML
file with the schema, and a FASTA file with the targets. Below is an example of how to run
camlhmp-blast-targets using available test data.
camlhmp-blast-targets \
    --yaml tests/data/blast/targets/sccmec-partial.yaml \
    --targets tests/data/blast/targets/sccmec-partial.fasta \
    --input tests/data/blast/targets/sccmec-i.fasta
Running camlhmp with following parameters:
    --input tests/data/blast/targets/sccmec-i.fasta
    --yaml tests/data/blast/targets/sccmec-partial.yaml
    --targets tests/data/blast/targets/sccmec-partial.fasta
    --outdir ./
    --prefix camlhmp
    --min-pident 95
    --min-coverage 95
Starting camlhmp for SCCmec Typing...
Running blastn...
Processing hits...
Final Results...
                                     SCCmec Typing
โโโโโโโโโโโณโโโโโโโณโโโโโโโโโโโโณโโโโโโโโโโโโณโโโโโโโโโโโโณโโโโโโโโโโโณโโโโโโโโโโโโณโโโโโโโโโโ
โ sample  โ type โ targets   โ schema    โ schema_vโฆ โ camlhmpโฆ โ params    โ comment โ
โกโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฉ
โ camlhmp โ I    โ ccrA1,ccโฆ โ sccmec_pโฆ โ 0.0.1     โ 0.3.1    โ min-coveโฆ โ         โ
โโโโโโโโโโโดโโโโโโโดโโโโโโโโโโโโดโโโโโโโโโโโโดโโโโโโโโโโโโดโโโโโโโโโโโดโโโโโโโโโโโโดโโโโโโโโโโ
Writing outputs...
Final predicted type written to ./camlhmp.tsv
Results against each type written to ./camlhmp.details.tsv
blastn results written to ./camlhmp.blastn.tsv
Note
The table printed to STDOUT by camlhmp-blast-targets has been purposefully truncated
for viewing on the docs. It is the same information that that is in {PREFIX}.tsv.
Output Files¶
camlhmp-blast-targets will generate three output files:
| File Name | Description | 
|---|---|
| {PREFIX}.tsv | A tab-delimited file with the predicted type | 
| {PREFIX}.blast.tsv | A tab-delimited file of all blast hits | 
| {PREFIX}.details.tsv | A tab-delimited file with details for each type | 
{PREFIX}.tsv¶
The {PREFIX}.tsv file is a tab-delimited file with the predicted type. The columns are:
| Column | Description | 
|---|---|
| sample | The sample name as determined by --prefix | 
| type | The predicted type | 
| targets | The targets for the given type that had a hit | 
| schema | The schema used to determine the type | 
| schema_version | The version of the schema used | 
| camlhmp_version | The version of camlhmp used | 
| params | The parameters used for the analysis | 
| comment | A small comment about the result | 
Below is an example of the {PREFIX}.tsv file:
sample  type    targets schema  schema_version  camlhmp_version params  comment
camlhmp I   ccrA1,ccrB1,IS431,IS1272,mecA,mecR1 sccmec_partial  0.0.1   0.2.1   min-coverage=95;min-pident=95   
{PREFIX}.blast.tsv¶
The {PREFIX}.blast.tsv file is a tab-delimited file of the raw output for all blast hits.
The columns are the standard BLAST output with -outfmt 6.
Here is an example of the {PREFIX}.blast.tsv file:
qseqid  sseqid  pident  qcovs   qlen    slen    length  nident  mismatch    gapopen qstart  qend    sstart  send    evalue  bitscore
ccrA1   AB033763.2  100.000 100 1350    39332   1350    1350    0   0   1   1350    23692   25041   0.0 2494
ccrB1   AB033763.2  100.000 100 1152    39332   1152    1152    0   0   1   1152    25063   26214   0.0 2128
IS1272  AB033763.2  100.000 100 1659    39332   1659    1659    0   0   1   1659    28423   30081   0.0 3064
mecR1   AB033763.2  100.000 100 987 39332   987 987 0   0   1   987 30304   31290   0.0 1823
mecA    AB033763.2  99.950  100 2007    39332   2007    2006    1   0   1   2007    31390   33396   0.0 3701
mecA    AB033763.2  99.950  100 2007    39332   2007    2006    1   0   1   2007    31390   33396   0.0 3701
IS431   AB033763.2  99.873  100 790 39332   790 789 1   0   1   790 35958   36747   0.0 1454
IS431   AB033763.2  100.000 100 792 39332   792 792 0   0   1   792 35957   36748   0.0 1463
{PREFIX}.details.tsv¶
The {PREFIX}.details.tsv file is a tab-delimited file with details for each type. This file
can be useful for seeing how a sample did against all other types in a schema.
The columns in this file are:
| Column | Description | 
|---|---|
| sample | The sample name as determined by --prefix | 
| type | The predicted type | 
| status | The status of the type ( Trueif passed thresholds,Falseif failed to exceed thresholds) | 
| targets | The targets for the given type that had a match | 
| missing | The targets for the given type that were not found | 
| schema | The schema used to determine the type | 
| schema_version | The version of the schema used | 
| camlhmp_version | The version of camlhmp used | 
| params | The parameters used for the analysis | 
| comment | A small comment about the result | 
Below is an example of the {PREFIX}.details.tsv file:
sample  type    status  targets missing schema  schema_version  camlhmp_version params  comment
camlhmp I   True    ccrA1,ccrB1,IS431,mecA,mecR1,IS1272     sccmec_partial  0.0.1   0.2.1   min-coverage=95;min-pident=95   
camlhmp II  False   IS431,mecA,mecR1    ccrA2,ccrB2,mecI    sccmec_partial  0.0.1   0.2.1   min-coverage=95;min-pident=95   
camlhmp III False   IS431,mecA,mecR1    ccrA3,ccrB3,mecI    sccmec_partial  0.0.1   0.2.1   min-coverage=95;min-pident=95   
camlhmp IV  False   IS431,mecA,mecR1,IS1272 ccrA2,ccrB2 sccmec_partial  0.0.1   0.2.1   min-coverage=95;min-pident=95   
Example Implementation¶
If you would like to see how camlhmp-blast-targets can be used, please see
sccmec. In sccmec the schema is set up
to directly use camlhmp-blast-targets to classify samples without any extra
logic.
This allows for a simple wrapper like the following:
#!/usr/bin/env bash
sccmec_dir=$(dirname $0)
CAML_YAML="${sccmec_dir}/../data/sccmec.yaml" \
CAML_TARGETS="${sccmec_dir}/../data/sccmec.fasta" \
    camlhmp-blast-targets \
    "${@:1}"
This script will run camlhmp-blast-targets with the sccmec schema and targets.