camlhmp-blast-regions¶
camlhmp-blast-regions is a command that allows users to search for full regions of interest.
It is nearly identical to camlhmp-blast-targets, but instead of many smaller targets the
idea is to instead look at full regions such as O-antigens and or similar features.
Usage¶
 Usage: camlhmp-blast-regions [OPTIONS]
 ๐ช camlhmp-blast-regions ๐ช - Classify assemblies using BLAST against larger genomic
 regions
โญโ Options โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฎ
โ *  --input         -i  TEXT     Input file in FASTA format to classify [required]   โ
โ *  --yaml          -y  TEXT     YAML file documenting the targets and types         โ
โ                                 [required]                                          โ
โ *  --targets       -t  TEXT     Query targets in FASTA format [required]            โ
โ    --outdir        -o  PATH     Directory to write output [default: ./]             โ
โ    --prefix        -p  TEXT     Prefix to use for output files [default: camlhmp]   โ
โ    --min-pident        INTEGER  Minimum percent identity to count a hit             โ
โ                                 [default: 95]                                       โ
โ    --min-coverage      INTEGER  Minimum percent coverage to count a hit             โ
โ                                 [default: 95]                                       โ
โ    --force                      Overwrite existing reports                          โ
โ    --verbose                    Increase the verbosity of output                    โ
โ    --silent                     Only critical errors will be printed                โ
โ    --version                    Print schema and camlhmp version                    โ
โ    --help                       Show this message and exit.                         โ
โฐโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฏ
Example Usage¶
To run camlhmp-blast-regions, you will need a FASTA file of your input sequences, a YAML
file with the schema, and a FASTA file with the targets. Below is an example of how to run
camlhmp-blast-regions using available test data.
camlhmp-blast-regions \
    --yaml tests/data/blast/regions/pseudomonas-serogroup.yaml \
    --targets tests/data/blast/regions/pseudomonas-serogroup.fasta \
    --input tests/data/blast/regions/O1-GCF_000504045.fna.gz
Running camlhmp with following parameters:
    --input tests/data/blast/regions/O1-GCF_000504045.fna.gz
    --yaml tests/data/blast/regions/pseudomonas-serogroup.yaml
    --targets tests/data/blast/regions/pseudomonas-serogroup.fasta
    --outdir ./
    --prefix camlhmp
    --min-pident 95
    --min-coverage 95
Starting camlhmp for Pseudomonas Serogrouping...
Running blastn...
Processing hits...
Final Results...
                               Pseudomonas Serogrouping
โโโโโโโโโโณโโโโโโโณโโโโโโโโโณโโโโโโโโโณโโโโโโโณโโโโโโโโโณโโโโโโโโโณโโโโโโโโโณโโโโโโโโโณโโโโโโโโโ
โ sample โ type โ targeโฆ โ coverโฆ โ hits โ schema โ schemโฆ โ camlhโฆ โ params โ commeโฆ โ
โกโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฉ
โ camlhโฆ โ O1   โ O1     โ 100.00 โ 1    โ pseudโฆ โ 0.0.1  โ 0.3.1  โ min-cโฆ โ        โ
โโโโโโโโโโดโโโโโโโดโโโโโโโโโดโโโโโโโโโดโโโโโโโดโโโโโโโโโดโโโโโโโโโดโโโโโโโโโดโโโโโโโโโดโโโโโโโโโ
Writing outputs...
Final predicted type written to ./camlhmp.tsv
Results against each type written to ./camlhmp.details.tsv
blastn results written to ./camlhmp.blastn.tsv
Note
The table printed to STDOUT by camlhmp-blast-regions has been purposefully truncated
for viewing on the docs. It is the same information that that is in {PREFIX}.tsv.
Output Files¶
camlhmp-blast-region will generate three output files:
| File Name | Description | 
|---|---|
| {PREFIX}.tsv | A tab-delimited file with the predicted type | 
| {PREFIX}.blast.tsv | A tab-delimited file of all blast hits | 
| {PREFIX}.details.tsv | A tab-delimited file with details for each type | 
{PREFIX}.tsv¶
The {PREFIX}.tsv file is a tab-delimited file with the predicted type. The columns are:
| Column | Description | 
|---|---|
| sample | The sample name as determined by --prefix | 
| type | The predicted type | 
| targets | The targets for the given type that had a hit | 
| coverage | The coverage of the target region | 
| hits | The number of hits used to calculate coverage of the target region | 
| schema | The schema used to determine the type | 
| schema_version | The version of the schema used | 
| camlhmp_version | The version of camlhmp used | 
| params | The parameters used for the analysis | 
| comment | A small comment about the result | 
Below is an example of the {PREFIX}.tsv file:
sample  type    targets coverage    hits    schema  schema_version  camlhmp_version params  comment
camlhmp O5  O2  100.00  1   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   
{PREFIX}.blast.tsv¶
The {PREFIX}.blast.tsv file is a tab-delimited file of the raw output for all blast hits.
The columns are the standard BLAST output with -outfmt 6.
Here is an example of the {PREFIX}.blast.tsv file:
qseqid  sseqid  pident  qcovs   qlen    slen    length  nident  mismatch    gapopen qstart  qend    sstart  send    evalue  bitscore
wzyB    NZ_PSQS01000003.1   88.403  99  1140    6935329 595 526 69  0   545 1139    6874509 6875103 0.0 717
wzyB    NZ_PSQS01000003.1   88.403  99  1140    6935329 595 526 69  0   545 1139    6920911 6921505 0.0 717
wzyB    NZ_PSQS01000003.1   89.444  99  1140    6935329 540 483 56  1   1   539 6872864 6873403 0.0 680
wzyB    NZ_PSQS01000003.1   89.444  99  1140    6935329 540 483 56  1   1   539 6919266 6919805 0.0 680
O1  NZ_PSQS01000003.1   97.972  12  18368   6935329 1972    1932    38  2   16398   18368   6620589 6618619 0.0 3419
O1  NZ_PSQS01000003.1   96.296  12  18368   6935329 324 312 11  1   1   323 6641914 6641591 1.68e-149   531
O2  NZ_PSQS01000003.1   99.841  100 23303   6935329 23303   23266   30  1   1   23303   6618619 6641914 0.0 42821
O2  NZ_PSQS01000003.1   86.935  100 23303   6935329 1240    1078    130 12  2542    3749    3864567 3863328 0.0 1363
O3  NZ_PSQS01000003.1   94.442  13  20210   6935329 2393    2260    114 15  1   2386    6618619 6620999 0.0 3664
O3  NZ_PSQS01000003.1   99.308  13  20210   6935329 289 287 2   0   19922   20210   6641626 6641914 3.09e-147   523
O4  NZ_PSQS01000003.1   97.448  14  15279   6935329 1842    1795    47  0   1   1842    6618619 6620460 0.0 3142
O4  NZ_PSQS01000003.1   99.638  14  15279   6935329 276 275 1   0   15004   15279   6641639 6641914 8.46e-142   505
{PREFIX}.details.tsv¶
The {PREFIX}.details.tsv file is a tab-delimited file with details for each type. This file
can be useful for seeing how a sample did against all other types in a schema.
The columns in this file are:
| Column | Description | 
|---|---|
| sample | The sample name as determined by --prefix | 
| type | The predicted type | 
| status | The status of the type ( Trueif passed thresholds,Falseif failed to exceed thresholds) | 
| targets | The targets for the given type that had a match | 
| missing | The targets for the given type that were not found | 
| coverage | The coverage of the target region | 
| hits | The number of hits used to calculate coverage of the target region | 
| schema | The schema used to determine the type | 
| schema_version | The version of the schema used | 
| camlhmp_version | The version of camlhmp used | 
| params | The parameters used for the analysis | 
| comment | A small comment about the result | 
Below is an example of the {PREFIX}.details.tsv file:
sample  type    status  targets missing coverage    hits    schema  schema_version  camlhmp_version params  comment
camlhmp O1  False       O1  12.49   2   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   Coverage based on 2 hits
camlhmp O2  False   O2  wzyB    100.00,0.00 1,0 pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   
camlhmp O3  False       O3  1.43    1   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   
camlhmp O4  False       O4  13.86   2   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   Coverage based on 2 hits
camlhmp O5  True    O2      100.00  1   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   
Example Implementation¶
If you would like to see how camlhmp-blast-regions can be used, please see
pasty. In pasty the schema is set up
to directly use camlhmp-blast-regions to classify samples without any extra
logic.
This allows for a simple wrapper like the following:
#!/usr/bin/env bash
pasty_dir=$(dirname $0)
CAML_YAML="${pasty_dir}/../data/pa-osa.yaml" \
CAML_TARGETS="${pasty_dir}/../data/pa-osa.fasta" \
    camlhmp-blast-regions \
    "${@:1}"
This script will run camlhmp-blast-regions with the pasty schema and targets.