camlhmp-blast-regions
camlhmp-blast-regions is a command for searching for full regions of interest. It is nearly identical to camlhmp-blast-targets, but instead of many smaller targets, the idea is to look at full regions such as O-antigens or similar features.
Usage
Usage: camlhmp-blast-regions [OPTIONS]
🐪 camlhmp-blast-regions 🐪 - Classify assemblies using BLAST against larger genomic regions
╭─ Options ──────────────────────────────────────────────────────────────────────────╮
│ *  --input         -i  TEXT     Input file in FASTA format to classify [required]  │
│ *  --yaml          -y  TEXT     YAML file documenting the targets and types        │
│                                 [required]                                         │
│ *  --targets       -t  TEXT     Query targets in FASTA format [required]           │
│    --outdir        -o  PATH     Directory to write output [default: ./]            │
│    --prefix        -p  TEXT     Prefix to use for output files [default: camlhmp]  │
│    --min-pident        INTEGER  Minimum percent identity to count a hit            │
│                                 [default: 95]                                      │
│    --min-coverage      INTEGER  Minimum percent coverage to count a hit            │
│                                 [default: 95]                                      │
│    --force                      Overwrite existing reports                         │
│    --verbose                    Increase the verbosity of output                   │
│    --silent                     Only critical errors will be printed               │
│    --version                    Print schema and camlhmp version                   │
│    --help                       Show this message and exit.                        │
╰────────────────────────────────────────────────────────────────────────────────────╯
Example Usage
To run camlhmp-blast-regions, you will need a FASTA file of your input sequences, a YAML file with the schema, and a FASTA file with the targets. Below is an example of how to run camlhmp-blast-regions using available test data.
camlhmp-blast-regions \
--yaml tests/data/blast/regions/pseudomonas-serogroup.yaml \
--targets tests/data/blast/regions/pseudomonas-serogroup.fasta \
--input tests/data/blast/regions/O1-GCF_000504045.fna.gz
Running camlhmp with following parameters:
--input tests/data/blast/regions/O1-GCF_000504045.fna.gz
--yaml tests/data/blast/regions/pseudomonas-serogroup.yaml
--targets tests/data/blast/regions/pseudomonas-serogroup.fasta
--outdir ./
--prefix camlhmp
--min-pident 95
--min-coverage 95
Starting camlhmp for Pseudomonas Serogrouping...
Running blastn...
Processing hits...
Final Results...
Pseudomonas Serogrouping
┏━━━━━━━━┳━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ sample ┃ type ┃ targe… ┃ cover… ┃ hits ┃ schema ┃ schem… ┃ camlh… ┃ params ┃ comme… ┃
┡━━━━━━━━╇━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ camlh… │ O1   │ O1     │ 100.00 │ 1    │ pseud… │ 0.0.1  │ 0.3.1  │ min-c… │        │
└────────┴──────┴────────┴────────┴──────┴────────┴────────┴────────┴────────┴────────┘
Writing outputs...
Final predicted type written to ./camlhmp.tsv
Results against each type written to ./camlhmp.details.tsv
blastn results written to ./camlhmp.blastn.tsv
Note
The table printed to STDOUT by camlhmp-blast-regions has been purposefully truncated for viewing on the docs. It contains the same information as {PREFIX}.tsv.
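When many assemblies need to be classified, it can help to assemble the command from a script. The following is a minimal sketch, not part of camlhmp: build_command is a hypothetical helper, and it uses only options listed in the usage text above. It builds the argument list without running anything, so the tool does not need to be installed to try it.

```python
import shlex
from pathlib import Path


def build_command(input_fasta, yaml_path, targets, outdir="./", prefix="camlhmp",
                  min_pident=95, min_coverage=95, force=False):
    """Return the argument list for one camlhmp-blast-regions run.

    Hypothetical helper: defaults mirror the ones shown in the usage text.
    """
    cmd = [
        "camlhmp-blast-regions",
        "--input", str(input_fasta),
        "--yaml", str(yaml_path),
        "--targets", str(targets),
        "--outdir", str(outdir),
        "--prefix", prefix,
        "--min-pident", str(min_pident),
        "--min-coverage", str(min_coverage),
    ]
    if force:
        cmd.append("--force")  # overwrite existing reports
    return cmd


data = Path("tests/data/blast/regions")
cmd = build_command(
    data / "O1-GCF_000504045.fna.gz",
    data / "pseudomonas-serogroup.yaml",
    data / "pseudomonas-serogroup.fasta",
    prefix="O1-GCF_000504045",  # a per-sample prefix keeps outputs separate
)
print(shlex.join(cmd))
# To execute it, pass the list to subprocess.run(cmd, check=True).
```

Setting --prefix per sample is a design choice here: the prefix determines both the output file names and the value of the sample column.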
Output Files
camlhmp-blast-regions will generate three output files:
File Name | Description |
---|---|
{PREFIX}.tsv | A tab-delimited file with the predicted type |
{PREFIX}.blast.tsv | A tab-delimited file of all blast hits |
{PREFIX}.details.tsv | A tab-delimited file with details for each type |
{PREFIX}.tsv
The {PREFIX}.tsv file is a tab-delimited file with the predicted type. The columns are:
Column | Description |
---|---|
sample | The sample name as determined by --prefix |
type | The predicted type |
targets | The targets for the given type that had a hit |
coverage | The coverage of the target region |
hits | The number of hits used to calculate coverage of the target region |
schema | The schema used to determine the type |
schema_version | The version of the schema used |
camlhmp_version | The version of camlhmp used |
params | The parameters used for the analysis |
comment | A small comment about the result |
Below is an example of the {PREFIX}.tsv file:
sample type targets coverage hits schema schema_version camlhmp_version params comment
camlhmp O5 O2 100.00 1 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95
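Because {PREFIX}.tsv is a plain tab-delimited file with a header row, it can be read with Python's standard csv module. The sketch below embeds the example row as a string so it runs on its own. The helper names read_prediction and parse_params are illustrative, and the key=value;key=value layout of the params column is inferred from the example rather than documented.

```python
import csv
import io

# Contents of the example {PREFIX}.tsv. For a real file, replace the
# io.StringIO(...) call below with open("camlhmp.tsv", newline="").
EXAMPLE_TSV = (
    "sample\ttype\ttargets\tcoverage\thits\tschema\tschema_version\t"
    "camlhmp_version\tparams\tcomment\n"
    "camlhmp\tO5\tO2\t100.00\t1\tpseudomonas_serogroup_partial\t0.0.1\t"
    "0.2.1\tmin-coverage=95;min-pident=95\t\n"
)


def read_prediction(handle):
    """Return the first data row of a {PREFIX}.tsv file as a dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    return next(reader)


def parse_params(params):
    """Split 'key=value;key=value' into a dict (layout assumed from the example)."""
    return dict(item.split("=", 1) for item in params.split(";") if item)


row = read_prediction(io.StringIO(EXAMPLE_TSV))
print(row["sample"], row["type"], row["coverage"])
print(parse_params(row["params"]))
```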
{PREFIX}.blast.tsv
The {PREFIX}.blast.tsv file is a tab-delimited file of the raw output for all blast hits. The columns are BLAST tabular output (-outfmt 6) using the fields listed in the header of the example.
Here is an example of the {PREFIX}.blast.tsv file:
qseqid sseqid pident qcovs qlen slen length nident mismatch gapopen qstart qend sstart send evalue bitscore
wzyB NZ_PSQS01000003.1 88.403 99 1140 6935329 595 526 69 0 545 1139 6874509 6875103 0.0 717
wzyB NZ_PSQS01000003.1 88.403 99 1140 6935329 595 526 69 0 545 1139 6920911 6921505 0.0 717
wzyB NZ_PSQS01000003.1 89.444 99 1140 6935329 540 483 56 1 1 539 6872864 6873403 0.0 680
wzyB NZ_PSQS01000003.1 89.444 99 1140 6935329 540 483 56 1 1 539 6919266 6919805 0.0 680
O1 NZ_PSQS01000003.1 97.972 12 18368 6935329 1972 1932 38 2 16398 18368 6620589 6618619 0.0 3419
O1 NZ_PSQS01000003.1 96.296 12 18368 6935329 324 312 11 1 1 323 6641914 6641591 1.68e-149 531
O2 NZ_PSQS01000003.1 99.841 100 23303 6935329 23303 23266 30 1 1 23303 6618619 6641914 0.0 42821
O2 NZ_PSQS01000003.1 86.935 100 23303 6935329 1240 1078 130 12 2542 3749 3864567 3863328 0.0 1363
O3 NZ_PSQS01000003.1 94.442 13 20210 6935329 2393 2260 114 15 1 2386 6618619 6620999 0.0 3664
O3 NZ_PSQS01000003.1 99.308 13 20210 6935329 289 287 2 0 19922 20210 6641626 6641914 3.09e-147 523
O4 NZ_PSQS01000003.1 97.448 14 15279 6935329 1842 1795 47 0 1 1842 6618619 6620460 0.0 3142
O4 NZ_PSQS01000003.1 99.638 14 15279 6935329 276 275 1 0 15004 15279 6641639 6641914 8.46e-142 505
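The hits above can be related to the coverage values in the {PREFIX}.details.tsv example with a small calculation. The sketch below is an inference from the example data, not a description of camlhmp's internals: it keeps hits with pident at or above the threshold, merges their qstart to qend intervals, and divides the covered length by qlen. The function name region_coverage is hypothetical.

```python
from collections import defaultdict

# Column order taken from the header of the example file.
COLUMNS = ("qseqid sseqid pident qcovs qlen slen length nident mismatch "
           "gapopen qstart qend sstart send evalue bitscore").split()

# The example rows; str.split() handles both tabs and spaces.
EXAMPLE_HITS = """\
wzyB NZ_PSQS01000003.1 88.403 99 1140 6935329 595 526 69 0 545 1139 6874509 6875103 0.0 717
wzyB NZ_PSQS01000003.1 88.403 99 1140 6935329 595 526 69 0 545 1139 6920911 6921505 0.0 717
wzyB NZ_PSQS01000003.1 89.444 99 1140 6935329 540 483 56 1 1 539 6872864 6873403 0.0 680
wzyB NZ_PSQS01000003.1 89.444 99 1140 6935329 540 483 56 1 1 539 6919266 6919805 0.0 680
O1 NZ_PSQS01000003.1 97.972 12 18368 6935329 1972 1932 38 2 16398 18368 6620589 6618619 0.0 3419
O1 NZ_PSQS01000003.1 96.296 12 18368 6935329 324 312 11 1 1 323 6641914 6641591 1.68e-149 531
O2 NZ_PSQS01000003.1 99.841 100 23303 6935329 23303 23266 30 1 1 23303 6618619 6641914 0.0 42821
O2 NZ_PSQS01000003.1 86.935 100 23303 6935329 1240 1078 130 12 2542 3749 3864567 3863328 0.0 1363
O3 NZ_PSQS01000003.1 94.442 13 20210 6935329 2393 2260 114 15 1 2386 6618619 6620999 0.0 3664
O3 NZ_PSQS01000003.1 99.308 13 20210 6935329 289 287 2 0 19922 20210 6641626 6641914 3.09e-147 523
O4 NZ_PSQS01000003.1 97.448 14 15279 6935329 1842 1795 47 0 1 1842 6618619 6620460 0.0 3142
O4 NZ_PSQS01000003.1 99.638 14 15279 6935329 276 275 1 0 15004 15279 6641639 6641914 8.46e-142 505
"""


def region_coverage(lines, min_pident=95.0):
    """Percent of each query covered by hits with pident >= min_pident."""
    qlens, spans = {}, defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("qseqid"):
            continue  # skip blank lines and the header row
        hit = dict(zip(COLUMNS, line.split()))
        qlens[hit["qseqid"]] = int(hit["qlen"])
        if float(hit["pident"]) >= min_pident:
            start, end = sorted((int(hit["qstart"]), int(hit["qend"])))
            spans[hit["qseqid"]].append((start, end))
    result = {}
    for query, qlen in qlens.items():
        covered, last_end = 0, 0
        for start, end in sorted(spans[query]):
            # Count only positions not already covered by earlier intervals.
            start = max(start, last_end + 1)
            if end >= start:
                covered += end - start + 1
                last_end = end
        result[query] = round(100 * covered / qlen, 2)
    return result


coverage = region_coverage(EXAMPLE_HITS.splitlines())
print(coverage)
```

For these rows the sketch gives 0.0 for wzyB, 12.49 for O1, 100.0 for O2, 1.43 for O3, and 13.86 for O4, which agrees with the coverage column of the {PREFIX}.details.tsv example. How the tool then applies --min-coverage to decide a type is not shown here.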
{PREFIX}.details.tsv
The {PREFIX}.details.tsv file is a tab-delimited file with details for each type. This file can be useful for seeing how a sample did against all other types in a schema. The columns in this file are:
Column | Description |
---|---|
sample | The sample name as determined by --prefix |
type | The type being evaluated (one row per type in the schema) |
status | The status of the type (True if the type passed) |
targets | The targets for the given type that had a match |
missing | The targets for the given type that were not found |
coverage | The coverage of the target region |
hits | The number of hits used to calculate coverage of the target region |
schema | The schema used to determine the type |
schema_version | The version of the schema used |
camlhmp_version | The version of camlhmp used |
params | The parameters used for the analysis |
comment | A small comment about the result |
Below is an example of the {PREFIX}.details.tsv file:
sample type status targets missing coverage hits schema schema_version camlhmp_version params comment
camlhmp O1 False O1 12.49 2 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 Coverage based on 2 hits
camlhmp O2 False O2 wzyB 100.00,0.00 1,0 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95
camlhmp O3 False O3 1.43 1 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95
camlhmp O4 False O4 13.86 2 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 Coverage based on 2 hits
camlhmp O5 True O2 100.00 1 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95
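To pull the passing type and any missing targets out of the details file, a short script is enough. The sketch below is illustrative only. It embeds the O2 and O5 rows from the example, with empty columns placed where the rendered example suggests they belong. It also assumes that status is written as the literal text True or False and that multiple missing targets would be comma-separated, as the coverage and hits columns are.

```python
import csv
import io

HEADER = ["sample", "type", "status", "targets", "missing", "coverage", "hits",
          "schema", "schema_version", "camlhmp_version", "params", "comment"]

# Trailing columns shared by both example rows (the comment is empty).
SHARED = ["pseudomonas_serogroup_partial", "0.0.1", "0.2.1",
          "min-coverage=95;min-pident=95", ""]

# Two rows from the example; placement of empty columns is an assumption.
ROWS = [
    ["camlhmp", "O2", "False", "O2", "wzyB", "100.00,0.00", "1,0"] + SHARED,
    ["camlhmp", "O5", "True", "O2", "", "100.00", "1"] + SHARED,
]
EXAMPLE_DETAILS = "".join("\t".join(fields) + "\n" for fields in [HEADER] + ROWS)


def load_details(handle):
    """Read a {PREFIX}.details.tsv file into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def summarize(rows):
    """Return (types whose status is True, missing targets per type)."""
    passed = [row["type"] for row in rows if row["status"] == "True"]
    missing = {
        row["type"]: [name for name in row["missing"].split(",") if name]
        for row in rows
    }
    return passed, missing


rows = load_details(io.StringIO(EXAMPLE_DETAILS))
passed, missing = summarize(rows)
print("passed:", passed)
print("missing:", missing)
```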
Example Implementation
If you would like to see how camlhmp-blast-regions can be used, please see pasty. In pasty, the schema is set up to directly use camlhmp-blast-regions to classify samples without any extra logic. This allows for a simple wrapper like the following:
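#!/usr/bin/env bash
# Directory containing this wrapper script.
pasty_dir=$(dirname "$0")
# Supply the pasty schema and targets through environment variables,
# then forward all command-line arguments unchanged.
CAML_YAML="${pasty_dir}/../data/pa-osa.yaml" \
CAML_TARGETS="${pasty_dir}/../data/pa-osa.fasta" \
    camlhmp-blast-regions \
    "$@"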
This script will run camlhmp-blast-regions with the pasty schema and targets.
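The same wrapper idea could be written in Python, for example for a package that exposes a console entry point. This is a sketch under assumptions: it relies on camlhmp-blast-regions reading the CAML_YAML and CAML_TARGETS environment variables as the shell wrapper implies, wrapper_call is a hypothetical name, and the script path and arguments in the demonstration are made up. It only prepares the command and environment, leaving execution as a commented hint.

```python
import os
from pathlib import Path


def wrapper_call(script_path, args, environ):
    """Return (command, environment) mirroring the shell wrapper.

    script_path plays the role of $0 and args the role of "$@".
    """
    base = Path(script_path).parent  # same as dirname "$0"
    env = dict(environ)  # copy, so the caller's environment is untouched
    env["CAML_YAML"] = str(base / ".." / "data" / "pa-osa.yaml")
    env["CAML_TARGETS"] = str(base / ".." / "data" / "pa-osa.fasta")
    return ["camlhmp-blast-regions", *args], env


# Hypothetical install location and arguments, for demonstration only.
cmd, env = wrapper_call("/opt/pasty/bin/pasty",
                        ["--input", "sample.fna.gz"], os.environ)
print(cmd)
print(env["CAML_YAML"])
# To execute: subprocess.run(cmd, env=env, check=True)
```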