camlhmp-blast-alleles
¶
camlhmp-blast-alleles
is a command that allows users to type their samples using a provided
schema with BLAST algorithms. This command is useful when the schema is typing specific alleles
of a gene or set of genes (e.g. MLST).
Usage: camlhmp-blast-alleles [OPTIONS]
๐ช camlhmp-blast-alleles ๐ช - Classify assemblies using BLAST against alleles of
a set of genes
โญโ Options โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฎ
โ * --input -i TEXT Input file in FASTA format to classify โ
โ [required] โ
โ * --yaml -y TEXT YAML file documenting the targets and types โ
โ [required] โ
โ * --targets -t TEXT Query targets in FASTA format [required] โ
โ --outdir -o PATH Directory to write output [default: ./] โ
โ --prefix -p TEXT Prefix to use for output files โ
โ [default: camlhmp] โ
โ --min-pident INTEGER Minimum percent identity to count a hit โ
โ [default: 95] โ
โ --min-coverage INTEGER Minimum percent coverage to count a hit โ
โ [default: 95] โ
โ --force Overwrite existing reports โ
โ --verbose Increase the verbosity of output โ
โ --silent Only critical errors will be printed โ
โ --version Print schema and camlhmp version โ
โ --help Show this message and exit. โ
โฐโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฏ
Example Usage¶
To run camlhmp-blast-alleles
, you will need a FASTA file of your input sequences, a YAML
file with the schema, and a FASTA file with the targets. Below is an example of how to run
camlhmp-blast-alleles
using available test data.
camlhmp-blast-alleles \
--yaml tests/data/blast/alleles/spn-pbptype.yaml \
--targets tests/data/blast/alleles/spn-pbptype.fasta \
--input tests/data/blast/alleles/SRR2912551.fna.gz
Running camlhmp with following parameters:
--input tests/data/blast/alleles/SRR2912551.fna.gz
--yaml tests/data/blast/alleles/spn-pbptype.yaml
--targets tests/data/blast/alleles/spn-pbptype.fasta
--outdir ./
--prefix camlhmp
--min-pident 95
--min-coverage 95
Starting camlhmp for S. pneumoniae PBP typing...
Running tblastn...
Processing hits...
Final Results...
S. pneumoniae PBP typing
โโโโโณโโโโณโโโโณโโโโณโโโโณโโโโณโโโโณโโโโณโโโโณโโโโโณโโโโณโโโโโณโโโโณโโโโโณโโโโณโโโโโณโโโโณโโโโโณโโโโณโโโโโ
โ โฆ โ โฆ โ โฆ โ โฆ โ โฆ โ โฆ โ โฆ โ โฆ โ โฆ โ 1โฆ โ โฆ โ 2โฆ โ โฆ โ 2โฆ โ โฆ โ 2โฆ โ โฆ โ 2โฆ โ โฆ โ 2โฆ โ
โกโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฉ
โ โฆ โ โฆ โ โฆ โ โฆ โ โฆ โ โฆ โ โฆ โ โฆ โ โฆ โ โ 0 โ 1โฆ โ โฆ โ 5โฆ โ โ 2 โ โฆ โ 1โฆ โ โฆ โ โ
โโโโโดโโโโดโโโโดโโโโดโโโโดโโโโดโโโโดโโโโดโโโโดโโโโโดโโโโดโโโโโดโโโโดโโโโโดโโโโดโโโโโดโโโโดโโโโโดโโโโดโโโโโ
Writing outputs...
Final predicted type written to ./camlhmp.tsv
tblastn results written to ./camlhmp.tblastn.tsv
Note
The table printed to STDOUT by camlhmp-blast-alleles
has been purposefully truncated
for viewing on the docs. It is the same information that that is in {PREFIX}.tsv.
Output Files¶
camlhmp-blast-region
will generate three output files:
File Name | Description |
---|---|
{PREFIX}.tsv |
A tab-delimited file with the predicted type |
{PREFIX}.blast.tsv |
A tab-delimited file of all blast hits |
{PREFIX}.tsv¶
The {PREFIX}.tsv
file is a tab-delimited file with the predicted type. The columns are:
Column | Description |
---|---|
sample | The sample name as determined by --prefix |
schema | The schema used to determine the type |
schema_version | The version of the schema used |
camlhmp_version | The version of camlhmp used |
params | The parameters used for the analysis |
{TARGET}_id | The allele ID for a target hit |
{TARGET}_pident | The percent identity of the hit |
{TARGET}_qcovs | The percent coverage of the hit |
{TARGET}_bitscore | The bitscore of the hit |
{TARGET}_comment | A small comment about the hit |
Below is an example of the {PREFIX}.tsv
file:
sample schema schema_version camlhmp_version params 1A_id 1A_pident 1A_qcovs 1A_bitscore 1A_comment 2B_id 2B_pident 2B_qcovs 2B_bitscore 2B_comment 2X_id 2X_pident 2X_qcovs 2X_bitscore 2X_comment
camlhmp pbptype_partial 0.0.1 0.3.1 min-coverage=95;min-pident=95 23 100.0 100 556 0 100.0 100 567 2 100.0 100 741
{PREFIX}.blast.tsv¶
The {PREFIX}.blast.tsv
file is a tab-delimited file of the raw output for all blast hits.
The columns are the standard BLAST output with -outfmt 6
.
Here is an example of the {PREFIX}.blast.tsv
file:
qseqid sseqid pident qcovs qlen slen length nident mismatch gapopen qstart qend sstart send evalue bitscore
1A_0 NODE_223_length_8196_cov_21.291849 99.638 100 276 8324 276 275 1 0 1 276 1807 2634 0.0 555
1A_1 NODE_223_length_8196_cov_21.291849 99.638 100 276 8324 276 275 1 0 1 276 1807 2634 0.0 555
1A_2 NODE_223_length_8196_cov_21.291849 99.275 100 276 8324 276 274 2 0 1 276 1807 2634 0.0 554
1A_3 NODE_223_length_8196_cov_21.291849 99.275 100 276 8324 276 274 2 0 1 276 1807 2634 0.0 553
1A_4 NODE_223_length_8196_cov_21.291849 84.420 100 276 8324 276 233 43 0 1 276 1807 2634 3.91e-155 474
1A_23 NODE_223_length_8196_cov_21.291849 100.000 100 276 8324 276 276 0 0 1 276 1807 2634 0.0 556
2B_0 NODE_878_length_2854_cov_17.976875 100.000 100 277 2982 277 277 0 0 1 277 1218 2048 0.0 567
2B_1 NODE_878_length_2854_cov_17.976875 87.365 100 277 2982 277 242 35 0 1 277 1218 2048 3.24e-173 501
2B_2 NODE_878_length_2854_cov_17.976875 99.278 100 277 2982 277 275 2 0 1 277 1218 2048 0.0 563
2B_3 NODE_878_length_2854_cov_17.976875 99.639 100 277 2982 277 276 1 0 1 277 1218 2048 0.0 565
2B_4 NODE_878_length_2854_cov_17.976875 99.639 100 277 2982 277 276 1 0 1 277 1218 2048 0.0 565
2X_0 NODE_210_length_5085_cov_16.539627 99.721 100 358 5213 358 357 1 0 1 358 3172 2099 0.0 740
2X_1 NODE_210_length_5085_cov_16.539627 92.179 100 358 5213 358 330 28 0 1 358 3172 2099 0.0 688
2X_1 NODE_878_length_2854_cov_17.976875 23.797 99 358 2982 395 94 230 17 1 353 915 2012 1.95e-06 45.8
2X_2 NODE_210_length_5085_cov_16.539627 100.000 100 358 5213 358 358 0 0 1 358 3172 2099 0.0 741
2X_3 NODE_210_length_5085_cov_16.539627 99.721 100 358 5213 358 357 1 0 1 358 3172 2099 0.0 739
2X_4 NODE_210_length_5085_cov_16.539627 99.441 100 358 5213 358 356 2 0 1 358 3172 2099 0.0 738
{PREFIX}.details.tsv¶
The {PREFIX}.details.tsv
file is a tab-delimited file with details for each type. This file
can be useful for seeing how a sample did against all other types in a schema.
The columns in this file are:
Column | Description |
---|---|
sample | The sample name as determined by --prefix |
type | The predicted type |
status | The status of the type (True if failed) |
targets | The targets for the given type that had a match |
missing | The targets for the given type that were not found |
coverage | The coverage of the target region |
hits | The number of hits used to calculate coverage of the target region |
schema | The schema used to determine the type |
schema_version | The version of the schema used |
camlhmp_version | The version of camlhmp used |
params | The parameters used for the analysis |
comment | A small comment about the result |
Below is an example of the {PREFIX}.details.tsv
file:
sample type status targets missing coverage hits schema schema_version camlhmp_version params comment
camlhmp O1 False O1 12.49 2 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 Coverage based on 2 hits
camlhmp O2 False O2 wzyB 100.00,0.00 1,0 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95
camlhmp O3 False O3 1.43 1 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95
camlhmp O4 False O4 13.86 2 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95 Coverage based on 2 hits
camlhmp O5 True O2 100.00 1 pseudomonas_serogroup_partial 0.0.1 0.2.1 min-coverage=95;min-pident=95