STing companion scripts
STing includes two companion scripts:
db_util.py
This script provides a set of utilities to download databases from PubMLST and build STing indices from them.
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
subcommands:
{list,query,fetch}
list List all the available databases at PubMLST
query Search a database in PubMLST
fetch Fetch a database from PubMLST
To list the available MLST schemes and their last update time, use the list sub-command:
./scripts/db_util.py list
# # Database #Profiles Retrieved DB_URL
# 1 Achromobacter spp. 476 2019-11-18 https://pubmlst.org/achromobacter
# 2 Acinetobacter baumannii#1 2058 2019-11-18 https://pubmlst.org/abaumannii/
To search the available PubMLST schemes by search term (e.g. 'cholera'), use the query sub-command:
./scripts/db_util.py query "cholera"
# 1 Vibrio cholerae 984 2019-11-18 https://pubmlst.org/vcholerae/
# 2 Vibrio cholerae#2 422 2019-11-18 http://pubmlst.org/vcholerae
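If the output of list or query needs to be consumed by another program, the rows can be parsed with a regular expression. The following is a minimal sketch, assuming each data row has the layout seen in the sample output above (index, database name, number of profiles, retrieval date, URL); the names Scheme and parse_scheme_row are hypothetical and not part of db_util.py.

```python
import re
from typing import NamedTuple, Optional

# Assumed row layout, inferred from the sample output shown above:
#   "# <index> <database name> <#profiles> <YYYY-MM-DD> <URL>"
# Database names may contain spaces, so the name is matched lazily.
ROW_PATTERN = re.compile(
    r"^#?\s*(?P<index>\d+)\s+(?P<name>.+?)\s+(?P<profiles>\d+)\s+"
    r"(?P<retrieved>\d{4}-\d{2}-\d{2})\s+(?P<url>\S+)\s*$"
)


class Scheme(NamedTuple):
    index: int
    name: str
    profiles: int
    retrieved: str
    url: str


def parse_scheme_row(line: str) -> Optional[Scheme]:
    """Return a Scheme for a data row, or None for headers and other lines."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        return None
    return Scheme(
        index=int(match["index"]),
        name=match["name"],
        profiles=int(match["profiles"]),
        retrieved=match["retrieved"],
        url=match["url"],
    )
```

Lines that do not look like data rows (for example the column header) yield None, so the function can be applied to every line of captured output and the None results filtered out.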
Finally, to download a PubMLST database and build a STing index from it, use the fetch sub-command:
./scripts/db_util.py fetch -q "Vibrio cholerae" -b -o testdb
# Database: "Vibrio cholerae"
# Fetching allele sequences:
# - https://pubmlst.org/data/alleles/vcholerae/adk.tfa -> /storage/aroon/sting/databases/vibrio_cholerae/adk.fa
# - https://pubmlst.org/data/alleles/vcholerae/gyrB.tfa -> /storage/aroon/sting/databases/vibrio_cholerae/gyrB.fa
# - https://pubmlst.org/data/alleles/vcholerae/mdh.tfa -> /storage/aroon/sting/databases/vibrio_cholerae/mdh.fa
# - https://pubmlst.org/data/alleles/vcholerae/metE.tfa -> /storage/aroon/sting/databases/vibrio_cholerae/metE.fa
# - https://pubmlst.org/data/alleles/vcholerae/pntA.tfa -> /storage/aroon/sting/databases/vibrio_cholerae/pntA.fa
# - https://pubmlst.org/data/alleles/vcholerae/purM.tfa -> /storage/aroon/sting/databases/vibrio_cholerae/purM.fa
# - https://pubmlst.org/data/alleles/vcholerae/pyrC.tfa -> /storage/aroon/sting/databases/vibrio_cholerae/pyrC.fa
# Fetching profiles:
# - https://pubmlst.org/data/profiles/vcholerae.txt -> /storage/aroon/sting/databases/vibrio_cholerae/profile.txt
# Building STing index:
# /data/home/achande3/bin/indexer -c /storage/aroon/sting/databases/vibrio_cholerae/config.txt -p /storage/aroon/sting/databases/vibrio_cholerae/db/index
# Loading sequences from sequences files:
# # Seqs. File
# 1 130 /storage/aroon/sting/databases/vibrio_cholerae/adk.fa
# 2 151 /storage/aroon/sting/databases/vibrio_cholerae/gyrB.fa
# 3 164 /storage/aroon/sting/databases/vibrio_cholerae/mdh.fa
# 4 329 /storage/aroon/sting/databases/vibrio_cholerae/metE.fa
# 5 185 /storage/aroon/sting/databases/vibrio_cholerae/pntA.fa
# 6 128 /storage/aroon/sting/databases/vibrio_cholerae/purM.fa
# 7 259 /storage/aroon/sting/databases/vibrio_cholerae/pyrC.fa
# Total loaded sequences: 1346
# Creating and saving ESA index from loaded sequences...
# Index successfuly created!
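In the log above, the per-file sequence counts add up to the reported total (130 + 151 + 164 + 329 + 185 + 128 + 259 = 1346). A similar sanity check can be run on a fetched database directory. The sketch below assumes the allele files are plain FASTA files with a .fa extension, as the log suggests; the function names are hypothetical and not part of STing.

```python
from pathlib import Path
from typing import Dict


def count_fasta_records(fasta_path: Path) -> int:
    """Count sequences in a FASTA file by counting header lines ('>')."""
    with fasta_path.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_database_alleles(db_dir: Path, pattern: str = "*.fa") -> Dict[str, int]:
    """Map each allele file name in db_dir to its number of sequences."""
    return {
        path.name: count_fasta_records(path)
        for path in sorted(db_dir.glob(pattern))
    }
```

Calling count_database_alleles on the database directory and summing the values gives a total that can be compared with the "Total loaded sequences" line printed by the indexer.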
plot_kmer_depth.R
This script generates k-mer depth plots from the depth files produced by the typer and detector applications.
Requirements:
plot_kmer_depth.R requires the following R packages:
- argparser
- ggsci
- gridExtra
- RColorBrewer
- stringr
- svglite
- tidyverse
By default, plot_kmer_depth.R will try to install the required packages automatically into the personal R library directory (usually something like ~/R/x86_64-pc-linux-gnu-library/3.4).
./scripts/plot_kmer_depth.R
usage: plot_kmer_depth.R [--] [--help] [--opts OPTS] [--gene_file GENE_FILE] [--prefix PREFIX] [--sample_name SAMPLE_NAME] [--max_loci_per_page MAX_LOCI_PER_PAGE] [--format FORMAT] input_file
This script generates k-mer depth plots using a depth file generated by the STing typer tool (-t option).
positional arguments:
input_file Samples file. Text file with a list of sample names (line by line).
flags:
-h, --help show this help message and exit
optional arguments:
-x, --opts OPTS RDS file containing argument values
-g, --gene_file GENE_FILE Path to a text file with a list of genes/loci to be plotted.
-p, --prefix PREFIX Filename prefix for output files. [default: kmer_depth]
-s, --sample_name SAMPLE_NAME Sample name. [default: input file's name]
-m, --max_loci_per_page MAX_LOCI_PER_PAGE Maximum number of loci to print on each page. [default: 7]
-f, --format FORMAT Output file format. Valid options are 'pdf', 'png', and 'svg' [default: pdf]
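When plots are generated for many samples, it can help to assemble the command line programmatically. The following is a minimal sketch that builds an argument list from the options in the help text above and rejects formats the help text does not list. The function build_plot_command and the default script path are assumptions for illustration, not part of STing.

```python
from typing import List, Optional

# Output formats listed in the help text above.
VALID_FORMATS = ("pdf", "png", "svg")


def build_plot_command(
    input_file: str,
    script: str = "./scripts/plot_kmer_depth.R",  # assumed script location
    gene_file: Optional[str] = None,
    prefix: Optional[str] = None,
    sample_name: Optional[str] = None,
    max_loci_per_page: Optional[int] = None,
    output_format: Optional[str] = None,
) -> List[str]:
    """Build an argument list for plot_kmer_depth.R without running it."""
    if output_format is not None and output_format not in VALID_FORMATS:
        raise ValueError(f"format must be one of {VALID_FORMATS}")
    if max_loci_per_page is not None and max_loci_per_page < 1:
        raise ValueError("max_loci_per_page must be a positive integer")

    options = [
        ("--gene_file", gene_file),
        ("--prefix", prefix),
        ("--sample_name", sample_name),
        ("--max_loci_per_page", max_loci_per_page),
        ("--format", output_format),
    ]
    command = [script]
    for flag, value in options:
        # Options left as None are omitted so the script's defaults apply.
        if value is not None:
            command += [flag, str(value)]
    command.append(input_file)
    return command
```

The returned list can be passed to subprocess.run(command, check=True). Omitted options fall back to the defaults shown in the help text (for example, pdf output and 7 loci per page).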