STing companion scripts

STing includes two companion scripts:

db_util.py

This script provides a set of utilities to download databases from PubMLST and build STing indices from them.


optional arguments:
  -h, --help          show this help message and exit
  --version           show program's version number and exit

subcommands:
  {list,query,fetch}
    list              List all the available databases at PubMLST
    query             Search a database in PubMLST
    fetch             Fetch a database from PubMLST

To list the available MLST schemes and their last update time, use the list sub-command:

./scripts/db_util.py list
# #       Database        #Profiles       Retrieved       DB_URL
# 1       Achromobacter spp.      476     2019-11-18      https://pubmlst.org/achromobacter
# 2       Acinetobacter baumannii#1       2058    2019-11-18      https://pubmlst.org/abaumannii/
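
For downstream scripting, the listing can be turned into structured records. The following is a minimal Python sketch, not part of STing: `parse_listing` is a hypothetical helper, and it assumes the columns are separated by a tab or by two or more spaces, as they appear in the sample above.

```python
import re

def parse_listing(text):
    """Parse `db_util.py list`-style output into a list of dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # The listings in this document prefix each output line with "# ".
        if line.startswith("# "):
            line = line[2:].strip()
        # Assumption: columns are separated by a tab or by 2+ spaces,
        # so single spaces inside database names are preserved.
        fields = re.split(r"\s{2,}|\t", line)
        if len(fields) != 5 or not fields[0].isdigit() or not fields[2].isdigit():
            continue  # skips the header row and any line of unexpected shape
        number, name, profiles, retrieved, url = fields
        rows.append({
            "number": int(number),
            "database": name,
            "profiles": int(profiles),
            "retrieved": retrieved,
            "url": url,
        })
    return rows

sample = """\
# #       Database        #Profiles       Retrieved       DB_URL
# 1       Achromobacter spp.      476     2019-11-18      https://pubmlst.org/achromobacter
# 2       Acinetobacter baumannii#1       2058    2019-11-18      https://pubmlst.org/abaumannii/
"""

for row in parse_listing(sample):
    print(row["database"], row["profiles"])
```

Splitting on wide gaps rather than on every space keeps names such as "Achromobacter spp." in one field. If the real output uses a different separator, the regular expression needs adjusting.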

To search the available PubMLST schemes for a term (e.g. 'cholera'), use the query sub-command:

./scripts/db_util.py query "cholera"
# 1       Vibrio cholerae 984     2019-11-18      https://pubmlst.org/vcholerae/
# 2       Vibrio cholerae#2       422     2019-11-18      http://pubmlst.org/vcholerae

Finally, to download and build a PubMLST database, use the fetch sub-command:

./scripts/db_util.py fetch -q "Vibrio cholerae" -b -o testdb
# Database: "Vibrio cholerae"
#  Fetching allele sequences:
#  - https://pubmlst.org/data/alleles/vcholerae/adk.tfa -> /storage/aroon/sting/databases/vibrio_cholerae/adk.fa
#  - https://pubmlst.org/data/alleles/vcholerae/gyrB.tfa -> /storage/aroon/sting/databases/vibrio_cholerae/gyrB.fa
#  - https://pubmlst.org/data/alleles/vcholerae/mdh.tfa -> /storage/aroon/sting/databases/vibrio_cholerae/mdh.fa
#  - https://pubmlst.org/data/alleles/vcholerae/metE.tfa -> /storage/aroon/sting/databases/vibrio_cholerae/metE.fa
#  - https://pubmlst.org/data/alleles/vcholerae/pntA.tfa -> /storage/aroon/sting/databases/vibrio_cholerae/pntA.fa
#  - https://pubmlst.org/data/alleles/vcholerae/purM.tfa -> /storage/aroon/sting/databases/vibrio_cholerae/purM.fa
#  - https://pubmlst.org/data/alleles/vcholerae/pyrC.tfa -> /storage/aroon/sting/databases/vibrio_cholerae/pyrC.fa
#  Fetching profiles:
#  - https://pubmlst.org/data/profiles/vcholerae.txt -> /storage/aroon/sting/databases/vibrio_cholerae/profile.txt
# Building STing index:
# /data/home/achande3/bin/indexer -c /storage/aroon/sting/databases/vibrio_cholerae/config.txt -p /storage/aroon/sting/databases/vibrio_cholerae/db/index
# Loading sequences from sequences files:

# #       Seqs.   File
# 1       130     /storage/aroon/sting/databases/vibrio_cholerae/adk.fa
# 2       151     /storage/aroon/sting/databases/vibrio_cholerae/gyrB.fa
# 3       164     /storage/aroon/sting/databases/vibrio_cholerae/mdh.fa
# 4       329     /storage/aroon/sting/databases/vibrio_cholerae/metE.fa
# 5       185     /storage/aroon/sting/databases/vibrio_cholerae/pntA.fa
# 6       128     /storage/aroon/sting/databases/vibrio_cholerae/purM.fa
# 7       259     /storage/aroon/sting/databases/vibrio_cholerae/pyrC.fa

# Total loaded sequences: 1346

# Creating and saving ESA index from loaded sequences...
# Index successfuly created!
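
When several databases need to be fetched, the same command can be assembled programmatically. This is a minimal Python sketch under stated assumptions: `fetch_command` is a hypothetical helper, the flags `-q`, `-b` and `-o` are copied from the example above, and their meanings (query string, build the index, output directory) are inferred from that example rather than from the script's help text.

```python
import shlex

def fetch_command(query, out_dir, build_index=True, script="./scripts/db_util.py"):
    """Return the argument list for a `db_util.py fetch` call (hypothetical helper)."""
    cmd = [script, "fetch", "-q", query]
    if build_index:
        # Assumption: -b asks the script to build the STing index after download.
        cmd.append("-b")
    # Assumption: -o names the output directory.
    cmd += ["-o", out_dir]
    return cmd

cmd = fetch_command("Vibrio cholerae", "testdb")
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with database names that contain spaces or characters such as `#`.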

plot_kmer_depth.R

This script generates k-mer depth plots from the depth files produced by the typer and detector applications.

Requirements:

plot_kmer_depth.R requires the following R packages:

  • argparser
  • ggsci
  • gridExtra
  • RColorBrewer
  • stringr
  • svglite
  • tidyverse

By default, plot_kmer_depth.R tries to install the required packages automatically into the personal R library directory (usually something like ~/R/x86_64-pc-linux-gnu-library/3.4).

./scripts/plot_kmer_depth.R
usage: plot_kmer_depth.R [--] [--help] [--opts OPTS] [--gene_file GENE_FILE] [--prefix PREFIX] [--sample_name SAMPLE_NAME] [--max_loci_per_page MAX_LOCI_PER_PAGE] [--format FORMAT] input_file

This script generates k-mer depth plots using a depth file generated by the STing typer tool (-t option).

positional arguments:
  input_file                    Samples file. Text file with a list of sample names (line by line).

flags:
  -h, --help                    show this help message and exit

optional arguments:
  -x, --opts OPTS                              RDS file containing argument values
  -g, --gene_file GENE_FILE                    Path to a text file with a list of genes/loci to be plotted.
  -p, --prefix PREFIX                          Filename prefix for output files. [default: kmer_depth]
  -s, --sample_name SAMPLE_NAME                Sample name. [default: input file's name]
  -m, --max_loci_per_page MAX_LOCI_PER_PAGE    Maximum number of loci to print on each page. [default: 7]
  -f, --format FORMAT                          Output file format. Valid options are 'pdf', 'png', and 'svg' [default: pdf]
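
The effect of --max_loci_per_page can be pictured as splitting the list of loci into consecutive groups, one group per page. The Python sketch below only illustrates that idea. It is not taken from the script's source, `paginate` is a hypothetical helper, and it assumes pages are filled in order. The locus names are the seven Vibrio cholerae loci from the fetch example.

```python
def paginate(loci, max_per_page=7):
    """Split loci into consecutive pages of at most max_per_page items (illustrative only)."""
    if max_per_page < 1:
        raise ValueError("max_per_page must be at least 1")
    return [loci[i:i + max_per_page] for i in range(0, len(loci), max_per_page)]

loci = ["adk", "gyrB", "mdh", "metE", "pntA", "purM", "pyrC"]

# With the default of 7, all seven loci fit on one page.
print(len(paginate(loci)))

# A smaller limit spreads the same loci over several pages.
for page_number, page in enumerate(paginate(loci, 3), start=1):
    print(page_number, page)
```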