CLI
TileDB-VCF -- Efficient variant-call data storage and retrieval.
This command-line utility provides an interface to create, store and
efficiently retrieve variant-call data in the TileDB storage format.
More information: TileDB-VCF <https://tiledb-inc.github.io/TileDB-VCF>
Usage: tiledbvcf [OPTIONS] SUBCOMMAND
Options:
-h,--help Print this help message and exit
-v,--version Print the version information and exit
Subcommands:
create Creates an empty TileDB-VCF dataset
store Ingests samples into a TileDB-VCF dataset
delete Delete samples from a TileDB-VCF dataset
export Exports data from a TileDB-VCF dataset
list Lists all sample names present in a TileDB-VCF dataset
stat Prints high-level statistics about a TileDB-VCF dataset
utils Utilities for working with a TileDB-VCF dataset
version Print the version information and exit
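The subcommands are typically used in sequence: create a dataset, ingest samples into it, then inspect or export the data. The dry-run sketch below shows that workflow. The dataset path, VCF file names, sample name and region are hypothetical placeholders. The `run` helper is also introduced here for illustration: it only prints each command unless `DRY_RUN=0` is set, and setting it requires `tiledbvcf` on the `PATH` and real input files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical inputs: replace with a real dataset URI and real VCF/BCF files.
uri="example_dataset"
vcfs=(sample1.vcf.gz sample2.vcf.gz)

# Hypothetical helper: print each command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run tiledbvcf create --uri "$uri"
run tiledbvcf store --uri "$uri" "${vcfs[@]}"
run tiledbvcf list --uri "$uri"
run tiledbvcf stat --uri "$uri"
run tiledbvcf export --uri "$uri" --output-format v --regions chr1:1000-2000 --sample-names sample1 --output-dir out
```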
Create
Creates an empty TileDB-VCF dataset
Usage: tiledbvcf create [OPTIONS]
Options:
-u,--uri TEXT REQUIRED TileDB-VCF dataset URI
-a,--attributes TEXT=[] ... Excludes: --vcf-attributes
INFO and/or FORMAT field names (comma-delimited) to store as separate attributes.
Names should be 'fmt_X' or 'info_X' for a field name 'X' (case sensitive).
-v,--vcf-attributes TEXT Excludes: --attributes
Create separate attributes for all INFO and FORMAT fields in the provided VCF file.
-g,--anchor-gap UINT=1000 Anchor gap size to use
-n,--no-duplicates Do not allow records with duplicate start positions to be written to the array.
--compress-sample-dim,--no-compress-sample-dim{false}
Enable/disable compression of the sample dimension. Enabled by default.
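The `--attributes` naming rule maps an INFO field `X` to `info_X` and a FORMAT field `X` to `fmt_X`. The sketch below builds the comma-delimited value from two field lists. The names `DP`, `AF` (INFO) and `GQ` (FORMAT) are examples only, not defaults, and the command is printed rather than executed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Example field names only. Names are case sensitive, so keep them exactly as in the VCF header.
info_fields=(DP AF)
format_fields=(GQ)

attrs=()
for f in "${info_fields[@]}";   do attrs+=("info_${f}"); done
for f in "${format_fields[@]}"; do attrs+=("fmt_${f}");  done

# Join with commas to form the --attributes value.
attr_csv=$(IFS=,; printf '%s' "${attrs[*]}")
printf 'tiledbvcf create --uri example_dataset --attributes %s\n' "$attr_csv"
```

Alternatively, `--vcf-attributes` derives the attribute list from every INFO and FORMAT field in a given VCF file. The two options are mutually exclusive.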
Ingestion task options:
--enable-allele-count,--disable-allele-count{false}
Enable/disable allele count array creation. Enabled by default.
--enable-variant-stats,--disable-variant-stats{false}
Enable/disable variant stats array creation. Enabled by default.
TileDB options:
-c,--tile-capacity UINT=10000 Tile capacity to use for the array schema
--tiledb-config TEXT=[] ... CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB
configuration parameter settings.
--checksum ENUM:value in {md5->md5,none->none,sha256->sha256} OR {md5,none,sha256}=sha256
Checksum to use for dataset validation on reads and writes.
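`--tiledb-config` takes `param=value` pairs joined by commas. The sketch below assembles that string from a list of pairs and prints a `create` command that also selects a non-default checksum. `vfs.s3.region` and `vfs.s3.scheme` are real TileDB configuration parameters, but their values and the bucket URI are illustrative only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative TileDB configuration pairs, one 'param=value' entry per element.
cfg=(
  "vfs.s3.region=us-east-1"
  "vfs.s3.scheme=https"
)

# Join the pairs with commas: 'param1=val1,param2=val2'.
tiledb_config=$(IFS=,; printf '%s' "${cfg[*]}")

# Hypothetical bucket URI; the command is printed, not executed.
printf 'tiledbvcf create --uri s3://example-bucket/dataset --checksum md5 --tiledb-config %s\n' "$tiledb_config"
```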
Debug options:
--log-level TEXT:{fatal,error,warn,info,debug,trace}=fatal
Log message level
--log-file TEXT Log message output file
Store
Ingests samples into a TileDB-VCF dataset
Usage: tiledbvcf store [OPTIONS] [paths...]
Positionals:
paths TEXT=[] ... Excludes: --samples-file
VCF URIs to ingest
Options:
-u,--uri TEXT REQUIRED TileDB-VCF dataset URI
-t,--threads UINT=20 Number of threads
-m,--total-memory-budget-mb UINT:UINT in [512 - 64103]=48077
The total memory budget for ingestion (MiB)
-M,--total-memory-percentage FLOAT:FLOAT in [0 - 1]=0
Percentage of total system memory used for ingestion, given as a fraction in [0, 1] (overrides '--total-memory-budget-mb')
--resume Resume incomplete ingestion of a sample batch
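The memory defaults above depend on the machine where this help text was generated. The upper bound of 64103 MiB is presumably that machine's total memory, and the default of 48077 MiB matches 75% of it (64103 × 0.75 = 48077.25). The 75% rule is an inference from these two numbers, not a documented guarantee. The sketch below reproduces that arithmetic, plus the budget implied by `-M 0.5` on the same machine.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Total system memory in MiB, taken from the upper bound shown in the help text (assumption).
system_mib=64103

# Assumed default rule: 75% of system memory, rounded down by integer arithmetic.
default_budget_mib=$(( system_mib * 75 / 100 ))

# Budget implied by '-M 0.5' (a fraction of system memory), truncated to whole MiB.
fraction=0.5
m_budget_mib=$(awk -v t="$system_mib" -v f="$fraction" 'BEGIN { printf "%d", t * f }')

echo "default budget: ${default_budget_mib} MiB"
echo "-M ${fraction}: ${m_budget_mib} MiB"
```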
Sample options:
-e,--sample-batch-size UINT=10 Number of samples per batch for ingestion
-f,--samples-file TEXT Excludes: paths
File with 1 VCF path to be ingested per line. The format can also include an explicit
index path on each line, in the format '<vcf-uri><TAB><index-uri>'
--remove-sample-file Needs: --samples-file
If specified, the samples file ('-f' argument) is deleted after successful ingestion
-d,--scratch-dir TEXT Directory used for local storage of downloaded remote samples
-s,--scratch-mb UINT=0 Amount of local storage that can be used for downloading remote samples (MB)
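A samples file lists one VCF URI per line. Optionally, a line can carry the index URI after a TAB. The sketch below writes such a file with one plain entry and one entry that includes an explicit index, then prints a matching `store` command. All URIs are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

samples_file=$(mktemp)

# Hypothetical URIs. The second line names its index explicitly: '<vcf-uri><TAB><index-uri>'.
{
  printf '%s\n' "s3://example-bucket/vcfs/sample1.vcf.gz"
  printf '%s\t%s\n' "s3://example-bucket/vcfs/sample2.vcf.gz" "s3://example-bucket/index/sample2.vcf.gz.csi"
} > "$samples_file"

cat "$samples_file"
echo "tiledbvcf store --uri example_dataset --samples-file $samples_file --sample-batch-size 10"
```

Using `printf` rather than `echo -e` guarantees a literal TAB on every shell.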
TileDB options:
-p,--s3-part-size UINT=50 [S3 only] Part size to use for writes (MB)
--tiledb-config TEXT=[] ... CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB
configuration parameter settings.
--stats Enable TileDB stats
--stats-vcf-header-array Enable TileDB stats for VCF header array usage
Advanced options:
--ratio-tiledb-memory FLOAT:FLOAT in [0.01 - 0.99]=0.5
Ratio of memory budget allocated to TileDB::sm.mem.total_budget
--max-tiledb-memory-mb UINT=4096 Maximum memory allocated to TileDB::sm.mem.total_budget (MiB)
--input-record-buffer-mb UINT=1 Size of input record buffer for each sample file (MiB)
--avg-vcf-record-size INT:INT in [1 - 4096]=512
Average VCF record size (bytes)
--ratio-task-size FLOAT:FLOAT in [0.01 - 1]=0.75
Ratio of worker task size to computed task size
--ratio-output-flush FLOAT:FLOAT in [0.01 - 1]=0.75
Ratio of output buffer capacity that triggers a flush to TileDB
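Read together, `--ratio-tiledb-memory` and `--max-tiledb-memory-mb` suggest that TileDB's `sm.mem.total_budget` is the ingestion budget multiplied by the ratio, then capped at the maximum. That combination rule is an assumption based on the option descriptions. The sketch below applies it and defines a hypothetical helper, `tiledb_budget`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: tiledb_budget BUDGET_MIB RATIO MAX_MIB
# Assumed rule: min(BUDGET_MIB * RATIO, MAX_MIB), truncated to whole MiB.
tiledb_budget() {
  awk -v b="$1" -v r="$2" -v m="$3" 'BEGIN { v = b * r; if (v > m) v = m; printf "%d\n", v }'
}

tiledb_budget 48077 0.5 4096   # large budget: the 4096 MiB cap applies
tiledb_budget 6000 0.5 4096    # smaller budget: 3000 MiB, below the cap
```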
Contig options:
--disable-contig-fragment-merging{false} Excludes: --contigs-to-keep-separate --contigs-to-allow-merging
Disable merging of contigs into combined fragments. Contig fragment merging is generally beneficial: it is a
performance optimization that reduces the number of prefixes on an S3/Azure/GCS bucket when a dataset has a
large number of small pseudo-contigs.
--contigs-to-keep-separate TEXT ... Excludes: --disable-contig-fragment-merging --contigs-to-allow-merging
Comma-separated list of contigs that should not be merged into combined fragments.
The default list includes all standard human chromosomes in both UCSC (e.g., chr1)
and Ensembl (e.g., 1) formats.
--contigs-to-allow-merging TEXT=[] ... Excludes: --disable-contig-fragment-merging --contigs-to-keep-separate
Comma-separated list of contigs that should be allowed to be merged into combined
fragments.
--contig-mode ENUM:value in {all->all,merged->merged,separate->separate} OR {all,merged,separate}=all
Select which contigs are ingested: 'separate', 'merged', or 'all' contigs
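`--contigs-to-keep-separate` and `--contigs-to-allow-merging` both take a comma-separated list of contig names. The sketch below builds a custom list for a hypothetical assembly whose main chromosomes are `chr1` through `chr5`. It assumes that contigs absent from the list, such as small scaffolds, remain eligible for merging.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical assembly: keep chr1..chr5 in their own fragments.
# Assumption: other contigs (e.g. small scaffolds) may still be merged.
keep_separate=$(seq 1 5 | sed 's/^/chr/' | paste -sd, -)

echo "tiledbvcf store --uri example_dataset --contigs-to-keep-separate $keep_separate sample1.vcf.gz"
```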
Debug options:
--log-level TEXT:{fatal,error,warn,info,debug,trace}=fatal
Log message level
--log-file TEXT Log message output file
-v,--verbose :DEPRECATED Enable verbose output DEPRECATED: please use '--log-level debug' instead
Legacy options:
-n,--max-record-buff UINT Max number of VCF records to buffer per file
-k,--thread-task-size UINT Max length (# columns) of an ingestion task. Affects load balancing of ingestion
work across threads, and total memory consumption.
-b,--mem-budget-mb UINT The maximum size of TileDB buffers before flushing (MiB)
Delete
Delete samples from a TileDB-VCF dataset
Usage: tiledbvcf delete [OPTIONS]
Options:
-u,--uri TEXT REQUIRED TileDB-VCF dataset URI
-s,--sample-names TEXT=[] ... CSV list of sample names to delete
--tiledb-config TEXT=[] ... CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB
configuration parameter settings.
--log-level TEXT:{fatal,error,warn,info,debug,trace}=fatal
Log message level
--log-file TEXT Log message output file
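`--sample-names` takes a CSV list. A dry-run sketch that wraps the deletion in two `list` calls, so the removal can be confirmed, follows. The dataset URI, the sample names and the `run` helper are hypothetical. Set `DRY_RUN=0` to execute the commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

uri="example_dataset"            # hypothetical dataset
to_delete=(sampleA sampleB)      # hypothetical sample names
names_csv=$(IFS=,; printf '%s' "${to_delete[*]}")

# Hypothetical helper: print each command by default; set DRY_RUN=0 to execute it.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# List sample names before and after deleting, to confirm the removal.
run tiledbvcf list --uri "$uri"
run tiledbvcf delete --uri "$uri" --sample-names "$names_csv"
run tiledbvcf list --uri "$uri"
```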
Export
Exports data from a TileDB-VCF dataset
Usage: tiledbvcf export [OPTIONS]
Options:
-u,--uri TEXT REQUIRED TileDB-VCF dataset URI
Output options:
-O,--output-format ENUM:value in {b->b,t->t,u->u,v->v,z->z} OR {b,t,u,v,z}=b
Export format. Options are: 'b': bcf (compressed); 'u': bcf (uncompressed); 'z': vcf.gz; 'v': vcf;
't': TSV
-o,--output-path TEXT [TSV or combined VCF export only] The name of the output file.
-m,--merge Needs: --output-path Export combined VCF file.
-t,--tsv-fields TEXT=[] ... [TSV export only] An ordered CSV list of fields to export in the TSV. A field name
can be one of 'SAMPLE', 'ID', 'REF', 'ALT', 'QUAL', 'POS', 'CHR', 'FILTER'. Additionally,
INFO fields can be specified by 'I:<name>' and FMT fields with 'F:<name>'. To export
the intersecting query region for each row in the output, use the field names 'Q:POS',
'Q:END' and 'Q:LINE'.
-n,--limit UINT=18446744073709551615 Only export the first N intersecting records. The default (2^64 - 1) effectively means no limit.
-d,--output-dir TEXT Directory used for local output of exported samples
--upload-dir TEXT If set, all output file(s) from the export process will be copied to the given directory
(or S3 prefix) upon completion.
-c,--count-only Excludes: --af-filter Do not write output files; only print the number of intersecting
records.
--af-filter TEXT Excludes: --count-only
If set, only export records that pass the given allele frequency (AF) filter.
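`--tsv-fields` is an ordered CSV list that mixes fixed fields, `I:`/`F:` prefixed INFO/FORMAT fields and the `Q:` query-region fields. The sketch below assembles such a list and prints a TSV export command and a count-only command. `GT` and `DP` are example field names. The dataset, region and sample names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Ordered TSV columns: fixed fields, one FORMAT field (GT), one INFO field (DP),
# and the query region each record intersected. GT/DP are example names only.
tsv_fields=(SAMPLE CHR POS REF ALT F:GT I:DP Q:POS Q:END)
tsv_csv=$(IFS=,; printf '%s' "${tsv_fields[*]}")

echo "tiledbvcf export --uri example_dataset --output-format t --output-path variants.tsv --tsv-fields $tsv_csv --regions chr1:1000-2000 --sample-names sample1,sample2"

# Count intersecting records only; no output files are written.
echo "tiledbvcf export --uri example_dataset --count-only --regions chr1:1000-2000 --sample-names sample1,sample2"
```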
Region options:
-r,--regions TEXT=[] ... Excludes: --regions-file
CSV list of regions to export in the format 'chr:min-max'
-R,--regions-file TEXT Excludes: --regions
File containing regions (BED format)
--sorted Skip sorting the regions (or regions file); use only if they are already sorted
--region-partition TEXT Partitions the list of regions to be exported and causes this export to export only
a specific partition of them. Specify in the format I:N where I is the partition
index and N is the total number of partitions. Useful for batch exports.
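`--regions` takes 'chr:min-max' strings, while `--regions-file` takes BED, whose intervals are 0-based and half-open. Region strings often use the 1-based, inclusive convention familiar from samtools/bcftools. Assuming that convention applies here, `chr1:1000-2000` corresponds to the BED line `chr1 999 2000`. The sketch below converts a region list to a BED file under that assumption. The regions and sample name are examples.

```shell
#!/usr/bin/env bash
set -euo pipefail

regions="chr1:1000-2000,chr2:5000-5100"   # example regions
bed_file=$(mktemp)

# One region per line, then split on ':' and '-'. Subtract 1 from the start to get
# BED's 0-based start (assumes 1-based inclusive region strings). Contig names that
# themselves contain ':' or '-' would be split incorrectly by this simple parser.
tr ',' '\n' <<< "$regions" \
  | awk -F'[:-]' 'BEGIN { OFS = "\t" } { print $1, $2 - 1, $3 }' > "$bed_file"

cat "$bed_file"
echo "tiledbvcf export --uri example_dataset --regions-file $bed_file --output-format z --sample-names sample1"
```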
Sample options:
-f,--samples-file TEXT Excludes: --sample-names
Path to file with 1 sample name per line
-s,--sample-names TEXT=[] ... Excludes: --samples-file
CSV list of sample names to export
--sample-partition TEXT Partitions the list of samples to be exported and causes this export to export only
a specific partition of them. Specify in the format I:N where I is the partition
index and N is the total number of partitions. Useful for batch exports.
--disable-check-samples{false} Disable the check that every requested sample exists in the dataset. By default, this check runs
before the query executes, and the export fails with an error if any requested sample is missing.
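`--region-partition` and `--sample-partition` both take 'I:N'. For a batch export, each job gets one combination of partitions. The sketch below prints one export command per combination for four jobs. It assumes that the partition index I is zero-based (0 through N-1), which is worth confirming against your TileDB-VCF version. `regions.bed` and `samples.txt` are hypothetical input files.

```shell
#!/usr/bin/env bash
set -euo pipefail

n_region_parts=2
n_sample_parts=2
cmds=()

# Assumption: partition indices are zero-based, i.e. I runs from 0 to N-1.
for ((r = 0; r < n_region_parts; r++)); do
  for ((s = 0; s < n_sample_parts; s++)); do
    # Each job writes to its own directory so outputs do not collide.
    cmds+=("tiledbvcf export --uri example_dataset --regions-file regions.bed --samples-file samples.txt --region-partition ${r}:${n_region_parts} --sample-partition ${s}:${n_sample_parts} --output-format z --output-dir out_r${r}_s${s}")
  done
done

printf '%s\n' "${cmds[@]}"
```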
TileDB options:
--tiledb-config TEXT=[] ... CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB
configuration parameter settings.
--mem-budget-buffer-percentage FLOAT=25
The percentage of the memory budget to use for TileDB query buffers.
--mem-budget-tile-cache-percentage FLOAT=10
The percentage of the memory budget to use for TileDB tile cache.
-b,--mem-budget-mb UINT=2048 The memory budget (MB) used when submitting TileDB queries.
--stats Enable TileDB stats
--stats-vcf-header-array Enable TileDB stats for VCF header array usage
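Assuming both percentages are taken of `--mem-budget-mb`, the defaults divide the 2048 MB export budget as shown below. The sketch computes the same split for any budget and percentages.

```shell
#!/usr/bin/env bash
set -euo pipefail

budget_mb=2048   # --mem-budget-mb default
buffer_pct=25    # --mem-budget-buffer-percentage default
cache_pct=10     # --mem-budget-tile-cache-percentage default

# Assumption: each percentage applies to the whole --mem-budget-mb value.
buffers_mb=$(awk -v b="$budget_mb" -v p="$buffer_pct" 'BEGIN { printf "%.1f", b * p / 100 }')
cache_mb=$(awk -v b="$budget_mb" -v p="$cache_pct" 'BEGIN { printf "%.1f", b * p / 100 }')

echo "query buffers: ${buffers_mb} MB"
echo "tile cache:    ${cache_mb} MB"
```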
Debug options:
--log-level TEXT:{fatal,error,warn,info,debug,trace}=fatal
Log message level
--log-file TEXT Log message output file
-v,--verbose :DEPRECATED Enable verbose output DEPRECATED: please use '--log-level debug' instead
--enable-progress-estimation Enable progress estimation in verbose mode. Progress estimation can sometimes cause
a performance impact, so enable this with consideration.
--debug-print-vcf-regions Enable debug printing of the VCF regions passed by the user or read from the BED file. Requires
verbose mode.
--debug-print-sample-list Enable debug printing of the sample list used in the read. Requires verbose mode.
--debug-print-tiledb-query-ranges Enable debug printing of the TileDB query ranges used in the read. Requires verbose mode.
List
Lists all sample names present in a TileDB-VCF dataset
Usage: tiledbvcf list [OPTIONS]
Options:
-u,--uri TEXT REQUIRED TileDB-VCF dataset URI
--tiledb-config TEXT=[] ... CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB
configuration parameter settings.
--log-level TEXT:{fatal,error,warn,info,debug,trace}=fatal
Log message level
--log-file TEXT Log message output file
Stat
Prints high-level statistics about a TileDB-VCF dataset
Usage: tiledbvcf stat [OPTIONS]
Options:
-u,--uri TEXT REQUIRED TileDB-VCF dataset URI
--tiledb-config TEXT=[] ... CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB
configuration parameter settings.
--log-level TEXT:{fatal,error,warn,info,debug,trace}=fatal
Log message level
--log-file TEXT Log message output file
Consolidate
Consolidate TileDB-VCF dataset
Usage: tiledbvcf utils consolidate [OPTIONS] SUBCOMMAND
Options:
-h,--help Print this help message and exit
Subcommands:
commits Consolidate TileDB-VCF dataset commits
fragments Consolidate TileDB-VCF dataset fragments
fragment_meta Consolidate TileDB-VCF dataset fragment metadata
Vacuum
Vacuum TileDB-VCF dataset
Usage: tiledbvcf utils vacuum [OPTIONS] SUBCOMMAND
Options:
-h,--help Print this help message and exit
Subcommands:
commits Vacuum TileDB-VCF dataset commits
fragments Vacuum TileDB-VCF dataset fragments
fragment_meta Vacuum TileDB-VCF dataset fragment metadata
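In TileDB, consolidation writes a merged copy of the data, and vacuuming then removes what that consolidation superseded. The two steps are therefore usually run as a pair, consolidation first. The dry-run sketch below applies this pattern to fragment metadata and commits. The options of these sub-subcommands are not listed here, so passing `--uri` to them is an assumption; check `tiledbvcf utils consolidate fragment_meta --help` before use. The dataset URI and the `run` helper are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

uri="example_dataset"   # hypothetical dataset

# Hypothetical helper: print each command by default; set DRY_RUN=0 to execute it.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Consolidate first, then vacuum what the consolidation superseded.
# Assumption: these sub-subcommands accept --uri like the top-level subcommands.
for target in fragment_meta commits; do
  run tiledbvcf utils consolidate "$target" --uri "$uri"
  run tiledbvcf utils vacuum "$target" --uri "$uri"
done
```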