Perform Distributed Queries with TileDB-Cloud
The tiledbvcf Python package includes an integration with TileDB Cloud that lets you distribute large queries in a serverless manner.
Task Graphs
You can use the tiledbvcf package's TileDB Cloud integration to partition read operations across regions and samples. The partitioning semantics are identical to those used by the CLI.
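To make the idea of partitioning across regions and samples concrete, here is a small local sketch in plain Python. It assumes that each partition takes a contiguous, near-equal slice of the region list (or sample list), and that every (region partition, sample partition) pair becomes one independent read task. Both points are assumptions for illustration, not a description of how tiledbvcf or TileDB Cloud implement it. The `partition` helper and the region and sample names are hypothetical.

```python
from itertools import product


def partition(items, index, count):
    """Hypothetical helper: return the index-th of `count` contiguous slices.

    Slice sizes differ by at most one; the remainder goes to the first slices.
    """
    base, extra = divmod(len(items), count)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return items[start:stop]


# Made-up inputs standing in for the regions of a BED file and the dataset's samples.
regions = [f"chr1:{i * 1000 + 1}-{(i + 1) * 1000}" for i in range(7)]
samples = ["sample_a", "sample_b", "sample_c"]

region_partitions, sample_partitions = 3, 2

# Assumed model: one task per (region partition, sample partition) pair.
tasks = [
    (partition(regions, r, region_partitions), partition(samples, s, sample_partitions))
    for r, s in product(range(region_partitions), range(sample_partitions))
]
print(len(tasks))  # 6
```

Under this model, the call below with region_partitions=10 and sample_partitions=2 would fan out into 20 tasks, each reading a subset of regions for a subset of samples.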
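import tiledbvcf
import tiledb.cloud.vcf

# Read three attributes for every region in a BED file, splitting the
# work into 10 region partitions and 2 sample partitions.
table = tiledb.cloud.vcf.query.read(
    'my-large-dataset',  # dataset URI
    attrs=['sample_name', 'pos_start', 'pos_end'],
    bed_file='very-large-bedfile.bed',
    region_partitions=10,
    sample_partitions=2)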
The result is a pyarrow table.