Ingest Samples
Note
Indexed files are required for ingestion. If your VCF/BCF files have not been indexed, you can index them with bcftools:
for f in data/vcfs/*.vcf.gz; do bcftools index -c "$f"; done
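Before ingesting, it can help to find which files still lack an index. The helper below is a minimal sketch. It assumes each index sits next to its VCF/BCF file under bcftools' default names: `<file>.csi` from `bcftools index -c`, or `<file>.tbi` from `bcftools index -t`. The name `find_unindexed` is hypothetical and is not part of TileDB-VCF.

```python
from pathlib import Path

# Default index file suffixes: `bcftools index -c` writes .csi, `bcftools index -t` writes .tbi.
INDEX_SUFFIXES = (".csi", ".tbi")


def find_unindexed(paths):
    """Return the VCF/BCF paths that have no sibling .csi or .tbi index file.

    Hypothetical helper: it only checks that an index file exists next to
    each input, not that the index is valid or up to date.
    """
    missing = []
    for p in map(Path, paths):
        if not any(Path(str(p) + suffix).exists() for suffix in INDEX_SUFFIXES):
            missing.append(str(p))
    return missing
```

For example, `find_unindexed(glob.glob("data/vcfs/*.vcf.gz"))` lists the files that the bcftools loop above still needs to index.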
You can ingest samples into an already created dataset as follows:
import tiledbvcf

uri = "my_vcf_dataset"
# Open the existing dataset in write mode
ds = tiledbvcf.Dataset(uri, mode="w")
# Locations of the (indexed) VCF/BCF files to add
ds.ingest_samples(sample_uris=["sample_1", "sample_2"])
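From the command line, list the VCF/BCF file locations at the end of the store command. A shell wildcard pattern such as *.bcf works, because the shell expands it into the matching file names: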
tiledbvcf store --uri my_vcf_dataset *.bcf
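In Python, the standard `glob` module performs roughly the same expansion the shell applies to `*.bcf`. The sketch below is an illustration under that assumption. The helper names `collect_sample_uris` and `ingest_matching` are hypothetical. Only `tiledbvcf.Dataset(..., mode="w")` and `ingest_samples(sample_uris=...)` come from the example above.

```python
import glob


def collect_sample_uris(pattern):
    """Expand a shell-style wildcard pattern into a sorted list of file paths."""
    return sorted(glob.glob(pattern))


def ingest_matching(dataset_uri, pattern):
    """Ingest every file matching `pattern` into an existing dataset (hypothetical helper)."""
    # Imported here so collect_sample_uris can be used without tiledbvcf installed.
    import tiledbvcf

    sample_uris = collect_sample_uris(pattern)
    if not sample_uris:
        raise FileNotFoundError(f"no files match {pattern!r}")
    ds = tiledbvcf.Dataset(dataset_uri, mode="w")
    ds.ingest_samples(sample_uris=sample_uris)
    return sample_uris
```

Unlike an unmatched shell pattern, which some shells pass through literally, this version raises an error when nothing matches.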
Alternatively, provide a text file that lists the absolute locations of the VCF files, one per line:
tiledbvcf store --uri my_vcf_dataset --samples-file samples.txt
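A samples file in that layout can be generated with a few lines of Python. This is a minimal sketch. It assumes only what the text states: absolute paths, one per line. The helper name `write_samples_file` is hypothetical.

```python
from pathlib import Path


def write_samples_file(vcf_paths, samples_file="samples.txt"):
    """Write one absolute VCF/BCF path per line and return the paths written."""
    abs_paths = [str(Path(p).resolve()) for p in vcf_paths]
    # Each path ends with a newline, so an empty input produces an empty file.
    Path(samples_file).write_text("".join(path + "\n" for path in abs_paths))
    return abs_paths
```

Resolving each path to an absolute one means the file still works when `tiledbvcf store` runs from a different working directory.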
Note
Incremental updates work the same way as the ingestion above; nothing special is needed. Ingestion is also thread- and process-safe, so it can be performed in parallel.
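Because ingestion is described as process-safe, one possible pattern is to split the file list into batches and ingest each batch from its own worker process. The sketch below is an assumption-laden illustration, not a documented TileDB-VCF recipe. The round-robin batching and the names `split_into_batches`, `_ingest_batch` and `ingest_in_parallel` are our own. It reuses only `tiledbvcf.Dataset(..., mode="w")` and `ingest_samples(sample_uris=...)` from the example above.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_batches(items, n_batches):
    """Distribute a list round-robin into at most n_batches non-empty lists."""
    if n_batches < 1:
        raise ValueError("n_batches must be >= 1")
    batches = [items[i::n_batches] for i in range(n_batches)]
    return [b for b in batches if b]


def _ingest_batch(dataset_uri, sample_uris):
    # Each worker process opens its own write handle on the same dataset.
    import tiledbvcf

    ds = tiledbvcf.Dataset(dataset_uri, mode="w")
    ds.ingest_samples(sample_uris=sample_uris)
    return len(sample_uris)


def ingest_in_parallel(dataset_uri, sample_uris, workers=4):
    """Ingest sample_uris using up to `workers` processes; returns the file count."""
    batches = split_into_batches(list(sample_uris), workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(_ingest_batch, dataset_uri, b) for b in batches]
        # result() re-raises any exception raised inside a worker.
        return sum(f.result() for f in futures)
```

On platforms that start worker processes with "spawn" (Windows, and macOS by default), call `ingest_in_parallel` from inside an `if __name__ == "__main__":` block.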