vcf.split
client.vcf.split
Split samples from a multi-sample VCF.
Functions
| Name | Description |
|---|---|
| ls_samples | List samples in an aggregate VCF. |
| split_one_sample | Split one sample from a multi-sample VCF. |
| split_vcf | Split individual sample VCFs from an aggregate VCF. |
ls_samples
`client.vcf.split.ls_samples(vcf_uri, config=None)`

List samples in an aggregate VCF.

Parameters

| Name | Description |
|---|---|
| `vcf_uri` | S3 path to the aggregate VCF. |
| `config` | TileDB config params. |

Returns: samples included in the VCF.
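To show where the sample names that `ls_samples` reports come from, here is a minimal pure-Python sketch that reads them from a local VCF. It is not the library's implementation (which reads from S3 using the TileDB config); the helper names `sample_names_from_lines` and `vcf_sample_names` are hypothetical. It relies only on the VCF layout: sample names are the columns after the nine fixed columns (`#CHROM` through `FORMAT`) on the single `#CHROM` header line.

```python
import gzip
from typing import Iterable, List


def sample_names_from_lines(lines: Iterable[str]) -> List[str]:
    """Return sample names from the '#CHROM' header line of VCF text."""
    for line in lines:
        if line.startswith("#CHROM"):
            # The first nine columns (#CHROM ... FORMAT) are fixed;
            # every column after FORMAT is one sample.
            return line.rstrip("\r\n").split("\t")[9:]
        if not line.startswith("##"):
            break  # hit a data line before any #CHROM header
    raise ValueError("no #CHROM header line found")


def vcf_sample_names(path: str) -> List[str]:
    """Read sample names from a local VCF, plain or gzip/BGZF-compressed."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return sample_names_from_lines(fh)
```

A sites-only VCF (no `FORMAT` or sample columns) yields an empty list.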
split_one_sample
`client.vcf.split.split_one_sample(vcf_uri, sample, output_uri, config=None)`

Split one sample from a multi-sample VCF.

Parameters

| Name | Description |
|---|---|
| `vcf_uri` | URI of the VCF to isolate the sample from. |
| `sample` | Name of the sample to isolate. |
| `output_uri` | URI to write the isolated VCF to. |
| `config` | TileDB config object. |

Returns: URI of the isolated sample VCF.
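As an illustration of what isolating one sample means at the text level, the following is a simplified pure-Python sketch over uncompressed VCF lines. It is an assumption-laden stand-in, not how `split_one_sample` works internally; `isolate_sample` is a hypothetical name. It keeps the meta lines and the nine fixed columns and drops every sample column except the requested one. Unlike `bcftools view -s`, which by default recalculates `INFO/AC` and `INFO/AN` for the subset, it copies `INFO` unchanged.

```python
from typing import Iterable, Iterator

FIXED_COLUMNS = 9  # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT


def isolate_sample(lines: Iterable[str], sample: str) -> Iterator[str]:
    """Yield VCF lines that keep only `sample`'s genotype column.

    Simplification: INFO fields computed across all samples (e.g. AC, AN)
    are copied unchanged rather than recalculated for the single sample.
    """
    keep = None  # column indices to keep, set once the #CHROM line is seen
    for line in lines:
        if line.startswith("##"):
            yield line  # meta-information lines are copied as-is
            continue
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\r\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[FIXED_COLUMNS:]
            if sample not in samples:
                raise KeyError(f"sample {sample!r} not in VCF header")
            keep = list(range(FIXED_COLUMNS)) + [FIXED_COLUMNS + samples.index(sample)]
        elif keep is None:
            raise ValueError("data line found before #CHROM header")
        yield "\t".join(fields[i] for i in keep) + "\n"
```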
split_vcf
client.vcf.split.split_vcf(
vcf_uri,
output_uri,
workspace,
acn,
resources={'cpu': '2', 'memory': '30Gi'},
compute=True,
verbose=False,
samples=None,
retry_count=1,
max_workers=100,
config=None,
)

Split individual sample VCFs from an aggregate VCF.
Given an aggregate VCF file containing multiple samples, split all samples into isolated VCFs, one per sample. Alternatively, pass `samples` to split out only a subset when not every isolated VCF is needed.
Parameters

| Name | Description |
|---|---|
| `vcf_uri` | Aggregate VCF URI. |
| `output_uri` | Output URI to write the isolated VCFs to. |
| `workspace` | TileDB Cloud workspace in which to process the task graph. |
| `acn` | Access credential friendly name used to authenticate storage I/O. |
| `resources` | Resources applied to the splitting UDF. Start with the default, `{'cpu': '2', 'memory': '30Gi'}`. |
| `compute` | Whether to execute the DAG. |
| `verbose` | Whether to enable verbose logging. |
| `samples` | Names of the samples within `vcf_uri` to isolate. By default (`None`), all samples are isolated. |
| `retry_count` | Number of retries per DAG node. |
| `max_workers` | Maximum number of workers to run simultaneously. |
| `config` | TileDB configuration parameters used to configure the virtual filesystem handler. |

Returns: the DAG, instantiated as specified.
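The fan-out that `split_vcf` describes (one split task per selected sample, bounded concurrency, per-task retries) can be sketched locally with `concurrent.futures`. This is only an analogy under stated assumptions: the real function builds a TileDB Cloud task graph, whereas here `plan_outputs`, `run_split`, the `split_one` callback, and the `{output_uri}/{sample}.vcf` naming scheme are all hypothetical. It also assumes `retry_count` means retries after the first attempt.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def plan_outputs(all_samples: Iterable[str], output_uri: str,
                 samples: Optional[Iterable[str]] = None) -> Dict[str, str]:
    """Map each sample to split onto an output path (hypothetical naming)."""
    available = list(all_samples)
    # Mirror the `samples` parameter: None means "split every sample".
    wanted = available if samples is None else list(samples)
    missing = [s for s in wanted if s not in available]
    if missing:
        raise KeyError(f"samples not in VCF: {missing}")
    base = output_uri.rstrip("/")
    return {s: f"{base}/{s}.vcf" for s in wanted}


def run_split(plan: Dict[str, str], split_one: Callable[[str, str], str],
              retry_count: int = 1, max_workers: int = 100) -> Dict[str, str]:
    """Run one split task per sample, at most `max_workers` at a time."""

    def attempt(sample: str) -> str:
        # One initial try plus up to `retry_count` retries (assumed semantics).
        for i in range(retry_count + 1):
            try:
                return split_one(sample, plan[sample])
            except Exception:
                if i == retry_count:
                    raise
        raise ValueError("retry_count must be >= 0")

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with plan's keys.
        return dict(zip(plan, pool.map(attempt, plan)))
```

Bounding concurrency with `max_workers` keeps a VCF with many samples from launching every task at once, and retrying per task means one transient failure does not restart the whole split.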