vcf.ingestion
cloud.vcf.ingestion
Classes
| Name | Description |
|---|---|
| Contigs | The contigs to ingest. |
Contigs
cloud.vcf.ingestion.Contigs()
The contigs to ingest.
| Name | Description |
|---|---|
| ALL | all contigs |
| CHROMOSOMES | all human chromosomes |
| OTHER | all contigs other than the human chromosomes |
| ALL_DISABLE_MERGE | all contigs with merging disabled, for non-human datasets |
Functions
| Name | Description |
|---|---|
| consolidate_dataset_udf | Consolidate arrays in the dataset. |
| create_dataset_udf | Create a TileDB-VCF dataset. |
| create_manifest | Create a manifest array in the dataset. |
| filter_samples_udf | Return URIs for samples not already in the dataset. |
| filter_uris_udf | Return URIs from sample_uris that are not in the manifest. |
| find_uris_aws_udf | Find URIs matching a pattern in the search_uri path with an efficient implementation for S3. |
| find_uris_udf | Find URIs matching a pattern in the search_uri path. |
| get_logger_wrapper | Get a logger instance and log version information. |
| ingest_manifest_dag | Create a DAG to load the manifest array. |
| ingest_manifest_udf | Ingest sample URIs into the manifest array. |
| ingest_samples_dag | Create a DAG to ingest samples into the dataset. |
| ingest_samples_udf | Ingest samples into the dataset. |
| ingest_vcf | Ingest samples into a dataset. |
| ingest_vcf_annotations | Ingest annotation VCF into a dataset. For example, a ClinVar or gnomAD VCF. |
| read_metadata_uris_udf | Read a list of URIs from a TileDB array. The URIs will be read from the attribute specified in the metadata_attr argument. |
| read_uris_udf | Read a list of URIs from a URI. |
| register_dataset_udf | Register the dataset on TileDB Cloud. |
consolidate_dataset_udf
cloud.vcf.ingestion.consolidate_dataset_udf(
dataset_uri,
*,
config=None,
exclude=MANIFEST_ARRAY,
include=None,
id='consolidate',
verbose=False,
)
Consolidate arrays in the dataset.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | dataset URI | required |
| config | Optional[Mapping[str, Any]] | config dictionary, defaults to None | None |
| exclude | Optional[Union[Sequence[str], str]] | group members to exclude, defaults to MANIFEST_ARRAY | MANIFEST_ARRAY |
| include | Optional[Union[Sequence[str], str]] | group members to include, defaults to None | None |
| id | str | profiler event id, defaults to “consolidate” | 'consolidate' |
| verbose | bool | verbose logging, defaults to False | False |
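
Both `include` and `exclude` accept either a single member name or a sequence of names. The sketch below shows one way such a selection could be resolved. `select_members`, `as_list`, and the member names are hypothetical, and applying `include` before `exclude` is an assumption rather than documented behavior.

```python
from typing import List, Optional, Sequence, Union

Names = Optional[Union[Sequence[str], str]]


def as_list(names: Names) -> Optional[List[str]]:
    """Normalize None, a single name, or a sequence of names to a list."""
    if names is None:
        return None
    if isinstance(names, str):
        return [names]
    return list(names)


def select_members(
    members: Sequence[str], include: Names = None, exclude: Names = None
) -> List[str]:
    """Hypothetical helper: keep included members, then drop excluded ones."""
    include_list = as_list(include)
    exclude_list = as_list(exclude) or []
    selected = []
    for name in members:
        if include_list is not None and name not in include_list:
            continue  # not requested
        if name in exclude_list:
            continue  # explicitly excluded
        selected.append(name)
    return selected


# Hypothetical member names, for illustration only.
members = ["data", "vcf_headers", "manifest"]
selected = select_members(members, exclude="manifest")
print(selected)
```

Passing a plain string or a one-element list gives the same result in this sketch, which mirrors the `Union[Sequence[str], str]` type shown in the table.
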
create_dataset_udf
cloud.vcf.ingestion.create_dataset_udf(
dataset_uri,
*,
config=None,
extra_attrs=None,
vcf_attrs=None,
anchor_gap=None,
compression_level=None,
annotation_dataset=False,
verbose=False,
)
Create a TileDB-VCF dataset.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | dataset URI | required |
| config | Optional[Mapping[str, Any]] | config dictionary, defaults to None | None |
| extra_attrs | Optional[Union[Sequence[str], str]] | INFO/FORMAT fields to materialize, defaults to None | None |
| vcf_attrs | Optional[str] | VCF with all INFO/FORMAT fields to materialize, defaults to None | None |
| anchor_gap | Optional[int] | anchor gap for VCF dataset, defaults to None | None |
| compression_level | Optional[int] | zstd compression level for the VCF dataset, defaults to None (uses the default level in TileDB-VCF) | None |
| annotation_dataset | bool | create an annotation dataset, defaults to False | False |
| verbose | bool | verbose logging, defaults to False | False |
Returns
| Name | Type | Description |
|---|---|---|
| | str | dataset URI |
create_manifest
cloud.vcf.ingestion.create_manifest(dataset_uri, group)
Create a manifest array in the dataset.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | dataset URI | required |
| group | tiledb.Group | dataset group | required |
filter_samples_udf
cloud.vcf.ingestion.filter_samples_udf(
dataset_uri,
*,
config=None,
verbose=False,
)
Return URIs for samples not already in the dataset.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | dataset URI | required |
| config | Optional[Mapping[str, Any]] | config dictionary, defaults to None | None |
| verbose | bool | verbose logging, defaults to False | False |
Returns
| Name | Type | Description |
|---|---|---|
| | Sequence[str] | sample URIs |
filter_uris_udf
cloud.vcf.ingestion.filter_uris_udf(
dataset_uri,
sample_uris,
*,
config=None,
verbose=False,
)
Return URIs from sample_uris that are not in the manifest.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | dataset URI | required |
| sample_uris | Sequence[str] | sample URIs | required |
| config | Optional[Mapping[str, Any]] | config dictionary, defaults to None | None |
| verbose | bool | verbose logging, defaults to False | False |
Returns
| Name | Type | Description |
|---|---|---|
| | Sequence[str] | filtered sample URIs |
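
Conceptually, this filter is a set difference that preserves the order of `sample_uris`. The sketch below illustrates that idea with plain Python lists. `filter_new_uris` and the example URIs are hypothetical, and the real UDF reads the manifest from the dataset rather than from a list.

```python
from typing import Iterable, List, Sequence


def filter_new_uris(
    sample_uris: Sequence[str], manifest_uris: Iterable[str]
) -> List[str]:
    """Return the sample URIs that do not appear in the manifest."""
    seen = set(manifest_uris)  # set lookup keeps the filter linear in size
    return [uri for uri in sample_uris if uri not in seen]


# Hypothetical URIs, for illustration only.
manifest_uris = ["s3://bucket/a.vcf.gz"]
sample_uris = ["s3://bucket/a.vcf.gz", "s3://bucket/b.vcf.gz"]
new_uris = filter_new_uris(sample_uris, manifest_uris)
print(new_uris)
```
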
find_uris_aws_udf
cloud.vcf.ingestion.find_uris_aws_udf(
dataset_uri,
search_uri,
*,
config=None,
include=None,
exclude=None,
max_files=None,
verbose=False,
)
Find URIs matching a pattern in the search_uri path with an efficient implementation for S3.
include and exclude patterns are Unix shell style (see fnmatch module).
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | dataset URI | required |
| search_uri | str | URI to search for VCF files | required |
| config | Optional[Mapping[str, Any]] | config dictionary, defaults to None | None |
| include | Optional[str] | include pattern used in the search, defaults to None | None |
| exclude | Optional[str] | exclude pattern applied to the search results, defaults to None | None |
| max_files | Optional[int] | maximum number of URIs returned, defaults to None | None |
| verbose | bool | verbose logging, defaults to False | False |
Returns
| Name | Type | Description |
|---|---|---|
| | Sequence[str] | list of URIs |
find_uris_udf
cloud.vcf.ingestion.find_uris_udf(
dataset_uri,
search_uri,
*,
config=None,
include=None,
exclude=None,
max_files=None,
verbose=False,
)
Find URIs matching a pattern in the search_uri path.
include and exclude patterns are Unix shell style (see fnmatch module).
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | dataset URI | required |
| search_uri | str | URI to search for VCF files | required |
| config | Optional[Mapping[str, Any]] | config dictionary, defaults to None | None |
| include | Optional[str] | include pattern used in the search, defaults to None | None |
| exclude | Optional[str] | exclude pattern applied to the search results, defaults to None | None |
| max_files | Optional[int] | maximum number of URIs returned, defaults to None | None |
| verbose | bool | verbose logging, defaults to False | False |
Returns
| Name | Type | Description |
|---|---|---|
| | Sequence[str] | list of URIs |
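
To see how Unix shell style patterns behave, the sketch below applies an include pattern, an exclude pattern, and a file limit to a list of paths using the standard `fnmatch` module. `filter_paths` and the example paths are hypothetical. This illustrates the pattern semantics only and is not the library's search code.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Sequence


def filter_paths(
    paths: Sequence[str],
    include: Optional[str] = None,
    exclude: Optional[str] = None,
    max_files: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper that mimics include/exclude/max_files filtering."""
    result = []
    for path in paths:
        if include is not None and not fnmatchcase(path, include):
            continue  # does not match the include pattern
        if exclude is not None and fnmatchcase(path, exclude):
            continue  # matches the exclude pattern
        result.append(path)
    if max_files is not None:
        result = result[:max_files]
    return result


# Hypothetical paths, for illustration only.
paths = [
    "s3://bucket/vcf/a.vcf.gz",
    "s3://bucket/vcf/a.vcf.gz.tbi",
    "s3://bucket/vcf/b.bcf",
    "s3://bucket/vcf/test/c.vcf.gz",
]
matches = filter_paths(paths, include="*.vcf.gz", exclude="*test*")
print(matches)
```

Note that `*` in `fnmatch` also matches `/`, so `*.vcf.gz` matches files at any depth, and that index files such as `a.vcf.gz.tbi` are skipped because the pattern must match the whole path.
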
get_logger_wrapper
cloud.vcf.ingestion.get_logger_wrapper(verbose=False)
Get a logger instance and log version information.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| verbose | bool | verbose logging, defaults to False | False |
Returns
| Name | Type | Description |
|---|---|---|
| | logging.Logger | logger instance |
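
A wrapper of this kind could look roughly like the sketch below, which uses only the standard `logging` module. `make_logger`, the logger name, the log format, and the choice to log the Python version are all assumptions. The versions that the real function reports are not listed on this page.

```python
import logging
import sys


def make_logger(verbose: bool = False, name: str = "vcf_ingest_example") -> logging.Logger:
    """Hypothetical stand-in: configure a logger and log version information."""
    logger = logging.getLogger(name)
    # Assumption: verbose maps to DEBUG, otherwise INFO.
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("[%(asctime)s] [%(levelname)s] %(message)s")
        )
        logger.addHandler(handler)
    logger.info("python version: %s", sys.version.split()[0])
    return logger


logger = make_logger(verbose=True)
logger.debug("this message is emitted only when verbose=True")
```
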
ingest_manifest_dag
cloud.vcf.ingestion.ingest_manifest_dag(
dataset_uri,
*,
acn=None,
config=None,
namespace=None,
search_uri=None,
pattern=None,
ignore=None,
sample_list_uri=None,
metadata_uri=None,
metadata_attr='uri',
max_files=None,
batch_size=MANIFEST_BATCH_SIZE,
workers=MANIFEST_WORKERS,
extra_attrs=None,
vcf_attrs=None,
anchor_gap=None,
compression_level=None,
verbose=False,
aws_find_mode=False,
disable_manifest=False,
consolidate_resources=CONSOLIDATE_RESOURCES,
manifest_resources=MANIFEST_RESOURCES,
create_resources=None,
read_vcf_uris_resources=None,
filter_uri_resources=None,
)
Create a DAG to load the manifest array.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | dataset URI | required |
| acn | Optional[str] | Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None | None |
| config | Optional[Mapping[str, Any]] | config dictionary, defaults to None | None |
| namespace | Optional[str] | TileDB-Cloud namespace, defaults to None | None |
| search_uri | Optional[str] | URI to search for VCF files, defaults to None | None |
| pattern | Optional[str] | pattern to match when searching for VCF files, defaults to None | None |
| ignore | Optional[str] | pattern to ignore when searching for VCF files, defaults to None | None |
| sample_list_uri | Optional[str] | URI with a list of VCF URIs, defaults to None | None |
| metadata_uri | Optional[str] | URI of metadata array holding VCF URIs, defaults to None | None |
| metadata_attr | str | name of metadata attribute containing URIs, defaults to “uri” | 'uri' |
| max_files | Optional[int] | maximum number of URIs to ingest, defaults to None | None |
| batch_size | int | manifest batch size, defaults to MANIFEST_BATCH_SIZE | MANIFEST_BATCH_SIZE |
| workers | int | maximum number of parallel workers, defaults to MANIFEST_WORKERS | MANIFEST_WORKERS |
| extra_attrs | Optional[Union[Sequence[str], str]] | INFO/FORMAT fields to materialize, defaults to None | None |
| vcf_attrs | Optional[str] | VCF with all INFO/FORMAT fields to materialize, defaults to None | None |
| anchor_gap | Optional[int] | anchor gap for VCF dataset, defaults to None | None |
| compression_level | Optional[int] | zstd compression level for the VCF dataset, defaults to None (uses the default level in TileDB-VCF) | None |
| verbose | bool | verbose logging, defaults to False | False |
| aws_find_mode | bool | use AWS CLI to find VCFs, defaults to False | False |
| disable_manifest | bool | disable manifest creation, defaults to False | False |
| consolidate_resources | Optional[Mapping[str, str]] | manual override for consolidate UDF resources, defaults to CONSOLIDATE_RESOURCES | CONSOLIDATE_RESOURCES |
| manifest_resources | Optional[Mapping[str, str]] | manual override for manifest UDF resources, defaults to MANIFEST_RESOURCES | MANIFEST_RESOURCES |
| create_resources | Optional[Mapping[str, str]] | manual override for create UDF resources, defaults to None | None |
| read_vcf_uris_resources | Optional[Mapping[str, str]] | manual override for read VCF UDF resources, defaults to None | None |
| filter_uri_resources | Optional[Mapping[str, str]] | manual override for filter VCF UDF resources, defaults to None | None |
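
The parameters offer three ways to supply VCF URIs: `search_uri`, `sample_list_uri`, and `metadata_uri`. The sketch below maps each to the UDF on this page that appears to serve it. `pick_uri_source` is hypothetical, and both the mapping and the "exactly one source" check are assumptions inferred from the parameter descriptions, not confirmed behavior.

```python
from typing import Optional


def pick_uri_source(
    search_uri: Optional[str] = None,
    sample_list_uri: Optional[str] = None,
    metadata_uri: Optional[str] = None,
    aws_find_mode: bool = False,
) -> str:
    """Hypothetical helper: name the UDF that would produce the URI list."""
    provided = [u for u in (search_uri, sample_list_uri, metadata_uri) if u]
    # Assumption: callers supply exactly one source of URIs.
    if len(provided) != 1:
        raise ValueError(
            "provide exactly one of search_uri, sample_list_uri, metadata_uri"
        )
    if search_uri:
        return "find_uris_aws_udf" if aws_find_mode else "find_uris_udf"
    if sample_list_uri:
        return "read_uris_udf"
    return "read_metadata_uris_udf"


# Hypothetical URI, for illustration only.
source = pick_uri_source(search_uri="s3://example-bucket/vcfs/", aws_find_mode=True)
print(source)
```
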
ingest_manifest_udf
cloud.vcf.ingestion.ingest_manifest_udf(
dataset_uri,
sample_uris,
*,
config=None,
id='manifest',
verbose=False,
)
Ingest sample URIs into the manifest array.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | dataset URI | required |
| sample_uris | Sequence[str] | sample URIs | required |
| config | Optional[Mapping[str, Any]] | config dictionary, defaults to None | None |
| id | str | profiler event id, defaults to “manifest” | 'manifest' |
| verbose | bool | verbose logging, defaults to False | False |
ingest_samples_dag
cloud.vcf.ingestion.ingest_samples_dag(
dataset_uri,
*,
acn=None,
config=None,
namespace=None,
contigs=Contigs.ALL,
threads=VCF_THREADS,
batch_size=VCF_BATCH_SIZE,
workers=VCF_WORKERS,
max_samples=None,
resume=True,
verbose=False,
create_index=True,
trace_id=None,
consolidate_stats=False,
use_remote_tmp=False,
sample_list_uri=None,
ingest_resources=None,
consolidate_resources=CONSOLIDATE_RESOURCES,
filter_samples_resources=FILTER_SAMPLES_RESOURCES,
group_fragments_resources=GROUP_FRAGMENTS_RESOURCES,
)
Create a DAG to ingest samples into the dataset.
Note: If sample_list_uri is provided, the manifest is not checked for existing samples.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | dataset URI | required |
| acn | Optional[str] | Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None | None |
| config | Optional[Mapping[str, Any]] | config dictionary, defaults to None | None |
| namespace | Optional[str] | TileDB-Cloud namespace, defaults to None | None |
| contigs | Optional[Union[Sequence[str], Contigs]] | contig mode (Contigs.ALL, Contigs.CHROMOSOMES, Contigs.OTHER, or Contigs.ALL_DISABLE_MERGE) or list of contigs to ingest, defaults to Contigs.ALL | Contigs.ALL |
| threads | int | number of threads to use per ingestion task, defaults to VCF_THREADS | VCF_THREADS |
| batch_size | int | sample batch size, defaults to VCF_BATCH_SIZE | VCF_BATCH_SIZE |
| workers | int | maximum number of parallel workers, defaults to VCF_WORKERS | VCF_WORKERS |
| max_samples | Optional[int] | maximum number of samples to ingest, defaults to None (no limit) | None |
| resume | bool | enable resume ingestion mode, defaults to True | True |
| verbose | bool | verbose logging, defaults to False | False |
| create_index | bool | force creation of a local index file, defaults to True | True |
| trace_id | Optional[str] | trace ID for logging, defaults to None | None |
| consolidate_stats | bool | consolidate the stats arrays, defaults to False | False |
| use_remote_tmp | bool | use remote tmp space if VCFs need to be bgzipped, defaults to False (preferred for small VCFs) | False |
| sample_list_uri | Optional[str] | URI with a list of VCF URIs, defaults to None | None |
| ingest_resources | Optional[Mapping[str, str]] | manual override for ingest UDF resources, defaults to None | None |
| consolidate_resources | Optional[Mapping[str, str]] | manual override for consolidate UDF resources, defaults to CONSOLIDATE_RESOURCES | CONSOLIDATE_RESOURCES |
| filter_samples_resources | Optional[Mapping[str, str]] | manual override for filter samples UDF resources, defaults to FILTER_SAMPLES_RESOURCES | FILTER_SAMPLES_RESOURCES |
| group_fragments_resources | Optional[Mapping[str, str]] | resources for the group_fragments node, defaults to GROUP_FRAGMENTS_RESOURCES | GROUP_FRAGMENTS_RESOURCES |
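
One plausible reading of `max_samples` and `batch_size` is that the sample URIs are first truncated and then split into fixed-size groups, one group per ingestion task. The sketch below shows that arithmetic. `make_batches` and the URIs are hypothetical, and how the DAG actually schedules batches across `workers` is not described on this page.

```python
from typing import List, Optional, Sequence


def make_batches(
    sample_uris: Sequence[str],
    batch_size: int,
    max_samples: Optional[int] = None,
) -> List[List[str]]:
    """Hypothetical helper: split sample URIs into batches of batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    uris = list(sample_uris)
    if max_samples is not None:
        uris = uris[:max_samples]  # None means no limit
    return [uris[i : i + batch_size] for i in range(0, len(uris), batch_size)]


# Hypothetical URIs, for illustration only.
sample_uris = [f"s3://bucket/sample{i}.vcf.gz" for i in range(5)]
batches = make_batches(sample_uris, batch_size=2)
print(len(batches), [len(b) for b in batches])
```

With five samples and a batch size of two, the last batch holds the single remaining sample.
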
ingest_samples_udf
cloud.vcf.ingestion.ingest_samples_udf(
dataset_uri,
sample_uris,
*,
config=None,
threads,
memory_mb,
sample_batch_size,
contig_mode='all',
contigs_to_keep_separate=None,
contig_fragment_merging=True,
resume=True,
create_index=True,
id='samples',
verbose=False,
trace_id=None,
use_remote_tmp=False,
)
Ingest samples into the dataset.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | dataset URI | required |
| sample_uris | Sequence[str] | sample URIs | required |
| threads | int | number of threads to use for ingestion | required |
| memory_mb | int | memory to use for ingestion in MiB | required |
| sample_batch_size | int | sample batch size to use for ingestion | required |
| config | Optional[Mapping[str, Any]] | config dictionary, defaults to None | None |
| contig_mode | str | ingestion mode, defaults to “all” | 'all' |
| contigs_to_keep_separate | Optional[Sequence[str]] | list of contigs to keep separate, defaults to None | None |
| contig_fragment_merging | bool | enable contig fragment merging, defaults to True | True |
| resume | bool | enable resume ingestion mode, defaults to True | True |
| create_index | bool | force creation of a local index file, defaults to True | True |
| id | str | profiler event id, defaults to “samples” | 'samples' |
| verbose | bool | verbose logging, defaults to False | False |
| trace_id | Optional[str] | trace ID for logging, defaults to None | None |
| use_remote_tmp | bool | use remote tmp space if VCFs need to be bgzipped, defaults to False (preferred for small VCFs) | False |
ingest_vcf
cloud.vcf.ingestion.ingest_vcf(
dataset_uri,
*,
acn=None,
config=None,
namespace=None,
register_name=None,
search_uri=None,
pattern=None,
ignore=None,
sample_list_uri=None,
metadata_uri=None,
metadata_attr='uri',
max_files=None,
max_samples=None,
contigs=Contigs.ALL,
resume=True,
extra_attrs=DEFAULT_ATTRIBUTES,
vcf_attrs=None,
anchor_gap=None,
compression_level=None,
manifest_batch_size=MANIFEST_BATCH_SIZE,
manifest_workers=MANIFEST_WORKERS,
vcf_batch_size=VCF_BATCH_SIZE,
vcf_workers=VCF_WORKERS,
vcf_threads=VCF_THREADS,
verbose=False,
create_index=True,
trace_id=None,
consolidate_stats=True,
aws_find_mode=False,
use_remote_tmp=False,
disable_manifest=False,
ingest_resources=None,
consolidate_resources=CONSOLIDATE_RESOURCES,
manifest_resources=MANIFEST_RESOURCES,
create_resources=None,
read_vcf_uris_resources=None,
filter_uri_resources=None,
filter_samples_resources=FILTER_SAMPLES_RESOURCES,
group_fragments_resources=GROUP_FRAGMENTS_RESOURCES,
)
Ingest samples into a dataset.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | dataset URI | required |
| acn | Optional[str] | Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None | None |
| config | Optional[Mapping[str, Any]] | config dictionary, defaults to None | None |
| namespace | Optional[str] | TileDB-Cloud namespace, defaults to None | None |
| register_name | Optional[str] | name to register the dataset with on TileDB Cloud, defaults to None | None |
| search_uri | Optional[str] | URI to search for VCF files, defaults to None | None |
| pattern | Optional[str] | Unix shell style pattern to match when searching for VCF files, defaults to None | None |
| ignore | Optional[str] | Unix shell style pattern to ignore when searching for VCF files, defaults to None | None |
| sample_list_uri | Optional[str] | URI with a list of VCF URIs, defaults to None | None |
| metadata_uri | Optional[str] | URI of metadata array holding VCF URIs, defaults to None | None |
| metadata_attr | str | name of metadata attribute containing URIs, defaults to “uri” | 'uri' |
| max_files | Optional[int] | maximum number of VCF URIs to read/find, defaults to None (no limit) | None |
| max_samples | Optional[int] | maximum number of samples to ingest, defaults to None (no limit) | None |
| contigs | Optional[Union[Sequence[str], Contigs]] | contig mode (Contigs.ALL, Contigs.CHROMOSOMES, Contigs.OTHER, or Contigs.ALL_DISABLE_MERGE) or list of contigs to ingest, defaults to Contigs.ALL | Contigs.ALL |
| resume | bool | enable resume ingestion mode, defaults to True | True |
| extra_attrs | Optional[Union[Sequence[str], str]] | INFO/FORMAT fields to materialize, defaults to DEFAULT_ATTRIBUTES | DEFAULT_ATTRIBUTES |
| vcf_attrs | Optional[str] | VCF with all INFO/FORMAT fields to materialize, defaults to None | None |
| anchor_gap | Optional[int] | anchor gap for VCF dataset, defaults to None | None |
| compression_level | Optional[int] | zstd compression level for the VCF dataset, defaults to None (uses the default level in TileDB-VCF) | None |
| manifest_batch_size | int | batch size for manifest ingestion, defaults to MANIFEST_BATCH_SIZE | MANIFEST_BATCH_SIZE |
| manifest_workers | int | number of workers for manifest ingestion, defaults to MANIFEST_WORKERS | MANIFEST_WORKERS |
| vcf_batch_size | int | batch size for VCF ingestion, defaults to VCF_BATCH_SIZE | VCF_BATCH_SIZE |
| vcf_workers | int | number of workers for VCF ingestion, defaults to VCF_WORKERS | VCF_WORKERS |
| vcf_threads | int | number of threads for VCF ingestion, defaults to VCF_THREADS | VCF_THREADS |
| verbose | bool | verbose logging, defaults to False | False |
| create_index | bool | force creation of a local index file, defaults to True | True |
| trace_id | Optional[str] | trace ID for logging, defaults to None | None |
| consolidate_stats | bool | consolidate the stats arrays, defaults to True | True |
| aws_find_mode | bool | use AWS CLI to find VCFs, defaults to False | False |
| use_remote_tmp | bool | use remote tmp space if VCFs need to be sorted and bgzipped, defaults to False (preferred for small VCFs) | False |
| disable_manifest | bool | disable manifest creation, defaults to False | False |
| ingest_resources | Optional[Mapping[str, str]] | manual override for ingest UDF resources, defaults to None | None |
| consolidate_resources | Optional[Mapping[str, str]] | manual override for consolidate UDF resources, defaults to CONSOLIDATE_RESOURCES | CONSOLIDATE_RESOURCES |
| manifest_resources | Optional[Mapping[str, str]] | manual override for manifest UDF resources, defaults to MANIFEST_RESOURCES | MANIFEST_RESOURCES |
| create_resources | Optional[Mapping[str, str]] | manual override for create UDF resources, defaults to None | None |
| read_vcf_uris_resources | Optional[Mapping[str, str]] | manual override for read VCF UDF resources, defaults to None | None |
| filter_uri_resources | Optional[Mapping[str, str]] | manual override for filter VCF UDF resources, defaults to None | None |
| filter_samples_resources | Optional[Mapping[str, str]] | manual override for filter samples UDF resources, defaults to FILTER_SAMPLES_RESOURCES | FILTER_SAMPLES_RESOURCES |
| group_fragments_resources | Optional[Mapping[str, str]] | resources for the group_fragments node, defaults to GROUP_FRAGMENTS_RESOURCES | GROUP_FRAGMENTS_RESOURCES |
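
A typical call might look like the sketch below, which searches a bucket for VCF files and ingests a limited number of them. The import path `tiledb.cloud.vcf.ingestion` is an assumption based on the module name shown on this page, and the bucket, dataset URI, and helper names are hypothetical. The call itself is left commented out because it requires TileDB Cloud credentials.

```python
from typing import Any, Dict, Optional


def build_ingest_kwargs(
    search_uri: str,
    pattern: str = "*.vcf.gz",
    max_files: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for ingest_vcf."""
    return {
        "search_uri": search_uri,  # where to look for VCF files
        "pattern": pattern,        # Unix shell style pattern to match
        "max_files": max_files,    # None means no limit
        "resume": True,            # documented default, shown for clarity
        "verbose": True,
    }


def run_ingest(dataset_uri: str, **kwargs: Any) -> Any:
    """Call ingest_vcf; imported lazily so the sketch loads without the package."""
    # Assumption: the module on this page is importable under this path.
    from tiledb.cloud.vcf.ingestion import ingest_vcf

    return ingest_vcf(dataset_uri, **kwargs)


# Hypothetical URIs, for illustration only.
kwargs = build_ingest_kwargs("s3://example-bucket/vcfs/", max_files=10)
# run_ingest("s3://example-bucket/example-dataset", **kwargs)
print(sorted(kwargs))
```

Everything after `dataset_uri` is keyword-only in the signature above, which is why the sketch passes the options as a dictionary of keyword arguments.
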
ingest_vcf_annotations
cloud.vcf.ingestion.ingest_vcf_annotations(
dataset_uri,
*,
vcf_uri=None,
search_uri=None,
pattern=None,
ignore=None,
create_index=True,
config=None,
acn=None,
namespace=None,
register_name=None,
verbose=False,
ingest_resources=None,
consolidate_resources=CONSOLIDATE_RESOURCES,
find_uris_resources=None,
create_resources=None,
register_resources=None,
)
Ingest annotation VCF into a dataset. For example, a ClinVar or gnomAD VCF.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | dataset URI | required |
| vcf_uri | Optional[str] | VCF URI, defaults to None | None |
| search_uri | Optional[str] | URI to search for VCF files, defaults to None | None |
| pattern | Optional[str] | Unix shell style pattern to match when searching for VCF files, defaults to None | None |
| ignore | Optional[str] | Unix shell style pattern to ignore when searching for VCF files, defaults to None | None |
| create_index | bool | force creation of a local index file, defaults to True | True |
| config | Optional[Mapping[str, Any]] | config dictionary, defaults to None | None |
| acn | Optional[str] | Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None | None |
| namespace | Optional[str] | TileDB-Cloud namespace, defaults to None | None |
| register_name | Optional[str] | name to register the dataset with on TileDB Cloud, defaults to None | None |
| verbose | bool | verbose logging, defaults to False | False |
| ingest_resources | Optional[Mapping[str, str]] | manual override for ingest UDF resources, defaults to None | None |
| consolidate_resources | Optional[Mapping[str, str]] | manual override for consolidate UDF resources, defaults to CONSOLIDATE_RESOURCES | CONSOLIDATE_RESOURCES |
| find_uris_resources | Optional[Mapping[str, str]] | manual override for find VCF UDF resources, defaults to None | None |
| create_resources | Optional[Mapping[str, str]] | manual override for create UDF resources, defaults to None | None |
| register_resources | Optional[Mapping[str, str]] | manual override for register UDF resources, defaults to None | None |
read_metadata_uris_udf
cloud.vcf.ingestion.read_metadata_uris_udf(
dataset_uri,
*,
config=None,
metadata_uri,
metadata_attr='uri',
max_files=None,
verbose=False,
)
Read a list of URIs from a TileDB array. The URIs will be read from the attribute specified in the metadata_attr argument.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | dataset URI | required |
| config | Optional[Mapping[str, Any]] | TileDB config, defaults to None | None |
| metadata_uri | str | metadata array URI | required |
| metadata_attr | str | name of metadata attribute containing URIs, defaults to “uri” | 'uri' |
| max_files | Optional[int] | maximum number of URIs returned, defaults to None | None |
| verbose | bool | verbose logging, defaults to False | False |
Returns
| Name | Type | Description |
|---|---|---|
| | Sequence[str] | list of URIs |
read_uris_udf
cloud.vcf.ingestion.read_uris_udf(
dataset_uri,
list_uri,
*,
config=None,
max_files=None,
verbose=False,
)
Read a list of URIs from a URI.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | dataset URI | required |
| list_uri | str | URI of the list of URIs | required |
| config | Optional[Mapping[str, Any]] | config dictionary, defaults to None | None |
| max_files | Optional[int] | maximum number of URIs returned, defaults to None | None |
| verbose | bool | verbose logging, defaults to False | False |
Returns
| Name | Type | Description |
|---|---|---|
| | Sequence[str] | list of URIs |
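
The format of the list is not specified here. Assuming a plain text file with one URI per line, reading it and applying `max_files` could be sketched as follows. `read_uri_list` is hypothetical and reads from an in-memory stream instead of a remote URI.

```python
import io
from typing import List, Optional, TextIO


def read_uri_list(stream: TextIO, max_files: Optional[int] = None) -> List[str]:
    """Hypothetical helper: read one URI per line, skipping blank lines."""
    uris: List[str] = []
    for line in stream:
        if max_files is not None and len(uris) >= max_files:
            break  # limit reached
        uri = line.strip()
        if uri:
            uris.append(uri)
    return uris


# Hypothetical list contents, for illustration only.
listing = io.StringIO(
    "s3://bucket/a.vcf.gz\n\ns3://bucket/b.vcf.gz\ns3://bucket/c.vcf.gz\n"
)
uris = read_uri_list(listing, max_files=2)
print(uris)
```
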
register_dataset_udf
cloud.vcf.ingestion.register_dataset_udf(
dataset_uri,
*,
register_name,
acn,
namespace=None,
config=None,
verbose=False,
)
Register the dataset on TileDB Cloud.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | dataset URI | required |
| register_name | str | name to register the dataset with on TileDB Cloud | required |
| acn | str | Access Credentials Name (ACN) registered in TileDB Cloud (ARN type) | required |
| namespace | Optional[str] | TileDB Cloud namespace, defaults to the user’s default namespace | None |
| config | Optional[Mapping[str, Any]] | config dictionary, defaults to None | None |
| verbose | bool | verbose logging, defaults to False | False |