files.ingestion

cloud.files.ingestion

Attributes

Name Description
ingest_and_index Ingest files into a dataset and index them afterwards.

Functions

Name Description
add_arrays_to_group_udf Add a list of TileDB array URIs to a TileDB group.
ingest_files Ingest files into a dataset.
ingest_files_udf Ingest files.

add_arrays_to_group_udf

cloud.files.ingestion.add_arrays_to_group_udf(
    array_uris,
    group_uri,
    *,
    config=None,
    verbose=False,
)

Add a list of TileDB array URIs to a TileDB group.

Parameters

Name Type Description Default
array_uris Sequence[str] An iterable of TileDB URIs. required
group_uri str A TileDB Group URI. required
config Optional[dict] Config dictionary, defaults to None None
verbose bool Verbose logging, defaults to False False
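
A minimal sketch of how this function might be called, assuming the module is importable as `tiledb.cloud.files.ingestion` (the page only shows the `cloud.files.ingestion` path). The `register_arrays` helper, its `add_fn` parameter, and the de-duplication step are hypothetical and not part of the library; `add_fn` exists only so the sketch can be exercised without a TileDB Cloud connection.

```python
from typing import Callable, List, Optional, Sequence


def register_arrays(
    array_uris: Sequence[str],
    group_uri: str,
    *,
    add_fn: Optional[Callable[..., object]] = None,
    verbose: bool = False,
) -> List[str]:
    """Add arrays to a group after dropping duplicate URIs (hypothetical helper)."""
    if add_fn is None:
        # Assumption: the documented module lives under the tiledb package.
        from tiledb.cloud.files.ingestion import add_arrays_to_group_udf as add_fn

    # dict.fromkeys keeps the first occurrence of each URI, in order.
    unique_uris = list(dict.fromkeys(array_uris))

    # config and verbose are keyword-only in the documented signature.
    add_fn(unique_uris, group_uri, config=None, verbose=verbose)
    return unique_uris
```

Calling `register_arrays(["tiledb://ns/a", "tiledb://ns/b"], "tiledb://ns/my-group")` would forward both URIs to `add_arrays_to_group_udf`; the URIs here are placeholders.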

ingest_files

cloud.files.ingestion.ingest_files(
    dataset_uri,
    *,
    search_uri=None,
    pattern=None,
    ignore=None,
    max_files=None,
    batch_size=file_udfs.DEFAULT_BATCH_SIZE,
    acn=None,
    config=None,
    namespace=None,
    group_uri=None,
    taskgraph_name=DEFAULT_FILE_INGESTION_NAME,
    ingest_resources=dag.MIN_BATCH_RESOURCES,
    verbose=False,
)

Ingest files into a dataset.

Parameters

Name Type Description Default
dataset_uri str The dataset URI required
search_uri Optional[Union[Sequence[str], str]] URI or an iterable of URIs of input files. Defaults to None. None
pattern Optional[str] UNIX shell style pattern to filter files in the search, defaults to None None
ignore Optional[str] UNIX shell style pattern to filter files out of the search, defaults to None None
max_files Optional[int] Maximum number of file URIs to read/find, defaults to None (no limit) None
batch_size Optional[int] Batch size for file ingestion, defaults to 100. file_udfs.DEFAULT_BATCH_SIZE
acn Optional[str] Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None None
config Optional[dict] Config dictionary, defaults to None None
namespace Optional[str] TileDB-Cloud namespace, defaults to None None
group_uri Optional[str] A TileDB Group URI, defaults to None. None
taskgraph_name Optional[str] Optional name for taskgraph, defaults to "file-ingestion". DEFAULT_FILE_INGESTION_NAME
ingest_resources Optional[Mapping[str, Any]] Configuration for node specs, defaults to {"cpu": "1", "memory": "2Gi"} dag.MIN_BATCH_RESOURCES
verbose bool Verbose logging, defaults to False False
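
The interaction of `pattern`, `ignore`, and `max_files` can be illustrated with a small local sketch. It assumes the patterns behave like Python's `fnmatch` applied to the full URI, which this page does not confirm; `select_files` is a hypothetical helper, not a library function, and the library's own file discovery may differ.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def select_files(
    uris: Iterable[str],
    pattern: Optional[str] = None,
    ignore: Optional[str] = None,
    max_files: Optional[int] = None,
) -> List[str]:
    """Filter URIs with shell-style patterns (illustrative only)."""
    selected: List[str] = []
    for uri in uris:
        # Stop once the limit is reached; None means no limit.
        if max_files is not None and len(selected) >= max_files:
            break
        # Keep only URIs that match the include pattern, if one is given.
        if pattern is not None and not fnmatchcase(uri, pattern):
            continue
        # Drop URIs that match the ignore pattern, if one is given.
        if ignore is not None and fnmatchcase(uri, ignore):
            continue
        selected.append(uri)
    return selected
```

Under these assumptions, `pattern="*.pdf"` combined with `ignore="*draft_*"` would keep every PDF whose URI does not contain `draft_`.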

Returns

Name Type Description
str The resulting TaskGraph’s server UUID.
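
A hedged sketch of a typical call, using only parameters listed above. It assumes the module is importable as `tiledb.cloud.files.ingestion`; the `start_ingestion` wrapper, its `ingest_fn` parameter, and the specific argument values are hypothetical and chosen for illustration.

```python
from typing import Callable, Optional


def start_ingestion(
    dataset_uri: str,
    search_uri: str,
    *,
    ingest_fn: Optional[Callable[..., str]] = None,
) -> str:
    """Launch a file ingestion and return the task graph's server UUID."""
    if ingest_fn is None:
        # Assumption: the documented module lives under the tiledb package.
        from tiledb.cloud.files.ingestion import ingest_files as ingest_fn

    # Everything after dataset_uri is keyword-only in the documented signature.
    return ingest_fn(
        dataset_uri,
        search_uri=search_uri,
        pattern="*.pdf",   # illustrative filter
        max_files=500,     # illustrative cap on discovered files
        batch_size=50,     # illustrative; the documented default is 100
        ingest_resources={"cpu": "1", "memory": "2Gi"},
        verbose=True,
    )
```

The returned string is the task graph's server UUID, as described in the Returns table.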

ingest_files_udf

cloud.files.ingestion.ingest_files_udf(
    dataset_uri,
    file_uris,
    *,
    acn=None,
    namespace=None,
    verbose=False,
)

Ingest files.

Parameters

Name Type Description Default
dataset_uri str The dataset URI. required
file_uris Sequence[str] An iterable of file URIs. required
acn Optional[str] Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None None
namespace Optional[str] TileDB-Cloud namespace, defaults to None. None
verbose bool Verbose logging, defaults to False. False

Returns

Name Type Description
List[str] A list of the ingested files’ resulting URIs.
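
Because `ingest_files_udf` takes a sequence of file URIs and `ingest_files` exposes a `batch_size`, one way to picture their relationship is splitting a file list into batches and passing each batch to the UDF. The sketch below is an assumption about that relationship, not the library's implementation: it runs the batches sequentially rather than in a task graph, and `batched`, `ingest_in_batches`, and the `udf` parameter are hypothetical names.

```python
from typing import Callable, Iterator, List, Optional, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def ingest_in_batches(
    dataset_uri: str,
    file_uris: Sequence[str],
    *,
    batch_size: int = 100,
    udf: Optional[Callable[..., List[str]]] = None,
) -> List[str]:
    """Call the ingestion UDF once per batch and collect the resulting URIs."""
    if udf is None:
        # Assumption: the documented module lives under the tiledb package.
        from tiledb.cloud.files.ingestion import ingest_files_udf as udf

    ingested: List[str] = []
    for batch in batched(file_uris, batch_size):
        # Each call returns the resulting URIs of the files it ingested.
        ingested.extend(udf(dataset_uri, batch))
    return ingested
```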