files.ingestion

cloud.files.ingestion

Attributes

Name Description
ingest_and_index Ingest files into a dataset and index them afterwards.

Functions

Name Description
add_arrays_to_group_udf Add a list of TileDB array URIs to a TileDB group.
ingest_files Ingest files into a dataset.
ingest_files_udf Ingest a list of files into a dataset.

add_arrays_to_group_udf

cloud.files.ingestion.add_arrays_to_group_udf(
    array_uris,
    group_uri,
    *,
    config=None,
    verbose=False,
)

Add a list of TileDB array URIs to a TileDB group.

Parameters

Name Type Description Default
array_uris Sequence[str] A sequence of TileDB array URIs. required
group_uri str A TileDB group URI. required
config Optional[dict] Config dictionary. None
verbose bool Verbose logging. False
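A minimal usage sketch for the signature above. It assumes the module is importable as `tiledb.cloud.files.ingestion`, that you are already logged in to TileDB Cloud, and that the arrays and the group exist. The `cloud_uri` and `register_arrays` helpers and all namespace and asset names are hypothetical placeholders, not part of the library.

```python
from typing import Sequence


def cloud_uri(namespace: str, name: str) -> str:
    """Build a tiledb://<namespace>/<name> URI (hypothetical helper)."""
    return f"tiledb://{namespace}/{name}"


def register_arrays(array_uris: Sequence[str], group_uri: str) -> None:
    """Add existing arrays to an existing group (hypothetical wrapper)."""
    # Imported lazily so this sketch loads without the tiledb-cloud package.
    # Assumption: the module path is tiledb.cloud.files.ingestion.
    from tiledb.cloud.files import ingestion

    ingestion.add_arrays_to_group_udf(
        array_uris=list(array_uris),
        group_uri=group_uri,
        verbose=True,
    )


# Placeholder namespace and asset names; replace them with your own.
array_uris = [cloud_uri("my-namespace", n) for n in ("report-a", "report-b")]
group_uri = cloud_uri("my-namespace", "reports")

# Requires a TileDB Cloud login, so the call is left commented out:
# register_arrays(array_uris, group_uri)
```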

ingest_files

cloud.files.ingestion.ingest_files(
    dataset_uri,
    *,
    search_uri=None,
    pattern=None,
    ignore=None,
    max_files=None,
    batch_size=file_udfs.DEFAULT_BATCH_SIZE,
    acn=None,
    config=None,
    namespace=None,
    group_uri=None,
    taskgraph_name=DEFAULT_FILE_INGESTION_NAME,
    ingest_resources=dag.MIN_BATCH_RESOURCES,
    verbose=False,
)

Ingest files into a dataset.

Parameters

Name Type Description Default
dataset_uri str The dataset URI. required
search_uri Optional[Union[Sequence[str], str]] URI, or a sequence of URIs, of the input files. None
pattern Optional[str] UNIX shell-style pattern; only matching files are included in the search. None
ignore Optional[str] UNIX shell-style pattern; matching files are excluded from the search. None
max_files Optional[int] Maximum number of file URIs to read or find. None means no limit. None
batch_size Optional[int] Batch size for file ingestion, defaults to 100. file_udfs.DEFAULT_BATCH_SIZE
acn Optional[str] Access Credentials Name (ACN) registered in TileDB Cloud (ARN type). None
config Optional[dict] Config dictionary. None
namespace Optional[str] TileDB Cloud namespace. None
group_uri Optional[str] A TileDB group URI. None
taskgraph_name Optional[str] Optional name for the task graph, defaults to "file-ingestion". DEFAULT_FILE_INGESTION_NAME
ingest_resources Optional[Mapping[str, Any]] Configuration for node specs, defaults to {"cpu": "1", "memory": "2Gi"}. dag.MIN_BATCH_RESOURCES
verbose bool Verbose logging. False
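The `pattern`, `ignore`, `max_files` and `batch_size` parameters above can be pictured with a small local sketch. This is not the library's implementation: it uses Python's standard `fnmatch` module as a stand-in for "UNIX shell style" matching, and it assumes that patterns are applied to the file name rather than to the full URI. The `select_files` and `make_batches` helpers and the sample URIs are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterator, List, Optional, Sequence


def select_files(
    uris: Sequence[str],
    pattern: Optional[str] = None,
    ignore: Optional[str] = None,
    max_files: Optional[int] = None,
) -> List[str]:
    """Approximate the pattern / ignore / max_files filters locally."""
    selected: List[str] = []
    for uri in uris:
        name = uri.rsplit("/", 1)[-1]  # assumption: match on the file name only
        if pattern is not None and not fnmatchcase(name, pattern):
            continue  # not matched by the include pattern
        if ignore is not None and fnmatchcase(name, ignore):
            continue  # matched by the exclude pattern
        if max_files is not None and len(selected) >= max_files:
            break  # limit reached
        selected.append(uri)
    return selected


def make_batches(uris: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Split the selected URIs into chunks of at most batch_size items."""
    for start in range(0, len(uris), batch_size):
        yield list(uris[start:start + batch_size])


# Placeholder input URIs.
candidates = [
    "s3://my-bucket/docs/a.pdf",
    "s3://my-bucket/docs/b.pdf",
    "s3://my-bucket/docs/draft-c.pdf",
    "s3://my-bucket/docs/notes.txt",
    "s3://my-bucket/docs/d.pdf",
]
selected = select_files(candidates, pattern="*.pdf", ignore="draft-*")
batches = list(make_batches(selected, batch_size=2))
```

Under these assumptions, `selected` keeps the three non-draft PDF files and `batches` splits them into one batch of two files and one batch of one file.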

Returns

Type Description
str The resulting TaskGraph’s server UUID.
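A hedged sketch of submitting an ingestion with the keyword arguments documented above. It assumes the module path `tiledb.cloud.files.ingestion` and an active TileDB Cloud login. Every URI, credential name and namespace is a placeholder, and `submit_ingestion` is a hypothetical wrapper. The keyword set is checked against the documented signature before anything is submitted.

```python
from typing import Any, Dict

# Keyword-only parameters listed in the ingest_files signature.
DOCUMENTED_KEYWORDS = {
    "search_uri", "pattern", "ignore", "max_files", "batch_size", "acn",
    "config", "namespace", "group_uri", "taskgraph_name",
    "ingest_resources", "verbose",
}

# Placeholder values; replace them with your own bucket, credentials and namespace.
ingest_kwargs: Dict[str, Any] = {
    "search_uri": "s3://my-bucket/documents/",
    "pattern": "*.pdf",
    "ignore": "draft-*",
    "max_files": 500,
    "batch_size": 50,
    "acn": "my-credentials-name",
    "namespace": "my-namespace",
    "ingest_resources": {"cpu": "1", "memory": "2Gi"},
    "verbose": True,
}

# Catch misspelled keyword names before submitting anything.
unknown_keywords = set(ingest_kwargs) - DOCUMENTED_KEYWORDS


def submit_ingestion(dataset_uri: str, **kwargs: Any) -> str:
    """Submit the ingestion and return the task graph's server UUID."""
    # Imported lazily so this sketch loads without the tiledb-cloud package.
    # Assumption: the module path is tiledb.cloud.files.ingestion.
    from tiledb.cloud.files import ingestion

    return ingestion.ingest_files(dataset_uri, **kwargs)


# Requires a TileDB Cloud login, so the call is left commented out:
# task_graph_uuid = submit_ingestion("tiledb://my-namespace/documents", **ingest_kwargs)
```

The returned string identifies the task graph on the server, so it is worth storing if you want to look the run up later.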

ingest_files_udf

cloud.files.ingestion.ingest_files_udf(
    dataset_uri,
    file_uris,
    *,
    acn=None,
    namespace=None,
    verbose=False,
)

Ingest a list of files into a dataset.

Parameters

Name Type Description Default
dataset_uri str The dataset URI. required
file_uris Sequence[str] A sequence of file URIs. required
acn Optional[str] Access Credentials Name (ACN) registered in TileDB Cloud (ARN type). None
namespace Optional[str] TileDB Cloud namespace. None
verbose bool Verbose logging. False

Returns

Type Description
List[str] A list of the ingested files’ resulting URIs.
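A short sketch of calling `ingest_files_udf` directly on an explicit list of files, for example to process a single batch. It assumes the module path `tiledb.cloud.files.ingestion` and an active TileDB Cloud login. `ingest_batch` is a hypothetical wrapper, and all URIs and the namespace are placeholders.

```python
from typing import List, Optional, Sequence


def ingest_batch(
    dataset_uri: str,
    file_uris: Sequence[str],
    namespace: Optional[str] = None,
) -> List[str]:
    """Ingest one batch of files and return the resulting URIs."""
    # Imported lazily so this sketch loads without the tiledb-cloud package.
    # Assumption: the module path is tiledb.cloud.files.ingestion.
    from tiledb.cloud.files import ingestion

    return ingestion.ingest_files_udf(
        dataset_uri,
        list(file_uris),
        namespace=namespace,
        verbose=True,
    )


# Placeholder inputs.
file_uris = [
    "s3://my-bucket/docs/a.pdf",
    "s3://my-bucket/docs/b.pdf",
]

# Requires a TileDB Cloud login, so the call is left commented out:
# ingested_uris = ingest_batch(
#     "tiledb://my-namespace/documents", file_uris, namespace="my-namespace"
# )
# print(ingested_uris)
```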