files.ingestion
cloud.files.ingestion
Attributes
| Name | Description |
|---|---|
| ingest_and_index | Ingest files into a dataset and index them afterwards. |
Functions
| Name | Description |
|---|---|
| add_arrays_to_group_udf | Add a list of TileDB array URIs to a TileDB group. |
| ingest_files | Ingest files into a dataset. |
| ingest_files_udf | Ingest the given files into a dataset. |
add_arrays_to_group_udf
cloud.files.ingestion.add_arrays_to_group_udf(
array_uris,
group_uri,
*,
config=None,
verbose=False,
)
Add a list of TileDB array URIs to a TileDB group.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| array_uris | Sequence[str] | A sequence of TileDB array URIs. | required |
| group_uri | str | A TileDB Group URI. | required |
| config | Optional[dict] | Config dictionary, defaults to None | None |
| verbose | bool | Verbose logging, defaults to False | False |
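Conceptually, every URI in `array_uris` becomes a member of the group at `group_uri`. The sketch below illustrates that loop against a minimal group-like interface. `GroupLike`, `InMemoryGroup` and `add_arrays_to_group` are hypothetical stand-ins, not part of `cloud.files.ingestion`, and the URIs are illustrative. TileDB-Py's `tiledb.Group` handle has an `add` method of a similar shape, but this sketch does not call it.

```python
import logging
from typing import List, Protocol, Sequence

logger = logging.getLogger(__name__)


class GroupLike(Protocol):
    """Hypothetical minimal interface: anything that can register a member URI."""

    def add(self, uri: str) -> None: ...


class InMemoryGroup:
    """Hypothetical stand-in for a TileDB group opened for writing."""

    def __init__(self, uri: str) -> None:
        self.uri = uri
        self.members: List[str] = []

    def add(self, uri: str) -> None:
        self.members.append(uri)


def add_arrays_to_group(
    array_uris: Sequence[str], group: GroupLike, *, verbose: bool = False
) -> None:
    """Sketch: register each array URI as a member of the group, in order."""
    for uri in array_uris:
        if verbose:
            logger.info("Adding %s to group", uri)
        group.add(uri)


group = InMemoryGroup("tiledb://my-namespace/my-group")
add_arrays_to_group(["tiledb://my-namespace/a", "tiledb://my-namespace/b"], group)
print(group.members)
```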
ingest_files
cloud.files.ingestion.ingest_files(
dataset_uri,
*,
search_uri=None,
pattern=None,
ignore=None,
max_files=None,
batch_size=file_udfs.DEFAULT_BATCH_SIZE,
acn=None,
config=None,
namespace=None,
group_uri=None,
taskgraph_name=DEFAULT_FILE_INGESTION_NAME,
ingest_resources=dag.MIN_BATCH_RESOURCES,
verbose=False,
)
Ingest files into a dataset.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | The dataset URI | required |
| search_uri | Optional[Union[Sequence[str], str]] | URI, or sequence of URIs, of the input files to search, defaults to None | None |
| pattern | Optional[str] | UNIX shell-style pattern that selects files in the search, defaults to None | None |
| ignore | Optional[str] | UNIX shell-style pattern that excludes files from the search, defaults to None | None |
| max_files | Optional[int] | Maximum number of file URIs to read/find, defaults to None (no limit) | None |
| batch_size | Optional[int] | Batch size for file ingestion, defaults to 100. | file_udfs.DEFAULT_BATCH_SIZE |
| acn | Optional[str] | Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None | None |
| config | Optional[dict] | Config dictionary, defaults to None | None |
| namespace | Optional[str] | TileDB Cloud namespace, defaults to None | None |
| group_uri | Optional[str] | A TileDB Group URI, defaults to None. | None |
| taskgraph_name | Optional[str] | Optional name for the task graph, defaults to "file-ingestion". | DEFAULT_FILE_INGESTION_NAME |
| ingest_resources | Optional[Mapping[str, Any]] | Resource configuration (node specs) for the ingestion tasks, defaults to {"cpu": "1", "memory": "2Gi"} | dag.MIN_BATCH_RESOURCES |
| verbose | bool | Verbose logging, defaults to False | False |
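The file-selection parameters can be pictured as a filter over the URIs found under `search_uri`. The sketch below shows one plausible way `pattern`, `ignore` and `max_files` combine, using the standard library's `fnmatch`, which implements UNIX shell-style wildcards. `select_files` is hypothetical; the real function's search and matching (for example, whether patterns apply to the full URI or only the file name) may differ.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional


def select_files(
    candidates: Iterable[str],
    *,
    pattern: Optional[str] = None,
    ignore: Optional[str] = None,
    max_files: Optional[int] = None,
) -> List[str]:
    """Sketch: keep URIs matching `pattern`, drop those matching `ignore`,
    and stop once `max_files` URIs have been collected (None = no limit)."""
    selected: List[str] = []
    for uri in candidates:
        if max_files is not None and len(selected) >= max_files:
            break  # limit reached
        if pattern is not None and not fnmatch(uri, pattern):
            continue  # not selected by the include pattern
        if ignore is not None and fnmatch(uri, ignore):
            continue  # explicitly excluded
        selected.append(uri)
    return selected


found = [
    "s3://bucket/docs/report.pdf",
    "s3://bucket/docs/draft.pdf",
    "s3://bucket/docs/notes.txt",
    "s3://bucket/docs/summary.pdf",
]
print(select_files(found, pattern="*.pdf", ignore="*draft*", max_files=2))
```

Note that in `fnmatch`, `*` also matches `/`, so `"*.pdf"` matches the full URI here.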
Returns
| Name | Type | Description |
|---|---|---|
|  | str | The resulting TaskGraph’s server UUID. |
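`batch_size` controls how the selected files are split into units of work. Assuming each batch becomes one `ingest_files_udf` task in the task graph (an assumption; the reference above does not state it), the number of ingestion tasks is the file count divided by the batch size, rounded up. A minimal sketch; `make_batches` is hypothetical and the file URIs are illustrative:

```python
import math
from typing import List, Sequence


def make_batches(file_uris: Sequence[str], batch_size: int) -> List[List[str]]:
    """Sketch: split file URIs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [
        list(file_uris[i : i + batch_size])
        for i in range(0, len(file_uris), batch_size)
    ]


files = [f"s3://bucket/in/file_{n:03d}.pdf" for n in range(250)]
batches = make_batches(files, batch_size=100)  # 100 is the documented default
print(len(batches), [len(b) for b in batches])  # 3 [100, 100, 50]
print(math.ceil(len(files) / 100))  # 3
```

A larger `batch_size` means fewer, longer-running tasks; a smaller one spreads the work over more tasks, each needing the resources set by `ingest_resources`.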
ingest_files_udf
cloud.files.ingestion.ingest_files_udf(
dataset_uri,
file_uris,
*,
acn=None,
namespace=None,
verbose=False,
)
Ingest the given files into a dataset.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_uri | str | The dataset URI. | required |
| file_uris | Sequence[str] | A sequence of file URIs to ingest. | required |
| acn | Optional[str] | Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None | None |
| namespace | Optional[str] | TileDB Cloud namespace, defaults to None. | None |
| verbose | bool | Verbose logging, defaults to False. | False |
Returns
| Name | Type | Description |
|---|---|---|
|  | List[str] | A list of the ingested files’ resulting URIs. |
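Because `ingest_files_udf` returns a list of URIs and `add_arrays_to_group_udf` accepts a sequence of array URIs, the two compose naturally when `ingest_files` is given a `group_uri`. The sketch below shows that wiring under stated assumptions: `ingest_then_group`, `fake_ingest` and `fake_add_to_group` are hypothetical stand-ins that let it run without TileDB Cloud, and the output naming in `fake_ingest` is invented for illustration only.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical signatures standing in for ingest_files_udf and
# add_arrays_to_group_udf (keyword options omitted).
IngestFn = Callable[[str, Sequence[str]], List[str]]
GroupFn = Callable[[Sequence[str], str], None]


def ingest_then_group(
    dataset_uri: str,
    batches: Sequence[Sequence[str]],
    ingest: IngestFn,
    add_to_group: GroupFn,
    *,
    group_uri: Optional[str] = None,
) -> List[str]:
    """Sketch: ingest each batch, gather the resulting URIs, then group them."""
    ingested: List[str] = []
    for batch in batches:
        ingested.extend(ingest(dataset_uri, batch))
    # Only touch the group when one was requested and something was ingested.
    if group_uri is not None and ingested:
        add_to_group(ingested, group_uri)
    return ingested


groups: Dict[str, List[str]] = {}


def fake_ingest(dataset_uri: str, file_uris: Sequence[str]) -> List[str]:
    # Hypothetical naming: derive an output URI from each file's base name.
    return [f"{dataset_uri}/{uri.rsplit('/', 1)[-1]}" for uri in file_uris]


def fake_add_to_group(array_uris: Sequence[str], group_uri: str) -> None:
    groups.setdefault(group_uri, []).extend(array_uris)


result = ingest_then_group(
    "tiledb://ns/dataset",
    [["s3://b/a.pdf", "s3://b/b.pdf"], ["s3://b/c.pdf"]],
    fake_ingest,
    fake_add_to_group,
    group_uri="tiledb://ns/group",
)
print(result)
```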