files.ingestion
cloud.files.ingestion
Attributes
Name | Description |
---|---|
ingest_and_index | Ingest files into a dataset and index them afterwards. |
Functions
Name | Description |
---|---|
add_arrays_to_group_udf | Add a list of TileDB array URIs to a TileDB group. |
ingest_files | Ingest files into a dataset. |
ingest_files_udf | Ingest files. |
add_arrays_to_group_udf
cloud.files.ingestion.add_arrays_to_group_udf(
    array_uris,
    group_uri,
    *,
    config=None,
    verbose=False,
)
Add a list of TileDB array URIs to a TileDB group.
Parameters
Name | Type | Description | Default |
---|---|---|---|
array_uris | Sequence[str] | An iterable of TileDB URIs. | required |
group_uri | str | A TileDB Group URI. | required |
config | Optional[dict] | Config dictionary, defaults to None | None |
verbose | bool | Verbose logging, defaults to False | False |
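The `*` in the signature marks `config` and `verbose` as keyword-only. The sketch below illustrates that calling convention with a hypothetical stand-in, `add_arrays_to_group_stub`, which mirrors the documented signature but only records its arguments, so it runs without a TileDB Cloud login. The URIs are placeholders.

```python
from typing import Optional, Sequence


def add_arrays_to_group_stub(
    array_uris: Sequence[str],
    group_uri: str,
    *,
    config: Optional[dict] = None,
    verbose: bool = False,
) -> dict:
    """Hypothetical stand-in with the documented signature; it only records the call."""
    return {
        "array_uris": list(array_uris),
        "group_uri": group_uri,
        "config": config,
        "verbose": verbose,
    }


# Placeholder URIs, not real assets.
arrays = ["tiledb://my-namespace/array-1", "tiledb://my-namespace/array-2"]
group = "tiledb://my-namespace/my-group"

# Options after the `*` are passed by name.
call = add_arrays_to_group_stub(arrays, group, verbose=True)

# Passing `config` as a third positional argument raises TypeError.
try:
    add_arrays_to_group_stub(arrays, group, {"some.key": "value"})
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

With the real function, the same two positional arguments and keyword options would apply.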
ingest_files
cloud.files.ingestion.ingest_files(
    dataset_uri,
    *,
    search_uri=None,
    pattern=None,
    ignore=None,
    max_files=None,
    batch_size=file_udfs.DEFAULT_BATCH_SIZE,
    acn=None,
    config=None,
    namespace=None,
    group_uri=None,
    taskgraph_name=DEFAULT_FILE_INGESTION_NAME,
    ingest_resources=dag.MIN_BATCH_RESOURCES,
    verbose=False,
)
Ingest files into a dataset.
Parameters
Name | Type | Description | Default |
---|---|---|---|
dataset_uri | str | The dataset URI | required |
search_uri | Optional[Union[Sequence[str], str]] | URI or an iterable of URIs of input files. Defaults to None. | None |
pattern | Optional[str] | UNIX shell style pattern to filter files in the search, defaults to None | None |
ignore | Optional[str] | UNIX shell style pattern to filter files out of the search, defaults to None | None |
max_files | Optional[int] | Maximum number of file URIs to read or find, defaults to None (no limit) | None |
batch_size | Optional[int] | Batch size for file ingestion, defaults to 100. | file_udfs.DEFAULT_BATCH_SIZE |
acn | Optional[str] | Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None | None |
config | Optional[dict] | Config dictionary, defaults to None | None |
namespace | Optional[str] | TileDB-Cloud namespace, defaults to None | None |
group_uri | Optional[str] | A TileDB Group URI, defaults to None. | None |
taskgraph_name | Optional[str] | Optional name for taskgraph, defaults to “file-ingestion”. | DEFAULT_FILE_INGESTION_NAME |
ingest_resources | Optional[Mapping[str, Any]] | Configuration for node specs, defaults to {“cpu”: “1”, “memory”: “2Gi”} | dag.MIN_BATCH_RESOURCES |
verbose | bool | Verbose logging, defaults to False | False |
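The interaction of `pattern`, `ignore`, `max_files`, and `batch_size` can be pictured with a small stand-alone sketch. This is not the library's implementation. It assumes shell-style matching against the full URI (via `fnmatch.fnmatchcase`), that `ignore` is applied after `pattern`, and that `max_files` caps the list before batching. The helper names `select_files` and `make_batches` and the sample URIs are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Sequence


def select_files(
    uris: Sequence[str],
    pattern: Optional[str] = None,
    ignore: Optional[str] = None,
    max_files: Optional[int] = None,
) -> List[str]:
    """Keep URIs matching `pattern`, drop those matching `ignore`, cap at `max_files`."""
    selected = []
    for uri in uris:
        if pattern is not None and not fnmatchcase(uri, pattern):
            continue  # does not match the inclusion pattern
        if ignore is not None and fnmatchcase(uri, ignore):
            continue  # matches the exclusion pattern
        selected.append(uri)
    return selected if max_files is None else selected[:max_files]


def make_batches(uris: Sequence[str], batch_size: int = 100) -> List[List[str]]:
    """Split URIs into consecutive batches of at most `batch_size` items."""
    return [list(uris[i : i + batch_size]) for i in range(0, len(uris), batch_size)]


found = [
    "s3://bucket/docs/a.pdf",
    "s3://bucket/docs/b.pdf",
    "s3://bucket/docs/draft_c.pdf",
    "s3://bucket/docs/notes.txt",
]
selected = select_files(found, pattern="*.pdf", ignore="*draft*")
batches = make_batches(selected, batch_size=1)
```

Here `selected` holds the two non-draft PDFs, and `batches` places each in its own batch because `batch_size=1`.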
Returns
Type | Description |
---|---|
str | The resulting TaskGraph’s server UUID. |
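A minimal sketch of a call follows. It assumes the module documented here as `cloud.files.ingestion` is importable as `tiledb.cloud.files.ingestion` and that the session is already logged in to TileDB Cloud. The helpers `ingestion_kwargs` and `submit_ingestion`, and every URI and name in the example, are hypothetical. The import is deferred into the function so the argument-building part runs without the package installed.

```python
from typing import Any, Dict, Optional


def ingestion_kwargs(
    search_uri: str,
    *,
    pattern: Optional[str] = None,
    namespace: Optional[str] = None,
    acn: Optional[str] = None,
    batch_size: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments, omitting unset ones so the documented defaults apply."""
    candidates = {
        "search_uri": search_uri,
        "pattern": pattern,
        "namespace": namespace,
        "acn": acn,
        "batch_size": batch_size,
    }
    return {key: value for key, value in candidates.items() if value is not None}


def submit_ingestion(dataset_uri: str, **kwargs: Any) -> str:
    """Submit the ingestion and return the task graph's server UUID."""
    # Assumption: the documented module lives at tiledb.cloud.files.ingestion.
    from tiledb.cloud.files import ingestion

    return ingestion.ingest_files(dataset_uri, **kwargs)


kwargs = ingestion_kwargs(
    "s3://my-bucket/reports/",
    pattern="*.pdf",
    namespace="my-namespace",
    acn="my-access-credentials",
)
# With TileDB Cloud installed and configured, the call would be:
# task_graph_uuid = submit_ingestion("tiledb://my-namespace/reports", **kwargs)
```

Everything after `dataset_uri` is keyword-only in the signature, which is why the options are passed as a dictionary of keyword arguments.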
ingest_files_udf
cloud.files.ingestion.ingest_files_udf(
    dataset_uri,
    file_uris,
    *,
    acn=None,
    namespace=None,
    verbose=False,
)
Ingest files.
Parameters
Name | Type | Description | Default |
---|---|---|---|
dataset_uri | str | The dataset URI. | required |
file_uris | Sequence[str] | An iterable of file URIs. | required |
acn | Optional[str] | Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None | None |
namespace | Optional[str] | TileDB-Cloud namespace, defaults to None. | None |
verbose | bool | Verbose logging, defaults to False. | False |
Returns
Type | Description |
---|---|
List[str] | A list of the ingested files’ resulting URIs. |
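Because the function takes a sequence of file URIs and returns a list of result URIs, it can be applied batch by batch and the results concatenated. The sketch below shows that pattern sequentially and in-process. It is an illustration under assumptions, not how `ingest_files` schedules its work. `ingest_in_batches` is a hypothetical helper, and `fake_ingest_files_udf` is a stand-in with the documented signature whose returned URI format is made up for the example.

```python
from typing import Callable, List, Sequence


def ingest_in_batches(
    ingest_udf: Callable[..., List[str]],
    dataset_uri: str,
    file_uris: Sequence[str],
    batch_size: int,
    **udf_kwargs,
) -> List[str]:
    """Call `ingest_udf` once per batch and concatenate the returned URIs."""
    ingested: List[str] = []
    for start in range(0, len(file_uris), batch_size):
        batch = list(file_uris[start : start + batch_size])
        ingested.extend(ingest_udf(dataset_uri, batch, **udf_kwargs))
    return ingested


def fake_ingest_files_udf(dataset_uri, file_uris, *, acn=None, namespace=None, verbose=False):
    """Stand-in for ingest_files_udf; the result URI format here is invented."""
    return [f"{dataset_uri}/{uri.rsplit('/', 1)[-1]}" for uri in file_uris]


result = ingest_in_batches(
    fake_ingest_files_udf,
    "tiledb://my-namespace/my-dataset",
    ["s3://bucket/a.pdf", "s3://bucket/b.pdf", "s3://bucket/c.pdf"],
    batch_size=2,
    namespace="my-namespace",
)
```

Swapping the stand-in for the real `ingest_files_udf` would keep the same call shape, since `acn`, `namespace`, and `verbose` are forwarded as keyword arguments.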