array
client.array
Register, search, and manage arrays with TileDB.
Classes
| Name | Description |
|---|---|
| ArrayList | An incrementally built list of UDFArrayDetails. |
ArrayList
client.array.ArrayList()
An incrementally built list of UDFArrayDetails.
For use in multi-array UDFs.
Methods
| Name | Description |
|---|---|
| add | Add an array, with indexes and layout options. |
| get | Returns the list of UDFArrayDetails. |
add
client.array.ArrayList.add(uri, ranges=None, buffers=None, layout=None)
Add an array, with indexes and layout options.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| uri | str | The fully qualified TileDB URI of the array. | required |
| ranges | list of pairs of int | A list of tuples, one tuple for each dimension of the array. The tuples specify the start and stop range in each dimension, as with a slice. | None |
| buffers | str or list of str | The name of an array attribute or list of names. | None |
| layout | str | The layout for the array. | None |
Examples
>>> array_list = ArrayList()
>>> array_list.add(
...     "tiledb://workspace/teamspace/array",
...     ranges=[(0,5), (0,5)],
...     buffers=["a", "b", "c"],
... )
get
client.array.ArrayList.get()
Returns the list of UDFArrayDetails.
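The incremental-builder pattern behind ArrayList can be pictured with a small stand-in. The sketch below is not the TileDB implementation: `SketchArrayList`, the dict keys it stores, and the second URI are hypothetical and only mirror the documented `add`/`get` signatures, so the block runs without a TileDB connection.

```python
class SketchArrayList:
    """Hypothetical stand-in mirroring the documented add()/get() shape."""

    def __init__(self):
        self._entries = []

    def add(self, uri, ranges=None, buffers=None, layout=None):
        # Accept a single attribute name or a list of names, as documented.
        if isinstance(buffers, str):
            buffers = [buffers]
        self._entries.append(
            {"uri": uri, "ranges": ranges, "buffers": buffers, "layout": layout}
        )

    def get(self):
        # Return a copy so callers cannot mutate the builder's internal list.
        return list(self._entries)


array_list = SketchArrayList()
array_list.add("tiledb://workspace/teamspace/array", ranges=[(0, 5), (0, 5)], buffers="a")
array_list.add("tiledb://workspace/teamspace/other", buffers=["a", "b"])  # placeholder URI
entries = array_list.get()
```

Each call to `add` appends one entry, and `get` hands back the accumulated list in insertion order.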
Functions
| Name | Description |
|---|---|
| apply | Apply a user-defined function to an array, synchronously. |
| apply_async | Apply a user-defined function to an array, asynchronously. |
| apply_base | Apply a user-defined function to an array. |
| exec_multi_array_udf | Apply a user-defined function to multiple arrays, synchronously. |
| exec_multi_array_udf_async | Apply a user-defined function to multiple arrays, asynchronously. |
| exec_multi_array_udf_base | Apply a user-defined function to multiple arrays. |
| parse_ranges | Takes a list of per-dimension entries: a scalar index, a (start, end) tuple, or a list of either. |
apply
client.array.apply(*args, **kwargs)
Apply a user-defined function to an array, synchronously.
All arguments are exactly as in apply_base, but this returns the data only.
Examples
>>> import tiledb, tiledb.client, numpy
>>> def median(df):
...     return numpy.median(df["a"])
...
>>> # Open the array then run the UDF
>>> tiledb.client.array.apply(
...     "tiledb://TileDB-Inc/quickstart_dense",
...     median,
...     ranges=[(0,5), (0,5)],
...     attrs=["a", "b", "c"],
... )
2.0
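Because the UDF is an ordinary Python callable, it can be exercised locally before it is submitted. The sketch below assumes the function receives a mapping from attribute name to values, as the `df["a"]` access in the example suggests. The sample data is made up, and `statistics.median` stands in for `numpy.median` so that the block needs only the standard library.

```python
import statistics


def median(df):
    # Same shape as the UDF in the example: read attribute "a" and reduce it.
    return statistics.median(df["a"])


# Hypothetical local data standing in for the slice the server would pass in.
sample = {"a": [1, 2, 3], "b": [10, 20, 30]}
local_result = median(sample)
print(local_result)
```

A quick local run like this catches typos in attribute names before any remote task is launched.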
apply_async
client.array.apply_async(*args, **kwargs)
Apply a user-defined function to an array, asynchronously.
All arguments are exactly as in apply_base, but this returns the data as a future-like AsyncResponse.
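The difference between a call that returns data and one that returns a future-like object can be illustrated with the standard library. This is only an analogy: `concurrent.futures.Future` stands in for AsyncResponse, whose exact methods are not described in this section, and `run_udf` is a hypothetical local function, not a TileDB call.

```python
from concurrent.futures import ThreadPoolExecutor
import statistics


def run_udf(values):
    # Hypothetical stand-in for a remotely executed UDF.
    return statistics.median(values)


with ThreadPoolExecutor(max_workers=1) as pool:
    # Asynchronous style: submit returns a future right away.
    future = pool.submit(run_udf, [1, 2, 3])
    # Other work could happen here while the task runs.
    async_value = future.result()  # blocks until the value is ready

# Synchronous style: the call itself returns the data.
sync_value = run_udf([1, 2, 3])
```

Both paths produce the same value; the asynchronous one simply defers the wait until the result is requested.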
apply_base
client.array.apply_base(
array,
func,
*,
teamspace=None,
ranges=(),
attrs=(),
layout=None,
image_name='default',
http_compressor='deflate',
include_source_lines=True,
task_name=None,
result_format=models.ResultFormat.NATIVE,
store_results=False,
stored_param_uuids=(),
timeout=None,
resource_class=None,
_download_results=True,
_server_graph_uuid=None,
_client_node_uuid=None,
**kwargs,
)
Apply a user-defined function to an array.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| array | str or object | The array asset to run the function on, identified by path, object, or “tiledb” URI. The array teamspace may be different from the UDF teamspace. | required |
| func | callable or Asset-like | The function to run. This can be either a callable function, or a registered function asset identified by path, object, or “tiledb” URI. | required |
| teamspace | TeamspaceLike | The teamspace of the UDF. If the func parameter specifies a teamspace, this parameter may be omitted. | None |
| ranges | list | Ranges to issue query on. | () |
| attrs | list | List of attributes or dimensions to fetch in query. | () |
| layout | str | TileDB query layout. | None |
| image_name | str | UDF image name to use, useful for testing beta features. | 'default' |
| http_compressor | str | HTTP compressor to use for results. | 'deflate' |
| include_source_lines | bool | True to send the source code of your UDF to the server with your request. (This means it can be shown to you in stack traces if an error occurs.) False to send only compiled Python bytecode. | True |
| task_name | str | Name to assign the task for logging and audit purposes. | None |
| result_format | ResultFormat | Result serialization format. | models.ResultFormat.NATIVE |
| store_results | bool | True to temporarily store results on the server side for later retrieval (in addition to downloading them). | False |
| stored_param_uuids | list | A list of UUIDs. | () |
| timeout | int | Timeout for UDF in seconds. | None |
| resource_class | str | The name of the resource class to use. Resource classes define maximum limits for CPU and memory usage. | None |
| _download_results | bool | True to download and parse results eagerly. False to not download results by default and only do so lazily (e.g. for an intermediate node in a graph). | True |
| _server_graph_uuid | str | If this function is being executed within a DAG, the server-generated ID of the graph’s log. Otherwise, None. | None |
| _client_node_uuid | str | If this function is being executed within a DAG, the ID of this function’s node within the graph. Otherwise, None. | None |
| kwargs | dict | Named arguments to pass to the function. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| | results.RemoteResult | A future containing the results of the UDF. |
Examples
>>> import numpy
>>> import tiledb.client
>>> def median(df):
...     return numpy.median(df["a"])
...
>>> tiledb.client.array.apply_base(
...     "folder/array",
...     median,
...     teamspace="teamspace",
...     ranges=[(0,5), (0,5)],
...     attrs=["a", "b", "c"]
... ).result
2.0
exec_multi_array_udf
client.array.exec_multi_array_udf(*args, **kwargs)
Apply a user-defined function to multiple arrays, synchronously.
All arguments are exactly as in exec_multi_array_udf_base.
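A multi-array UDF can be tried locally before submission. The sketch assumes, based on the example given for exec_multi_array_udf_base, that the function receives a sequence with one attribute mapping per array, in the order the arrays were listed. The input data is made up, and `statistics.median` stands in for `numpy.median` so the block needs only the standard library.

```python
import statistics


def median_sum(arrays):
    # One entry per array, indexed by its position in the submitted list.
    return statistics.median(arrays[0]["a"]) + statistics.median(arrays[1]["a"])


# Hypothetical query results for two arrays, each exposing attribute "a".
fake_inputs = [{"a": [1, 2, 3]}, {"a": [10, 20, 30]}]
total = median_sum(fake_inputs)
print(total)
```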
exec_multi_array_udf_async
client.array.exec_multi_array_udf_async(*args, **kwargs)
Apply a user-defined function to multiple arrays, asynchronously.
All arguments are exactly as in exec_multi_array_udf_base.
exec_multi_array_udf_base
client.array.exec_multi_array_udf_base(
func,
arrays,
*,
teamspace=None,
image_name='default',
http_compressor='deflate',
include_source_lines=True,
task_name=None,
result_format=models.ResultFormat.NATIVE,
store_results=False,
stored_param_uuids=(),
resource_class=None,
_download_results=True,
_server_graph_uuid=None,
_client_node_uuid=None,
**kwargs,
)
Apply a user-defined function to multiple arrays.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| func | callable or Asset-like | The function to run. This can be either a callable function, or a registered function asset identified by path, object, or “tiledb” URI. | required |
| arrays | list | The list of arrays to run the function on, as an already-built ArrayList object, or as a list of dicts with “uri”, “ranges”, and “attrs” members. All arrays must be in the same teamspace, which may be different from the UDF teamspace. | required |
| teamspace | TeamspaceLike | The teamspace of the UDF. If the func parameter specifies a teamspace, this parameter may be omitted. | None |
| image_name | str | UDF image name to use, useful for testing beta features. | 'default' |
| http_compressor | str | HTTP compressor to use for results. | 'deflate' |
| include_source_lines | bool | True to send the source code of your UDF to the server with your request. (This means it can be shown to you in stack traces if an error occurs.) False to send only compiled Python bytecode. | True |
| task_name | str | Name to assign the task for logging and audit purposes. | None |
| result_format | ResultFormat | Result serialization format. | models.ResultFormat.NATIVE |
| store_results | bool | True to temporarily store results on the server side for later retrieval (in addition to downloading them). | False |
| stored_param_uuids | list | A list of UUIDs. | () |
| resource_class | str | The name of the resource class to use. Resource classes define maximum limits for CPU and memory usage. | None |
| _download_results | bool | True to download and parse results eagerly. False to not download results by default and only do so lazily (e.g. for an intermediate node in a graph). | True |
| _server_graph_uuid | str | If this function is being executed within a DAG, the server-generated ID of the graph’s log. Otherwise, None. | None |
| _client_node_uuid | str | If this function is being executed within a DAG, the ID of this function’s node within the graph. Otherwise, None. | None |
| kwargs | dict | Named arguments to pass to the function. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| | results.RemoteResult | A future containing the results of the UDF. |
Examples
>>> import numpy as np
>>> def median(numpy_ordered_dictionary):
...     return (
...         np.median(numpy_ordered_dictionary[0]["a"])
...         + np.median(numpy_ordered_dictionary[1]["a"])
...     )
...
>>> exec_multi_array_udf_base(
...     median, [
...         {"uri": "folder/array1", "ranges": [(1, 4), (1, 4)], "attrs": ["a"]},
...         {"uri": "folder/array2", "ranges": [(1, 4), (1, 4)], "attrs": ["a"]},
...     ],
...     teamspace="teamspace"
... ).get()
parse_ranges
client.array.parse_ranges(ranges)
Takes a list of the following objects per dimension:
- scalar index
- (start,end) tuple
- list of either of the above types
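Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| ranges | list of (scalar, tuple, list) | The per-dimension entries described above. | required |
| builder | callable | Function taking arguments (dim_idx, start, end). Listed in the docstring but absent from the signature shown above. | |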
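One way to picture the accepted inputs is a normalizer that turns each dimension's entry into a list of (start, end) pairs. The function below is a hypothetical illustration and not the library's parse_ranges; in particular, treating a scalar index `x` as the pair `(x, x)` is an assumption, as is the output shape.

```python
def normalize_ranges(ranges):
    """Hypothetical helper: one list of (start, end) pairs per dimension."""

    def to_pair(item):
        if isinstance(item, tuple):
            if len(item) != 2:
                raise ValueError(f"expected (start, end), got {item!r}")
            return item
        # Assumption: a scalar index selects the single cell [x, x].
        return (item, item)

    normalized = []
    for dim_entry in ranges:
        if isinstance(dim_entry, list):
            # A list may mix scalars and (start, end) tuples.
            normalized.append([to_pair(item) for item in dim_entry])
        else:
            normalized.append([to_pair(dim_entry)])
    return normalized


# Dimension 0: scalar; dimension 1: tuple; dimension 2: list of both.
result = normalize_ranges([3, (0, 5), [1, (7, 9)]])
print(result)
```

Normalizing to a uniform shape like this makes the three input forms interchangeable for whatever code consumes them.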