array

client.array

Register, search, and manage arrays with TileDB.

Classes

Name Description
ArrayList An incrementally built list of UDFArrayDetails.

ArrayList

client.array.ArrayList()

An incrementally built list of UDFArrayDetails.

For use in multi-array UDFs.

Methods

Name Description
add Add an array, with indexes and layout options.
get Returns the list of UDFArrayDetails.

add

client.array.ArrayList.add(uri, ranges=None, buffers=None, layout=None)

Add an array, with indexes and layout options.

Parameters
Name Type Description Default
uri str The fully qualified TileDB URI of the array. required
ranges list of pairs of int A list of tuples, one tuple for each dimension of the array. The tuples specify the start and stop range in each dimension, as with a slice. None
buffers str or list of str The name of an array attribute or list of names. None
layout str The layout for the array. None

Examples

>>> array_list = ArrayList()
>>> array_list.add(
...     "tiledb://workspace/teamspace/array",
...     ranges=[(0,5), (0,5)],
...     buffers=["a", "b", "c"],
... )

get

client.array.ArrayList.get()

Returns the list of UDFArrayDetails.
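The exec_multi_array_udf_base description states that arrays may also be passed as a list of dicts with "uri", "ranges", and "attrs" members instead of an ArrayList. The following is a minimal sketch of building that dict form by hand. The helper name array_spec is hypothetical and is not part of client.array; it only assembles plain Python dicts and makes no server calls.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple, Union

Range = Tuple[int, int]


def array_spec(
    uri: str,
    ranges: Optional[Sequence[Range]] = None,
    attrs: Optional[Union[str, Sequence[str]]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build one dict with "uri", "ranges", and "attrs" keys."""
    # Accept a single attribute name as well as a list of names.
    if isinstance(attrs, str):
        attrs = [attrs]
    return {
        "uri": uri,
        "ranges": list(ranges or []),
        "attrs": list(attrs or []),
    }


# Two array descriptions, mirroring the multi-array example in this reference.
specs: List[Dict[str, Any]] = [
    array_spec("folder/array1", ranges=[(1, 4), (1, 4)], attrs="a"),
    array_spec("folder/array2", ranges=[(1, 4), (1, 4)], attrs=["a"]),
]
```

Note that ArrayList.add names its attribute argument buffers, while the dict form uses the key "attrs".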

Functions

Name Description
apply Apply a user-defined function to an array, synchronously.
apply_async Apply a user-defined function to an array, asynchronously.
apply_base Apply a user-defined function to an array.
exec_multi_array_udf Apply a user-defined function to multiple arrays, synchronously.
exec_multi_array_udf_async Apply a user-defined function to multiple arrays, asynchronously.
exec_multi_array_udf_base Apply a user-defined function to multiple arrays.
parse_ranges Takes a list of per-dimension range objects (scalar index, (start, end) tuple, or list of either).

apply

client.array.apply(*args, **kwargs)

Apply a user-defined function to an array, synchronously.

All arguments are exactly as in apply_base, but this returns the data only.

Example:

>>> import tiledb, tiledb.client, numpy
>>> def median(df):
...     return numpy.median(df["a"])
...
>>> # Open the array then run the UDF
>>> tiledb.client.array.apply(
...     "tiledb://TileDB-Inc/quickstart_dense",
...     median,
...     ranges=[(0,5), (0,5)],
...     attrs=["a", "b", "c"],
... )
2.0

apply_async

client.array.apply_async(*args, **kwargs)

Apply a user-defined function to an array, asynchronously.

All arguments are exactly as in apply_base, but this returns the data as a future-like AsyncResponse.
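The difference between a call that returns the data directly and one that returns a future-like object can be illustrated with the standard library alone. This is only an analogy: the methods of AsyncResponse are not documented in this section, so the sketch uses concurrent.futures.Future as a stand-in, and run_sync and run_async are hypothetical names rather than parts of client.array. The UDF uses statistics.median so the snippet runs without numpy.

```python
import statistics
from concurrent.futures import Future, ThreadPoolExecutor


def median(df):
    # Same shape as the UDF in the examples, using the standard library.
    return statistics.median(df["a"])


def run_sync(func, data):
    """Stand-in for a synchronous call: blocks and returns the data itself."""
    return func(data)


def run_async(executor: ThreadPoolExecutor, func, data) -> Future:
    """Stand-in for an asynchronous call: returns a future immediately."""
    return executor.submit(func, data)


# Assumed input layout: a mapping from attribute name to values.
data = {"a": [4, 1, 3, 2]}

sync_value = run_sync(median, data)

with ThreadPoolExecutor(max_workers=1) as pool:
    future = run_async(pool, median, data)
    # Other work could happen here before the result is needed.
    async_value = future.result()
```

Both paths produce the same value; only the point at which the caller waits for it differs.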

apply_base

client.array.apply_base(
    array,
    func,
    *,
    teamspace=None,
    ranges=(),
    attrs=(),
    layout=None,
    image_name='default',
    http_compressor='deflate',
    include_source_lines=True,
    task_name=None,
    result_format=models.ResultFormat.NATIVE,
    store_results=False,
    stored_param_uuids=(),
    timeout=None,
    resource_class=None,
    _download_results=True,
    _server_graph_uuid=None,
    _client_node_uuid=None,
    **kwargs,
)

Apply a user-defined function to an array.

Parameters

Name Type Description Default
array str or object The array asset to run the function on, identified by path, object, or “tiledb” URI. The array teamspace may be different from the UDF teamspace. required
func callable or Asset-like The function to run. This can be either a callable function, or a registered function asset identified by path, object, or “tiledb” URI. required
teamspace TeamspaceLike The teamspace of the UDF. If the func parameter specifies a teamspace, this parameter may be omitted. None
ranges list Ranges to issue the query on. ()
attrs list List of attributes or dimensions to fetch in the query. ()
layout str TileDB query layout. None
image_name str UDF image name to use, useful for testing beta features. 'default'
http_compressor str HTTP compressor to use for results. 'deflate'
include_source_lines bool True to send the source code of your UDF to the server with your request. (This means it can be shown to you in stack traces if an error occurs.) False to send only compiled Python bytecode. True
task_name str Name to assign the task for logging and audit purposes. None
result_format ResultFormat Result serialization format. models.ResultFormat.NATIVE
store_results bool True to temporarily store results on the server side for later retrieval (in addition to downloading them). False
stored_param_uuids list A list of UUIDs. ()
timeout int Timeout for UDF in seconds. None
resource_class str The name of the resource class to use. Resource classes define maximum limits for CPU and memory usage. None
_download_results bool True to download and parse results eagerly. False to not download results by default and only do so lazily (e.g. for an intermediate node in a graph). True
_server_graph_uuid str If this function is being executed within a DAG, the server-generated ID of the graph’s log. Otherwise, None. None
_client_node_uuid str If this function is being executed within a DAG, the ID of this function’s node within the graph. Otherwise, None. None
kwargs dict Named arguments to pass to the function. {}

Returns

Name Type Description
results.RemoteResult A future containing the results of the UDF.

Examples

>>> import numpy, tiledb.client
>>> def median(df):
...     return numpy.median(df["a"])
...
>>> tiledb.client.array.apply_base(
...     "folder/array",
...     median,
...     teamspace="teamspace",
...     ranges=[(0,5), (0,5)],
...     attrs=["a", "b", "c"]
... ).result
2.0

exec_multi_array_udf

client.array.exec_multi_array_udf(*args, **kwargs)

Apply a user-defined function to multiple arrays, synchronously.

All arguments are exactly as in exec_multi_array_udf_base.

exec_multi_array_udf_async

client.array.exec_multi_array_udf_async(*args, **kwargs)

Apply a user-defined function to multiple arrays, asynchronously.

All arguments are exactly as in exec_multi_array_udf_base.

exec_multi_array_udf_base

client.array.exec_multi_array_udf_base(
    func,
    arrays,
    *,
    teamspace=None,
    image_name='default',
    http_compressor='deflate',
    include_source_lines=True,
    task_name=None,
    result_format=models.ResultFormat.NATIVE,
    store_results=False,
    stored_param_uuids=(),
    resource_class=None,
    _download_results=True,
    _server_graph_uuid=None,
    _client_node_uuid=None,
    **kwargs,
)

Apply a user-defined function to multiple arrays.

Parameters

Name Type Description Default
func callable or Asset-like The function to run. This can be either a callable function, or a registered function asset identified by path, object, or “tiledb” URI. required
arrays list The list of arrays to run the function on, as an already-built ArrayList object, or as a list of dicts with “uri”, “ranges”, and “attrs” members. All arrays must be in the same teamspace, which may be different from the UDF teamspace. required
teamspace TeamspaceLike The teamspace of the UDF. If the func parameter specifies a teamspace, this parameter may be omitted. None
image_name str UDF image name to use, useful for testing beta features. 'default'
http_compressor str HTTP compressor to use for results. 'deflate'
task_name str Name to assign the task for logging and audit purposes. None
result_format ResultFormat Result serialization format. models.ResultFormat.NATIVE
store_results bool True to temporarily store results on the server side for later retrieval (in addition to downloading them). False
_server_graph_uuid str If this function is being executed within a DAG, the server-generated ID of the graph’s log. Otherwise, None. None
_client_node_uuid str If this function is being executed within a DAG, the ID of this function’s node within the graph. Otherwise, None. None
resource_class str The name of the resource class to use. Resource classes define maximum limits for CPU and memory usage. None
kwargs dict Named arguments to pass to the function. {}

Returns

Name Type Description
results.RemoteResult A future containing the results of the UDF.

Examples

>>> import numpy as np
>>> def median(numpy_ordered_dictionary):
...     return (
...         np.median(numpy_ordered_dictionary[0]["a"])
...         + np.median(numpy_ordered_dictionary[1]["a"])
...     )
...
>>> exec_multi_array_udf_base(
...     median, [
...         {"uri": "folder/array1", "ranges": [(1, 4), (1, 4)], "attrs": ["a"]},
...         {"uri": "folder/array2", "ranges": [(1, 4), (1, 4)], "attrs": ["a"]},
...     ],
...     teamspace="teamspace"
... ).get()
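Before submitting a multi-array UDF, it can help to run it locally against small stand-in inputs. The sketch below assumes, based on the example above, that the UDF receives a sequence indexed by array position, where each element maps attribute names to values. The name median_sum and the sample data are hypothetical, and statistics.median replaces np.median so the snippet runs without numpy.

```python
import statistics


def median_sum(arrays):
    """Sum the medians of attribute "a" from the first two arrays."""
    return statistics.median(arrays[0]["a"]) + statistics.median(arrays[1]["a"])


# Stand-ins for the per-array results the UDF would receive (assumed layout).
fake_arrays = [
    {"a": [1, 2, 3]},         # median is 2
    {"a": [10, 20, 30, 40]},  # median is 25.0
]

total = median_sum(fake_arrays)
print(total)
```

A local check like this only validates the function's logic; it says nothing about how the server serializes inputs or results.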

parse_ranges

client.array.parse_ranges(ranges)

Takes a list of the following objects per dimension:

  • scalar index
  • (start,end) tuple
  • list of either of the above types

Parameters

Name Type Description
ranges list of (scalar, tuple, list) One entry per dimension, in any of the forms listed above.
builder callable Function taking arguments (dim_idx, start, end).
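The three accepted forms per dimension can be illustrated with a small normalizer. This is a sketch only: normalize_ranges is a hypothetical function, not the implementation of parse_ranges, and its output format (a list of (start, end) tuples per dimension) is an assumption. Treating a scalar index as the range (x, x) is likewise an assumption made for the illustration.

```python
from typing import Any, List, Tuple

RangePair = Tuple[Any, Any]


def _as_pair(item: Any) -> RangePair:
    """Turn a scalar or a (start, end) tuple into a (start, end) tuple."""
    if isinstance(item, tuple):
        if len(item) != 2:
            raise ValueError(f"expected a (start, end) tuple, got {item!r}")
        return (item[0], item[1])
    # Assumption: a scalar index selects the single coordinate (x, x).
    return (item, item)


def normalize_ranges(ranges: List[Any]) -> List[List[RangePair]]:
    """Hypothetical normalizer: one list of (start, end) pairs per dimension."""
    normalized = []
    for dim_entry in ranges:
        if isinstance(dim_entry, list):
            # A list may mix scalars and tuples for the same dimension.
            normalized.append([_as_pair(item) for item in dim_entry])
        else:
            normalized.append([_as_pair(dim_entry)])
    return normalized


# Dimension 0: scalar; dimension 1: tuple; dimension 2: list mixing both.
normalized = normalize_ranges([3, (0, 5), [1, (7, 9)]])
print(normalized)  # [[(3, 3)], [(0, 5)], [(1, 1), (7, 9)]]
```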