taskgraphs.builder

client.taskgraphs.builder

The code to build task graphs for later registration and execution.

Attributes

Name	Description
ValOrNode	Type indicating that you can pass either a direct value or an input node.
ValOrNodeSeq	Either a Node that yields a sequence or a sequence that may contain nodes.

Classes

Name	Description
Node	The root type of a Node when building a task graph.
TaskGraphBuilder	The entry point for building a task graph.

Node

client.taskgraphs.builder.Node(owner, name, deps, *, fallback_name=None)

The root type of a Node when building a task graph.

The basic building block of a task graph. Nodes represent the data and execution steps within a TileDB task graph.

builder.Nodes themselves are inert; they only represent the steps that will be taken by an Executor implementation to run the task graph. They should be treated as opaque and immutable; the Executor’s node objects are the ones that can be interacted with to get status and results.

Attributes

Name	Description
display_name	A friendly name for the Node.
id	A unique ID for this node.
name	The name of the node. If absent, the node is unnamed.
owner	The Builder this node comes from.

Methods

Name	Description
to_registration_json	Converts this node to the form used when registering the graph.

to_registration_json

client.taskgraphs.builder.Node.to_registration_json(existing_names)

Converts this node to the form used when registering the graph.

This is the form of the Node that will be used to represent it in the RegisteredTaskGraph object, i.e. a RegisteredTaskGraphNode.

:param existing_names: The set of names that have already been used, so that we don’t generate a duplicate node name.
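
To illustrate why existing_names is needed, here is a minimal sketch of generating a node name that does not collide with names already in use. The helper pick_unique_name and its numeric-suffix scheme are hypothetical, chosen only for illustration; the naming rule the library actually applies is not described here.

```python
def pick_unique_name(preferred, existing_names):
    """Return a name that is not in existing_names, and record it there.

    Hypothetical helper: the numeric-suffix scheme is an assumption made
    for illustration, not the library's documented behavior.
    """
    candidate = preferred
    suffix = 1
    while candidate in existing_names:
        suffix += 1
        candidate = f"{preferred} ({suffix})"
    existing_names.add(candidate)
    return candidate


used = {"load"}                          # names already taken in the graph
first = pick_unique_name("load", used)   # "load" is taken, so a suffix is added
second = pick_unique_name("load", used)  # the first suffixed name is now taken too
print(first, "|", second)
```

Because the set is updated on every call, two unnamed nodes with the same fallback name still end up with distinct names.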

TaskGraphBuilder

client.taskgraphs.builder.TaskGraphBuilder(name=None)

The entry point for building a task graph.

This class only builds task graphs. The graphs it builds are static and represent only the steps to run (the recipe). An executor performs the actual execution later.

Attributes

Name	Description
name	A name for this graph.

Methods

Name	Description
add_dep	Manually requires that `parent` run before `child`.
array_read	Creates a Node that will read data from a TileDB array.
input	Creates a Node that can be used as an input to the graph.
sql	Creates a Node that executes an SQL query.
udf	Creates a Node which executes a UDF.

add_dep

client.taskgraphs.builder.TaskGraphBuilder.add_dep(parent, child)

Manually requires that parent run before child.

This should rarely be necessary; including a parent node within the parameter list of a child node automatically adds a dependency.
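
The difference between an implicit and an explicit dependency can be sketched with a toy model. ToyNode and ToyBuilder below are hypothetical stand-ins written for this example, not the library's classes; they only show how passing a node as an argument records an edge, and where a manual add_dep is still useful.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class ToyNode:
    """Hypothetical stand-in for a builder node (not the library class)."""

    def __init__(self, name):
        self.name = name


class ToyBuilder:
    """Hypothetical builder that records parent -> child dependencies."""

    def __init__(self):
        self.parents = {}  # child name -> set of parent names

    def step(self, name, *args):
        # Any ToyNode found among the arguments becomes an implicit parent.
        self.parents[name] = {a.name for a in args if isinstance(a, ToyNode)}
        return ToyNode(name)

    def add_dep(self, parent, child):
        # Explicit ordering for nodes that do not exchange data.
        self.parents[child.name].add(parent.name)

    def order(self):
        # An execution order that respects every recorded dependency.
        return list(TopologicalSorter(self.parents).static_order())


b = ToyBuilder()
load = b.step("load")
clean = b.step("clean", load)          # implicit: load -> clean
notify = b.step("notify")              # takes no data from clean
b.add_dep(parent=clean, child=notify)  # explicit: clean -> notify
print(b.order())
```

Only the notify step needs add_dep here, because it consumes no output from clean and would otherwise be free to run at any time.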

array_read

client.taskgraphs.builder.TaskGraphBuilder.array_read(
    uri,
    *,
    raw_ranges=None,
    buffers=None,
    layout=None,
    name=None,
)

Creates a Node that will read data from a TileDB array.

This Node is not executed immediately. Instead, it works the same way as the array input to an Array UDF: when the UDF is executed, the array is queried server-side and the result is passed as a parameter to the user code.

:param uri: The URI to query against. This must be a tiledb:// URI. May be provided either as the URI itself, or as the output of an upstream node.

:param raw_ranges: The ranges to query against. This is called “raw” because we accept the format that is passed to the server::

    [
        [startDim1A, endDim1A, startDim1B, endDim1B, ...],
        [startDim2A, endDim2A, startDim2B, endDim2B, ...],
    ]

This may also be provided as either a value or a Node output.

:param buffers: Optionally, the buffers to query against. May be either a raw value or the Node output.

:param name: An optional name for this Node.
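
The raw_ranges layout described above holds one flat list per dimension, with start and end values alternating. As a minimal sketch, the helper below (to_raw_ranges, a hypothetical name that is not part of the library) flattens more readable (start, end) pairs into that layout. Whether the end bounds are inclusive is not stated in this section, so the example makes no assumption about it.

```python
def to_raw_ranges(ranges_per_dim):
    """Flatten [(start, end), ...] pairs per dimension into the raw layout.

    Hypothetical convenience helper, not a library function.
    """
    raw = []
    for dim_ranges in ranges_per_dim:
        flat = []
        for start, end in dim_ranges:
            flat.extend([start, end])
        raw.append(flat)
    return raw


raw = to_raw_ranges([
    [(1, 10), (20, 30)],  # dimension 1: two ranges
    [(0, 5)],             # dimension 2: one range
])
print(raw)  # [[1, 10, 20, 30], [0, 5]]
```

The resulting list has the shape that the raw_ranges parameter expects.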

input

client.taskgraphs.builder.TaskGraphBuilder.input(name, default_value=_NOTHING)

Creates a Node that can be used as an input to the graph.

:param name: The name of this input. Required, since it is used when executing to match the input to the Node.

:param default_value: An optional default value to use when executing. If not provided, the caller is required to set this input when running the task graph.
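
The signature uses a sentinel (_NOTHING) rather than None as the "no default" marker, which lets None itself be a legitimate default value. The sketch below shows that pattern in isolation. The sentinel object and the resolve_inputs function are stand-ins defined for this example, and the way the executor actually matches inputs is an assumption.

```python
_NOTHING = object()  # stand-in sentinel meaning "no default was given"


def resolve_inputs(declared, provided):
    """Match provided values to declared inputs by name.

    declared: dict of input name -> default value (or _NOTHING).
    provided: dict of input name -> value supplied at run time.
    Hypothetical helper for illustration only.
    """
    resolved = {}
    for name, default in declared.items():
        if name in provided:
            resolved[name] = provided[name]
        elif default is not _NOTHING:
            resolved[name] = default
        else:
            raise ValueError(f"input {name!r} has no default and was not provided")
    return resolved


declared = {"uri": _NOTHING, "limit": None}  # None is a real default here
resolved = resolve_inputs(declared, {"uri": "tiledb://namespace/array"})
print(resolved)
```

Calling resolve_inputs(declared, {}) raises ValueError, mirroring the rule that an input without a default must be set by the caller.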

sql

client.taskgraphs.builder.TaskGraphBuilder.sql(
    query,
    init_commands=(),
    parameters=(),
    *,
    result_format='arrow',
    resource_class=None,
    download_results=None,
    namespace=None,
    name=None,
)

Creates a Node that executes an SQL query.

:param query: The query to execute. This must be a string, and cannot be the output of a previous node.

:param init_commands: A list of SQL commands to execute in the session before running query.

:param parameters: A sequence of objects to provide as parameters for the ? placeholders in the query. These may be provided either as values or as the output of earlier Nodes.

:param result_format: The format to provide results in. Either json or arrow.

:param resource_class: If specified, the container resource class that this query will be executed in.

:param download_results: If True, download results eagerly (i.e., immediately when the function returns). If False, download results lazily (i.e., only when you call .result() on an execution). If unset (the default), automatically choose whether to download results: eagerly if it’s a terminal node, or if it has a local dependent; lazily if it’s an internal node.
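
The automatic download_results choice described above can be restated as a small decision function. This is a sketch of the documented rule only. The function name and its two boolean inputs are hypothetical, and the library may determine "terminal" and "local dependent" differently internally.

```python
def should_download_eagerly(download_results, is_terminal, has_local_dependent):
    """Return True to download eagerly, False to download lazily.

    download_results: True, False, or None (unset).
    is_terminal: the node has no dependents in the graph.
    has_local_dependent: some dependent runs on the client machine.
    """
    if download_results is not None:
        # An explicit setting always wins.
        return bool(download_results)
    # Unset: eager for terminal nodes and for nodes feeding local work.
    return is_terminal or has_local_dependent


internal = should_download_eagerly(None, is_terminal=False, has_local_dependent=False)
terminal = should_download_eagerly(None, is_terminal=True, has_local_dependent=False)
forced_lazy = should_download_eagerly(False, is_terminal=True, has_local_dependent=True)
print(internal, terminal, forced_lazy)  # False True False
```

Leaving the parameter unset avoids transferring intermediate results that only other server-side nodes consume.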

udf

client.taskgraphs.builder.TaskGraphBuilder.udf(
    func,
    args=types.Arguments(),
    *,
    result_format='tiledb_json',
    include_source=True,
    image_name=None,
    timeout=None,
    resource_class=None,
    namespace=None,
    name=None,
    local=False,
    download_results=None,
)

Creates a Node which executes a UDF.

:param func: The function to call; either a Python callable or a registered UDF name.

:param args: The arguments to pass to this function. These may contain values or Nodes.

:param result_format: The format to return results in.

:param include_source: True (the default) to include the function source in the request. This is useful for debugging and logging, but does not have any impact on the UDF’s execution. False to omit source.

:param image_name: If specified, will execute the UDF within the specified image rather than the default image for its language.

:param timeout: If specified, the length of time after which the UDF will be terminated on the server side. A number is interpreted as a number of seconds. If zero or unset, the UDF will run until the server’s configured maximum. Unlike the timeout parameter to Future-like objects, this sets a limit on actual execution time, rather than just a limit on how long to wait.

:param resource_class: If specified, the container resource class that this UDF will be executed in.

:param namespace: If specified, the non-default namespace that the UDF will be executed under. This will also be the namespace used for reading any array nodes used in this UDF’s input.

:param local: If True, will attempt to run the UDF on the client machine. If this is not possible, the UDF will fail.

:param download_results: If True, download results eagerly (i.e., immediately when the function returns). If False, download results lazily (i.e., only when you call .result() on an execution). If unset (the default), automatically choose whether to download results: eagerly if it’s a terminal node, or if it has a local dependent; lazily if it’s an internal node.
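
The contrast drawn in the timeout description, between limiting execution time and limiting how long a caller waits, can be seen with Python's standard concurrent.futures module. This sketch uses standard library futures as an analogy only. It does not use the task graph client, and slow_task is a made-up placeholder for a long-running job.

```python
import concurrent.futures
import threading

started = threading.Event()
release = threading.Event()


def slow_task():
    started.set()
    release.wait()  # keeps "working" until the main thread releases it
    return "finished"


with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_task)
    started.wait()  # make sure the task is actually running
    try:
        future.result(timeout=0.05)  # limits only how long we wait
        wait_timed_out = False
    except concurrent.futures.TimeoutError:
        wait_timed_out = True
    still_running = not future.done()  # the work itself was not stopped
    release.set()
    final = future.result()

print(wait_timed_out, still_running, final)  # True True finished
```

A wait timeout leaves the task running, whereas the udf timeout parameter is described as terminating the UDF on the server once the limit is reached.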