taskgraphs.builder

cloud.taskgraphs.builder

The code to build task graphs for later registration and execution.

Attributes

Name Description
ValOrNode Type indicating that you can pass either a direct value or an input node.
ValOrNodeSeq Either a Node that yields a sequence or a sequence that may contain nodes.

Classes

Name Description
Node The root type of a Node when building a task graph.
TaskGraphBuilder The entry point for building a task graph.

Node

cloud.taskgraphs.builder.Node(owner, name, deps, *, fallback_name=None)

The root type of a Node when building a task graph.

The basic building block of a task graph. Nodes represent the data and execution steps within a TileDB task graph.

builder.Nodes themselves are inert; they only represent the steps that will be taken by an Executor implementation to run the task graph. They should be treated as opaque and immutable; the Executor’s node objects are the ones that can be interacted with to get status and results.

Attributes

Name Description
display_name A friendly name for the Node.
id A unique ID for this node.
name The name of the node. If absent, the node is unnamed.
owner The Builder this node comes from.

Methods

Name Description
to_registration_json Converts this node to the form used when registering the graph.

to_registration_json
cloud.taskgraphs.builder.Node.to_registration_json(existing_names)

Converts this node to the form used when registering the graph.

This is the form of the Node that will be used to represent it in the RegisteredTaskGraph object, i.e. a RegisteredTaskGraphNode.

Parameters
Name Type Description Default
existing_names Set[str] The set of names that have already been used, so that we don’t generate a duplicate node name. required
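
As an illustration of why a set of used names is passed in, here is a minimal sketch of name deduplication. It is not the library's implementation: the helper pick_unique_name, the numeric suffix scheme, and the choice to add the result to the set are all assumptions made for this example.

```python
from typing import Set


def pick_unique_name(candidate: str, existing_names: Set[str]) -> str:
    """Return `candidate`, or a suffixed variant not yet in `existing_names`.

    Hypothetical helper: the suffix format is an assumption, not the
    scheme used by to_registration_json.
    """
    name = candidate
    suffix = 1
    while name in existing_names:
        suffix += 1
        name = f"{candidate}-{suffix}"
    # Record the chosen name so later calls cannot reuse it.
    existing_names.add(name)
    return name


used = {"load"}
first = pick_unique_name("load", used)
second = pick_unique_name("load", used)
```

Each call consults and updates the same set, so two nodes that would otherwise share a name end up with distinct ones.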

TaskGraphBuilder

cloud.taskgraphs.builder.TaskGraphBuilder(name=None)

The entry point for building a task graph.

This class only builds task graphs. The graphs it builds are static and represent only the steps to run (the recipe). Execution is performed later by an executor.
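
A minimal sketch of building a static graph with an input feeding a UDF. It assumes the package is importable as tiledb.cloud.taskgraphs and that types.Arguments.of wraps positional arguments; neither is stated in this reference, so treat both as assumptions. The function double and the node names are hypothetical. The import sits inside the function so that defining it has no side effects, and nothing here runs the graph.

```python
def build_example_graph():
    """Build (but do not execute) a two-node graph: input -> UDF."""
    # Assumed import path; this reference abbreviates the module
    # as cloud.taskgraphs.builder.
    from tiledb.cloud.taskgraphs import builder, types

    def double(x):
        # Hypothetical user function for the UDF node.
        return 2 * x

    graph = builder.TaskGraphBuilder(name="double-a-value")

    # An input node with a default, so callers may omit it when executing.
    value = graph.input("value", default_value=21)

    # Passing `value` as an argument makes `doubled` depend on it.
    # Arguments.of is assumed to build a types.Arguments from positionals.
    doubled = graph.udf(double, types.Arguments.of(value), name="doubled")
    return graph, doubled
```

The returned builder holds only the recipe; running it requires handing it to an executor.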

Attributes

Name Description
name A name for this graph.

Methods

Name Description
add_dep Manually requires that parent run before child.
array_read Creates a Node that will read data from a TileDB array.
input Creates a Node that can be used as an input to the graph.
sql Creates a Node that executes an SQL query.
udf Creates a Node which executes a UDF.

add_dep
cloud.taskgraphs.builder.TaskGraphBuilder.add_dep(parent, child)

Manually requires that parent run before child.

This should rarely be necessary; including a parent node within the parameter list of a child node automatically adds a dependency.
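
To show the difference between an implicit and a manual dependency, here is a small stand-alone sketch using the standard library's graphlib. It does not use the builder: the node names are hypothetical and the local add_dep function is a stand-in that only mimics the ordering constraint.

```python
from graphlib import TopologicalSorter

# Each hypothetical node maps to the set of nodes it must wait for.
deps = {
    "load": set(),
    "transform": {"load"},  # implicit: "load" is a parameter of "transform"
    "cleanup": set(),       # takes no node parameters, so no implicit edges
}


def add_dep(deps, parent, child):
    """Stand-in: record that `parent` must finish before `child` starts."""
    deps[child].add(parent)


# "cleanup" consumes nothing from "transform", so the ordering must be
# requested explicitly.
add_dep(deps, "transform", "cleanup")

order = list(TopologicalSorter(deps).static_order())
```

Without the manual edge, "cleanup" could be scheduled at any point; with it, the only valid order is load, transform, cleanup.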

array_read
cloud.taskgraphs.builder.TaskGraphBuilder.array_read(
    uri,
    *,
    raw_ranges=None,
    buffers=None,
    layout=None,
    name=None,
)

Creates a Node that will read data from a TileDB array.

This Node is not executed on its own. Instead, it behaves like the array input to an Array UDF: when a UDF that consumes it is executed, the array is queried server-side and the result is passed as a parameter to the user code.

Parameters
Name Type Description Default
uri ValOrNode[str] The URI to query against. This must be a tiledb:// URI. May be provided either as the URI itself, or as the output of an upstream node. required
raw_ranges Optional[ValOrNodeSeq[Any]] The ranges to query against. This is called “raw” because it uses the format that is passed to the server, with one flat list of start/end values per dimension: [[startDim1A, endDim1A, startDim1B, endDim1B, …], [startDim2A, endDim2A, startDim2B, endDim2B, …]]. This may be provided either as a value or as a Node output. None
buffers Optional[ValOrNodeSeq[str]] Optionally, the buffers to query against. May be either a raw value or the Node output. None
name Optional[str] An optional name for this Node. None
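
The raw range format can be easier to produce from (start, end) pairs than to write by hand. The helper below is a sketch of that conversion; to_raw_ranges is a hypothetical name and is not part of the library.

```python
from typing import Any, List, Sequence, Tuple


def to_raw_ranges(
    ranges_per_dim: Sequence[Sequence[Tuple[Any, Any]]],
) -> List[List[Any]]:
    """Flatten (start, end) pairs into one flat list per dimension."""
    raw = []
    for dim_ranges in ranges_per_dim:
        flat = []
        for start, end in dim_ranges:
            flat.extend([start, end])
        raw.append(flat)
    return raw


raw = to_raw_ranges(
    [
        [(1, 10), (20, 30)],  # dimension 1: two ranges
        [(0, 5)],             # dimension 2: one range
    ]
)
# raw == [[1, 10, 20, 30], [0, 5]]
```

The resulting list is what would be passed as raw_ranges.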

input
cloud.taskgraphs.builder.TaskGraphBuilder.input(name, default_value=_NOTHING)

Creates a Node that can be used as an input to the graph.

Parameters
Name Type Description Default
name str The name of this input. Required, since it is used when executing to match the input to the Node. required
default_value _T An optional default value to use when executing. If not provided, the caller is required to set this input when running the task graph. _NOTHING
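
The _NOTHING default suggests a sentinel, which lets None be a legitimate default value while still detecting inputs that were never given one. The sketch below illustrates that pattern in isolation; resolve_inputs and its local _NOTHING are stand-ins written for this example, not the library's code.

```python
_NOTHING = object()  # stand-in sentinel meaning "no default was declared"


def resolve_inputs(declared, provided):
    """Match provided values to declared inputs, falling back to defaults.

    `declared` maps input name -> default (or _NOTHING if required).
    `provided` maps input name -> value supplied at execution time.
    """
    resolved = {}
    for name, default in declared.items():
        if name in provided:
            resolved[name] = provided[name]
        elif default is not _NOTHING:
            resolved[name] = default
        else:
            raise KeyError(f"missing required input: {name!r}")
    return resolved


declared = {"uri": _NOTHING, "limit": None}
resolved = resolve_inputs(declared, {"uri": "tiledb://namespace/array"})
```

Here "limit" resolves to its declared default of None, while omitting "uri" would raise an error.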

sql
cloud.taskgraphs.builder.TaskGraphBuilder.sql(
    query,
    init_commands=(),
    parameters=(),
    *,
    result_format='arrow',
    resource_class=None,
    download_results=None,
    namespace=None,
    name=None,
)

Creates a Node that executes an SQL query.

Parameters
Name Type Description Default
query str The query to execute. This must be a string, and cannot be the output of a previous node. required
init_commands Iterable[str] A list of SQL commands to execute in the session before running query. ()
parameters ValOrNodeSeq A sequence of objects to provide as parameters for the ? placeholders in the query. These may be provided either as values or as the output of earlier Nodes. ()
result_format str The format to provide results in. Either json or arrow. 'arrow'
resource_class Optional[str] If specified, the container resource class that this SQL query will be executed in. None
download_results Optional[bool] If True, download results eagerly (i.e., immediately when the function returns). If False, download results lazily (i.e., only when you call .result() on an execution). If unset (the default), automatically choose whether to download results: eagerly if it’s a terminal node, or if it has a local dependent; lazily if it’s an internal node. None
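
The automatic choice described for download_results can be read as a small decision rule. The sketch below restates it in plain Python; should_download_eagerly and its keyword arguments are hypothetical names, not part of the library.

```python
from typing import Optional


def should_download_eagerly(
    download_results: Optional[bool],
    *,
    is_terminal: bool,
    has_local_dependent: bool,
) -> bool:
    """Restate the documented rule for eager vs. lazy result download."""
    # An explicit True or False always wins.
    if download_results is not None:
        return download_results
    # Unset: eager for terminal nodes or nodes feeding a local dependent,
    # lazy for purely internal nodes.
    return is_terminal or has_local_dependent


internal = should_download_eagerly(
    None, is_terminal=False, has_local_dependent=False
)
terminal = should_download_eagerly(
    None, is_terminal=True, has_local_dependent=False
)
```

An internal node with no local dependents is downloaded lazily, while a terminal node is downloaded eagerly unless the caller says otherwise.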

udf
cloud.taskgraphs.builder.TaskGraphBuilder.udf(
    func,
    args=types.Arguments(),
    *,
    result_format='tiledb_json',
    include_source=True,
    image_name=None,
    timeout=None,
    resource_class=None,
    namespace=None,
    name=None,
    local=False,
    download_results=None,
)

Creates a Node which executes a UDF.

Parameters
Name Type Description Default
func functions.Funcable[_T] The function to call; either a Python callable or a registered UDF name. required
args types.Arguments The arguments to pass to this function. These may contain values or Nodes. types.Arguments()
result_format Optional[str] The format to return results in. 'tiledb_json'
include_source bool True (the default) to include the function source in the request. This is useful for debugging and logging, but does not have any impact on the UDF’s execution. False to omit source. True
image_name Optional[str] If specified, will execute the UDF within the specified image rather than the default image for its language. None
timeout Union[datetime.timedelta, int, None] If specified, the length of time after which the UDF will be terminated on the server side. A plain number is interpreted as seconds. If zero or unset, the UDF will run until the server’s configured maximum. Unlike the timeout parameter to Future-like objects, this limits actual execution time rather than how long to wait. None
resource_class Optional[str] If specified, the container resource class that this UDF will be executed in. None
namespace Optional[str] If specified, the non-default namespace that the UDF will be executed under. This will also be the namespace used for reading any array nodes used in this UDF’s input. None
local bool If True, will attempt to run the UDF on the client machine. If this is not possible, the UDF will fail. False
download_results Optional[bool] If True, download results eagerly (i.e., immediately when the function returns). If False, download results lazily (i.e., only when you call .result() on an execution). If unset (the default), automatically choose whether to download results: eagerly if it’s a terminal node, or if it has a local dependent; lazily if it’s an internal node. None
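
Because timeout accepts either a timedelta or a number of seconds, and treats zero the same as unset, it can help to see the three cases side by side. The function below is an illustrative sketch only; timeout_seconds is a hypothetical helper, and truncating to whole seconds is an assumption made for this example.

```python
import datetime
from typing import Optional, Union


def timeout_seconds(
    timeout: Union[datetime.timedelta, int, None],
) -> Optional[int]:
    """Return the timeout in whole seconds, or None for "server maximum"."""
    if timeout is None:
        return None
    if isinstance(timeout, datetime.timedelta):
        seconds = int(timeout.total_seconds())
    else:
        seconds = int(timeout)
    # Zero is documented as equivalent to leaving the timeout unset.
    return seconds or None


five_minutes = timeout_seconds(datetime.timedelta(minutes=5))
ninety = timeout_seconds(90)
unlimited = timeout_seconds(0)
```

Both a timedelta and an integer end up as seconds, and zero collapses to the same "no explicit limit" value as None.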