taskgraphs.builder
cloud.taskgraphs.builder
The code to build task graphs for later registration and execution.
Attributes
Name | Description |
---|---|
ValOrNode | Type indicating that you can pass either a direct value or an input node. |
ValOrNodeSeq | Either a Node that yields a sequence or a sequence that may contain nodes. |
Classes
Name | Description |
---|---|
Node | The root type of a Node when building a task graph. |
TaskGraphBuilder | The thing you use to build a task graph. |
Node
cloud.taskgraphs.builder.Node(owner, name, deps, *, fallback_name=None)
The root type of a Node when building a task graph.
The basic building block of a task graph. Nodes represent the data and execution steps within a TileDB task graph. builder.Nodes themselves are inert; they only represent the steps that will be taken by an Executor implementation to run the task graph. They should be treated as opaque and immutable; the Executor’s node objects are the ones that can be interacted with to get status and results.
Attributes
Name | Description |
---|---|
display_name | A friendly name for the Node. |
id | A unique ID for this node. |
name | The name of the node. If absent, the node is unnamed. |
owner | The Builder this node comes from. |
Methods
Name | Description |
---|---|
to_registration_json | Converts this node to the form used when registering the graph. |
to_registration_json
cloud.taskgraphs.builder.Node.to_registration_json(existing_names)
Converts this node to the form used when registering the graph.
This is the form of the Node that will be used to represent it in the RegisteredTaskGraph object, i.e. a RegisteredTaskGraphNode.
Parameters
Name | Type | Description | Default |
---|---|---|---|
existing_names | Set[str] | The set of names that have already been used, so that we don’t generate a duplicate node name. | required |
TaskGraphBuilder
cloud.taskgraphs.builder.TaskGraphBuilder(name=None)
The thing you use to build a task graph.
This class only builds task graphs. The graphs it builds are static and only represent the steps to run (the recipe). Actual execution is performed later by an executor.
Attributes
Name | Description |
---|---|
name | A name for this graph. |
Methods
Name | Description |
---|---|
add_dep | Manually requires that the parent must happen before child. |
array_read | Creates a Node that will read data from a TileDB array. |
input | Creates a Node that can be used as an input to the graph. |
sql | Creates a Node that executes an SQL query. |
udf | Creates a Node which executes a UDF. |
add_dep
cloud.taskgraphs.builder.TaskGraphBuilder.add_dep(parent, child)
Manually requires that the parent must happen before child.
This should rarely be necessary; including a parent node within the parameter list of a child node automatically adds a dependency.
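To illustrate the difference between an implicit and a manual dependency, here is a minimal sketch built on a toy stand-in. `ToyNode`, `ToyBuilder`, and `step` are hypothetical names invented for this illustration; they are not part of `cloud.taskgraphs.builder` and do not reflect how the library tracks dependencies internally.

```python
from dataclasses import dataclass
from typing import Any, Dict, Set, Tuple


@dataclass(frozen=True)
class ToyNode:
    """Hypothetical stand-in for a builder Node (not the library's class)."""
    name: str
    args: Tuple[Any, ...] = ()


class ToyBuilder:
    """Hypothetical builder that records which nodes each node depends on."""

    def __init__(self) -> None:
        # Maps a node name to the names of the nodes it must wait for.
        self.deps: Dict[str, Set[str]] = {}

    def step(self, name: str, *args: Any) -> ToyNode:
        node = ToyNode(name, args)
        # Any argument that is itself a node becomes an implicit dependency.
        self.deps[name] = {a.name for a in args if isinstance(a, ToyNode)}
        return node

    def add_dep(self, parent: ToyNode, child: ToyNode) -> None:
        # Manual ordering for cases where no data flows from parent to child.
        self.deps[child.name].add(parent.name)


b = ToyBuilder()
load = b.step("load")
clean = b.step("clean", load, 42)   # depends on "load" implicitly
notify = b.step("notify")           # takes no node as an argument
b.add_dep(clean, notify)            # so the ordering is stated manually
```

In this sketch only `notify` needs `add_dep`, because it must run after `clean` without consuming its output.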
array_read
cloud.taskgraphs.builder.TaskGraphBuilder.array_read(
    uri,
    *,
    raw_ranges=None,
    buffers=None,
    layout=None,
    name=None,
)
Creates a Node that will read data from a TileDB array.
This Node is not executed immediately; instead, it is used in the same way as the array input to an Array UDF works: when an actual UDF is executed, the array is queried server-side and is passed as a parameter to the user code.
Parameters
Name | Type | Description | Default |
---|---|---|---|
uri | ValOrNode[str] | The URI to query against. This must be a tiledb:// URI. May be provided either as the URI itself, or as the output of an upstream node. | required |
raw_ranges | Optional[ValOrNodeSeq[Any]] | The ranges to query against. This is called “raw” because we accept the format that is passed to the server: [[startDim1A, endDim1A, startDim1B, endDim1B, …], [startDim2A, endDim2A, startDim2B, endDim2B, …]]. This may also be provided as either a value or a Node output. | None |
buffers | Optional[ValOrNodeSeq[str]] | Optionally, the buffers to query against. May be either a raw value or the Node output. | None |
name | Optional[str] | An optional name for this Node. | None |
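The raw_ranges layout described above is one flat list per dimension, with start and end values alternating. The helper below is a small sketch of how such a value could be assembled from (start, end) pairs; `flatten_ranges` is a hypothetical name used only for illustration and is not provided by the library.

```python
from typing import Any, List, Sequence, Tuple


def flatten_ranges(
    per_dim: Sequence[Sequence[Tuple[Any, Any]]],
) -> List[List[Any]]:
    """Turn per-dimension (start, end) pairs into the flat "raw" layout."""
    raw: List[List[Any]] = []
    for ranges in per_dim:
        flat: List[Any] = []
        for start, end in ranges:
            # Each range contributes two consecutive entries: start, then end.
            flat.extend([start, end])
        raw.append(flat)
    return raw


# Dimension 1 has two ranges, dimension 2 has one.
raw_ranges = flatten_ranges([[(1, 5), (10, 20)], [(0, 3)]])
# raw_ranges is [[1, 5, 10, 20], [0, 3]]
```

A list shaped like this is what the raw_ranges parameter expects when it is passed as a direct value rather than as a Node output.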
input
cloud.taskgraphs.builder.TaskGraphBuilder.input(name, default_value=_NOTHING)
Creates a Node that can be used as an input to the graph.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of this input. Required, since it is used when executing to match the input to the Node. | required |
default_value | _T | An optional default value to use when executing. If not provided, the caller is required to set this input when running the task graph. | _NOTHING |
sql
cloud.taskgraphs.builder.TaskGraphBuilder.sql(
    query,
    init_commands=(),
    parameters=(),
    *,
    result_format='arrow',
    resource_class=None,
    download_results=None,
    namespace=None,
    name=None,
)
Creates a Node that executes an SQL query.
Parameters
Name | Type | Description | Default |
---|---|---|---|
query | str | The query to execute. This must be a string, and cannot be the output of a previous node. | required |
init_commands | Iterable[str] | A list of SQL commands to execute in the session before running query. | () |
parameters | ValOrNodeSeq | A sequence of objects to provide as parameters for the ? placeholders in the query. These may be provided either as values or as the output of earlier Nodes. | () |
result_format | str | The format to provide results in. Either json or arrow. | 'arrow' |
resource_class | Optional[str] | If specified, the container resource class that this query will be executed in. | None |
download_results | Optional[bool] | If True, download results eagerly (i.e., immediately when the function returns). If False, download results lazily (i.e., only when you call .result() on an execution). If unset (the default), automatically choose whether to download results: eagerly if it’s a terminal node, or if it has a local dependent; lazily if it’s an internal node. | None |
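The three-way behavior of download_results can be read as a small decision rule. The sketch below restates that rule as it is described in the table; `should_download_eagerly` and its `is_terminal` and `has_local_dependent` arguments are hypothetical names chosen for illustration, not the executor's actual implementation.

```python
from typing import Optional


def should_download_eagerly(
    download_results: Optional[bool],
    *,
    is_terminal: bool,
    has_local_dependent: bool,
) -> bool:
    """Return True for eager download, False for lazy download."""
    if download_results is not None:
        # An explicit True or False always wins.
        return download_results
    # Unset: eager for terminal nodes or nodes feeding a local dependent,
    # lazy for internal nodes.
    return is_terminal or has_local_dependent


internal = should_download_eagerly(
    None, is_terminal=False, has_local_dependent=False
)
terminal = should_download_eagerly(
    None, is_terminal=True, has_local_dependent=False
)
forced_lazy = should_download_eagerly(
    False, is_terminal=True, has_local_dependent=True
)
```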
udf
cloud.taskgraphs.builder.TaskGraphBuilder.udf(
    func,
    args=types.Arguments(),
    *,
    result_format='tiledb_json',
    include_source=True,
    image_name=None,
    timeout=None,
    resource_class=None,
    namespace=None,
    name=None,
    local=False,
    download_results=None,
)
Creates a Node which executes a UDF.
Parameters
Name | Type | Description | Default |
---|---|---|---|
func | functions.Funcable[_T] | The function to call; either a Python callable or a registered UDF name. | required |
args | types.Arguments | The arguments to pass to this function. These may contain values or Nodes. | types.Arguments() |
result_format | Optional[str] | The format to return results in. | 'tiledb_json' |
include_source | bool | True (the default) to include the function source in the request. This is useful for debugging and logging, but does not have any impact on the UDF’s execution. False to omit source. | True |
image_name | Optional[str] | If specified, will execute the UDF within the specified image rather than the default image for its language. | None |
timeout | Union[datetime.timedelta, int, None] | If specified, the length of time after which the UDF will be terminated on the server side. If specified as a number, a number of seconds. If zero or unset, the UDF will run until the server’s configured maximum. Unlike the timeout parameter to Future-like objects, this sets a limit on actual execution time, rather than just a limit on how long to wait. | None |
resource_class | Optional[str] | If specified, the container resource class that this UDF will be executed in. | None |
namespace | Optional[str] | If specified, the non-default namespace that the UDF will be executed under. This will also be the namespace used for reading any array nodes used in this UDF’s input. | None |
local | bool | If True, will attempt to run the UDF on the client machine. If this is not possible, the UDF will fail. | False |
download_results | Optional[bool] | If True, download results eagerly (i.e., immediately when the function returns). If False, download results lazily (i.e., only when you call .result() on an execution). If unset (the default), automatically choose whether to download results: eagerly if it’s a terminal node, or if it has a local dependent; lazily if it’s an internal node. | None |
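The timeout parameter of udf accepts a datetime.timedelta, an integer number of seconds, or None, and treats zero the same as unset. The sketch below shows one way those inputs could be normalized to a single representation. `timeout_seconds` is a hypothetical helper written for illustration; how the library converts the value internally is not documented here, and truncating to whole seconds is an assumption of this sketch.

```python
import datetime
from typing import Optional, Union


def timeout_seconds(
    timeout: Union[datetime.timedelta, int, None],
) -> Optional[int]:
    """Normalize a timeout to whole seconds, or None for "no explicit limit"."""
    if timeout is None:
        return None
    if isinstance(timeout, datetime.timedelta):
        # Assumption: fractional seconds are truncated.
        seconds = int(timeout.total_seconds())
    else:
        seconds = int(timeout)
    # Zero means "run until the server's configured maximum", like unset.
    return seconds or None


five_minutes = timeout_seconds(datetime.timedelta(minutes=5))  # 300
ninety = timeout_seconds(90)                                   # 90
unlimited = timeout_seconds(0)                                 # None
```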