TileDB Backend for xarray (Partially Filled Arrays)

About this Example

What it shows

This example shows basic usage of the TileDB backend for xarray when opening a TileDB array that is not fully filled.

There are two possible values the TileDB-xarray backend can use for a dimension size:

  1. (default) The size of the current non-empty domain when the dataset is first loaded.
  2. The size of the full domain of the dimension.

By default, the TileDB-xarray backend sizes a dimension from the largest upper bound among all non-empty domains that use it. If arrays define the same dimension with mismatched domains, the reported size never exceeds the smallest of those domains.
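
The sizing rule described above can be sketched in plain Python. This is only an illustration of the described behavior, not the backend's implementation: the function name `dimension_size` and its arguments are hypothetical, and the sketch assumes every domain is an inclusive `(lower, upper)` pair starting at 0.

```python
def dimension_size(full_domains, nonempty_domains, fixed=False):
    """Return the size xarray would see for one dimension (illustrative only).

    full_domains: inclusive (lower, upper) bounds of the dimension in each array.
    nonempty_domains: inclusive (lower, upper) bounds of written data per array,
        or None for an array that holds no data.
    fixed: if True, use the full domain instead of the non-empty domain.
    """
    # Mismatched full domains: never exceed the smallest one.
    max_size = min(upper for _, upper in full_domains) + 1
    if fixed:
        return max_size
    # Default: largest upper bound over all non-empty domains.
    written = [d[1] + 1 for d in nonempty_domains if d is not None]
    return min(max(written, default=0), max_size)


# One array, domain (0, 7), data written to indices 0..3.
print(dimension_size([(0, 7)], [(0, 3)]))              # 4
print(dimension_size([(0, 7)], [(0, 3)], fixed=True))  # 8
# Two arrays sharing a dimension with mismatched domains.
print(dimension_size([(0, 7), (0, 5)], [(0, 6), (0, 2)]))  # 6
```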

Set-up Requirements

This example requires tiledb-cf to be installed and uses the tiledb, xarray, and numpy libraries.

import tiledb
import xarray as xr
import numpy as np
# Set names for the output generated by the example.
output_dir = "output/tiledb-xarray-partially-filled"
array_uri = f"{output_dir}/example1"
group1_uri = f"{output_dir}/group1"
group2_uri = f"{output_dir}/group2"
# Reset output folder
import os
import shutil

shutil.rmtree(output_dir, ignore_errors=True)
# makedirs also creates the parent "output" folder if it does not exist yet.
os.makedirs(output_dir)

Example 1: Simple partially-filled 2D array

# Create array and write data.
tiledb.Array.create(
    array_uri,
    tiledb.ArraySchema(
        domain=tiledb.Domain(
            tiledb.Dim("x", domain=(0, 7), dtype=np.uint64),
            tiledb.Dim("y", domain=(0, 7), dtype=np.uint64),
        ),
        attrs=[tiledb.Attr("z", np.float64)],
    ),
)
with tiledb.open(array_uri, mode="w") as array:
    array[0:4, 0:4] = np.reshape(np.arange(16), (4, 4))
# Print non-empty domain and data.
with tiledb.open(array_uri) as array:
    print(f"Non-empty domain: {array.nonempty_domain()}")
    print(f"Data in non-empty domain:\n {array.multi_index[:, :]['z']}")
    print(f"All data:\n {array[:, :]['z']}")
# By default, xarray will only open the non-empty domain
xr.open_dataset(array_uri, engine="tiledb")
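
The difference between the full domain and the non-empty domain in this example can be reproduced with NumPy alone. The sketch below does not call TileDB; it assumes that unwritten `float64` cells read back as NaN, and the variable names are illustrative.

```python
import numpy as np

# Emulate the 8x8 array from Example 1 without TileDB.
# Assumption: unwritten float64 cells read back as NaN.
full = np.full((8, 8), np.nan)
full[0:4, 0:4] = np.reshape(np.arange(16), (4, 4))

# Smallest bounding box of written cells, as inclusive (lower, upper) pairs.
written = np.argwhere(~np.isnan(full))
nonempty_domain = tuple(
    (int(lo), int(hi)) for lo, hi in zip(written.min(axis=0), written.max(axis=0))
)
print(nonempty_domain)  # ((0, 3), (0, 3))

# The default view covers only the non-empty domain.
(x_lo, x_hi), (y_lo, y_hi) = nonempty_domain
default_view = full[x_lo : x_hi + 1, y_lo : y_hi + 1]
print(default_view.shape)  # (4, 4)
print(full.shape)          # (8, 8)
```

With the default settings, the dataset opened above should therefore have dimensions `x` and `y` of size 4 rather than 8.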

Example 2: Fixed dimensions

We can create a TileDB group whose metadata marks some or all of an array's dimensions as fixed-size, so those dimensions are always read at their full domain size.

# Set `x` to be a fixed-size dimension.
tiledb.Group.create(group1_uri)
with tiledb.Group(group1_uri, mode="w") as group:
    group.add(uri=array_uri, name="z")
    group.meta["__tiledb_array_fixed_dimensions.z"] = "x"
xr.open_dataset(group1_uri, engine="tiledb")
# Set both `x` and `y` to be fixed-size dimensions.
tiledb.Group.create(group2_uri)
with tiledb.Group(group2_uri, mode="w") as group:
    group.add(uri=array_uri, name="z")
    group.meta["__tiledb_array_fixed_dimensions.z"] = "x;y"
xr.open_dataset(group2_uri, engine="tiledb")
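
The dimension sizes to expect from the two groups can be worked out by hand. The sketch below is a plain-Python illustration, not the backend's code: the helpers `fixed_dimensions` and `expected_sizes` are hypothetical, and the semicolon-separated value format is inferred from the `"x;y"` metadata used in this example.

```python
def fixed_dimensions(group_meta, array_name):
    """Names listed under the fixed-dimensions key for one array (hypothetical helper)."""
    value = group_meta.get(f"__tiledb_array_fixed_dimensions.{array_name}", "")
    return {name for name in value.split(";") if name}


def expected_sizes(full_sizes, nonempty_sizes, fixed):
    """Full size for fixed dimensions, non-empty size otherwise."""
    return {
        dim: full_sizes[dim] if dim in fixed else nonempty_sizes[dim]
        for dim in full_sizes
    }


full_sizes = {"x": 8, "y": 8}      # domain (0, 7) on both dimensions
nonempty_sizes = {"x": 4, "y": 4}  # data written to [0:4, 0:4]

# Plain dicts standing in for the group metadata written above.
group1_meta = {"__tiledb_array_fixed_dimensions.z": "x"}
group2_meta = {"__tiledb_array_fixed_dimensions.z": "x;y"}

print(expected_sizes(full_sizes, nonempty_sizes, fixed_dimensions(group1_meta, "z")))
# {'x': 8, 'y': 4}
print(expected_sizes(full_sizes, nonempty_sizes, fixed_dimensions(group2_meta, "z")))
# {'x': 8, 'y': 8}
```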