import netCDF4
import numpy as np
import tiledb
import tiledb.cf
import matplotlib.pyplot as plt
Converting a simple NetCDF file to a TileDB array
About this Example
What it Shows
The purpose of this example is to show the basics of converting a NetCDF file to TileDB arrays.
This includes:
- Options for auto-generating a converter from a NetCDF file.
- Changing the TileDB schema settings before conversion.
- Creating the TileDB group and copying data from the NetCDF file to the TileDB arrays.
Example dataset
This example converts a small NetCDF file with 2 dimensions and 4 variables:
- Dimensions:
  - x: size=100
  - y: size=100
- Variables:
  - x(x)
    - description: evenly spaced values from -5 to 5
    - data type: 64-bit floating point
  - y(y)
    - description: evenly spaced values from -5 to 5
    - data type: 64-bit floating point
  - A1(x, y)
    - description: x + y
    - data type: 64-bit floating point
  - A2(x, y)
    - description: sin((x/2)^2 + y^2)
    - data type: 64-bit floating point
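The two matrix variables are plain functions of the coordinates. As a minimal sketch in pure Python (the helper names `a1_value` and `a2_value` are illustrative and are not part of the example's code), the two definitions can be written as:

```python
import math


def a1_value(x: float, y: float) -> float:
    """Value of A1 at the point (x, y): x + y."""
    return x + y


def a2_value(x: float, y: float) -> float:
    """Value of A2 at the point (x, y): sin((x/2)^2 + y^2)."""
    return math.sin((x / 2.0) ** 2 + y**2)


# Evaluate both definitions at one corner of the [-5, 5] x [-5, 5] grid.
corner = (-5.0, -5.0)
print(a1_value(*corner), a2_value(*corner))
```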
Set-up Requirements
This example requires the following Python packages to be installed: netCDF4, numpy, tiledb, tiledb-cf, and matplotlib.
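One way to check the requirements before running the example is to ask Python whether each module can be located. The sketch below uses only the standard library. The helper name `find_missing` is hypothetical, and the list assumes that the tiledb-cf package is imported as `tiledb.cf`, as in the import cell at the top of this example.

```python
import importlib.util

# Import names for the packages listed above; "tiledb.cf" is the import
# name used for the tiledb-cf package in this example.
REQUIRED_MODULES = ["netCDF4", "numpy", "tiledb", "tiledb.cf", "matplotlib"]


def find_missing(modules):
    """Return the modules from `modules` that cannot be located."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. `tiledb`) is itself missing.
            found = False
        if not found:
            missing.append(name)
    return missing


missing = find_missing(REQUIRED_MODULES)
print("Missing packages:", missing if missing else "none")
```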
# Set names for the output generated by the example.
output_dir = "output/netcdf-to-tiledb-basics"
netcdf_file = f"{output_dir}/simple1.nc"
group_uri = f"{output_dir}/simple_netcdf_to_group_1"
array_uri = f"{output_dir}/simple_netcdf_to_array_1"
# Reset output folder
import os
import shutil

shutil.rmtree(output_dir, ignore_errors=True)
# makedirs also creates the parent "output" folder if it does not exist yet.
os.makedirs(output_dir)
Create an example NetCDF file
If the NetCDF file does not exist, we create a small NetCDF file for this example.
x_data = np.linspace(-5.0, 5.0, 100)
y_data = np.linspace(-5.0, 5.0, 100)
xv, yv = np.meshgrid(x_data, y_data, sparse=True)
with netCDF4.Dataset(netcdf_file, mode="w") as dataset:
    dataset.setncatts({"title": "Simple dataset for examples"})
    dataset.createDimension("x", 100)
    dataset.createDimension("y", 100)
    A1 = dataset.createVariable("A1", np.float64, ("x", "y"))
    A1.setncattr("full_name", "Example matrix A1")
    A1.setncattr("description", "x + y")
    A1[:, :] = xv + yv
    A2 = dataset.createVariable("A2", np.float64, ("x", "y"))
    A2[:, :] = np.sin((xv / 2.0) ** 2 + yv**2)
    A2.setncattr("full_name", "Example matrix A2")
    A2.setncattr("description", "sin((x/2)^2 + y^2)")
    x1 = dataset.createVariable("x", np.float64, ("x",))
    x1[:] = x_data
    y = dataset.createVariable("y", np.float64, ("y",))
    y[:] = y_data
print(f"Created example NetCDF file `{netcdf_file}`.")
Convert NetCDF file to a TileDB Group
In this section we convert a NetCDF file to TileDB in a way that:
- maps NetCDF dimensions to TileDB dimensions,
- maps NetCDF variables to TileDB attributes.
The functions NetCDF4ConverterEngine.from_file and NetCDF4ConverterEngine.from_group auto-generate a NetCDF4ConverterEngine for an existing NetCDF file. The properties in the NetCDF4ConverterEngine can be modified after the converter is generated.
Parameters:
- Set the location of the NetCDF group to be converted.
  - In from_file:
    - input_file: The input NetCDF file to generate the converter engine from.
    - group_path: The path to the NetCDF group to copy data from. Use '/' for the root group.
  - In from_group:
    - input_netcdf_group: The NetCDF group to generate the converter engine from. (Must be a netCDF4.Dataset or netCDF4.Group.)
- Set the array grouping. A NetCDF variable maps to TileDB attributes. The collect_attrs parameter determines whether each NetCDF variable is stored in a separate array, or whether all NetCDF variables with the same underlying dimensions are stored in the same TileDB array. Scalar variables are always grouped together.
  - collect_attrs: If True, store all attributes with the same dimensions in the same array. Otherwise, store each attribute in a separate array.
- Set default properties for TileDB dimensions.
  - unlimited_dim_size: The default size of the domain for TileDB dimensions created from unlimited NetCDF dimensions. If None, the current size of the NetCDF dimension will be used.
  - dim_dtype: The default numpy dtype to use when converting a NetCDF dimension to a TileDB dimension.
- Set tile sizes for TileDB dimensions. Multiple arrays in the TileDB group may have the same name, domain, and type, but different tiles and compression filters. The tiles_by_var and tiles_by_dims parameters provide a way to set the tiles for the dimensions in different arrays.
  - tiles_by_var: A map from the name of a single NetCDF variable to the tiles of the dimensions of the TileDB array that contains the data from that variable.
  - tiles_by_dims: A map from the names of the NetCDF dimensions defining a variable to the tiles of the dimensions of any TileDB array that contains data from a variable defined on those dimensions.
- Convert 1D variables with the same name and dimension to a TileDB dimension instead of a TileDB attribute. This is advanced usage and moves the data away from a NetCDF-like data model.
  - coords_to_dims: If True, convert the NetCDF coordinate variable into a TileDB dimension for sparse arrays. Otherwise, convert the coordinate dimension into a TileDB dimension and the coordinate variable into a TileDB attribute.
- Unpack NetCDF data that uses add_offset and scale_factor.
  - unpack_vars: If True, for any variable that has the NetCDF attributes add_offset or scale_factor, apply the linear transformation x -> scale_factor * x + add_offset to the data before conversion. The data type will be set to the data type of the scaled data and the attributes add_offset and scale_factor will be dropped.
- Set filters for TileDB dimensions, attributes, and offsets.
  - offsets_filters: Default TileDB filters for all offsets for variable length TileDB attributes and TileDB dimensions.
  - attrs_filters: Default TileDB filters for all attributes.
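To make the unpacking option concrete, the sketch below applies the usual NetCDF packing convention, in which each stored value is multiplied by scale_factor and then shifted by add_offset. It is pure Python and only illustrates the arithmetic; it is not the code tiledb-cf runs. The helper name `unpack` and the sample values are hypothetical.

```python
def unpack(packed_values, scale_factor=1.0, add_offset=0.0):
    """Apply packed * scale_factor + add_offset to each stored value."""
    return [value * scale_factor + add_offset for value in packed_values]


# Hypothetical packed data: small integers standing in for temperatures.
packed = [0, 100, 250]
print(unpack(packed, scale_factor=0.5, add_offset=273.0))
# [273.0, 323.0, 398.0]
```

Note that the unpacked values are floats even though the packed values are integers, which is why the converted attribute takes the data type of the scaled data.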
# Auto-generate NetCDF to TileDB conversion from a NetCDF file.
converter = tiledb.cf.NetCDF4ConverterEngine.from_file(
    netcdf_file,
    dim_dtype=np.uint32,
    attrs_filters=[tiledb.ZstdFilter(level=7)],
)
converter
# Update properties manually by modifying the array creators
# 1. Update properties for x
x_array = converter.get_array_creator_by_attr("x.data")
x_array.name = "x"
x_array.domain_creator.tiles = (20,)
# 2. Update properties for y
y_array = converter.get_array_creator_by_attr("y.data")
y_array.name = "y"
y_array.domain_creator.tiles = (20,)
# 3. Update properties for array of matrices
data_array = converter.get_array_creator_by_attr("A1")
data_array.name = "data"
data_array.domain_creator.tiles = (20, 20)
converter
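The tile settings above split each dimension's domain into fixed-size tiles. As a rough sketch of what that means for this dataset, the pure-Python helpers below count the tiles for a given domain size and tile extent. The function names are hypothetical, and the numbers assume each dimension keeps the 100-element size of the NetCDF dimensions.

```python
import math


def tiles_per_dimension(dim_sizes, tile_extents):
    """Number of tiles along each dimension (the last tile may be partial)."""
    return tuple(
        math.ceil(size / extent) for size, extent in zip(dim_sizes, tile_extents)
    )


def total_tiles(dim_sizes, tile_extents):
    """Total number of tiles needed to cover the full domain."""
    return math.prod(tiles_per_dimension(dim_sizes, tile_extents))


# The x and y arrays: 100 cells with a tile extent of 20.
print(total_tiles((100,), (20,)))         # 5
# The "data" array: 100 x 100 cells with 20 x 20 tiles.
print(total_tiles((100, 100), (20, 20)))  # 25
```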
Run the conversion to create the dense TileDB arrays x, y, and data in a TileDB group:
converter.convert_to_group(group_uri)
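The converter above placed A1 and A2 in a single array because both are defined on the dimensions (x, y), while x and y each got an array of their own. The following pure-Python sketch mimics the grouping rule described for collect_attrs. It is a conceptual illustration, not tiledb-cf's implementation, and `group_variables` is a hypothetical helper.

```python
from collections import defaultdict

# Variables of the example file and the dimensions they are defined on.
variables = {"x": ("x",), "y": ("y",), "A1": ("x", "y"), "A2": ("x", "y")}


def group_variables(variable_dims, collect_attrs=True):
    """Return a list of variable groups, one group per resulting array."""
    groups = defaultdict(list)
    for name, dims in variable_dims.items():
        if collect_attrs or not dims:
            # Group by dimensions; scalars (no dimensions) always share a group.
            key = ("dims", dims)
        else:
            # One array per variable.
            key = ("variable", name)
        groups[key].append(name)
    return list(groups.values())


print(group_variables(variables, collect_attrs=True))
# [['x'], ['y'], ['A1', 'A2']]
print(group_variables(variables, collect_attrs=False))
# [['x'], ['y'], ['A1'], ['A2']]
```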
Examine the data in the arrays
Open the attributes from the generated TileDB group:
with tiledb.Group(group_uri) as group:
    with (
        tiledb.cf.open_group_array(group, attr="x.data") as x_array,
        tiledb.cf.open_group_array(group, attr="y.data") as y_array,
        tiledb.cf.open_group_array(group, array="data") as data_array,
    ):
        x = x_array[:]
        y = y_array[:]
        data = data_array[...]
        A1 = data["A1"]
        A2 = data["A2"]
        a1_description = tiledb.cf.AttrMetadata(data_array.meta, "A1")["description"]
        a2_description = tiledb.cf.AttrMetadata(data_array.meta, "A2")["description"]

fig, axes = plt.subplots(nrows=1, ncols=2)
axes[0].contourf(x, y, A1)
axes[0].set_title(a1_description)
axes[1].contourf(x, y, A2)
axes[1].set_title(a2_description);
Convert to a Single Sparse TileDB Array
In this section we convert a NetCDF file to TileDB in a way that:
- maps NetCDF ‘coordinate’ variables to TileDB dimensions,
- maps NetCDF standard variables to TileDB attributes.
converter2 = tiledb.cf.NetCDF4ConverterEngine.from_file(
    netcdf_file, coords_to_dims=True, attrs_filters=[tiledb.ZstdFilter(level=7)]
)
converter2
# Update properties for the array
converter2.get_shared_dim("x").domain = (-5.0, 5.0)
converter2.get_shared_dim("y").domain = (-5.0, 5.0)
data_array = converter2.get_array_creator("array0")
data_array.domain_creator.tiles = (1.0, 1.0)
data_array.capacity = 400
converter2

converter2.convert_to_array(array_uri)

tiledb.ArraySchema.load(array_uri)
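Conceptually, turning the coordinate variables into dimensions means that each cell is addressed by its coordinate values in the domain (-5.0, 5.0) rather than by integer indices. The pure-Python sketch below lists the cells of A1 in that coordinate form. It only illustrates the data layout and is not how tiledb-cf copies the data; `linspace` and `to_sparse_cells` are hypothetical helpers.

```python
def linspace(start, stop, num):
    """Evenly spaced values from start to stop, inclusive (pure Python)."""
    return [start + (stop - start) * i / (num - 1) for i in range(num)]


def to_sparse_cells(x_values, y_values, func):
    """List one (x, y, value) record per grid point."""
    return [(x, y, func(x, y)) for x in x_values for y in y_values]


x_values = linspace(-5.0, 5.0, 100)
y_values = linspace(-5.0, 5.0, 100)
cells = to_sparse_cells(x_values, y_values, lambda x, y: x + y)

print(len(cells))  # 10000 cells, each addressed by float coordinates
print(cells[0])    # (-5.0, -5.0, -10.0)
```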