geospatial.ingestion
client.geospatial.ingestion
Functions
| Name | Description |
|---|---|
| build_file_list_udf | Build a list of sources |
| build_inputs_udf | Groups input URIs into batches. |
| consolidate_meta | Consolidate arrays in the dataset. |
| ingest_datasets | Ingest samples into a dataset. |
| ingest_datasets_dag | Ingests geospatial point clouds, geometries and images into TileDB arrays |
| ingest_geometry_udf | Internal UDF that ingests a server-side batch of geometry files |
| ingest_point_cloud_udf | Internal UDF that ingests a server-side batch of point cloud files |
| ingest_raster_udf | Internal UDF that ingests a server-side batch of raster files |
| load_geometry_metadata | Return geospatial metadata for a sequence of input geometry data files |
| load_pointcloud_metadata | Return geospatial metadata for a sequence of input point cloud data files |
| load_raster_metadata | Return geospatial metadata for a sequence of input raster data files |
| read_uris | Read a list of URIs from a URI. |
| register_dataset_udf | Register the dataset on TileDB Cloud. |
| remove_dataset_type_from_array_meta | Removes dataset_type meta if the ingested result is an array. |
build_file_list_udf
client.geospatial.ingestion.build_file_list_udf(
dataset_type,
config=None,
search_uri=None,
pattern=None,
ignore=None,
dataset_list_uri=None,
max_files=None,
verbose=False,
trace=False,
log_uri=None,
)
Build a list of sources.
:param dataset_type: dataset type, one of pointcloud, raster or geometry
:param config: config dictionary, defaults to None
:param search_uri: URI to search for geospatial dataset files, defaults to None
:param pattern: Unix shell style pattern to match when searching for files, defaults to None
:param ignore: Unix shell style pattern to ignore when searching for files, defaults to None
:param dataset_list_uri: URI with a list of dataset URIs, defaults to None
:param max_files: maximum number of URIs to read/find, defaults to None (no limit)
:param verbose: verbose logging, defaults to False
:param trace: bool, enabling log tracing, defaults to False
:param log_uri: log array URI
:return: A sequence of source files grouped into batches
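The `pattern` and `ignore` parameters are described as Unix shell style patterns. A minimal sketch of that kind of filtering, using Python's standard `fnmatch` module, is shown below. The helper name `filter_sources` and the choice to match against the file name only (rather than the full URI) are assumptions made for illustration; the library's actual matching logic is not shown in this reference.

```python
import fnmatch
import posixpath
from typing import Iterable, List, Optional


def filter_sources(
    uris: Iterable[str],
    pattern: Optional[str] = None,
    ignore: Optional[str] = None,
    max_files: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: keep URIs whose file name matches `pattern`
    and does not match `ignore`, returning at most `max_files` results."""
    selected: List[str] = []
    for uri in uris:
        # Stop as soon as the cap is reached (None means no limit).
        if max_files is not None and len(selected) >= max_files:
            break
        # Assumption: patterns apply to the last path component only.
        name = posixpath.basename(uri)
        if pattern is not None and not fnmatch.fnmatchcase(name, pattern):
            continue
        if ignore is not None and fnmatch.fnmatchcase(name, ignore):
            continue
        selected.append(uri)
    return selected


# Placeholder URIs, not real data.
candidates = [
    "s3://bucket/tiles/a.tif",
    "s3://bucket/tiles/b.tif",
    "s3://bucket/tiles/b_preview.tif",
    "s3://bucket/tiles/readme.txt",
]
rasters = filter_sources(candidates, pattern="*.tif", ignore="*_preview*")
```

Here `rasters` keeps `a.tif` and `b.tif`: the preview file is dropped by `ignore` and the text file never matches `pattern`.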
build_inputs_udf
client.geospatial.ingestion.build_inputs_udf(
dataset_type,
sources,
config=None,
compression_filter=None,
tile_size=RASTER_TILE_SIZE,
pixels_per_fragment=PIXELS_PER_FRAGMENT,
chunk_size=POINT_CLOUD_CHUNK_SIZE,
nodata=None,
resampling='bilinear',
res=None,
verbose=False,
trace=False,
log_uri=None,
)
Groups input URIs into batches.
:param dataset_type: dataset type, one of pointcloud, raster or geometry
:param sources: URIs to process
:param config: config dictionary, defaults to None
:param compression_filter: serialized tiledb filter, defaults to None
:param tile_size: for rasters this is the tile (block) size for the merged destination array, defaults to 1024
:param pixels_per_fragment: This is the number of pixels that will be written per fragment. Ideally aim to align as a factor of tile_size
:param chunk_size: for point cloud this is the PDAL chunk size, defaults to 1000000
:param nodata: NODATA value for raster merging
:param resampling: string, resampling method, one of None, bilinear, cubic, nearest and average
:param res: Tuple[float, float], output resolution in x/y
:param verbose: verbose logging, defaults to False
:param trace: bool, enabling log tracing, defaults to False
:param log_uri: log array URI
:return: A dict containing the kwargs needed for the next function call
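Grouping URIs into batches can be sketched as follows. The `batch_size` argument is borrowed from `ingest_datasets`, and the helper name `batch_uris` is hypothetical; `build_inputs_udf` may group sources differently (for rasters, for instance, the `ingest_raster_udf` documentation refers to destination windows rather than fixed-size groups).

```python
from typing import List, Sequence


def batch_uris(sources: Sequence[str], batch_size: int) -> List[List[str]]:
    """Hypothetical helper: split `sources` into consecutive groups of at
    most `batch_size` items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [
        list(sources[start : start + batch_size])
        for start in range(0, len(sources), batch_size)
    ]


# Placeholder URIs, not real data.
uris = [f"s3://bucket/points/{i}.laz" for i in range(5)]
batches = batch_uris(uris, batch_size=2)
# Five URIs with batch_size=2 give batches of sizes 2, 2 and 1.
```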
consolidate_meta
client.geospatial.ingestion.consolidate_meta(
dataset_uri,
*,
config=None,
id='consolidate',
verbose=False,
trace=False,
log_uri=None,
)
Consolidate arrays in the dataset.
:param dataset_uri: dataset URI
:param config: config dictionary, defaults to None
:param id: profiler event id, defaults to “consolidate”
:param verbose: verbose logging, defaults to False
:param trace: bool, enable trace logging, defaults to False
:param log_uri: log array URI
ingest_datasets
client.geospatial.ingestion.ingest_datasets(
dataset_uri,
*,
dataset_type,
acn=None,
config=None,
namespace=None,
register_name=None,
search_uri=None,
pattern=None,
ignore=None,
dataset_list_uri=None,
max_files=None,
compression_filter=None,
workers=MAX_WORKERS,
batch_size=BATCH_SIZE,
tile_size=RASTER_TILE_SIZE,
pixels_per_fragment=PIXELS_PER_FRAGMENT,
chunk_size=POINT_CLOUD_CHUNK_SIZE,
nodata=None,
res=None,
stats=False,
verbose=False,
trace=False,
log_uri=None,
)
Ingest samples into a dataset.
:param dataset_uri: dataset URI
:param dataset_type: dataset type, one of pointcloud, raster or geometry
:param acn: Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None
:param config: config dictionary, defaults to None
:param namespace: TileDB-Cloud namespace, defaults to None
:param register_name: name to register the dataset with on TileDB Cloud, defaults to None
:param search_uri: URI to search for geospatial dataset files, defaults to None
:param pattern: Unix shell style pattern to match when searching for files, defaults to None
:param ignore: Unix shell style pattern to ignore when searching for files, defaults to None
:param dataset_list_uri: URI with a list of dataset URIs, defaults to None
:param max_files: maximum number of URIs to read/find, defaults to None (no limit)
:param compression_filter: serialized tiledb filter, defaults to None
:param workers: number of workers for dataset ingestion, defaults to MAX_WORKERS
:param batch_size: batch size for dataset ingestion, defaults to BATCH_SIZE
:param tile_size: for rasters this is the tile (block) size for the merged destination array, defaults to 1024
:param pixels_per_fragment: This is the number of pixels that will be written per fragment. Ideally aim to align as a factor of tile_size
:param chunk_size: for point cloud this is the PDAL chunk size, defaults to 1000000
:param nodata: NODATA value for rasters
:param res: Tuple[float, float], output resolution in x/y
:param stats: bool, print TileDB stats to stdout
:param verbose: verbose logging, defaults to False
:param trace: bool, enable trace for logging, defaults to False
:param log_uri: log array URI
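A call to `ingest_datasets` might be assembled as in the sketch below. Several things here are assumptions rather than documented behavior: the function is passed in as `ingest_fn` because the full import path is not shown in this reference, the URIs are placeholders, `dataset_type` is passed as a plain string (the library may expect an enum), and the check that one of `search_uri` or `dataset_list_uri` is set reflects sensible usage, not a documented rule. The helper names `build_ingest_kwargs` and `run_ingestion` are hypothetical.

```python
from typing import Any, Callable, Dict, Optional

# The three dataset types named in the parameter documentation.
DATASET_TYPES = ("pointcloud", "raster", "geometry")


def build_ingest_kwargs(
    dataset_type: str,
    *,
    search_uri: Optional[str] = None,
    dataset_list_uri: Optional[str] = None,
    **extra: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for an ingestion call."""
    if dataset_type not in DATASET_TYPES:
        raise ValueError(
            f"dataset_type must be one of {DATASET_TYPES}, got {dataset_type!r}"
        )
    # Assumption: at least one source of input URIs should be provided.
    if search_uri is None and dataset_list_uri is None:
        raise ValueError("set search_uri or dataset_list_uri")
    kwargs: Dict[str, Any] = {"dataset_type": dataset_type, **extra}
    if search_uri is not None:
        kwargs["search_uri"] = search_uri
    if dataset_list_uri is not None:
        kwargs["dataset_list_uri"] = dataset_list_uri
    return kwargs


def run_ingestion(ingest_fn: Callable[..., Any], dataset_uri: str, **kwargs: Any) -> Any:
    """Call the entry point: dataset_uri is positional, everything else
    is keyword-only, matching the signature above."""
    return ingest_fn(dataset_uri, **kwargs)


raster_kwargs = build_ingest_kwargs(
    "raster",
    search_uri="s3://example-bucket/rasters/",  # placeholder URI
    pattern="*.tif",
    tile_size=1024,
    nodata=0,
)
# With the real function in scope, the call would look like:
# run_ingestion(ingest_datasets, "tiledb://example/raster-array", **raster_kwargs)
```

Passing the entry point as an argument keeps the sketch runnable without credentials and makes it easy to substitute a stub when testing the argument handling.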
ingest_datasets_dag
client.geospatial.ingestion.ingest_datasets_dag(
dataset_uri,
*,
dataset_type,
acn=None,
config=None,
workspace=None,
register_name=None,
search_uri=None,
pattern=None,
ignore=None,
dataset_list_uri=None,
max_files=None,
compression_filter=None,
workers=MAX_WORKERS,
batch_size=BATCH_SIZE,
tile_size=RASTER_TILE_SIZE,
pixels_per_fragment=PIXELS_PER_FRAGMENT,
chunk_size=POINT_CLOUD_CHUNK_SIZE,
nodata=None,
resampling='bilinear',
res=None,
stats=False,
verbose=False,
trace=False,
log_uri=None,
)
Ingests geospatial point clouds, geometries and images into TileDB arrays.
:param dataset_uri: dataset URI
:param dataset_type: dataset type, one of pointcloud, raster or geometry
:param acn: Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None
:param config: config dictionary, defaults to None
:param workspace: TileDB-Cloud workspace, defaults to None
:param register_name: name to register the dataset with on TileDB Cloud, defaults to None and the destination array is not registered
:param search_uri: URI to search for geospatial dataset files, defaults to None
:param pattern: Unix shell style pattern to match when searching for files, defaults to None
:param ignore: Unix shell style pattern to ignore when searching for files, defaults to None
:param dataset_list_uri: URI with a list of dataset URIs, defaults to None
:param max_files: maximum number of URIs to read/find, defaults to None (no limit)
:param compression_filter: serialized tiledb filter, defaults to None
:param workers: number of workers for dataset ingestion, defaults to MAX_WORKERS
:param batch_size: batch size for dataset ingestion, defaults to BATCH_SIZE
:param tile_size: for rasters this is the tile (block) size for the merged destination array, defaults to 1024
:param pixels_per_fragment: This is the number of pixels that will be written per fragment. Ideally aim to align as a factor of tile_size
:param chunk_size: for point cloud this is the PDAL chunk size, defaults to 1000000
:param nodata: NODATA value for raster merging
:param resampling: string, resampling method, one of None, bilinear, cubic, nearest and average
:param res: Tuple[float, float], output resolution in x/y
:param stats: bool, print TileDB stats to stdout
:param verbose: verbose logging, defaults to False
:param trace: bool, enabling log tracing, defaults to False
:param log_uri: log array URI
ingest_geometry_udf
client.geospatial.ingestion.ingest_geometry_udf(
dataset_uri,
args={},
sources=None,
schema=None,
extents=None,
crs=None,
chunk_size=GEOMETRY_CHUNK_SIZE,
batch_size=BATCH_SIZE,
compressor=None,
append=False,
verbose=False,
stats=False,
config=None,
id='geometry',
trace=False,
log_uri=None,
)
Internal UDF that ingests a server-side batch of geometry files into TileDB arrays using the Fiona API.
:param dataset_uri: str, output TileDB array name
:param args: dict, input key value arguments as a dictionary
:param sources: Sequence of input geometry file names
:param schema: dict, dictionary of schema attributes and geometries
:param extents: Extents of the destination geometry array
:param crs: str, CRS for the destination dataset
:param chunk_size: int, sets tile capacity and the number of geometries written at once
:param batch_size: batch size for dataset ingestion, defaults to BATCH_SIZE
:param compressor: dict, serialized compression filter
:param append: bool, whether to append to the array
:param verbose: verbose logging, defaults to False
:param stats: bool, print TileDB stats to stdout
:param config: dict, configuration to pass on tiledb.VFS
:param id: str, ID for logging
:param trace: bool, enable trace logging
:param log_uri: log array URI
:return: if not appending then the function returns a tuple of file paths
ingest_point_cloud_udf
client.geospatial.ingestion.ingest_point_cloud_udf(
args={},
dataset_uri,
sources=None,
append=False,
chunk_size=POINT_CLOUD_CHUNK_SIZE,
batch_size=BATCH_SIZE,
verbose=False,
stats=False,
config=None,
id='pointcloud',
trace=False,
log_uri=None,
)
Internal UDF that ingests a server-side batch of point cloud files into TileDB arrays using the PDAL API. Compression uses the default profile built into PDAL.
:param args: dict or list, input key value arguments as a dictionary
:param dataset_uri: str, output TileDB array name
:param sources: Sequence of GeoMetadata objects
:param append: bool, whether to append to the array
:param chunk_size: PDAL configuration for chunking fragments
:param batch_size: batch size for dataset ingestion, defaults to BATCH_SIZE
:param verbose: verbose logging, defaults to False
:param stats: bool, print TileDB stats to stdout
:param config: dict, configuration to pass on tiledb.VFS
:param id: str, ID for logging
:param trace: bool, enable trace logging
:param log_uri: log array URI
:return: if not appending then a sequence of file paths
ingest_raster_udf
client.geospatial.ingestion.ingest_raster_udf(
args={},
dataset_uri,
sources=None,
extents=None,
band_count=None,
dtype=None,
nodata=None,
pixels_per_fragment=PIXELS_PER_FRAGMENT,
tile_size=RASTER_TILE_SIZE,
resampling=DEFAULT_RASTER_SAMPLING,
append=False,
batch_size=BATCH_SIZE,
stats=False,
verbose=False,
config=None,
compressor=None,
id='raster',
trace=False,
log_uri=None,
)
Internal UDF that ingests a server-side batch of raster files into TileDB arrays using the Rasterio API.
:param args: dict, input key value arguments as a dictionary
:param dataset_uri: str, output TileDB array name
:param sources: tuple, sequence of GeoBlockMetadata objects containing the destination raster window and the input files that contribute to this window
:param extents: Extents of the destination raster
:param band_count: int, number of bands in destination array
:param dtype: str, dtype of destination array
:param nodata: float, NODATA value for destination raster
:param tile_size: for rasters this is the tile (block) size for the merged destination array, defaults to 1024
:param pixels_per_fragment: This is the number of pixels that will be written per fragment. Ideally aim to align as a factor of tile_size
:param resampling: string, resampling method, one of None, bilinear, cubic, nearest and average
:param append: bool, whether to append to the array
:param batch_size: batch size for dataset ingestion, defaults to BATCH_SIZE
:param stats: bool, print TileDB stats to stdout
:param verbose: verbose logging, defaults to False
:param config: dict, configuration to pass on tiledb.VFS
:param compressor: dict, serialized compression filter
:param id: str, ID for logging
:param trace: bool, enable trace logging
:param log_uri: log array URI
:return: if not appending then a sequence of populated GeoBlockMetadata objects
load_geometry_metadata
client.geospatial.ingestion.load_geometry_metadata(
sources,
*,
config=None,
verbose=False,
id='pointcloud_metadata',
trace=False,
log_uri=None,
)
Return geospatial metadata for a sequence of input geometry data files.
:param sources: a sequence of paths, or a single path, to the input files
:param config: dict, configuration to pass on tiledb.VFS
:param verbose: bool, enable verbose logging, default is False
:param id: str, ID for logging
:param trace: bool, enable trace logging, default is False
:param log_uri: Optional[str], log array URI, defaults to None
:return: list[GeoMetadata], a list of populated GeoMetadata objects
load_pointcloud_metadata
client.geospatial.ingestion.load_pointcloud_metadata(
sources,
*,
config=None,
verbose=False,
id='pointcloud_metadata',
trace=False,
log_uri=None,
)
Return geospatial metadata for a sequence of input point cloud data files.
:param sources: iterator, paths or path to process
:param config: dict, configuration to pass on tiledb.VFS
:param verbose: bool, enable verbose logging, default is False
:param id: str, ID for logging
:param trace: bool, enable trace logging, default is False
:param log_uri: Optional[str], log array URI, defaults to None
:return: list[GeoMetadata], a list of populated GeoMetadata objects
load_raster_metadata
client.geospatial.ingestion.load_raster_metadata(
sources,
*,
config=None,
verbose=False,
id='raster_metadata',
trace=False,
log_uri=None,
)
Return geospatial metadata for a sequence of input raster data files.
:param sources: iterator, paths or path to process
:param config: dict, configuration to pass on tiledb.VFS
:param verbose: bool, enable verbose logging, default is False
:param trace: bool, enable trace logging, default is False
:param id: str, ID for logging
:param log_uri: Optional[str], log array URI, defaults to None
:return: list[GeoMetadata], a list of populated GeoMetadata objects
read_uris
client.geospatial.ingestion.read_uris(
list_uri,
dataset_type,
*,
log_uri=None,
config=None,
max_files=None,
verbose=False,
)
Read a list of URIs from a URI.
:param list_uri: URI of the list of URIs
:param dataset_type: dataset type, one of pointcloud, raster or geometry
:param log_uri: log array URI
:param config: config dictionary, defaults to None
:param max_files: maximum number of URIs returned, defaults to None
:param verbose: verbose logging, defaults to False
:return: list of URIs
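Reading a URI list with a `max_files` cap could look like the sketch below. It assumes the list is plain text with one URI per line, and it reads from any text stream rather than through `tiledb.VFS`; both choices, and the helper name `read_uri_lines`, are assumptions for illustration and may not match what `read_uris` does internally.

```python
import io
from itertools import islice
from typing import List, Optional, TextIO


def read_uri_lines(stream: TextIO, max_files: Optional[int] = None) -> List[str]:
    """Hypothetical helper: return non-empty, stripped lines from `stream`,
    keeping at most `max_files` of them (None means no limit)."""
    stripped = (line.strip() for line in stream)
    non_empty = (line for line in stripped if line)
    # islice with a stop of None yields every remaining item.
    return list(islice(non_empty, max_files))


# Placeholder listing held in memory; a real list would live at `list_uri`.
listing = io.StringIO("s3://bucket/a.laz\n\ns3://bucket/b.laz\ns3://bucket/c.laz\n")
first_two = read_uri_lines(listing, max_files=2)
```

Blank lines are skipped before the cap is applied, so `first_two` holds the first two actual URIs rather than the first two lines.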
register_dataset_udf
client.geospatial.ingestion.register_dataset_udf(
dataset_uri,
*,
register_name,
namespace=None,
acn=None,
config=None,
verbose=False,
)
Register the dataset on TileDB Cloud.
:param dataset_uri: dataset URI
:param register_name: name to register the dataset with on TileDB Cloud
:param namespace: TileDB Cloud namespace, defaults to the user’s default namespace
:param acn: Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None
:param config: config dictionary, defaults to None
:param verbose: verbose logging, defaults to False
remove_dataset_type_from_array_meta
client.geospatial.ingestion.remove_dataset_type_from_array_meta(
dataset_uri,
*,
verbose=False,
)
Removes dataset_type meta if the ingested result is an array.
FIXME: This is a workaround for an internal UI issue and exists only until that issue is formally fixed (related ticket: sc-48098).
:param dataset_uri: dataset URI
:param verbose: verbose logging, defaults to False