geospatial.ingestion

client.geospatial.ingestion

Functions

Name                                 Description
build_file_list_udf                  Build a list of sources
build_inputs_udf                     Groups input URIs into batches.
consolidate_meta                     Consolidate arrays in the dataset.
ingest_datasets                      Ingest samples into a dataset.
ingest_datasets_dag                  Ingests geospatial point clouds, geometries and images into TileDB arrays
ingest_geometry_udf                  Internal UDF that ingests a server-side batch of geometry files
ingest_point_cloud_udf               Internal UDF that ingests a server-side batch of point cloud files
ingest_raster_udf                    Internal UDF that ingests a server-side batch of raster files
load_geometry_metadata               Return geospatial metadata for a sequence of input geometry data files
load_pointcloud_metadata             Return geospatial metadata for a sequence of input point cloud data files
load_raster_metadata                 Return geospatial metadata for a sequence of input raster data files
read_uris                            Read a list of URIs from a URI.
register_dataset_udf                 Register the dataset on TileDB Cloud.
remove_dataset_type_from_array_meta  Removes dataset_type meta if the ingested result is an array.

build_file_list_udf

client.geospatial.ingestion.build_file_list_udf(
    dataset_type,
    config=None,
    search_uri=None,
    pattern=None,
    ignore=None,
    dataset_list_uri=None,
    max_files=None,
    verbose=False,
    trace=False,
    log_uri=None,
)

Build a list of sources.

:param dataset_type: dataset type, one of pointcloud, raster or geometry
:param config: config dictionary, defaults to None
:param search_uri: URI to search for geospatial dataset files, defaults to None
:param pattern: Unix shell style pattern to match when searching for files, defaults to None
:param ignore: Unix shell style pattern to ignore when searching for files, defaults to None
:param dataset_list_uri: URI with a list of dataset URIs, defaults to None
:param max_files: maximum number of URIs to read/find, defaults to None (no limit)
:param verbose: verbose logging, defaults to False
:param trace: bool, enabling log tracing, defaults to False
:param log_uri: log array URI
:return: A sequence of source files grouped into batches
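The interaction of pattern, ignore and max_files can be illustrated with a minimal sketch built on the standard library's fnmatch module, which implements Unix shell style matching. The helper name filter_sources and the example URIs are hypothetical, and matching against the full URI (rather than only the file name) is an assumption; this is not the library's own implementation.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def filter_sources(
    uris: Iterable[str],
    pattern: Optional[str] = None,
    ignore: Optional[str] = None,
    max_files: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: keep URIs matching `pattern`, drop those matching `ignore`."""
    selected: List[str] = []
    for uri in uris:
        # Stop as soon as the requested number of files has been collected.
        if max_files is not None and len(selected) >= max_files:
            break
        # A pattern of None means "accept everything".
        if pattern is not None and not fnmatchcase(uri, pattern):
            continue
        # An ignore pattern of None means "reject nothing".
        if ignore is not None and fnmatchcase(uri, ignore):
            continue
        selected.append(uri)
    return selected


# Hypothetical example URIs.
candidates = [
    "s3://example-bucket/tiles/a.tif",
    "s3://example-bucket/tiles/b.tif",
    "s3://example-bucket/tiles/b_preview.tif",
    "s3://example-bucket/tiles/readme.txt",
]
print(filter_sources(candidates, pattern="*.tif", ignore="*_preview*"))
```

In this sketch max_files=None applies no limit, mirroring the documented default.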

build_inputs_udf

client.geospatial.ingestion.build_inputs_udf(
    dataset_type,
    sources,
    config=None,
    compression_filter=None,
    tile_size=RASTER_TILE_SIZE,
    pixels_per_fragment=PIXELS_PER_FRAGMENT,
    chunk_size=POINT_CLOUD_CHUNK_SIZE,
    nodata=None,
    resampling='bilinear',
    res=None,
    verbose=False,
    trace=False,
    log_uri=None,
)

Groups input URIs into batches.

:param dataset_type: dataset type, one of pointcloud, raster or geometry
:param sources: URIs to process
:param config: config dictionary, defaults to None
:param compression_filter: serialized tiledb filter, defaults to None
:param tile_size: for rasters this is the tile (block) size for the merged destination array, defaults to 1024
:param pixels_per_fragment: This is the number of pixels that will be written per fragment. Ideally aim to align as a factor of tile_size
:param chunk_size: for point cloud this is the PDAL chunk size, defaults to 1000000
:param nodata: NODATA value for raster merging
:param resampling: string, resampling method, one of None, bilinear, cubic, nearest and average
:param res: Tuple[float, float], output resolution in x/y
:param verbose: verbose logging, defaults to False
:param trace: bool, enabling log tracing, defaults to False
:param log_uri: log array URI
:return: A dict containing the kwargs needed for the next function call

consolidate_meta

client.geospatial.ingestion.consolidate_meta(
    dataset_uri,
    *,
    config=None,
    id='consolidate',
    verbose=False,
    trace=False,
    log_uri=None,
)

Consolidate arrays in the dataset.

:param dataset_uri: dataset URI
:param config: config dictionary, defaults to None
:param id: profiler event id, defaults to “consolidate”
:param verbose: verbose logging, defaults to False
:param trace: bool, enabling log tracing, defaults to False
:param log_uri: log array URI

ingest_datasets

client.geospatial.ingestion.ingest_datasets(
    dataset_uri,
    *,
    dataset_type,
    acn=None,
    config=None,
    namespace=None,
    register_name=None,
    search_uri=None,
    pattern=None,
    ignore=None,
    dataset_list_uri=None,
    max_files=None,
    compression_filter=None,
    workers=MAX_WORKERS,
    batch_size=BATCH_SIZE,
    tile_size=RASTER_TILE_SIZE,
    pixels_per_fragment=PIXELS_PER_FRAGMENT,
    chunk_size=POINT_CLOUD_CHUNK_SIZE,
    nodata=None,
    res=None,
    stats=False,
    verbose=False,
    trace=False,
    log_uri=None,
)

Ingest samples into a dataset.

:param dataset_uri: dataset URI
:param dataset_type: dataset type, one of pointcloud, raster or geometry
:param acn: Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None
:param config: config dictionary, defaults to None
:param namespace: TileDB-Cloud namespace, defaults to None
:param register_name: name to register the dataset with on TileDB Cloud, defaults to None
:param search_uri: URI to search for geospatial dataset files, defaults to None
:param pattern: Unix shell style pattern to match when searching for files, defaults to None
:param ignore: Unix shell style pattern to ignore when searching for files, defaults to None
:param dataset_list_uri: URI with a list of dataset URIs, defaults to None
:param max_files: maximum number of URIs to read/find, defaults to None (no limit)
:param compression_filter: serialized tiledb filter, defaults to None
:param workers: number of workers for dataset ingestion, defaults to MAX_WORKERS
:param batch_size: batch size for dataset ingestion, defaults to BATCH_SIZE
:param tile_size: for rasters this is the tile (block) size for the merged destination array, defaults to 1024
:param pixels_per_fragment: This is the number of pixels that will be written per fragment. Ideally aim to align as a factor of tile_size
:param chunk_size: for point cloud this is the PDAL chunk size, defaults to 1000000
:param nodata: NODATA value for rasters
:param res: Tuple[float, float], output resolution in x/y
:param stats: bool, print TileDB stats to stdout
:param verbose: verbose logging, defaults to False
:param trace: bool, enable trace for logging, defaults to False
:param log_uri: log array URI
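To make the role of batch_size concrete, the following sketch splits a list of source URIs into consecutive batches of at most batch_size entries. The helper split_into_batches and the example URIs are hypothetical, and fixed-size chunking is an assumption made for illustration; the reference above does not describe how the library forms its batches internally.

```python
from typing import List, Sequence


def split_into_batches(sources: Sequence[str], batch_size: int) -> List[List[str]]:
    """Hypothetical helper: split `sources` into consecutive chunks of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [
        list(sources[start:start + batch_size])
        for start in range(0, len(sources), batch_size)
    ]


# Hypothetical example URIs.
sources = [f"s3://example-bucket/scan_{i}.laz" for i in range(5)]
batches = split_into_batches(sources, batch_size=2)
for index, batch in enumerate(batches):
    print(index, batch)
```

With five sources and a batch size of two, the last batch holds the single leftover URI.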

ingest_datasets_dag

client.geospatial.ingestion.ingest_datasets_dag(
    dataset_uri,
    *,
    dataset_type,
    acn=None,
    config=None,
    workspace=None,
    register_name=None,
    search_uri=None,
    pattern=None,
    ignore=None,
    dataset_list_uri=None,
    max_files=None,
    compression_filter=None,
    workers=MAX_WORKERS,
    batch_size=BATCH_SIZE,
    tile_size=RASTER_TILE_SIZE,
    pixels_per_fragment=PIXELS_PER_FRAGMENT,
    chunk_size=POINT_CLOUD_CHUNK_SIZE,
    nodata=None,
    resampling='bilinear',
    res=None,
    stats=False,
    verbose=False,
    trace=False,
    log_uri=None,
)

Ingests geospatial point clouds, geometries and images into TileDB arrays

:param dataset_uri: dataset URI
:param dataset_type: dataset type, one of pointcloud, raster or geometry
:param acn: Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None
:param config: config dictionary, defaults to None
:param workspace: TileDB-Cloud workspace, defaults to None
:param register_name: name to register the dataset with on TileDB Cloud, defaults to None, in which case the destination array is not registered
:param search_uri: URI to search for geospatial dataset files, defaults to None
:param pattern: Unix shell style pattern to match when searching for files, defaults to None
:param ignore: Unix shell style pattern to ignore when searching for files, defaults to None
:param dataset_list_uri: URI with a list of dataset URIs, defaults to None
:param max_files: maximum number of URIs to read/find, defaults to None (no limit)
:param compression_filter: serialized tiledb filter, defaults to None
:param workers: number of workers for dataset ingestion, defaults to MAX_WORKERS
:param batch_size: batch size for dataset ingestion, defaults to BATCH_SIZE
:param tile_size: for rasters this is the tile (block) size for the merged destination array, defaults to 1024
:param pixels_per_fragment: This is the number of pixels that will be written per fragment. Ideally aim to align as a factor of tile_size
:param chunk_size: for point cloud this is the PDAL chunk size, defaults to 1000000
:param nodata: NODATA value for raster merging
:param resampling: string, resampling method, one of None, bilinear, cubic, nearest and average
:param res: Tuple[float, float], output resolution in x/y
:param stats: bool, print TileDB stats to stdout
:param verbose: verbose logging, defaults to False
:param trace: bool, enabling log tracing, defaults to False
:param log_uri: log array URI
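Since ingest_datasets_dag takes many keyword-only arguments, it can help to assemble and check them before the call. The sketch below validates dataset_type and resampling against the values listed above and returns a keyword dictionary. The helper build_dag_kwargs and all URIs are hypothetical, and the final call is left as a comment because the import path depends on how the client package is installed.

```python
from typing import Any, Dict, Optional

# Allowed values as listed in the parameter descriptions above.
DATASET_TYPES = {"pointcloud", "raster", "geometry"}
RESAMPLING_METHODS = {None, "bilinear", "cubic", "nearest", "average"}


def build_dag_kwargs(
    dataset_uri: str,
    dataset_type: str,
    resampling: Optional[str] = "bilinear",
    **extra: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: validate a few arguments and collect them into a dict."""
    if dataset_type not in DATASET_TYPES:
        raise ValueError(f"dataset_type must be one of {sorted(DATASET_TYPES)}")
    if resampling not in RESAMPLING_METHODS:
        raise ValueError(f"unsupported resampling method: {resampling!r}")
    kwargs: Dict[str, Any] = {
        "dataset_uri": dataset_uri,
        "dataset_type": dataset_type,
        "resampling": resampling,
    }
    # Any remaining documented parameters (search_uri, pattern, nodata, ...) pass through.
    kwargs.update(extra)
    return kwargs


kwargs = build_dag_kwargs(
    "tiledb://example-workspace/example-raster",  # hypothetical destination URI
    dataset_type="raster",
    search_uri="s3://example-bucket/rasters/",  # hypothetical search location
    pattern="*.tif",
    nodata=0,
)
print(kwargs)
# ingest_datasets_dag(**kwargs)  # call left commented out: requires the client package and credentials
```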

ingest_geometry_udf

client.geospatial.ingestion.ingest_geometry_udf(
    dataset_uri,
    args={},
    sources=None,
    schema=None,
    extents=None,
    crs=None,
    chunk_size=GEOMETRY_CHUNK_SIZE,
    batch_size=BATCH_SIZE,
    compressor=None,
    append=False,
    verbose=False,
    stats=False,
    config=None,
    id='geometry',
    trace=False,
    log_uri=None,
)

Internal UDF that ingests a server-side batch of geometry files into TileDB arrays using the Fiona API.

:param dataset_uri: str, output TileDB array name
:param args: dict, input key value arguments as a dictionary
:param sources: Sequence of input geometry file names
:param schema: dict, dictionary of schema attributes and geometries
:param extents: Extents of the destination geometry array
:param crs: str, CRS for the destination dataset
:param chunk_size: int, sets tile capacity and the number of geometries written at once
:param batch_size: batch size for dataset ingestion, defaults to BATCH_SIZE
:param compressor: dict, serialized compression filter
:param append: bool, whether to append to the array
:param verbose: verbose logging, defaults to False
:param stats: bool, print TileDB stats to stdout
:param config: dict, configuration to pass on tiledb.VFS
:param id: str, ID for logging
:param trace: bool, enable trace logging
:param log_uri: log array URI
:return: if not appending then the function returns a tuple of file paths

ingest_point_cloud_udf

client.geospatial.ingestion.ingest_point_cloud_udf(
    args={},
    dataset_uri,
    sources=None,
    append=False,
    chunk_size=POINT_CLOUD_CHUNK_SIZE,
    batch_size=BATCH_SIZE,
    verbose=False,
    stats=False,
    config=None,
    id='pointcloud',
    trace=False,
    log_uri=None,
)

Internal UDF that ingests a server-side batch of point cloud files into TileDB arrays using the PDAL API. Compression uses the default profile built in to PDAL.

:param args: dict or list, input key value arguments as a dictionary
:param dataset_uri: str, output TileDB array name
:param sources: Sequence of GeoMetadata objects
:param append: bool, whether to append to the array
:param chunk_size: PDAL configuration for chunking fragments
:param batch_size: batch size for dataset ingestion, defaults to BATCH_SIZE
:param verbose: verbose logging, defaults to False
:param stats: bool, print TileDB stats to stdout
:param config: dict, configuration to pass on tiledb.VFS
:param id: str, ID for logging
:param trace: bool, enable trace logging
:param log_uri: log array URI
:return: if not appending then a sequence of file paths

ingest_raster_udf

client.geospatial.ingestion.ingest_raster_udf(
    args={},
    dataset_uri,
    sources=None,
    extents=None,
    band_count=None,
    dtype=None,
    nodata=None,
    pixels_per_fragment=PIXELS_PER_FRAGMENT,
    tile_size=RASTER_TILE_SIZE,
    resampling=DEFAULT_RASTER_SAMPLING,
    append=False,
    batch_size=BATCH_SIZE,
    stats=False,
    verbose=False,
    config=None,
    compressor=None,
    id='raster',
    trace=False,
    log_uri=None,
)

Internal UDF that ingests a server-side batch of raster files into TileDB arrays using the Rasterio API.

:param args: dict, input key value arguments as a dictionary
:param dataset_uri: str, output TileDB array name
:param sources: tuple, sequence of GeoBlockMetadata objects containing the destination raster window and the input files that contribute to this window
:param extents: Extents of the destination raster
:param band_count: int, number of bands in destination array
:param dtype: str, dtype of destination array
:param nodata: float, NODATA value for destination raster
:param tile_size: for rasters this is the tile (block) size for the merged destination array, defaults to 1024
:param pixels_per_fragment: This is the number of pixels that will be written per fragment. Ideally aim to align as a factor of tile_size
:param resampling: string, resampling method, one of None, bilinear, cubic, nearest and average
:param append: bool, whether to append to the array
:param batch_size: batch size for dataset ingestion, defaults to BATCH_SIZE
:param stats: bool, print TileDB stats to stdout
:param verbose: verbose logging, defaults to False
:param config: dict, configuration to pass on tiledb.VFS
:param compressor: dict, serialized compression filter
:param id: str, ID for logging
:param trace: bool, enable trace logging
:param log_uri: log array URI
:return: if not appending then a sequence of populated GeoBlockMetadata objects
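The advice to align pixels_per_fragment with tile_size can be read in more than one way. One possible reading, sketched below, is to size each fragment as a square block of whole tiles, so that the pixel count is an exact multiple of the tile area. The helper aligned_pixels_per_fragment and the choice of ten tiles per side are assumptions for illustration, not values confirmed by this reference.

```python
def aligned_pixels_per_fragment(tile_size: int, tiles_per_side: int) -> int:
    """Hypothetical helper: pixel count of a square block of whole tiles."""
    if tile_size < 1 or tiles_per_side < 1:
        raise ValueError("tile_size and tiles_per_side must be positive integers")
    # Side length of the fragment in pixels, a whole multiple of tile_size.
    side = tile_size * tiles_per_side
    return side * side


tile_size = 1024  # documented default tile (block) size
pixels = aligned_pixels_per_fragment(tile_size, tiles_per_side=10)  # 10 is an assumed value
print(pixels)
# The result divides evenly into whole tiles.
print(pixels % (tile_size * tile_size) == 0)
```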

load_geometry_metadata

client.geospatial.ingestion.load_geometry_metadata(
    sources,
    *,
    config=None,
    verbose=False,
    id='pointcloud_metadata',
    trace=False,
    log_uri=None,
)

Return geospatial metadata for a sequence of input geometry data files

:param sources: a sequence of paths, or a single path, to the input files
:param config: dict, configuration to pass on tiledb.VFS
:param verbose: bool, enable verbose logging, default is False
:param id: str, ID for logging
:param trace: bool, enable trace logging, default is False
:param log_uri: Optional[str], log array URI, default is None
:return: list[GeoMetadata], a list of populated GeoMetadata objects

load_pointcloud_metadata

client.geospatial.ingestion.load_pointcloud_metadata(
    sources,
    *,
    config=None,
    verbose=False,
    id='pointcloud_metadata',
    trace=False,
    log_uri=None,
)

Return geospatial metadata for a sequence of input point cloud data files

:param sources: iterator, paths or path to process
:param config: dict, configuration to pass on tiledb.VFS
:param verbose: bool, enable verbose logging, default is False
:param id: str, ID for logging
:param trace: bool, enable trace logging, default is False
:param log_uri: Optional[str], log array URI, default is None
:return: list[GeoMetadata], a list of populated GeoMetadata objects

load_raster_metadata

client.geospatial.ingestion.load_raster_metadata(
    sources,
    *,
    config=None,
    verbose=False,
    id='raster_metadata',
    trace=False,
    log_uri=None,
)

Return geospatial metadata for a sequence of input raster data files

:param sources: iterator, paths or path to process
:param config: dict, configuration to pass on tiledb.VFS
:param verbose: bool, enable verbose logging, default is False
:param trace: bool, enable trace logging, default is False
:param id: str, ID for logging
:param log_uri: Optional[str], log array URI, default is None
:return: list[GeoMetadata], a list of populated GeoMetadata objects

read_uris

client.geospatial.ingestion.read_uris(
    list_uri,
    dataset_type,
    *,
    log_uri=None,
    config=None,
    max_files=None,
    verbose=False,
)

Read a list of URIs from a URI.

:param list_uri: URI of the list of URIs
:param dataset_type: dataset type, one of pointcloud, raster or geometry
:param log_uri: log array URI
:param config: config dictionary, defaults to None
:param max_files: maximum number of URIs returned, defaults to None
:param verbose: verbose logging, defaults to False
:return: list of URIs
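As an illustration of what reading a URI list with a max_files cap might involve, the sketch below parses a text stream that holds one URI per line. The one-URI-per-line layout, the helper name read_uri_list and the example URIs are all assumptions; the reference above does not specify the list file format.

```python
import io
from typing import List, Optional, TextIO


def read_uri_list(handle: TextIO, max_files: Optional[int] = None) -> List[str]:
    """Hypothetical helper: read one URI per line, skipping blank lines."""
    uris: List[str] = []
    for line in handle:
        # Stop once the cap is reached; None means no limit.
        if max_files is not None and len(uris) >= max_files:
            break
        uri = line.strip()
        if uri:
            uris.append(uri)
    return uris


# An in-memory stand-in for the list file (hypothetical URIs).
listing = io.StringIO(
    "s3://example-bucket/a.las\n"
    "\n"
    "s3://example-bucket/b.las\n"
    "s3://example-bucket/c.las\n"
)
print(read_uri_list(listing, max_files=2))
```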

register_dataset_udf

client.geospatial.ingestion.register_dataset_udf(
    dataset_uri,
    *,
    register_name,
    namespace=None,
    acn=None,
    config=None,
    verbose=False,
)

Register the dataset on TileDB Cloud.

:param dataset_uri: dataset URI
:param register_name: name to register the dataset with on TileDB Cloud
:param namespace: TileDB Cloud namespace, defaults to the user’s default namespace
:param acn: Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None
:param config: config dictionary, defaults to None
:param verbose: verbose logging, defaults to False

remove_dataset_type_from_array_meta

client.geospatial.ingestion.remove_dataset_type_from_array_meta(
    dataset_uri,
    *,
    verbose=False,
)

Removes dataset_type meta if the ingested result is an array.

FIXME: This exists to fix an internal UI issue until formally fixed.
FIXME: Related ticket -> sc-48098

:param dataset_uri: dataset URI
:param verbose: verbose logging, defaults to False