geospatial.ingestion
cloud.geospatial.ingestion
Functions
build_file_list_udf
cloud.geospatial.ingestion.build_file_list_udf(
dataset_type
config= None
search_uri= None
pattern= None
ignore= None
dataset_list_uri= None
max_files= None
verbose= False
trace= False
log_uri= None
)
Build a list of sources
Parameters
dataset_type
DatasetType
dataset type, one of pointcloud, raster or geometry
required
config
Optional[Mapping[str, object]]
config dictionary, defaults to None
None
search_uri
Optional[str]
URI to search for geospatial dataset files, defaults to None
None
pattern
Optional[str]
Unix shell style pattern to match when searching for files, defaults to None
None
ignore
Optional[str]
Unix shell style pattern to ignore when searching for files, defaults to None
None
dataset_list_uri
Optional[str]
URI with a list of dataset URIs, defaults to None
None
max_files
Optional[int]
maximum number of URIs to read/find, defaults to None (no limit)
None
verbose
bool
verbose logging, defaults to False
False
trace
bool
bool, enabling log tracing, defaults to False
False
log_uri
Optional[str]
log array URI
None
Returns
Sequence[str]
A sequence of source files grouped into batches
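The `pattern`, `ignore` and `max_files` parameters can be illustrated with a small local sketch. It assumes the patterns behave like Python's `fnmatch` and are applied to the file name only; the reference does not confirm either detail, and `filter_uris` is a hypothetical helper, not part of the library.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional


def filter_uris(
    uris: Iterable[str],
    pattern: Optional[str] = None,
    ignore: Optional[str] = None,
    max_files: Optional[int] = None,
) -> List[str]:
    """Keep URIs whose file name matches `pattern` and does not match `ignore`."""
    selected: List[str] = []
    for uri in uris:
        # Stop as soon as the limit is reached (None means no limit).
        if max_files is not None and len(selected) >= max_files:
            break
        # Assumption: patterns are matched against the last path component.
        name = uri.rsplit("/", 1)[-1]
        if pattern is not None and not fnmatch(name, pattern):
            continue
        if ignore is not None and fnmatch(name, ignore):
            continue
        selected.append(uri)
    return selected


# Hypothetical URIs used only for illustration.
candidates = [
    "s3://example-bucket/tiles/a.tif",
    "s3://example-bucket/tiles/b.tif",
    "s3://example-bucket/tiles/b_preview.tif",
    "s3://example-bucket/tiles/readme.txt",
]
print(filter_uris(candidates, pattern="*.tif", ignore="*_preview*"))
```

With these inputs the preview image and the text file are dropped, leaving the two `.tif` tiles.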
ingest_datasets
cloud.geospatial.ingestion.ingest_datasets(
dataset_uri
*
dataset_type
acn= None
config= None
namespace= None
register_name= None
search_uri= None
pattern= None
ignore= None
dataset_list_uri= None
max_files= None
compression_filter= None
workers= MAX_WORKERS
batch_size= BATCH_SIZE
tile_size= RASTER_TILE_SIZE
pixels_per_fragment= PIXELS_PER_FRAGMENT
chunk_size= POINT_CLOUD_CHUNK_SIZE
nodata= None
res= None
stats= False
verbose= False
trace= False
log_uri= None
)
Ingest geospatial source files into a dataset.
Parameters
dataset_uri
str
dataset URI
required
dataset_type
DatasetType
dataset type, one of pointcloud, raster or geometry
required
acn
Optional[str]
Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None
None
config
Optional[Mapping[str, object]]
config dictionary, defaults to None
None
namespace
Optional[str]
TileDB Cloud namespace, defaults to None
None
register_name
Optional[str]
name to register the dataset with on TileDB Cloud, defaults to None
None
search_uri
Optional[str]
URI to search for geospatial dataset files, defaults to None
None
pattern
Optional[str]
Unix shell style pattern to match when searching for files, defaults to None
None
ignore
Optional[str]
Unix shell style pattern to ignore when searching for files, defaults to None
None
dataset_list_uri
Optional[str]
URI with a list of dataset URIs, defaults to None
None
max_files
Optional[int]
maximum number of URIs to read/find, defaults to None (no limit)
None
compression_filter
Optional[dict]
serialized TileDB filter, defaults to None
None
workers
int
number of workers for dataset ingestion, defaults to MAX_WORKERS
MAX_WORKERS
batch_size
int
batch size for dataset ingestion, defaults to BATCH_SIZE
BATCH_SIZE
tile_size
int
for rasters this is the tile (block) size for the merged destination array, defaults to 1024
RASTER_TILE_SIZE
pixels_per_fragment
int
number of pixels written per fragment; ideally aligned as a factor of tile_size
PIXELS_PER_FRAGMENT
chunk_size
int
for point cloud this is the PDAL chunk size, defaults to 1000000
POINT_CLOUD_CHUNK_SIZE
nodata
Optional[float]
NODATA value for rasters
None
res
Tuple[float, float]
Tuple[float, float], output resolution in x/y
None
stats
bool
bool, print TileDB stats to stdout
False
verbose
bool
verbose logging, defaults to False
False
trace
bool
bool, enable trace for logging, defaults to False
False
log_uri
Optional[str]
log array URI
None
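A call to `ingest_datasets` might be assembled as in the sketch below. The helper names (`build_ingest_kwargs`, `submit_ingestion`), the bucket URIs and the `tiledb.cloud.geospatial` import path are assumptions for illustration; the reference only shows `cloud.geospatial.ingestion`. The string `"raster"` is a placeholder for the appropriate `DatasetType` member.

```python
from typing import Any, Dict, Optional


def build_ingest_kwargs(
    dataset_uri: str,
    dataset_type: Any,
    *,
    search_uri: Optional[str] = None,
    pattern: Optional[str] = None,
    ignore: Optional[str] = None,
    max_files: Optional[int] = None,
    register_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments, dropping any left as None so defaults apply."""
    candidate = {
        "dataset_uri": dataset_uri,
        "dataset_type": dataset_type,
        "search_uri": search_uri,
        "pattern": pattern,
        "ignore": ignore,
        "max_files": max_files,
        "register_name": register_name,
    }
    return {key: value for key, value in candidate.items() if value is not None}


def submit_ingestion(kwargs: Dict[str, Any]) -> Any:
    # Imported lazily so the sketch can be read and tested without the
    # TileDB Cloud client installed. The import path is an assumption.
    from tiledb.cloud.geospatial import ingestion

    return ingestion.ingest_datasets(**kwargs)


kwargs = build_ingest_kwargs(
    "s3://example-bucket/arrays/elevation",  # hypothetical destination
    "raster",  # placeholder; pass the matching DatasetType member in real use
    search_uri="s3://example-bucket/incoming/",
    pattern="*.tif",
    max_files=100,
)
print(sorted(kwargs))
```

Calling `submit_ingestion(kwargs)` would then start the ingestion, provided the client is installed and credentials are configured.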
ingest_datasets_dag
cloud.geospatial.ingestion.ingest_datasets_dag(
dataset_uri
*
dataset_type
acn= None
config= None
namespace= None
register_name= None
search_uri= None
pattern= None
ignore= None
dataset_list_uri= None
max_files= None
compression_filter= None
workers= MAX_WORKERS
batch_size= BATCH_SIZE
tile_size= RASTER_TILE_SIZE
pixels_per_fragment= PIXELS_PER_FRAGMENT
chunk_size= POINT_CLOUD_CHUNK_SIZE
nodata= None
resampling= 'bilinear'
res= None
stats= False
verbose= False
trace= False
log_uri= None
)
Ingests geospatial point clouds, geometries and images into TileDB arrays
Parameters
dataset_uri
str
dataset URI
required
dataset_type
DatasetType
dataset type, one of pointcloud, raster or geometry
required
acn
Optional[str]
Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None
None
config
Optional[Mapping[str, object]]
config dictionary, defaults to None
None
namespace
Optional[str]
TileDB Cloud namespace, defaults to None
None
register_name
Optional[str]
name to register the dataset with on TileDB Cloud, defaults to None, in which case the destination array is not registered
None
search_uri
Optional[str]
URI to search for geospatial dataset files, defaults to None
None
pattern
Optional[str]
Unix shell style pattern to match when searching for files, defaults to None
None
ignore
Optional[str]
Unix shell style pattern to ignore when searching for files, defaults to None
None
dataset_list_uri
Optional[str]
URI with a list of dataset URIs, defaults to None
None
max_files
Optional[int]
maximum number of URIs to read/find, defaults to None (no limit)
None
compression_filter
Optional[dict]
serialized TileDB filter, defaults to None
None
workers
int
number of workers for dataset ingestion, defaults to MAX_WORKERS
MAX_WORKERS
batch_size
int
batch size for dataset ingestion, defaults to BATCH_SIZE
BATCH_SIZE
tile_size
int
for rasters this is the tile (block) size for the merged destination array, defaults to 1024
RASTER_TILE_SIZE
pixels_per_fragment
int
number of pixels written per fragment; ideally aligned as a factor of tile_size
PIXELS_PER_FRAGMENT
chunk_size
int
for point cloud this is the PDAL chunk size, defaults to 1000000
POINT_CLOUD_CHUNK_SIZE
nodata
Optional[float]
NODATA value for raster merging
None
resampling
Optional[str]
string, resampling method, one of None, bilinear, cubic, nearest and average
'bilinear'
res
Tuple[float, float]
Tuple[float, float], output resolution in x/y
None
stats
bool
bool, print TileDB stats to stdout
False
verbose
bool
verbose logging, defaults to False
False
trace
bool
bool, enabling log tracing, defaults to False
False
log_uri
Optional[str]
log array URI
None
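Because `resampling` accepts only a fixed set of values, it can be useful to check raster options before submitting a task graph. The sketch below is a hypothetical pre-flight check, not library behavior; the rule that `res` values must be positive is an assumption.

```python
from typing import Optional, Tuple

# Values listed for the `resampling` parameter in this reference.
ALLOWED_RESAMPLING = (None, "bilinear", "cubic", "nearest", "average")


def check_raster_options(
    resampling: Optional[str] = "bilinear",
    res: Optional[Tuple[float, float]] = None,
) -> None:
    """Raise ValueError if the raster options look invalid."""
    if resampling not in ALLOWED_RESAMPLING:
        raise ValueError(f"unsupported resampling method: {resampling!r}")
    if res is not None:
        # Assumption: the resolution is an (x, y) pair of positive numbers.
        if len(res) != 2 or any(value <= 0 for value in res):
            raise ValueError(f"res must be two positive numbers, got {res!r}")


check_raster_options(resampling="cubic", res=(10.0, 10.0))  # passes silently
```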
ingest_geometry_udf
cloud.geospatial.ingestion.ingest_geometry_udf(
dataset_uri
args= {}
sources= None
schema= None
extents= None
crs= None
chunk_size= GEOMETRY_CHUNK_SIZE
batch_size= BATCH_SIZE
compressor= None
append= False
verbose= False
stats= False
config= None
id= 'geometry'
trace= False
log_uri= None
)
Internal UDF that ingests a server-side batch of geometry files into TileDB arrays using the Fiona API
Parameters
dataset_uri
str
str, output TileDB array name
required
args
Union[Dict, List]
dict, input key value arguments as a dictionary
{}
sources
Sequence[str]
Sequence of input geometry file names
None
schema
dict
dict, dictionary of schema attributes and geometries
None
extents
Optional[XYBoundsTuple]
Extents of the destination geometry array
None
crs
Optional[str]
str, CRS for the destination dataset
None
chunk_size
Optional[int]
int, sets tile capacity and the number of geometries written at once
GEOMETRY_CHUNK_SIZE
batch_size
Optional[int]
batch size for dataset ingestion, defaults to BATCH_SIZE
BATCH_SIZE
compressor
Optional[dict]
dict, serialized compression filter
None
append
bool
bool, whether to append to the array
False
verbose
bool
verbose logging, defaults to False
False
stats
bool
bool, print TileDB stats to stdout
False
config
Optional[Mapping[str, object]]
dict, configuration to pass to tiledb.VFS
None
id
str
str, ID for logging
'geometry'
trace
bool
bool, enabling log tracing, defaults to False
False
log_uri
Optional[str]
log array URI
None
Returns
Union[Sequence[os.PathLike], None]
if not appending then the function returns a tuple of file paths
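The effect of `chunk_size` (the number of geometries written at once) can be pictured as splitting an input stream into fixed-size groups. The sketch below is a generic illustration of that chunking; `chunked` is a hypothetical helper and does not reflect how the UDF is implemented internally.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `chunk_size` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Five stand-in geometries written two at a time; the last chunk is shorter.
for chunk in chunked(["g0", "g1", "g2", "g3", "g4"], chunk_size=2):
    print(chunk)
```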
ingest_point_cloud_udf
cloud.geospatial.ingestion.ingest_point_cloud_udf(
args= {}
dataset_uri
sources= None
append= False
chunk_size= POINT_CLOUD_CHUNK_SIZE
batch_size= BATCH_SIZE
verbose= False
stats= False
config= None
id= 'pointcloud'
trace= False
log_uri= None
)
Internal UDF that ingests a server-side batch of point cloud files into TileDB arrays using the PDAL API. Compression uses the default profile built into PDAL.
Parameters
args
Union[Dict, List]
dict or list, input key value arguments as a dictionary
{}
dataset_uri
str
str, output TileDB array name
required
sources
Sequence[GeoMetadata]
Sequence of GeoMetadata objects
None
append
bool
bool, whether to append to the array
False
chunk_size
Optional[int]
PDAL configuration for chunking fragments
POINT_CLOUD_CHUNK_SIZE
batch_size
Optional[int]
batch size for dataset ingestion, defaults to BATCH_SIZE
BATCH_SIZE
verbose
bool
verbose logging, defaults to False
False
stats
bool
bool, print TileDB stats to stdout
False
config
Optional[Mapping[str, object]]
dict, configuration to pass to tiledb.VFS
None
id
str
str, ID for logging
'pointcloud'
trace
bool
bool, enabling log tracing, defaults to False
False
log_uri
Optional[str]
log array URI
None
Returns
Union[Sequence[os.PathLike], None]
if not appending then a sequence of file paths
ingest_raster_udf
cloud.geospatial.ingestion.ingest_raster_udf(
args= {}
dataset_uri
sources= None
extents= None
band_count= None
dtype= None
nodata= None
pixels_per_fragment= PIXELS_PER_FRAGMENT
tile_size= RASTER_TILE_SIZE
resampling= DEFAULT_RASTER_SAMPLING
append= False
batch_size= BATCH_SIZE
stats= False
verbose= False
config= None
compressor= None
id= 'raster'
trace= False
log_uri= None
)
Internal UDF that ingests a server-side batch of raster files into TileDB arrays using the Rasterio API
Parameters
args
Union[Dict, List]
dict, input key value arguments as a dictionary
{}
dataset_uri
str
str, output TileDB array name
required
sources
Tuple[GeoBlockMetadata]
tuple, sequence of GeoBlockMetadata objects containing the destination raster window and the input files that contribute to this window
None
extents
Optional[BoundingBox]
Extents of the destination raster
None
band_count
Optional[int]
int, number of bands in destination array
None
dtype
Optional[str]
str, dtype of destination array
None
nodata
Optional[float]
float, NODATA value for destination raster
None
tile_size
int
for rasters this is the tile (block) size for the merged destination array, defaults to 1024
RASTER_TILE_SIZE
pixels_per_fragment
int
number of pixels written per fragment; ideally aligned as a factor of tile_size
PIXELS_PER_FRAGMENT
resampling
str
string, resampling method, one of None, bilinear, cubic, nearest and average
DEFAULT_RASTER_SAMPLING
append
bool
bool, whether to append to the array
False
batch_size
int
batch size for dataset ingestion, defaults to BATCH_SIZE
BATCH_SIZE
stats
bool
bool, print TileDB stats to stdout
False
verbose
bool
verbose logging, defaults to False
False
config
Optional[Mapping[str, object]]
dict, configuration to pass to tiledb.VFS
None
compressor
Optional[dict]
dict, serialized compression filter
None
id
str
str, ID for logging
'raster'
trace
bool
bool, enabling log tracing, defaults to False
False
log_uri
Optional[str]
log array URI
None
Returns
Union[Sequence[GeoBlockMetadata], None]
if not appending then a sequence of populated GeoBlockMetadata objects
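The `sources` parameter pairs destination raster windows with the input files that contribute to them. How the library derives those windows is not described here; the sketch below only illustrates one plausible way to divide a destination raster into `tile_size` blocks. `tile_windows` and the `Window` tuple layout are hypothetical and unrelated to the actual `GeoBlockMetadata` structure.

```python
from typing import Iterator, Tuple

# (col_off, row_off, width, height) in pixels; a hypothetical layout.
Window = Tuple[int, int, int, int]


def tile_windows(width: int, height: int, tile_size: int = 1024) -> Iterator[Window]:
    """Yield row-major windows of at most tile_size x tile_size pixels."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    for row_off in range(0, height, tile_size):
        for col_off in range(0, width, tile_size):
            yield (
                col_off,
                row_off,
                min(tile_size, width - col_off),   # clip at the right edge
                min(tile_size, height - row_off),  # clip at the bottom edge
            )


# A 2500 x 1024 raster needs three windows; the last one is narrower.
for window in tile_windows(2500, 1024):
    print(window)
```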
read_uris
cloud.geospatial.ingestion.read_uris(
list_uri
dataset_type
*
log_uri= None
config= None
max_files= None
verbose= False
)
Read a list of URIs from a URI.
Parameters
list_uri
str
URI of the list of URIs
required
dataset_type
DatasetType
dataset type, one of pointcloud, raster or geometry
required
log_uri
Optional[str]
log array URI
None
config
Optional[Mapping[str, object]]
config dictionary, defaults to None
None
max_files
Optional[int]
maximum number of URIs returned, defaults to None
None
verbose
bool
verbose logging, defaults to False
False
Returns
Sequence[str]
list of URIs
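The behavior of `read_uris` with `max_files` can be approximated locally. The sketch assumes the list holds one URI per line and that blank lines are skipped; neither is confirmed by this reference, and `read_uri_lines` is a hypothetical stand-in that reads from any iterable of lines rather than from a remote URI.

```python
import io
from typing import Iterable, List, Optional


def read_uri_lines(lines: Iterable[str], max_files: Optional[int] = None) -> List[str]:
    """Return stripped, non-empty lines, stopping after `max_files` entries."""
    uris: List[str] = []
    for line in lines:
        uri = line.strip()
        if not uri:
            continue  # assumption: blank lines carry no URI
        if max_files is not None and len(uris) >= max_files:
            break
        uris.append(uri)
    return uris


# An in-memory file standing in for the contents behind `list_uri`.
listing = io.StringIO(
    "s3://example-bucket/a.laz\n"
    "\n"
    "s3://example-bucket/b.laz\n"
    "s3://example-bucket/c.laz\n"
)
print(read_uri_lines(listing, max_files=2))
```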
register_dataset_udf
cloud.geospatial.ingestion.register_dataset_udf(
dataset_uri
*
register_name
namespace= None
acn= None
config= None
verbose= False
)
Register the dataset on TileDB Cloud.
Parameters
dataset_uri
str
dataset URI
required
register_name
str
name to register the dataset with on TileDB Cloud
required
namespace
Optional[str]
TileDB Cloud namespace, defaults to the user’s default namespace
None
acn
Optional[str]
Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None
None
config
Optional[Mapping[str, object]]
config dictionary, defaults to None
None
verbose
bool
verbose logging, defaults to False
False
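Once registered, an array on TileDB Cloud is typically addressed as `tiledb://<namespace>/<name>`. The sketch below shows how the `namespace` fallback described above could map to such a URI; `registered_uri` and its `default_namespace` argument are hypothetical and not part of the library.

```python
from typing import Optional


def registered_uri(
    register_name: str,
    namespace: Optional[str],
    default_namespace: str,
) -> str:
    """Build the tiledb:// URI a registered dataset would be addressed by."""
    # Fall back to the caller's default namespace when none is given.
    resolved = namespace if namespace is not None else default_namespace
    return f"tiledb://{resolved}/{register_name}"


print(registered_uri("elevation", None, default_namespace="my-user"))
print(registered_uri("elevation", "my-org", default_namespace="my-user"))
```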