soma.ingest
cloud.soma.ingest
Functions
Name | Description |
---|---|
ingest_h5ad | Performs the actual work of ingesting H5AD data into TileDB. |
register_dataset_udf | Register the dataset on TileDB Cloud. |
run_ingest_workflow | Starts a workflow to ingest H5AD data into SOMA. |
run_ingest_workflow_udf | This is the highest-level ingestor component that runs on-node. |
ingest_h5ad
cloud.soma.ingest.ingest_h5ad(
output_uri,
input_uri,
measurement_name,
extra_tiledb_config=None,
platform_config=None,
ingest_mode='write',
logging_level=logging.INFO,
dry_run=False,
)
Performs the actual work of ingesting H5AD data into TileDB.
Parameters
Name | Type | Description | Default |
---|---|---|---|
output_uri | str | The output URI to write to. This will probably look like tiledb://namespace/some://storage/uri. | required |
input_uri | str | The URI of the H5AD file to read from. This file is read using TileDB VFS, so any path supported (and accessible) will work. | required |
measurement_name | str | The name of the Measurement within the Experiment to store the data. | required |
extra_tiledb_config | Optional[Dict[str, object]] | Extra configuration for TileDB. | None |
platform_config | Optional[Dict[str, object]] | The SOMA platform_config value to pass in, if any. | None |
ingest_mode | str | One of the ingest modes supported by tiledbsoma.io.read_h5ad. | 'write' |
dry_run | bool | If provided and set to True, does the input-path traversals without ingesting data. | False |
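As a rough illustration of the parameters above, the sketch below collects keyword arguments for a dry run and defers the actual call to a separate function. The URIs, the measurement name "RNA", and the helper names build_ingest_kwargs and dry_run_ingest are hypothetical, and the import path tiledb.cloud.soma.ingest is an assumption about how the module documented here as cloud.soma.ingest is exposed.

```python
from typing import Any, Dict


def build_ingest_kwargs(
    output_uri: str,
    input_uri: str,
    measurement_name: str,
    dry_run: bool = True,
) -> Dict[str, Any]:
    """Collect keyword arguments for ingest_h5ad using the documented defaults."""
    return {
        "output_uri": output_uri,
        "input_uri": input_uri,
        "measurement_name": measurement_name,
        "extra_tiledb_config": None,
        "platform_config": None,
        "ingest_mode": "write",
        "dry_run": dry_run,
    }


def dry_run_ingest(kwargs: Dict[str, Any]) -> None:
    """Call ingest_h5ad; needs the tiledb-cloud package and valid credentials."""
    # Assumption: cloud.soma.ingest is importable as tiledb.cloud.soma.ingest.
    from tiledb.cloud.soma import ingest

    ingest.ingest_h5ad(**kwargs)


# Hypothetical URIs and measurement name, for illustration only.
example_kwargs = build_ingest_kwargs(
    output_uri="tiledb://my-namespace/s3://my-bucket/somas/pbmc",
    input_uri="s3://my-bucket/h5ads/pbmc.h5ad",
    measurement_name="RNA",
)
```

Passing example_kwargs to dry_run_ingest would, per the table above, traverse the input path without ingesting data, since dry_run is True.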
register_dataset_udf
cloud.soma.ingest.register_dataset_udf(
dataset_uri,
*,
register_name,
acn,
namespace=None,
config=None,
verbose=False,
)
Register the dataset on TileDB Cloud.
Parameters
Name | Type | Description | Default |
---|---|---|---|
dataset_uri | str | dataset URI | required |
register_name | str | name to register the dataset with on TileDB Cloud | required |
namespace | Optional[str] | TileDB Cloud namespace, defaults to the user’s default namespace | None |
config | Optional[Mapping[str, Any]] | config dictionary, defaults to None | None |
verbose | bool | verbose logging, defaults to False | False |
run_ingest_workflow
cloud.soma.ingest.run_ingest_workflow(
output_uri,
input_uri,
measurement_name,
pattern=None,
extra_tiledb_config=None,
platform_config=None,
ingest_mode='write',
ingest_resources=None,
namespace=None,
register_name=None,
acn=None,
logging_level=logging.INFO,
dry_run=False,
**kwargs,
)
Starts a workflow to ingest H5AD data into SOMA.
Parameters
Name | Type | Description | Default |
---|---|---|---|
output_uri | str | The output URI to write to. This will probably look like tiledb://namespace/some://storage/uri. | required |
input_uri | str | The URI of the H5AD file(s) to read from. These are read using TileDB VFS, so any path supported (and accessible) will work. If the input_uri passes vfs.is_file, it’s ingested. If the input_uri passes vfs.is_dir, then all first-level entries are ingested. In the directory case, an input file is skipped if pattern is provided and doesn’t match the input file. Also in the directory case, each entry’s basename is appended to the output_uri to form the entry’s output URI. For example, if a.h5ad and b.h5ad are present within input_uri of s3://bucket/h5ads/ and output_uri is tiledb://namespace/s3://bucket/somas, then tiledb://namespace/s3://bucket/somas/a and tiledb://namespace/s3://bucket/somas/b are written. | required |
measurement_name | str | The name of the Measurement within the Experiment to store the data. | required |
pattern | Optional[str] | As described for input_uri. | None |
extra_tiledb_config | Optional[Dict[str, object]] | Extra configuration for TileDB. | None |
platform_config | Optional[Dict[str, object]] | The SOMA platform_config value to pass in, if any. | None |
ingest_mode | str | One of the ingest modes supported by tiledbsoma.io.read_h5ad. | 'write' |
ingest_resources | Optional[Dict[str, object]] | A specification for the amount of resources to provide to the UDF executing the ingestion process, to override the default. | None |
namespace | Optional[str] | An alternate namespace to run the ingestion process under. | None |
register_name | Optional[str] | Name to register the dataset with on TileDB Cloud. | None |
acn | Optional[str] | The name of the credentials to pass to the executing UDF. | None |
dry_run | bool | If provided and set to True, does the input-path traversals without ingesting data. | False |
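The directory-case rule described for input_uri can be sketched in plain Python. This is an illustration of the documented behaviour, not the library's implementation: plan_output_uris is a hypothetical helper, and treating pattern as a glob matched against the full entry URI is an assumption, since the table does not say how pattern is interpreted.

```python
import fnmatch
import posixpath
from typing import Dict, Iterable, Optional


def plan_output_uris(
    output_uri: str,
    entries: Iterable[str],
    pattern: Optional[str] = None,
) -> Dict[str, str]:
    """Map each first-level input entry to the output URI it would be written to."""
    plan: Dict[str, str] = {}
    for entry in entries:
        # Assumption: pattern is a glob matched against the full entry URI.
        if pattern is not None and not fnmatch.fnmatchcase(entry, pattern):
            continue
        # Basename without its extension, e.g. "a.h5ad" -> "a".
        stem, _ext = posixpath.splitext(posixpath.basename(entry.rstrip("/")))
        plan[entry] = output_uri.rstrip("/") + "/" + stem
    return plan


# The example from the input_uri description.
entries = ["s3://bucket/h5ads/a.h5ad", "s3://bucket/h5ads/b.h5ad"]
plan = plan_output_uris("tiledb://namespace/s3://bucket/somas", entries)
only_a = plan_output_uris(
    "tiledb://namespace/s3://bucket/somas", entries, pattern="*a.h5ad"
)
```

With the two entries above, plan maps a.h5ad to tiledb://namespace/s3://bucket/somas/a and b.h5ad to tiledb://namespace/s3://bucket/somas/b, while only_a keeps just the first entry.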
Returns
Type | Description |
---|---|
Dict[str, str] | A dictionary of {"status": "started", "graph_id": ...}, with the UUID of the graph on the server side, which can be used to manage execution and monitor progress. |
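One way to consume the return value is to check the status and pull out the graph UUID before handing it to whatever monitoring is in use. The helper graph_id_from_result is hypothetical, and the sample dictionary is a stand-in with a made-up UUID rather than output from a real run.

```python
from typing import Dict


def graph_id_from_result(result: Dict[str, str]) -> str:
    """Return the server-side graph UUID from a run_ingest_workflow result."""
    if result.get("status") != "started":
        raise RuntimeError(f"workflow did not start: {result!r}")
    return result["graph_id"]


# Stand-in for a real return value; the UUID below is made up.
sample = {"status": "started", "graph_id": "00000000-0000-0000-0000-000000000000"}
graph_id = graph_id_from_result(sample)
```

Raising on any status other than "started" is a design choice of this sketch; the documentation only describes the "started" shape of the dictionary.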
run_ingest_workflow_udf
cloud.soma.ingest.run_ingest_workflow_udf(
output_uri,
input_uri,
measurement_name,
pattern=None,
extra_tiledb_config=None,
platform_config=None,
ingest_mode='write',
ingest_resources=None,
namespace=None,
register_name=None,
acn=None,
logging_level=logging.INFO,
dry_run=False,
**kwargs,
)
This is the highest-level ingestor component that runs on-node. VFS access with access_credentials_name is only possible here; it does not work correctly on the client.