soma.ingest
client.soma.ingest
Functions
| Name | Description |
|---|---|
| ingest_h5ad | Ingests H5AD data by calling tiledbsoma.io.from_anndata |
| run_ingest_workflow | Starts a workflow to ingest H5AD data into SOMA. |
| run_ingest_workflow_udf | The highest-level ingestor component that runs on-node. |
ingest_h5ad
client.soma.ingest.ingest_h5ad(
output_uri,
input_uri,
measurement_name,
extra_tiledb_config=None,
platform_config=None,
ingest_mode='write',
logging_level=logging.INFO,
dry_run=False,
**kwargs,
)
Ingests H5AD data by calling tiledbsoma.io.from_anndata.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| output_uri | str | The output URI to write to. This will probably look like “tiledb://workspace/teamspace/path/to/soma”. | required |
| input_uri | str | The URI of the H5AD file to read from. This file is read using TileDB VFS, so any path supported (and accessible) will work. | required |
| measurement_name | str | The name of the Measurement within the Experiment to store the data. | required |
| extra_tiledb_config | dict | Extra configuration for TileDB. | None |
| platform_config | dict | The SOMA platform_config value to pass in, if any. | None |
| ingest_mode | str | One of the ingest modes supported by tiledbsoma.io.read_h5ad. | 'write' |
| logging_level | int | Set a logging level for this function. | logging.INFO |
| dry_run | bool | If True, traverses the input paths without ingesting data. | False |
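A call might look like the following sketch. The import path for `client.soma.ingest` depends on the installed package, so the function is passed in as an argument instead of being imported. The helper name `dry_run_then_ingest` is hypothetical and not part of the API. It calls the function once with `dry_run=True` to check the input-path traversal, then again with the same arguments to ingest.

```python
import logging


def dry_run_then_ingest(ingest_h5ad, output_uri, input_uri, measurement_name):
    """Call ingest_h5ad twice: first as a dry run, then as a real write.

    `ingest_h5ad` is expected to be client.soma.ingest.ingest_h5ad. It is
    injected here because the import path is an assumption of this sketch.
    """
    common = dict(
        output_uri=output_uri,
        input_uri=input_uri,
        measurement_name=measurement_name,
        ingest_mode="write",
        logging_level=logging.INFO,
    )
    # Traverse the input path without ingesting any data.
    ingest_h5ad(dry_run=True, **common)
    # Same arguments, this time performing the ingestion.
    ingest_h5ad(dry_run=False, **common)
```

Keeping the shared arguments in one dictionary ensures the dry run and the real run see identical inputs.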
run_ingest_workflow
client.soma.ingest.run_ingest_workflow(
output_uri,
input_uri,
measurement_name,
pattern=None,
extra_tiledb_config=None,
platform_config=None,
ingest_mode='write',
ingest_resources=None,
acn=None,
logging_level=logging.INFO,
dry_run=False,
dag_factory=None,
dag_kwargs=None,
soma_image_name='genomics',
wait_for_inner=False,
**kwargs,
)
Starts a workflow to ingest H5AD data into SOMA.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| output_uri | str | Output URI. | required |
| input_uri | str | The URI of the H5AD file(s) to read from. These are read using TileDB VFS, so any path supported (and accessible) will work. If the input_uri passes vfs.is_file, it is ingested. If the input_uri passes vfs.is_dir, then all first-level entries are ingested. In the directory case, an input file is skipped if pattern is provided and doesn’t match the input file, and each entry’s basename is appended to the output_uri to form the entry’s output URI. For example, if “a.h5ad” and “b.h5ad” are present within input_uri of “s3://bucket/h5ads/” and output_uri is “tiledb://workspace/teamspace/somas”, then “tiledb://workspace/teamspace/somas/a” and “tiledb://workspace/teamspace/somas/b” are written. | required |
| measurement_name | str | The name of the measurement within the experiment to store the data. | required |
| pattern | str | If provided, directory entries that don’t match it are skipped, as described for input_uri. | None |
| extra_tiledb_config | Optional[Dict[str, object]] | Extra configuration for TileDB. | None |
| platform_config | Optional[Dict[str, object]] | The SOMA platform_config value to pass in, if any. | None |
| ingest_mode | Optional[str] | One of the ingest modes supported by tiledbsoma.io.read_h5ad. | 'write' |
| ingest_resources | dict | A specification for the amount of resources to provide to the UDF executing the ingestion process, to override the default. | None |
| acn | str | The name of the credentials to pass to the executing UDF. | None |
| dry_run | bool | If True, traverses the input paths without ingesting data. | False |
| dag_factory | callable | Allows custom DAG classes to be used in tests. Defaults to dag.DAG. | None |
| dag_kwargs | dict | Keyword arguments for the dag_factory. | None |
| wait_for_inner | bool | Whether the inner task graph that computes run_ingest_workflow_udf() should wait for completion. Default: False. | False |
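The directory-case mapping described for input_uri can be sketched in plain Python. This is an illustration, not the library's implementation. The function name `plan_outputs` is hypothetical. Two assumptions are made: that `pattern` is a shell-style glob matched against each entry's basename, and that the file extension is dropped when forming the output name (as the “a.h5ad” to “.../somas/a” example suggests).

```python
import fnmatch
import posixpath


def plan_outputs(input_entries, output_uri, pattern=None):
    """Map each first-level input entry to the output URI it would get.

    Assumptions of this sketch: `pattern` is a shell-style glob applied to
    the entry's basename, and the extension is stripped from the basename.
    """
    plan = {}
    for entry in input_entries:
        base = posixpath.basename(entry.rstrip("/"))
        if pattern is not None and not fnmatch.fnmatchcase(base, pattern):
            continue  # skipped: the entry does not match the pattern
        stem, _ext = posixpath.splitext(base)
        plan[entry] = output_uri.rstrip("/") + "/" + stem
    return plan


entries = [
    "s3://bucket/h5ads/a.h5ad",
    "s3://bucket/h5ads/b.h5ad",
    "s3://bucket/h5ads/notes.txt",
]
for src, dst in plan_outputs(
    entries, "tiledb://workspace/teamspace/somas", pattern="*.h5ad"
).items():
    print(src, "->", dst)
```

With the pattern `*.h5ad`, the hypothetical `notes.txt` entry is skipped and the two H5AD files map to the output URIs given in the input_uri description.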
Returns
| Name | Type | Description |
|---|---|---|
| | dict | A dictionary of {"status": "started", "graph_id": ...}, with the UUID of the graph on the server side, which can be used to manage execution and monitor progress. |
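A caller will usually want to keep the graph UUID for later monitoring. The helper below is a minimal sketch of that step. The name `graph_id_from_result` is hypothetical, and the sketch assumes the returned dictionary has the "status" and "graph_id" keys shown in the table and that any status other than "started" should be treated as a failure.

```python
def graph_id_from_result(result):
    """Return the server-side graph UUID from a run_ingest_workflow result.

    Assumes `result` has the shape {"status": "started", "graph_id": ...}.
    """
    if result.get("status") != "started":
        raise RuntimeError(f"workflow did not start: {result!r}")
    return result["graph_id"]
```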
run_ingest_workflow_udf
client.soma.ingest.run_ingest_workflow_udf(
output_uri,
input_uri,
measurement_name,
pattern=None,
extra_tiledb_config=None,
platform_config=None,
ingest_mode='write',
ingest_resources=None,
acn=None,
logging_level=logging.INFO,
dry_run=False,
dag_factory=None,
dag_kwargs=None,
soma_image_name='genomics',
wait=False,
**kwargs,
)
This is the highest-level ingestor component that runs on-node. VFS access with access_credentials_name is only possible here; it does not work correctly on the client.