soma.mapper
cloud.soma.mapper
Functions
Name | Description |
---|---|
build_collection_mapper_workflow_graph | The primary entrypoint for the mapper module. The caller passes in either a sequence of SOMAExperiment URIs or a SOMACollection. |
experiment_to_anndata_slice | This function is not to be called directly: please use run_collection_mapper_workflow or build_collection_mapper_workflow_graph. |
experiment_to_axis_counts | Returns a tuple (obs_counts, var_counts); used when counts_only is True. |
run_collection_mapper_workflow | This is an asynchronous entry point, which launches the task graph and returns tracking information. |
build_collection_mapper_workflow_graph
cloud.soma.mapper.build_collection_mapper_workflow_graph(
*,
soma_collection_uri=None,
soma_experiment_uris=None,
experiment_names=None,
measurement_name,
X_layer_name,
obs_query_string=None,
var_query_string=None,
obs_attrs=None,
var_attrs=None,
callback=lambda x: x,
args_dict=None,
extra_tiledb_config=None,
platform_config=None,
namespace=None,
task_graph_name='SOMAExperiment Collection Mapper',
counts_only=False,
use_batch_mode=False,
resource_class=None,
resources=None,
access_credentials_name=None,
verbose=False
)
The primary entrypoint for the mapper module. The caller passes in either a sequence of SOMAExperiment URIs or a SOMACollection, which is simply a collection of SOMAExperiment objects. The caller also passes in query terms and a callback lambda, which is called on the to_anndata output of each experiment's query. The result is a dictionary mapping each experiment name to the callback's output for that experiment.

For example, if the lambda maps an AnnData object to its .shape, then with SOMA experiments A and B, the task graph would return the dict {"A": (56868, 43050), "B": (23539, 42044)}.
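To make the shape of the result concrete, here is a local, single-process sketch of what the task graph computes: apply the callback to each experiment's query output and collect the results in a dict keyed by experiment name. It does not touch TileDB Cloud; `FakeAnnData` and `map_experiments` are hypothetical stand-ins, and the shapes are the ones from the example above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping


@dataclass
class FakeAnnData:
    """Hypothetical stand-in for the AnnData object produced by to_anndata."""

    shape: tuple


def map_experiments(
    slices: Mapping[str, Any],
    callback: Callable[[Any], Any] = lambda x: x,
) -> Dict[str, Any]:
    """Apply callback to each experiment's query output, keyed by experiment name.

    Local sketch only: in the real workflow each call runs in its own UDF node.
    """
    return {name: callback(ad) for name, ad in slices.items()}


slices = {"A": FakeAnnData((56868, 43050)), "B": FakeAnnData((23539, 42044))}
print(map_experiments(slices, callback=lambda ad: ad.shape))
# {'A': (56868, 43050), 'B': (23539, 42044)}
```

With the default callback (`lambda x: x`), each value would simply be the query output itself.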
Parameters

Parameters for input data:

Name | Type | Description | Default |
---|---|---|---|
soma_collection_uri | Optional[str] | URI of a SOMACollection containing the SOMAExperiment objects to be processed. Specify only one of soma_collection_uri or soma_experiment_uris. | None |
soma_experiment_uris | Optional[Sequence[str]] | List or tuple of URIs of the SOMAExperiment objects to be processed. | None |
experiment_names | Optional[Sequence[str]] | Optional list of experiment names. If not provided, all SOMAExperiment objects specified by soma_collection_uri or soma_experiment_uris are processed. If provided, only the named experiments among those are processed. | None |
measurement_name | str | Which SOMAMeasurement to query within the specified SOMAExperiment objects. For example, "RNA". | required |
X_layer_name | str | Which X layer to query within the specified SOMAMeasurement objects. For example, "data", "raw", or "normalized". | required |

Query parameters:

Name | Type | Description | Default |
---|---|---|---|
obs_query_string | Optional[str] | Optional query string for obs. For example: 'cell_type == "liver"'. | None |
var_query_string | Optional[str] | Optional query string for var. For example: 'n_cells > 100'. | None |
obs_attrs | Optional[Sequence[str]] | Optional list of obs attributes to return in the query output. Default: all. | None |
var_attrs | Optional[Sequence[str]] | Optional list of var attributes to return in the query output. Default: all. | None |

Parameters for data processing:

Name | Type | Description | Default |
---|---|---|---|
callback | Callable | Your code to run on each UDF node; there is one node per SOMAExperiment. On each node, a tiledbsoma.AxisQuery is run using the parameters above, and query.to_anndata is called on the query output. Your callback receives the resulting AnnData object. For example: lambda ad: ad.obs.shape. | lambda x: x |
args_dict | Optional[Dict[str, Any]] | Optional additional arguments to pass to your callback. If provided, this must be a dict mapping each experiment name (a string) to a dict of key-value pairs. | None |
counts_only | Optional[bool] | If True, return only the obs/var counts instead of the result of the callback. | False |

TileDB configs:

Name | Type | Description | Default |
---|---|---|---|
extra_tiledb_config | Optional[Dict[str, object]] | Currently unused; reserved for future use. | None |
platform_config | Optional[Dict[str, object]] | Currently unused; reserved for future use. | None |

Cloud configs:

Name | Type | Description | Default |
---|---|---|---|
namespace | Optional[str] | TileDB namespace in which to run the UDFs. | None |
task_graph_name | str | Optional name for your task graph, so you can find it more easily among other runs. | 'SOMAExperiment Collection Mapper' |

Real-time vs batch modes:

Name | Type | Description | Default |
---|---|---|---|
use_batch_mode | bool | If False (the default), uses real-time UDFs. These have lower latency but fewer resource options. | False |
resource_class | Optional[str] | "standard" or "large". Only valid when use_batch_mode is False. | None |
resources | Optional[Dict[str, object]] | Only valid when use_batch_mode is True. For example: resources={"cpu": "2", "memory": "8Gi"}. | None |
access_credentials_name | Optional[str] | Only valid when use_batch_mode is True. | None |

Other:

Name | Type | Description | Default |
---|---|---|---|
verbose | bool | If True, enable verbose logging. | False |

Return value: A DAG object. If you've named it dag, call dag.compute(), then dag.wait(), then dag.end_results() to retrieve the results.
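The rules stated in the parameter descriptions above can be checked before any UDFs are launched. The following is a minimal sketch with hypothetical helpers that are not part of this module: `mapper_kwargs_problems` encodes only those documented rules (reading "specify only one of" as "exactly one"), and `run_mapper` assumes the module is importable as `tiledb.cloud.soma.mapper` before following the compute, wait, end_results sequence described for the return value.

```python
from typing import Any, Dict, List


def mapper_kwargs_problems(kwargs: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-flight check: returns rule violations (empty list if none)."""
    problems: List[str] = []
    has_collection = kwargs.get("soma_collection_uri") is not None
    has_experiments = kwargs.get("soma_experiment_uris") is not None
    if has_collection == has_experiments:  # neither or both were given
        problems.append("specify exactly one of soma_collection_uri or soma_experiment_uris")
    for name in ("measurement_name", "X_layer_name"):
        if not kwargs.get(name):
            problems.append(f"{name} is required")
    if kwargs.get("use_batch_mode", False):
        if kwargs.get("resource_class") is not None:
            problems.append("resource_class is only valid when use_batch_mode is False")
    else:
        for name in ("resources", "access_credentials_name"):
            if kwargs.get(name) is not None:
                problems.append(f"{name} is only valid when use_batch_mode is True")
    args_dict = kwargs.get("args_dict")
    if args_dict is not None and not all(
        isinstance(k, str) and isinstance(v, dict) for k, v in args_dict.items()
    ):
        problems.append("args_dict must map experiment names (str) to dicts")
    return problems


def run_mapper(**kwargs: Any) -> Any:
    """Validate, build the task graph, run it, and block until results are ready."""
    problems = mapper_kwargs_problems(kwargs)
    if problems:
        raise ValueError("; ".join(problems))
    # Assumed import path; needs the tiledb-cloud package and TileDB Cloud credentials.
    from tiledb.cloud.soma import mapper

    dag = mapper.build_collection_mapper_workflow_graph(**kwargs)
    dag.compute()  # launch the task graph
    dag.wait()  # block until the graph has finished
    return dag.end_results()  # collect the results
```

With placeholder URIs, a call might look like `run_mapper(soma_experiment_uris=["tiledb://my-namespace/exp-a", "tiledb://my-namespace/exp-b"], measurement_name="RNA", X_layer_name="data", callback=lambda ad: ad.shape)`. Keeping the checks in a separate function lets them run without credentials or network access.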
experiment_to_anndata_slice
cloud.soma.mapper.experiment_to_anndata_slice(
exp,
*,
measurement_name,
X_layer_name,
obs_query_string=None,
var_query_string=None,
obs_attrs=None,
var_attrs=None
)
This function is not to be called directly: please use run_collection_mapper_workflow or build_collection_mapper_workflow_graph. This is the function that runs as a UDF node for each SOMAExperiment you specify.
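The per-experiment work can be sketched with tiledbsoma's query API. This is a minimal sketch, assuming the UDF has the experiment open as `exp` and uses `Experiment.axis_query` with `tiledbsoma.AxisQuery` value filters, then `ExperimentAxisQuery.to_anndata`, as the callback description above suggests. It is not the module's actual code; `column_names_for` and `anndata_slice_sketch` are hypothetical names.

```python
from typing import Any, Dict, List, Optional, Sequence


def column_names_for(
    obs_attrs: Optional[Sequence[str]],
    var_attrs: Optional[Sequence[str]],
) -> Optional[Dict[str, List[str]]]:
    """Map obs_attrs/var_attrs onto a to_anndata column_names dict.

    None for an axis means "all attributes", so that axis key is omitted;
    returning None overall lets to_anndata include every column.
    """
    names: Dict[str, List[str]] = {}
    if obs_attrs is not None:
        names["obs"] = list(obs_attrs)
    if var_attrs is not None:
        names["var"] = list(var_attrs)
    return names or None


def anndata_slice_sketch(
    exp: Any,
    *,
    measurement_name: str,
    X_layer_name: str,
    obs_query_string: Optional[str] = None,
    var_query_string: Optional[str] = None,
    obs_attrs: Optional[Sequence[str]] = None,
    var_attrs: Optional[Sequence[str]] = None,
) -> Any:
    """Query one open SOMA Experiment and return the slice as AnnData."""
    import tiledbsoma  # lazy import: only needed when a query actually runs

    with exp.axis_query(
        measurement_name,
        obs_query=tiledbsoma.AxisQuery(value_filter=obs_query_string),
        var_query=tiledbsoma.AxisQuery(value_filter=var_query_string),
    ) as query:
        # Read the selected X layer plus the requested obs/var columns.
        return query.to_anndata(
            X_name=X_layer_name,
            column_names=column_names_for(obs_attrs, var_attrs),
        )
```

The AnnData object returned here is what a user-supplied callback would receive.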
experiment_to_axis_counts
cloud.soma.mapper.experiment_to_axis_counts(
exp,
*,
measurement_name,
X_layer_name,
obs_query_string=None,
var_query_string=None,
obs_attrs=None,
var_attrs=None
)
Returns a tuple (obs_counts, var_counts) for one SOMAExperiment. It is used when counts_only is True.
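A per-experiment count can be sketched with tiledbsoma's query API. This is a minimal sketch, assuming the counts come from the `n_obs` and `n_vars` properties of tiledbsoma's `ExperimentAxisQuery` on the filtered query; the source does not say how the real function computes them, and `axis_counts_sketch` is a hypothetical name. Since `AnnData.shape` is `(n_obs, n_vars)`, a callback of `lambda ad: ad.shape` would yield the same pair, but counting through the query avoids building an AnnData object.

```python
from typing import Any, Optional, Tuple


def axis_counts_sketch(
    exp: Any,
    *,
    measurement_name: str,
    obs_query_string: Optional[str] = None,
    var_query_string: Optional[str] = None,
) -> Tuple[int, int]:
    """Return (obs_counts, var_counts) for one open SOMA Experiment."""
    import tiledbsoma  # lazy import: only needed when a query actually runs

    with exp.axis_query(
        measurement_name,
        obs_query=tiledbsoma.AxisQuery(value_filter=obs_query_string),
        var_query=tiledbsoma.AxisQuery(value_filter=var_query_string),
    ) as query:
        # Number of obs and var rows matching the filters; no AnnData is built.
        return query.n_obs, query.n_vars
```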
run_collection_mapper_workflow
cloud.soma.mapper.run_collection_mapper_workflow(
*,
soma_collection_uri=None,
soma_experiment_uris=None,
experiment_names=None,
measurement_name,
X_layer_name,
obs_query_string=None,
var_query_string=None,
obs_attrs=None,
var_attrs=None,
callback=lambda x: x,
args_dict=None,
extra_tiledb_config=None,
platform_config=None,
namespace=None,
task_graph_name='SOMAExperiment Collection Mapper',
counts_only=False,
use_batch_mode=False,
resource_class=None,
resources=None,
access_credentials_name=None,
verbose=False
)
This is an asynchronous entry point, which launches the task graph and returns tracking information. It is not the primary use case: please see build_collection_mapper_workflow_graph for information about arguments and return value.