soma.mapper

cloud.soma.mapper

Functions

Name	Description
build_collection_mapper_workflow_graph	The primary entrypoint for the mapper module. The caller passes in either a sequence of SOMAExperiment URIs or a SOMACollection.
experiment_to_anndata_slice	This function is not to be called directly: please use run_collection_mapper_workflow or build_collection_mapper_workflow_graph.
experiment_to_axis_counts	Returns a tuple of (obs_counts, var_counts) if counts_only is True.
run_collection_mapper_workflow	This is an asynchronous entry point, which launches the task graph and returns tracking information.

build_collection_mapper_workflow_graph

cloud.soma.mapper.build_collection_mapper_workflow_graph(
    soma_collection_uri=None
    soma_experiment_uris=None
    experiment_names=None
    measurement_name
    X_layer_name
    obs_query_string=None
    var_query_string=None
    obs_attrs=None
    var_attrs=None
    callback=lambda x: x
    args_dict=None
    extra_tiledb_config=None
    platform_config=None
    namespace=None
    task_graph_name='SOMAExperiment Collection Mapper'
    counts_only=False
    use_batch_mode=False
    resource_class=None
    resources=None
    access_credentials_name=None
    verbose=False
)

The primary entrypoint for the mapper module. The caller passes in either a sequence of SOMAExperiment URIs or a SOMACollection, which is simply a collection of SOMAExperiment objects. The caller also passes in query terms and a callback lambda which will be called on the to_anndata output of each experiment’s query. The result will be a dictionary mapping experiment names to the callback lambda’s output for each input experiment.

For example, if the lambda maps an anndata object to its .shape, then with SOMA experiments A and B, the task graph would return the dict {"A": (56868, 43050), "B": (23539, 42044)}.
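The mapping from experiments to callback results can be illustrated locally, without TileDB Cloud. The sketch below is not the library's implementation: `FakeAnnData` and `map_callback` are hypothetical stand-ins that only mimic the shape of the result described above, using the same example dimensions.

```python
from typing import Any, Callable, Dict, NamedTuple, Tuple


class FakeAnnData(NamedTuple):
    """Hypothetical stand-in for the AnnData object produced by to_anndata."""

    shape: Tuple[int, int]


def map_callback(
    slices: Dict[str, Any], callback: Callable[[Any], Any]
) -> Dict[str, Any]:
    """Apply `callback` to each experiment's query output, keyed by experiment name."""
    return {name: callback(adata) for name, adata in slices.items()}


# One query output per experiment (assumed values, taken from the example above).
slices = {
    "A": FakeAnnData(shape=(56868, 43050)),
    "B": FakeAnnData(shape=(23539, 42044)),
}

result = map_callback(slices, lambda ad: ad.shape)
print(result)  # {'A': (56868, 43050), 'B': (23539, 42044)}
```

In the real workflow each callback invocation would run on its own UDF node rather than in a local loop.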

Parameters

Parameters for input data:

Name	Type	Description	Default
soma_collection_uri	Optional[str]	URI of a `SOMACollection` containing `SOMAExperiment` objects to be processed. Please specify only one of `soma_collection_uri` or `soma_experiment_uris`.	`None`
soma_experiment_uris	Optional[Sequence[str]]	List/tuple of URIs of `SOMAExperiment` objects to be processed.	`None`
experiment_names	Optional[Sequence[str]]	Optional list of experiment names. If not provided, all `SOMAExperiment` objects are processed as specified by `soma_collection_uri` or `soma_experiment_uris`. If provided, `experiment_names` can be used to further subset/restrict which `SOMAExperiment` objects will be processed.	`None`
measurement_name	str	Which `SOMAMeasurement` to query within the specified `SOMAExperiment` objects. For example, `"RNA"`.	required
X_layer_name	str	Which `X` layer to query within the specified `SOMAMeasurement` objects. For example, `"data"`, `"raw"`, `"normalized"`.	required

Query parameters:

Name	Type	Description	Default
obs_query_string	Optional[str]	Optional query string for `obs`. For example: `'cell_type == "liver"'`.	`None`
var_query_string	Optional[str]	Optional query string for `var`. For example: `'n_cells > 100'`.	`None`
obs_attrs	Optional[Sequence[str]]	Optional list of `obs` attributes to return as query output. Default: all.	`None`
var_attrs	Optional[Sequence[str]]	Optional list of `var` attributes to return as query output. Default: all.	`None`

Parameters for data processing:

Name	Type	Description	Default
callback	Callable	Your code to run on each UDF node, one for each `SOMAExperiment`. On each node, `tiledbsoma.AxisQuery` is run, using parameters you specify as above, and then `query.to_anndata` is run on that query output. Your `callback` function receives that query-output AnnData object. For example: `lambda ad: ad.obs.shape`.	`lambda x: x`
args_dict	Optional[Dict[str, Any]]	Optional additional arguments to be passed to your callback. If provided, this must be a dict from string experiment name, to dict of key-value pairs.	`None`
counts_only	Optional[bool]	If True, only return obs/var counts, not the result of the provided callback.	`False`

TileDB configs:

Name	Type	Description	Default
extra_tiledb_config	Optional[Dict[str, object]]	Currently unused; reserved for future use.	`None`
platform_config	Optional[Dict[str, object]]	Currently unused; reserved for future use.	`None`

Cloud configs:

Name	Type	Description	Default
namespace	Optional[str]	TileDB namespace in which to run the UDFs.	`None`
task_graph_name	str	Optional name for your task graph, so you can find it more easily among other runs.	`'SOMAExperiment Collection Mapper'`

Real-time vs batch modes:

Name	Type	Description	Default
use_batch_mode	bool	If false (the default), uses real-time UDFs. These have lower latency but fewer resource options.	`False`
resource_class	Optional[str]	`"standard"` or `"large"`. Only valid when `use_batch_mode` is False.	`None`
resources	Optional[Dict[str, object]]	Only valid when `use_batch_mode` is True. Example: `resources={"cpu": "2", "memory": "8Gi"}`.	`None`
access_credentials_name	Optional[str]	Only valid when `use_batch_mode` is True.	`None`

Other:

Name	Type	Description	Default
verbose	bool	If True, enable verbose logging.	`False`

Return value: A `DAG` object. If you’ve named this `dag`, you’ll need to call `dag.compute()`, `dag.wait()`, and `dag.end_results()`.
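The build, compute, wait, and collect sequence described for the returned `DAG` can be sketched as below. This is a minimal illustration, not an official recipe: `run_shape_query` is a hypothetical helper, the `mapper` argument is assumed to be the imported mapper module (assumed import path: `tiledb.cloud.soma.mapper`), and the collection URI in the trailing comment is a placeholder. Passing the module in as an argument keeps the sketch importable without cloud credentials.

```python
from typing import Any


def run_shape_query(mapper: Any, collection_uri: str) -> Any:
    """Build the mapper task graph, run it, and return its end results.

    `mapper` is assumed to be the imported mapper module documented here.
    """
    dag = mapper.build_collection_mapper_workflow_graph(
        soma_collection_uri=collection_uri,
        measurement_name="RNA",
        X_layer_name="data",
        obs_query_string='cell_type == "liver"',
        var_query_string="n_cells > 100",
        # Runs once per SOMAExperiment, on that experiment's AnnData slice.
        callback=lambda ad: ad.obs.shape,
    )
    dag.compute()  # start the task graph
    dag.wait()  # block until the graph has finished
    return dag.end_results()


# Hypothetical usage (requires TileDB Cloud access; URI is a placeholder):
#   import tiledb.cloud.soma.mapper as mapper
#   results = run_shape_query(mapper, "tiledb://my-namespace/my-collection")
```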

experiment_to_anndata_slice

cloud.soma.mapper.experiment_to_anndata_slice(
    exp
    *
    measurement_name
    X_layer_name
    obs_query_string=None
    var_query_string=None
    obs_attrs=None
    var_attrs=None
)

This function is not to be called directly: please use run_collection_mapper_workflow or build_collection_mapper_workflow_graph. This is the function that runs as a UDF node for each SOMAExperiment you specify.

experiment_to_axis_counts

cloud.soma.mapper.experiment_to_axis_counts(
    exp
    *
    measurement_name
    X_layer_name
    obs_query_string=None
    var_query_string=None
    obs_attrs=None
    var_attrs=None
)

Returns a tuple of (obs_counts, var_counts). This is what each experiment yields when counts_only is True, in place of the callback result.
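One possible use of these counts is a cheap pre-check before running an expensive callback. The sketch below assumes the counts-only results arrive as a dict from experiment name to an (obs_count, var_count) tuple of integers; `nonempty_experiments` is a hypothetical helper and the numbers are made up for illustration.

```python
from typing import Dict, List, Tuple


def nonempty_experiments(counts: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return, in sorted order, the experiments whose query matched
    at least one obs row and at least one var row."""
    return sorted(
        name
        for name, (obs_count, var_count) in counts.items()
        if obs_count > 0 and var_count > 0
    )


# Assumed shape of a counts-only result (illustrative values only).
counts = {"A": (1200, 300), "B": (0, 300), "C": (45, 0)}

keep = nonempty_experiments(counts)
print(keep)  # ['A']
```

The resulting list could then be passed as `experiment_names` in a second run, so the callback only executes on experiments where the query matched something.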

run_collection_mapper_workflow

cloud.soma.mapper.run_collection_mapper_workflow(
    soma_collection_uri=None
    soma_experiment_uris=None
    experiment_names=None
    measurement_name
    X_layer_name
    obs_query_string=None
    var_query_string=None
    obs_attrs=None
    var_attrs=None
    callback=lambda x: x
    args_dict=None
    extra_tiledb_config=None
    platform_config=None
    namespace=None
    task_graph_name='SOMAExperiment Collection Mapper'
    counts_only=False
    use_batch_mode=False
    resource_class=None
    resources=None
    access_credentials_name=None
    verbose=False
)

This is an asynchronous entry point: it launches the task graph and returns tracking information. Nominally this is not the primary use case. Please see build_collection_mapper_workflow_graph for information about arguments and return value.
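Because this function accepts the same arguments, the mode-specific constraints in the parameter table apply here too. The sketch below checks them before launching. Both `check_mode_args` and `launch` are hypothetical helpers, `mapper` is assumed to be the imported mapper module, and whether the library performs equivalent validation itself is not stated in this reference.

```python
from typing import Any, Dict, Optional


def check_mode_args(
    *,
    use_batch_mode: bool = False,
    resource_class: Optional[str] = None,
    resources: Optional[Dict[str, object]] = None,
    access_credentials_name: Optional[str] = None,
) -> None:
    """Raise ValueError for combinations the parameter table marks as invalid."""
    if use_batch_mode and resource_class is not None:
        raise ValueError("resource_class is only valid when use_batch_mode is False")
    if not use_batch_mode and resources is not None:
        raise ValueError("resources is only valid when use_batch_mode is True")
    if not use_batch_mode and access_credentials_name is not None:
        raise ValueError(
            "access_credentials_name is only valid when use_batch_mode is True"
        )


def launch(mapper: Any, **kwargs: Any) -> Any:
    """Validate mode-related arguments, then launch the workflow.

    Returns whatever tracking information the mapper module returns.
    """
    check_mode_args(
        use_batch_mode=kwargs.get("use_batch_mode", False),
        resource_class=kwargs.get("resource_class"),
        resources=kwargs.get("resources"),
        access_credentials_name=kwargs.get("access_credentials_name"),
    )
    return mapper.run_collection_mapper_workflow(**kwargs)
```

Failing early on the client side avoids submitting a task graph whose configuration the table describes as invalid.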