library(tiledbsc)
#> The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
#> which was just loaded, will retire in October 2023.
#> Please refer to R-spatial evolution reports for details, especially
#> https://r-spatial.org/r/2023/05/15/evolution4.html.
#> It may be desirable to make the sf package available;
#> package maintainers should consider adding sf to Suggests:.
#> The sp package is now running under evolution status 2
#> (status 2 uses the sf package in place of rgdal)
library(SeuratObject)
#> Loading required package: sp
While tiledbsc facilitates the storage and retrieval of
complete Seurat/Bioconductor datasets, it’s also possible to selectively
update specific components of an existing SOMA
.
As in the Introduction we’ll use the pbmc_small
dataset
from the SeuratObject package.
data("pbmc_small", package = "SeuratObject")
pbmc_small
#> An object of class Seurat
#> 230 features across 80 samples within 1 assay
#> Active assay: RNA (230 features, 20 variable features)
#> 2 dimensional reductions calculated: pca, tsne
We’ll start by creating a Seurat Assay
object that
contains only raw RNA counts.
pbmc_small_rna <- CreateAssayObject(
counts = GetAssayData(pbmc_small[["RNA"]], "counts")
)
Key(pbmc_small_rna) <- Key(pbmc_small[["RNA"]])
and then ingest this into a new SOMA
on disk:
This performed the following operations:
X
, obsm
, varm
,
obsp
, varp
, and misc
obs
, with a single dimension containing
all cell names and 0 attributes (because Seurat Assay
objects do not contain cell-level metadata)var
, with a single dimension containing
all feature names and 0 attributes (because the feature-level metadata
was empty)counts
within the X
group
and ingested all data from pbmc_small_rna
’s
counts
slotPrinting out the X
group, we can see it contains only a
single member, counts
.
soma$X
#> <AssayMatrixGroup>
#> uri: /tmp/RtmpO41gFl/pbmc_small_rna/X
#> arrays: counts
Note: A Seurat Assay
instantiated with data passed
to the counts
argument will store a reference to the same
matrix in both the counts
and data
slots
(i.e., GetAssayData(pbmc_small_rna, "counts")
and
GetAssayData(pbmc_small_rna, "data")
). To avoid redundantly
storing data on disk, tiledbsc* only ingests data from
data
when it is not identical to counts
.*
Now let’s populate pbmc_small_rna
’s counts
slot with normalized counts.
pbmc_small_rna <- SetAssayData(
pbmc_small_rna,
slot = "data",
new.data = GetAssayData(pbmc_small[["RNA"]], "data")
)
pbmc_small_rna
#> Assay data with 230 features for 80 cells
#> First 10 features:
#> MS4A1, CD79B, CD79A, HLA-DRA, TCL1A, HLA-DQB1, HVCN1, HLA-DMB, LTB,
#> LINC00926
At this point, counts
and data
contain
different transformations of the same matrix but only the
counts
data has been written to disk.
We could simply rerun
soma$from_seurat_assay(pbmc_small_rna)
to update the
SOMA
and ingest the normalized counts matrix into
X/data
, however, this would perform another full conversion
of the entire Assay
object. As a result, data from the
counts
slot would be re-ingested into
X/counts
and feature-level metadata would be re-ingested
into var
.
To avoid these redudndant ingestions, use the layers
argument to specify which of the assay slots (counts
,
data
, or scale.data
) should be ingested. We
also set var = FALSE
to prevent re-ingestion of the
feature-level metadata since it hasn’t changed.
soma$from_seurat_assay(pbmc_small_rna, layers = "data", var = FALSE)
#> No AssayMatrix found at '/tmp/RtmpO41gFl/pbmc_small_rna/X/data'
#> Creating new AssayMatrix array with index [var_id,obs_id] at '/tmp/RtmpO41gFl/pbmc_small_rna/X/data'
#> Adding 3 metadata keys to array
#> Ingesting AssayMatrix data into: /tmp/RtmpO41gFl/pbmc_small_rna/X/data
#> Finished converting Seurat Assay with key [rna_] to SOMA
Printing out the X
group again reveals that it now
contains 2 members: counts
and data
.
soma$X
#> <AssayMatrixGroup>
#> uri: /tmp/RtmpO41gFl/pbmc_small_rna/X
#> arrays: counts, data
Let’s repeat this process for the scaled/centered version of the data
that’s stored in the scale.data
slot.
pbmc_small_rna <- SetAssayData(
pbmc_small_rna,
slot = "scale.data",
new.data = GetAssayData(pbmc_small[["RNA"]], "scale.data")
)
soma$from_seurat_assay(pbmc_small_rna, layers = "scale.data", var = FALSE)
#> Skipping ingestion of 'data' because it is identical to 'counts'
#> No AssayMatrix found at '/tmp/RtmpO41gFl/pbmc_small_rna/X/scale.data'
#> Creating new AssayMatrix array with index [var_id,obs_id] at '/tmp/RtmpO41gFl/pbmc_small_rna/X/scale.data'
#> Adding 3 metadata keys to array
#> Ingesting AssayMatrix data into: /tmp/RtmpO41gFl/pbmc_small_rna/X/scale.data
#> Finished converting Seurat Assay with key [rna_] to SOMA
soma$X
#> <AssayMatrixGroup>
#> uri: /tmp/RtmpO41gFl/pbmc_small_rna/X
#> arrays: counts, data, scale.data
The X
group now contains 3 arrays, each of which has
only been written to a single time. We can verify this by counting the
number of fragments contained within any of individual arrays:
soma$X$members$count$fragment_count()
#> [1] 1
TileDB creates a new fragment each time a write is performed, so 1 fragment confirms the array has been written to only once.
If the data in your Seurat Assay
has changed
since it was ingested into TileDB, you can follow the same basic process
to update the array.
Here, we’ll aritifically shift all of the values in the
scale.data
slot by 1 and then re-ingest the updated data
into the scale.data
layer on disk:
pbmc_small_rna <- SetAssayData(
pbmc_small_rna,
slot = "scale.data",
new.data = GetAssayData(pbmc_small[["RNA"]], "scale.data") + 1
)
soma$from_seurat_assay(pbmc_small_rna, layers = "scale.data", var = FALSE)
#> Skipping ingestion of 'data' because it is identical to 'counts'
#> Checking legacy validity mode for array: '/tmp/RtmpO41gFl/pbmc_small_rna/X/scale.data'
#> Found existing AssayMatrix at '/tmp/RtmpO41gFl/pbmc_small_rna/X/scale.data'
#> Updating existing AssayMatrix at '/tmp/RtmpO41gFl/pbmc_small_rna/X/scale.data'
#> Ingesting AssayMatrix data into: /tmp/RtmpO41gFl/pbmc_small_rna/X/scale.data
#> Finished converting Seurat Assay with key [rna_] to SOMA
Now let’s verify that number of fragments in each X
’s
layers:
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] RcppSpdlog_0.0.14 SeuratObject_4.1.3 sp_2.0-0
#> [4] tiledbsc_0.1.5.9002
#>
#> loaded via a namespace (and not attached):
#> [1] sass_0.4.7 future_1.33.0 spdl_0.0.5
#> [4] stringi_1.7.12 lattice_0.21-8 listenv_0.9.0
#> [7] digest_0.6.33 magrittr_2.0.3 tiledb_0.20.3
#> [10] evaluate_0.21 grid_4.3.1 fastmap_1.1.1
#> [13] rprojroot_2.0.3 jsonlite_1.8.7 Matrix_1.5-4.1
#> [16] urltools_1.7.3 purrr_1.0.1 codetools_0.2-19
#> [19] textshaping_0.3.6 jquerylib_0.1.4 cli_3.6.1
#> [22] rlang_1.1.1 triebeard_0.4.1 parallelly_1.36.0
#> [25] future.apply_1.11.0 bit64_4.0.5 cachem_1.0.8
#> [28] yaml_2.3.7 tools_4.3.1 parallel_4.3.1
#> [31] nanotime_0.3.7 memoise_2.0.1 globals_0.16.2
#> [34] vctrs_0.6.3 R6_2.5.1 zoo_1.8-12
#> [37] lifecycle_1.0.3 stringr_1.5.0 fs_1.6.3
#> [40] bit_4.0.5 ragg_1.2.5 desc_1.4.2
#> [43] pkgdown_2.0.7 progressr_0.13.0 bslib_0.5.0
#> [46] glue_1.6.2 Rcpp_1.0.11 systemfonts_1.0.4
#> [49] xfun_0.40 knitr_1.43 htmltools_0.5.5
#> [52] rmarkdown_2.23 compiler_4.3.1 RcppCCTZ_0.2.12