library(tiledbsc)
#> The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
#> which was just loaded, will retire in October 2023.
#> Please refer to R-spatial evolution reports for details, especially
#> https://r-spatial.org/r/2023/05/15/evolution4.html.
#> It may be desirable to make the sf package available;
#> package maintainers should consider adding sf to Suggests:.
#> The sp package is now running under evolution status 2
#> (status 2 uses the sf package in place of rgdal)
library(SeuratObject)
#> Loading required package: spWhile tiledbsc facilitates the storage and retrieval of
complete Seurat/Bioconductor datasets, it’s also possible to selectively
update specific components of an existing SOMA.
As in the Introduction we’ll use the pbmc_small dataset
from the SeuratObject package.
data("pbmc_small", package = "SeuratObject")
pbmc_small
#> An object of class Seurat
#> 230 features across 80 samples within 1 assay
#> Active assay: RNA (230 features, 20 variable features)
#> 2 dimensional reductions calculated: pca, tsneWe’ll start by creating a Seurat Assay object that
contains only raw RNA counts.
pbmc_small_rna <- CreateAssayObject(
counts = GetAssayData(pbmc_small[["RNA"]], "counts")
)
Key(pbmc_small_rna) <- Key(pbmc_small[["RNA"]])and then ingest this into a new SOMA on disk:
This performed the following operations:
X, obsm, varm,
obsp, varp, and misc
obs, with a single dimension containing
all cell names and 0 attributes (because Seurat Assay
objects do not contain cell-level metadata)var, with a single dimension containing
all feature names and 0 attributes (because the feature-level metadata
was empty)counts within the X group
and ingested all data from pbmc_small_rna’s
counts slotPrinting out the X group, we can see it contains only a
single member, counts.
soma$X
#> <AssayMatrixGroup>
#> uri: /tmp/RtmpO41gFl/pbmc_small_rna/X
#> arrays: countsNote: A Seurat Assay instantiated with data passed
to the counts argument will store a reference to the same
matrix in both the counts and data slots
(i.e., GetAssayData(pbmc_small_rna, "counts") and
GetAssayData(pbmc_small_rna, "data")). To avoid redundantly
storing data on disk, tiledbsc* only ingests data from
data when it is not identical to counts.*
Now let’s populate pbmc_small_rna’s counts
slot with normalized counts.
pbmc_small_rna <- SetAssayData(
pbmc_small_rna,
slot = "data",
new.data = GetAssayData(pbmc_small[["RNA"]], "data")
)
pbmc_small_rna
#> Assay data with 230 features for 80 cells
#> First 10 features:
#> MS4A1, CD79B, CD79A, HLA-DRA, TCL1A, HLA-DQB1, HVCN1, HLA-DMB, LTB,
#> LINC00926At this point, counts and data contain
different transformations of the same matrix but only the
counts data has been written to disk.
We could simply rerun
soma$from_seurat_assay(pbmc_small_rna) to update the
SOMA and ingest the normalized counts matrix into
X/data, however, this would perform another full conversion
of the entire Assay object. As a result, data from the
counts slot would be re-ingested into
X/counts and feature-level metadata would be re-ingested
into var.
To avoid these redudndant ingestions, use the layers
argument to specify which of the assay slots (counts,
data, or scale.data) should be ingested. We
also set var = FALSE to prevent re-ingestion of the
feature-level metadata since it hasn’t changed.
soma$from_seurat_assay(pbmc_small_rna, layers = "data", var = FALSE)
#> No AssayMatrix found at '/tmp/RtmpO41gFl/pbmc_small_rna/X/data'
#> Creating new AssayMatrix array with index [var_id,obs_id] at '/tmp/RtmpO41gFl/pbmc_small_rna/X/data'
#> Adding 3 metadata keys to array
#> Ingesting AssayMatrix data into: /tmp/RtmpO41gFl/pbmc_small_rna/X/data
#> Finished converting Seurat Assay with key [rna_] to SOMAPrinting out the X group again reveals that it now
contains 2 members: counts and data.
soma$X
#> <AssayMatrixGroup>
#> uri: /tmp/RtmpO41gFl/pbmc_small_rna/X
#> arrays: counts, dataLet’s repeat this process for the scaled/centered version of the data
that’s stored in the scale.data slot.
pbmc_small_rna <- SetAssayData(
pbmc_small_rna,
slot = "scale.data",
new.data = GetAssayData(pbmc_small[["RNA"]], "scale.data")
)
soma$from_seurat_assay(pbmc_small_rna, layers = "scale.data", var = FALSE)
#> Skipping ingestion of 'data' because it is identical to 'counts'
#> No AssayMatrix found at '/tmp/RtmpO41gFl/pbmc_small_rna/X/scale.data'
#> Creating new AssayMatrix array with index [var_id,obs_id] at '/tmp/RtmpO41gFl/pbmc_small_rna/X/scale.data'
#> Adding 3 metadata keys to array
#> Ingesting AssayMatrix data into: /tmp/RtmpO41gFl/pbmc_small_rna/X/scale.data
#> Finished converting Seurat Assay with key [rna_] to SOMA
soma$X
#> <AssayMatrixGroup>
#> uri: /tmp/RtmpO41gFl/pbmc_small_rna/X
#> arrays: counts, data, scale.dataThe X group now contains 3 arrays, each of which has
only been written to a single time. We can verify this by counting the
number of fragments contained within any of individual arrays:
soma$X$members$count$fragment_count()
#> [1] 1TileDB creates a new fragment each time a write is performed, so 1 fragment confirms the array has been written to only once.
If the data in your Seurat Assay has changed
since it was ingested into TileDB, you can follow the same basic process
to update the array.
Here, we’ll aritifically shift all of the values in the
scale.data slot by 1 and then re-ingest the updated data
into the scale.data layer on disk:
pbmc_small_rna <- SetAssayData(
pbmc_small_rna,
slot = "scale.data",
new.data = GetAssayData(pbmc_small[["RNA"]], "scale.data") + 1
)
soma$from_seurat_assay(pbmc_small_rna, layers = "scale.data", var = FALSE)
#> Skipping ingestion of 'data' because it is identical to 'counts'
#> Checking legacy validity mode for array: '/tmp/RtmpO41gFl/pbmc_small_rna/X/scale.data'
#> Found existing AssayMatrix at '/tmp/RtmpO41gFl/pbmc_small_rna/X/scale.data'
#> Updating existing AssayMatrix at '/tmp/RtmpO41gFl/pbmc_small_rna/X/scale.data'
#> Ingesting AssayMatrix data into: /tmp/RtmpO41gFl/pbmc_small_rna/X/scale.data
#> Finished converting Seurat Assay with key [rna_] to SOMANow let’s verify that number of fragments in each X’s
layers:
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] RcppSpdlog_0.0.14 SeuratObject_4.1.3 sp_2.0-0
#> [4] tiledbsc_0.1.5.9002
#>
#> loaded via a namespace (and not attached):
#> [1] sass_0.4.7 future_1.33.0 spdl_0.0.5
#> [4] stringi_1.7.12 lattice_0.21-8 listenv_0.9.0
#> [7] digest_0.6.33 magrittr_2.0.3 tiledb_0.20.3
#> [10] evaluate_0.21 grid_4.3.1 fastmap_1.1.1
#> [13] rprojroot_2.0.3 jsonlite_1.8.7 Matrix_1.5-4.1
#> [16] urltools_1.7.3 purrr_1.0.1 codetools_0.2-19
#> [19] textshaping_0.3.6 jquerylib_0.1.4 cli_3.6.1
#> [22] rlang_1.1.1 triebeard_0.4.1 parallelly_1.36.0
#> [25] future.apply_1.11.0 bit64_4.0.5 cachem_1.0.8
#> [28] yaml_2.3.7 tools_4.3.1 parallel_4.3.1
#> [31] nanotime_0.3.7 memoise_2.0.1 globals_0.16.2
#> [34] vctrs_0.6.3 R6_2.5.1 zoo_1.8-12
#> [37] lifecycle_1.0.3 stringr_1.5.0 fs_1.6.3
#> [40] bit_4.0.5 ragg_1.2.5 desc_1.4.2
#> [43] pkgdown_2.0.7 progressr_0.13.0 bslib_0.5.0
#> [46] glue_1.6.2 Rcpp_1.0.11 systemfonts_1.0.4
#> [49] xfun_0.40 knitr_1.43 htmltools_0.5.5
#> [52] rmarkdown_2.23 compiler_4.3.1 RcppCCTZ_0.2.12