Abstract

This vignette describes first steps with TileDB, such as reading and writing sparse and dense arrays.

Once the TileDB R package is installed, it can be loaded via `library(tiledb)`. Installation is supported on Linux and macOS.

Documentation for the TileDB R package is available via the `help()` function from within R as well as via the package documentation and an introductory notebook. Documentation about TileDB itself is also available.

Several “quickstart” examples that are discussed on the website are available in the examples directory. This vignette discusses similar examples.

In the following examples, the URIs describing arrays point to local file system objects. When TileDB has been built with S3 support, and with proper AWS credentials in the usual environment variables, URIs such as `s3://some/data/bucket` can be used where a local file would be used. See the script examples/ex_S3.R for an example.

We can consider the file `ex_1.R` in the examples directory. It is a simple yet complete example extending `quickstart_dense.R` by adding a second attribute.

*Read 1-D*

This extracts rows 1 to 2 of column 2 from `A`, returning a list object because the array has multiple attributes.

```
R> A[1:2, 2]
$a
[,1]
[1,] 11
[2,] 12
$b
[,1]
[1,] 111
[2,] 112
$c
[,1]
[1,] "k"
[2,] "l"
```

Subset the returned list via `[[var]]` or `$var`. A numeric index also works.

```
R> A[1:2, 2][["a"]]
[,1]
[1,] 11
[2,] 12
R> A[1:2, 2]$a
[,1]
[1,] 11
[2,] 12
```

The two-dimensional indexing retains a matrix structure, but this can be overridden by setting `drop=TRUE`, which works for either example.

```
R> A[1:2, 2, drop=TRUE]$a
[1] 11 12
R>
```

The result is now a vector of the attribute type.
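The list and `drop` semantics described above can be tried without TileDB. The following is a minimal base-R sketch: the list `res` and the matrix `m` are hypothetical stand-ins whose values were chosen to mirror the output shown above, not objects produced by the package.

```r
# Hypothetical stand-in for the list returned by A[1:2, 2]:
# one single-column matrix per attribute.
res <- list(a = matrix(c(11L, 12L), ncol = 1),
            b = matrix(c(111, 112), ncol = 1),
            c = matrix(c("k", "l"), ncol = 1))

# Three equivalent ways to pick the first attribute
by_name   <- res[["a"]]
by_dollar <- res$a
by_index  <- res[[1]]

# Base-R analogue of drop=TRUE. The 10 x 5 shape is an assumption that
# happens to reproduce the values of attribute "a" printed above.
m <- matrix(1:50, nrow = 10)
kept    <- m[1:2, 2, drop = FALSE]   # stays a 2 x 1 matrix
dropped <- m[1:2, 2, drop = TRUE]    # collapses to an integer vector
```

Here `kept` still has a `dim` attribute while `dropped` does not, which is the same distinction the `drop=TRUE` argument makes for the array read.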

*Read 2-D*

This works analogously. Because no attribute is selected, we now get a list of matrices.

```
R> A[6:9, 3:4]
$a
[,1] [,2]
[1,] 26 36
[2,] 27 37
[3,] 28 38
[4,] 29 39
$b
[,1] [,2]
[1,] 126 136
[2,] 127 137
[3,] 128 138
[4,] 129 139
$c
[,1] [,2]
[1,] "z" "H"
[2,] "brown" "I"
[3,] "fox" "J"
[4,] "A" "K"
```

We can restrict the selection to a subset of attributes when opening the array.

```
R> A <- tiledb_dense(uri = uri, attrs = c("b","c"))
R> A[6:9, 2:4]
$b
[,1] [,2] [,3]
[1,] 116 126 136
[2,] 117 127 137
[3,] 118 128 138
[4,] 119 129 139
$c
[,1] [,2] [,3]
[1,] "p" "z" "H"
[2,] "q" "brown" "I"
[3,] "r" "fox" "J"
[4,] "s" "A" "K"
```

We can also ask for data.frame objects by setting `as.data.frame=TRUE` when opening the array.

```
R> A[6:9, 3:4]
a b c
1 26 126 z
2 27 127 brown
3 28 128 fox
4 29 129 A
5 36 136 H
6 37 137 I
7 38 138 J
8 39 139 K
```

This scheme can be generalized to variable-length cells, or cells where N>1, because each (atomic) value can be expanded over the corresponding row and column indices.
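The expansion of matrix values over row and column indices can be illustrated in base R. This is only a sketch of the idea, not the package's implementation: the matrices `a` and `b` are hypothetical stand-ins for two attributes, with values chosen to match the output above.

```r
# Hypothetical stand-ins for two attributes of a 10 x 5 dense array
a <- matrix(1:50, nrow = 10)                  # integer attribute
b <- matrix(as.numeric(101:150), nrow = 10)   # double attribute
rows <- 6:9
cols <- 3:4

# expand.grid() varies its first argument fastest, which matches R's
# column-major order when a matrix is flattened with as.vector()
idx <- expand.grid(rows = rows, cols = cols)
df <- data.frame(a = as.vector(a[rows, cols]),
                 b = as.vector(b[rows, cols]),
                 rows = idx$rows,
                 cols = idx$cols)
```

Each row of `df` now holds one cell, so the first four rows correspond to column 3 and the last four to column 4, in the same order as the data.frame printed above.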

The column types correspond to the attribute types in the array schema, subject to the constraint on R types mentioned above. (The character attribute comes in as a factor variable, which is still the default in R 3.6.* but is about to change. This default can be overridden, both by the package and by users.)

```
R> sapply(A[6:9, 3:4], "class")
        a         b         c      rows      cols
"integer" "numeric"  "factor" "integer" "integer"
```
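One way a user might override the factor conversion after the read is sketched below in base R. The data.frame `df` is a hypothetical stand-in for a query result, and `stringsAsFactors = TRUE` is set explicitly to mimic the R 3.6.* default (R 4.0.0 changed that default to `FALSE`).

```r
# Hypothetical stand-in for a data.frame with a character attribute,
# built with the factor conversion that R 3.6.* applied by default
df <- data.frame(a = 26:29,
                 c = c("z", "brown", "fox", "A"),
                 stringsAsFactors = TRUE)
before <- sapply(df, class)

# Convert every factor column back to character
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], as.character)
after <- sapply(df, class)
```

Comparing `before` and `after` shows column `c` changing from `"factor"` to `"character"` while the integer column is left untouched.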

Consistent with the `data.frame` semantics, *now* requesting a named column *reduces to a vector* as this happens on the R side:

```
R> A[1:3, 2:5]$b
[1] 111 112 113 121 122 123 131 132 133 141 142 143
```

The attribute selection works with `as.data.frame=TRUE` as well:

```
R> A <- tiledb_dense(uri = uri, as.data.frame = TRUE, attrs = c("b","c"))
R> A[6:9, 2:4]
b c
1 116 p
2 117 q
3 118 r
4 119 s
5 126 z
6 127 brown
7 128 fox
8 129 A
9 136 H
10 137 I
11 138 J
12 139 K
```

*Simple Examples*

Basic reading returns the coordinates and any attributes. The following examples use the array created by the quickstart_sparse example.

```
R> A <- tiledb_sparse(uri = uri)
R> A[]
$coords
[1] 1 1 2 3 2 4
$a
[1] 1 3 2
```

We can also request a data.frame object, either when opening or by changing this object characteristic on the fly:

```
R> return.data.frame(A) <- TRUE
R> A[]
a rows cols
1 1 1 1
2 3 2 3
3 2 2 4
```

For sparse arrays, the return type is by default ‘extended’, showing rows and columns, but this can be overridden.
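The relationship between the `$coords` vector in the list output and the `rows` and `cols` columns in the data.frame output can be sketched in base R. The interleaved (row, col) layout is an assumption inferred from the two outputs shown above, and `res` simply copies those printed values.

```r
# Values copied from the list output of A[] shown above
res <- list(coords = c(1, 1, 2, 3, 2, 4), a = c(1L, 3L, 2L))

# Assumption: coords stores one (row, col) pair per cell, interleaved,
# so filling a two-column matrix by row separates the two dimensions
xy <- matrix(res$coords, ncol = 2, byrow = TRUE)
df <- data.frame(a = res$a, rows = xy[, 1], cols = xy[, 2])
```

Under that assumption `df` has the same three cells as the data.frame returned once `return.data.frame(A)` is set.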

Assignment works similarly:

```
R> A[4,2] <- 42L
R> A[]
a rows cols
1 1 1 1
2 42 4 2
3 3 2 3
4 2 2 4
```

Reads can select rows and/or columns:

```
R> A[2,]
a rows cols
1 3 2 3
2 2 2 4
R> A[,2]
a rows cols
1 42 4 2
```

Attributes can be selected similarly.

Similar to the dense array case described earlier, the file `ex_2.R` illustrates some basic operations on sparse arrays. It also shows date and datetime types instead of just integer and double precision floats.

```
R> A <- tiledb_sparse(uri = uri, as.data.frame = TRUE)
R> A[1577858580:1577858700] # POSIX time seconds
  a   b          d                          e       rows
1 3 103 2020-01-11 2020-01-02 18:24:33.844293 1577858580
2 4 104 2020-01-15 2020-01-05 02:28:36.215681 1577858640
3 5 105 2020-01-19 2020-01-05 00:44:04.805775 1577858700
```

The row coordinate is currently a floating point representation of the underlying time type. We can both select attributes (here we excluded the “a” column) and select rows by time (as the time stamps get converted to the required floating point value).

```
R> attrs(A) <- c("b", "d", "e")
R> A[as.POSIXct("2020-01-01 00:01:00"):as.POSIXct("2020-01-01 00:03:00")]
    b          d                          e       rows
1 101 2020-01-05 2020-01-01 03:03:07.548390 1577858460
2 102 2020-01-10 2020-01-02 21:02:19.748134 1577858520
3 103 2020-01-11 2020-01-02 18:24:33.844293 1577858580
```
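The conversion from time stamps to the numeric row values can be checked in base R. This sketch assumes a US Central session time zone, which is an inference from the row values printed above rather than something the example states; the variable names are illustrative.

```r
# Assumption: the session time zone is US Central, which is consistent
# with the row values printed above
lo <- as.POSIXct("2020-01-01 00:01:00", tz = "America/Chicago")
hi <- as.POSIXct("2020-01-01 00:03:00", tz = "America/Chicago")

# The underlying representation is seconds since 1970-01-01 UTC
lo_num <- as.numeric(lo)
hi_num <- as.numeric(hi)

# The colon operator works on the numeric values, one step per second
idx <- lo:hi

# Converting back recovers the original date-time
back <- as.POSIXct(lo_num, origin = "1970-01-01", tz = "America/Chicago")
```

With this time zone, `lo_num` and `hi_num` are 1577858460 and 1577858580, the first and last `rows` values in the output above. In a different time zone the same strings map to different numeric values, so passing an explicit `tz` is the safer choice when selecting by time.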

More extended examples are available showing indexing by date(time) as well as by character dimensions.

The TileDB R package is documented via R help functions (*e.g.* `help("tiledb_sparse")` shows information for the `tiledb_sparse()` function) as well as via a website collecting all documentation. An extended notebook is available, as are a number of examples in the examples/ directory.

TileDB itself has extensive installation, quickstart, and overall documentation as well as a support forum.