Benchmarks
We have implemented a big-ann-benchmarks interface for TileDB-Vector-Search, which is available in the tiledb branch of our fork:
- https://github.com/TileDB-Inc/big-ann-benchmarks/tree/tiledb

This interface implements two new algorithms, tiledb-flat and tiledb-ivf-flat, which are usable within the framework's runner.
Building
- Build the Dockerfile at the root of this repository:
cd tiledb-vector-search
docker build -f Dockerfile . -t tiledb_vs
- Build the TileDB Docker image in the big-ann-benchmarks fork (requires the tiledb_vs image built in the previous step):
git clone --branch tiledb https://github.com/TileDB-Inc/big-ann-benchmarks.git
cd big-ann-benchmarks
docker build -f install/Dockerfile.tiledb . -t billion-scale-benchmark-tiledb
Running benchmarks
- Create a local dataset.
Note: the create_dataset.py command downloads remote files the first time it runs, some of which can total more than 100GB. Use --skip-data to avoid downloading the large base set. The following command downloads 7.7MB of data:
python create_dataset.py --dataset bigann-10M --skip-data
- Run the benchmarks, choosing either tiledb-flat or tiledb-ivf-flat:
python run.py --dataset bigann-10M --algorithm tiledb-flat
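To compare the two algorithms on the same dataset, the run command above can be wrapped in a small loop. The following is a minimal sketch, not part of the benchmark framework: the DATASET and DRY_RUN variables and the benchmark_cmd helper are hypothetical names introduced here, and the script assumes it is run from the big-ann-benchmarks checkout after the dataset has been created. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run the benchmark for both TileDB algorithms in sequence.
# DATASET and DRY_RUN are hypothetical variables local to this script;
# they are not options of run.py.
DATASET="${DATASET:-bigann-10M}"
DRY_RUN="${DRY_RUN:-1}"

# Print the run.py command line for one algorithm name.
benchmark_cmd() {
    echo "python run.py --dataset $DATASET --algorithm $1"
}

for algo in tiledb-flat tiledb-ivf-flat; do
    cmd=$(benchmark_cmd "$algo")
    echo "$cmd"
    # Execute only when DRY_RUN=0; stop at the first failing run.
    if [ "$DRY_RUN" = "0" ]; then
        $cmd || exit 1
    fi
done
```

Saved under a name of your choice (for example, a hypothetical run_tiledb.sh), the script prints both commands when invoked as-is and executes them when invoked with DRY_RUN=0.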