Benchmarks
We have implemented a big-ann-benchmarks interface for TileDB-Vector-Search, which is available in the tiledb branch of our fork:
- https://github.com/TileDB-Inc/big-ann-benchmarks/tree/tiledb
This interface implements two new algorithms, tiledb-flat and tiledb-ivf-flat, which are usable within the framework’s runner.
Building
- Build the Dockerfile at the root of this repository:
cd tiledb-vector-search
docker build -f Dockerfile . -t tiledb_vs
- Build the TileDB Docker image in the big-ann fork (requires the tiledb_vs image built in the previous step):
git clone --branch tiledb https://github.com/TileDB-Inc/big-ann-benchmarks.git
cd big-ann-benchmarks
docker build -f install/Dockerfile.tiledb . -t billion-scale-benchmark-tiledb
Running benchmarks
- Create a local dataset.
Note: the create_dataset.py command downloads remote files the first time it runs, some of which can total >100GB. Use --skip-data to avoid downloading the large base set. This command will download 7.7MB of data:
python create_dataset.py --dataset bigann-10M --skip-data
- Run the benchmarks, choosing either tiledb-flat or tiledb-ivf-flat:
python run.py --dataset bigann-10M --algorithm tiledb-flat
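To benchmark both algorithms in one go, the run command above can be wrapped in a small script. The following is a minimal sketch, not part of the fork: the helper names build_commands and run_all are hypothetical, and it assumes run.py is invoked from the big-ann-benchmarks directory with only the --dataset and --algorithm options shown above. By default it only prints the commands; pass execute=True to actually run them.

```python
import shlex
import subprocess

# The two algorithms added by the TileDB interface (names taken from the text above).
ALGORITHMS = ["tiledb-flat", "tiledb-ivf-flat"]


def build_commands(dataset, algorithms=ALGORITHMS):
    """Return one run.py argument list per algorithm (hypothetical helper)."""
    return [
        ["python", "run.py", "--dataset", dataset, "--algorithm", algo]
        for algo in algorithms
    ]


def run_all(dataset, execute=False):
    """Print each benchmark command and, if execute is True, run it in order."""
    for cmd in build_commands(dataset):
        print(shlex.join(cmd))
        if execute:
            # check=True stops at the first failing benchmark run.
            subprocess.run(cmd, check=True)
```

For example, calling run_all("bigann-10M", execute=True) from the big-ann-benchmarks directory would run tiledb-flat first and then tiledb-ivf-flat, assuming the dataset has already been created.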