flat_index

vector_search.flat_index

FlatIndex implementation.

Stores all vectors in a 2D TileDB array and performs an exhaustive similarity search between the query vectors and all the dataset vectors.
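
To make "exhaustive similarity search" concrete, here is a minimal, standard-library-only sketch of the idea: every query is compared against every dataset vector and the k closest are kept. It is not the library's implementation. The names `sum_of_squares` and `exhaustive_search` are hypothetical, and plain Python lists stand in for the TileDB array and NumPy inputs.

```python
import heapq
from typing import List, Sequence, Tuple


def sum_of_squares(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exhaustive_search(
    dataset: Sequence[Sequence[float]],
    queries: Sequence[Sequence[float]],
    k: int = 10,
) -> List[List[Tuple[float, int]]]:
    """For each query, return the k nearest (distance, id) pairs, closest first."""
    results = []
    for query in queries:
        # Compare the query against every dataset vector: no pruning, no approximation.
        scored = ((sum_of_squares(query, vec), idx) for idx, vec in enumerate(dataset))
        results.append(heapq.nsmallest(k, scored))
    return results


# Illustrative data only (hypothetical values).
dataset = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [1.0, 0.0]]
queries = [[0.9, 0.9], [4.0, 4.0]]
neighbors = exhaustive_search(dataset, queries, k=2)
```

The work grows with the number of queries times the number of dataset vectors. In exchange, the results are exact rather than approximate.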

Classes

Name Description
FlatIndex Opens a FlatIndex, loading all dataset vectors into main memory.

FlatIndex

vector_search.flat_index.FlatIndex(self, uri, config=None, timestamp=None, open_for_remote_query_execution=False, group=None, **kwargs)

Opens a FlatIndex, loading all dataset vectors into main memory.

Parameters

Name Type Description Default
uri str URI of the index. required
config Optional[Mapping[str, Any]] TileDB config dictionary. None
timestamp If int, open the index at a given timestamp. If tuple, open at the given start and end timestamps. None
open_for_remote_query_execution bool If True, do not load any index data in main memory locally, and instead load index data in the TileDB Cloud taskgraph created when a non-None driver_mode is passed to query(). If False, load index data in main memory locally. Note that you can still use a taskgraph for query execution in that case, but the data is then loaded both on your local machine and in the cloud taskgraph. False

Methods

Name Description
get_dimensions Returns the dimension of the vectors in the index.
query_internal Queries a FlatIndex using the vectors already loaded in main memory.
vacuum The vacuuming process permanently deletes index files that are consolidated through the consolidation process.

get_dimensions

vector_search.flat_index.FlatIndex.get_dimensions()

Returns the dimension of the vectors in the index.

query_internal

vector_search.flat_index.FlatIndex.query_internal(queries, k=10, nthreads=8, **kwargs)

Queries a FlatIndex using the vectors already loaded in main memory.

Parameters

Name Type Description Default
queries np.ndarray 2D array of query vectors. This can be used as a batch query interface by passing multiple queries in one call. required
k int Number of results to return per query vector. 10
nthreads int Number of threads to use for query execution. 8
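
The batch interface described above can be pictured with the following sketch, in which row i of the result corresponds to row i of `queries` and a thread pool of size `nthreads` spreads the work. It only illustrates the shape of a batched, multi-threaded query. It is not how the library executes queries: `nearest_ids` and `batch_query` are hypothetical names, plain lists stand in for `np.ndarray`, and pure-Python threads do not provide the speed-up of a native implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def nearest_ids(
    dataset: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[int]:
    """Ids of the k dataset vectors closest to one query (squared L2 distance)."""
    dists = [sum((x - y) ** 2 for x, y in zip(query, vec)) for vec in dataset]
    # sorted() is stable, so ties are broken by the lower id.
    return sorted(range(len(dataset)), key=dists.__getitem__)[:k]


def batch_query(
    dataset: Sequence[Sequence[float]],
    queries: Sequence[Sequence[float]],
    k: int = 10,
    nthreads: int = 8,
) -> List[List[int]]:
    """Answer every query in the batch; row i of the result belongs to queries[i]."""
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        # map() preserves input order, so results line up with the query rows.
        return list(pool.map(lambda q: nearest_ids(dataset, q, k), queries))


# Illustrative data only (hypothetical values).
dataset = [[0.0, 0.0], [2.0, 0.0], [0.0, 3.0], [10.0, 10.0]]
queries = [[0.0, 0.0], [9.0, 9.0], [2.0, 1.0]]
batch_ids = batch_query(dataset, queries, k=2, nthreads=2)
```

Passing several query vectors in one call amortizes the per-call overhead, which is the point of the batch interface.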

vacuum

vector_search.flat_index.FlatIndex.vacuum()

The vacuuming process permanently deletes index files that are consolidated through the consolidation process. TileDB separates consolidation from vacuuming, in order to make consolidation process-safe in the presence of concurrent reads and writes.

Note:

  1. Vacuuming is not process-safe and you should take extra care when invoking it.
  2. Vacuuming may affect the granularity of the time traveling functionality.

The Flat class vacuums consolidated fragments, array metadata, and commits for the db and ids arrays.

Functions

Name Description
create Creates an empty FlatIndex.

create

vector_search.flat_index.create(uri, dimensions, vector_type, group_exists=False, group=None, config=None, storage_version=STORAGE_VERSION, distance_metric=vspy.DistanceMetric.SUM_OF_SQUARES, asset_creation_threads=None, **kwargs)

Creates an empty FlatIndex.

Parameters

Name Type Description Default
uri str URI of the index. required
dimensions int Number of dimensions for the vectors to be stored in the index. required
vector_type np.dtype Datatype of vectors. Supported values: uint8, int8, float32. required
group_exists bool If False, creates the TileDB group for the index. If True, expects the TileDB group to already exist. False
config Optional[Mapping[str, Any]] TileDB config dictionary. None
storage_version str The TileDB vector search storage version to use. If not provided, use the latest stable storage version. STORAGE_VERSION
distance_metric vspy.DistanceMetric Distance metric to use for the index. If not provided, use the sum-of-squares (squared L2) distance. vspy.DistanceMetric.SUM_OF_SQUARES
group tiledb.Group TileDB group open in write mode. Internal: used to avoid opening the group multiple times during ingestion. None
asset_creation_threads Sequence[Thread] List of asset creation threads to which new threads are appended. Internal: used to parallelize all asset creation during ingestion. None