index.Index

vector_search.index.Index(self, uri, config=None, timestamp=None)

Abstract Vector Index class.

All Vector Index algorithm implementations derive from this class. Apart from the abstract method interfaces, Index provides implementations for common tasks such as supporting updates, time traveling, and metadata management.

Opens an Index, reading its metadata and applying time-traveling options.

Do not use this class directly; instantiate one of the concrete Index classes instead.

Parameters

Name Type Description Default
uri str URI of the index. required
config Optional[Mapping[str, Any]] TileDB config dictionary. None
timestamp If int, open the index at a given timestamp. If tuple, open at the given start and end timestamps. None
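The two forms of the timestamp option can be illustrated with a small in-memory model. This is only a sketch: the update log, the visible_entries helper, and the inclusive bounds are assumptions made for illustration and are not taken from the library.

```python
# Hypothetical update log: each entry is (timestamp, external_id, vector).
log = [(10, 1, [0.0, 1.0]), (20, 2, [1.0, 0.0]), (30, 1, [0.5, 0.5])]


def visible_entries(log, timestamp=None):
    """Return the log entries visible under a given timestamp option.

    Assumed semantics (illustrative only):
      - None: every entry is visible.
      - int t: entries written at or before t.
      - (start, end): entries with start <= timestamp <= end.
    """
    if timestamp is None:
        return list(log)
    if isinstance(timestamp, tuple):
        start, end = timestamp
        return [entry for entry in log if start <= entry[0] <= end]
    return [entry for entry in log if entry[0] <= timestamp]


# Open "at" timestamp 20: the write at 30 is not visible.
at_20 = visible_entries(log, 20)
# Open between timestamps 15 and 30: the write at 10 is not visible.
between = visible_entries(log, (15, 30))
```

In this model, an int selects everything up to a point in time, while a tuple selects a window.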

Methods

Name Description
clear_history Clears the history maintained in a Vector Index based on its URI.
consolidate_updates Consolidates updates by merging updates from the updates table into the base index.
delete Deletes a vector by its external_id.
delete_batch Deletes vectors by their external_ids.
delete_index Deletes an index from storage based on its URI.
get_dimensions Abstract method implemented by all Vector Index implementations.
query Queries an index with a set of query vectors, retrieving the k most similar vectors for each query.
query_internal Abstract method implemented by all Vector Index implementations.
update Updates a vector by its external_id.
update_batch Updates a set of vectors by their external_ids.

clear_history

vector_search.index.Index.clear_history(uri, timestamp, config=None)

Clears the history maintained in a Vector Index based on its URI.

This clears the update history before the provided timestamp.

Use this in conjunction with consolidate_updates to periodically clean up the update history.
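As a rough illustration of the cleanup step, the sketch below drops entries older than a cutoff from a toy update log. The log layout and the clear_history_before helper are hypothetical, and the strict "before" comparison is an assumption based on the wording above.

```python
def clear_history_before(log, timestamp):
    """Keep only the entries written at or after `timestamp` (hypothetical helper)."""
    return [entry for entry in log if entry[0] >= timestamp]


# Each entry is (timestamp, external_id, vector); vectors are omitted for brevity.
log = [(10, 1, None), (20, 2, None), (30, 3, None)]

# Assume everything written before timestamp 25 was already merged
# into the base index by an earlier consolidation.
cutoff = 25
log = clear_history_before(log, cutoff)
```

Only the entry written at timestamp 30 remains in the log after the cleanup.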

Parameters

Name Type Description Default
uri str URI of the index. required
timestamp int Clears update history before this timestamp. required
config Optional[Mapping[str, Any]] TileDB config dictionary. None

consolidate_updates

vector_search.index.Index.consolidate_updates(retrain_index=False, **kwargs)

Consolidates updates by merging updates from the updates table into the base index.

Consolidation avoids query latency degradation as more updates are added to the index. It triggers a re-indexing of the base index, merging the non-consolidated updates with the rest of the base vectors.
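The merge step can be pictured with a toy model in which the base index is a mapping from external ID to vector and the updates table is a timestamped log. Both structures and the consolidate helper are hypothetical. The re-indexing itself (for example, recomputing or reusing centroids) is not modeled here.

```python
def consolidate(base, updates):
    """Merge an updates log into a base mapping of external_id -> vector.

    `updates` is a list of (timestamp, external_id, vector_or_None), where
    None marks a deletion. Returns the new base and an emptied updates log.
    """
    merged = dict(base)
    # Replay updates in timestamp order so the latest write wins.
    for _, external_id, vector in sorted(updates, key=lambda u: u[0]):
        if vector is None:
            merged.pop(external_id, None)
        else:
            merged[external_id] = vector
    return merged, []


base = {1: [0.0], 2: [1.0]}
updates = [(5, 2, None), (3, 2, [9.0]), (4, 3, [2.0])]
new_base, remaining_updates = consolidate(base, updates)
```

After consolidation, queries in this model no longer need to consult the updates log, which is the latency benefit described above.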

Parameters

Name Type Description Default
retrain_index bool If true, retrain the index. If false, reuse data from the previous index. For IVF_FLAT, retraining means recomputing the centroids; in that case, any ingest() arguments that configure centroid computation can be passed and are used when recomputing. If false, the centroids from the previous index are reused. False
**kwargs Extra keyword arguments are forwarded to the ingest function. {}

delete

vector_search.index.Index.delete(external_id, timestamp=None)

Deletes a vector by its external_id.

Parameters

Name Type Description Default
external_id np.uint64 External ID of the vector to be deleted. required
timestamp int Timestamp at which the delete takes place. None

delete_batch

vector_search.index.Index.delete_batch(external_ids, timestamp=None)

Deletes vectors by their external_ids.

Parameters

Name Type Description Default
external_ids np.array External IDs of the vectors to be deleted. required
timestamp int Timestamp at which the deletes take place. None

delete_index

vector_search.index.Index.delete_index(uri, config=None)

Deletes an index from storage based on its URI.

Parameters

Name Type Description Default
uri str URI of the index. required
config Optional[Mapping[str, Any]] TileDB config dictionary. None

get_dimensions

vector_search.index.Index.get_dimensions()

Abstract method implemented by all Vector Index implementations.

Returns the dimension of the vectors in the index.

query

vector_search.index.Index.query(queries, k, **kwargs)

Queries an index with a set of query vectors, retrieving the k most similar vectors for each query.

This provides an algorithm-agnostic implementation that accounts for updates:

  • Queries the non-consolidated updates table.
  • Calls the algorithm specific implementation of query_internal to query the base data.
  • Merges the results applying the updated data.
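The three steps above can be sketched with plain Python structures. This is a simplified model, not the library's implementation: the base data and updates are dictionaries, a brute-force scan stands in for query_internal, and the names squared_l2 and query_with_updates are hypothetical.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def query_with_updates(base, updates, query, k):
    """Return the k nearest (distance, external_id) pairs for one query.

    base: dict external_id -> vector.
    updates: dict external_id -> vector, or None for a deleted ID.
    """
    # 1. Query the non-consolidated updates, skipping deletions.
    update_hits = [
        (squared_l2(query, vec), ext_id)
        for ext_id, vec in updates.items()
        if vec is not None
    ]
    # 2. Query the base data (brute-force stand-in for query_internal).
    base_hits = [(squared_l2(query, vec), ext_id) for ext_id, vec in base.items()]
    # 3. Merge: discard base hits that were updated or deleted, keep the top k.
    merged = update_hits + [hit for hit in base_hits if hit[1] not in updates]
    return heapq.nsmallest(k, merged)


base = {1: [0.0, 0.0], 2: [1.0, 1.0], 3: [5.0, 5.0]}
updates = {2: None, 3: [0.0, 0.2], 4: [0.1, 0.0]}
results = query_with_updates(base, updates, [0.0, 0.0], k=2)
nearest_ids = [ext_id for _, ext_id in results]
```

Here ID 2 is excluded because it was deleted, and ID 3 is ranked by its updated vector rather than its stale base vector.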

Parameters

Name Type Description Default
queries np.ndarray 2D array of query vectors. This can be used as a batch query interface by passing multiple queries in one call. required
k int Number of results to return per query vector. required
**kwargs Extra keyword arguments are forwarded to the query_internal implementation of the concrete index class. {}

query_internal

vector_search.index.Index.query_internal(queries, k, **kwargs)

Abstract method implemented by all Vector Index implementations.

Queries the base index with a set of query vectors, retrieving the k most similar vectors for each query.

Parameters

Name Type Description Default
queries np.ndarray 2D array of query vectors. This can be used as a batch query interface by passing multiple queries in one call. required
k int Number of results to return per query vector. required
**kwargs Extra keyword arguments specific to each algorithm implementation. {}

update

vector_search.index.Index.update(vector, external_id, timestamp=None)

Updates a vector by its external_id.

This can be used to add new vectors or update an existing vector with the same external_id.
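The add-or-overwrite behavior can be modeled with a table keyed by external_id. The ToyUpdates class below is a hypothetical stand-in, not the library's updates table, and the default of the current time in milliseconds for a missing timestamp is an assumption.

```python
import time


class ToyUpdates:
    """Hypothetical stand-in for an updates table keyed by external_id."""

    def __init__(self):
        self.rows = {}  # external_id -> (timestamp, vector)

    def update(self, vector, external_id, timestamp=None):
        # Assumption: a missing timestamp defaults to the current time in ms.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        # Writing to an existing external_id replaces the previous entry.
        self.rows[external_id] = (timestamp, list(vector))


table = ToyUpdates()
table.update([1.0, 2.0], external_id=7, timestamp=1)  # adds ID 7
table.update([3.0, 4.0], external_id=7, timestamp=2)  # overwrites ID 7
table.update([5.0, 6.0], external_id=8, timestamp=3)  # adds ID 8
```

The same call therefore serves for both inserting a new vector and replacing an existing one.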

Parameters

Name Type Description Default
vector np.array Vector data to be updated. required
external_id np.uint64 External ID of the vector. required
timestamp int Timestamp at which the update takes place. None

update_batch

vector_search.index.Index.update_batch(vectors, external_ids, timestamp=None)

Updates a set of vectors by their external_ids.

This can be used to add new vectors or update existing vectors with the same external_id.
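A batch update can be thought of as one add-or-overwrite per row. The sketch below assumes that row i of vectors pairs with external_ids[i] and that the whole batch shares one timestamp. The dictionary table and the toy_update_batch helper are hypothetical.

```python
def toy_update_batch(table, vectors, external_ids, timestamp):
    """Apply one upsert per row to `table`, a dict of external_id -> (timestamp, vector)."""
    if len(vectors) != len(external_ids):
        raise ValueError("vectors and external_ids must have the same length")
    for vector, external_id in zip(vectors, external_ids):
        table[external_id] = (timestamp, list(vector))
    return table


table = {10: (1, [0.0, 0.0])}
vectors = [[1.0, 1.0], [2.0, 2.0]]
external_ids = [10, 11]
toy_update_batch(table, vectors, external_ids, timestamp=5)
```

In this example, ID 10 is overwritten and ID 11 is added, both at timestamp 5.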

Parameters

Name Type Description Default
vectors np.ndarray 2D array containing the vectors to be updated. required
external_ids np.array External IDs of the vectors. required
timestamp int Timestamp at which the updates take place. None