index.Index

vector_search.index.Index(self, uri, open_for_remote_query_execution=False, config=None, timestamp=None, group=None)

Abstract Vector Index class. Do not use this directly but rather use the open factory method.

All Vector Index algorithm implementations derive from this class. Apart from the abstract method interfaces, Index provides implementations for common tasks such as supporting updates, time traveling, and metadata management.

Opens an Index reading metadata and applying time-traveling options.

Parameters

Name Type Description Default
uri str URI of the index. required
config Optional[Mapping[str, Any]] TileDB config dictionary. None
timestamp If int, open the index at a given timestamp. If tuple, open at the given start and end timestamps. None
open_for_remote_query_execution bool If True, do not load any index data in main memory locally, and instead load index data in the TileDB Cloud taskgraph created when a non-None driver_mode is passed to query(). If False, load index data in main memory locally. Note that you can still use a taskgraph for query execution in that case, but the data is then loaded both on your local machine and in the cloud taskgraph. False
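The two accepted forms of timestamp can be pictured as a window over the update history. The sketch below is a toy model in plain Python, not the library's code: normalize_timestamp and visible_updates are hypothetical helpers, the log layout is invented for illustration, and treating both bounds as inclusive is an assumption.

```python
from typing import List, Optional, Tuple, Union

Timestamp = Union[int, Tuple[int, int], None]
LogEntry = Tuple[int, str]  # (timestamp, description), a made-up layout


def normalize_timestamp(timestamp: Timestamp, now: int) -> Tuple[int, int]:
    """Turn the None / int / tuple forms into a (start, end) window."""
    if timestamp is None:
        return (0, now)        # no time travel: everything up to "now"
    if isinstance(timestamp, int):
        return (0, timestamp)  # int: everything up to that timestamp
    start, end = timestamp     # tuple: explicit start and end
    if start > end:
        raise ValueError("start timestamp must not exceed end timestamp")
    return (start, end)


def visible_updates(log: List[LogEntry], timestamp: Timestamp, now: int) -> List[LogEntry]:
    """Return the log entries that fall inside the window (bounds inclusive)."""
    start, end = normalize_timestamp(timestamp, now)
    return [entry for entry in log if start <= entry[0] <= end]


log = [(10, "add id=1"), (20, "add id=2"), (30, "delete id=1")]
at_20 = visible_updates(log, 20, now=100)
window = visible_updates(log, (15, 30), now=100)
latest = visible_updates(log, None, now=100)
```

With this model, an int hides every entry written after it, while a tuple also hides entries written before its start.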

Methods

Name Description
clear_history Clears the history maintained in a Vector Index based on its URI.
consolidate_updates Consolidates updates by merging updates from the updates table into the base index.
delete Deletes a vector by its external_id.
delete_batch Deletes vectors by their external_ids.
delete_index Deletes an index from storage based on its URI.
get_dimensions Abstract method implemented by all Vector Index implementations.
query Queries an index with a set of query vectors, retrieving the k most similar vectors for each query.
query_internal Abstract method implemented by all Vector Index implementations.
update Updates a vector by its external_id.
update_batch Updates a set of vectors by their external_ids.
vacuum The vacuuming process permanently deletes index files that are consolidated through the consolidation process.

clear_history

vector_search.index.Index.clear_history(uri, timestamp, config=None)

Clears the history maintained in a Vector Index based on its URI.

This clears the update history before the provided timestamp.

Use this in conjunction with consolidate_updates to periodically clean up update history.

Parameters

Name Type Description Default
uri str URI of the index. required
timestamp int Clears update history before this timestamp. required
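As a rough picture of what clearing history before a timestamp means, the sketch below prunes a toy update log. It is not the library's implementation: the tuple layout of the log and the standalone clear_history function are assumptions made for illustration, as is the reading of "before" as strictly older than the timestamp.

```python
def clear_history(log, timestamp):
    """Return the log without the entries that are older than `timestamp`."""
    return [entry for entry in log if entry[0] >= timestamp]


# (timestamp, operation, external_id) entries of a toy update history.
log = [(100, "update", 1), (200, "delete", 2), (300, "update", 3)]

# Suppose everything up to timestamp 200 has already been consolidated
# into the base index, so those entries are no longer needed.
consolidated_up_to = 200
log = clear_history(log, consolidated_up_to + 1)
```

Pairing the cutoff with the last consolidated timestamp keeps the entries that have not yet been merged into the base index.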

consolidate_updates

vector_search.index.Index.consolidate_updates(retrain_index=False, **kwargs)

Consolidates updates by merging updates from the updates table into the base index.

The consolidation process is used to avoid query latency degradation as more updates are added to the index. It triggers a base index re-indexing, merging the non-consolidated updates and the rest of the base vectors.

TODO(sc-51202): This throws with an unintuitive error message if update()/delete()/etc. has not been called.

Parameters

Name Type Description Default
retrain_index bool If True, retrain the index. If False, reuse data from the previous index. For IVF_FLAT, retraining means the centroids are recomputed; when doing so, you can pass any ingest() arguments used to configure centroid computation and they will be used for the recomputation. If False, the centroids from the previous index are reused. False
**kwargs Extra kwargs passed here are passed to the ingest function. {}
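The difference between reusing and recomputing centroids can be illustrated with a small partitioning example. This is a toy model under stated assumptions, not the library's IVF_FLAT code: plain Lloyd (k-means) iterations stand in for whatever training the library performs, and consolidate, assign, and recompute_centroids are hypothetical helper names.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    """Return, for each vector, the position of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
        for v in vectors
    ]


def recompute_centroids(vectors, centroids, iterations=5):
    """Run a few Lloyd iterations starting from the given centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        labels = assign(vectors, centroids)
        for c in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a partition is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def consolidate(vectors, old_centroids, retrain_index=False):
    """Re-partition all vectors, optionally recomputing the centroids first."""
    if retrain_index:
        centroids = recompute_centroids(vectors, old_centroids)
    else:
        centroids = [list(c) for c in old_centroids]
    return centroids, assign(vectors, centroids)


old_centroids = [[0.0, 0.0], [10.0, 10.0]]
# Base vectors plus merged updates; the last two drifted away from both centroids.
vectors = [[0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [4.0, 4.0], [4.0, 5.0]]

reused, labels_reused = consolidate(vectors, old_centroids, retrain_index=False)
retrained, labels_retrained = consolidate(vectors, old_centroids, retrain_index=True)
```

In this toy data, reusing the centroids is cheaper but leaves them where the old data was, while retraining moves them toward the merged data.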

delete

vector_search.index.Index.delete(external_id, timestamp=None)

Deletes a vector by its external_id.

Parameters

Name Type Description Default
external_id np.uint64 External ID of the vector to be deleted. required
timestamp int Timestamp at which the delete takes place. None

delete_batch

vector_search.index.Index.delete_batch(external_ids, timestamp=None)

Deletes vectors by their external_ids.

Parameters

Name Type Description Default
external_ids np.array External IDs of the vectors to be deleted. required
timestamp int Timestamp at which the deletes take place. None

delete_index

vector_search.index.Index.delete_index(uri, config=None)

Deletes an index from storage based on its URI.

Parameters

Name Type Description Default
uri str URI of the index. required
config Optional[Mapping[str, Any]] TileDB config dictionary. None

get_dimensions

vector_search.index.Index.get_dimensions()

Abstract method implemented by all Vector Index implementations.

Returns the dimension of the vectors in the index.

query

vector_search.index.Index.query(queries, k, driver_mode=None, driver_resource_class=None, driver_resources=None, driver_access_credentials_name=None, **kwargs)

Queries an index with a set of query vectors, retrieving the k most similar vectors for each query.

This provides an algorithm-agnostic implementation for updates:

  • Queries the non-consolidated updates table.
  • Calls the algorithm-specific implementation of query_internal to query the base data.
  • Merges the results applying the updated data.
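The three steps above can be sketched with a brute-force toy model. This is an illustration under stated assumptions, not the library's implementation: the dictionaries standing in for the base index and the updates table, the helper names, and the choice to over-fetch base results before filtering are all made up for the example.

```python
import heapq


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force(query, vectors, k):
    """Return the k nearest (distance, external_id) pairs, closest first."""
    scored = ((squared_distance(query, v), i) for i, v in vectors.items())
    return heapq.nsmallest(k, scored)


def query_with_updates(query, k, base, updated, deleted):
    # 1. Query the non-consolidated updates.
    update_hits = brute_force(query, updated, k)
    # 2. Query the base data (stands in for query_internal). Over-fetch so
    #    that k results can remain after stale IDs are dropped.
    stale = set(updated) | set(deleted)
    base_hits = brute_force(query, base, k + len(stale))
    # 3. Merge: base results for updated or deleted IDs are out of date.
    fresh = [hit for hit in base_hits if hit[1] not in stale]
    return heapq.nsmallest(k, fresh + update_hits)


base = {1: [0.0, 0.0], 2: [1.0, 1.0], 3: [5.0, 5.0]}
updated = {2: [9.0, 9.0], 4: [0.5, 0.5]}  # id 2 moved, id 4 is new
deleted = {1}

result = query_with_updates([0.0, 0.0], 2, base, updated, deleted)
```

Here the deleted ID 1 and the outdated base copy of ID 2 are excluded, so the result contains the new vector 4 followed by the untouched base vector 3.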

You can control where the query is executed by setting the driver_mode parameter:

  • With driver_mode = None, the driver logic for the query will be executed locally.
  • If driver_mode is not None, we will use a TileDB Cloud taskgraph to re-open the index and run the query.

With both options, certain implementations, e.g. IVF Flat, may let you create further TileDB taskgraphs as defined in the implementation-specific query_internal methods.

Parameters

Name Type Description Default
queries np.ndarray 2D array of query vectors. This can be used as a batch query interface by passing multiple queries in one call. required
k int Number of results to return per query vector. required
driver_mode Optional[Mode] If not None, the query will be executed in a TileDB Cloud taskgraph using the driver mode specified. None
driver_resource_class Optional[str] If driver_mode is REALTIME, the resource class (standard or large) to use for the driver execution. None
driver_resources Optional[Mapping[str, Any]] If driver_mode is BATCH, the resources to use for the driver execution. Example {"cpu": "1", "memory": "4Gi"} None
driver_access_credentials_name Optional[str] If driver_mode is not None, the access credentials name to use for the driver execution. None
**kwargs Extra kwargs passed here are passed to the query_internal implementation of the concrete index class. {}
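The table implies that each driver_* option only makes sense for certain driver modes. One way to keep the combinations straight is a small pre-flight check before calling query(). The sketch below is hypothetical: the Mode enum is a stand-in defined here rather than the real driver mode type, driver_kwargs is an invented helper, and raising on mismatched options is a stricter policy than anything the source states.

```python
from enum import Enum
from typing import Any, Dict, Mapping, Optional


class Mode(Enum):
    """Stand-in for the driver mode type (hypothetical, for illustration)."""
    REALTIME = "realtime"
    BATCH = "batch"


def driver_kwargs(
    driver_mode: Optional[Mode] = None,
    driver_resource_class: Optional[str] = None,
    driver_resources: Optional[Mapping[str, Any]] = None,
    driver_access_credentials_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect the driver_* options that apply to the chosen mode."""
    if driver_mode is None:
        extras = (driver_resource_class, driver_resources, driver_access_credentials_name)
        if any(option is not None for option in extras):
            raise ValueError("driver_* options require a non-None driver_mode")
        return {}  # the driver logic runs locally
    if driver_resource_class is not None and driver_mode is not Mode.REALTIME:
        raise ValueError("driver_resource_class applies to REALTIME only")
    if driver_resources is not None and driver_mode is not Mode.BATCH:
        raise ValueError("driver_resources applies to BATCH only")
    options = {
        "driver_mode": driver_mode,
        "driver_resource_class": driver_resource_class,
        "driver_resources": driver_resources,
        "driver_access_credentials_name": driver_access_credentials_name,
    }
    # Leave out unset options so the callee's defaults apply.
    return {name: value for name, value in options.items() if value is not None}


local = driver_kwargs()
batch = driver_kwargs(Mode.BATCH, driver_resources={"cpu": "1", "memory": "4Gi"})
```

The resulting dictionary could then be unpacked into a query call alongside queries and k.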

query_internal

vector_search.index.Index.query_internal(queries, k, **kwargs)

Abstract method implemented by all Vector Index implementations.

Queries the base index with a set of query vectors, retrieving the k most similar vectors for each query.

Parameters

Name Type Description Default
queries np.ndarray 2D array of query vectors. This can be used as a batch query interface by passing multiple queries in one call. required
k int Number of results to return per query vector. required
**kwargs Extra kwargs passed here for each algorithm implementation. {}

update

vector_search.index.Index.update(vector, external_id, timestamp=None)

Updates a vector by its external_id.

This can be used to add new vectors or update an existing vector with the same external_id.

Parameters

Name Type Description Default
vector np.array Vector data to be updated. required
external_id np.uint64 External ID of the vector. required
timestamp int Timestamp at which the update takes place. None
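The add-or-overwrite behavior of update, together with delete, can be modeled as a timestamped log where the latest entry per external_id wins. The class below is a toy model with made-up names (ToyUpdateLog, state), not the library's updates table, and it requires explicit timestamps to stay deterministic.

```python
class ToyUpdateLog:
    """Timestamped log of updates and deletes, resolved last-write-wins."""

    def __init__(self):
        self._entries = []  # (timestamp, external_id, vector or None)

    def update(self, vector, external_id, timestamp):
        # Adds a new vector or overwrites the one with the same external_id.
        self._entries.append((timestamp, external_id, list(vector)))

    def delete(self, external_id, timestamp):
        # A delete is recorded as an entry without a vector.
        self._entries.append((timestamp, external_id, None))

    def state(self, timestamp=None):
        """Return {external_id: vector} as of `timestamp` (None means latest)."""
        live = {}
        for ts, external_id, vector in sorted(self._entries, key=lambda e: e[0]):
            if timestamp is not None and ts > timestamp:
                break
            if vector is None:
                live.pop(external_id, None)
            else:
                live[external_id] = vector
        return live


log = ToyUpdateLog()
log.update([1.0, 2.0], 7, timestamp=1)  # add id 7
log.update([3.0, 4.0], 7, timestamp=2)  # overwrite id 7
log.update([5.0, 6.0], 8, timestamp=3)  # add id 8
log.delete(7, timestamp=4)              # remove id 7

at_2 = log.state(2)
at_3 = log.state(3)
latest = log.state()
```

Reading the state at an earlier timestamp shows the vectors as they were then, which is the idea behind the timestamp parameter.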

update_batch

vector_search.index.Index.update_batch(vectors, external_ids, timestamp=None)

Updates a set of vectors by their external_ids.

This can be used to add new vectors or update existing vectors with the same external_id.

Parameters

Name Type Description Default
vectors np.ndarray 2D array containing the vectors to be updated. required
external_ids np.array External IDs of the vectors. required
timestamp int Timestamp at which the updates take place. None

vacuum

vector_search.index.Index.vacuum()

The vacuuming process permanently deletes index files that are consolidated through the consolidation process. TileDB separates consolidation from vacuuming, in order to make consolidation process-safe in the presence of concurrent reads and writes.

Note:

  1. Vacuuming is not process-safe and you should take extra care when invoking it.
  2. Vacuuming may affect the granularity of the time traveling functionality.

The Index class vacuums consolidated fragments of the updates array.
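The split between consolidation and vacuuming, and the effect of vacuuming on time traveling, can be illustrated with a toy fragment store. This is a deliberately simplified model with invented names (ToyFragmentStore and its methods); real TileDB fragment metadata and read rules are more involved, and process safety is not modeled at all.

```python
class ToyFragmentStore:
    """Fragments are dicts covering a [start, end] timestamp range."""

    def __init__(self):
        self.fragments = []

    def write(self, timestamp, data):
        self.fragments.append(
            {"start": timestamp, "end": timestamp, "data": dict(data), "consolidated": False}
        )

    def consolidate(self):
        """Add one merged fragment; the originals are only marked, not removed."""
        live = [f for f in self.fragments if not f["consolidated"]]
        if len(live) < 2:
            return
        merged = {}
        for fragment in sorted(live, key=lambda f: f["start"]):
            merged.update(fragment["data"])
            fragment["consolidated"] = True
        self.fragments.append(
            {
                "start": min(f["start"] for f in live),
                "end": max(f["end"] for f in live),
                "data": merged,
                "consolidated": False,
            }
        )

    def vacuum(self):
        """Permanently drop the fragments that consolidation marked."""
        self.fragments = [f for f in self.fragments if not f["consolidated"]]

    def read(self, timestamp):
        """State as of `timestamp`, using fragments that end at or before it."""
        state = {}
        usable = [f for f in self.fragments if f["end"] <= timestamp]
        for fragment in sorted(usable, key=lambda f: f["end"]):
            state.update(fragment["data"])
        return state


store = ToyFragmentStore()
store.write(1, {1: "a"})
store.write(2, {2: "b"})
store.write(3, {1: "c"})

store.consolidate()
after_consolidate = store.read(2)  # originals still exist, so t=2 is readable
store.vacuum()
after_vacuum = store.read(2)       # only the merged [1, 3] fragment is left
latest = store.read(3)
```

In this model, reads keep working after consolidation because nothing is removed, and only vacuuming discards the fine-grained fragments that time travel to intermediate timestamps relies on.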