index.Index

vector_search.index.Index(self, uri, open_for_remote_query_execution=False, config=None, timestamp=None, group=None)

Abstract Vector Index class. Do not use this directly but rather use the open factory method.

All Vector Index algorithm implementations derive from this class. Apart from the abstract method interfaces, Index provides implementations for common tasks such as supporting updates, time traveling, and metadata management.

Opens an Index, reading its metadata and applying time-traveling options.

Parameters

Name	Type	Description	Default
`uri`	str	URI of the index.	required
`config`	Optional[Mapping[str, Any]]	TileDB config dictionary.	`None`
`timestamp`		If int, open the index at a given timestamp. If tuple, open at the given start and end timestamps.	`None`
`open_for_remote_query_execution`	bool	If `True`, do not load any index data in main memory locally, and instead load index data in the TileDB Cloud taskgraph created when a non-`None` `driver_mode` is passed to `query()`. If `False`, load index data in main memory locally. Note that with `False` you can still use a taskgraph for query execution; you’ll just end up loading the data both on your local machine and in the cloud taskgraph.	`False`
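
As a rough illustration of the two `timestamp` forms, the sketch below normalizes the argument into a `(start, end)` pair. The helper name `normalize_timestamp` is hypothetical, and treating a bare int as the range from 0 up to that timestamp is an assumption made for illustration, not something this reference states.

```python
from typing import Optional, Tuple, Union

TimestampArg = Union[None, int, Tuple[Optional[int], Optional[int]]]


def normalize_timestamp(timestamp: TimestampArg) -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical helper: turn the `timestamp` argument into (start, end)."""
    if timestamp is None:
        # No time travel requested: leave both ends open.
        return (None, None)
    if isinstance(timestamp, tuple):
        if len(timestamp) != 2:
            raise ValueError("timestamp tuple must be (start, end)")
        return timestamp
    if isinstance(timestamp, int) and not isinstance(timestamp, bool):
        # Assumption: a single int means "everything up to this timestamp".
        return (0, timestamp)
    raise TypeError("timestamp must be None, an int, or a (start, end) tuple")


single = normalize_timestamp(1700000000000)
window = normalize_timestamp((10, 20))
```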

Methods

Name	Description
clear_history	Clears the history maintained in a Vector Index based on its URI.
consolidate_updates	Consolidates updates by merging updates from the updates table into the base index.
delete	Deletes a vector by its `external_id`.
delete_batch	Deletes vectors by their `external_ids`.
delete_index	Deletes an index from storage based on its URI.
get_dimensions	Abstract method implemented by all Vector Index implementations.
query	Queries an index with a set of query vectors, retrieving the `k` most similar vectors for each query.
query_internal	Abstract method implemented by all Vector Index implementations.
update	Updates a `vector` by its `external_id`.
update_batch	Updates a set of `vectors` by their `external_ids`.
vacuum	The vacuuming process permanently deletes index files that are consolidated through the consolidation process.

clear_history

vector_search.index.Index.clear_history(uri, timestamp, config=None)

Clears the history maintained in a Vector Index based on its URI.

This clears the update history before the provided timestamp.

Use this in combination with consolidate_updates to periodically clean up the update history.

Parameters

Name	Type	Description	Default
`uri`	str	URI of the index.	required
`timestamp`	int	Clears update history before this timestamp.	required
`config`	Optional[Mapping[str, Any]]	TileDB config dictionary.	`None`

consolidate_updates

vector_search.index.Index.consolidate_updates(retrain_index=False, **kwargs)

Consolidates updates by merging updates from the updates table into the base index.

The consolidation process is used to avoid query latency degradation as more updates are added to the index. It triggers a base index re-indexing, merging the non-consolidated updates and the rest of the base vectors.
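
To make the merge step concrete, here is a toy sketch in plain Python. The `consolidate` function and its log format are illustrative assumptions, not the library's storage model, and the sketch leaves out the re-indexing work (for example, recomputing centroids).

```python
def consolidate(base_vectors, update_log):
    """Toy consolidation: fold a log of updates and deletes into the base vectors.

    update_log holds (timestamp, external_id, vector) entries;
    vector=None marks a delete.
    """
    merged = dict(base_vectors)  # base vectors that were never touched survive
    # Replay the log in timestamp order so the latest write wins.
    for _, external_id, vector in sorted(update_log, key=lambda entry: entry[0]):
        if vector is None:
            merged.pop(external_id, None)
        else:
            merged[external_id] = vector
    return merged


base = {1: [0.0, 0.0], 2: [1.0, 1.0]}
log = [(10, 2, [2.0, 2.0]), (20, 3, [3.0, 3.0]), (30, 1, None)]
new_base = consolidate(base, log)
```

After consolidation the toy base holds the updated vector 2 and the new vector 3, while the deleted vector 1 is gone, so later queries no longer need to consult the log.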

TODO(sc-51202): This throws with an unintuitive error message if update()/delete()/etc. has not been called.

Parameters

Name	Type	Description	Default
`retrain_index`	bool	If `True`, retrain the index. If `False`, reuse data from the previous index. For IVF_FLAT, retraining means recomputing the centroids; when doing so, you can pass any `ingest()` arguments used to configure centroid computation and they will be used for the recomputation. If `False`, the centroids from the previous index are reused.	`False`
`**kwargs`		Extra kwargs passed here are passed to the `ingest` function.	`{}`

delete

vector_search.index.Index.delete(external_id, timestamp=None)

Deletes a vector by its external_id.

Parameters

Name	Type	Description	Default
`external_id`	np.uint64	External ID of the vector to be deleted.	required
`timestamp`	int	Timestamp to use for the deletes to take place at.	`None`

delete_batch

vector_search.index.Index.delete_batch(external_ids, timestamp=None)

Deletes vectors by their external_ids.

Parameters

Name	Type	Description	Default
`external_ids`	np.array	External IDs of the vectors to be deleted.	required
`timestamp`	int	Timestamp to use for the deletes to take place at.	`None`

delete_index

vector_search.index.Index.delete_index(uri, config=None)

Deletes an index from storage based on its URI.

Parameters

Name	Type	Description	Default
`uri`	str	URI of the index.	required
`config`	Optional[Mapping[str, Any]]	TileDB config dictionary.	`None`

get_dimensions

vector_search.index.Index.get_dimensions()

Abstract method implemented by all Vector Index implementations.

Returns the dimension of the vectors in the index.

query

vector_search.index.Index.query(queries, k, driver_mode=None, driver_resource_class=None, driver_resources=None, driver_access_credentials_name=None, **kwargs)

Queries an index with a set of query vectors, retrieving the k most similar vectors for each query.

This provides an algorithm-agnostic implementation for updates:

Queries the non-consolidated updates table.
Calls the algorithm-specific implementation of query_internal to query the base data.
Merges the results applying the updated data.
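
The three steps above can be sketched with brute-force search over plain dictionaries. Everything here (`top_k`, `query_with_updates`, the squared L2 distance, and over-fetching from the base) is an illustrative assumption rather than the library's actual implementation.

```python
def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(vectors, query, k):
    """Brute-force stand-in for an index query: returns [(distance, external_id)]."""
    scored = sorted((squared_l2(v, query), ext_id) for ext_id, v in vectors.items())
    return scored[:k]


def query_with_updates(base, updates, deleted_ids, query, k):
    # Step 1: query the non-consolidated updates.
    update_hits = top_k(updates, query, k)
    # Step 2: query the base data, over-fetching so stale hits can be dropped.
    stale_ids = set(updates) | set(deleted_ids)
    base_hits = top_k(base, query, k + len(stale_ids))
    # Step 3: merge, letting updates and deletes override the base data.
    fresh_base_hits = [hit for hit in base_hits if hit[1] not in stale_ids]
    return sorted(update_hits + fresh_base_hits)[:k]


base = {1: [0.0, 0.0], 2: [1.0, 1.0], 3: [5.0, 5.0]}
updates = {2: [9.0, 9.0], 4: [0.5, 0.5]}  # id 2 moved away, id 4 is new
deleted_ids = {3}
result = query_with_updates(base, updates, deleted_ids, [0.0, 0.0], k=2)
```

In this toy run the base copy of id 2 would have ranked second, but it is stale, so the merged result contains ids 1 and 4 instead.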

You can control where the query is executed by setting the driver_mode parameter:

- With driver_mode = None, the driver logic for the query is executed locally.
- If driver_mode is not None, a TileDB Cloud taskgraph is used to re-open the index and run the query.

With both options, certain implementations, e.g. IVF Flat, may let you create further TileDB taskgraphs as defined in the implementation-specific query_internal methods.

Parameters

Name	Type	Description	Default
`queries`	np.ndarray	2D array of query vectors. This can be used as a batch query interface by passing multiple queries in one call.	required
`k`	int	Number of results to return per query vector.	required
`driver_mode`	Optional[Mode]	If not `None`, the query will be executed in a TileDB Cloud taskgraph using the driver mode specified.	`None`
`driver_resource_class`	Optional[str]	If `driver_mode` is `REALTIME`, the resource class (`standard` or `large`) to use for the driver execution.	`None`
`driver_resources`	Optional[Mapping[str, Any]]	If `driver_mode` is `BATCH`, the resources to use for the driver execution. Example: `{"cpu": "1", "memory": "4Gi"}`	`None`
`driver_access_credentials_name`	Optional[str]	If `driver_mode` is not `None`, the access credentials name to use for the driver execution.	`None`
`**kwargs`		Extra kwargs passed here are passed to the `query_internal` implementation of the concrete index class.	`{}`
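
Since each driver option only applies to one mode, a small helper can keep the keyword arguments consistent. This is a sketch under stated assumptions: `driver_kwargs` is a hypothetical helper, and the `Mode` enum below is a local stand-in for the `Mode` type named in the table, defined here only so the example runs on its own.

```python
from enum import Enum


class Mode(Enum):
    """Local stand-in for the Mode type named in the parameter table."""
    REALTIME = "realtime"
    BATCH = "batch"


def driver_kwargs(mode=None, resource_class=None, resources=None, credentials=None):
    """Hypothetical helper: keep only the driver options that apply to `mode`."""
    if mode is None:
        return {}  # driver logic runs locally; no driver options are needed
    kwargs = {"driver_mode": mode}
    if mode is Mode.REALTIME and resource_class is not None:
        kwargs["driver_resource_class"] = resource_class
    if mode is Mode.BATCH and resources is not None:
        kwargs["driver_resources"] = resources
    if credentials is not None:
        kwargs["driver_access_credentials_name"] = credentials
    return kwargs


batch_kwargs = driver_kwargs(Mode.BATCH, resources={"cpu": "1", "memory": "4Gi"})
```

With the real `Mode` type in place of the stand-in, the resulting dictionary could be unpacked into a call such as `index.query(queries, k, **batch_kwargs)`.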

query_internal

vector_search.index.Index.query_internal(queries, k, **kwargs)

Abstract method implemented by all Vector Index implementations.

Queries the base index with a set of query vectors, retrieving the k most similar vectors for each query.

Parameters

Name	Type	Description	Default
`queries`	np.ndarray	2D array of query vectors. This can be used as a batch query interface by passing multiple queries in one call.	required
`k`	int	Number of results to return per query vector.	required
`**kwargs`		Extra kwargs passed here for each algorithm implementation.	`{}`
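
The split between the shared `query` entry point and the per-algorithm `query_internal` can be pictured with a toy abstract base class. `ToyIndex` and `ToyFlatIndex` are hypothetical names. The dimension check and the ids-only return value are simplifications for illustration, not the library's behavior.

```python
from abc import ABC, abstractmethod


class ToyIndex(ABC):
    """Toy base class mirroring the abstract interface described here."""

    @abstractmethod
    def get_dimensions(self):
        ...

    @abstractmethod
    def query_internal(self, queries, k, **kwargs):
        ...

    def query(self, queries, k, **kwargs):
        # Shared logic lives in the base class; the search itself is delegated.
        for q in queries:
            if len(q) != self.get_dimensions():
                raise ValueError("query has the wrong number of dimensions")
        return self.query_internal(queries, k, **kwargs)


class ToyFlatIndex(ToyIndex):
    """Brute-force subclass: compares every query against every stored vector."""

    def __init__(self, vectors):
        self.vectors = vectors  # {external_id: vector}

    def get_dimensions(self):
        return len(next(iter(self.vectors.values())))

    def query_internal(self, queries, k, **kwargs):
        results = []
        for q in queries:
            scored = sorted(
                (sum((a - b) ** 2 for a, b in zip(v, q)), ext_id)
                for ext_id, v in self.vectors.items()
            )
            results.append([ext_id for _, ext_id in scored[:k]])
        return results


index = ToyFlatIndex({10: [0.0, 0.0], 11: [1.0, 0.0], 12: [4.0, 4.0]})
ids = index.query([[0.9, 0.1], [5.0, 5.0]], k=2)
```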

update

vector_search.index.Index.update(vector, external_id, timestamp=None)

Updates a vector by its external_id.

This can be used to add new vectors or update an existing vector with the same external_id.

Parameters

Name	Type	Description	Default
`vector`	np.array	Vector data to be updated.	required
`external_id`	np.uint64	External ID of the vector.	required
`timestamp`	int	Timestamp to use for the update to take place at.	`None`

update_batch

vector_search.index.Index.update_batch(vectors, external_ids, timestamp=None)

Updates a set of vectors by their external_ids.

This can be used to add new vectors or update existing vectors with the same external_id.

Parameters

Name	Type	Description	Default
`vectors`	np.ndarray	2D array containing the vectors to be updated.	required
`external_ids`	np.array	External IDs of the vectors.	required
`timestamp`	int	Timestamp to use for the updates to take place at.	`None`

vacuum

vector_search.index.Index.vacuum()

The vacuuming process permanently deletes index files that are consolidated through the consolidation process. TileDB separates consolidation from vacuuming, in order to make consolidation process-safe in the presence of concurrent reads and writes.

Note:

Vacuuming is not process-safe and you should take extra care when invoking it.
Vacuuming may affect the granularity of the time traveling functionality.

The Index class vacuums consolidated fragments of the updates array.
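
The separation between consolidation and vacuuming, and its effect on time traveling, can be pictured with a toy fragment store. `ToyFragmentStore` and its naming scheme are illustrative assumptions and do not reflect the actual TileDB storage layout.

```python
class ToyFragmentStore:
    """Toy model of timestamped fragments; not the TileDB storage format."""

    def __init__(self):
        self.fragments = {}        # name -> (start_ts, end_ts)
        self.consolidated = set()  # names made redundant by consolidation

    def write(self, timestamp):
        self.fragments[f"frag_{timestamp}"] = (timestamp, timestamp)

    def consolidate(self):
        # Add one merged fragment; the originals stay readable until vacuumed.
        live = [name for name in self.fragments if name not in self.consolidated]
        if len(live) < 2:
            return
        start = min(self.fragments[name][0] for name in live)
        end = max(self.fragments[name][1] for name in live)
        self.fragments[f"frag_{start}_{end}"] = (start, end)
        self.consolidated.update(live)

    def vacuum(self):
        # Permanently delete the fragments that consolidation made redundant.
        for name in self.consolidated:
            del self.fragments[name]
        self.consolidated.clear()


store = ToyFragmentStore()
for ts in (1, 2, 3):
    store.write(ts)
store.consolidate()
before = sorted(store.fragments)
store.vacuum()
after = sorted(store.fragments)
```

Before vacuuming, the three original fragments still sit next to the merged one, so readers that started earlier are undisturbed. After vacuuming, only the merged fragment covering timestamps 1 to 3 remains, so in this toy model the states at timestamps 1 and 2 can no longer be told apart.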