index.Index
vector_search.index.Index(self, uri, config=None, timestamp=None)
Abstract Vector Index class.
All Vector Index algorithm implementations are subclasses of this class. Apart from the abstract method interfaces, Index provides implementations for common tasks such as supporting updates, time traveling, and metadata management.
The constructor opens an Index, reading its metadata and applying the time-traveling options.
Do not use it directly; instantiate one of the concrete Index classes instead.
Parameters
Name | Type | Description | Default |
---|---|---|---|
uri | str | URI of the index. | required |
config | Optional[Mapping[str, Any]] | TileDB config dictionary. | None |
timestamp | int or tuple | If int, open the index at the given timestamp. If tuple, open the index at the given start and end timestamps. | None |
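The timestamp argument accepts either a single int or a (start, end) tuple. As a minimal sketch of how the two forms relate, the hypothetical helper normalize_timestamp below (not part of vector_search) maps both to one range. It assumes that an int means "everything written up to that timestamp", i.e. the range (0, timestamp).

```python
from typing import Optional, Tuple, Union

# Hypothetical helper, not part of vector_search.
TimestampArg = Optional[Union[int, Tuple[int, int]]]


def normalize_timestamp(timestamp: TimestampArg) -> Optional[Tuple[int, int]]:
    """Map the `timestamp` argument to an inclusive (start, end) range.

    Assumption: an int t means the range (0, t); None means "latest state".
    """
    if timestamp is None:
        return None
    if isinstance(timestamp, int):
        return (0, timestamp)
    start, end = timestamp
    if start > end:
        raise ValueError("start timestamp must not exceed end timestamp")
    return (start, end)


print(normalize_timestamp(None))        # None
print(normalize_timestamp(1700))        # (0, 1700)
print(normalize_timestamp((100, 200)))  # (100, 200)
```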
Methods
Name | Description |
---|---|
clear_history | Clears the history maintained in a Vector Index based on its URI. |
consolidate_updates | Consolidates updates by merging updates from the updates table into the base index. |
delete | Deletes a vector by its external_id. |
delete_batch | Deletes vectors by their external_ids. |
delete_index | Deletes an index from storage based on its URI. |
get_dimensions | Abstract method returning the dimension of the vectors in the index. |
query | Queries an index with a set of query vectors, retrieving the k most similar vectors for each query. |
query_internal | Abstract method that queries the base index; implemented by each Vector Index implementation. |
update | Updates a vector by its external_id. |
update_batch | Updates a set of vectors by their external_ids. |
clear_history
vector_search.index.Index.clear_history(uri, timestamp, config=None)
Clears the history maintained in a Vector Index, identified by its URI.
This clears the update history before the provided timestamp.
Use it together with consolidate_updates to periodically clean up update history.
Parameters
Name | Type | Description | Default |
---|---|---|---|
uri | str | URI of the index. | required |
timestamp | int | Clears update history before this timestamp. | required |
config | Optional[Mapping[str, Any]] | TileDB config dictionary. | None |
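To illustrate what clearing history before a timestamp means, here is a toy, in-memory sketch. The update log layout and the clear_history_toy function are hypothetical, and the sketch assumes that entries written exactly at the cutoff are kept; the real method operates on the index stored at uri.

```python
from typing import List, Tuple

# Hypothetical update log rows: (timestamp, external_id, vector).
UpdateLog = List[Tuple[int, int, Tuple[float, ...]]]


def clear_history_toy(log: UpdateLog, timestamp: int) -> UpdateLog:
    """Return the log without entries written before `timestamp`.

    Assumption: an entry written exactly at `timestamp` is kept.
    """
    return [entry for entry in log if entry[0] >= timestamp]


log: UpdateLog = [
    (10, 1, (0.0, 1.0)),
    (20, 2, (1.0, 0.0)),
    (30, 1, (0.5, 0.5)),
]
# Assumed usage pattern: run this only after consolidation has folded the
# older entries into the base index, so no pending update is lost.
pruned = clear_history_toy(log, 25)
print(pruned)  # [(30, 1, (0.5, 0.5))]
```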
consolidate_updates
vector_search.index.Index.consolidate_updates(retrain_index=False, **kwargs)
Consolidates updates by merging updates from the updates table into the base index.
Consolidation prevents query latency from degrading as more updates are added to the index. It re-indexes the base index, merging the non-consolidated updates with the remaining base vectors.
Parameters
Name | Type | Description | Default |
---|---|---|---|
retrain_index | bool | If True, retrain the index; if False, reuse data from the previous index. For IVF_FLAT, retraining means recomputing the centroids: you can pass any ingest() arguments that configure centroid computation, and they will be used when recomputing the centroids. If False, the centroids from the previous index are reused. | False |
**kwargs | | Extra kwargs are passed to the ingest function. | {} |
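The following sketch models consolidation on plain dictionaries: pending updates (including deletes, represented as None tombstones) are folded into a new base mapping, and the updates table is emptied. consolidate_toy and the tombstone representation are assumptions for illustration; the actual method also re-indexes the data (for example recomputing IVF_FLAT centroids when retrain_index=True), which is omitted here.

```python
from typing import Dict, List, Optional, Tuple

Vector = Tuple[float, ...]


def consolidate_toy(
    base: Dict[int, Vector],
    updates: List[Tuple[int, Optional[Vector]]],
) -> Tuple[Dict[int, Vector], List[Tuple[int, Optional[Vector]]]]:
    """Fold pending updates into a new base mapping (hypothetical sketch).

    `updates` holds (external_id, vector) pairs in write order; a vector of
    None is a delete tombstone. Later entries override earlier ones.
    """
    new_base = dict(base)  # leave the old base untouched
    for external_id, vector in updates:
        if vector is None:
            new_base.pop(external_id, None)  # apply a delete
        else:
            new_base[external_id] = vector   # add or replace a vector
    return new_base, []  # the updates table is empty after consolidation


base = {1: (0.0, 0.0), 2: (1.0, 1.0)}
updates = [(2, (2.0, 2.0)), (3, (3.0, 3.0)), (1, None)]
new_base, pending = consolidate_toy(base, updates)
print(new_base)  # {2: (2.0, 2.0), 3: (3.0, 3.0)}
```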
delete
vector_search.index.Index.delete(external_id, timestamp=None)
Deletes a vector by its external_id.
Parameters
Name | Type | Description | Default |
---|---|---|---|
external_id | np.uint64 | External ID of the vector to be deleted. | required |
timestamp | int | Timestamp at which the delete takes place. | None |
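A delete does not have to rewrite the base index immediately. Assuming, as the query description suggests, that pending changes live in an updates table, a delete can be modeled as a tombstone row. The sketch below is hypothetical (delete_toy and the table layout are not library names) and assumes millisecond timestamps when none is given.

```python
import time
from typing import List, Optional, Tuple

Vector = Tuple[float, ...]
# Hypothetical updates table rows: (timestamp, external_id, vector or None).
UpdatesTable = List[Tuple[int, int, Optional[Vector]]]


def delete_toy(updates: UpdatesTable, external_id: int,
               timestamp: Optional[int] = None) -> None:
    """Record a delete as a tombstone row (vector=None); hypothetical sketch."""
    if timestamp is None:
        # Assumption: timestamps are milliseconds since the Unix epoch.
        timestamp = int(time.time() * 1000)
    updates.append((timestamp, int(external_id), None))


table: UpdatesTable = []
delete_toy(table, 42, timestamp=100)
print(table)  # [(100, 42, None)]
```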
delete_batch
vector_search.index.Index.delete_batch(external_ids, timestamp=None)
Deletes vectors by their external_ids.
Parameters
Name | Type | Description | Default |
---|---|---|---|
external_ids | np.array | External IDs of the vectors to be deleted. | required |
timestamp | int | Timestamp at which the deletes take place. | None |
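delete_batch can be thought of as the batched form of the same tombstone write. In this hypothetical sketch every ID in the batch shares one timestamp, so all of the deletes become visible together; delete_batch_toy and the table layout are illustrative assumptions.

```python
import time
from typing import Iterable, List, Optional, Tuple

Vector = Tuple[float, ...]
# Hypothetical updates table rows: (timestamp, external_id, vector or None).
UpdatesTable = List[Tuple[int, int, Optional[Vector]]]


def delete_batch_toy(updates: UpdatesTable, external_ids: Iterable[int],
                     timestamp: Optional[int] = None) -> None:
    """Write one tombstone per external ID, all at the same timestamp."""
    if timestamp is None:
        timestamp = int(time.time() * 1000)  # assumed: ms since epoch
    updates.extend((timestamp, int(eid), None) for eid in external_ids)


table: UpdatesTable = []
delete_batch_toy(table, [3, 5, 8], timestamp=200)
print(table)  # [(200, 3, None), (200, 5, None), (200, 8, None)]
```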
delete_index
vector_search.index.Index.delete_index(uri, config=None)
Deletes an index from storage based on its URI.
Parameters
Name | Type | Description | Default |
---|---|---|---|
uri | str | URI of the index. | required |
config | Optional[Mapping[str, Any]] | TileDB config dictionary. | None |
get_dimensions
vector_search.index.Index.get_dimensions()
Abstract method implemented by all Vector Index implementations.
Returns the dimension of the vectors in the index.
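get_dimensions and query_internal are the abstract hooks each algorithm must provide. The sketch below uses Python's abc module to show the pattern with get_dimensions; ToyIndex and ToyFlatIndex are hypothetical stand-ins, not the library's classes.

```python
from abc import ABC, abstractmethod
from typing import List


class ToyIndex(ABC):
    """Hypothetical stand-in for the abstract Index base class."""

    @abstractmethod
    def get_dimensions(self) -> int:
        """Return the dimension of the vectors stored in the index."""


class ToyFlatIndex(ToyIndex):
    """Hypothetical concrete index holding vectors in a plain list."""

    def __init__(self, vectors: List[List[float]]):
        if not vectors:
            raise ValueError("need at least one vector")
        self.vectors = vectors

    def get_dimensions(self) -> int:
        # All stored vectors share one length, so read it from the first.
        return len(self.vectors[0])


index = ToyFlatIndex([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
print(index.get_dimensions())  # 3

try:
    ToyIndex()  # abstract classes cannot be instantiated directly
except TypeError as err:
    print("TypeError:", err)
```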
query
vector_search.index.Index.query(queries, k, **kwargs)
Queries an index with a set of query vectors, retrieving the k most similar vectors for each query.
This provides an algorithm-agnostic implementation of updates:
- Queries the non-consolidated updates table.
- Calls the algorithm-specific implementation of query_internal to query the base data.
- Merges the results, applying the updated data.
Parameters
Name | Type | Description | Default |
---|---|---|---|
queries | np.ndarray | 2D array of query vectors. This can be used as a batch query interface by passing multiple queries in one call. | required |
k | int | Number of results to return per query vector. | required |
**kwargs | | Extra kwargs are passed to the query_internal implementation of the concrete index class. | {} |
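The three query steps listed above can be sketched for a single query vector as follows. This is a hypothetical, brute-force illustration: query_with_updates, the dictionary layouts, and the squared Euclidean distance metric are assumptions, and a real index would call the concrete class's query_internal for step 2 instead of scanning every base vector.

```python
import heapq
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed metric for this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def query_with_updates(
    base: Dict[int, Vector],
    updates: Dict[int, Optional[Vector]],
    query: Vector,
    k: int,
) -> List[Tuple[float, int]]:
    """Return the k nearest (distance, external_id) pairs for one query.

    `updates` maps external_id -> latest vector, or None for a delete.
    """
    # Step 1: search the live (non-deleted) vectors in the updates table.
    update_hits = [(sq_l2(query, v), eid)
                   for eid, v in updates.items() if v is not None]
    # Step 2: search the base data. Over-fetch by len(updates) so that
    # dropping stale IDs in step 3 still leaves k candidates when possible.
    base_hits = heapq.nsmallest(
        k + len(updates), ((sq_l2(query, v), eid) for eid, v in base.items())
    )
    # Step 3: drop base hits whose ID was updated or deleted, then merge.
    fresh_base = [(d, eid) for d, eid in base_hits if eid not in updates]
    return heapq.nsmallest(k, update_hits + fresh_base)


base = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (5.0, 5.0)}
updates = {2: (9.0, 9.0), 1: None, 4: (0.0, 2.0)}  # move 2, delete 1, add 4
result = query_with_updates(base, updates, (0.0, 0.0), k=2)
print(result)  # [(4.0, 4), (50.0, 3)]
```

Fetching k + len(updates) base candidates means that at least k valid base results remain after stale IDs are discarded, provided the base holds at least k vectors that were not updated or deleted.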
query_internal
vector_search.index.Index.query_internal(queries, k, **kwargs)
Abstract method implemented by all Vector Index implementations.
Queries the base index with a set of query vectors, retrieving the k most similar vectors for each query.
Parameters
Name | Type | Description | Default |
---|---|---|---|
queries | np.ndarray | 2D array of query vectors. This can be used as a batch query interface by passing multiple queries in one call. | required |
k | int | Number of results to return per query vector. | required |
**kwargs | | Extra kwargs specific to each algorithm implementation. | {} |
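As an example of what a concrete query_internal might look like, here is a hypothetical flat (exhaustive) search that handles a batch of queries and returns k results per query. flat_query_internal, its (distances, ids) return layout, and the squared Euclidean metric are assumptions; the real implementations use their own return types and search strategies (IVF_FLAT, for instance, searches only selected partitions).

```python
import heapq
from typing import List, Sequence, Tuple


def flat_query_internal(
    data: Sequence[Sequence[float]],
    ids: Sequence[int],
    queries: Sequence[Sequence[float]],
    k: int,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Brute-force k-NN over the base vectors, one result row per query.

    Returns (distances, ids), each len(queries) x k, nearest first.
    """
    all_distances: List[List[float]] = []
    all_ids: List[List[int]] = []
    for q in queries:
        # Score every base vector against this query (squared L2).
        scored = ((sum((a - b) ** 2 for a, b in zip(q, v)), i)
                  for v, i in zip(data, ids))
        top = heapq.nsmallest(k, scored)
        all_distances.append([d for d, _ in top])
        all_ids.append([i for _, i in top])
    return all_distances, all_ids


data = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
dists, neighbor_ids = flat_query_internal(
    data, [10, 11, 12], [[0.0, 0.0], [0.0, 2.0]], k=2
)
print(neighbor_ids)  # [[10, 11], [12, 10]]
print(dists)         # [[0.0, 1.0], [1.0, 4.0]]
```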
update
vector_search.index.Index.update(vector, external_id, timestamp=None)
Updates a vector by its external_id.
This can be used to add a new vector or to update an existing vector with the same external_id.
Parameters
Name | Type | Description | Default |
---|---|---|---|
vector | np.array | Vector data to be updated. | required |
external_id | np.uint64 | External ID of the vector. | required |
timestamp | int | Timestamp at which the update takes place. | None |
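update behaves as an upsert: a new external_id adds a vector, and an existing one is replaced. The hypothetical ToyUpdates class below sketches this, assuming that when two writes target the same external_id, the one with the later timestamp wins regardless of the order in which they arrive.

```python
import time
from typing import Dict, Optional, Tuple

Vector = Tuple[float, ...]


class ToyUpdates:
    """Hypothetical updates table keyed by external_id."""

    def __init__(self) -> None:
        # external_id -> (timestamp, vector) of the winning write.
        self.latest: Dict[int, Tuple[int, Vector]] = {}

    def update(self, vector: Vector, external_id: int,
               timestamp: Optional[int] = None) -> None:
        if timestamp is None:
            timestamp = int(time.time() * 1000)  # assumed: ms since epoch
        current = self.latest.get(external_id)
        # Upsert; on conflict, keep the write with the later timestamp.
        if current is None or timestamp >= current[0]:
            self.latest[external_id] = (timestamp, vector)


updates = ToyUpdates()
updates.update((1.0, 2.0), 7, timestamp=10)
updates.update((3.0, 4.0), 7, timestamp=20)
updates.update((5.0, 6.0), 7, timestamp=15)  # older write, ignored
print(updates.latest[7])  # (20, (3.0, 4.0))
```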
update_batch
vector_search.index.Index.update_batch(vectors, external_ids, timestamp=None)
Updates a set of vectors by their external_ids.
This can be used to add new vectors or to update existing vectors with the same external_ids.
Parameters
Name | Type | Description | Default |
---|---|---|---|
vectors | np.ndarray | 2D array containing the vectors to be updated. | required |
external_ids | np.array | External IDs of the vectors. | required |
timestamp | int | Timestamp at which the updates take place. | None |
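update_batch writes many (vector, external_id) pairs in one call. The hypothetical update_batch_toy below validates that the two inputs line up and that all vectors share one dimension before writing, so a malformed batch leaves the table untouched; the row layout and validation rules are illustrative assumptions.

```python
import time
from typing import List, Optional, Sequence, Tuple

# Hypothetical table rows: (timestamp, external_id, vector).
Row = Tuple[int, int, Tuple[float, ...]]


def update_batch_toy(table: List[Row], vectors: Sequence[Sequence[float]],
                     external_ids: Sequence[int],
                     timestamp: Optional[int] = None) -> None:
    """Append one row per (vector, external_id) pair at a shared timestamp."""
    # Validate the whole batch first so a bad batch writes nothing.
    if len(vectors) != len(external_ids):
        raise ValueError("vectors and external_ids must have the same length")
    if len({len(v) for v in vectors}) > 1:
        raise ValueError("all vectors must have the same dimension")
    if timestamp is None:
        timestamp = int(time.time() * 1000)  # assumed: ms since epoch
    table.extend(
        (timestamp, int(eid), tuple(v)) for v, eid in zip(vectors, external_ids)
    )


rows: List[Row] = []
update_batch_toy(rows, [[0.0, 1.0], [2.0, 3.0]], [5, 6], timestamp=50)
print(rows)  # [(50, 5, (0.0, 1.0)), (50, 6, (2.0, 3.0))]
```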