vamana_index
vector_search.vamana_index
Vamana Index implementation.
Vamana is based on Microsoft’s DiskANN vector search library, as described in these papers:
Subramanya, Suhas Jayaram, et al. “DiskANN: Fast Accurate Billion-Point Nearest Neighbor Search on a Single Node.” Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2019.
Singh, Aditi, et al. “FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search.” arXiv:2105.09613, arXiv, 20 May 2021, http://arxiv.org/abs/2105.09613.
Gollapudi, Siddharth, et al. “Filtered-DiskANN: Graph Algorithms for Approximate Nearest Neighbor Search with Filters.” Proceedings of the ACM Web Conference 2023, ACM, 2023, pp. 3406-16, https://doi.org/10.1145/3543507.3583552.
Classes
Name | Description |
---|---|
VamanaIndex | Opens a VamanaIndex. |
VamanaIndex
vector_search.vamana_index.VamanaIndex(uri, config=None, timestamp=None, open_for_remote_query_execution=False, group=None, **kwargs)
Opens a VamanaIndex.
Parameters
Name | Type | Description | Default |
---|---|---|---|
uri | str | URI of the index. | required |
config | Optional[Mapping[str, Any]] | TileDB config dictionary. | None |
open_for_remote_query_execution | bool | If True, do not load any index data into main memory locally; instead, load it in the TileDB Cloud taskgraph that is created when a non-None driver_mode is passed to query(). If False, load index data into main memory locally. You can still use a taskgraph for query execution in that case, but the data will then be loaded both on your local machine and in the cloud taskgraph. | False |
Methods
Name | Description |
---|---|
get_dimensions | Returns the dimension of the vectors in the index. |
query_internal | Queries a VamanaIndex. |
get_dimensions
vector_search.vamana_index.VamanaIndex.get_dimensions()
Returns the dimension of the vectors in the index.
query_internal
vector_search.vamana_index.VamanaIndex.query_internal(queries, k=10, l_search=L_SEARCH_DEFAULT, **kwargs)
Queries a VamanaIndex.
Parameters
Name | Type | Description | Default |
---|---|---|---|
queries | np.ndarray | 2D array of query vectors. Pass multiple query vectors in one call to use this as a batch query interface. | required |
k | int | Number of results to return per query vector. | 10 |
l_search | Optional[int] | How deep to search. Larger values give higher accuracy at the cost of higher latency. Should be >= k; if it is smaller, it is set to k. | L_SEARCH_DEFAULT |
Functions
Name | Description |
---|---|
create | Creates an empty VamanaIndex. |
create
vector_search.vamana_index.create(uri, dimensions, vector_type, l_build=L_BUILD_DEFAULT, r_max_degree=R_MAX_DEGREE_DEFAULT, config=None, storage_version=STORAGE_VERSION, distance_metric=vspy.DistanceMetric.SUM_OF_SQUARES, **kwargs)
Creates an empty VamanaIndex.
Parameters
Name | Type | Description | Default |
---|---|---|---|
uri | str | URI of the index. | required |
dimensions | int | Number of dimensions of the vectors to be stored in the index. | required |
vector_type | np.dtype | Datatype of the vectors. Supported values: uint8, int8, float32. | required |
l_build | int | The number of neighbors considered for each node during construction of the graph. Larger values take longer to build but produce indices with higher recall at the same search complexity. l_build should be >= r_max_degree unless you need to build indices quickly and can compromise on quality. Typically between 75 and 200. If not provided, the default value of 100 is used. | L_BUILD_DEFAULT |
r_max_degree | int | The maximum degree of each node in the final graph. Larger values produce larger indices and longer indexing times, but better search quality. Typically between 60 and 150. If not provided, the default value of 64 is used. | R_MAX_DEGREE_DEFAULT |
config | Optional[Mapping[str, Any]] | TileDB config dictionary. | None |
storage_version | str | The TileDB vector search storage version to use. If not provided, the latest stable storage version is used. | STORAGE_VERSION |
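In the DiskANN paper cited above, the graph is built by running a greedy search for each point with a candidate list of size L (which appears to correspond to l_build here) and then pruning the visited candidates down to at most R out-neighbors (r_max_degree) with the RobustPrune rule. Below is a minimal pure-Python sketch of that pruning step only, assuming Euclidean distances and a pruning factor alpha. alpha is not a parameter of create() as documented here, the helper name robust_prune is hypothetical, and this is not the library's native implementation.

```python
import math


def robust_prune(vectors, p, candidates, r_max_degree, alpha):
    """Choose at most r_max_degree out-neighbors for node p (RobustPrune sketch)."""
    def d(a, b):
        # Euclidean distance between two stored vectors.
        return math.dist(vectors[a], vectors[b])

    pool = set(candidates) - {p}
    neighbors = []
    while pool and len(neighbors) < r_max_degree:
        # The closest remaining candidate becomes an out-neighbor of p
        # (ties broken by node id so the result is deterministic).
        best = min(pool, key=lambda c: (d(p, c), c))
        neighbors.append(best)
        pool.discard(best)
        # Drop candidates already well served by `best`:
        # c is discarded when alpha * d(best, c) <= d(p, c).
        pool = {c for c in pool if alpha * d(best, c) > d(p, c)}
    return neighbors


# Node 0 at the origin; nodes 2 and 3 lie "behind" node 1, node 4 on the other side.
vectors = [(0.0,), (1.0,), (2.0,), (3.0,), (-1.5,)]
print(robust_prune(vectors, 0, [1, 2, 3, 4], r_max_degree=3, alpha=1.2))
# [1, 4]
```

Here nodes 2 and 3 are pruned because they are reachable through node 1, so node 0 keeps only two edges even though r_max_degree allows three. A very large alpha disables this pruning, and the degree is then capped by r_max_degree alone.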