flat_index
vector_search.flat_index
FlatIndex implementation.
Stores all vectors in a 2D TileDB array and performs exhaustive similarity search between the query vectors and all dataset vectors.
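To make "exhaustive similarity search" concrete, the following is a minimal pure-Python sketch of the idea, not the library's implementation. The helper names `sum_of_squares` and `exhaustive_search` are hypothetical, and the distance used is the sum of squared differences, chosen to mirror the default metric listed for `create`.

```python
import heapq
from typing import List, Sequence, Tuple


def sum_of_squares(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exhaustive_search(
    dataset: Sequence[Sequence[float]],
    queries: Sequence[Sequence[float]],
    k: int = 10,
) -> List[List[Tuple[float, int]]]:
    """For each query, return the k closest dataset rows as (distance, row_id)."""
    results = []
    for q in queries:
        # Score the query against every stored vector: no pruning, no approximation.
        scored = ((sum_of_squares(q, v), i) for i, v in enumerate(dataset))
        results.append(heapq.nsmallest(k, scored))
    return results


# Hypothetical toy data: three 2-dimensional vectors and a batch of two queries.
dataset = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
queries = [[0.9, 1.2], [4.0, 4.0]]
neighbors = exhaustive_search(dataset, queries, k=2)
```

Because every query is compared with every stored vector, the cost grows linearly with the dataset size, which is the trade-off a flat index accepts in exchange for exact results.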
Classes
Name | Description |
---|---|
FlatIndex | Opens a FlatIndex loading all dataset vectors in main memory. |
FlatIndex
vector_search.flat_index.FlatIndex(self, uri, config=None, timestamp=None, open_for_remote_query_execution=False, group=None, **kwargs)
Opens a FlatIndex, loading all dataset vectors in main memory.
Parameters
Name | Type | Description | Default |
---|---|---|---|
uri | str | URI of the index. | required |
config | Optional[Mapping[str, Any]] | TileDB config dictionary. | None |
timestamp | | If int, open the index at a given timestamp. If tuple, open at the given start and end timestamps. | None |
open_for_remote_query_execution | bool | If True, do not load any index data in main memory locally, and instead load index data in the TileDB Cloud taskgraph created when a non-None driver_mode is passed to query(). If False, load index data in main memory locally. Note that you can still use a taskgraph for query execution, you’ll just end up loading the data both on your local machine and in the cloud taskgraph. | False |
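As a rough illustration of the constructor parameters above, the sketch below wraps the call in a small helper. The helper name `open_flat_index`, the `Timestamp` alias, and the full import path `tiledb.vector_search.flat_index` are assumptions (this page only shows `vector_search.flat_index`); the import is done lazily so the helper can be defined without the package installed.

```python
from typing import Any, Mapping, Optional, Tuple, Union

# Hypothetical alias: a single timestamp, a (start, end) pair, or None.
Timestamp = Union[int, Tuple[int, int], None]


def open_flat_index(
    uri: str,
    timestamp: Timestamp = None,
    config: Optional[Mapping[str, Any]] = None,
):
    """Open a FlatIndex at `uri`, optionally at a timestamp or timestamp range."""
    # Assumed import path; resolved only when the helper is actually called.
    from tiledb.vector_search.flat_index import FlatIndex

    return FlatIndex(uri=uri, config=config, timestamp=timestamp)


# Usage (the URI and timestamp values are placeholders):
# latest = open_flat_index("path/to/index")
# as_of = open_flat_index("path/to/index", timestamp=10)
# window = open_flat_index("path/to/index", timestamp=(5, 10))
```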
Methods
Name | Description |
---|---|
get_dimensions | Returns the dimension of the vectors in the index. |
query_internal | Queries a FlatIndex using the vectors already loaded in main memory. |
vacuum | The vacuuming process permanently deletes index files that are consolidated through the consolidation process. |
get_dimensions
vector_search.flat_index.FlatIndex.get_dimensions()
Returns the dimension of the vectors in the index.
query_internal
vector_search.flat_index.FlatIndex.query_internal(queries, k=10, nthreads=8, **kwargs)
Queries a FlatIndex using the vectors already loaded in main memory.
Parameters
Name | Type | Description | Default |
---|---|---|---|
queries | np.ndarray | 2D array of query vectors. This can be used as a batch query interface by passing multiple queries in one call. | required |
k | int | Number of results to return per query vector. | 10 |
nthreads | int | Number of threads to use for query execution. | 8 |
vacuum
vector_search.flat_index.FlatIndex.vacuum()
The vacuuming process permanently deletes index files that are consolidated through the consolidation process. TileDB separates consolidation from vacuuming, in order to make consolidation process-safe in the presence of concurrent reads and writes.
Note:
- Vacuuming is not process-safe and you should take extra care when invoking it.
- Vacuuming may affect the granularity of the time traveling functionality.
The Flat class vacuums consolidated fragment, array metadata and commits for the db and ids arrays.
Functions
Name | Description |
---|---|
create | Creates an empty FlatIndex. |
create
vector_search.flat_index.create(uri, dimensions, vector_type, group_exists=False, group=None, config=None, storage_version=STORAGE_VERSION, distance_metric=vspy.DistanceMetric.SUM_OF_SQUARES, asset_creation_threads=None, **kwargs)
Creates an empty FlatIndex.
Parameters
Name | Type | Description | Default |
---|---|---|---|
uri | str | URI of the index. | required |
dimensions | int | Number of dimensions for the vectors to be stored in the index. | required |
vector_type | np.dtype | Datatype of vectors. Supported values: uint8, int8, float32. | required |
group_exists | bool | If False, creates the TileDB group for the index. If True, expects the TileDB group to already exist. | False |
config | Optional[Mapping[str, Any]] | TileDB config dictionary. | None |
storage_version | str | The TileDB vector search storage version to use. If not provided, uses the latest stable storage version. | STORAGE_VERSION |
distance_metric | vspy.DistanceMetric | Distance metric to use for the index. If not provided, uses L2 distance. | vspy.DistanceMetric.SUM_OF_SQUARES |
group | tiledb.Group | TileDB group open in write mode. Internal: used to avoid opening the group multiple times during ingestion. | None |
asset_creation_threads | Sequence[Thread] | List of asset creation threads to which new threads are appended. Internal: used to parallelize all asset creation during ingestion. | None |
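A hedged sketch of calling `create` with the required parameters follows. The wrapper `create_flat_index`, the `SUPPORTED_VECTOR_TYPES` tuple, and the up-front argument checks are hypothetical conveniences, and the import path `tiledb.vector_search` is an assumption; only `uri`, `dimensions`, and `vector_type` come from the table above. The checks run before any import, so invalid arguments fail early.

```python
# Supported datatypes as listed in the parameter table above.
SUPPORTED_VECTOR_TYPES = ("uint8", "int8", "float32")


def create_flat_index(uri: str, dimensions: int, vector_type: str = "float32"):
    """Validate the arguments, then create an empty FlatIndex at `uri`."""
    if vector_type not in SUPPORTED_VECTOR_TYPES:
        raise ValueError(
            f"vector_type must be one of {SUPPORTED_VECTOR_TYPES}, got {vector_type!r}"
        )
    if dimensions <= 0:
        # Hypothetical guard added for this sketch only.
        raise ValueError("dimensions must be a positive integer")

    # Assumed import paths; resolved only after the arguments pass validation.
    import numpy as np
    from tiledb.vector_search import flat_index

    # Whatever `create` returns is passed through unchanged.
    return flat_index.create(
        uri=uri,
        dimensions=dimensions,
        vector_type=np.dtype(vector_type),
    )


# Usage (the URI is a placeholder):
# create_flat_index("path/to/new_index", dimensions=128, vector_type="float32")
```

The remaining parameters (`group_exists`, `storage_version`, `distance_metric`, and the internal ones) are left at their defaults here.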