flat_index

vector_search.flat_index

FlatIndex implementation.

Stores all vectors in a 2D TileDB array and performs exhaustive similarity search between the query vectors and all dataset vectors.
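
To make "exhaustive similarity search" concrete, the following is a minimal NumPy sketch of the idea, not the library's implementation. The function name `exhaustive_search` and the toy data are hypothetical; the distance used is the sum of squared differences, chosen to match the default metric listed for `create` on this page.

```python
import numpy as np


def exhaustive_search(dataset, queries, k):
    """Brute-force k-nearest-neighbor search by squared Euclidean distance.

    dataset: (n, d) array, queries: (m, d) array.
    Returns (distances, ids), each of shape (m, k), sorted by ascending distance.
    """
    dataset = np.asarray(dataset, dtype=np.float32)
    queries = np.asarray(queries, dtype=np.float32)
    # Pairwise differences via broadcasting: shape (m, n, d).
    diff = queries[:, None, :] - dataset[None, :, :]
    # Sum of squared differences for every (query, vector) pair: shape (m, n).
    dists = np.sum(diff * diff, axis=2)
    # Sort each row and keep the k closest dataset vectors.
    ids = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(dists, ids, axis=1), ids


# Hypothetical toy data: four 2D dataset vectors and two query vectors.
dataset = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
queries = np.array([[0.9, 0.1], [4.0, 4.0]])
distances, ids = exhaustive_search(dataset, queries, k=2)
```

Every query is compared against every dataset vector, so the cost grows linearly with the dataset size. This is the trade-off a flat index makes in exchange for exact results.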

Classes

Name	Description
FlatIndex	Opens a `FlatIndex`, loading all dataset vectors into main memory.

FlatIndex

vector_search.flat_index.FlatIndex(self, uri, config=None, timestamp=None, open_for_remote_query_execution=False, group=None, **kwargs)

Opens a FlatIndex, loading all dataset vectors into main memory.

Parameters

Name	Type	Description	Default
`uri`	str	URI of the index.	required
`config`	Optional[Mapping[str, Any]]	TileDB config dictionary.	`None`
`timestamp`		If int, open the index at a given timestamp. If tuple, open at the given start and end timestamps.	`None`
`open_for_remote_query_execution`	bool	If `True`, do not load any index data in main memory locally, and instead load index data in the TileDB Cloud taskgraph created when a non-`None` `driver_mode` is passed to `query()`. If `False`, load index data in main memory locally. Note that you can still use a taskgraph for query execution when this is `False`; in that case the data is loaded both on your local machine and in the cloud taskgraph.	`False`
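
The parameters above can be sketched as a small wrapper. This is an illustration under assumptions, not a definitive usage: the import path `tiledb.vector_search.flat_index` is assumed from the module name on this page, and the helper `open_flat_index`, its `remote` argument, and the example URI and timestamps in the comments are hypothetical.

```python
def open_flat_index(uri, timestamp=None, remote=False, config=None):
    """Open a FlatIndex; `timestamp` may be None, an int, or a (start, end) tuple."""
    # Imported lazily so this helper can be defined without the package installed.
    # Assumption: the module is importable as tiledb.vector_search.flat_index.
    from tiledb.vector_search import flat_index

    return flat_index.FlatIndex(
        uri=uri,
        config=config,
        timestamp=timestamp,
        open_for_remote_query_execution=remote,
    )


# Hypothetical calls (the URI and timestamps are placeholders):
#   latest = open_flat_index("/tmp/my_flat_index")
#   at_t   = open_flat_index("/tmp/my_flat_index", timestamp=100)
#   ranged = open_flat_index("/tmp/my_flat_index", timestamp=(50, 100))
#   lazy   = open_flat_index("/tmp/my_flat_index", remote=True)  # no local load
```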

Methods

Name	Description
get_dimensions	Returns the dimension of the vectors in the index.
query_internal	Queries a FlatIndex using the vectors already loaded in main memory.
vacuum	The vacuuming process permanently deletes index files that are consolidated through the consolidation process.

get_dimensions

vector_search.flat_index.FlatIndex.get_dimensions()

Returns the dimension of the vectors in the index.

query_internal

vector_search.flat_index.FlatIndex.query_internal(queries, k=10, nthreads=8, **kwargs)

Queries a FlatIndex using the vectors already loaded in main memory.

Parameters

Name	Type	Description	Default
`queries`	np.ndarray	2D array of query vectors. This can be used as a batch query interface by passing multiple queries in one call.	required
`k`	int	Number of results to return per query vector.	`10`
`nthreads`	int	Number of threads to use for query execution.	`8`
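
Because `queries` must be a 2D array, a single query vector has to be reshaped into a one-row batch first. The helper below is a hypothetical convenience, not part of the library, and the `float32` default is only illustrative. The call to `query_internal` is shown as a comment because it needs an opened index.

```python
import numpy as np


def as_query_batch(queries, dtype=np.float32):
    """Return `queries` as a 2D (num_queries, dimensions) array."""
    batch = np.atleast_2d(np.asarray(queries, dtype=dtype))
    if batch.ndim != 2:
        raise ValueError(f"expected a 1D or 2D input, got {batch.ndim} dimensions")
    return batch


single = as_query_batch([0.1, 0.2, 0.3])               # one query, shape (1, 3)
batch = as_query_batch([[0.1, 0.2, 0.3], [1, 2, 3]])   # two queries, shape (2, 3)

# With an opened index (not created here), the batch would be passed as:
#   result = index.query_internal(batch, k=10, nthreads=8)
```

Passing several rows in one call is the batch interface mentioned in the `queries` description; `k` results are requested for each row.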

vacuum

vector_search.flat_index.FlatIndex.vacuum()

The vacuuming process permanently deletes index files that are consolidated through the consolidation process. TileDB separates consolidation from vacuuming, in order to make consolidation process-safe in the presence of concurrent reads and writes.

Note:

Vacuuming is not process-safe and you should take extra care when invoking it.
Vacuuming may affect the granularity of the time traveling functionality.

The FlatIndex class vacuums consolidated fragments, array metadata, and commits for the `db` and `ids` arrays.

Functions

Name	Description
create	Creates an empty FlatIndex.

create

vector_search.flat_index.create(uri, dimensions, vector_type, group_exists=False, group=None, config=None, storage_version=STORAGE_VERSION, distance_metric=vspy.DistanceMetric.SUM_OF_SQUARES, asset_creation_threads=None, **kwargs)

Creates an empty FlatIndex.

Parameters

Name	Type	Description	Default
`uri`	str	URI of the index.	required
`dimensions`	int	Number of dimensions for the vectors to be stored in the index.	required
`vector_type`	np.dtype	Datatype of vectors. Supported values: uint8, int8, float32.	required
`group_exists`	bool	If `False`, creates the TileDB group for the index. If `True`, the method expects the TileDB group to already exist.	`False`
`config`	Optional[Mapping[str, Any]]	TileDB config dictionary.	`None`
`storage_version`	str	The TileDB vector search storage version to use. If not provided, use the latest stable storage version.	`STORAGE_VERSION`
`distance_metric`	vspy.DistanceMetric	Distance metric to use for the index. If not provided, use L2 distance.	`vspy.DistanceMetric.SUM_OF_SQUARES`
`group`	tiledb.Group	TileDB group open in write mode. Internal: used to avoid opening the group multiple times during ingestion.	`None`
`asset_creation_threads`	Sequence[Thread]	List of asset creation threads to which new threads are appended. Internal: used to parallelize all asset creation during ingestion.	`None`
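
The `distance_metric` row describes the default as L2 distance while the default value is named `SUM_OF_SQUARES`. Assuming that name denotes the sum of squared coordinate differences (the square of the L2 distance), the two produce the same nearest-neighbor ranking, since the square root is monotonic. The NumPy sketch below illustrates this with random data; it does not call the library.

```python
import numpy as np

# Hypothetical random data: 100 dataset vectors and one query, 8 dimensions each.
rng = np.random.default_rng(0)
dataset = rng.random((100, 8))
query = rng.random(8)

# Sum of squared differences, and its square root (the L2 distance).
sum_of_squares = np.sum((dataset - query) ** 2, axis=1)
l2 = np.sqrt(sum_of_squares)

# Rank the dataset vectors from nearest to farthest under each measure.
order_sq = np.argsort(sum_of_squares, kind="stable")
order_l2 = np.argsort(l2, kind="stable")

same_ranking = np.array_equal(order_sq, order_l2)
```

The reported distance values differ between the two measures, so take the square root of a sum-of-squares result before comparing it with an L2 threshold.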