object_readers.ObjectReader

vector_search.object_readers.ObjectReader()

Abstract class that can be used to read Objects from different sources and formats.

Methods

Name                          Description
get_partitions                Returns a list of ObjectPartitions for the reader.
init_kwargs                   Returns a dictionary containing kwargs that can be used to re-initialize the ObjectReader.
metadata_array_uri            Returns the URI of a TileDB array that can be used to read Object metadata.
metadata_attributes           Returns a list of TileDB Attributes describing the metadata of the Objects.
partition_class_name          Returns the class name of ObjectPartition generated by this ObjectReader.
read_objects                  Reads the objects corresponding to an ObjectPartition.
read_objects_by_external_ids  Reads the objects corresponding to a list of external_ids.

get_partitions

vector_search.object_readers.ObjectReader.get_partitions(**kwargs)

Returns a list of ObjectPartitions for the reader. Each partition can be read independently and used for distributed embedding and ingestion.
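
As a rough sketch of the idea, a reader over an in-memory list might split its items into fixed-size partitions. The names `ToyPartition` and `ToyReader` and the `partition_size` keyword are hypothetical and not part of the library; a real reader would subclass ObjectReader and return its own ObjectPartition type.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyPartition:
    """Hypothetical partition: a half-open range [start, end) of item indices."""
    partition_id: int
    start: int
    end: int


class ToyReader:
    """Hypothetical reader over an in-memory list (not the library's class)."""

    def __init__(self, items: List[str]):
        self.items = items

    def get_partitions(self, **kwargs) -> List[ToyPartition]:
        # partition_size is an assumed keyword, chosen for illustration only.
        size = kwargs.get("partition_size", 2)
        return [
            ToyPartition(i, start, min(start + size, len(self.items)))
            for i, start in enumerate(range(0, len(self.items), size))
        ]


partitions = ToyReader(["a", "b", "c", "d", "e"]).get_partitions(partition_size=2)
```

Because each partition carries everything needed to locate its own slice, separate workers can process partitions without coordinating with each other.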

init_kwargs

vector_search.object_readers.ObjectReader.init_kwargs()

Returns a dictionary containing kwargs that can be used to re-initialize the ObjectReader.

This is used to serialize the ObjectReader and pass it as an argument to UDF tasks.
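
The round trip can be sketched as follows, assuming a toy reader whose constructor takes `uri` and `batch_size`. Both parameter names, the `ToyReader` class, and the use of JSON as the transport format are illustrative assumptions, not the library's behavior.

```python
import json
from typing import Any, Dict


class ToyReader:
    """Hypothetical reader; uri and batch_size are illustrative parameters."""

    def __init__(self, uri: str, batch_size: int = 10):
        self.uri = uri
        self.batch_size = batch_size

    def init_kwargs(self) -> Dict[str, Any]:
        # Return exactly the arguments __init__ needs to rebuild this reader.
        return {"uri": self.uri, "batch_size": self.batch_size}


reader = ToyReader("file:///tmp/objects", batch_size=4)

# Serialize the kwargs (JSON is an assumption here), then rebuild the
# reader from them, as a remote task might.
payload = json.dumps(reader.init_kwargs())
rebuilt = ToyReader(**json.loads(payload))
```

Keeping the returned values to plain, serializable types is what makes this kind of hand-off straightforward.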

metadata_array_uri

vector_search.object_readers.ObjectReader.metadata_array_uri()

Returns the URI of a TileDB array that can be used to read Object metadata. This array should have a single dimension, external_id, and its attributes should be the TileDB attributes returned by metadata_attributes.

Returns None if no metadata array exists, in which case one should be materialized by object ingestion.

metadata_attributes

vector_search.object_readers.ObjectReader.metadata_attributes()

Returns a list of TileDB Attributes describing the metadata of the Objects.

Returns None if the Objects have no metadata.

partition_class_name

vector_search.object_readers.ObjectReader.partition_class_name()

Returns the class name of ObjectPartition generated by this ObjectReader.

The ObjectPartition class should be defined in the same Python file as the ObjectReader.
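
One way to see why the two classes need to share a file: with only a class name to go on, a caller has to look that name up in the namespace where the reader is defined. The sketch below does this with `globals()`. `ToyPartition` and `ToyReader` are hypothetical, and how the library itself resolves the name is not described here.

```python
class ToyPartition:
    """Hypothetical partition class defined next to the reader."""

    def __init__(self, partition_id: int):
        self.partition_id = partition_id


class ToyReader:
    """Hypothetical reader (not the library's class)."""

    def partition_class_name(self) -> str:
        return "ToyPartition"


reader = ToyReader()

# Look the class up by name in this module's namespace. This only works
# because ToyPartition is defined in the same file as ToyReader.
partition_cls = globals()[reader.partition_class_name()]
partition = partition_cls(partition_id=0)
```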

read_objects

vector_search.object_readers.ObjectReader.read_objects(partition)

Reads the objects corresponding to an ObjectPartition.

Returns a tuple containing the object data and metadata, in that order. Both are OrderedDicts structured like TileDB-Py read results, and each should contain at least an external_id dimension that identifies the individual objects.
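
A minimal sketch of the return shape, using plain Python lists so that it runs without extra dependencies (actual TileDB-Py read results hold NumPy arrays). The `text` and `length` field names, the `ITEMS` data, and the `(start, end)` tuple standing in for an ObjectPartition are all hypothetical.

```python
from collections import OrderedDict
from typing import Tuple

# Hypothetical backing store; list positions double as external_ids.
ITEMS = ["red", "green", "blue", "cyan"]


def read_objects(partition: Tuple[int, int]) -> Tuple[OrderedDict, OrderedDict]:
    # The partition is modelled as a half-open index range [start, end).
    start, end = partition
    ids = list(range(start, end))
    chunk = ITEMS[start:end]
    # Both dicts carry external_id so data and metadata can be matched up.
    data = OrderedDict([("text", chunk), ("external_id", ids)])
    metadata = OrderedDict(
        [("length", [len(t) for t in chunk]), ("external_id", ids)]
    )
    return data, metadata


data, metadata = read_objects((1, 3))
```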

read_objects_by_external_ids

vector_search.object_readers.ObjectReader.read_objects_by_external_ids(ids)

Reads the objects corresponding to a list of external_ids.

Returns an OrderedDict containing the object data, structured like TileDB-Py read results.
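
As an illustration of the expected shape, the sketch below looks objects up in a small in-memory mapping keyed by external_id. The `ITEMS` mapping and the `text` field name are hypothetical, and plain lists stand in for the NumPy arrays that TileDB-Py read results hold.

```python
from collections import OrderedDict
from typing import List

# Hypothetical backing store keyed by external_id.
ITEMS = {10: "red", 11: "green", 12: "blue"}


def read_objects_by_external_ids(ids: List[int]) -> OrderedDict:
    # Preserve the order of the requested ids in the returned columns.
    return OrderedDict(
        [("text", [ITEMS[i] for i in ids]), ("external_id", list(ids))]
    )


result = read_objects_by_external_ids([12, 10])
```

Unlike the partition-based read, only the object data is returned here, with no separate metadata dictionary.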