SemanticRetriever

class ragger_duck.retrieval.SemanticRetriever(*, embedding, top_k=1)

Retrieve the k-nearest neighbors using a semantic embedding.

The index is built using the FAISS library.

Parameters:
embedding : transformer

An embedding following the scikit-learn transformer API.

top_k : int, default=1

Number of documents to retrieve.

Attributes:
X_fit_ : list of str or dict

The input data.

X_embedded_ : ndarray of shape (n_sentences, n_features)

The embedded data.

index_ : faiss index

The index to retrieve the k-nearest neighbors.

Methods

fit(X[, y])

Embed the sentences and create the index.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

query(query)

Retrieve the most relevant documents for the query.

set_params(**params)

Set the parameters of this estimator.

fit(X, y=None)

Embed the sentences and create the index.

Parameters:
X : list of str or dict

The input data.

y : None

This parameter is ignored.

Returns:
self

The fitted estimator.

get_metadata_routing()

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

query(query)

Retrieve the most relevant documents for the query.

Similarity is computed with the inner product, which equals the cosine similarity only when the vectors have unit length. The embedding is therefore expected to produce normalized vectors.

Parameters:
query : str

The query text to search for.

Returns:
list of str or dict

The list of the most relevant documents from the training set.
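
Example

The note on normalization above can be illustrated with a small pure-Python sketch. It does not use FAISS or the retriever itself: the brute-force `top_k_by_inner_product` helper, the toy 2-D vectors, and the document strings are all made up for illustration. It checks that the inner product of unit-length vectors equals the cosine similarity of the raw vectors, and shows how ranking by inner product can change when the vectors are not normalized.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


def top_k_by_inner_product(query_vec, doc_vecs, docs, top_k=1):
    """Brute-force stand-in for an index search: rank documents by inner product."""
    scores = [inner_product(query_vec, vec) for vec in doc_vecs]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:top_k]]


# Toy "embeddings" (hypothetical values); the last vector has a very large norm.
docs = ["doc about cats", "doc about dogs", "doc about cars"]
raw_doc_vecs = [[3.0, 0.5], [1.0, 1.0], [2.0, 40.0]]
raw_query_vec = [2.0, 0.1]

doc_vecs = [normalize(vec) for vec in raw_doc_vecs]
query_vec = normalize(raw_query_vec)

# With unit-length vectors, the inner product is the cosine similarity.
for raw, unit in zip(raw_doc_vecs, doc_vecs):
    assert math.isclose(
        inner_product(query_vec, unit), cosine_similarity(raw_query_vec, raw)
    )

normalized_best = top_k_by_inner_product(query_vec, doc_vecs, docs, top_k=2)
raw_best = top_k_by_inner_product(raw_query_vec, raw_doc_vecs, docs, top_k=1)

print(normalized_best)
print(raw_best)
```

With normalized vectors the ranking follows the angle between query and document. With the raw vectors, the document with the largest norm wins even though it points in a very different direction from the query.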

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
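
Example

The `<component>__<parameter>` convention described for `set_params`, and the effect of `deep` in `get_params`, can be sketched with a plain scikit-learn `Pipeline`. The step names `scaler` and `clf` are arbitrary choices for this illustration. Because `embedding` is a scikit-learn transformer, the same convention presumably applies to a retriever in the form `embedding__<parameter>`, with parameter names that depend on the embedding in use; that part is an assumption and is not shown here.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A nested estimator: each step is addressable by its (arbitrary) name.
pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])

# Update parameters of the nested components with <component>__<parameter>.
# set_params returns the estimator itself, so calls can be chained.
returned = pipe.set_params(clf__C=0.5, scaler__with_mean=False)

# deep=True also lists the parameters of the contained estimators.
deep_params = pipe.get_params(deep=True)
# deep=False only lists the parameters of the pipeline itself.
shallow_params = pipe.get_params(deep=False)

print(deep_params["clf__C"], deep_params["scaler__with_mean"])
print("clf__C" in shallow_params)
```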