SemanticRetriever
- class ragger_duck.retrieval.SemanticRetriever(*, embedding, top_k=1)
Retrieve the k-nearest neighbors using a semantic embedding.
The index is built using the FAISS library.
- Parameters:
- embedding : transformer
An embedding following the scikit-learn transformer API.
- top_k : int, default=1
Number of documents to retrieve.
- Attributes:
- X_fit_ : list of str or dict
The input data.
- X_embedded_ : ndarray of shape (n_sentences, n_features)
The embedded data.
- index_ : faiss index
The index to retrieve the k-nearest neighbors.
Methods
fit(X[, y]): Embed the sentences and create the index.
get_metadata_routing(): Get metadata routing of this object.
get_params([deep]): Get parameters for this estimator.
query(query): Retrieve the most relevant documents for the query.
set_params(**params): Set the parameters of this estimator.
- fit(X, y=None)
Embed the sentences and create the index.
- Parameters:
- X : list of str or dict
The input data.
- y : None
This parameter is ignored.
- Returns:
- self
The fitted estimator.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- query(query)
Retrieve the most relevant documents for the query.
The inner product is used to compute the cosine similarity, so the embeddings are expected to be normalized to unit length.
- Parameters:
- query : str
The query string.
- Returns:
- list of str or dict
The list of the most relevant documents from the training set.
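The note on normalization can be illustrated with a minimal pure-Python sketch. It does not use FAISS or this class; the helper names (`normalize`, `inner_product`, `cosine_similarity`, `top_k_by_inner_product`) and the toy vectors are hypothetical and only serve to show that the inner product equals the cosine similarity once vectors have unit length, and that ranking by raw inner product can differ when they do not.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


def top_k_by_inner_product(query_vec, doc_vecs, top_k=1):
    """Return indices of the top_k documents ranked by inner product."""
    scores = [inner_product(query_vec, d) for d in doc_vecs]
    order = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


# Toy, made-up embeddings (not produced by any real model).
docs = [[3.0, 4.0], [10.0, 0.0], [0.0, 2.0]]
query = [6.0, 8.0]

docs_unit = [normalize(d) for d in docs]
query_unit = normalize(query)

# With unit-length vectors, the inner product ranking is the cosine ranking.
best_normalized = top_k_by_inner_product(query_unit, docs_unit, top_k=2)

# Without normalization, a long vector can win on magnitude alone.
best_raw = top_k_by_inner_product(query, docs, top_k=1)
```

Here the second document has a large norm, so it ranks first on the raw inner product even though the first document points in exactly the same direction as the query. After normalization the first document ranks first.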
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
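The `<component>__<parameter>` naming can be sketched with a generic scikit-learn `Pipeline`. This example does not involve `SemanticRetriever`; the step names `"scaler"` and `"clf"` are arbitrary choices made for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A nested object: each step is addressed by the name given here.
pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])

# Update parameters of the nested components with <component>__<parameter>.
# set_params returns the estimator itself, so calls can be chained.
returned = pipe.set_params(clf__C=10.0, scaler__with_mean=False)

# get_params(deep=True) exposes the nested parameters under the same names.
params = pipe.get_params(deep=True)
```

The same convention applies to any estimator that follows the scikit-learn API, which is why `get_params` and `set_params` appear on this class as well.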