BM25Retriever

class ragger_duck.retrieval.BM25Retriever(*, count_vectorizer=None, top_k=1, b=0.75, k1=1.6)

Retrieve the top_k documents most relevant to a query using lexical search scored with BM25.

Parameters:
count_vectorizer : transformer, default=None

A count vectorizer to compute the count of terms in documents. If None, a sklearn.feature_extraction.text.CountVectorizer is used.

top_k : int, default=1

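Number of documents to retrieve.

b : float, default=0.75

BM25 document-length normalization parameter: 0 disables length normalization and 1 applies it fully.

k1 : float, default=1.6

BM25 term-frequency saturation parameter: larger values let repeated occurrences of a term keep increasing the score for longer.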
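The role of k1 and b is easiest to see in the weight a single query term contributes to a document's score. The sketch below assumes the textbook Okapi BM25 formula; the exact variant implemented by ragger_duck is not stated on this page, and bm25_term_score is a hypothetical helper. Here tf is the term's count in the document, doc_length its number of terms, and avg_doc_length plays the role of averaged_document_length_.

```python
def bm25_term_score(tf, idf, doc_length, avg_doc_length, k1=1.6, b=0.75):
    """Hypothetical helper: textbook Okapi BM25 weight of one query term.

    Assumption: ragger_duck may use a different variant of this formula.
    Defaults for k1 and b match the BM25Retriever signature.
    """
    # b interpolates between no length normalization (b=0) and full
    # normalization by the ratio to the average document length (b=1).
    length_norm = 1.0 - b + b * doc_length / avg_doc_length
    # k1 controls how quickly repeated occurrences of the term saturate.
    return idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)


# With b=0, document length has no effect on the score.
short_doc = bm25_term_score(tf=2, idf=1.0, doc_length=5, avg_doc_length=10, b=0.0)
long_doc = bm25_term_score(tf=2, idf=1.0, doc_length=50, avg_doc_length=10, b=0.0)
print(short_doc == long_doc)  # True

# The score saturates: it stays below idf * (k1 + 1) however large tf gets.
print(bm25_term_score(tf=1000, idf=1.0, doc_length=10, avg_doc_length=10) < 2.6)  # True
```

With the default b=0.75, a document longer than average gets a lower weight for the same term count, which is what keeps long documents from winning on sheer size.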

Attributes:
X_fit_ : list of str or dict

The documents passed to fit.

X_counts_ : sparse matrix of shape (n_documents, n_features)

The count of terms in documents.

count_vectorizer_ : transformer

The count vectorizer used to compute the count of terms in documents.

n_terms_by_document_ : ndarray of shape (n_documents,)

The number of terms in each document.

averaged_document_length_ : float

The average number of terms per document.

idf_ : ndarray of shape (n_features,)

The inverse document frequency of each term.
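As a rough picture of how these attributes relate, the following pure-Python sketch derives analogues of X_counts_, n_terms_by_document_, averaged_document_length_ and idf_ from an already tokenized corpus. The helper fit_bm25_statistics is hypothetical, dicts stand in for the sparse matrix, and the smoothed idf formula is an assumption, not necessarily the one ragger_duck uses.

```python
import math
from collections import Counter


def fit_bm25_statistics(tokenized_documents):
    """Hypothetical helper mirroring the fitted attributes of BM25Retriever.

    Assumption: idf uses the smoothed, non-negative form
    log(1 + (N - df + 0.5) / (df + 0.5)); ragger_duck may use another variant.
    """
    # One term-count mapping per document (dense stand-in for X_counts_).
    counts = [Counter(tokens) for tokens in tokenized_documents]
    # Number of terms in each document (n_terms_by_document_).
    n_terms_by_document = [sum(c.values()) for c in counts]
    n_documents = len(counts)
    # Average number of terms per document (averaged_document_length_).
    averaged_document_length = sum(n_terms_by_document) / n_documents
    # Document frequency: the number of documents containing each term.
    document_frequency = Counter(term for c in counts for term in c)
    # Inverse document frequency per term (idf_).
    idf = {
        term: math.log(1.0 + (n_documents - df + 0.5) / (df + 0.5))
        for term, df in document_frequency.items()
    }
    return counts, n_terms_by_document, averaged_document_length, idf


docs = [["cat", "sat"], ["cat", "cat", "ran"], ["dog"]]
counts, n_terms, avg_len, idf = fit_bm25_statistics(docs)
print(n_terms)  # [2, 3, 1]
print(avg_len)  # 2.0
print(idf["dog"] > idf["cat"])  # True: rarer terms get a larger idf
```

The idf variant is a design choice: the "1 +" inside the logarithm keeps every idf positive, so a term that appears in most documents still adds a small amount rather than a negative one.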

Methods

fit(X[, y])

Compute the vocabulary and the idf.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

query(query)

Retrieve the most relevant documents for the query.

set_params(**params)

Set the parameters of this estimator.

fit(X, y=None)

Compute the vocabulary and the idf.

Parameters:
X : list of str or dict

The documents to index.

y : None

This parameter is ignored.

Returns:
self

The fitted estimator.
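Before any BM25 statistic is computed, fit turns X into term counts with count_vectorizer_. When count_vectorizer is None, that is a scikit-learn CountVectorizer; the sketch below imitates only its default tokenization (lowercasing and the token pattern (?u)\b\w\w+\b, which drops one-character tokens) using the standard library. count_terms is a hypothetical helper and handles str documents only; how dict documents are turned into text is not covered here.

```python
import re
from collections import Counter

# Assumption: imitates scikit-learn's CountVectorizer defaults
# (lowercase=True, token_pattern=r"(?u)\b\w\w+\b"); a sketch, not its code.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def count_terms(documents):
    """Hypothetical helper: vocabulary plus one term Counter per document."""
    # Lowercase, then keep tokens of two or more word characters.
    counts = [Counter(TOKEN_PATTERN.findall(doc.lower())) for doc in documents]
    # Like CountVectorizer, index the vocabulary in sorted term order.
    vocabulary = {term: i for i, term in enumerate(sorted(set().union(*counts)))}
    return vocabulary, counts


vocabulary, counts = count_terms(["A cat, a CAT!", "The dog sat."])
print(vocabulary)  # {'cat': 0, 'dog': 1, 'sat': 2, 'the': 3}
print(counts[0])   # Counter({'cat': 2})
```

Passing a configured vectorizer as count_vectorizer is how the tokenization (stop words, n-grams, casing) would be changed; the sketch only reflects the default behaviour.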

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

query(query)

Retrieve the most relevant documents for the query.

Parameters:
query : str

The query string.

Returns:
list of str or dict

The top_k most relevant documents from the data passed to fit.
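Conceptually, query scores every fitted document against the query terms and keeps the best top_k. The sketch below assumes plain-Python fitted statistics and the textbook Okapi BM25 formula; query_top_k and the toy statistics are hypothetical, and the idf values are illustrative rather than computed.

```python
import heapq


def query_top_k(query_tokens, documents, counts, n_terms_by_document,
                averaged_document_length, idf, top_k=1, k1=1.6, b=0.75):
    """Hypothetical helper: return the top_k documents by Okapi BM25 score.

    Assumes one term->count dict per document, per-document term totals,
    the average document length, and a term->idf dict computed at fit time.
    """
    scores = []
    for doc_counts, doc_length in zip(counts, n_terms_by_document):
        # Document-length normalization controlled by b.
        length_norm = 1.0 - b + b * doc_length / averaged_document_length
        score = 0.0
        for term in query_tokens:
            tf = doc_counts.get(term, 0)
            # Terms missing from the document or the vocabulary add nothing.
            if tf and term in idf:
                score += idf[term] * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    # Indices of the top_k highest scores, best first.
    best = heapq.nlargest(top_k, range(len(documents)), key=scores.__getitem__)
    return [documents[i] for i in best]


# Toy fitted state; the idf values are illustrative, not computed.
documents = ["cat sat", "cat cat ran", "dog"]
counts = [{"cat": 1, "sat": 1}, {"cat": 2, "ran": 1}, {"dog": 1}]
n_terms_by_document = [2, 3, 1]
idf = {"cat": 0.47, "sat": 0.98, "ran": 0.98, "dog": 0.98}

print(query_top_k(["cat"], documents, counts, n_terms_by_document, 2.0, idf, top_k=2))
# ['cat cat ran', 'cat sat']
```

Using heapq.nlargest avoids sorting every score when top_k is much smaller than the number of documents; for the default top_k=1 it reduces to picking the maximum.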

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
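The <component>__<parameter> form accepted by set_params can be illustrated with a small router that separates an estimator's own parameters from those destined for a nested component such as count_vectorizer. route_params is hypothetical and is not scikit-learn's code; it only shows how the keys split on the first double underscore.

```python
from collections import defaultdict


def route_params(params):
    """Hypothetical helper: split set_params-style keywords.

    Illustrates the <component>__<parameter> convention only; this is not
    scikit-learn's implementation.
    """
    own = {}
    nested = defaultdict(dict)
    for key, value in params.items():
        # Split on the first "__" only, so deeper paths stay intact
        # for the component to route further.
        component, sep, sub_key = key.partition("__")
        if sep:
            nested[component][sub_key] = value
        else:
            own[key] = value
    return own, dict(nested)


own, nested = route_params({"top_k": 3, "count_vectorizer__lowercase": False,
                            "count_vectorizer__ngram_range": (1, 2)})
print(own)     # {'top_k': 3}
print(nested)  # {'count_vectorizer': {'lowercase': False, 'ngram_range': (1, 2)}}
```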