BM25Retriever
- class ragger_duck.retrieval.BM25Retriever(*, count_vectorizer=None, top_k=1, b=0.75, k1=1.6)
Retrieve the k-nearest neighbors using a lexical search based on BM25.
- Parameters:
- count_vectorizer : transformer, default=None
A count vectorizer to compute the count of terms in documents. If None, a sklearn.feature_extraction.text.CountVectorizer is used.
- top_k : int, default=1
Number of documents to retrieve.
- Attributes:
- X_fit_ : list of str or dict
The input data.
- X_counts_ : sparse matrix of shape (n_documents, n_features)
The count of terms in documents.
- count_vectorizer_ : transformer
The count vectorizer used to compute the count of terms in documents.
- n_terms_by_document_ : ndarray of shape (n_documents,)
The number of terms by document.
- averaged_document_length_ : float
The average number of terms by document.
- idf_ : ndarray of shape (n_features,)
The inverse document frequency.
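For orientation, the standard Okapi BM25 score can be written in terms of the fitted attributes. This page does not state which idf variant or exact normalization the class uses, so the mapping below is an assumption: f(t, d) stands for an entry of X_counts_, |d| for n_terms_by_document_, avgdl for averaged_document_length_, and idf(t) for idf_.

```latex
\mathrm{score}(q, d) = \sum_{t \in q} \mathrm{idf}(t)\,
  \frac{f(t, d)\,(k_1 + 1)}
       {f(t, d) + k_1 \left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}
```

In this formulation, k1 controls how quickly repeated occurrences of a term saturate, and b controls how strongly the score is normalized by document length (b = 0 disables length normalization).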
Methods
- fit(X[, y]): Compute the vocabulary and the idf.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- query(query): Retrieve the most relevant documents for the query.
- set_params(**params): Set the parameters of this estimator.
- fit(X, y=None)
Compute the vocabulary and the idf.
- Parameters:
- X : list of str or dict
The input data.
- y : None
This parameter is ignored.
- Returns:
- self
The fitted estimator.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- query(query)
Retrieve the most relevant documents for the query.
- Parameters:
- query : str
The query text.
- Returns:
- list of str or dict
The top_k most relevant documents from the training set.
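The fit-then-query flow can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: fit_bm25 and query_bm25 are hypothetical helpers, whitespace tokenization stands in for the count vectorizer, and the smoothed idf variant is an assumption. Only the defaults top_k=1, b=0.75 and k1=1.6 are taken from the class signature.

```python
import math
from collections import Counter


def fit_bm25(documents):
    """Precompute corpus statistics (hypothetical helper, not the library API)."""
    # Whitespace tokenization stands in for the count vectorizer.
    counts = [Counter(doc.lower().split()) for doc in documents]
    lengths = [sum(c.values()) for c in counts]
    avg_len = sum(lengths) / len(lengths)
    n_docs = len(documents)
    idf = {}
    for term in set().union(*counts):
        df = sum(1 for c in counts if term in c)
        # Smoothed idf that stays positive; the exact variant is an assumption.
        idf[term] = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    return counts, lengths, avg_len, idf


def query_bm25(query, documents, fitted, top_k=1, b=0.75, k1=1.6):
    """Return the top_k documents ranked by BM25 score for the query."""
    counts, lengths, avg_len, idf = fitted
    terms = [t for t in query.lower().split() if t in idf]
    scores = []
    for doc_counts, length in zip(counts, lengths):
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * length / avg_len)
        scores.append(
            sum(
                idf[t] * doc_counts[t] * (k1 + 1) / (doc_counts[t] + norm)
                for t in terms
            )
        )
    ranked = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [documents[i] for i in ranked[:top_k]]


documents = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "bm25 ranks documents by term overlap",
]
fitted = fit_bm25(documents)
top = query_bm25("bm25 term", documents, fitted, top_k=1)
print(top)
```

Because the match is lexical, a query term only contributes when the exact token appears in a document: in this sketch "cat" does not match "cats".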
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
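The <component>__<parameter> naming convention can be illustrated with a toy sketch. Component and Wrapper are hypothetical stand-in classes, not scikit-learn or ragger_duck code, and the real set_params also validates parameter names, which this sketch omits.

```python
class Component:
    """Hypothetical nested object with one parameter."""

    def __init__(self, lowercase=True):
        self.lowercase = lowercase


class Wrapper:
    """Hypothetical estimator holding a nested component."""

    def __init__(self, component, top_k=1):
        self.component = component
        self.top_k = top_k

    def set_params(self, **params):
        for name, value in params.items():
            # "component__lowercase" splits into ("component", "__", "lowercase").
            head, delim, tail = name.partition("__")
            if delim:
                # Nested form: forward the value to the named component.
                setattr(getattr(self, head), tail, value)
            else:
                # Simple form: set the attribute on this object.
                setattr(self, name, value)
        return self


wrapper = Wrapper(Component()).set_params(top_k=3, component__lowercase=False)
print(wrapper.top_k, wrapper.component.lowercase)
```

Returning self is what allows the call to be chained directly after construction, as in the last lines of the sketch.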