SentenceTransformer
- class ragger_duck.embedding.SentenceTransformer(model_name_or_path=None, modules=None, device=None, cache_folder=None, use_auth_token=None, batch_size=32, show_progress_bar=True)
Sentence transformer that embeds sentences into vectors.
This is a thin wrapper around sentence_transformers.SentenceTransformer that follows the scikit-learn API and can thus be used inside a scikit-learn pipeline.
- Parameters:
- model_name_or_path : str, default=None
If it is a file path on disk, the model is loaded from that path. Otherwise, it first tries to download a pre-trained SentenceTransformer model with that name; if that fails, it tries to construct a model from the Hugging Face model repository with that name.
- modules : Iterable of nn.Module, default=None
This parameter can be used to create custom SentenceTransformer models from scratch.
- device : str, default=None
Device (e.g. “cpu”, “cuda”, “mps”) that should be used for computation. If None, checks if a GPU can be used.
- cache_folder : str, default=None
Path to store models.
- use_auth_token : bool or str, default=None
Hugging Face authentication token used to download private models.
- batch_size : int, default=32
The batch size to use during transform.
- show_progress_bar : bool, default=True
Whether to show a progress bar or not during transform.
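The batch_size parameter bounds how many sentences are handed to the model at once during transform. As a minimal sketch of that chunking, assuming the input is split into consecutive batches (the real class most likely just forwards batch_size to the encode method of sentence-transformers), a hypothetical iter_batches helper could look like this:

```python
from typing import Iterator, List, Sequence


def iter_batches(sentences: Sequence[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` sentences.

    Hypothetical helper for illustration; not part of ragger_duck.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(sentences), batch_size):
        # The last chunk may be shorter than batch_size.
        yield list(sentences[start:start + batch_size])


sentences = [f"sentence {i}" for i in range(70)]
sizes = [len(batch) for batch in iter_batches(sentences, batch_size=32)]
print(sizes)  # [32, 32, 6]
```

With the default batch_size=32, 70 sentences would be processed in three model calls; larger batches usually trade memory for throughput.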
Methods
- fit([X, y]): No-op operation; only validates parameters.
- fit_transform(X[, y]): Fit to data, then transform it.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- set_output(*[, transform]): Set output container.
- set_params(**params): Set the parameters of this estimator.
- transform(X): Embed sentences into vectors.
- fit(X=None, y=None)
No-op operation; only validates parameters.
- Parameters:
- X : None
This parameter is ignored.
- y : None
This parameter is ignored.
- Returns:
- self
The fitted estimator.
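Since fit does not train anything, it only checks the constructor parameters and returns self, which is what allows chaining such as est.fit().transform(X). The class below is a hypothetical stand-in written to show that contract, with a made-up validation rule on batch_size; it is not ragger_duck's implementation.

```python
class NoOpFitEmbedder:
    """Hypothetical stateless transformer illustrating a no-op fit."""

    def __init__(self, batch_size=32):
        # scikit-learn convention: store constructor arguments unchanged.
        self.batch_size = batch_size

    def fit(self, X=None, y=None):
        # X and y are ignored; only the parameters are validated.
        if not isinstance(self.batch_size, int) or self.batch_size < 1:
            raise ValueError(
                f"batch_size must be a positive int, got {self.batch_size!r}"
            )
        return self  # returning self enables chaining


est = NoOpFitEmbedder(batch_size=8)
assert est.fit() is est  # nothing to learn, same object back

try:
    NoOpFitEmbedder(batch_size=0).fit()
except ValueError as exc:
    print(exc)  # batch_size must be a positive int, got 0
```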
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
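For a transformer whose fit is a no-op, fit_transform(X) gives the same result as fit(X).transform(X). The sketch below spells out that composition, which mirrors what scikit-learn's TransformerMixin does by default; the class and its word-count "embedding" are hypothetical placeholders for a real model.

```python
class StatelessTransformer:
    """Hypothetical transformer: fit learns nothing, transform maps sentences to vectors."""

    def fit(self, X=None, y=None, **fit_params):
        return self

    def transform(self, X):
        # Toy one-dimensional "embedding": the word count of each sentence.
        return [[len(sentence.split())] for sentence in X]

    def fit_transform(self, X, y=None, **fit_params):
        # Fit on X (and y), then transform the same X.
        return self.fit(X, y, **fit_params).transform(X)


X = ["hello world", "a longer example sentence"]
t = StatelessTransformer()
print(t.fit_transform(X))  # [[2], [4]]
```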
- get_metadata_routing()
Get metadata routing of this object.
Please check the scikit-learn User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
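With deep=True, the parameters of nested estimators are reported under <component>__<parameter> keys. The hypothetical ToyEstimator below sketches that flattening as a simplified version of the logic in scikit-learn's BaseEstimator.get_params; it is an illustration, not library code.

```python
class ToyEstimator:
    """Hypothetical estimator whose parameters are passed as keyword arguments."""

    def __init__(self, **params):
        self._params = dict(params)

    def get_params(self, deep=True):
        out = dict(self._params)
        if deep:
            for name, value in self._params.items():
                # Recurse into anything that looks like an estimator instance.
                if hasattr(value, "get_params") and not isinstance(value, type):
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        out[f"{name}__{sub_name}"] = sub_value
        return out


embedder = ToyEstimator(batch_size=32, device=None)
outer = ToyEstimator(embedder=embedder, verbose=False)
print(sorted(outer.get_params(deep=False)))
# ['embedder', 'verbose']
print(sorted(outer.get_params(deep=True)))
# ['embedder', 'embedder__batch_size', 'embedder__device', 'verbose']
```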
- set_output(*, transform=None)
Set output container.
See the scikit-learn example "Introducing the set_output API" for how to use the API.
- Parameters:
- transform : {"default", "pandas", "polars"}, default=None
Configure output of transform and fit_transform.
  - "default": Default output format of a transformer
  - "pandas": DataFrame output
  - "polars": Polars output
  - None: Transform configuration is unchanged
Added in scikit-learn version 1.4: "polars" option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
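The <component>__<parameter> convention is what lets a pipeline step named, say, embedder have its batch size changed through the enclosing object with set_params(embedder__batch_size=64). The hypothetical ToyRoutingEstimator below sketches how such keys can be split on the first double underscore and forwarded to the component; real scikit-learn also validates parameter names, which this sketch omits.

```python
class ToyRoutingEstimator:
    """Hypothetical estimator that stores its parameters as attributes."""

    def __init__(self, **params):
        for name, value in params.items():
            setattr(self, name, value)

    def set_params(self, **params):
        nested = {}
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if sep:
                # "<component>__<parameter>": collect it for the component.
                nested.setdefault(name, {})[sub_key] = value
            else:
                setattr(self, name, value)
        for name, sub_params in nested.items():
            # Forward to the nested object, which may route further.
            getattr(self, name).set_params(**sub_params)
        return self


embedder = ToyRoutingEstimator(batch_size=32, show_progress_bar=True)
outer = ToyRoutingEstimator(embedder=embedder)
outer.set_params(embedder__batch_size=64, embedder__show_progress_bar=False)
print(embedder.batch_size, embedder.show_progress_bar)  # 64 False
```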
- transform(X)
Embed sentences into vectors.
- Parameters:
- X : str, or Iterable of str or dict of length (n_sentences,)
The sentences to embed.
  - If str: a single sentence to embed.
  - If list of str: a list of sentences to embed.
  - If list of dict: a list of dictionaries, each with a key "text" containing the sentence to embed.
- Returns:
- embedding : ndarray of shape (n_sentences, embedding_size)
The embeddings of the sentences.
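The three accepted input forms can be reduced to one flat list of sentences before encoding. The sketch below does this with a hypothetical normalize_sentences helper. Two choices here are assumptions, not documented behavior: a single str is treated as a batch of one (so the output would have one row), and str and dict items may be mixed in one list. Extra dictionary keys, such as the made-up "id" in the usage lines, are simply ignored.

```python
from collections.abc import Mapping
from typing import Any, Iterable, List, Union


def normalize_sentences(X: Union[str, Iterable[Any]]) -> List[str]:
    """Hypothetical helper: map every accepted input form to a list of str."""
    if isinstance(X, str):
        # A single sentence becomes a batch of one.
        return [X]
    sentences = []
    for item in X:
        if isinstance(item, str):
            sentences.append(item)
        elif isinstance(item, Mapping):
            # Dictionaries must carry the sentence under the "text" key.
            sentences.append(item["text"])
        else:
            raise TypeError(f"unsupported item type: {type(item).__name__}")
    return sentences


print(normalize_sentences("A single sentence."))
# ['A single sentence.']
print(normalize_sentences(["first", "second"]))
# ['first', 'second']
print(normalize_sentences([{"text": "from a dict", "id": 7}]))
# ['from a dict']
```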