SentenceTransformer

class ragger_duck.embedding.SentenceTransformer(model_name_or_path=None, modules=None, device=None, cache_folder=None, use_auth_token=None, batch_size=32, show_progress_bar=True)

Transformer that encodes sentences into embedding vectors.

This is a thin wrapper around the SentenceTransformer class from the sentence-transformers library. It follows the scikit-learn API, so it can be used inside a scikit-learn pipeline.

Parameters:
model_name_or_path : str, default=None

If it is a file path on disk, the model is loaded from that path. Otherwise, it first tries to download a pre-trained SentenceTransformer model. If that fails, it tries to construct a model from the Hugging Face model repository with that name.

modules : Iterable of nn.Module, default=None

This parameter can be used to create custom SentenceTransformer models from scratch.

device : str, default=None

Device (e.g. “cpu”, “cuda”, “mps”) to use for computation. If None, a GPU is used when one is available.

cache_folder : str, default=None

Path to store models.

use_auth_token : bool or str, default=None

Hugging Face authentication token used to download private models.

batch_size : int, default=32

The batch size to use during transform.

show_progress_bar : bool, default=True

Whether to show a progress bar during transform.
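To make the role of batch_size concrete, the sketch below splits a list of sentences into consecutive chunks of at most batch_size items. The helper iter_batches is hypothetical and written only for illustration; it is not part of ragger_duck and does not claim to mirror how the wrapped model batches its inputs internally.

```python
from typing import Iterator, List, Sequence


def iter_batches(sentences: Sequence[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` sentences.

    Hypothetical helper for illustration only, not a ragger_duck function.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(sentences), batch_size):
        yield list(sentences[start:start + batch_size])


# 70 sentences with the default batch size of 32 give two full batches
# and one final, smaller batch.
sentences = [f"sentence {i}" for i in range(70)]
batch_lengths = [len(batch) for batch in iter_batches(sentences, batch_size=32)]
print(batch_lengths)  # [32, 32, 6]
```

A larger batch size generally means fewer model calls at the cost of more memory per call, so the right value depends on the device in use.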

Methods

fit([X, y])

No-op; only validates parameters.

fit_transform(X[, y])

Fit to data, then transform it.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Embed sentences to vectors.

fit(X=None, y=None)

No-op; only validates parameters.

Parameters:
X : None

This parameter is ignored.

y : None

This parameter is ignored.

Returns:
self

The fitted estimator.
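The "no-op fit" pattern described above can be sketched as follows. NoOpFitEmbedder is a hypothetical stand-in, not the ragger_duck class, and the specific checks are assumptions chosen for illustration. The point is only that fit ignores X and y, validates the constructor parameters, and returns self.

```python
class NoOpFitEmbedder:
    """Hypothetical stand-in that illustrates a stateless `fit`."""

    def __init__(self, batch_size=32, show_progress_bar=True):
        # Following the scikit-learn convention, __init__ only stores
        # parameters; validation is deferred to fit.
        self.batch_size = batch_size
        self.show_progress_bar = show_progress_bar

    def fit(self, X=None, y=None):
        # X and y are ignored: there is nothing to learn from the data.
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(self.batch_size, bool) or not isinstance(self.batch_size, int):
            raise TypeError("batch_size must be an int")
        if self.batch_size < 1:
            raise ValueError("batch_size must be >= 1")
        if not isinstance(self.show_progress_bar, bool):
            raise TypeError("show_progress_bar must be a bool")
        # Returning self allows chaining, e.g. embedder.fit().transform(...)
        return self


embedder = NoOpFitEmbedder(batch_size=16)
fitted = embedder.fit()
print(fitted is embedder)  # True
```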

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See "Introducing the set_output API" for an example of how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • "default": Default output format of a transformer

  • "pandas": DataFrame output

  • "polars": Polars output

  • None: Transform configuration is unchanged

Added in scikit-learn version 1.4: the "polars" option.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
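The <component>__<parameter> naming scheme can be illustrated with a small sketch that splits a parameter name at its first double underscore. The function split_param_name and the step name "embedder" are hypothetical and used only for illustration; this is not scikit-learn's own code.

```python
def split_param_name(name):
    """Split '<component>__<parameter>' at the first double underscore.

    Returns (None, name) for a plain parameter of the estimator itself.
    Hypothetical helper for illustration only.
    """
    component, delimiter, parameter = name.partition("__")
    if not delimiter:
        return None, name
    return component, parameter


# A plain parameter targets the estimator itself.
print(split_param_name("batch_size"))  # (None, 'batch_size')

# A prefixed parameter targets a named component, e.g. a pipeline step
# that was (hypothetically) registered under the name "embedder".
print(split_param_name("embedder__batch_size"))  # ('embedder', 'batch_size')
```

Because the split happens at the first delimiter only, deeper nesting such as "pipe__embedder__batch_size" leaves "embedder__batch_size" to be resolved by the "pipe" component in turn.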

transform(X)

Embed sentences to vectors.

Parameters:
X : str, or Iterable of str or dict of length (n_sentences,)

The sentences to embed.

  • If str, a single sentence to embed;

  • If list of str, a list of sentences to embed;

  • If list of dict, a list of dictionaries with a key “text” that contains the sentence to embed.

Returns:
embedding : ndarray of shape (n_sentences, embedding_size)

The embeddings of the sentences.
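The three accepted input forms can be reduced to one list of strings before encoding. The sketch below shows one way to do that; extract_sentences is a hypothetical helper written for illustration and is not taken from the ragger_duck source.

```python
def extract_sentences(X):
    """Normalize the three documented input forms to a list of str.

    Hypothetical helper for illustration only.
    """
    if isinstance(X, str):
        # A single sentence becomes a one-element list.
        return [X]
    sentences = []
    for item in X:
        if isinstance(item, dict):
            # Dictionaries carry the sentence under the "text" key.
            sentences.append(item["text"])
        else:
            sentences.append(item)
    return sentences


print(extract_sentences("What is a pipeline?"))
# ['What is a pipeline?']
print(extract_sentences(["first sentence", "second sentence"]))
# ['first sentence', 'second sentence']
print(extract_sentences([{"text": "first sentence", "source": "a.html"}]))
# ['first sentence']
```

In the last call, the extra "source" key is an assumed example of additional metadata; only the "text" entry is used.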