GalleryExampleExtractor
- class ragger_duck.scraping.GalleryExampleExtractor(*, chunk_size=300, chunk_overlap=50, n_jobs=None)
- Extract text from the examples of the gallery.
- Parameters:
- chunk_size : int or None, default=300
- The size of the chunks to split the text into. If None, the text is not chunked.
- chunk_overlap : int, default=50
- The overlap between two consecutive chunks.
- n_jobs : int, default=None
- The number of jobs to run in parallel. If None, then the number of jobs is set to the number of CPU cores. 
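
To make the interaction between chunk_size and chunk_overlap concrete, here is a minimal sketch of fixed-width chunking with overlap. It is not the library's implementation: the class relies on langchain's RecursiveCharacterTextSplitter, which also tries to break on separators. The helper name chunk_text and the assumption that sizes are counted in characters are illustrative only.

```python
def chunk_text(text, chunk_size=300, chunk_overlap=50):
    """Split `text` into overlapping character windows (hypothetical helper)."""
    if chunk_size is None:
        # Mirrors the documented behaviour: no chunking when chunk_size is None.
        return [text]
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    # Each new window starts `chunk_size - chunk_overlap` characters after the last.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(700))
chunks = chunk_text(text)
print([len(c) for c in chunks])
```

With the defaults, consecutive chunks share their last and first 50 characters, so a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk when it is short enough.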
 
- Attributes:
- text_splitter_ : langchain.text_splitter.RecursiveCharacterTextSplitter
- The text splitter to use to chunk the document. If chunk_size is None, this attribute is None.

- Methods
- fit([X, y]): No-op operation, only validate parameters.
- fit_transform(X[, y]): Fit to data, then transform it.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- set_output(*[, transform]): Set output container.
- set_params(**params): Set the parameters of this estimator.
- transform(X): Extract text from the gallery examples.

- fit(X=None, y=None)
- No-op operation, only validate parameters.
- Parameters:
- X : None
- This parameter is ignored.
- y : None
- This parameter is ignored. 
 
- Returns:
- self
- The fitted estimator. 
 
 
- fit_transform(X, y=None, **fit_params)
- Fit to data, then transform it.
- Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
- Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
- Target values (None for unsupervised transformations).
- **fit_params : dict
- Additional fit parameters.

- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
- Transformed array. 
 
 
- get_metadata_routing()
- Get metadata routing of this object.
- Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
- A MetadataRequest encapsulating routing information.
 
 
- get_params(deep=True)
- Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
- If True, will return the parameters for this estimator and contained subobjects that are estimators.

- Returns:
- params : dict
- Parameter names mapped to their values. 
 
 
- set_output(*, transform=None)
- Set output container.
- See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {"default", "pandas", "polars"}, default=None
- Configure output of transform and fit_transform.
- "default": Default output format of a transformer
- "pandas": DataFrame output
- "polars": Polars output
- None: Transform configuration is unchanged
- Added in version 1.4: the "polars" option was added.

- Returns:
- self : estimator instance
- Estimator instance. 
 
 
- set_params(**params)
- Set the parameters of this estimator.
- The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
- Estimator parameters.

- Returns:
- self : estimator instance
- Estimator instance. 
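
The <component>__<parameter> naming can be illustrated without any dependency. The sketch below is a hypothetical helper, split_nested_params, not scikit-learn code; it only shows how double-underscore names group by component. The component name "extractor" and the parameter "verbose" are made up for the example.

```python
def split_nested_params(params):
    """Group parameters by component using the `<component>__<parameter>` form."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only; the remainder may be nested further.
        name, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"verbose": True, "extractor__chunk_size": 500, "extractor__chunk_overlap": 0}
)
print(own)
print(nested)
```

Here chunk_size and chunk_overlap would be forwarded to the component registered under the name "extractor", while verbose stays on the outer object.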
 
 
- transform(X)
- Extract text from the gallery examples.
- Parameters:
- X : pathlib.Path
- The path to the gallery examples folder.

- Returns:
- output : list
- A list of dictionaries containing the source and text of the gallery examples.
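
To illustrate the shape of the returned value, the sketch below builds a list of dictionaries with "source" and "text" keys from a folder of Python files. It is a stand-in written with the standard library only, not the extractor itself: the helper name collect_examples, the *.py glob, and the exact key names are assumptions based on the description of the return value.

```python
import tempfile
from pathlib import Path


def collect_examples(folder):
    """Return one {"source", "text"} record per Python file under `folder`."""
    records = []
    for path in sorted(Path(folder).rglob("*.py")):
        # "source" identifies the file, "text" holds its raw content (no chunking here).
        records.append({"source": str(path), "text": path.read_text(encoding="utf-8")})
    return records


with tempfile.TemporaryDirectory() as tmp:
    # Create a tiny, hypothetical gallery example to scan.
    example = Path(tmp) / "plot_demo.py"
    example.write_text('"""Demo example."""\nprint("hello")\n', encoding="utf-8")
    output = collect_examples(tmp)

print(len(output), sorted(output[0]))
```

With ragger_duck installed, the documented signatures suggest the real call is along the lines of GalleryExampleExtractor(chunk_size=300, chunk_overlap=50).fit_transform(path), where path is a pathlib.Path; this has not been verified here.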
 
 
 
 
    
  
  
