# Getting Started

## Deploy Ragger Duck
To ease the deployment, we rely on [pixi](https://pixi.sh). Refer to the pixi documentation for installation details but, in short, for the currently supported platforms, the following should be enough:

```shell
curl -fsSL https://pixi.sh/install.sh | bash
```
In the subsequent steps, `pixi` will be in charge of creating the Python environments to build the scikit-learn documentation, train the retrievers, and launch the Web Console. We already set up several environments for you, depending on the platform and hardware at your disposal:
- `cpu`: a cross-platform environment (i.e. Linux and macOS on x86_64 and arm64);
- `mps`: an environment for macOS on M1/M2/M3 chips;
- `cuda-12-1`: an environment for Linux on x86_64 machines with GPU support. We used it to experiment on a Scaleway instance that provides an L4 GPU;
- `cuda-11-7`: similar to `cuda-12-1` but relying on CUDA 11.7 instead of 12.1.
Note that you can modify the `pixi.toml` file to create your own environments, since the CUDA version used in the `cuda-12-1` or `cuda-11-7` environments might not suit your needs.
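As an illustration, a custom environment might look like the following sketch. The feature name, CUDA version, and dependency shown here are hypothetical; check the existing entries in `pixi.toml` for the exact keys and dependencies used by the project:

```toml
# Hypothetical feature pinning a different CUDA version
# (adapt the name, version, and dependencies to your hardware).
[feature.cuda-12-4.system-requirements]
cuda = "12.4"

[feature.cuda-12-4.dependencies]
pytorch-gpu = "*"

# Expose the feature as a named environment, usable as
# `pixi run -e cuda-12-4 ...`.
[environments]
cuda-12-4 = ["cuda-12-4"]
```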
## Cloning the project
The GitHub repository is self-contained: it ships all the necessary source files for building the RAG. You need to clone the repository recursively to get the scikit-learn source files as a submodule:

```shell
git clone --recursive git@github.com:probabl-ai/sklearn-ragger-duck.git
```
## Install dependencies using `pixi`
The subsequent steps require some dependencies to be installed. They are defined in the `pixi.lock` file and can be installed using `pixi install`. However, you need to specify which environment you want to use, as stated in the previous section. Here, we will use the `cpu` environment:

```shell
pixi install --frozen -e cpu
```
## Build the scikit-learn documentation
First, we need to build the scikit-learn documentation since some of the retrievers rely on the generated HTML pages. You can build the documentation by running the following command:

```shell
pixi run --frozen build-doc-sklearn
```
## Train the semantic and lexical retrievers
We need to train a set of lexical and semantic retrievers on the API documentation, the user guide, and the gallery of examples. We will have a different retriever for each of these types of documentation. You can refer to the User Guide for more details on the strategy used to train the retrievers.
You can launch the training of the retrievers by running the following command:

```shell
pixi run --frozen train-retrievers
```
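To give an intuition of what a lexical retriever does under the hood, here is a minimal toy sketch using plain term-frequency vectors and cosine similarity. This is not Ragger Duck's actual implementation (refer to the User Guide for that); it only illustrates the ranking principle, and the sample documents are made up:

```python
from collections import Counter
import math

def tf_vector(text):
    # Bag-of-words term frequencies over whitespace tokens.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, documents, top_k=1):
    # Rank the documents by similarity to the query and keep the best ones.
    qv = tf_vector(query)
    ranked = sorted(documents, key=lambda d: cosine(qv, tf_vector(d)), reverse=True)
    return ranked[:top_k]

docs = [
    "RandomForestClassifier fits a number of decision tree classifiers",
    "StandardScaler standardizes features by removing the mean",
    "GridSearchCV performs an exhaustive search over parameter values",
]
print(retrieve("decision tree classifier", docs))
```

A semantic retriever follows the same retrieve-and-rank pattern but replaces the term-frequency vectors with dense embeddings, so that documents can match a query without sharing exact words.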
Pixi might prompt you to select a specific environment for the training. You can also specify the environment by running the following command:

```shell
pixi run --frozen -e cpu train-retrievers
```
## Download the Large Language Model
You need to get a Large Language Model (LLM). For testing purposes, you can fetch the Mistral 7B model by running the following command:

```shell
pixi run --frozen fetch-mistral-7b
```
## Launch the Web Console
Now you are all set to start the web console. Launch it by running the following command:

```shell
pixi run --frozen start-ragger-duck
```

You will also be asked to select an environment, depending on the hardware to which you want to offload the LLM. Then, you can access the Web Console at the following address: <http://127.0.0.1:8123>.
## Use the Ragger Duck library
When using `pixi` as discussed earlier, Ragger Duck is installed in editable mode in the environment. However, we also make Ragger Duck installable via `pip`:

```shell
pip install -e .
```

Note that this does not install any of the dependencies, since they are hardware dependent and are better handled with `pixi`.