# Getting Started

## Deploy Ragger Duck
To ease the deployment, we rely on [pixi](https://pixi.sh). Refer to the pixi documentation for installation details but, in short, for the currently supported platforms, the following should be enough:

```shell
curl -fsSL https://pixi.sh/install.sh | bash
```
In the subsequent steps, `pixi` will be in charge of creating the Python environments to build the scikit-learn documentation, train the retrievers, and launch the Web Console. We already set up several environments for you, depending on the platform and hardware at your disposal:
- `cpu`: a cross-platform environment (i.e. Linux and macOS on x86_64 and arm64);
- `mps`: an environment for macOS on M1/M2/M3 chips;
- `cuda-12-1`: an environment for Linux on x86_64 machines with GPU support. We used it to experiment on a Scaleway instance that provides an L4 GPU;
- `cuda-11-7`: similar to `cuda-12-1` but relying on CUDA 11.7 instead of 12.1.
Note that you can modify the `pixi.toml` file to create your own environments, since the CUDA version used in the `cuda-12-1` or `cuda-11-7` environments might not suit your needs.
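As an illustration, a custom environment might look like the following sketch. The feature name, CUDA version, and dependency shown here are hypothetical; check the existing entries in `pixi.toml` for the exact keys and dependencies used by the project:

```toml
# Hypothetical feature pinning a different CUDA version
# (adapt the name, version, and dependencies to your hardware).
[feature.cuda-12-4.system-requirements]
cuda = "12.4"

[feature.cuda-12-4.dependencies]
pytorch-gpu = "*"

# Expose the feature as a named environment, usable as
# `pixi run -e cuda-12-4 ...`.
[environments]
cuda-12-4 = ["cuda-12-4"]
```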
## Cloning the project
The GitHub repository is self-contained: it ships all the necessary source files for building the RAG. You need to clone the repository recursively to get the scikit-learn source files as a submodule:

```shell
git clone --recursive git@github.com:probabl-ai/sklearn-ragger-duck.git
```
## Install dependencies using `pixi`
The subsequent steps require some dependencies to be installed. They are defined in the `pixi.lock` file and can be installed using `pixi install`. However, you need to specify which environment you want to use, as stated in the previous section. Here, we will use the `cpu` environment:

```shell
pixi install --frozen -e cpu
```
## Build the scikit-learn documentation
First, we need to build the scikit-learn documentation since some of the retrievers rely on the generated HTML pages. You can build the documentation by running the following command:

```shell
pixi run --frozen build-doc-sklearn
```
## Train the semantic and lexical retrievers
We need to train a set of lexical and semantic retrievers on the API documentation, the user guide, and the gallery of examples. We will have a different retriever for each of these types of documentation. You can refer to the User Guide for more details on the strategy used to train the retrievers.
You can launch the training of the retrievers by running the following command:

```shell
pixi run --frozen train-retrievers
```
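To give an intuition of what a lexical retriever does under the hood, here is a minimal toy sketch using plain term-frequency vectors and cosine similarity. This is not Ragger Duck's actual implementation (refer to the User Guide for that); it only illustrates the ranking principle, and the sample documents are made up:

```python
from collections import Counter
import math

def tf_vector(text):
    # Bag-of-words term frequencies over whitespace tokens.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, documents, top_k=1):
    # Rank the documents by similarity to the query and keep the best ones.
    qv = tf_vector(query)
    ranked = sorted(documents, key=lambda d: cosine(qv, tf_vector(d)), reverse=True)
    return ranked[:top_k]

docs = [
    "RandomForestClassifier fits a number of decision tree classifiers",
    "StandardScaler standardizes features by removing the mean",
    "GridSearchCV performs an exhaustive search over parameter values",
]
print(retrieve("decision tree classifier", docs))
```

A semantic retriever follows the same retrieve-and-rank pattern but replaces the term-frequency vectors with dense embeddings, so that documents can match a query without sharing exact words.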
Pixi might prompt you to select a specific environment for the training. You can also specify the environment by running the following command:

```shell
pixi run --frozen -e cpu train-retrievers
```
## Download the Large Language Model
You need to get a Large Language Model (LLM). For testing purposes, you can fetch the Mistral 7B model by running the following command:

```shell
pixi run --frozen fetch-mistral-7b
```
## Launch the Web Console
Now you are all set to start the web console. Launch it by running the following command:

```shell
pixi run --frozen start-ragger-duck
```

You will also be asked to select an environment, depending on the hardware to which you want to offload the LLM. Then, you can access the Web Console at the following address: <http://127.0.0.1:8123>.
## Use the Ragger Duck library
When using `pixi` as discussed earlier, Ragger Duck is installed in editable mode in the environment. However, we also make Ragger Duck installable via `pip`:

```shell
pip install -e .
```

Note that this does not install any of the dependencies, since they are hardware dependent and are better handled with `pixi`.