EstimAI - How to set up a local LLM for the first time?

Quick setup guide with koboldcpp

You're interested in trying estimAI but have no idea how to set up your local AI. Here is a quick guide to help you.

Requirements (one of the following):

  • An NVIDIA GPU with at least 8 GB of VRAM
  • An M1/M3 Mac
  • A good CPU and at least 32 GB of RAM (running an LLM on the CPU may be slow and will only work with GGUF quantization)
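To get a feel for what fits in 8 GB of VRAM, here is a rough back-of-the-envelope sketch. The bits-per-weight figures are approximations I'm assuming for common GGUF quantization levels; real files vary slightly.

```python
# Rough VRAM estimate for a quantized GGUF model (weights only,
# ignoring the context cache). Bits-per-weight values are assumptions
# based on typical GGUF quant levels.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5}

def model_size_gb(params_billion, quant):
    """Approximate model size in GB for a given parameter count and quant."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * 1e9 * bits / 8 / 1e9

# A 7B model at Q4_K_M weighs roughly 4-5 GB, which leaves some
# headroom on an 8 GB card for the context cache.
print(f"{model_size_gb(7, 'Q4_K_M'):.1f} GB")
```

This is only a rule of thumb: the context cache grows with context size, so a model that barely fits at 4K context may not fit at 16K.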

Koboldcpp

First, you'll need to download the latest release of koboldcpp: here

Pick the version matching your system then install it.

AI model

Once that's done, you'll need a model to play with. You can find thousands of models on huggingface.co

Here's a short list of small base models quantized in GGUF that you can try:

Download one and load it with koboldcpp.

There is one setting you'll need to tweak in koboldcpp before starting: the context size.
The context size is the amount of text the LLM can process at once. For roleplay you'll need a fairly large one, because all the messages of the chat are stored in the context. But a larger context requires more VRAM and can degrade the model's output quality.
8K context is a minimum, and 32K will probably require too much VRAM.
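To see why roleplay eats context quickly, here is a toy token-budget sketch. The ~4 characters per token figure is a common rule of thumb for English text, not an exact tokenizer count:

```python
# Toy sketch: does the chat history plus a reply fit in the context?
# Assumes ~4 characters per token, a rough rule of thumb for English.
CHARS_PER_TOKEN = 4

def estimate_tokens(text):
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(messages, context_size, reply_budget=300):
    """Check whether all messages plus a reply budget fit the context."""
    used = sum(estimate_tokens(m) for m in messages)
    return used + reply_budget <= context_size

# Ten fairly long messages (~600 characters each).
history = ["Hello there!" * 50] * 10
print(fits_in_context(history, 8192))   # fits in an 8K context
print(fits_in_context(history, 1024))   # but not in a 1K one
```

Once the history no longer fits, the oldest messages get pushed out of the context and the model simply "forgets" them.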

Eudaimonia

Once koboldcpp has loaded the model and is running, you will need to fill in the settings in Eudaimonia. The backend URL for koboldcpp is http://localhost:5001 by default.
The prompt template is important, and you'll need to refer to the model's page to find the right one. In the list above, each model's prompt template is shown in brackets.
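If you're curious what the frontend does under the hood, here is a minimal sketch of talking to koboldcpp's HTTP API directly, assuming the default endpoint and a ChatML-style prompt template (check your model's page for the template it was actually trained with; the parameter names below follow koboldcpp's generate API but double-check them against its docs):

```python
# Sketch: build a request for koboldcpp's generate endpoint.
# The endpoint URL and payload fields follow koboldcpp's API as I
# understand it; verify against the koboldcpp documentation.
import json
from urllib import request

KOBOLD_URL = "http://localhost:5001/api/v1/generate"

def chatml_prompt(system, user):
    """Wrap messages in a ChatML-style prompt template."""
    return (f"<|im_start|>system\n{system}<|im_end|>\n"
            f"<|im_start|>user\n{user}<|im_end|>\n"
            f"<|im_start|>assistant\n")

payload = {
    "prompt": chatml_prompt("You are a helpful assistant.", "Hi!"),
    "max_context_length": 8192,
    "max_length": 200,
    "temperature": 0.8,
}

# Uncomment to actually send this to a running koboldcpp instance:
# req = request.Request(KOBOLD_URL, data=json.dumps(payload).encode(),
#                       headers={"Content-Type": "application/json"})
# print(json.loads(request.urlopen(req).read())["results"][0]["text"])
print(payload["prompt"])
```

A model trained on a different template (Alpaca, Llama-3, etc.) expects different wrapper tokens, which is why using the wrong template often produces rambling or malformed answers.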
The sampler parameter presets are not something you should worry about for now; just keep the default (min_p).
Temperature controls the LLM's randomness: a higher temperature usually leads to more creative output, while a lower value leads to more deterministic answers.
You should set the temperature between 0.7 and 1.25 at most (1.25 may already be too much).
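Concretely, temperature rescales the model's scores before sampling: dividing by a larger value flattens the probability distribution, a smaller one sharpens it. A toy illustration with made-up next-token scores:

```python
# Toy demo of how temperature reshapes next-token probabilities.
# The logits here are made-up numbers for illustration only.
import math

def softmax(logits, temperature):
    """Convert logits to probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]        # hypothetical scores for three tokens
cool = softmax(logits, 0.7)     # sharper: the top token dominates
hot = softmax(logits, 1.25)     # flatter: more variety in sampling

print(f"T=0.7  top-token prob: {cool[0]:.2f}")
print(f"T=1.25 top-token prob: {hot[0]:.2f}")
```

At low temperature the top token is picked almost every time (deterministic answers); at high temperature the alternatives get a real chance (creative but also more error-prone output).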

Local inference with LLMs is still in its early stages. While a lot of incredible work has been done over the last two years, I think we still have a lot to explore and experiment with. Personally, I'm completely overwhelmed by all the new models, papers, and technologies coming out day after day. There is just not enough time in a day.
So don't hesitate to try everything: tweak the settings, play with the prompt, buy a new HDD and download all the models you can, then share your findings!

Video summary

I've made a quick video you can follow which walks through the steps to set up your local AI: