If you want to try EstimAI but don't have a computer with a capable GPU, here's a quick guide to renting one in the cloud.
Go to runpod.io and create an account. You'll need to add funds to rent a GPU; prices vary depending on the GPU you pick and on whether you use RunPod's secure or community cloud.
Once you have your account, go to RunPod's templates and search for the official KoboldCpp template. KoboldCpp will serve as the backend API that loads and runs the AI model.
Select it, then click the ‘Deploy’ button to choose a GPU.
The choice of GPU mainly comes down to how much VRAM it has versus the size of the model you want to run. A 4090 (24GB) will probably work well with all GGUF models under 34B parameters, but you could also choose a 3090 or a less expensive A5000 (generation will be a little slower). If you want to use a 70B model, I recommend a GPU with at least 48GB of VRAM.
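As a rough rule of thumb (a Q4_K_M quant weighs about 0.6 bytes per parameter): a 22B model takes roughly 13GB, a 34B roughly 20GB, and a 70B roughly 42GB, plus a few extra GB for the context cache. If the model plus cache doesn't fit in VRAM, generation slows down dramatically.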
Once you have chosen your GPU, click the ‘Edit template’ button to specify the model you are going to use.
Then you'll need to get the model's download link from Hugging Face.
For example, if you want to use Cydonia-22B, go to the Cydonia-22B-v1.3-GGUF repository on Hugging Face, click on the ‘Files and versions’ tab, then, in the list of files, right-click the download icon next to the Q4_K_M.gguf file and copy the link.
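The copied link should look something like this (illustrative; the exact user and file names depend on the repository):

https://huggingface.co/[user]/Cydonia-22B-v1.3-GGUF/resolve/main/Cydonia-22B-v1.3-Q4_K_M.gguf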
Then go back to runpod.io, open the ‘Edit template’ modal window, and under ‘Environment Variables’, find the ‘KCPP_MODEL’ field and paste the model link.
Now it's time to deploy. You'll need to wait a little while for everything to be downloaded and installed on your pod. After a few minutes (once you see that the GPU memory bar is filled), click on the ‘Connect’ button and open the ‘HTTP Service [Port 5001]’ link.
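If everything loaded correctly, this link opens KoboldCpp's built-in KoboldAI Lite web UI, which is a quick way to confirm the model is actually responding.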
Copy the URL (which looks like this: https://[id]-[port].proxy.runpod.net).
Then fill in the ‘AI Backend URL’ field in EstimAI with this URL.
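You can also sanity-check that KoboldCpp is answering before you start playing. Here's a minimal sketch in Python, assuming the standard KoboldAI-compatible endpoints that KoboldCpp exposes (the URL is a made-up example; replace it with your own):

import requests

base_url = "https://abcdefgh-5001.proxy.runpod.net"  # your pod's proxy URL

# Ask KoboldCpp which model it has loaded
print(requests.get(f"{base_url}/api/v1/model", timeout=30).json())

# Send a tiny generation request to confirm the backend responds
response = requests.post(
    f"{base_url}/api/v1/generate",
    json={"prompt": "Hello,", "max_length": 16},
    timeout=120,
)
print(response.json()["results"][0]["text"])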
If you wish to change the context length (default 4096), you can modify the ‘KCPP_ARGS’ field in the ‘Edit template’ form.
Example: --usecublas mmq --gpulayers 999 --contextsize 8192 --multiuser 20 --flashattention --ignoremissing
Make sure to set the same context length value in both RunPod and EstimAI.
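If you want to verify which context length the pod actually loaded, KoboldCpp's extra API reports it (a sketch, reusing the same example URL as above):

import requests
base_url = "https://abcdefgh-5001.proxy.runpod.net"  # your pod's proxy URL
print(requests.get(f"{base_url}/api/extra/true_max_context_length", timeout=30).json())  # e.g. {"value": 8192}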
If nothing happens after deploying the pod and model, even after waiting several minutes, you can troubleshoot by consulting the pod's logs (common culprits include a mistyped KCPP_MODEL link, which shows up as a download error, or a model too large for the GPU's VRAM).
That's it, you can now play with EstimAI!
I've made a quick video you can follow which shows the different steps to set up KoboldCpp on runpod.io:
Here's a small list of recommended quantized models (dozens of base models and hundreds of finetunes exist; feel free to test any model you think might be good).
To help you choose, you can find up-to-date benchmarks of dedicated models (for erotic roleplay) here!