import CodeBlock from '@theme/CodeBlock';

# ⁉️ Frequently Asked Questions

### How much VRAM does an LLM consume?
By default, Tabby operates in int8 mode with CUDA, requiring approximately 8GB of VRAM for CodeLlama-7B.
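As a rough rule of thumb, int8 inference needs about one byte per model parameter plus some headroom for the KV cache and activations. The sketch below illustrates that arithmetic (the overhead figure is an assumption for illustration, not Tabby's exact allocator behavior):

```python
def estimate_vram_gb(num_params_billions: float,
                     bytes_per_param: float = 1.0,
                     overhead_gb: float = 1.0) -> float:
    """Back-of-envelope VRAM estimate: int8 uses ~1 byte per parameter,
    plus an assumed ~1 GB of overhead for KV cache and activations."""
    return num_params_billions * bytes_per_param + overhead_gb

print(estimate_vram_gb(7))  # CodeLlama-7B at int8 -> 8.0 (GB)
```

This lines up with the ~8GB figure quoted above for CodeLlama-7B in int8 mode.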
### What GPUs are required for reduced-precision inference (e.g., int8)?

To determine the mapping between a GPU card type and its compute capability, please visit this page.

### How can I utilize multiple NVIDIA GPUs?

Tabby only supports the use of a single GPU. To utilize multiple GPUs, you can start multiple Tabby instances and set `CUDA_VISIBLE_DEVICES` accordingly.
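For example, a launch sketch that pins one instance to each of two GPUs (the model name, ports, and flags here are illustrative — check `tabby serve --help` for the options your version supports):

```shell
# Instance 1: visible only to GPU 0
CUDA_VISIBLE_DEVICES=0 tabby serve --model TabbyML/CodeLlama-7B --device cuda --port 8080 &

# Instance 2: visible only to GPU 1
CUDA_VISIBLE_DEVICES=1 tabby serve --model TabbyML/CodeLlama-7B --device cuda --port 8081 &
```

Each process then sees exactly one device, so clients can be spread across the instances by port.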

### How can I convert my own model for use with Tabby?

Since version 0.5.0, Tabby's inference is powered entirely by llama.cpp, so any model in the GGUF format can be used with Tabby. To enhance accessibility, we have curated and benchmarked a set of models, available at registry-tabby.

Users are free to fork the repository to create their own registry. If a user's registry is located at https://github.com/USERNAME/registry-tabby, the model ID will be `USERNAME/model`.

For details on the registry format, please refer to models.json.