import CodeBlock from '@theme/CodeBlock';

# ⁉️ Frequently Asked Questions

## How much VRAM does an LLM consume?

By default, Tabby operates in int8 mode with CUDA, requiring approximately 8GB of VRAM for CodeLlama-7B.
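As a quick sanity check before starting Tabby, you can inspect how much VRAM is available on your card with `nvidia-smi` (shipped with the NVIDIA driver); the query fields below are standard `nvidia-smi` options:

```shell
# List each GPU with its total and currently used VRAM in CSV form
nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv
```

If the reported free memory is below roughly 8GB, CodeLlama-7B in int8 mode may fail to load or fall back to slower paths.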
## Which GPUs are required for reduced-precision inference (e.g. int8)?

To determine the mapping between a GPU card type and its compute capability, please visit this page.

## How can I utilize multiple NVIDIA GPUs?

Tabby only supports the use of a single GPU. To utilize multiple GPUs, you can start multiple Tabby instances and set `CUDA_VISIBLE_DEVICES` accordingly.
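The approach above can be sketched as follows. This is a hypothetical example, not an official deployment recipe: the model name and port numbers are illustrative, and the `tabby serve` flags shown here assume a recent Tabby release; adjust them to your installation.

```shell
# Pin one Tabby instance to each GPU via CUDA_VISIBLE_DEVICES.
# Instance 1: GPU 0, serving on port 8080
CUDA_VISIBLE_DEVICES=0 tabby serve --device cuda --model TabbyML/CodeLlama-7B --port 8080 &

# Instance 2: GPU 1, serving on port 8081
CUDA_VISIBLE_DEVICES=1 tabby serve --device cuda --model TabbyML/CodeLlama-7B --port 8081 &
```

Each instance sees only the GPU named in its `CUDA_VISIBLE_DEVICES`, so from Tabby's perspective it is still running on a single device; a reverse proxy or load balancer in front of the two ports can then distribute requests.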

## How can I convert my own model for use with Tabby?

Follow the instructions provided in the Model Spec.

Please note that the spec is unstable and does not adhere to semver.