# ⁉️ Frequently Asked Questions

### How much VRAM does an LLM consume?

By default, Tabby operates in int8 mode with CUDA, requiring approximately 8GB of VRAM for CodeLlama-7B.

### Which GPUs are required for reduced-precision inference (e.g., int8)?

- **int8**: Compute Capability >= 7.0, or Compute Capability 6.1
- **float16**: Compute Capability >= 7.0
- **bfloat16**: Compute Capability >= 8.0

To determine the mapping between a GPU model and its compute capability, please visit this page.
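The compute-capability thresholds listed above can be sketched as a small helper function. This is a hypothetical illustration, not part of Tabby itself; the `supported_precisions` name and tuple-based version representation are assumptions for the example:

```python
def supported_precisions(compute_cap: tuple[int, int]) -> set[str]:
    """Return the reduced-precision modes a GPU supports,
    following the capability rules listed above."""
    precisions = set()
    # int8 requires Compute Capability >= 7.0, with 6.1 as an exception.
    if compute_cap >= (7, 0) or compute_cap == (6, 1):
        precisions.add("int8")
    # float16 requires Compute Capability >= 7.0.
    if compute_cap >= (7, 0):
        precisions.add("float16")
    # bfloat16 requires Compute Capability >= 8.0.
    if compute_cap >= (8, 0):
        precisions.add("bfloat16")
    return precisions
```

For example, a GPU with Compute Capability 7.5 would support int8 and float16 but not bfloat16, while one with 6.1 would support only int8.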