# ⁉️ Frequently Asked Questions

### How much VRAM does an LLM consume?

By default, Tabby operates in int8 mode with CUDA, requiring approximately 8GB of VRAM for CodeLlama-7B.

### Which GPUs are required for reduced-precision inference (e.g., int8)?

- **int8**: Compute Capability >= 7.0, or Compute Capability 6.1
- **float16**: Compute Capability >= 7.0
- **bfloat16**: Compute Capability >= 8.0

To determine the mapping between a GPU model and its compute capability, please visit this page.
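The compute-capability thresholds listed above can be sketched as a small helper function. This is a hypothetical illustration, not part of Tabby itself; the `supported_precisions` name and tuple-based version representation are assumptions for the example:

```python
def supported_precisions(compute_cap: tuple[int, int]) -> set[str]:
    """Return the reduced-precision modes a GPU supports,
    following the capability rules listed above."""
    precisions = set()
    # int8 requires Compute Capability >= 7.0, with 6.1 as an exception.
    if compute_cap >= (7, 0) or compute_cap == (6, 1):
        precisions.add("int8")
    # float16 requires Compute Capability >= 7.0.
    if compute_cap >= (7, 0):
        precisions.add("float16")
    # bfloat16 requires Compute Capability >= 8.0.
    if compute_cap >= (8, 0):
        precisions.add("bfloat16")
    return precisions
```

For example, a GPU with Compute Capability 7.5 would support int8 and float16 but not bfloat16, while one with 6.1 would support only int8.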