# ⁉️ Frequently Asked Questions

### How much VRAM does an LLM consume?
By default, Tabby operates in int8 mode with CUDA, requiring approximately 8GB of VRAM for CodeLlama-7B.
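As a rough back-of-the-envelope check (an illustrative sketch, not Tabby's internal accounting), int8 weights take about one byte per parameter, so a 7B-parameter model needs roughly 6.5 GiB for the weights alone; the remaining headroom covers the KV cache, activations, and runtime overhead:

```python
def estimate_weight_vram_gib(num_params: float, bytes_per_param: float) -> float:
    """Rough VRAM estimate for model weights only.

    Excludes KV cache, activations, and runtime overhead, so the
    real footprint is somewhat higher than this number.
    """
    return num_params * bytes_per_param / 2**30


# CodeLlama-7B weights in int8 (~1 byte/param): ≈ 6.5 GiB
print(round(estimate_weight_vram_gib(7e9, 1), 1))
# The same weights in float16 (2 bytes/param): ≈ 13.0 GiB
print(round(estimate_weight_vram_gib(7e9, 2), 1))
```

This also shows why int8 roughly halves the memory needed compared to float16.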
### What GPUs are required for reduced-precision inference (e.g., int8)?
- int8: Compute Capability >= 7.0, or Compute Capability 6.1
- float16: Compute Capability >= 7.0
- bfloat16: Compute Capability >= 8.0

To find the compute capability of a specific GPU model, see NVIDIA's CUDA GPUs page: https://developer.nvidia.com/cuda-gpus
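The requirements above can be encoded as a small check. This is a hypothetical helper (not part of Tabby's API); the `(major, minor)` capability tuple can be obtained at runtime from, e.g., PyTorch's `torch.cuda.get_device_capability()`:

```python
def meets_capability(precision: str, capability: tuple[int, int]) -> bool:
    """Return True if a GPU with the given (major, minor) compute
    capability supports the requested reduced-precision mode,
    following the requirements listed above."""
    if precision == "int8":
        # >= 7.0, or the special-cased 6.1 (Pascal-generation cards)
        return capability >= (7, 0) or capability == (6, 1)
    if precision == "float16":
        return capability >= (7, 0)
    if precision == "bfloat16":
        return capability >= (8, 0)
    raise ValueError(f"unknown precision: {precision}")


# Example: an 8.6 (Ampere) card supports all three modes;
# a 6.1 (Pascal) card supports int8 but not float16/bfloat16.
print(meets_capability("bfloat16", (8, 6)))
print(meets_capability("float16", (6, 1)))
```

Tuple comparison handles the version check naturally: `(7, 5) >= (7, 0)` is true while `(6, 1) >= (7, 0)` is not.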