import CodeBlock from '@theme/CodeBlock';

# ⁉️ Frequently Asked Questions
### How much VRAM does an LLM consume?
By default, Tabby operates in int8 mode with CUDA, requiring approximately 8GB of VRAM for CodeLlama-7B.
### What GPUs are required for reduced-precision inference (e.g., int8)?

To determine the compute capability of a given GPU model, please visit this page.
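As a rough illustration, the compute capability reported by a tool such as `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` can be compared against a minimum version. The sketch below is a hedged example, not part of Tabby itself, and the `(7, 0)` cutoff is an assumption based on NVIDIA's Tensor Core int8 support beginning with Volta; consult NVIDIA's documentation for your specific card.

```python
def meets_compute_capability(reported: str, required: tuple[int, int] = (7, 0)) -> bool:
    """Return True if a compute-capability string like "8.6" meets `required`.

    The (7, 0) default is an assumption (Tensor Core int8 support began with
    Volta); it is not a threshold stated on this page.
    """
    major, _, minor = reported.strip().partition(".")
    return (int(major), int(minor or "0")) >= required

# Example: an RTX 3090 reports compute capability 8.6.
print(meets_compute_capability("8.6"))  # True
print(meets_compute_capability("6.1"))  # False
```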

### How can I utilize multiple NVIDIA GPUs?

Tabby supports replicating models on multiple GPUs to increase throughput. You can specify the devices for model replication by using the --device-indices option.

<CodeBlock language="bash">
{`# Replicate the model to GPU 0 and GPU 1.
tabby serve ... --device-indices 0 --device-indices 1`}
</CodeBlock>