docs: add faq.md (#381)

* docs: add 03-faq.md

* docs: add section on reduced precision inference compute capability

* update
release-0.0
Meng Zhang 2023-08-30 23:09:23 +08:00 committed by GitHub
parent 57baecb370
commit 133ce9ac56
5 changed files with 20 additions and 0 deletions

website/docs/03-faq.md Normal file

@@ -0,0 +1,20 @@
# Frequently Asked Questions
<details>
<summary>How much VRAM does an LLM consume?</summary>
<div>By default, Tabby operates in int8 mode with CUDA, requiring approximately 8GB of VRAM for CodeLlama-7B.</div>
</details>
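
As a rough sanity check on the figure above, the memory taken by model weights alone can be estimated from the parameter count and the bytes stored per parameter. This is a minimal sketch, not Tabby's own accounting: the function name `weight_memory_gib` is hypothetical, the parameter count is rounded to 7 billion, and runtime overhead (KV cache, activations, CUDA context) is not modeled, which is presumably why the observed requirement is higher than the weights-only estimate.

```python
# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"int8": 1, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the size of the model weights in GiB (weights only, no overhead)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    # Assumption: "7B" is treated as exactly 7e9 parameters for illustration.
    num_params = 7e9
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype}: {weight_memory_gib(num_params, nbytes):.1f} GiB")
```

Under these assumptions, int8 weights for a 7B model come to about 6.5 GiB, and 16-bit formats need twice that.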
<details>
<summary>What GPUs are required for reduced-precision inference (e.g., int8)?</summary>
<div>
<ul>
<li>int8: Compute Capability 6.1, or Compute Capability >= 7.0</li>
<li>float16: Compute Capability >= 7.0</li>
<li>bfloat16: Compute Capability >= 8.0</li>
</ul>
<p>
To look up the compute capability of a specific GPU model, see <a href="https://developer.nvidia.com/cuda-gpus">NVIDIA's CUDA GPUs page</a>.
</p>
</div>
</details>
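
The compute capability rules listed above can be expressed as a small lookup. This is an illustrative sketch only: `supported_precisions` is a hypothetical helper, not part of Tabby, and it encodes nothing beyond the three thresholds stated in the list.

```python
from typing import List


def supported_precisions(major: int, minor: int) -> List[str]:
    """Return the reduced-precision modes usable at a given compute capability."""
    cc = (major, minor)  # tuples compare element by element, so (7, 5) >= (7, 0)
    supported = []
    # int8: Compute Capability 6.1, or anything from 7.0 upward.
    if cc == (6, 1) or cc >= (7, 0):
        supported.append("int8")
    # float16: Compute Capability 7.0 and above.
    if cc >= (7, 0):
        supported.append("float16")
    # bfloat16: Compute Capability 8.0 and above.
    if cc >= (8, 0):
        supported.append("bfloat16")
    return supported


if __name__ == "__main__":
    for cc in [(6, 0), (6, 1), (7, 5), (8, 6)]:
        print(cc, supported_precisions(*cc))
```

If PyTorch happens to be installed, `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple that could be passed straight to this helper; otherwise the values can be read off NVIDIA's table.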