From 133ce9ac56e88902aafcf423e3de521dc613826c Mon Sep 17 00:00:00 2001
From: Meng Zhang
Date: Wed, 30 Aug 2023 23:09:23 +0800
Subject: [PATCH] docs: add faq.md (#381)

* docs: add 03-faq.md
* docs: add section on reduced precision inference compute capability
* update
---
 ...ting-started.md => 01-gettings-started.md} | 0
 .../01-docker.mdx | 0
 .../02-apple.md | 0
 .../02-self-hosting.md} | 0
 website/docs/03-faq.md | 20 +++++++++++++++++++
 5 files changed, 20 insertions(+)
 rename website/docs/{getting-started.md => 01-gettings-started.md} (100%)
 rename website/docs/{self-hosting => 02-self-hosting}/01-docker.mdx (100%)
 rename website/docs/{self-hosting => 02-self-hosting}/02-apple.md (100%)
 rename website/docs/{self-hosting/self-hosting.md => 02-self-hosting/02-self-hosting.md} (100%)
 create mode 100644 website/docs/03-faq.md

diff --git a/website/docs/getting-started.md b/website/docs/01-gettings-started.md
similarity index 100%
rename from website/docs/getting-started.md
rename to website/docs/01-gettings-started.md
diff --git a/website/docs/self-hosting/01-docker.mdx b/website/docs/02-self-hosting/01-docker.mdx
similarity index 100%
rename from website/docs/self-hosting/01-docker.mdx
rename to website/docs/02-self-hosting/01-docker.mdx
diff --git a/website/docs/self-hosting/02-apple.md b/website/docs/02-self-hosting/02-apple.md
similarity index 100%
rename from website/docs/self-hosting/02-apple.md
rename to website/docs/02-self-hosting/02-apple.md
diff --git a/website/docs/self-hosting/self-hosting.md b/website/docs/02-self-hosting/02-self-hosting.md
similarity index 100%
rename from website/docs/self-hosting/self-hosting.md
rename to website/docs/02-self-hosting/02-self-hosting.md
diff --git a/website/docs/03-faq.md b/website/docs/03-faq.md
new file mode 100644
index 0000000..9edb500
--- /dev/null
+++ b/website/docs/03-faq.md
@@ -0,0 +1,20 @@
+# Frequently Asked Questions
+
+
+ How much VRAM does an LLM consume? +
By default, Tabby operates in int8 mode with CUDA, requiring approximately 8GB of VRAM for CodeLlama-7B.
+
+ +
+ What GPUs are required for reduced-precision inference (e.g., int8)? +
+
+  • int8: Compute Capability >= 7.0 or Compute Capability 6.1
+  • float16: Compute Capability >= 7.0
+  • bfloat16: Compute Capability >= 8.0
+
+

+ To determine the mapping between the GPU card type and its compute capability, please visit this page +

+
+
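
The compute capability thresholds listed in the FAQ entry above can be read as a small lookup rule. The following is a minimal sketch of that rule in Python, not code from Tabby itself: the function name `supported_precisions` and its `(major, minor)` parameters are hypothetical and exist only for illustration.

```python
from typing import List


def supported_precisions(major: int, minor: int) -> List[str]:
    """Return the reduced-precision modes the FAQ lists for a compute capability.

    Hypothetical helper for illustration only; it encodes just the three
    thresholds stated in the FAQ entry and nothing else.
    """
    cc = (major, minor)  # tuples compare element by element, so (7, 5) >= (7, 0)
    modes: List[str] = []

    # int8: Compute Capability >= 7.0, or exactly 6.1
    if cc >= (7, 0) or cc == (6, 1):
        modes.append("int8")

    # float16: Compute Capability >= 7.0
    if cc >= (7, 0):
        modes.append("float16")

    # bfloat16: Compute Capability >= 8.0
    if cc >= (8, 0):
        modes.append("bfloat16")

    return modes


if __name__ == "__main__":
    # Print the supported modes for a few sample compute capabilities.
    for capability in [(6, 0), (6, 1), (7, 5), (8, 6)]:
        print(capability, supported_precisions(*capability))
```

If PyTorch happens to be installed, `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple that could be passed straight into this helper; otherwise the values can be looked up by hand on the page the FAQ points to.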