From 495e6aaf7682a549bda6d19baabf54edf876e09d Mon Sep 17 00:00:00 2001
From: Meng Zhang
Date: Thu, 6 Apr 2023 00:24:33 +0800
Subject: [PATCH] Update README.md

---
 README.md | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 42e418e..027d3d0 100644
--- a/README.md
+++ b/README.md
@@ -49,9 +49,19 @@ curl -X POST http://localhost:5000/v1/completions -H 'Content-Type: application/
 }'
 ```
 
-To use the GPU backend (triton) for a faster inference speed, use `deployment/docker-compose.yml`:
+To use the GPU backend (triton) for a faster inference speed:
 
 ```bash
-docker-compose up
+docker run \
+  --gpus all \
+  -it --rm \
+  -v ./data:/data \
+  -v ./data/hf_cache:/home/app/.cache/huggingface \
+  -p 5000:5000 \
+  -p 8501:8501 \
+  -p 8080:8080 \
+  -e MODEL_NAME=TabbyML/J-350M \
+  -e MODEL_BACKEND=triton \
+  tabbyml/tabby
 ```
 Note: To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.