From 495e6aaf7682a549bda6d19baabf54edf876e09d Mon Sep 17 00:00:00 2001
From: Meng Zhang
Date: Thu, 6 Apr 2023 00:24:33 +0800
Subject: [PATCH] Update README.md

---
 README.md | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 42e418e..027d3d0 100644
--- a/README.md
+++ b/README.md
@@ -49,9 +49,19 @@ curl -X POST http://localhost:5000/v1/completions -H 'Content-Type: application/
 }'
 ```
 
-To use the GPU backend (triton) for a faster inference speed, use `deployment/docker-compose.yml`:
+To use the GPU backend (triton) for a faster inference speed:
 
 ```bash
-docker-compose up
+docker run \
+  --gpus all \
+  -it --rm \
+  -v ./data:/data \
+  -v ./data/hf_cache:/home/app/.cache/huggingface \
+  -p 5000:5000 \
+  -p 8501:8501 \
+  -p 8080:8080 \
+  -e MODEL_NAME=TabbyML/J-350M \
+  -e MODEL_BACKEND=triton \
+  tabbyml/tabby
 ```
 Note: To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.