Meng Zhang aa7ed053ec Update README.md		2023-04-06 00:25:16 +08:00

README.md

🐾 Tabby

Warning: Tabby is still in the alpha phase.

An open-source, on-premises alternative to GitHub Copilot.

Features

Self-contained, with no need for a DBMS or cloud service.
Web UI for visualizing and configuring models and MLOps.
OpenAPI interface that is easy to integrate with existing infrastructure (e.g., a cloud IDE).
Consumer-level GPU support (FP-16 weight loading with various optimizations).

Get started

Docker

The easiest way to get started is with the official Docker image:

# Create the data dir and set its owner to uid 1000 (Tabby runs as uid 1000 in the container)
mkdir -p data/hf_cache && chown -R 1000 data

docker run \
  -it --rm \
  -v ./data:/data \
  -v ./data/hf_cache:/home/app/.cache/huggingface \
  -p 5000:5000 \
  -p 8501:8501 \
  -p 8080:8080 \
  -e MODEL_NAME=TabbyML/J-350M \
  tabbyml/tabby

To use the GPU backend (Triton) for faster inference:

docker run \
  --gpus all \
  -it --rm \
  -v ./data:/data \
  -v ./data/hf_cache:/home/app/.cache/huggingface \
  -p 5000:5000 \
  -p 8501:8501 \
  -p 8080:8080 \
  -e MODEL_NAME=TabbyML/J-350M \
  -e MODEL_BACKEND=triton \
  tabbyml/tabby

Note: To use GPUs, you need to install the NVIDIA Container Toolkit. We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
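
The container may need some time before it accepts requests (for example, if the model is not yet in data/hf_cache; that it is downloaded on first start is an assumption here, not something stated above). A minimal sketch of a readiness check follows. The helper name `wait_for_port` and its defaults are illustrative and not part of Tabby; it only checks that the published TCP port is open, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the commands above, `wait_for_port("localhost", 5000)` would block until the API port published with `-p 5000:5000` is reachable or five minutes have passed.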

You can then query the server using the /v1/completions endpoint:

curl -X POST http://localhost:5000/v1/completions -H 'Content-Type: application/json' --data '{
    "prompt": "def binarySearch(arr, left, right, x):\n    mid = (left +"
}'
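
The same request can be sent from Python using only the standard library. This is a sketch under the assumptions of the curl example above: the server listens on localhost:5000 and accepts a JSON body with a `prompt` field. The function names are illustrative, and because the response schema is not described here, the reply is returned as a plain dictionary without assuming any fields.

```python
import json
import urllib.request

# Assumption: the API is reachable on the port published by the docker run command.
API_URL = "http://localhost:5000/v1/completions"


def build_completion_request(prompt: str, url: str = API_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example: a JSON body with a "prompt" field."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, url: str = API_URL, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response unchanged."""
    request = build_completion_request(prompt, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `complete("def binarySearch(arr, left, right, x):\n    mid = (left +")` with a running server should return whatever JSON the endpoint produces; inspect it before relying on particular keys.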

We also provide an interactive playground in the admin panel at localhost:8501.

Skypilot

See deployment/skypilot/README.md

API documentation

Tabby runs a FastAPI server at localhost:5000, which serves OpenAPI documentation for the HTTP API.
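
FastAPI applications publish their schema at /openapi.json by default (with interactive docs at /docs). Assuming Tabby keeps those defaults, which is not confirmed here, the following sketch fetches the schema and lists the available endpoints. The function names are illustrative.

```python
import json
import urllib.request

# Assumption: Tabby keeps FastAPI's default schema path (/openapi.json).
SCHEMA_URL = "http://localhost:5000/openapi.json"

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_endpoints(schema: dict) -> list:
    """Return sorted (METHOD, path) pairs from an OpenAPI schema's "paths" object."""
    endpoints = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Path items can also hold non-method keys such as "parameters"; skip those.
            if key.lower() in HTTP_METHODS:
                endpoints.append((key.upper(), path))
    return sorted(endpoints)


def fetch_schema(url: str = SCHEMA_URL, timeout: float = 10.0) -> dict:
    """Download and decode the OpenAPI schema from a running server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `list_endpoints(fetch_schema())` gives a quick overview of the API without opening a browser.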

Development

Go to the development directory and run one of the following:

make dev

make dev-python  # Turn off the Triton backend (for developers without a CUDA environment)

TODOs

Vim client #36
Fine-tuning models on private code repositories. #23
Production readiness (OpenTelemetry, Prometheus metrics).