docs: fix and add additional information in the Modal installation page (#748)

* Add additional information in modal installation docs

* docs: update tabby version to 0.5.5

update Modal installation script
Bryan 2023-11-11 09:45:26 +08:00 committed by GitHub
parent 41f60d3204
commit 71815bef8f
2 changed files with 46 additions and 4 deletions


@@ -4,7 +4,7 @@ modal serve app.py
from modal import Image, Stub, asgi_app, gpu
-IMAGE_NAME = "tabbyml/tabby:0.4.0"
+IMAGE_NAME = "tabbyml/tabby:0.5.5"
MODEL_ID = "TabbyML/StarCoder-1B"
GPU_CONFIG = gpu.T4()


@@ -7,16 +7,27 @@
First we import the components we need from `modal`.
```python
-from modal import Image, Mount, Secret, Stub, asgi_app, gpu, method
+from modal import Image, Stub, asgi_app, gpu
```
Next, we set the base docker image version and which model to serve, taking care to specify the GPU configuration required to fit the model into VRAM.
```python
IMAGE_NAME = "tabbyml/tabby:0.5.5"
MODEL_ID = "TabbyML/StarCoder-1B"
GPU_CONFIG = gpu.T4()
```
Currently supported GPUs in Modal:
- `T4`: Low-cost GPU option, providing 16GiB of GPU memory.
- `L4`: Mid-tier GPU option, providing 24GiB of GPU memory.
- `A100`: The most powerful GPU available in the cloud. Available in 40GiB and 80GiB GPU memory configurations.
- `A10G`: A10G GPUs deliver up to 3.3x better ML training performance, 3x better ML inference performance, and 3x better graphics performance, in comparison to NVIDIA T4 GPUs.
- `Any`: Selects any one of the GPU classes available within Modal, according to availability.
For detailed usage, please check official [Modal GPU reference](https://modal.com/docs/reference/modal.gpu).
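As an illustration of how the VRAM figures above translate into a choice, here is a small helper that picks the first listed GPU class whose memory fits a given model footprint. The `pick_gpu` helper and its VRAM table are hypothetical and not part of the docs' `app.py`:

```python
# Hypothetical helper: map the GPU classes listed above to their VRAM,
# ordered roughly from cheapest to most powerful.
GPU_VRAM_GIB = {"T4": 16, "L4": 24, "A10G": 24, "A100": 40}

def pick_gpu(required_gib: float) -> str:
    # Return the first (cheapest) GPU class whose VRAM fits the model.
    for name, vram in GPU_VRAM_GIB.items():
        if vram >= required_gib:
            return name
    raise ValueError(f"No single listed GPU offers {required_gib} GiB")
```

For StarCoder-1B, a `T4` is plenty, which is why the docs default to `gpu.T4()`.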
## Define the container image
We want to create a Modal image which has the Tabby model cache pre-populated. The benefit of this is that the container no longer has to re-download the model - instead, it will take advantage of Modal's internal filesystem for faster cold starts.
@@ -40,7 +51,7 @@ def download_model():
### Image definition
-We'll start from a image by tabby, and override the default ENTRYPOINT for Modal to run its own which enables seamless serverless deployments.
+We'll start from an image by tabby, and override the default ENTRYPOINT for Modal to run its own which enables seamless serverless deployments.
Next we run the download step to pre-populate the image with our model weights.
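The body of `download_model()` is elided by the diff context above; a minimal sketch is shown below, assuming the Tabby binary lives at `/opt/tabby/bin/tabby` inside the image and exposes a `download` subcommand - verify both against the real `app.py`:

```python
import subprocess

MODEL_ID = "TabbyML/StarCoder-1B"

def download_cmd(model_id: str) -> list[str]:
    # Assumed CLI shape: the tabby binary in the docker image ships a
    # `download` subcommand that fetches model weights by ID.
    return ["/opt/tabby/bin/tabby", "download", "--model", model_id]

def download_model():
    # Run at image-build time so the weights are baked into the image
    # and containers cold-start without re-downloading the model.
    subprocess.run(download_cmd(MODEL_ID), check=True)
```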
@@ -49,7 +60,7 @@ Finally, we install the `asgi-proxy-lib` to interface with modal's asgi webserve
```python
image = (
Image.from_registry(
-        "tabbyml/tabby:0.3.1",
+        IMAGE_NAME,
add_python="3.11",
)
.dockerfile_commands("ENTRYPOINT []")
```
@@ -68,6 +79,7 @@ The endpoint function is represented with Modal's `@stub.function`. Here, we:
4. Keep idle containers for 2 minutes before spinning them down.
```python
stub = Stub("tabby-server-" + MODEL_ID.split("/")[-1], image=image)
@stub.function(
gpu=GPU_CONFIG,
allow_concurrent_inputs=10,
```
@@ -118,6 +130,36 @@ def app():
Once we serve this app with `modal serve app.py`, it will output the URL of the web endpoint, in the form `https://<USERNAME>--tabby-server-starcoder-1b-app-dev.modal.run`.
To test that the server is working, you can send a POST request to the web endpoint.
```shell
curl --location 'https://<USERNAME>--tabby-server-starcoder-1b-app-dev.modal.run/v1/completions' \
--header 'Content-Type: application/json' \
--data '{
"language": "python",
"segments": {
"prefix": "def fib(n):\n ",
"suffix": "\n return fib(n - 1) + fib(n - 2)"
}
}'
```
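The same check can be scripted in Python using only the standard library. `ENDPOINT` below is a placeholder; substitute your own deployment URL, and uncomment the last line to actually send the request:

```python
import json
import urllib.request

# Placeholder: replace <USERNAME> with your Modal username.
ENDPOINT = "https://<USERNAME>--tabby-server-starcoder-1b-app-dev.modal.run/v1/completions"

# Same completion payload as the curl example above.
payload = {
    "language": "python",
    "segments": {
        "prefix": "def fib(n):\n    ",
        "suffix": "\n    return fib(n - 1) + fib(n - 2)",
    },
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# print(urllib.request.urlopen(req).read().decode())  # uncomment once deployed
```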
If you get a JSON response like the following, the server is up and running. Have fun!
```json
{
"id": "cmpl-4196b0c7-f417-4c48-9329-4a56aa86baea",
"choices": [
{
"index": 0,
"text": "if n == 0:\n return 0\n elif n == 1:\n return 1\n else:"
}
]
}
```
![App Running](./app-running.png)
Now it can be used as the Tabby server URL in the Tabby editor extensions!
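For example, the Tabby agent used by the editor extensions reads its server endpoint from a config file; the file location and key name below are assumptions to verify against the Tabby client documentation:

```toml
# Assumed location: ~/.tabby-client/agent/config.toml
[server]
endpoint = "https://<USERNAME>--tabby-server-starcoder-1b-app-dev.modal.run"
```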