Meng Zhang
c92b9e11c3
chore: fix rust build warning
2023-12-11 12:42:57 +08:00
Eric
e0d0133d86
feat: support building tabby on windows (#948)
...
* feat: update config to support building on windows
* resolve comment
* update release.yml
* resolve comment
2023-12-11 12:14:49 +08:00
Mikko Tiihonen
9aed0dee08
feat: Add support for 7840U iGPU type (#960)
...
rocminfo reports that my AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics GPU version is gfx1103
2023-12-06 23:11:05 +00:00
xcnick
2c2c95ccd7
fix: error when outputting unicode characters (#925)
2023-12-01 12:18:26 +08:00
Meng Zhang
ffd5ef3449
fix: avoid race condition in llama.cpp (#923)
2023-11-30 23:52:20 +08:00
Meng Zhang
8d1303d6e4
fix: properly recycle request id (#920)
2023-11-30 17:01:52 +08:00
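A minimal sketch of the id-recycling idea this fix describes, with hypothetical types (not Tabby's actual code): completed ids return to a free list and are reused before new ones are minted, so ids stay bounded under sustained load.

```rust
// Hypothetical sketch, not the real fix: a pool that recycles released ids.
struct IdPool {
    next: u32,
    free: Vec<u32>,
}

impl IdPool {
    fn alloc(&mut self) -> u32 {
        // Prefer a recycled id; mint a fresh one only when none are free.
        self.free.pop().unwrap_or_else(|| {
            let id = self.next;
            self.next += 1;
            id
        })
    }

    fn release(&mut self, id: u32) {
        self.free.push(id);
    }
}

fn main() {
    let mut pool = IdPool { next: 0, free: Vec::new() };
    let a = pool.alloc(); // 0
    pool.release(a);
    assert_eq!(pool.alloc(), 0); // recycled, not 1
}
```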
Meng Zhang
9c905e4849
feat: add rocm support (#913)
...
* Added build configurations for Intel and AMD hardware
* Improved rocm build
* Added options for OneAPI and ROCm
* Build llama using icx
* [autofix.ci] apply automated fixes
* Fixed rocm image
* Build ROCm
* Tried to adjust compile flags for SYCL
* Removed references to oneAPI
* Provide info about the used device for ROCm
* Added ROCm documentation
* Addressed review comments
* Refactored to expose generic accelerator information
* Pull request cleanup
* cleanup
* cleanup
* Delete .github/workflows/docker-cuda.yml
* Delete .github/workflows/docker-rocm.yml
* Delete crates/tabby-common/src/api/accelerator.rs
* update
* cleanup
* update
* update
* update
* update
---------
Co-authored-by: Cromefire_ <cromefire+git@pm.me>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2023-11-29 03:27:03 +00:00
Meng Zhang
2b131ad1d2
refactor: handle max output length in StopCondition (#910)
...
* refactor: handle max output length in StopCondition
* trim stop words
* [autofix.ci] apply automated fixes
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2023-11-28 16:57:16 +08:00
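A hedged sketch of the shape #910 describes, with hypothetical fields rather than the real StopCondition: one check covers both stop words and the maximum output length.

```rust
// Hypothetical sketch: a stop condition that ends generation once either a
// stop word appears at the tail or the output reaches a max token count.
struct StopCondition {
    stop_words: Vec<String>,
    max_decoding_length: usize,
    n_decoded: usize,
}

impl StopCondition {
    /// Returns true when generation should stop for the text decoded so far.
    fn should_stop(&mut self, decoded_text: &str) -> bool {
        self.n_decoded += 1;
        self.n_decoded >= self.max_decoding_length
            || self.stop_words.iter().any(|w| decoded_text.ends_with(w))
    }
}

fn main() {
    let mut cond = StopCondition {
        stop_words: vec!["\n\n".into()],
        max_decoding_length: 64,
        n_decoded: 0,
    };
    assert!(!cond.should_stop("fn add(a: i32, b: i32"));
    assert!(cond.should_stop("fn add(a: i32, b: i32) -> i32 { a + b }\n\n"));
}
```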
Meng Zhang
e92a8c8005
Release 0.7.0-dev
...
http-api-bindings@0.7.0-dev
juniper-axum@0.7.0-dev
llama-cpp-bindings@0.7.0-dev
tabby@0.7.0-dev
tabby-common@0.7.0-dev
tabby-download@0.7.0-dev
tabby-inference@0.7.0-dev
tabby-scheduler@0.7.0-dev
tabby-webserver@0.7.0-dev
Generated by cargo-workspaces
2023-11-27 14:58:58 +08:00
Meng Zhang
b1481b0e2e
chore: release 0.6.0 (#882)
...
* add loadtest
* release 0.6.0
* Release 0.6.0-rc.0
http-api-bindings@0.6.0-rc.0
juniper-axum@0.6.0-rc.0
llama-cpp-bindings@0.6.0-rc.0
tabby@0.6.0-rc.0
tabby-common@0.6.0-rc.0
tabby-download@0.6.0-rc.0
tabby-inference@0.6.0-rc.0
tabby-scheduler@0.6.0-rc.0
tabby-webserver@0.6.0-rc.0
Generated by cargo-workspaces
* Release 0.6.0-rc.1
http-api-bindings@0.6.0-rc.1
juniper-axum@0.6.0-rc.1
llama-cpp-bindings@0.6.0-rc.1
tabby@0.6.0-rc.1
tabby-common@0.6.0-rc.1
tabby-download@0.6.0-rc.1
tabby-inference@0.6.0-rc.1
tabby-scheduler@0.6.0-rc.1
tabby-webserver@0.6.0-rc.1
Generated by cargo-workspaces
* Release 0.6.0-rc.2
http-api-bindings@0.6.0-rc.2
juniper-axum@0.6.0-rc.2
llama-cpp-bindings@0.6.0-rc.2
tabby@0.6.0-rc.2
tabby-common@0.6.0-rc.2
tabby-download@0.6.0-rc.2
tabby-inference@0.6.0-rc.2
tabby-scheduler@0.6.0-rc.2
tabby-webserver@0.6.0-rc.2
Generated by cargo-workspaces
* Release 0.6.0
http-api-bindings@0.6.0
juniper-axum@0.6.0
llama-cpp-bindings@0.6.0
tabby@0.6.0
tabby-common@0.6.0
tabby-download@0.6.0
tabby-inference@0.6.0
tabby-scheduler@0.6.0
tabby-webserver@0.6.0
Generated by cargo-workspaces
2023-11-27 14:57:45 +08:00
Meng Zhang
23a49beaa9
feat(ci): support manylinux build for cpu / cuda (#899)
2023-11-26 16:37:12 +08:00
Maciej
ebbe6e5af8
fix: helpful message when llama.cpp submodule is not present (#719) (#775)
2023-11-13 07:51:46 +00:00
Erfan Safari
138b7459c5
feat: add LLAMA_CPP_N_THREADS env (#742)
...
* feat: add LLAMA_CPP_N_THREADS and LLAMA_CPP_N_THREADS_BATCH envs
* apply format
* improve: use LLAMA_CPP_N_THREADS for both n_threads and n_threads_batch
* Update crates/llama-cpp-bindings/src/engine.cc
---------
Co-authored-by: Meng Zhang <meng@tabbyml.com>
2023-11-09 19:54:23 +00:00
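A hedged Rust-flavored sketch of the behavior #742 describes (the actual change lives on the C++ side, in crates/llama-cpp-bindings/src/engine.cc): one LLAMA_CPP_N_THREADS value drives both n_threads and n_threads_batch. The core-count fallback below is an assumption for illustration, not necessarily the real default.

```rust
use std::env;

// Hypothetical sketch: one env var feeds both thread-count settings.
fn llama_n_threads() -> usize {
    env::var("LLAMA_CPP_N_THREADS")
        .ok()
        .and_then(|v| v.parse().ok())
        // Assumed fallback: number of available cores (illustrative only).
        .unwrap_or_else(|| {
            std::thread::available_parallelism()
                .map(|n| n.get())
                .unwrap_or(1)
        })
}

fn main() {
    let n = llama_n_threads();
    println!("n_threads = n_threads_batch = {n}");
}
```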
Meng Zhang
8c669dee8e
fix: llama.cpp queuing logic (#741)
2023-11-09 08:29:54 +00:00
Meng Zhang
cde3602877
feat: sync llama.cpp to latest
2023-11-08 16:06:09 -08:00
Meng Zhang
8ab35b2639
feat: add --parallelism to control throughput and vram usage (#727)
...
* feat: add --parallelism to control throughput and vram usage
* update default
* Revert "update default"
This reverts commit 349792c0d48d913dcd8be4ce1c9d7ce887918f29.
* cargo fmt
2023-11-08 18:31:22 +00:00
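A hedged sketch of what a flag like the one in #727 might look like with clap; the struct name and default value are hypothetical, not Tabby's actual CLI. Higher parallelism raises throughput but also VRAM use, since the KV cache grows with the number of concurrent decoding streams.

```rust
use clap::Parser;

// Hypothetical sketch of the flag's shape (requires clap with the "derive"
// feature); the default of 4 is illustrative, not the real default.
#[derive(Parser)]
struct ServeArgs {
    /// Maximum number of requests decoded concurrently.
    #[arg(long, default_value_t = 4)]
    parallelism: u8,
}

fn main() {
    let args = ServeArgs::parse();
    println!("serving with parallelism = {}", args.parallelism);
}
```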
Meng Zhang
1ad0d39903
fix: deadlock between background job and requests (#720)
...
* fix: deadlock between background job and requests
* refactor: extract LlamaService
2023-11-07 13:11:28 -08:00
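A hedged sketch of the LlamaService pattern this fix extracts, with hypothetical types: request handlers and the background inference loop communicate over a channel, so neither side can block on a lock the other holds.

```rust
use tokio::sync::{mpsc, oneshot};

// Hypothetical sketch: handlers talk to one background loop over a channel.
struct Request {
    prompt: String,
    respond: oneshot::Sender<String>,
}

async fn background_loop(mut rx: mpsc::Receiver<Request>) {
    while let Some(req) = rx.recv().await {
        // Inference would run here; replying over a oneshot channel cannot
        // deadlock against the caller.
        let _ = req.respond.send(format!("completion for: {}", req.prompt));
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(32);
    tokio::spawn(background_loop(rx));

    let (respond, receive) = oneshot::channel();
    tx.send(Request { prompt: "fn main".into(), respond }).await.unwrap();
    println!("{}", receive.await.unwrap());
}
```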
Meng Zhang
ca52ac4b01
fix: support cpu-only run in llama.cpp cuda build
2023-11-06 22:59:24 -08:00
Meng Zhang
eb7ae96157
fix: llama.cpp requires kv cache to be N_CTX * parallelism (#714)
2023-11-07 06:16:36 +00:00
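The constraint in #714 is simple arithmetic; a hedged sketch with illustrative numbers only:

```rust
// With `parallelism` concurrent decoding streams, each allowed up to N_CTX
// tokens of context, the llama.cpp KV cache must hold N_CTX * parallelism
// cells in total.
fn required_kv_cache_cells(n_ctx: usize, parallelism: usize) -> usize {
    n_ctx * parallelism
}

fn main() {
    // e.g. a 2048-token context with 4 parallel streams (illustrative).
    assert_eq!(required_kv_cache_cells(2048, 4), 8192);
}
```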
Meng Zhang
9344c32b31
fix: when an error happens in the background inference loop, exit the process (#713)
2023-11-06 20:41:49 +00:00
Meng Zhang
00e0c4fddc
chore: add machete check to ensure no unused dependencies (#701)
...
* refactor: remove useless dependencies
* add machete
2023-11-05 02:48:05 +00:00
Meng Zhang
64e0abb8cc
fix(llama.cpp): wrong index for n_seq in warmup
2023-11-04 17:53:22 -07:00
Meng Zhang
c7c67c2f90
fix: llama.cpp warmup logic
2023-11-04 14:28:04 -07:00
Meng Zhang
fc9c9f644b
Release 0.6.0-dev
...
http-api-bindings@0.6.0-dev
llama-cpp-bindings@0.6.0-dev
tabby@0.6.0-dev
tabby-common@0.6.0-dev
tabby-download@0.6.0-dev
tabby-inference@0.6.0-dev
tabby-scheduler@0.6.0-dev
Generated by cargo-workspaces
2023-11-03 18:04:12 -07:00
Meng Zhang
ec8d88de0d
chore: release 0.5.0 (#697)
...
* Release 0.5.0-rc.0
http-api-bindings@0.5.0-rc.0
llama-cpp-bindings@0.5.0-rc.0
tabby@0.5.0-rc.0
tabby-common@0.5.0-rc.0
tabby-download@0.5.0-rc.0
tabby-inference@0.5.0-rc.0
tabby-scheduler@0.5.0-rc.0
Generated by cargo-workspaces
* fix: docker branch tag should only be generated when not empty
* Release 0.5.0-rc.1
http-api-bindings@0.5.0-rc.1
llama-cpp-bindings@0.5.0-rc.1
tabby@0.5.0-rc.1
tabby-common@0.5.0-rc.1
tabby-download@0.5.0-rc.1
tabby-inference@0.5.0-rc.1
tabby-scheduler@0.5.0-rc.1
Generated by cargo-workspaces
* fix: handlebar syntax in meta action
* Release 0.5.0-rc.2
http-api-bindings@0.5.0-rc.2
llama-cpp-bindings@0.5.0-rc.2
tabby@0.5.0-rc.2
tabby-common@0.5.0-rc.2
tabby-download@0.5.0-rc.2
tabby-inference@0.5.0-rc.2
tabby-scheduler@0.5.0-rc.2
Generated by cargo-workspaces
* fix: handlebar syntax in meta action
* Release 0.5.0-rc.3
http-api-bindings@0.5.0-rc.3
llama-cpp-bindings@0.5.0-rc.3
tabby@0.5.0-rc.3
tabby-common@0.5.0-rc.3
tabby-download@0.5.0-rc.3
tabby-inference@0.5.0-rc.3
tabby-scheduler@0.5.0-rc.3
Generated by cargo-workspaces
* docs: update change log and docs
* fix: collect_snippet should handle NotReady error
* Release 0.5.0-rc.4
http-api-bindings@0.5.0-rc.4
llama-cpp-bindings@0.5.0-rc.4
tabby@0.5.0-rc.4
tabby-common@0.5.0-rc.4
tabby-download@0.5.0-rc.4
tabby-inference@0.5.0-rc.4
tabby-scheduler@0.5.0-rc.4
Generated by cargo-workspaces
* Release 0.5.0
http-api-bindings@0.5.0
llama-cpp-bindings@0.5.0
tabby@0.5.0
tabby-common@0.5.0
tabby-download@0.5.0
tabby-inference@0.5.0
tabby-scheduler@0.5.0
Generated by cargo-workspaces
2023-11-03 18:02:03 -07:00
Meng Zhang
acb3a33d78
fix: handle non-utf-8 / utf-16 errors
2023-11-02 16:29:30 -07:00
Meng Zhang
eb34850a5e
fix: output error if step fails
2023-11-02 16:15:11 -07:00
Meng Zhang
4c7eae584e
feat: add model warmup logic (#693)
2023-11-02 23:07:32 +00:00
Meng Zhang
296342efd8
refactor: use llama.cpp tokenizer (#683)
...
* refactor: switch to llama.cpp tokenizer to simplify implementation
* refactor: remove tokenizer dependency from tabby
* refactor: renaming decoding to stop condition
* refactor: remove tokenizer dependency
* refactor: remove submodule
* chore: update formatting
* move tokenization to c++
2023-10-31 22:16:09 +00:00
Meng Zhang
89a63dbf33
fix: when send fails, treat the request as stopped (#673)
2023-10-30 06:27:09 +00:00
Meng Zhang
7330d75de6
chore: clear cache when there are no active requests
2023-10-29 16:30:30 -07:00
Meng Zhang
7bd99d14c0
feat: support continuous batching in llama.cpp backend (#659)
...
* refactor: switch back to llama batch interface
* feat: support cont batching
2023-10-28 23:37:05 -07:00
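A hedged sketch of the continuous-batching idea in #659, with hypothetical types: finished sequences leave the in-flight batch between decode steps and queued requests take their slots, rather than the whole batch draining before new work is admitted.

```rust
use std::collections::VecDeque;

// Hypothetical sketch: evict finished sequences, admit queued ones.
struct Seq {
    id: u32,
    done: bool,
}

fn admit(batch: &mut Vec<Seq>, queue: &mut VecDeque<Seq>, max_batch: usize) {
    batch.retain(|s| !s.done); // evict finished sequences
    while batch.len() < max_batch {
        match queue.pop_front() {
            Some(s) => batch.push(s),
            None => break,
        }
    }
    // ...then run one llama.cpp decode step over every sequence in `batch`.
}

fn main() {
    let mut batch = vec![Seq { id: 0, done: true }, Seq { id: 1, done: false }];
    let mut queue = VecDeque::from([Seq { id: 2, done: false }]);
    admit(&mut batch, &mut queue, 4);
    let ids: Vec<u32> = batch.iter().map(|s| s.id).collect();
    assert_eq!(ids, vec![1, 2]);
}
```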
Meng Zhang
444222683a
fix(llama.cpp): bump upstream fix for starcoder model on cuda
2023-10-28 02:03:34 -07:00
Meng Zhang
9309e0314f
fix: fix docker build
2023-10-27 21:25:45 -07:00
Meng Zhang
6dd12ce1ec
fix: add cuda search path to docker build
2023-10-27 19:40:35 -07:00
Meng Zhang
2d948639be
fix: docker build for llama cuda backend
2023-10-27 16:36:54 -07:00
Meng Zhang
23bd542cec
feat: switch cuda backend to llama.cpp (#656)
...
* feat: switch cuda backend to llama.cpp
* fix
* fix
2023-10-27 13:41:22 -07:00
Meng Zhang
f37840566b
feat: upgrade llama.cpp (#645)
...
* feat: upgrade llama.cpp
* update download files
* update changelog
* Update CHANGELOG.md
* Update CHANGELOG.md
2023-10-27 12:18:46 -07:00
Meng Zhang
1a4c2aa71f
feat: switch cpu backend to llama.cpp (#638)
...
* feat: switch cpu backend to llama.cpp
* feat: switch cpu serving to ggml
* fix cargo.toml
* use optional dependency
* fix compilation
* update ci target
2023-10-25 15:40:11 -07:00
Meng Zhang
e171776774
Release 0.5.0-dev
...
ctranslate2-bindings@0.5.0-dev
http-api-bindings@0.5.0-dev
llama-cpp-bindings@0.5.0-dev
rust-cxx-cmake-bridge@0.5.0-dev
tabby@0.5.0-dev
tabby-common@0.5.0-dev
tabby-download@0.5.0-dev
tabby-inference@0.5.0-dev
tabby-scheduler@0.5.0-dev
Generated by cargo-workspaces
2023-10-24 13:05:33 -07:00
Meng Zhang
99a7053b6f
refactor: extract language configuration into an individual toml file (#564)
...
* refactor: extract language configuration into individual toml file
* feat: add golang language configuration (#565)
2023-10-16 00:24:44 +00:00
Meng Zhang
82e893d569
Release 0.4.0-dev
...
ctranslate2-bindings@0.4.0-dev
http-api-bindings@0.4.0-dev
llama-cpp-bindings@0.4.0-dev
rust-cxx-cmake-bridge@0.4.0-dev
tabby@0.4.0-dev
tabby-common@0.4.0-dev
tabby-download@0.4.0-dev
tabby-inference@0.4.0-dev
tabby-scheduler@0.4.0-dev
Generated by cargo-workspaces
2023-10-13 17:54:14 -07:00
Meng Zhang
4dbaf4f312
Release 0.3.0
...
ctranslate2-bindings@0.3.0
http-api-bindings@0.3.0
llama-cpp-bindings@0.3.0
rust-cxx-cmake-bridge@0.3.0
tabby@0.3.0
tabby-common@0.3.0
tabby-download@0.3.0
tabby-inference@0.3.0
tabby-scheduler@0.3.0
Generated by cargo-workspaces
2023-10-13 17:45:07 -07:00
Meng Zhang
eb463ba496
Release 0.3.0-rc.1
...
ctranslate2-bindings@0.3.0-rc.1
http-api-bindings@0.3.0-rc.1
llama-cpp-bindings@0.3.0-rc.1
rust-cxx-cmake-bridge@0.3.0-rc.1
tabby@0.3.0-rc.1
tabby-common@0.3.0-rc.1
tabby-download@0.3.0-rc.1
tabby-inference@0.3.0-rc.1
tabby-scheduler@0.3.0-rc.1
Generated by cargo-workspaces
2023-10-13 11:43:34 -07:00
Meng Zhang
182aceed41
Release 0.3.0-rc.0
...
ctranslate2-bindings@0.3.0-rc.0
http-api-bindings@0.3.0-rc.0
llama-cpp-bindings@0.3.0-rc.0
tabby@0.3.0-rc.0
tabby-common@0.3.0-rc.0
tabby-download@0.3.0-rc.0
tabby-inference@0.3.0-rc.0
tabby-scheduler@0.3.0-rc.0
Generated by cargo-workspaces
2023-10-13 11:24:36 -07:00
Meng Zhang
6dbb712918
Release 0.3.0-dev
...
ctranslate2-bindings@0.3.0-dev
http-api-bindings@0.3.0-dev
llama-cpp-bindings@0.3.0-dev
tabby@0.3.0-dev
tabby-common@0.3.0-dev
tabby-download@0.3.0-dev
tabby-inference@0.3.0-dev
tabby-scheduler@0.3.0-dev
Generated by cargo-workspaces
2023-10-09 19:39:27 -07:00
Meng Zhang
1731c3075e
chore: Update version to 0.2.0
2023-10-03 13:32:21 -07:00
Meng Zhang
692c2fe0fd
Release 0.2.0-rc.0
...
ctranslate2-bindings@0.2.0-rc.0
http-api-bindings@0.2.0-rc.0
llama-cpp-bindings@0.2.0-rc.0
tabby@0.2.0-rc.0
tabby-common@0.2.0-rc.0
tabby-download@0.2.0-rc.0
tabby-inference@0.2.0-rc.0
tabby-scheduler@0.2.0-rc.0
Generated by cargo-workspaces
2023-10-02 19:14:12 -07:00
Meng Zhang
f05dd3a2f6
refactor: clean up chat api to make it message oriented (#497)
...
* refactor: refactor into /chat/completions api
* Revert "feat: support request level stop words (#492 )"
This reverts commit 0d6840e372.
* feat: adjust interface
* switch interface in tabby-playground
* move to chat/prompt, add unit test
* update interface
2023-10-02 15:39:15 +00:00
Meng Zhang
dfdd0373a6
fix: when llama model load fails, panic in rust stack
2023-10-01 22:25:25 -07:00