tabby

Commit Graph

Author	SHA1	Message	Date
xcnick	2c2c95ccd7	fix: output unicode characters error (#925 )	2023-12-01 12:18:26 +08:00
Meng Zhang	ffd5ef3449	fix: avoid llama.cpp's racing (#923 )	2023-11-30 23:52:20 +08:00
Meng Zhang	8d1303d6e4	fix: properly recycle request id (#920 )	2023-11-30 17:01:52 +08:00
Meng Zhang	9c905e4849	feat: add rocm support (#913 ) * Added build configurations for Intel and AMD hardware * Improved rocm build * Added options for OneAPI and ROCm * Build llama using icx * [autofix.ci] apply automated fixes * Fixed rocm image * Build ROCm * Tried to adjust compile flags for SYCL * Removed references to oneAPI * Provide info about the used device for ROCm * Added ROCm documentation * Addressed review comments * Refactored to expose generic accelerator information * Pull request cleanup * cleanup * cleanup * Delete .github/workflows/docker-cuda.yml * Delete .github/workflows/docker-rocm.yml * Delete crates/tabby-common/src/api/accelerator.rs * update * cleanup * update * update * update * update --------- Co-authored-by: Cromefire_ <cromefire+git@pm.me> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>	2023-11-29 03:27:03 +00:00
Meng Zhang	2b131ad1d2	refactor: handle max output length in StopCondition (#910 ) * refactor: handle max output length in StopCondition * trim stop words * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>	2023-11-28 16:57:16 +08:00
Meng Zhang	e92a8c8005	Release 0.7.0-dev http-api-bindings@0.7.0-dev juniper-axum@0.7.0-dev llama-cpp-bindings@0.7.0-dev tabby@0.7.0-dev tabby-common@0.7.0-dev tabby-download@0.7.0-dev tabby-inference@0.7.0-dev tabby-scheduler@0.7.0-dev tabby-webserver@0.7.0-dev Generated by cargo-workspaces	2023-11-27 14:58:58 +08:00
Meng Zhang	b1481b0e2e	chore: release 0.6.0 (#882 ) * add loadtest * release 0.6.0 * Release 0.6.0-rc.0 http-api-bindings@0.6.0-rc.0 juniper-axum@0.6.0-rc.0 llama-cpp-bindings@0.6.0-rc.0 tabby@0.6.0-rc.0 tabby-common@0.6.0-rc.0 tabby-download@0.6.0-rc.0 tabby-inference@0.6.0-rc.0 tabby-scheduler@0.6.0-rc.0 tabby-webserver@0.6.0-rc.0 Generated by cargo-workspaces * Release 0.6.0-rc.1 http-api-bindings@0.6.0-rc.1 juniper-axum@0.6.0-rc.1 llama-cpp-bindings@0.6.0-rc.1 tabby@0.6.0-rc.1 tabby-common@0.6.0-rc.1 tabby-download@0.6.0-rc.1 tabby-inference@0.6.0-rc.1 tabby-scheduler@0.6.0-rc.1 tabby-webserver@0.6.0-rc.1 Generated by cargo-workspaces * Release 0.6.0-rc.2 http-api-bindings@0.6.0-rc.2 juniper-axum@0.6.0-rc.2 llama-cpp-bindings@0.6.0-rc.2 tabby@0.6.0-rc.2 tabby-common@0.6.0-rc.2 tabby-download@0.6.0-rc.2 tabby-inference@0.6.0-rc.2 tabby-scheduler@0.6.0-rc.2 tabby-webserver@0.6.0-rc.2 Generated by cargo-workspaces * Release 0.6.0 http-api-bindings@0.6.0 juniper-axum@0.6.0 llama-cpp-bindings@0.6.0 tabby@0.6.0 tabby-common@0.6.0 tabby-download@0.6.0 tabby-inference@0.6.0 tabby-scheduler@0.6.0 tabby-webserver@0.6.0 Generated by cargo-workspaces	2023-11-27 14:57:45 +08:00
Meng Zhang	23a49beaa9	feat(ci): support manylinux build for cpu / cuda (#899 )	2023-11-26 16:37:12 +08:00
Maciej	ebbe6e5af8	fix: helpful message when llama.cpp submodule is not present (#719 ) (#775 )	2023-11-13 07:51:46 +00:00
Erfan Safari	138b7459c5	feat: add LLAMA_CPP_N_THREADS env (#742 ) * feat: add LLAMA_CPP_N_THREADS and LLAMA_CPP_N_THREADS_BATCH envs * apply format * improve: use LLAMA_CPP_N_THREADS for both n_threads and n_threads_batch * Update crates/llama-cpp-bindings/src/engine.cc --------- Co-authored-by: Meng Zhang <meng@tabbyml.com>	2023-11-09 19:54:23 +00:00
Meng Zhang	8c669dee8e	fix: llama.cpp queuing logic (#741 )	2023-11-09 08:29:54 +00:00
Meng Zhang	cde3602877	feat: sync llama.cpp to latest	2023-11-08 16:06:09 -08:00
Meng Zhang	8ab35b2639	feat: add --parallelism to control throughput and vram usage (#727 ) * feat: add --parallelism to control throughput and vram usage * update default * Revert "update default" This reverts commit 349792c0d48d913dcd8be4ce1c9d7ce887918f29. * cargo fmt	2023-11-08 18:31:22 +00:00
Meng Zhang	1ad0d39903	fix: deadlock between background job and requests (#720 ) * fix: deadlock between background job and requests * refactor: extract LlamaService	2023-11-07 13:11:28 -08:00
Meng Zhang	ca52ac4b01	fix: support cpu only run in llama.cpp cuda build	2023-11-06 22:59:24 -08:00
Meng Zhang	eb7ae96157	fix: llama.cpp requires kv cache to be N_CTX * parallelism (#714 )	2023-11-07 06:16:36 +00:00
Meng Zhang	9344c32b31	fix: when there's an error happens in background inference loop, it should exit the process (#713 )	2023-11-06 20:41:49 +00:00
Meng Zhang	00e0c4fddc	chore: add machete check to ensure no unused dependencies (#701 ) * refactor: remove useless dependencies * add machete	2023-11-05 02:48:05 +00:00
Meng Zhang	64e0abb8cc	fix(llama.cpp): wrongly index for n_seq in warmup	2023-11-04 17:53:22 -07:00
Meng Zhang	c7c67c2f90	fix: llama.cpp warmp logic	2023-11-04 14:28:04 -07:00
Meng Zhang	fc9c9f644b	Release 0.6.0-dev http-api-bindings@0.6.0-dev llama-cpp-bindings@0.6.0-dev tabby@0.6.0-dev tabby-common@0.6.0-dev tabby-download@0.6.0-dev tabby-inference@0.6.0-dev tabby-scheduler@0.6.0-dev Generated by cargo-workspaces	2023-11-03 18:04:12 -07:00
Meng Zhang	ec8d88de0d	chore: release 0.5.0 (#697 ) * Release 0.5.0-rc.0 http-api-bindings@0.5.0-rc.0 llama-cpp-bindings@0.5.0-rc.0 tabby@0.5.0-rc.0 tabby-common@0.5.0-rc.0 tabby-download@0.5.0-rc.0 tabby-inference@0.5.0-rc.0 tabby-scheduler@0.5.0-rc.0 Generated by cargo-workspaces * fix: docker branch tag should only generate when not empty * Release 0.5.0-rc.1 http-api-bindings@0.5.0-rc.1 llama-cpp-bindings@0.5.0-rc.1 tabby@0.5.0-rc.1 tabby-common@0.5.0-rc.1 tabby-download@0.5.0-rc.1 tabby-inference@0.5.0-rc.1 tabby-scheduler@0.5.0-rc.1 Generated by cargo-workspaces * fix: handlebar syntax in meta action * Release 0.5.0-rc.2 http-api-bindings@0.5.0-rc.2 llama-cpp-bindings@0.5.0-rc.2 tabby@0.5.0-rc.2 tabby-common@0.5.0-rc.2 tabby-download@0.5.0-rc.2 tabby-inference@0.5.0-rc.2 tabby-scheduler@0.5.0-rc.2 Generated by cargo-workspaces * fix: handlebar syntax in meta action * Release 0.5.0-rc.3 http-api-bindings@0.5.0-rc.3 llama-cpp-bindings@0.5.0-rc.3 tabby@0.5.0-rc.3 tabby-common@0.5.0-rc.3 tabby-download@0.5.0-rc.3 tabby-inference@0.5.0-rc.3 tabby-scheduler@0.5.0-rc.3 Generated by cargo-workspaces * docs: update change log and docs * fix: collect_snippet should handle NotReady error * Release 0.5.0-rc.4 http-api-bindings@0.5.0-rc.4 llama-cpp-bindings@0.5.0-rc.4 tabby@0.5.0-rc.4 tabby-common@0.5.0-rc.4 tabby-download@0.5.0-rc.4 tabby-inference@0.5.0-rc.4 tabby-scheduler@0.5.0-rc.4 Generated by cargo-workspaces * Release 0.5.0 http-api-bindings@0.5.0 llama-cpp-bindings@0.5.0 tabby@0.5.0 tabby-common@0.5.0 tabby-download@0.5.0 tabby-inference@0.5.0 tabby-scheduler@0.5.0 Generated by cargo-workspaces	2023-11-03 18:02:03 -07:00
Meng Zhang	acb3a33d78	fix: handle non utf-8 / utf-16 error	2023-11-02 16:29:30 -07:00
Meng Zhang	eb34850a5e	fix: output err if step failed	2023-11-02 16:15:11 -07:00
Meng Zhang	4c7eae584e	feat: add model warmup logic (#693 )	2023-11-02 23:07:32 +00:00
Meng Zhang	296342efd8	refactor: use llama.cpp tokenizer (#683 ) * refactor: switch to llama.cpp tokenizer to simplify implementation * refactor: remove tokenizer dependency from tabby * refactor: renaming decoding to stop condition * refactor: remove tokenizer dependency * refactor: remove submodule * chore: update formatting * move tokenization to c++	2023-10-31 22:16:09 +00:00
Meng Zhang	89a63dbf33	fix: when send failed, treat the request as stopped (#673 )	2023-10-30 06:27:09 +00:00
Meng Zhang	7330d75de6	chore: clear cache when there's no active requests	2023-10-29 16:30:30 -07:00
Meng Zhang	7bd99d14c0	feat: support continuous batching in llama.cpp backend (#659 ) * refactor: switch back to llama batch interface * feat: support cont batching	2023-10-28 23:37:05 -07:00
Meng Zhang	444222683a	fix(llama.cpp): bump upstream fix for starcoder model on cuda	2023-10-28 02:03:34 -07:00
Meng Zhang	9309e0314f	fix: fix docker build	2023-10-27 21:25:45 -07:00
Meng Zhang	6dd12ce1ec	fix: adding cuda search path to docker build.	2023-10-27 19:40:35 -07:00
Meng Zhang	2d948639be	fix: docker build for llama cuda backend	2023-10-27 16:36:54 -07:00
Meng Zhang	23bd542cec	feat: switch cuda backend to llama.cpp (#656 ) * feat: switch cuda backend to llama.cpp * fix * fix	2023-10-27 13:41:22 -07:00
Meng Zhang	f37840566b	feat: upgrade llama.cpp (#645 ) * feat: upgrade llama.cpp * update download files * update changelog * Update CHANGELOG.md * Update CHANGELOG.md	2023-10-27 12:18:46 -07:00
Meng Zhang	1a4c2aa71f	feat: swtich cpu backend to llama.cpp (#638 ) * feat: swtich Cpu backend to llama.cpp * feat: switch cpu serving to ggml * fix cargo.toml * use optional dependency * fix compliation * update ci target	2023-10-25 15:40:11 -07:00
Meng Zhang	e171776774	Release 0.5.0-dev ctranslate2-bindings@0.5.0-dev http-api-bindings@0.5.0-dev llama-cpp-bindings@0.5.0-dev rust-cxx-cmake-bridge@0.5.0-dev tabby@0.5.0-dev tabby-common@0.5.0-dev tabby-download@0.5.0-dev tabby-inference@0.5.0-dev tabby-scheduler@0.5.0-dev Generated by cargo-workspaces	2023-10-24 13:05:33 -07:00
Meng Zhang	99a7053b6f	refactor: extract language configuration into individual toml file (#564 ) * refactor: extract language configuration into individual toml file * feat: add golang language configuration (#565)	2023-10-16 00:24:44 +00:00
Meng Zhang	82e893d569	Release 0.4.0-dev ctranslate2-bindings@0.4.0-dev http-api-bindings@0.4.0-dev llama-cpp-bindings@0.4.0-dev rust-cxx-cmake-bridge@0.4.0-dev tabby@0.4.0-dev tabby-common@0.4.0-dev tabby-download@0.4.0-dev tabby-inference@0.4.0-dev tabby-scheduler@0.4.0-dev Generated by cargo-workspaces	2023-10-13 17:54:14 -07:00
Meng Zhang	4dbaf4f312	Release 0.3.0 ctranslate2-bindings@0.3.0 http-api-bindings@0.3.0 llama-cpp-bindings@0.3.0 rust-cxx-cmake-bridge@0.3.0 tabby@0.3.0 tabby-common@0.3.0 tabby-download@0.3.0 tabby-inference@0.3.0 tabby-scheduler@0.3.0 Generated by cargo-workspaces	2023-10-13 17:45:07 -07:00
Meng Zhang	eb463ba496	Release 0.3.0-rc.1 ctranslate2-bindings@0.3.0-rc.1 http-api-bindings@0.3.0-rc.1 llama-cpp-bindings@0.3.0-rc.1 rust-cxx-cmake-bridge@0.3.0-rc.1 tabby@0.3.0-rc.1 tabby-common@0.3.0-rc.1 tabby-download@0.3.0-rc.1 tabby-inference@0.3.0-rc.1 tabby-scheduler@0.3.0-rc.1 Generated by cargo-workspaces	2023-10-13 11:43:34 -07:00
Meng Zhang	182aceed41	Release 0.3.0-rc.0 ctranslate2-bindings@0.3.0-rc.0 http-api-bindings@0.3.0-rc.0 llama-cpp-bindings@0.3.0-rc.0 tabby@0.3.0-rc.0 tabby-common@0.3.0-rc.0 tabby-download@0.3.0-rc.0 tabby-inference@0.3.0-rc.0 tabby-scheduler@0.3.0-rc.0 Generated by cargo-workspaces	2023-10-13 11:24:36 -07:00
Meng Zhang	6dbb712918	Release 0.3.0-dev ctranslate2-bindings@0.3.0-dev http-api-bindings@0.3.0-dev llama-cpp-bindings@0.3.0-dev tabby@0.3.0-dev tabby-common@0.3.0-dev tabby-download@0.3.0-dev tabby-inference@0.3.0-dev tabby-scheduler@0.3.0-dev Generated by cargo-workspaces	2023-10-09 19:39:27 -07:00
Meng Zhang	1731c3075e	chore: Update version to 0.2.0	2023-10-03 13:32:21 -07:00
Meng Zhang	692c2fe0fd	Release 0.2.0-rc.0 ctranslate2-bindings@0.2.0-rc.0 http-api-bindings@0.2.0-rc.0 llama-cpp-bindings@0.2.0-rc.0 tabby@0.2.0-rc.0 tabby-common@0.2.0-rc.0 tabby-download@0.2.0-rc.0 tabby-inference@0.2.0-rc.0 tabby-scheduler@0.2.0-rc.0 Generated by cargo-workspaces	2023-10-02 19:14:12 -07:00
Meng Zhang	f05dd3a2f6	refactor: cleanup chat api make it message oriented (#497 ) * refactor: refactor into /chat/completions api * Revert "feat: support request level stop words (#492)" This reverts commit `0d6840e372`. * feat: adjust interface * switch interface in tabby-playground * move to chat/prompt, add unit test * update interface	2023-10-02 15:39:15 +00:00
Meng Zhang	dfdd0373a6	fix: when llama model loads failed, panic in rust stack	2023-10-01 22:25:25 -07:00
Meng Zhang	2171ba72ff	refactor: cleanup llama cpp implementations to fix warnings (#495 )	2023-09-30 08:37:36 -07:00
Meng Zhang	0d6840e372	feat: support request level stop words (#492 )	2023-09-29 18:21:57 +00:00
Meng Zhang	486e507079	fix: correct Decoding behavior in incremental manner (#491 ) * feat: implement IncrementalDecoding * refactor: use IncrementalDecoding for ctranslate2 * refactor: rename StopWords to DecodingFactory * refactor: move decoding logic to tabby-inference * feat: optimize decoding range * cleanup	2023-09-29 13:06:47 +00:00

66 Commits (b9de9d63a1b5804840fa3b7625836566ac4bc812)