Erfan Safari
138b7459c5
feat: add LLAMA_CPP_N_THREADS env ( #742 )
...
* feat: add LLAMA_CPP_N_THREADS and LLAMA_CPP_N_THREADS_BATCH envs
* apply format
* improve: use LLAMA_CPP_N_THREADS for both n_threads and n_threads_batch
* Update crates/llama-cpp-bindings/src/engine.cc
---------
Co-authored-by: Meng Zhang <meng@tabbyml.com>
2023-11-09 19:54:23 +00:00
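The env-override pattern described in this commit can be sketched as a small pure helper: parse the variable's value if present and well-formed, otherwise fall back to a default. This is an illustrative sketch, not the actual engine code; the variable name comes from the commit, the default of 4 is an assumption.

```rust
use std::env;

/// Parse a thread-count override, falling back to a default when the value
/// is missing or not a valid number. Kept pure so it is easy to test; only
/// `main` touches the real environment.
fn parse_n_threads(raw: Option<&str>, default: usize) -> usize {
    raw.and_then(|v| v.parse::<usize>().ok()).unwrap_or(default)
}

fn main() {
    // Per the commit, the same value is used for both n_threads and
    // n_threads_batch.
    let n_threads = parse_n_threads(env::var("LLAMA_CPP_N_THREADS").ok().as_deref(), 4);
    println!("n_threads = {n_threads}");
}
```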
Meng Zhang
8c669dee8e
fix: llama.cpp queuing logic ( #741 )
2023-11-09 08:29:54 +00:00
Meng Zhang
8ab35b2639
feat: add --parallelism to control throughput and vram usage ( #727 )
...
* feat: add --parallelism to control throughput and vram usage
* update default
* Revert "update default"
This reverts commit 349792c0d48d913dcd8be4ce1c9d7ce887918f29.
* cargo fmt
2023-11-08 18:31:22 +00:00
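Higher `--parallelism` means more concurrent decode slots (more throughput) at the cost of a proportionally larger KV cache in VRAM. A minimal hand-rolled flag scan is sketched below for illustration; the real binary uses a proper argument parser, and the default of 1 is an assumption.

```rust
/// Scan CLI args for a `--parallelism <n>` flag. Hypothetical minimal
/// parsing: finds the first occurrence and parses the following value.
fn parse_parallelism(args: &[String], default: usize) -> usize {
    args.windows(2)
        .find(|w| w[0] == "--parallelism")
        .and_then(|w| w[1].parse::<usize>().ok())
        .unwrap_or(default)
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    println!("parallelism = {}", parse_parallelism(&args, 1));
}
```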
Meng Zhang
1ad0d39903
fix: deadlock between background job and requests ( #720 )
...
* fix: deadlock between background job and requests
* refactor: extract LlamaService
2023-11-07 13:11:28 -08:00
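Extracting a service is a common way to remove this class of deadlock: one worker thread owns the engine state exclusively, and request handlers talk to it over channels instead of sharing a lock with the background loop. The sketch below is a hypothetical illustration of that pattern, not the actual `LlamaService` code.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical service messages: each request carries its own reply
/// channel, so no mutex is shared between handlers and the worker.
enum Msg {
    Generate { prompt: String, reply: mpsc::Sender<String> },
    Shutdown,
}

fn spawn_service() -> (mpsc::Sender<Msg>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Msg>();
    let handle = thread::spawn(move || {
        // The worker thread is the sole owner of the engine state.
        for msg in rx {
            match msg {
                Msg::Generate { prompt, reply } => {
                    // Stand-in for running inference on the owned engine.
                    let _ = reply.send(format!("echo: {prompt}"));
                }
                Msg::Shutdown => break,
            }
        }
    });
    (tx, handle)
}

fn main() {
    let (tx, handle) = spawn_service();
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Msg::Generate { prompt: "hello".into(), reply: reply_tx }).unwrap();
    println!("{}", reply_rx.recv().unwrap());
    tx.send(Msg::Shutdown).unwrap();
    handle.join().unwrap();
}
```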
Meng Zhang
eb7ae96157
fix: llama.cpp requires kv cache to be N_CTX * parallelism ( #714 )
2023-11-07 06:16:36 +00:00
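The sizing rule from this fix is simple arithmetic: llama.cpp's KV cache is shared by all decode slots, so it must hold `n_ctx` tokens for each parallel slot. An illustrative helper:

```rust
/// The KV cache is shared across parallel decode slots, so it must be
/// sized to hold n_ctx tokens per slot (per the commit's fix).
fn kv_cache_tokens(n_ctx: usize, parallelism: usize) -> usize {
    n_ctx * parallelism
}

fn main() {
    // e.g. a 2048-token context with 4 parallel slots.
    println!("{}", kv_cache_tokens(2048, 4));
}
```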
Meng Zhang
9344c32b31
fix: when an error happens in the background inference loop, exit the process ( #713 )
2023-11-06 20:41:49 +00:00
Meng Zhang
64e0abb8cc
fix(llama.cpp): wrong indexing of n_seq in warmup
2023-11-04 17:53:22 -07:00
Meng Zhang
c7c67c2f90
fix: llama.cpp warmup logic
2023-11-04 14:28:04 -07:00
Meng Zhang
acb3a33d78
fix: handle non-UTF-8 / UTF-16 errors
2023-11-02 16:29:30 -07:00
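One defensive way to handle invalid byte sequences in model output, sketched here as an assumption about the fix rather than the actual code, is lossy decoding: invalid sequences are replaced with U+FFFD instead of producing an error.

```rust
/// Decode raw output bytes defensively: `from_utf8_lossy` substitutes
/// U+FFFD for invalid sequences instead of failing.
fn decode_output(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    // 0xFF is never valid UTF-8; the result still decodes cleanly.
    println!("{}", decode_output(&[0x68, 0x69, 0xFF]));
}
```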
Meng Zhang
eb34850a5e
fix: output err if step failed
2023-11-02 16:15:11 -07:00
Meng Zhang
4c7eae584e
feat: add model warmup logic ( #693 )
2023-11-02 23:07:32 +00:00
Meng Zhang
296342efd8
refactor: use llama.cpp tokenizer ( #683 )
...
* refactor: switch to llama.cpp tokenizer to simplify implementation
* refactor: remove tokenizer dependency from tabby
* refactor: renaming decoding to stop condition
* refactor: remove tokenizer dependency
* refactor: remove submodule
* chore: update formatting
* move tokenization to c++
2023-10-31 22:16:09 +00:00
Meng Zhang
89a63dbf33
fix: when sending fails, treat the request as stopped ( #673 )
2023-10-30 06:27:09 +00:00
Meng Zhang
7330d75de6
chore: clear cache when there are no active requests
2023-10-29 16:30:30 -07:00
Meng Zhang
7bd99d14c0
feat: support continuous batching in llama.cpp backend ( #659 )
...
* refactor: switch back to llama batch interface
* feat: support cont batching
2023-10-28 23:37:05 -07:00
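Continuous batching lets new requests join the decode loop as soon as a slot frees up, instead of waiting for a whole batch to finish. The slot bookkeeping at the heart of it can be sketched as below; this is a hypothetical illustration, not the engine's actual data structure.

```rust
/// Hypothetical slot table for continuous batching: each parallel decode
/// slot is either free or owned by a request id.
struct Slots {
    slots: Vec<Option<u64>>,
}

impl Slots {
    fn new(parallelism: usize) -> Self {
        Slots { slots: vec![None; parallelism] }
    }

    /// Claim the first free slot for a request; None means the caller
    /// must queue until a slot is released.
    fn acquire(&mut self, request_id: u64) -> Option<usize> {
        let idx = self.slots.iter().position(|s| s.is_none())?;
        self.slots[idx] = Some(request_id);
        Some(idx)
    }

    /// Free a slot when its request finishes or is cancelled.
    fn release(&mut self, idx: usize) {
        self.slots[idx] = None;
    }
}

fn main() {
    let mut slots = Slots::new(2);
    println!("{:?}", slots.acquire(1));
    println!("{:?}", slots.acquire(2));
    println!("{:?}", slots.acquire(3)); // no free slot
}
```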
Meng Zhang
444222683a
fix(llama.cpp): bump upstream fix for starcoder model on cuda
2023-10-28 02:03:34 -07:00
Meng Zhang
f37840566b
feat: upgrade llama.cpp ( #645 )
...
* feat: upgrade llama.cpp
* update download files
* update changelog
* Update CHANGELOG.md
* Update CHANGELOG.md
2023-10-27 12:18:46 -07:00
Meng Zhang
1a4c2aa71f
feat: switch cpu backend to llama.cpp ( #638 )
...
* feat: switch Cpu backend to llama.cpp
* feat: switch cpu serving to ggml
* fix cargo.toml
* use optional dependency
* fix compilation
* update ci target
2023-10-25 15:40:11 -07:00
Meng Zhang
99a7053b6f
refactor: extract language configuration into individual toml file ( #564 )
...
* refactor: extract language configuration into individual toml file
* feat: add golang language configuration (#565 )
2023-10-16 00:24:44 +00:00
Meng Zhang
f05dd3a2f6
refactor: cleanup chat api make it message oriented ( #497 )
...
* refactor: refactor into /chat/completions api
* Revert "feat: support request level stop words (#492 )"
This reverts commit 0d6840e372.
* feat: adjust interface
* switch interface in tabby-playground
* move to chat/prompt, add unit test
* update interface
2023-10-02 15:39:15 +00:00
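A message-oriented chat API takes a list of role/content messages and renders them into a model prompt internally. The sketch below illustrates that shape; the struct and the prompt layout are assumptions for illustration — real prompt templates are model-specific.

```rust
/// Hypothetical message type for a message-oriented chat API.
struct Message {
    role: String,
    content: String,
}

/// Render messages into a single prompt string. The "role: content"
/// layout here is illustrative only.
fn render_prompt(messages: &[Message]) -> String {
    let mut out = String::new();
    for m in messages {
        out.push_str(&format!("{}: {}\n", m.role, m.content));
    }
    out.push_str("assistant: ");
    out
}

fn main() {
    let messages = vec![Message { role: "user".into(), content: "hi".into() }];
    println!("{}", render_prompt(&messages));
}
```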
Meng Zhang
dfdd0373a6
fix: when llama model loading fails, panic in the Rust stack
2023-10-01 22:25:25 -07:00
Meng Zhang
2171ba72ff
refactor: cleanup llama cpp implementations to fix warnings ( #495 )
2023-09-30 08:37:36 -07:00
Meng Zhang
0d6840e372
feat: support request level stop words ( #492 )
2023-09-29 18:21:57 +00:00
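Request-level stop words mean each request can supply strings that terminate its own generation. A minimal sketch of the truncation step, assuming naive substring matching on the decoded text:

```rust
/// Truncate generated text at the earliest occurrence of any stop word;
/// returns the text unchanged if none appears. Illustrative only.
fn apply_stop_words(text: &str, stop_words: &[&str]) -> String {
    let cut = stop_words
        .iter()
        .filter_map(|w| text.find(w))
        .min()
        .unwrap_or(text.len());
    text[..cut].to_string()
}

fn main() {
    println!("{}", apply_stop_words("fn main() {}\nfn next()", &["\nfn "]));
}
```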
Meng Zhang
486e507079
fix: correct Decoding behavior in an incremental manner ( #491 )
...
* feat: implement IncrementalDecoding
* refactor: use IncrementalDecoding for ctranslate2
* refactor: rename StopWords to DecodingFactory
* refactor: move decoding logic to tabby-inference
* feat: optimize decoding range
* cleanup
2023-09-29 13:06:47 +00:00
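The core idea of incremental decoding is to track the text emitted so far and, on each step, emit only the suffix by which the newly decoded string extends it — this avoids emitting broken output when a token boundary falls mid-character. A hypothetical sketch of that mechanism (not the `IncrementalDecoding` implementation itself):

```rust
/// Hypothetical incremental decoder: given the full decoded string for the
/// current token sequence, emit only the newly produced suffix.
struct IncrementalDecoder {
    prefix: String,
}

impl IncrementalDecoder {
    fn new() -> Self {
        IncrementalDecoder { prefix: String::new() }
    }

    fn step(&mut self, full_text: &str) -> String {
        // Only emit text that extends what was already emitted; if the new
        // decode does not extend the prefix (e.g. a still-incomplete
        // multi-byte character), emit nothing this step.
        if let Some(suffix) = full_text.strip_prefix(self.prefix.as_str()) {
            self.prefix = full_text.to_string();
            suffix.to_string()
        } else {
            String::new()
        }
    }
}

fn main() {
    let mut d = IncrementalDecoder::new();
    println!("{}", d.step("He"));
    println!("{}", d.step("Hello"));
}
```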
Meng Zhang
5d9ca6928c
feat: update llama.cpp ( #488 )
...
* feat: update llama.cpp
* remove useless include
2023-09-28 23:59:59 +00:00
Meng Zhang
44f013f26e
feat: add /generate and /generate_streaming ( #482 )
...
* feat: add generate_stream interface
* extract engine::create_engine
* feat: add generate::generate
* support streaming in llama.cpp
* support streaming in ctranslate2
* update
* fix formatting
* refactor: extract helpers functions
2023-09-28 17:20:50 +00:00
Meng Zhang
97eeb6b926
feat: update llama.cpp to fetch latest starcoder support ( #452 )
...
* feat: bump llama.cpp to HEAD
* fix: turn off add_bos by default
2023-09-16 03:41:49 +00:00
Meng Zhang
30afa19bc0
feat: add LLAMA_CPP_LOG_LEVEL to control log level of llama.cpp ( #436 )
2023-09-12 14:41:39 +00:00
Meng Zhang
ad3b974d5c
feat: implement input truncation for llama-cpp-bindings ( #416 )
...
* feat: implement input truncation for llama-cpp-bindings
* set max input length to 1024
* fix: batching tokens with n_batches
* fix batching
2023-09-09 00:20:51 +08:00
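For code completion, truncation keeps the tail of the prompt, since the tokens nearest the cursor matter most. A sketch of that rule; the 1024 limit comes from the commit bullets, the function name is hypothetical:

```rust
/// Keep only the last `max_input_length` tokens; truncating from the
/// front preserves the text closest to the completion point.
fn truncate_input(tokens: &[u32], max_input_length: usize) -> &[u32] {
    let start = tokens.len().saturating_sub(max_input_length);
    &tokens[start..]
}

fn main() {
    println!("{:?}", truncate_input(&[1, 2, 3, 4, 5], 3));
}
```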
Meng Zhang
e93a971d0e
feat: tune llama metal backend performance ( #393 )
...
* feat: support eos based stop
* feat: print performance stats after each inference
* update llama.cpp
* update commits
2023-09-05 10:14:29 +08:00
Meng Zhang
b0074d7e30
feat: support cancellation in llama backend [TAB-146] ( #392 )
...
* feat: support cancellation in llama backend
* fix lint
2023-09-03 02:15:54 +00:00
Meng Zhang
3573d4378e
feat: llama.cpp for metal support [TAB-146] ( #391 )
...
* feat: init commit adding llama-cpp-bindings
* add llama.cpp submodule
* add LlamaEngine to hold llama context / llama model
* add cxxbridge
* add basic greedy sampling
* move files
* make compile success
* connect TextGeneration with LlamaEngine
* experimental support llama.cpp
* add metal device
* add Accelerate
* fix namespace for llama-cpp-bindings
* fix lint
* move stepping logic to rust
* add stop words package
* use stop-words in ctranslate2-bindings
* use raw string for regex
* use Arc<Tokenizer> for sharing tokenizers
* refactor: remove useless stop_words_encoding_offset
* switch to tokenizers 0.13.4-rc.3
* fix lints in cpp
* simplify implementation of greedy decoding
* feat: split metal feature for llama backend
* add ci
* update ci
* build tabby bin in ci build
2023-09-03 09:59:07 +08:00
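The "basic greedy sampling" mentioned in the bullets above is simply an argmax over the logits. A minimal sketch of that decoding strategy (assuming no NaN logits):

```rust
/// Greedy sampling: pick the token id with the highest logit.
/// Returns None for an empty logit vector; assumes no NaNs.
fn greedy_sample(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
}

fn main() {
    println!("{:?}", greedy_sample(&[0.1, 2.0, 0.5]));
}
```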