Meng Zhang
486e507079
fix: correct Decoding behavior in incremental manner (#491)
* feat: implement IncrementalDecoding
* refactor: use IncrementalDecoding for ctranslate2
* refactor: rename StopWords to DecodingFactory
* refactor: move decoding logic to tabby-inference
* feat: optimize decoding range
* cleanup
2023-09-29 13:06:47 +00:00
Meng Zhang
5d9ca6928c
feat: update llama.cpp (#488)
* feat: update llama.cpp
* remove useless include
2023-09-28 23:59:59 +00:00
Meng Zhang
56b7b850af
fix: linkage issue on latest Xcode command line tools clang (#486)
2023-09-28 17:46:02 +00:00
Meng Zhang
44f013f26e
feat: add /generate and /generate_streaming (#482)
* feat: add generate_stream interface
* extract engine::create_engine
* feat: add generate::generate
* support streaming in llama.cpp
* support streaming in ctranslate2
* update
* fix formatting
* refactor: extract helper functions
2023-09-28 17:20:50 +00:00
Meng Zhang
97eeb6b926
feat: update llama.cpp to fetch latest starcoder support (#452)
* feat: bump llama.cpp to HEAD
* fix: turn off add_bos by default
2023-09-16 03:41:49 +00:00
Meng Zhang
30afa19bc0
feat: add LLAMA_CPP_LOG_LEVEL to control log level of llama.cpp (#436)
2023-09-12 14:41:39 +00:00
Meng Zhang
ad3b974d5c
feat: implement input truncation for llama-cpp-bindings (#416)
* feat: implement input truncation for llama-cpp-bindings
* set max input length to 1024
* fix: batching tokens with n_batches
* fix batching
2023-09-09 00:20:51 +08:00
Meng Zhang
e93a971d0e
feat: tune llama metal backend performance (#393)
* feat: support eos based stop
* feat: print performance stats after each inference
* update llama.cpp
* update commits
2023-09-05 10:14:29 +08:00
vodkaslime
3c7c8d9293
feat: add cargo test to github actions and run only unit tests in ci [TAB-185] (#390)
* feat: add cargo test to github actions
* chore: fix lint
* chore: add openblas dependency
* chore: update build dependency
* chore: resolve comments
* chore: fix lint
* chore: fix lint
* chore: test installing dependencies
* chore: refactor integ test
* update ci
* cleanup
---------
Co-authored-by: Meng Zhang <meng@tabbyml.com>
2023-09-03 05:04:52 +00:00
Meng Zhang
c8339a2912
refactor: use TabbyML/llama.cpp submodule
2023-09-03 12:38:54 +08:00
Meng Zhang
3acd5d9bc4
refactor: remove llama.cpp subtree
2023-09-03 12:37:26 +08:00
Meng Zhang
92c8ae8ee7
feat: embed ggml-metal.metal
2023-09-03 10:41:03 +08:00
Meng Zhang
ed6c5b2e60
Merge commit 'aad80a58b07836bfbf6aedd50993bc54b4257388' as 'crates/llama-cpp-bindings/llama.cpp'
2023-09-03 10:07:10 +08:00
Meng Zhang
d4137463ef
remove llama.cpp submodule
2023-09-03 10:04:26 +08:00
Meng Zhang
e360b438b4
fix lint
2023-09-03 10:01:28 +08:00
Meng Zhang
3f7aa99b0d
feat: support cancellation in llama backend
2023-09-03 09:59:40 +08:00
Meng Zhang
3573d4378e
feat: llama.cpp for metal support [TAB-146] (#391)
* feat: init commit adding llama-cpp-bindings
* add llama.cpp submodule
* add LlamaEngine to hold llama context / llama model
* add cxxbridge
* add basic greedy sampling
* move files
* make compilation succeed
* connect TextGeneration with LlamaEngine
* experimental support for llama.cpp
* add metal device
* add Accelerate
* fix namespace for llama-cpp-bindings
* fix lint
* move stepping logic to rust
* add stop words package
* use stop-words in ctranslate2-bindings
* use raw string for regex
* use Arc<Tokenizer> for sharing tokenizers
* refactor: remove useless stop_words_encoding_offset
* switch to tokenizers 0.13.4-rc.3
* fix lints in cpp
* simplify implementation of greedy decoding
* feat: split metal feature for llama backend
* add ci
* update ci
* build tabby bin in ci build
2023-09-03 09:59:07 +08:00