Commit Graph

8 Commits (296342efd84e862d4c3de7a6e2b359288d46b7fb)

Author SHA1 Message Date
Meng Zhang 296342efd8
refactor: use llama.cpp tokenizer (#683)
* refactor: switch to llama.cpp tokenizer to simplify implementation

* refactor: remove tokenizer dependency from tabby

* refactor: renaming decoding to stop condition

* refactor: remove tokenizer dependency

* refactor: remove submodule

* chore: update formatting

* move tokenization to c++
2023-10-31 22:16:09 +00:00
Meng Zhang 7bd99d14c0
feat: support continuous batching in llama.cpp backend (#659)
* refactor: switch back to llama batch interface

* feat: support cont batching
2023-10-28 23:37:05 -07:00
Meng Zhang 1a4c2aa71f
feat: switch cpu backend to llama.cpp (#638)
* feat: switch cpu backend to llama.cpp

* feat: switch cpu serving to ggml

* fix cargo.toml

* use optional dependency

* fix compilation

* update ci target
2023-10-25 15:40:11 -07:00
Meng Zhang 2171ba72ff
refactor: cleanup llama cpp implementations to fix warnings (#495)
2023-09-30 08:37:36 -07:00
Meng Zhang 486e507079
fix: correct Decoding behavior in incremental manner (#491)
* feat: implement IncrementalDecoding

* refactor: use IncrementalDecoding for ctranslate2

* refactor: rename StopWords to DecodingFactory

* refactor: move decoding logic to tabby-inference

* feat: optimize decoding range

* cleanup
2023-09-29 13:06:47 +00:00
Meng Zhang ad3b974d5c
feat: implement input truncation for llama-cpp-bindings (#416)
* feat: implement input truncation for llama-cpp-bindings

* set max input length to 1024

* fix: batching tokens with n_batches

* fix batching
2023-09-09 00:20:51 +08:00
Meng Zhang e93a971d0e
feat: tune llama metal backend performance (#393)
* feat: support eos based stop

* feat: print performance stats after each inference

* update llama.cpp

* update commits
2023-09-05 10:14:29 +08:00
Meng Zhang 3573d4378e
feat: llama.cpp for metal support [TAB-146] (#391)
* feat: init commit adding llama-cpp-bindings

* add llama.cpp submodule

* add LlamaEngine to hold llama context / llama model

* add cxxbridge

* add basic greedy sampling

* move files

* make compile success

* connect TextGeneration with LlamaEngine

* experimental support llama.cpp

* add metal device

* add Accelerate

* fix namespace for llama-cpp-bindings

* fix lint

* move stepping logic to rust

* add stop words package

* use stop-words in ctranslate2-bindings

* use raw string for regex

* use Arc<Tokenizer> for sharing tokenizers

* refactor: remove useless stop_words_encoding_offset

* switch to tokenizers 0.13.4-rc.3

* fix lints in cpp

* simplify implementation of greedy decoding

* feat: split metal feature for llama backend

* add ci

* update ci

* build tabby bin in ci build
2023-09-03 09:59:07 +08:00