tabby/crates/llama-cpp-bindings/include/engine.h

#pragma once

#include "rust/cxx.h"
#include <cstddef>  // size_t
#include <cstdint>  // uint8_t, uint32_t
#include <memory>

namespace llama {
struct StepOutput;

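// Abstract interface for a text inference engine driven one step at a time.
class TextInferenceEngine {
 public:
  virtual ~TextInferenceEngine();

  // Registers a request under request_id with the given prompt text;
  // max_input_length bounds how much of the prompt is used.
  virtual void add_request(uint32_t request_id, rust::Str text, size_t max_input_length) = 0;
  // Cancels the request registered under request_id.
  virtual void stop_request(uint32_t request_id) = 0;
  // Advances decoding by one step and returns the outputs it produced.
  virtual rust::Vec<StepOutput> step() = 0;
};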

// Creates an engine for the model stored at model_path.
std::unique_ptr<TextInferenceEngine> create_engine(
  bool use_gpu,
  rust::Str model_path,
  uint8_t parallelism,
  bool enable_prompt_lookup
);
}  // namespace llama
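
The interface above implies a simple driving loop: register requests, call `step()` repeatedly, and stop a request once enough output has been collected. The sketch below illustrates that loop with a toy engine. It is not the llama.cpp-backed implementation: `Engine`, `EchoEngine`, `collect`, and the fields of `StepOutput` are hypothetical stand-ins, `std::string_view` and `std::vector` replace `rust::Str` and `rust::Vec` so the example compiles without cxx, and keeping the tail of an over-long prompt is an assumption, since the header does not say how `max_input_length` is applied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical layout: engine.h only forward-declares StepOutput.
struct StepOutput {
  uint32_t request_id;
  std::string text;
};

// Same shape as TextInferenceEngine, using std types instead of cxx types.
class Engine {
 public:
  virtual ~Engine() = default;
  virtual void add_request(uint32_t request_id, std::string_view text, size_t max_input_length) = 0;
  virtual void stop_request(uint32_t request_id) = 0;
  virtual std::vector<StepOutput> step() = 0;
};

// Toy engine: every step emits the next character of each pending prompt.
class EchoEngine : public Engine {
 public:
  void add_request(uint32_t request_id, std::string_view text, size_t max_input_length) override {
    // Assumption: an over-long prompt keeps only its last max_input_length characters.
    if (text.size() > max_input_length) {
      text = text.substr(text.size() - max_input_length);
    }
    if (!text.empty()) {
      pending_[request_id] = std::string(text);
    }
  }

  void stop_request(uint32_t request_id) override { pending_.erase(request_id); }

  std::vector<StepOutput> step() override {
    std::vector<StepOutput> outputs;
    for (auto it = pending_.begin(); it != pending_.end();) {
      std::string& rest = it->second;
      assert(!rest.empty());  // empty prompts are never stored
      outputs.push_back({it->first, std::string(1, rest.front())});
      rest.erase(0, 1);
      // Drop requests that have nothing left to emit.
      if (rest.empty()) {
        it = pending_.erase(it);
      } else {
        ++it;
      }
    }
    return outputs;
  }

 private:
  std::map<uint32_t, std::string> pending_;
};

// Steps the engine until request `id` has produced `limit` characters
// or the engine has nothing left to emit, then stops the request.
std::string collect(Engine& engine, uint32_t id, size_t limit) {
  std::string result;
  while (result.size() < limit) {
    std::vector<StepOutput> outputs = engine.step();
    if (outputs.empty()) break;
    for (const StepOutput& output : outputs) {
      if (output.request_id == id) result += output.text;
    }
  }
  engine.stop_request(id);
  return result;
}
```

Because each call to `step()` returns outputs for every pending request, several requests advance together, and `stop_request` removes one of them without disturbing the others.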