// tabby/crates/llama-cpp-bindings/include/engine.h

#pragma once

#include "rust/cxx.h"
#include <cstddef>  // size_t
#include <cstdint>  // uint32_t
#include <memory>

namespace llama {

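// Abstract text-generation engine implemented in C++ and called from Rust via cxx.
class TextInferenceEngine {
 public:
  virtual ~TextInferenceEngine();

  // Begins a new generation: evaluates `prompt`, truncated to at most
  // `max_input_length` tokens, and returns the id of the first generated token.
  virtual uint32_t start(const rust::Str prompt, size_t max_input_length) const = 0;

  // Feeds `next_token_id` back into the model and returns the id of the
  // following generated token.
  virtual uint32_t step(uint32_t next_token_id) const = 0;

  // Ends the current generation; call once decoding has stopped.
  virtual void end() const = 0;

  // Returns the model's end-of-sequence token id, used to detect when to stop.
  virtual uint32_t eos_token() const = 0;
};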

// Loads the model at `model_path` and returns an engine backed by it.
std::shared_ptr<TextInferenceEngine> create_engine(rust::Str model_path);
}  // namespace llama