Tabby v0.1.1: Metal inference and StarCoder supports in llama.cpp!

We are thrilled to announce the release of Tabby v0.1.1 👏🏻.

Apple M1/M2 Tabby users can now harness Metal inference support on Apple's M1 and M2 chips by using the --device metal flag, thanks to llama.cpp's awesome metal support.

The Tabby team made a contribution by adding support for the StarCoder series models (1B/3B/7B) in llama.cpp, enabling more appropriate model usage on the edge for completion use cases.

llama_print_timings:        load time =   105.15 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =    25.07 ms /     6 tokens (    4.18 ms per token,   239.36 tokens per second)
llama_print_timings:        eval time =   311.80 ms /    28 runs   (   11.14 ms per token,    89.80 tokens per second)
llama_print_timings:       total time =   340.25 ms

Inference benchmarking with StarCoder-1B on Apple M2 Max now takes approximately 340ms, compared to the previous time of around 1790ms. This represents a roughly 5x speed improvement.

This enhancement leads to a significant inference speed upgrade🚀, for example, It marks a meaningful milestone in Tabby's adoption on Apple devices. Check out our Model Directory to discover LLM models with Metal support! 🎁

:::tip Check out latest Tabby updates on Linkedin and Slack community! Our Tabby community is eager for your participation. ❤️ :::

2.0 KiB Raw Blame History

Tabby v0.1.1: Metal inference and StarCoder supports in llama.cpp!

2.0 KiB

Raw Blame History