diff --git a/website/blog/2023-09-13-release-0-1-0-metal/index.md b/website/blog/2023-09-13-release-0-1-0-metal/index.md deleted file mode 100644 index 22800fa..0000000 --- a/website/blog/2023-09-13-release-0-1-0-metal/index.md +++ /dev/null @@ -1,21 +0,0 @@ ---- -authors: [ meng ] ---- -# Highlights of Tabby v0.1.0: Apple M1/M2 Support -We are thrilled to announce the release of Tabby v0.1.0๐Ÿ‘๐Ÿป. - -Thanks to [llama.cpp](https://github.com/ggerganov/llama.cpp), Apple M1/M2 Tabby users can now harness Metal inference support on Apple's M1 and M2 chips by using the `--device metal` flag. - -This enhancement leads to a significant inference speed upgrade๐Ÿš€. It marks a meaningful milestone in Tabby's adoption on Apple devices. Check out our [Model Directory](/docs/models) to discover LLM models with Metal support! ๐ŸŽ - -
- -![Inference](./inference.png) - -*An example inference benchmarking with [CodeLlama-7B](https://huggingface.co/TabbyML/CodeLlama-7B) on Apple M2 Max, takes ~600ms.* - -
-
-:::tip
-Check out latest Tabby updates on [Linkedin](https://www.linkedin.com/company/tabbyml/) and [Slack community](https://join.slack.com/t/tabbycommunity/shared_invite/zt-1xeiddizp-bciR2RtFTaJ37RBxr8VxpA)! Our Tabby community is eager for your participation. ❤️
-:::
diff --git a/website/blog/2023-09-13-release-0-1-0-metal/inference.png b/website/blog/2023-09-13-release-0-1-0-metal/inference.png
deleted file mode 100644
index ae9048c..0000000
--- a/website/blog/2023-09-13-release-0-1-0-metal/inference.png
+++ /dev/null
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:702e9b69b54a0b86731c23d199ffe454a2f03437b25f0fe8c25257e9c71b8877
-size 19495
diff --git a/website/blog/2023-09-18-release-0-1-1-metal/index.md b/website/blog/2023-09-18-release-0-1-1-metal/index.md
new file mode 100644
index 0000000..bb453f0
--- /dev/null
+++ b/website/blog/2023-09-18-release-0-1-1-metal/index.md
@@ -0,0 +1,30 @@
+---
+authors: [ meng ]
+---
+# Highlights of Tabby v0.1.1: Apple M1/M2 Support
+We are thrilled to announce the release of Tabby [v0.1.1](https://github.com/TabbyML/tabby/releases/tag/v0.1.1) 👏🏻.
+
+Tabby users on Apple M1 and M2 chips can now harness Metal inference support by passing the `--device metal` flag, thanks to [llama.cpp](https://github.com/ggerganov/llama.cpp)'s awesome Metal support.
+
+The Tabby team [contributed](https://github.com/ggerganov/llama.cpp/pull/3187) support for the StarCoder series models (1B/3B/7B) to llama.cpp, making more appropriately sized models available for code completion on edge devices.
+
+
+ +``` +llama_print_timings: load time = 105.15 ms +llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_print_timings: prompt eval time = 25.07 ms / 6 tokens ( 4.18 ms per token, 239.36 tokens per second) +llama_print_timings: eval time = 311.80 ms / 28 runs ( 11.14 ms per token, 89.80 tokens per second) +llama_print_timings: total time = 340.25 ms +``` + +*Inference benchmarking with [StarCoder-1B](https://huggingface.co/TabbyML/StarCoder-1B) on Apple M2 Max now takes approximately 340ms, compared to the previous time of around 1790ms. This represents a roughly 5x speed improvement.* + +
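+To try it out, a minimal launch command could look like the sketch below (assuming a standard `tabby serve` setup; StarCoder-1B is used here purely as an example, see the [Model Directory](/docs/models) for other models with Metal support):
+
+```
+# Example sketch: serve Tabby with Metal acceleration on Apple Silicon
+tabby serve --model TabbyML/StarCoder-1B --device metal
+```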
+
+
+This enhancement brings a significant inference speed upgrade🚀: in the benchmark above, inference with StarCoder-1B on an Apple M2 Max runs roughly 5x faster than before. It marks a meaningful milestone in Tabby's adoption on Apple devices. Check out our [Model Directory](/docs/models) to discover LLM models with Metal support! 🎁
+
+:::tip
+Check out the latest Tabby updates on [LinkedIn](https://www.linkedin.com/company/tabbyml/) and in our [Slack community](https://join.slack.com/t/tabbycommunity/shared_invite/zt-1xeiddizp-bciR2RtFTaJ37RBxr8VxpA)! Our Tabby community is eager for your participation. ❤️
+:::