From 1fd3adbe0c1ff150ed9b268224e166cb892c72da Mon Sep 17 00:00:00 2001
From: Meng Zhang
Date: Sat, 30 Sep 2023 18:05:12 -0700
Subject: [PATCH] docs: add a snippet explaining the streaming example

---
 website/blog/2023-09-30-stream-laziness-in-tabby/index.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/website/blog/2023-09-30-stream-laziness-in-tabby/index.md b/website/blog/2023-09-30-stream-laziness-in-tabby/index.md
index 0f120c2..b19b91f 100644
--- a/website/blog/2023-09-30-stream-laziness-in-tabby/index.md
+++ b/website/blog/2023-09-30-stream-laziness-in-tabby/index.md
@@ -60,6 +60,7 @@ async function client() {
     // we know our stream is infinite, so there's no need to check `done`.
     const { value } = await reader.read();
     console.log(`read ${value}`);
+    await sleep(10);
   }
 }
 
@@ -67,6 +68,8 @@ server(llm());
 client();
 ```
 
+In this example, we create an async generator to mimic an LLM that produces string tokens, an HTTP endpoint that wraps the generator, and a client that reads values from the HTTP stream. Note that our generator logs `producing ${i}` and our client logs `read ${value}`. LLM inference can take an arbitrary amount of time to complete, which we simulate with a 1000ms sleep in the generator.
+
 ## Stream Laziness
 
 If you were to run this program, you'd notice something interesting. We'll observe the LLM continuing to output `producing ${i}` even after the client has finished reading three times. This might seem obvious, given that the LLM is generating an infinite stream of integers. However, it represents a problem: our server must maintain an ever-expanding queue of items that have been pushed in but not pulled out.
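The diff above only shows the changed hunks of the blog's example. As a standalone illustration of the pattern the new paragraph describes (an async-generator "LLM" plus a client that pauses between reads), the following sketch can be run directly with Node. It omits the HTTP layer, and the `sleep` helper, the 5ms token delay, and the `token-${i}` format are assumptions for this sketch, not part of the blog's actual code.

```javascript
// Promise-based sleep helper, as assumed by the blog's example.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Mimic an LLM emitting string tokens, one every `delayMs` milliseconds.
async function* llm(delayMs = 5) {
  let i = 0;
  while (true) {
    await sleep(delayMs);
    yield `token-${i++}`;
  }
}

// Client: pull three tokens, pausing 10ms between reads like the patched example.
// Breaking out of `for await` closes the (otherwise infinite) generator.
async function client(stream) {
  const read = [];
  for await (const value of stream) {
    read.push(value);
    if (read.length === 3) break;
    await sleep(10);
  }
  return read;
}

client(llm()).then((tokens) => console.log(tokens.join(",")));
// → token-0,token-1,token-2
```

In the real example the generator sits behind an HTTP endpoint and the client pulls from a `ReadableStream` reader, but the producer/consumer timing is the same.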