docs: add a snippet explaining the streaming example

release-0.2
Meng Zhang 2023-09-30 18:05:12 -07:00 committed by GitHub
parent f7ecab5bca
commit 1fd3adbe0c
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 3 additions and 0 deletions


@@ -60,6 +60,7 @@ async function client() {
    // we know our stream is infinite, so there's no need to check `done`.
    const { value } = await reader.read();
    console.log(`read ${value}`);
    await sleep(10); // pause 10 ms between reads (assumes `sleep` takes milliseconds)
  }
}
@@ -67,6 +68,8 @@ server(llm());
client();
```
In this example, we create an async generator to mimic an LLM that produces string tokens. We then create an HTTP endpoint that wraps the generator, as well as a client that reads values from the HTTP stream. Note that our generator logs `producing ${i}`, and our client logs `read ${value}`. The LLM inference could take an arbitrary amount of time to complete, which the generator simulates with a 1000ms sleep.
## Stream Laziness
If you were to run this program, you'd notice something interesting. We'll observe the LLM continuing to output `producing ${i}` even after the client has finished reading three times. This might seem obvious, given that the LLM is generating an infinite stream of integers. However, it represents a problem: our server must maintain an ever-expanding queue of items that have been pushed in but not pulled out.
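
To make the queue growth concrete, here is a minimal sketch that contrasts a producer that pushes ahead of demand with one that advances only when asked. It is not the server code from the example: `tokens`, `pushBacklog`, and `pullBacklog` are hypothetical names, and a synchronous generator with a plain array stands in for the LLM and the stream's internal queue.

```javascript
// Hypothetical stand-in for the LLM: yields 1, 2, 3, ... forever
// and records a "producing" entry each time it is advanced.
function* tokens(log) {
  let i = 1;
  while (true) {
    log.push(`producing ${i}`);
    yield i++;
  }
}

// Push style: the producer runs `produced` steps regardless of demand,
// so every value that has not been read yet must wait in a queue.
function pushBacklog(produced, reads) {
  const log = [];
  const queue = [];
  const source = tokens(log);
  for (let n = 0; n < produced; n++) queue.push(source.next().value);
  for (let n = 0; n < reads; n++) queue.shift();
  return { producedCount: log.length, queued: queue.length };
}

// Pull style: the producer only advances when the consumer asks for a value,
// so nothing accumulates between the two.
function pullBacklog(reads) {
  const log = [];
  const source = tokens(log);
  for (let n = 0; n < reads; n++) source.next();
  return { producedCount: log.length, queued: 0 };
}
```

With three reads, `pushBacklog(10, 3)` leaves seven values waiting in the queue, and that backlog keeps growing for as long as the producer runs. `pullBacklog(3)` produces exactly three values and queues none.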