docs: add a snippet explaining the streaming example
parent f7ecab5bca
commit 1fd3adbe0c
````diff
@@ -60,6 +60,7 @@ async function client() {
+    // we know our stream is infinite, so there's no need to check `done`.
     const { value } = await reader.read();
     console.log(`read ${value}`);
     await sleep(10);
   }
 }
````
````diff
@@ -67,6 +68,8 @@ server(llm());
 client();
 ```
````
In this example, we create an async generator that mimics an LLM producing string tokens. We then create an HTTP endpoint that wraps the generator, as well as a client that reads values from the HTTP stream. Note that the generator logs `producing ${i}` and the client logs `read ${value}`. LLM inference can take an arbitrary amount of time to complete, simulated here by a 1000ms sleep in the generator.
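For reference, here is a minimal in-process sketch of the pieces the example wires together: the `sleep` helper, the infinite `llm` async generator, and the client's reading loop. This is illustrative, not the example's actual code; the `token-${i}` payload and the shortened 10ms delay are assumptions, and the real example streams the tokens over HTTP rather than consuming the generator directly.

```javascript
// Minimal in-process sketch (no HTTP): the generator and the reading loop.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Async generator that mimics an LLM emitting an infinite stream of tokens.
async function* llm() {
  for (let i = 0; ; i++) {
    console.log(`producing ${i}`);
    await sleep(10); // stands in for the example's 1000ms inference delay
    yield `token-${i}`; // the token payload here is illustrative
  }
}

// Client that pulls three values, mirroring the loop shown in the diff.
async function client() {
  const gen = llm();
  for (let n = 0; n < 3; n++) {
    // we know our stream is infinite, so there's no need to check `done`.
    const { value } = await gen.next();
    console.log(`read ${value}`);
  }
}

client();
```

Because an async generator only runs when `next()` is awaited, this in-process version produces exactly as many tokens as the client reads; it is the HTTP wrapper that breaks this property, as the next section shows.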
## Stream Laziness
If you were to run this program, you'd notice something interesting: the LLM keeps logging `producing ${i}` even after the client has finished reading three times. This might seem obvious, given that the LLM generates an infinite stream of integers. But it points to a real problem: our server must maintain an ever-growing queue of items that have been pushed in but never pulled out.
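The growing queue is easy to observe with a web `ReadableStream` (global in browsers and in Node 18+). The sketch below is illustrative rather than the example's server code: a push-style source enqueues chunks without waiting for the consumer, and `controller.desiredSize`, which starts at the default highWaterMark of 1, goes increasingly negative as the internal queue fills up.

```javascript
// Sketch of the buffering problem: an eager source pushes everything up front,
// so the stream's internal queue holds every chunk until someone reads.
function makeEagerStream(chunks) {
  let sizeAfterPush;
  const stream = new ReadableStream({
    start(controller) {
      for (const chunk of chunks) {
        controller.enqueue(chunk); // pushed in, but nothing has pulled it out
      }
      // desiredSize = highWaterMark (1 by default) minus queued chunk count.
      sizeAfterPush = controller.desiredSize;
      controller.close();
    },
  });
  return { stream, getSizeAfterPush: () => sizeAfterPush };
}

const eager = makeEagerStream([0, 1, 2, 3, 4]);
console.log(`desiredSize after eager pushes: ${eager.getSizeAfterPush()}`); // 1 - 5 = -4
```

A negative `desiredSize` is the stream signaling backpressure; an eager producer that ignores it is exactly the ever-growing queue described above.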