
# Text generation

CTranslate2 exposes high-level classes to run generative language models such as [GPT-2](https://github.com/openai/gpt-2). The main entry point is the [`Generator`](python/ctranslate2.Generator.rst) class, which provides several methods:

| Method name | Description | Example |
| --- | --- | --- |
| `generate_batch` | Generate text from a batch of prompts or start tokens. | {ref}`guides/transformers:gpt-2` |
| `score_batch` | Compute the token-level log-likelihood and the sequence perplexity. | {ref}`guides/fairseq:wmt19 language model` |
| `generate_tokens` | Stream the generated tokens. | {ref}`generation:token streaming` |
| `forward_batch` | Get the full output logits (or log probs) for a sequence. | |
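
As an illustration of how the token-level log-likelihoods relate to the sequence perplexity mentioned for `score_batch`, here is a minimal sketch. It assumes each scored sequence comes with one natural-log probability per token; the helper name `sequence_perplexity` and the example values are hypothetical and not part of the CTranslate2 API.

```python
import math


def sequence_perplexity(log_probs):
    """Return exp(-mean(log_probs)) for a list of natural-log token probabilities."""
    if not log_probs:
        raise ValueError("log_probs must not be empty")
    return math.exp(-sum(log_probs) / len(log_probs))


# Stand-in values for illustration; in practice these would come from
# the per-token scores of a scoring result (assumption).
example_log_probs = [math.log(0.5), math.log(0.25), math.log(0.5)]
perplexity = sequence_perplexity(example_log_probs)
print(f"perplexity = {perplexity:.4f}")
```

Lower values mean the model assigns higher probability to the sequence; a perplexity of 1 would mean every token was predicted with probability 1.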

## Token streaming

`generate_tokens` is a convenience method to return tokens as they are generated by the model. This can be useful when running large models in an interactive environment.

The example below shows how to use this method and progressively decode SentencePiece tokens. It should be adapted if the model uses a different tokenizer or the generated language does not use a space to separate words.

```python
import ctranslate2
import sentencepiece as spm

generator = ctranslate2.Generator("ct2_model/")
sp = spm.SentencePieceProcessor("tokenizer.model")

prompt = "What is the meaning of life?"
prompt_tokens = sp.encode(prompt, out_type=str)

step_results = generator.generate_tokens(
    prompt_tokens,
    sampling_temperature=0.8,
    sampling_topk=20,
    max_length=1024,
)

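# Token IDs of the word currently being assembled.
output_ids = []

for step_result in step_results:
    # SentencePiece marks the first piece of a new word with "▁".
    is_new_word = step_result.token.startswith("▁")

    # A new word begins: decode and print the previous one.
    if is_new_word and output_ids:
        word = sp.decode(output_ids)
        print(word, end=" ", flush=True)
        output_ids = []

    output_ids.append(step_result.token_id)

# Flush the last word once generation ends.
if output_ids:
    word = sp.decode(output_ids)
    print(word)
```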
```

```{tip}
To implement a similar mechanism for batch generation, you can use the arguments `callback` and `include_prompt_in_result=False` in the method `generate_batch`. This is what `generate_tokens` uses internally.
```
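
The tip above could be sketched as a callable that groups streamed tokens by prompt. This is only an outline under stated assumptions: the step result passed to the callback is assumed to expose `batch_id` and `token` attributes, and returning `False` is assumed to let generation continue. `TokenCollector` and `FakeStepResult` are hypothetical names used so the example runs without a model.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the step result object passed to the callback
# (field names are assumptions).
FakeStepResult = namedtuple("FakeStepResult", ["batch_id", "token", "token_id"])


class TokenCollector:
    """Callable that stores each generated token under its batch index."""

    def __init__(self):
        self.tokens = defaultdict(list)

    def __call__(self, step_result):
        self.tokens[step_result.batch_id].append(step_result.token)
        # Assumption: a falsy return value does not request an early stop.
        return False


collector = TokenCollector()

# Simulated interleaved steps for a batch of two prompts.
for step in [
    FakeStepResult(0, "▁Hello", 10),
    FakeStepResult(1, "▁Good", 20),
    FakeStepResult(0, "▁world", 11),
    FakeStepResult(1, "bye", 21),
]:
    collector(step)

print(dict(collector.tokens))
```

With a real model, an instance like `collector` would be passed as the `callback` argument of `generate_batch`, together with `include_prompt_in_result=False`.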

## Special tokens

Special tokens such as the decoder start token `<s>` should be explicitly included in the input if required by the model. No special tokens are added by the generator methods.

```{note}
This is different from the translator methods, which usually include these special tokens implicitly.
```
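
Since the generator methods add no special tokens, the prompt tokens have to carry them. A minimal sketch, assuming the model expects `<s>` as its start token (check the model's own vocabulary); the helper `with_start_token` is hypothetical and not part of CTranslate2.

```python
def with_start_token(tokens, start_token="<s>"):
    """Return a copy of tokens that begins with start_token (added only if missing)."""
    tokens = list(tokens)
    if tokens and tokens[0] == start_token:
        return tokens
    return [start_token] + tokens


# Example SentencePiece-style pieces; the values are illustrative only.
prompt_tokens = ["▁What", "▁is", "▁life", "?"]
model_input = with_start_token(prompt_tokens)
print(model_input)
```

The resulting list is what would be passed as the prompt to a method such as `generate_tokens` or `generate_batch`.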