# Text generation

CTranslate2 exposes high-level classes to run generative language models such as [GPT-2](https://github.com/openai/gpt-2). The main entry point is the [`Generator`](python/ctranslate2.Generator.rst) class, which provides several methods:

| Method name | Description | Example |
| --- | --- | --- |
| `generate_batch` | Generate text from a batch of prompts or start tokens. | {ref}`guides/transformers:gpt-2` |
| `score_batch` | Compute the token-level log-likelihood and the sequence perplexity. | {ref}`guides/fairseq:wmt19 language model` |
| `generate_tokens` | Stream the generated tokens. | {ref}`generation:token streaming` |
| `forward_batch` | Get the full output logits (or log probs) for a sequence. | |

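
As a rough illustration of how the two quantities in the `score_batch` row relate, the sketch below derives a sequence perplexity from token-level log-probabilities as the exponential of their negated mean. The helper name `perplexity` and the sample values are hypothetical; in practice the log-probabilities would come from the scoring results rather than being hard-coded.

```python
import math
from typing import Sequence


def perplexity(log_probs: Sequence[float]) -> float:
    """Return exp(-mean(log_probs)) for one sequence (hypothetical helper)."""
    if not log_probs:
        raise ValueError("at least one token log-probability is required")
    return math.exp(-sum(log_probs) / len(log_probs))


# Made-up token probabilities (0.5, 0.25, 0.5), used purely for illustration.
token_log_probs = [math.log(0.5), math.log(0.25), math.log(0.5)]
ppl = perplexity(token_log_probs)
print(f"perplexity: {ppl:.4f}")
```

Lower values mean the model found the sequence more predictable; a perplexity of 1 would mean every token was assigned probability 1.
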
## Token streaming

`generate_tokens` is a convenience method that returns tokens as the model generates them. This can be useful when running large models in an interactive environment.

The example below shows how to use this method and progressively decode SentencePiece tokens. It should be adapted if the model uses a different tokenizer or if the generated language does not separate words with spaces.

```python
import ctranslate2
import sentencepiece as spm

generator = ctranslate2.Generator("ct2_model/")
sp = spm.SentencePieceProcessor("tokenizer.model")

prompt = "What is the meaning of life?"
prompt_tokens = sp.encode(prompt, out_type=str)

step_results = generator.generate_tokens(
    prompt_tokens,
    sampling_temperature=0.8,
    sampling_topk=20,
    max_length=1024,
)

# Token IDs of the word currently being generated.
output_ids = []

for step_result in step_results:
    # SentencePiece marks the first piece of a new word with "▁".
    is_new_word = step_result.token.startswith("▁")

    # A new word is starting: decode and print the previous one.
    if is_new_word and output_ids:
        word = sp.decode(output_ids)
        print(word, end=" ", flush=True)
        output_ids = []

    output_ids.append(step_result.token_id)

# Flush the last word once generation has finished.
if output_ids:
    word = sp.decode(output_ids)
    print(word)
```

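
The loop above can be generalized so that the word-boundary test and the decoder are swappable, which is one way to adapt it to another tokenizer. This is only a sketch: `stream_words`, `is_word_start`, and the toy vocabulary are hypothetical names introduced here, and the toy decoder stands in for a real call such as `sp.decode`.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def stream_words(
    steps: Iterable[Tuple[str, int]],
    decode: Callable[[List[int]], str],
    is_word_start: Callable[[str], bool] = lambda token: token.startswith("▁"),
) -> Iterator[str]:
    """Yield decoded text whenever a new word begins, then flush the rest."""
    pending: List[int] = []
    for token, token_id in steps:
        if is_word_start(token) and pending:
            yield decode(pending)
            pending = []
        pending.append(token_id)
    if pending:
        yield decode(pending)


# Toy vocabulary standing in for a real tokenizer (illustrative only).
vocab = {0: "▁Hel", 1: "lo", 2: "▁wor", 3: "ld", 4: "!"}


def toy_decode(ids: List[int]) -> str:
    """Join the pieces and turn the "▁" marker back into a space."""
    return "".join(vocab[i] for i in ids).replace("▁", " ").strip()


steps = [(vocab[i], i) for i in [0, 1, 2, 3, 4]]
words = list(stream_words(steps, toy_decode))
print(" ".join(words))
```

With the objects from the example above, the same helper could presumably be fed `((r.token, r.token_id) for r in step_results)` and `sp.decode`. A tokenizer with a different word-boundary convention would only need another `is_word_start` predicate.
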
```{tip}
To implement a similar mechanism for batch generation, you can use the arguments `callback` and `include_prompt_in_result=False` in the method `generate_batch`. This is what `generate_tokens` uses internally.
```

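
A callback for batch generation could, for instance, group streamed tokens by batch entry. The sketch below is not tied to a real model: `FakeStepResult` and `TokenCollector` are hypothetical, and the `batch_id`, `token`, and `is_last` fields are assumptions about what the step result passed to the callback exposes, so check them against the `Generator` reference before relying on them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Set


@dataclass
class FakeStepResult:
    """Stand-in for the object passed to the callback (assumed fields)."""
    batch_id: int
    token: str
    is_last: bool = False


class TokenCollector:
    """Collect streamed tokens per batch entry (hypothetical helper)."""

    def __init__(self) -> None:
        self.tokens: DefaultDict[int, List[str]] = defaultdict(list)
        self.finished: Set[int] = set()

    def __call__(self, step_result) -> bool:
        self.tokens[step_result.batch_id].append(step_result.token)
        if step_result.is_last:
            self.finished.add(step_result.batch_id)
        # Assumption: a truthy return value would ask the generator to stop.
        return False


# Simulated interleaved steps for a batch of two prompts.
collector = TokenCollector()
for step in [
    FakeStepResult(0, "▁Hi"),
    FakeStepResult(1, "▁Yes"),
    FakeStepResult(0, "!", is_last=True),
    FakeStepResult(1, ".", is_last=True),
]:
    collector(step)

print(dict(collector.tokens))
```

An instance would then be passed as `callback=collector`, together with `include_prompt_in_result=False`, when calling `generate_batch`.
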
## Special tokens

Special tokens such as the decoder start token `<s>` should be explicitly included in the input if required by the model. No special tokens are added by the generator methods.

```{note}
This is different from the translator methods, which usually include these special tokens implicitly.
```
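
Since the generator does not add special tokens, a prompt may need to be prefixed by hand. A minimal sketch, assuming the model expects `<s>` as its first token; the helper `add_start_token` is hypothetical, and the right token depends on how the model was trained.

```python
from typing import List


def add_start_token(tokens: List[str], start_token: str = "<s>") -> List[str]:
    """Return a copy of `tokens` that begins with `start_token` exactly once."""
    if tokens and tokens[0] == start_token:
        return list(tokens)
    return [start_token] + list(tokens)


# Tokens as a SentencePiece model might produce them (illustrative values).
prompt_tokens = ["▁What", "▁is", "▁the", "▁meaning", "▁of", "▁life", "?"]
model_input = add_start_token(prompt_tokens)
print(model_input)
```

The resulting list is what would be passed to a generator method in place of the raw prompt tokens.
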