Text generation

CTranslate2 exposes high-level classes to run generative language models such as GPT-2. The main entry point is the `Generator` class, which provides several methods:

| Method name | Description | Example |
| --- | --- | --- |
| `generate_batch` | Generate text from a batch of prompts or start tokens. | {ref}`guides/transformers:gpt-2` |
| `score_batch` | Compute the token-level log-likelihood and the sequence perplexity. | {ref}`guides/fairseq:wmt19 language model` |
| `generate_tokens` | Stream the generated tokens. | {ref}`generation:token streaming` |
| `forward_batch` | Get the full output logits (or log probs) for a sequence. | |
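
The two quantities named for `score_batch` are related by a simple formula, which the sketch below works through in plain Python. It assumes natural-log token probabilities and normalization by the number of scored tokens. `perplexity` is a hypothetical helper for illustration, not a CTranslate2 function, and the library may normalize differently.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the negative mean token log-probability.

    Hypothetical helper: assumes natural logs and one value per scored token.
    """
    if not token_log_probs:
        raise ValueError("at least one token log-probability is required")
    mean_log_prob = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_prob)


# Two tokens, each predicted with probability 0.5.
example = [math.log(0.5), math.log(0.5)]
print(perplexity(example))
```

A lower value means the model found the sequence more likely. A sequence whose tokens all have probability 1 would give a perplexity of 1.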

Token streaming

`generate_tokens` is a convenience method that returns tokens as the model generates them. This can be useful when running large models in an interactive environment.

The example below shows how to use this method and progressively decode SentencePiece tokens. It should be adapted if the model uses a different tokenizer or the generated language does not use a space to separate words.

import ctranslate2
import sentencepiece as spm

generator = ctranslate2.Generator("ct2_model/")
sp = spm.SentencePieceProcessor("tokenizer.model")

prompt = "What is the meaning of life?"
prompt_tokens = sp.encode(prompt, out_type=str)

step_results = generator.generate_tokens(
    prompt_tokens,
    sampling_temperature=0.8,
    sampling_topk=20,
    max_length=1024,
)

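# Buffer token IDs until a complete word is available.
output_ids = []

for step_result in step_results:
    # SentencePiece marks the first token of each word with "▁".
    is_new_word = step_result.token.startswith("▁")

    # A new word is starting: decode and print the buffered one.
    if is_new_word and output_ids:
        word = sp.decode(output_ids)
        print(word, end=" ", flush=True)
        output_ids = []

    output_ids.append(step_result.token_id)

# Flush the last word once generation has finished.
if output_ids:
    word = sp.decode(output_ids)
    print(word)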

To implement a similar mechanism for batch generation, you can pass the arguments `callback` and `include_prompt_in_result=False` to the method `generate_batch`. This is what `generate_tokens` uses internally.
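
As a rough sketch of that batch mechanism, the callback can keep one token buffer per batch entry and emit a word whenever the next one starts. `BatchWordStreamer` and `stream_batch` are hypothetical names introduced here, not part of CTranslate2. The sketch assumes the step result passed to the callback exposes `batch_id`, `token`, `token_id`, and `is_last`, that callbacks are used with `beam_size=1`, and that returning `True` from the callback stops decoding for that entry.

```python
from collections import defaultdict


class BatchWordStreamer:
    """Buffer streamed tokens per batch entry and emit whole words.

    Hypothetical helper, not part of CTranslate2.
    """

    def __init__(self, decode, emit):
        self.decode = decode  # e.g. sp.decode for a SentencePiece model
        self.emit = emit  # called as emit(batch_id, word)
        self.buffers = defaultdict(list)

    def __call__(self, step_result):
        buffer = self.buffers[step_result.batch_id]

        # A leading "▁" marks the start of a new SentencePiece word.
        if step_result.token.startswith("▁") and buffer:
            self.emit(step_result.batch_id, self.decode(buffer))
            buffer.clear()

        buffer.append(step_result.token_id)

        # Flush the remaining tokens when this entry has finished.
        if step_result.is_last:
            self.emit(step_result.batch_id, self.decode(buffer))
            buffer.clear()

        # Assumption: returning True would stop decoding for this entry.
        return False


def stream_batch(generator, sp, prompts):
    """Stream words for several prompts (generator and sp as in the example above)."""
    streamer = BatchWordStreamer(
        sp.decode, lambda batch_id, word: print(f"[{batch_id}] {word}", flush=True)
    )
    generator.generate_batch(
        [sp.encode(prompt, out_type=str) for prompt in prompts],
        beam_size=1,
        sampling_temperature=0.8,
        sampling_topk=20,
        max_length=1024,
        include_prompt_in_result=False,
        callback=streamer,
    )
```

Tokens from different batch entries arrive interleaved, so the output is tagged with the batch index rather than printed on a single line.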

Special tokens

Special tokens such as the decoder start token `<s>` should be explicitly included in the input if the model requires them. The generator methods do not add any special tokens.

This differs from the translator methods, which usually add these special tokens implicitly.
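
A minimal sketch of adding the start token by hand is shown below. `add_start_token` is a hypothetical helper, and `<s>` is only an assumed default: check which special tokens, if any, your model expects.

```python
def add_start_token(tokens, start_token="<s>"):
    """Return a copy of tokens that begins with start_token.

    Hypothetical helper: the token is not duplicated if already present.
    """
    if tokens and tokens[0] == start_token:
        return list(tokens)
    return [start_token, *tokens]


prompt_tokens = ["▁What", "▁is", "▁the", "▁meaning", "▁of", "▁life", "?"]
print(add_start_token(prompt_tokens))
```

The resulting list can then be passed to `generate_tokens`, or wrapped in a list and passed to `generate_batch`, in place of the raw prompt tokens.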
