Response Objects

Generated text from Non-Persistent Pipelines and Persistent Deployments is wrapped in the Response class.

class mii.batching.data_classes.Response(generated_text, prompt_length, generated_length, finish_reason)

Response object returned from text-generation pipelines and persistent deployments.

generated_text: str: The generated text.

prompt_length: int: Number of tokens in the prompt.

generated_length: int: Number of generated tokens.

finish_reason: GenerationFinishReason: Reason for ending generation. One of mii.constants.GenerationFinishReason.

Printing a Response object prints only its generated_text attribute. Details about the generation can be accessed as Python attributes of the object:

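# Assumes `pipeline` was created earlier, e.g. with mii.pipeline(...)
responses = pipeline(["DeepSpeed is", "Seattle is"], max_length=128)
for r in responses:
    # Each r is a Response; these attributes describe how generation ended
    print(f"generated length: {r.generated_length}, finish reason: {r.finish_reason}")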

The reason that a text-generation request completed will be one of the values found in the GenerationFinishReason enum:

class mii.constants.GenerationFinishReason(value)

Reason for text-generation to stop.

STOP = 'stop': Reached an EoS token.

LENGTH = 'length': Reached max_length or max_new_tokens.
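
One common use of finish_reason is spotting responses that were cut off by a length limit rather than ending at an EoS token. The sketch below illustrates that pattern without requiring MII to be installed: the Response and GenerationFinishReason classes in it are simplified stand-ins that mirror only the fields and values listed above, not MII's actual implementation, and the truncated helper and sample data are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class GenerationFinishReason(Enum):
    """Stand-in for mii.constants.GenerationFinishReason (values as listed above)."""
    STOP = "stop"
    LENGTH = "length"


@dataclass
class Response:
    """Stand-in with the same four fields as mii.batching.data_classes.Response."""
    generated_text: str
    prompt_length: int
    generated_length: int
    finish_reason: GenerationFinishReason

    def __str__(self) -> str:
        # Mirrors the documented behavior: printing shows only the generated text.
        return self.generated_text


def truncated(responses):
    """Hypothetical helper: responses that stopped because a length limit was hit."""
    return [r for r in responses if r.finish_reason == GenerationFinishReason.LENGTH]


# Illustrative data only; a real pipeline or deployment would return these objects.
responses = [
    Response(" a deep learning library.", 3, 6, GenerationFinishReason.STOP),
    Response(" a city in the Pacific", 2, 5, GenerationFinishReason.LENGTH),
]

for r in responses:
    print(r)  # only generated_text is printed
    print(f"  prompt tokens: {r.prompt_length}, generated tokens: {r.generated_length}, "
          f"finish reason: {r.finish_reason.value}")

cut_off = truncated(responses)
```

With real Response objects the same check applies: a response whose finish_reason is LENGTH may be incomplete, so a caller might retry it with a larger max_length or max_new_tokens.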