API

DeepSpeed-MII provides a simple API for deploying LLMs, consisting of three functions: mii.pipeline(), mii.serve(), and mii.client().

mii.pipeline(model_name_or_path='', model_config=None, all_rank_output=False, **kwargs)

Creates a non-persistent MII model pipeline from a locally stored model path or HuggingFace model name.

Parameters:

model_name_or_path (str (default: '')) – HuggingFace model name or path to locally stored model. This must be provided here or in the model_config dictionary.
model_config (Optional[Dict] (default: None)) – Dictionary containing model configuration fields. See ModelConfig for a full list of options. Users can pass these options in a dictionary here or as keyword arguments to the function.
all_rank_output (bool (default: False)) – Whether to return generated text on all ranks (when using tensor_parallel>1). If True, all ranks will return the same output. If False, only rank 0 will return output and the rest will return None.

Raises:

UnknownArgument – Raised when a provided keyword argument does not match any field in ModelConfig.

Return type:

MIIPipeline

Returns:

Non-persistent model pipeline using ragged batching and dynamic splitfuse.

The mii.pipeline() API is a convenient way to try DeepSpeed-MII's ragged batching and dynamic splitfuse. The pipeline is non-persistent: it exists only for the lifetime of the Python script that creates it. For examples of how to use mii.pipeline(), see Non-Persistent Pipelines.
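The field list for mii.pipeline() states three rules: configuration can be given as the model_config dictionary or as keyword arguments, an unrecognized keyword raises UnknownArgument, and all_rank_output decides whether ranks other than 0 receive output. The following is a minimal pure-Python sketch of those rules, not MII's implementation. `resolve_model_config`, `rank_output`, the `MODEL_CONFIG_FIELDS` subset, the local `UnknownArgument` class, and the use of `ValueError` for a missing model name are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

# Illustrative subset of ModelConfig field names (assumption; see ModelConfig for the real list).
MODEL_CONFIG_FIELDS = {"model_name_or_path", "tensor_parallel"}


class UnknownArgument(Exception):
    """Hypothetical stand-in for the UnknownArgument error documented above."""


def resolve_model_config(
    model_name_or_path: str = "",
    model_config: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Merge the model_config dict with keyword arguments, rejecting unknown keywords."""
    merged: Dict[str, Any] = dict(model_config or {})
    for key, value in kwargs.items():
        if key not in MODEL_CONFIG_FIELDS:
            raise UnknownArgument(f"'{key}' does not match any field in ModelConfig")
        merged[key] = value  # in this sketch, keywords override dict entries
    if model_name_or_path:
        merged["model_name_or_path"] = model_name_or_path
    if not merged.get("model_name_or_path"):
        # ValueError is an assumption; the reference only says the name is required.
        raise ValueError("model_name_or_path must be passed directly or in model_config")
    return merged


def rank_output(
    rank: int, generated: List[str], all_rank_output: bool = False
) -> Optional[List[str]]:
    """Return the output a given rank receives: all ranks if all_rank_output, else rank 0 only."""
    return generated if all_rank_output or rank == 0 else None
```

In this sketch, keyword arguments override matching dictionary entries; the reference above does not say how MII resolves such conflicts.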

mii.serve(model_name_or_path='', model_config=None, mii_config=None, **kwargs)

Creates a persistent MII model deployment from a locally stored model path or HuggingFace model name.

Parameters:

model_name_or_path (str (default: '')) – HuggingFace model name or path to locally stored model. This must be provided here or in the model_config dictionary.
model_config (Optional[Dict] (default: None)) – Dictionary containing model configuration fields. See ModelConfig for a full list of options. Users can pass these options in a dictionary here or as keyword arguments to the function.
mii_config (Optional[Dict] (default: None)) – Dictionary containing DeepSpeed-MII configuration fields. See MIIConfig for a full list of options. Users can pass these options in a dictionary here or as keyword arguments to the function.

Raises:

UnknownArgument – Raised when a provided keyword argument does not match any field in ModelConfig or MIIConfig.

Return type:

MIIClient

Returns:

Client object that can be used to interface with the deployed persistent model server, which uses ragged batching and dynamic splitfuse.

The mii.serve() API is intended for production use cases where a persistent model deployment is necessary. The persistent deployment uses ragged batching and dynamic splitfuse to deliver high throughput and low latency to multiple clients in parallel. For examples of how to use mii.serve(), see Persistent Deployments.
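Unlike mii.pipeline(), mii.serve() accepts two configuration dictionaries, and each keyword argument must match a field of either ModelConfig or MIIConfig. The following is a minimal pure-Python sketch of that routing rule, not MII's implementation. `split_serve_kwargs`, the local `UnknownArgument` class, and the two field-name subsets are assumptions. In particular, placing `deployment_name` in MIIConfig is assumed for illustration.

```python
from typing import Any, Dict, Optional, Tuple

# Illustrative subsets of field names (assumptions; see ModelConfig and MIIConfig for the real lists).
MODEL_CONFIG_FIELDS = {"model_name_or_path", "tensor_parallel"}
MII_CONFIG_FIELDS = {"deployment_name"}


class UnknownArgument(Exception):
    """Hypothetical stand-in for the UnknownArgument error documented above."""


def split_serve_kwargs(
    model_name_or_path: str = "",
    model_config: Optional[Dict[str, Any]] = None,
    mii_config: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Route each keyword argument to the model config or the MII config."""
    model_cfg: Dict[str, Any] = dict(model_config or {})
    mii_cfg: Dict[str, Any] = dict(mii_config or {})
    for key, value in kwargs.items():
        if key in MODEL_CONFIG_FIELDS:
            model_cfg[key] = value
        elif key in MII_CONFIG_FIELDS:
            mii_cfg[key] = value
        else:
            # Neither config knows this field, mirroring the documented Raises clause.
            raise UnknownArgument(
                f"'{key}' does not match any field in ModelConfig or MIIConfig"
            )
    if model_name_or_path:
        model_cfg["model_name_or_path"] = model_name_or_path
    return model_cfg, mii_cfg
```

Keeping the two dictionaries separate mirrors the split in the signature: model_config describes the model, and mii_config describes the deployment that serves it.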

mii.client(model_or_deployment_name)

Creates a client object for interfacing with an existing persistent model deployment.

Parameters:

model_or_deployment_name (str) – HuggingFace model name of the persistent model deployment. If a deployment_name was provided as input to mii.serve(), provide that deployment_name string instead.

Raises:

UnknownArgument – Raised when a provided keyword argument does not match any field in ModelConfig or MIIConfig.

Return type:

MIIClient

Returns:

Client object that can be used to interface with the deployed persistent model server, which uses ragged batching and dynamic splitfuse.

The mii.client() API allows multiple processes to connect to a persistent deployment created with mii.serve(). For examples of how to use mii.client(), see Persistent Deployments.
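mii.client() locates a deployment by its model name, or by its deployment_name if one was passed to mii.serve(). The following sketch models that lookup with a hypothetical in-process registry (`register_deployment`, `lookup_deployment`, `_DEPLOYMENTS`). A real persistent deployment runs as a separate server process, so this only illustrates the naming rule and is not how MII stores or finds deployments. The fallback from deployment_name to the model name is an assumption inferred from the parameter description.

```python
from typing import Dict, Optional

# Hypothetical registry mapping a deployment's lookup name to the model it serves.
_DEPLOYMENTS: Dict[str, str] = {}


def register_deployment(model_name_or_path: str, deployment_name: Optional[str] = None) -> str:
    """Record a deployment under deployment_name, falling back to the model name (assumed)."""
    name = deployment_name or model_name_or_path
    _DEPLOYMENTS[name] = model_name_or_path
    return name


def lookup_deployment(model_or_deployment_name: str) -> str:
    """Return the model served under the given name, as a client would address it."""
    if model_or_deployment_name not in _DEPLOYMENTS:
        raise KeyError(f"no persistent deployment named {model_or_deployment_name!r}")
    return _DEPLOYMENTS[model_or_deployment_name]
```

In this sketch, a deployment registered with an explicit deployment_name can no longer be found by its model name. This matches the instruction to pass the deployment_name string to mii.client() in that case.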