📄️ Input Format
The input params are exactly the same as the OpenAI Create chat completion, so you can call any supported provider in the same format.
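For example, a minimal sketch of a call using the OpenAI-style params (the model name and a provider key in your environment are assumptions):

```python
import os
from litellm import completion

# Assumes a provider key is set, e.g. OPENAI_API_KEY.
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
```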
📄️ Output Format
Here's the exact JSON output and type you can expect from all LiteLLM completion calls, for all models.
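A sketch of reading that OpenAI-style response shape (the model name is an assumption):

```python
from litellm import completion

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hi"}],
)

# Every model returns this same OpenAI-style shape.
print(response["choices"][0]["message"]["content"])  # assistant reply
print(response["usage"])                             # token counts
```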
📄️ Streaming + Async
- Streaming responses
- Async completion calls
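A minimal sketch of both patterns, assuming gpt-3.5-turbo as the model:

```python
import asyncio
from litellm import completion, acompletion

# Streaming: pass stream=True and iterate over the chunks.
for chunk in completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True,
):
    print(chunk.choices[0].delta.content or "", end="")

# Async: acompletion() is the awaitable counterpart of completion().
async def main():
    resp = await acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)

asyncio.run(main())
```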
📄️ Trimming Input Messages
Use litellm.trim_messages() to ensure your messages don't exceed a model's token limit or a specified max_tokens.
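A sketch of trimming before sending (the oversized prompt is a stand-in):

```python
from litellm import completion, trim_messages

messages = [{"role": "user", "content": "a very long prompt " * 5000}]

# Trim the messages to fit gpt-3.5-turbo's context window before calling.
response = completion(
    model="gpt-3.5-turbo",
    messages=trim_messages(messages, "gpt-3.5-turbo"),
)
```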
📄️ Model Alias
The model name you show an end-user might be different from the one you pass to LiteLLM, e.g. displaying GPT-3.5 to the user while calling gpt-3.5-turbo-16k on the backend.
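A sketch using litellm.model_alias_map (the alias name is an assumption):

```python
import litellm
from litellm import completion

# Map the user-facing alias to the real backend model.
litellm.model_alias_map = {"GPT-3.5": "gpt-3.5-turbo-16k"}

response = completion(
    model="GPT-3.5",  # resolved to gpt-3.5-turbo-16k
    messages=[{"role": "user", "content": "Hey"}],
)
```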
📄️ Reliability
LiteLLM supports helper functions for reliability, including retries and model fallbacks.
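For instance, a retry sketch with completion_with_retries() (the retry count is an arbitrary choice; check your LiteLLM version for the supported kwargs):

```python
from litellm import completion_with_retries

# Retries the underlying completion() call on failure.
response = completion_with_retries(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
    num_retries=3,
)
```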
📄️ Batching Completion() Calls
In the batch_completion method, you provide a list of message lists; each sub-list is passed to litellm.completion(), letting you process multiple prompts efficiently in a single function call.
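A sketch with two prompts (the model choice is an assumption):

```python
from litellm import batch_completion

responses = batch_completion(
    model="gpt-3.5-turbo",
    # One sub-list of messages per prompt.
    messages=[
        [{"role": "user", "content": "Good morning?"}],
        [{"role": "user", "content": "What's the capital of France?"}],
    ],
)

for r in responses:
    print(r["choices"][0]["message"]["content"])
```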
📄️ Mock Requests
For testing purposes, you can use mock_completion() to mock calling the completion endpoint.
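A sketch (the mock_response value is an assumption):

```python
from litellm import mock_completion

# No provider is called; a canned reply comes back in the usual shape.
response = mock_completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, I'm a mock request"}],
    mock_response="It's a mock response",
)
print(response["choices"][0]["message"]["content"])
```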