Completion

Completion models are designed for single-turn tasks that require generating text based on a given prompt but don't necessitate maintaining a conversational history. These models excel in applications like text summarization, code generation, and translation, where the focus is on generating accurate and relevant content in one go, rather than engaging in back-and-forth dialogue. In contrast, Chat models are optimized for interactive, multi-turn conversations, and they are better at understanding and generating nuanced responses within a conversational context. While both types of models are capable of generating text, Completion models are generally more suited for tasks that don't require the complexities of conversational state and context.

This API is designed to be used across multiple providers, and certain parameters may work only with certain models or providers. Please consult the completion models page for more details.

CompletionModel

Description

Handles predictions from a completion model.

__init__(self, model_name: str, **kwargs: Dict[str, Any])

Description

Initializes a CompletionModel instance.

Parameters

  • model_name (str): The name of the model.
  • **kwargs (Dict[str, Any]): Additional keyword arguments.
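
For example, a minimal construction using the text-bison model from the examples below; any extra keyword arguments are forwarded to the underlying model or provider:

from replit.ai.modelfarm import CompletionModel

# Create a handle to the text-bison completion model. Extra keyword
# arguments, if any, are passed through to the underlying provider.
model = CompletionModel("text-bison")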

Method: complete(self, prompts: List[str], max_output_tokens: int = 1024, temperature: float = 0.2, **kwargs: Dict[str, Any]) -> CompletionModelResponse

Description

Makes a generation based on the prompts and parameters.

Parameters

  • prompts (List[str]): The list of prompts.
  • max_output_tokens (int): The maximum number of output tokens. Default is 1024.
  • temperature (float): Controls the randomness of the output. Default is 0.2.
  • **kwargs (Dict[str, Any]): Additional keyword arguments.

Returns

  • CompletionModelResponse: The response from the model.
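
Provider-specific options can be forwarded through **kwargs. A minimal sketch follows; note that top_p is an assumed provider-side sampling parameter, not part of the documented signature above, so consult the completion models page for what your provider actually accepts:

from replit.ai.modelfarm import CompletionModel

model = CompletionModel("text-bison")

# top_p is an assumed provider-specific option forwarded via **kwargs;
# it is not part of the documented complete() signature.
response = model.complete(
    ["Summarize the plot of Hamlet in one sentence."],
    max_output_tokens=128,
    temperature=0.2,
    top_p=0.95,
)
print(response)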

Method: async_complete(self, prompts: List[str], max_output_tokens: int = 1024, temperature: float = 0.2, **kwargs: Dict[str, Any]) -> CompletionModelResponse

Description

Makes an asynchronous generation based on the prompts and parameters.

Parameters

  • prompts (List[str]): The list of prompts.
  • max_output_tokens (int): The maximum number of output tokens. Default is 1024.
  • temperature (float): Controls the randomness of the output. Default is 0.2.
  • **kwargs (Dict[str, Any]): Additional keyword arguments.

Returns

  • CompletionModelResponse: The response from the model.

Method: stream_complete(self, prompts: List[str], max_output_tokens: int = 1024, temperature: float = 0.2, **kwargs: Dict[str, Any]) -> Iterator[CompletionModelResponse]

Description

Streams generations based on the prompts and parameters.

Parameters

  • prompts (List[str]): The list of prompts.
  • max_output_tokens (int): The maximum number of output tokens. Default is 1024.
  • temperature (float): Controls the randomness of the output. Default is 0.2.
  • **kwargs (Dict[str, Any]): Additional keyword arguments.

Returns

  • Iterator[CompletionModelResponse]: An iterator of the responses from the model.

Method: async_stream_complete(self, prompts: List[str], max_output_tokens: int = 1024, temperature: float = 0.2, **kwargs: Dict[str, Any]) -> AsyncIterator[CompletionModelResponse]

Description

Streams generations asynchronously based on the prompts and parameters.

Parameters

  • prompts (List[str]): The list of prompts.
  • max_output_tokens (int): The maximum number of output tokens. Default is 1024.
  • temperature (float): Controls the randomness of the output. Default is 0.2.
  • **kwargs (Dict[str, Any]): Additional keyword arguments.

Returns

  • AsyncIterator[CompletionModelResponse]: An asynchronous iterator of the responses from the model.

Response

The resultant CompletionModelResponse has two main fields:

  • metadata, containing metadata for the call, such as the token count and character count.
  • responses, containing the response choices, where each choice holds the response message with its content, plus additional metadata on the response provided by the specific model used.
print(response.model_dump())
>>> {
    "metadata": {
        "inputTokenCount": {
            "billableTokens": 0,
            "unbilledTokens": 7,
            "billableCharacters": 23,
            "unbilledCharacters": 0,
        },
        "outputTokenCount": {
            "billableTokens": 0,
            "unbilledTokens": 50,
            "billableCharacters": 220,
            "unbilledCharacters": 0,
        },
    },
    "responses": [
        {
            "choices": [
                {
                    "content": " The meaning of life is a deep question that has been pondered by philosophers, theologians, and artists for centuries. There is no one answer that is universally agreed upon, but some common themes that emerge include:\n\n* Finding happiness and fulfillment in one'",
                    "metadata": {
                        "safetyAttributes": {
                            "blocked": False,
                            "categories": ["Health", "Religion & Belief"],
                            "scores": [0.1, 0.7],
                        },
                        "citationMetadata": {"citations": []},
                    },
                }
            ]
        }
    ],
}
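
The same data is available programmatically on the response object. A minimal sketch, assuming attribute access mirrors the dumped structure above (a pydantic-style model may expose the camelCase keys under snake_case Python attribute names):

# Assumes the response object mirrors the dumped structure above.
first_choice = response.responses[0].choices[0]
print(first_choice.content)   # the generated text
print(response.metadata)      # token and character accounting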

Examples

Synchronous

from replit.ai.modelfarm import CompletionModel, CompletionModelResponse

model: CompletionModel = CompletionModel("text-bison")

prompts = ["What is the meaning of life?"]

# synchronous, non-streaming call
response: CompletionModelResponse = model.complete(
    prompts, max_output_tokens=50, temperature=0.2)
print(response)

Synchronous - Streaming

from typing import Iterator

from replit.ai.modelfarm import CompletionModel, CompletionModelResponse

model: CompletionModel = CompletionModel("text-bison")

prompts = ["What is the meaning of life?"]

# synchronous streaming call; stream_complete returns an iterator, not a list
responses: Iterator[CompletionModelResponse] = model.stream_complete(
    prompts, max_output_tokens=50, temperature=0.2)
for response in responses:
    print(response)

Asynchronous

import asyncio
from replit.ai.modelfarm import CompletionModel, CompletionModelResponse

async def main():
model: CompletionModel = CompletionModel("text-bison")

prompt = "What is the meaning of life?"

# asychronous non-streaming call
responses = await model.async_complete([prompt])
for response in responses:
print(response)

asyncio.run(main())

Asynchronous - Streaming

import asyncio
from replit.ai.modelfarm import CompletionModel, CompletionModelResponse

async def main():
model: CompletionModel = CompletionModel("text-bison")

prompt = "What is the meaning of life?"

# asynchronous streaming call
responses = model.async_stream_complete([prompt])
async for async_response in responses:
# each response is a ChatModelResponse
print(async_response)

asyncio.run(main())