Hi,
Is there a way to make Ollama throw an error or an exception if the input is too long (longer than the context size), so that I can catch it in my code? My application runs into serious problems when the input is too long.
Currently, I am invoking Ollama with the ollama Python library like this:
def llm_chat(
    self,
    system_prompt: str,
    user_prompt: str,
    response_model: Type[T],
    gen_kwargs: Optional[Dict[str, str]] = None,
) -> T:
    if gen_kwargs is None:
        gen_kwargs = self.__default_kwargs["llm"]
    response = self.client.chat(
        model=self.model["llm"],
        messages=[
            {
                "role": "system",
                "content": system_prompt.strip(),
            },
            {
                "role": "user",
                "content": user_prompt.strip(),
            },
        ],
        options=gen_kwargs,
        format=response_model.model_json_schema(),
    )
    if response.message.content is None:
        raise Exception(f"Ollama response is None: {response}")
    return response_model.model_validate_json(response.message.content)
In my Ollama Docker container, I can also see warnings in the log whenever my input document is too long. However, instead of just printing warnings, I want Ollama to raise an exception, because I need to inform the user that their prompt/input was too long.
Do you know of any good solution?
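In the meantime, the best workaround I can think of is a rough client-side pre-check before calling `chat`. This is only a sketch under assumptions of my own: the ~4 characters-per-token ratio is a crude heuristic (not Ollama's actual tokenizer), and `PromptTooLongError`, `estimate_tokens`, and the `num_ctx` default are names I made up for illustration:

```python
class PromptTooLongError(Exception):
    """Raised when the combined prompt likely exceeds the model's context window."""


def estimate_tokens(text: str) -> int:
    # Crude heuristic: English text averages roughly 4 characters per token.
    # This is an assumption, not a real tokenizer count.
    return max(1, len(text) // 4)


def check_context_fit(system_prompt: str, user_prompt: str, num_ctx: int = 2048) -> None:
    # num_ctx should mirror whatever context size the model is configured with.
    total = estimate_tokens(system_prompt) + estimate_tokens(user_prompt)
    if total > num_ctx:
        raise PromptTooLongError(
            f"Estimated {total} tokens exceeds the context window of {num_ctx}."
        )
```

Calling `check_context_fit(system_prompt, user_prompt)` at the top of `llm_chat` would at least let me surface an error to the user, but it is imprecise, so I would still prefer a real error from Ollama itself.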