ChatTurn
```python
def ChatTurn(refresh_display: Callable = None):
```
Initialize self. See help(type(self)) for accurate signature.
```python
turns = ChatTurns(refresh_notebook_display(hide_thinking=False, hide_tool_calls=False))
turn = turns.new_turn()
turn.append_thinking("I think a lot longer.\nIn sentences.\n\nWith line breaks.")
time.sleep(0.5)
turn.append_content("I need to call 2 tools.")
time.sleep(0.5)
turn.append_tool_call("myfunc", {"param1": "value1", "param2": "value2"})
turn.start_tool_call("myfunc")
time.sleep(0.5)
turn.end_tool_call("myfunc", 17.43)
turn1 = turn
turn = turns.new_turn()
turn.append_thinking("Ok, I got the first result, now call the second tool.")
time.sleep(0.5)
turn.append_tool_call("myfunc2", {})
turn.start_tool_call("myfunc2")
time.sleep(0.5)
turn.end_tool_call("myfunc2", "The weather is nice today but clouds and wind are coming for tomorrow and the rest of the week will be awful")
turn = turns.new_turn()
turn.append_thinking("Ok, I got the second result, now I can answer the question.")
time.sleep(0.5)
turn.append_content("This is the incredible result.")
```
[Thinking]
I think a lot longer. In sentences.
With line breaks.
I need to call 2 tools.
[Tool call]
- model wants to call myfunc with parameters {'param1': 'value1', 'param2': 'value2'}
- agent called myfunc at 22:02:46
- myfunc returned 17.43 in 0.503 sec
[Thinking]
Ok, I got the first result, now call the second tool.
[Tool call]
- model wants to call myfunc2 with parameters {}
- agent called myfunc2 at 22:02:47
- myfunc2 returned The weather is nice today but clouds and wind are coming for tomorrow and the rest of the week will... in 0.503 sec
[Thinking]
Ok, I got the second result, now I can answer the question.
This is the incredible result.
turn1.thinking, turn1.content
('I think a lot longer.\nIn sentences.\n\nWith line breaks.',
'I need to call 2 tools.')
[Tool call]
- model wants to call myfunc with parameters {'param1': 'value1', 'param2': 'value2'}
- agent called myfunc at 22:02:46
- myfunc returned 17.43 in 0.503 sec
Native tool calling
Use python functions as tools callable by Large Language Models.
The python functions must be fully documented:
- type annotations are mandatory on all parameters and on the return type
- a docstring after the function definition is mandatory, and it should explain the return value
- a descriptive comment after each parameter is also mandatory
- the expected format is: one parameter per line, with a traditional python comment at the end of the line
```python
def add(a: int,   # The first number
        b: int    # The second number
        ) -> int: # The sum of the two numbers
    """Add two numbers"""
    return a + b

def multiply(a: int,   # The first number
             b: int    # The second number
             ) -> int: # The product of the two numbers
    """Multiply two numbers"""
    return a * b
```
Tool description format for ollama API
Here is the code used to process the tools parameter:
```python
for unprocessed_tool in tools or []:
    yield convert_function_to_tool(unprocessed_tool) if callable(unprocessed_tool) else Tool.model_validate(unprocessed_tool)
```
So we can pass either a list of python functions or a list of dictionaries conforming to a specific tool schema.
Here are the expectations for the python functions documentation:
```python
def convert_function_to_tool(func: Callable) -> Tool:
    ...

def _parse_docstring(doc_string: Union[str, None]) -> dict[str, str]:
    ...
    for line in doc_string.splitlines():
        ...
        if lowered_line.startswith('args:'):
            key = 'args'
        elif lowered_line.startswith(('returns:', 'yields:', 'raises:')):
            key = '_'
        ...
    for line in parsed_docstring['args'].splitlines():
        ...
        if ':' in line:
            # Split the line on either:
            # 1. A parenthetical expression like (integer) - captured in group 1
            # 2. A colon :
            # Followed by optional whitespace. Only split on first occurrence.
            ...
```
This is much less robust and readable than what toolslm.funccall.get_schema does, so we will preprocess the list of python functions ourselves.
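As a sketch of what such a preprocessing step can look like, here is a minimal converter built only on `inspect` type annotations (the `TYPE_MAP` and `function_to_tool_schema` names are hypothetical, and the inline parameter comments are deliberately ignored here for brevity):

```python
import inspect

def add(a: int,   # The first number
        b: int    # The second number
        ) -> int: # The sum of the two numbers
    """Add two numbers"""
    return a + b

# Map python annotations to JSON Schema type names (illustrative subset)
TYPE_MAP = {int: "integer", float: "number", str: "string", bool: "boolean"}

def function_to_tool_schema(func):
    """Build an OpenAI-style "wrapped function" tool schema from an annotated function."""
    sig = inspect.signature(func)
    properties = {
        name: {"type": TYPE_MAP.get(param.annotation, "string")}
        for name, param in sig.parameters.items()
    }
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
            },
        },
    }

schema = function_to_tool_schema(add)
```

A fuller version would also parse the inline comments to fill each parameter's `description` field, which is what the library's own preprocessing is for.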
Now let’s see what tool description schema is expected by ollama.
pydantic Tool.model_validate() accepts:
- dict
- Pydantic model instances
- Objects with attributes (ORM-style, if configured)
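This behavior can be tried on any pydantic model; a minimal sketch with a hypothetical `Point` model, assuming pydantic v2 is installed:

```python
from pydantic import BaseModel

class Point(BaseModel):
    x: int
    y: int

# model_validate accepts a plain dict...
p = Point.model_validate({"x": 1, "y": 2})
# ...and an existing model instance
q = Point.model_validate(p)
```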
So this is the schema expected by the openai completions API, as we will see below.
Tool description formats for the openai API
The legacy openai completions API: client.chat.completions.create(...) expects tools to be described in a json format that uses the “wrapped function” schema:
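A representative tool description in this wrapped-function JSON format, using the `add` function from above as an illustration (the field values are just an example, not a schema produced by the library):

```python
# OpenAI-style "wrapped function" tool description for the `add` example
tool_schema = {
    "type": "function",
    "function": {
        "name": "add",
        "description": "Add two numbers",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "integer", "description": "The first number"},
                "b": {"type": "integer", "description": "The second number"},
            },
            "required": ["a", "b"],
        },
    },
}
```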
Execute tools implemented as python functions with Large Language Models. The python functions must be fully documented:
- type annotations are mandatory on all parameters and on the return type
- a docstring after the function definition is mandatory
- a descriptive comment after each parameter and the return type is also mandatory
- the expected format is: one parameter per line, with a traditional python comment at the end of the line
```python
images.add_web_url("https://i.pinimg.com/736x/3c/fa/a2/3cfaa27aeff09adff6c2e6fbc5fd0dfa.jpg")
for image, is_url in images.get_base64_data_or_url():
    print(image[:50], is_url)
```
Helper class that provides a standard way to create an ABC using inheritance.
ollama model client
OllamaModelClient
```python
def OllamaModelClient(
    model: str,
    context_size: int = 32768,                # This is the default value for the ollama server in wordslab-notebooks
    base_url: str = 'http://localhost:11434',
    api_key: Optional = None,                 # If not provided, the optional key will be pulled from WordslabEnv
):
```
Helper class that provides a standard way to create an ABC using inheritance.
ollama: loading model qwen3:30b with context size 65000 ... ok
```python
prompt = "In one sentence: why is the sky blue?"
oclient(user_prompt=prompt, think=True, max_new_tokens=1000, seed=42, temperature=2)
```
[Thinking] … thought in 205 words
Sunlight scatters in Earth’s atmosphere, with shorter blue wavelengths scattering more effectively than other colors, causing the sky to appear blue.
```python
system = "Talk like a pirate"
prompt = "In one sentence: why is the sky blue?"
oclient(system_prompt=system, user_prompt=prompt)
```
[Thinking] … thought in 264 words
Arrr! The sun’s light scatters in the air, and the short blue waves bounce all about, making the sky shine like a doubloon!
```python
prompt = "In one sentence: why is the sky blue?"
assistant = "Once upon a time "
oclient(user_prompt=prompt, assistant_prefill=assistant)
```
Once upon a time 200 years ago, a scientist named Rayleigh discovered that the sky is blue because of a phenomenon called Rayleigh scattering, where the atmosphere scatters shorter blue wavelengths of sunlight more than longer red wavelengths, making the sky appear blue during the day.
```python
prompt = "Using only the provided tools to make no mistake, what is (11545468+78782431)*418742?"
tools = Tools([add, multiply])
oclient(user_prompt=prompt, tools=tools, think=True)
```
[Thinking] … thought in 2925 words
[Tool call] … add returned 90327899
[Tool call] … multiply returned 37824085083058
[Thinking] … thought in 116 words
The result of \((11545468 + 78782431) \times 418742\) is 37824085083058.
```python
class Book:
    def __init__(self, title: str, pages: int):
        self.title = title
        self.pages = pages
    def __repr__(self):
        return f"Book Title : {self.title}\nNumber of Pages : {self.pages}"

book = Book("War and Peace", 950)
book
```
Book Title : War and Peace
Number of Pages : 950
```python
def find_page(book: Book,   # The book to find the halfway point of
              percent: int, # Percent of a book to read to, e.g. halfway == 50,
              ) -> int:
    "The page number corresponding to `percent` completion of a book"
    return round(book.pages * (percent / 100.0))

find_page(book, 50)
```
475
```python
prompt = "Using only the provided tools to make no mistake, how many pages do I have to read to get halfway through my 950 page copy of War and Peace"
tools = Tools([find_page])
oclient(user_prompt=prompt, tools=tools, think=True)
```
[Thinking] … thought in 285 words
[Tool call] … find_page returned 475
[Thinking] … thought in 94 words
To reach the halfway point of your 950-page copy of War and Peace, you need to read 475 pages.
Images
```python
model = "devstral-small-2:24b"
oclient = OllamaModelClient(model)
```
ollama: loading model devstral-small-2:24b with context size 32768 ... ok
```python
prompt = "Describe this picture in a structured way"
images = Images("puppy.jpg")
oclient(user_prompt=prompt, user_images=images)
```
Here is a structured description of the image:
Subject:
A young Golden Retriever puppy.
Appearance:
Coat: Light golden fur, slightly wavy and fluffy.
Eyes: Dark, round, and expressive.
Ears: Floppy and medium-sized.
Mouth: Open slightly, showing a playful expression.
Body: Compact and sturdy, typical of a puppy.
Action:
The puppy appears to be in motion, possibly running or playing.
Front paws are lifted off the ground, suggesting movement.
Background:
Outdoor setting with a grassy field.
Blurred background, indicating focus on the puppy.
Lighting and Mood:
Natural daylight, likely during the day.
Warm and cheerful atmosphere, enhanced by the puppy’s joyful expression.
Additional Details:
A leash is visible around the puppy’s neck, suggesting it might be on a walk or playtime.
The grass is slightly tall and appears well-maintained.
This structured description captures the key elements of the image in a clear and organized manner.
```python
prompt = "Describe both images in a short paragraph"
images.add_web_url("https://i.pinimg.com/736x/3c/fa/a2/3cfaa27aeff09adff6c2e6fbc5fd0dfa.jpg")
oclient(user_prompt=prompt, user_images=images)
```
The first image shows a golden retriever puppy standing in a grassy field, looking directly at the camera with a happy expression. The puppy has a fluffy coat and appears to be in motion, possibly running or playing. The second image features the same golden retriever puppy sitting in a grassy field during sunset, with the sun visible in the background. The puppy is looking up and to the side, also with a joyful expression, and its fur is illuminated by the warm, golden light of the setting sun. Both images capture the playful and cheerful nature of the puppy in different settings.
Structured outputs
```python
class Pet(BaseModel):
    model_config = ConfigDict(extra="forbid")
    name: str
    animal: str
    age: int
    color: str | None

class PetList(BaseModel):
    model_config = ConfigDict(extra="forbid")
    pets: list[Pet]
```
```python
prompt = "I have two cats named Luna and Loki, Luna is 2 years old and yellow, Loki is 2 years older and the same color as the sky"
oclient(user_prompt=prompt, output_model=PetList)
oclient.response
```
```python
model = env.default_model_code
oclient = OllamaModelClient(model, context_size=65000)
```
ollama: loading model qwen3:30b with context size 65000 ... ok
```python
prompt = "what are the features in the latest github release of ollama"
oclient(user_prompt=prompt, think=True, web_search=True)
```
[Thinking] … thought in 357 words
[Tool call] … web_search returned results=[WebSearchResult(content='Releases ·oll...
[Thinking] … thought in 585 words
The latest GitHub release of Ollama is v0.14.0-rc2 (pre-release), which includes the following key features:
New Features:
Experimental CLI:
ollama run --experimental now includes an agent loop and the bash tool for interactive workflows.
Anthropic API Compatibility:
Support for the /v1/messages API endpoint (compatible with Anthropic models).
Model Version Requirements:
New REQUIRES command in Modelfile to declare required Ollama version for a model.
VRAM Improvements:
Fixes for integer underflow on low VRAM systems during memory estimation.
More accurate VRAM measurements for AMD iGPUs.
App Enhancements:
Ollama’s app now highlights Swift source code.
Improved error handling for embeddings returning NaN or -Inf.
Linux Install Optimization:
Linux install bundles now use zst compression (smaller downloads).
Experimental Image Generation:
New support for image generation models via MLX (experimental).
Notes:
Stable Release: The latest stable release is v0.13.5 (Dec 18, 2025), which added support for bert architecture models, DeepSeek-V3.1 tool parsing, and Google’s FunctionGemma model.
Pre-release: v0.14.0-rc2 is the most recent release candidate, but it is not yet finalized (marked as “Pre-release” on GitHub).
```python
prompt = "read https://docs.ollama.com/capabilities/web-search and summarize in one sentence what tools i can use to implement a search agent with ollama"
oclient(user_prompt=prompt, think=True, web_search=True)
```
Ollama provides the web_search and web_fetch APIs as core tools to implement a search agent, enabling model-based web queries and page content retrieval for accurate, up-to-date information.
```python
model = "google/gemini-3-flash-preview"
messages = [{'role': 'user', 'content': 'What is the smallest number palindrome greater than 130?'}]
stream = client.chat.completions.create(model=model, messages=messages, stream=True,
                                        extra_body={"reasoning": {"enabled": True}})
for chunk in stream:
    delta = chunk.choices[0].delta
    print(delta)
```
ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning="**Identifying a Solution**\n\nI've homed in on the core challenge: pinpointing the smallest palindrome exceeding 130. The constraints are clear, and I'm strategizing how to efficiently generate and validate candidate numbers. I am starting by looking at the numbers directly after 130.\n\n\n", reasoning_details=[{'index': 0, 'type': 'reasoning.text', 'text': "**Identifying a Solution**\n\nI've homed in on the core challenge: pinpointing the smallest palindrome exceeding 130. The constraints are clear, and I'm strategizing how to efficiently generate and validate candidate numbers. I am starting by looking at the numbers directly after 130.\n\n\n", 'format': 'google-gemini-v1'}], annotations=[])
ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning="**Determining the Answer**\n\nI've directly confirmed that 131 fulfills all criteria. No need to look further: it's the smallest palindrome that's larger than 130. I have confirmed that this number satisfies the relevant constraints, and I consider this the final result.\n\n\n", reasoning_details=[{'index': 0, 'type': 'reasoning.text', 'text': "**Determining the Answer**\n\nI've directly confirmed that 131 fulfills all criteria. No need to look further: it's the smallest palindrome that's larger than 130. I have confirmed that this number satisfies the relevant constraints, and I consider this the final result.\n\n\n", 'format': 'google-gemini-v1'}], annotations=[])
ChoiceDelta(content='The smallest palindrome number greater than 130 is **131**.\n\nA palindrome is a number that reads the same forwards and backwards. Since 131 reads as "1-3-1" in both directions and is the very next integer after 130 that follows this rule, it is the', function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning=None, reasoning_details=[], annotations=[])
ChoiceDelta(content=' correct answer.', function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning=None, reasoning_details=[{'index': 0, 'type': 'reasoning.encrypted', 'data': 'CiIBjz1rXx1wDxrqK4iEiYLEfo4BC0TbRcSKhvVO48q1Ge9HCmgBjz1rX93sCp42z//JmKxDWvv3kYw3fkxVXSsbAkJ9/OsR5GxzvX3NRxq1GmWU6PzNix6QQ9aFfZmZqEwc3u/anTQiQwCLH+N77Rh7vLxAvr0VCSVLJ83rqljqFi93fMyjr0pjX9OE5gpZAY89a1/PpcHwJxd6EjqwLvSFlebK6nVnQDn90G86P3/tRXy2sZxYogVEh44KsQZ4r+0B/2ClLCNJf+EvIBCEs6zXHTMLx6AJIDU6SumITgIppnPdYWwoUuoKeAGPPWtfeJ7lICOOm3dtb2YzhnJiSGPI5m96K1G/2lwo04wSze+2vbS0FGKQnAYkOR21tRqDV+JGOKon/MdNhKTfjVd+TNPUfcQCNQh+dmlVajIJRyql4sYeYSg6Wz+AA0A0dNv6GWIKaXmKCBS087b+FUT1ZMBuSgpgAY89a1+ZR6BSJsZTg5YUGg11YzeDIsaqhs+FvHy/1XB4uPp3SW+mqIfysxta4e4y7oeOuTNButgOKnDeaJfPG8VSjSh0lT0R1dHEifHAIFbiF2IKQJXW07ns6L1E9syYCn8Bjz1rXwn3gb5Q8D+qPic5yZTziFidiTxpKH3uhjyYqTrZR8hRGnmVVk05a8E/5J81UkxOevZ5yAxLiFCdimXI5yr7LriL4bDiHajdSXVxdOiKHzb6Pqx3MiLySYUt1ToD3XUAIZQBmfDVw+Qd6SQdagvsi86NKMv793xrUMlECroBAY89a18BugGYHx3fQYckccMCOS91fxOFH7hBj24O746sJhbrBMLmQSPk0du421Zmd8Lx1Ns21/SwpHCfnQ3AEA9BZ9XfahR51tE96d9DjXO7kMsgGDPoLsDApRTnRenQbPJnpXqoQYZbsnE/H1B6Tb4wcE5xJpcBKLRvXhg+c14T5NYpFEcE3dHaNUH6xXj68ZLBl2Q5FtzqOd6tyxSLRhQJfa00yDFnpn+Lbo0zLSRqbP+ovAWIq79BCpMBAY89a1/ENqGNTYpM7VJzpDZus8SoUClqlwaOwrOOHYfz6NaB3CP3coADgPnltU0hZrGfE0w4xHdM85pJaT/2HHv9GQH2o2R4nAGwT3KInxBeAVo0TtE2M9aN78wDhveiyqPUsHO1uybcRorF3uUwEIAT8NX93ykz3FwcVZvO/aYtE9A7mVN4qYgXfjvIFwIBjFJs', 'format': 'google-gemini-v1'}], annotations=[])
ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None)
```python
messages = [{'role': 'user', 'content': "Using only the provided tools to make no mistake, what is (11545468+78782431)*418742?"}]
tools = Tools([add, multiply])
stream = client.chat.completions.create(model=model, messages=messages, tools=tools.get_schemas(),
                                        stream=True, extra_body={"reasoning": {"enabled": True}})
for chunk in stream:
    delta = chunk.choices[0].delta
    print(delta)
```
ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning="**Calculating the Total Sum**\n\nI'm currently focused on the first step: summing the initial numbers. I've successfully employed the `add` function, and the intermediate result is now readily available. It's a significant figure, and I'm ready to proceed to the next stage after a brief review.\n\n\n", reasoning_details=[{'index': 0, 'type': 'reasoning.text', 'text': "**Calculating the Total Sum**\n\nI'm currently focused on the first step: summing the initial numbers. I've successfully employed the `add` function, and the intermediate result is now readily available. It's a significant figure, and I'm ready to proceed to the next stage after a brief review.\n\n\n", 'format': 'google-gemini-v1'}], annotations=[])
ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning="**Initiating Multiplication Operations**\n\nI've got the total sum from the previous stage, a substantial number. Now, the plan is to multiply this sum by 418,742. I'm preparing to invoke the `multiply` function, and I'm ready to observe the final result soon.\n\n\n", reasoning_details=[{'index': 0, 'type': 'reasoning.text', 'text': "**Initiating Multiplication Operations**\n\nI've got the total sum from the previous stage, a substantial number. Now, the plan is to multiply this sum by 418,742. I'm preparing to invoke the `multiply` function, and I'm ready to observe the final result soon.\n\n\n", 'format': 'google-gemini-v1'}], annotations=[])
ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls=[ChoiceDeltaToolCall(index=0, id='tool_add_bxj7qiqYR0YykpEmfzoC', function=ChoiceDeltaToolCallFunction(arguments='{"b":78782431,"a":11545468}', name='add'), type='function')], reasoning=None, reasoning_details=[{'index': 0, 'id': 'tool_add_bxj7qiqYR0YykpEmfzoC', 'type': 'reasoning.encrypted', 'data': 'CiQBjz1rX1bExybTM6VtK0kASX6FTxfBBwyi932MRRrEuYtvkdsKUQGPPWtfr55kYc/lU0ZhVkHjN6wZ74W1LVMPXkSF3tFEoDwQtfzpuzYnV7xzD1qz5CnzNvZm5i9eY0qQKJhzhXAC3FTmjZ6D2OdODRvDX7uUGwo0AY89a1/zpxr22ksclwFiLTfaAWYr57nJPyB5mHc8h6PIE3uu5ggyUE4VsZoaqVxv2q03jQp6AY89a187jC4w08IR8/fDOrJwQEOLQO5yNEUlQphL8lckJvAltj2ULGXY06WBV8yi1n3YGUZEAgLU4ihc/o8+/nVuk+OnKvicbju9XY82ZeP35D8yFVPEFMyHifVptjjPry8rCpr4N84UipqVfc0rmtFPBbE6+6TTFIUKaQGPPWtf6XkgaitG1kvzHAeW+I55A2dWbhyG/8Z15RjCkArCtKMEEf1r9v7G0ke7kNuIW8jhHF5H/wm/20m69zrsCp1obV+jbZR8T6lb3Km4EXJOldkm/U1lsHZRI8Lp2mkzreV1QbkqHwo2AY89a1/PJ5AClPojS2Y4jbD3oN8tblRCmUNTysi69fLd/4tqapYHz4CLAo+LMhFvzLN9N4qZ', 'format': 'google-gemini-v1'}], annotations=[])
ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None)
Openrouter API reference
Note: as of January 2026 - the OpenAI-compatible Responses API (Beta) is in beta stage and may have breaking changes. Use with caution in production environments.
=> we will use the completions API for now
https://openrouter.ai/docs/api/reference/overview
REQUEST SCHEMA
```typescript
// Definitions of subtypes are below
type Request = {
  // Either "messages" or "prompt" is required
  messages?: Message[];
  prompt?: string;

  // If "model" is unspecified, uses the user's default
  model?: string; // See "Supported Models" section

  // Allows to force the model to produce specific output format.
  // See models page and note on this docs page for which models support it.
  response_format?: { type: 'json_object' };

  stop?: string | string[];
  stream?: boolean; // Enable streaming

  // See LLM Parameters (openrouter.ai/docs/api/reference/parameters)
  max_tokens?: number;  // Range: [1, context_length)
  temperature?: number; // Range: [0, 2]

  // Tool calling
  // Will be passed down as-is for providers implementing OpenAI's interface.
  // For providers with custom interfaces, we transform and map the properties.
  // Otherwise, we transform the tools into a YAML template. The model responds with an assistant message.
  // See models supporting tool calling: openrouter.ai/models?supported_parameters=tools
  tools?: Tool[];
  tool_choice?: ToolChoice;

  // Advanced optional parameters
  seed?: number;  // Integer only
  top_p?: number; // Range: (0, 1]
  top_k?: number; // Range: [1, Infinity) Not available for OpenAI models
  frequency_penalty?: number;  // Range: [-2, 2]
  presence_penalty?: number;   // Range: [-2, 2]
  repetition_penalty?: number; // Range: (0, 2]
  logit_bias?: { [key: number]: number };
  top_logprobs: number; // Integer only
  min_p?: number; // Range: [0, 1]
  top_a?: number; // Range: [0, 1]

  // Reduce latency by providing the model with a predicted output
  // https://platform.openai.com/docs/guides/latency-optimization#use-predicted-outputs
  prediction?: { type: 'content'; content: string };

  // OpenRouter-only parameters
  // See "Prompt Transforms" section: openrouter.ai/docs/guides/features/message-transforms
  transforms?: string[];
  // See "Model Routing" section: openrouter.ai/docs/guides/features/model-routing
  models?: string[];
  route?: 'fallback';
  // See "Provider Routing" section: openrouter.ai/docs/guides/routing/provider-selection
  provider?: ProviderPreferences;

  user?: string; // A stable identifier for your end-users. Used to help detect and prevent abuse.

  // Debug options (streaming only)
  debug?: {
    echo_upstream_body?: boolean; // If true, returns the transformed request body sent to the provider
  };
};

// Subtypes:
type TextContent = {
  type: 'text';
  text: string;
};

type ImageContentPart = {
  type: 'image_url';
  image_url: {
    url: string;     // URL or base64 encoded image data
    detail?: string; // Optional, defaults to "auto"
  };
};

type ContentPart = TextContent | ImageContentPart;

type Message =
  | {
      role: 'user' | 'assistant' | 'system';
      // ContentParts are only for the "user" role:
      content: string | ContentPart[];
      // If "name" is included, it will be prepended like this
      // for non-OpenAI models: `{name}: {content}`
      name?: string;
    }
  | {
      role: 'tool';
      content: string;
      tool_call_id: string;
      name?: string;
    };

type FunctionDescription = {
  description?: string;
  name: string;
  parameters: object; // JSON Schema object
};

type Tool = {
  type: 'function';
  function: FunctionDescription;
};

type ToolChoice =
  | 'none'
  | 'auto'
  | {
      type: 'function';
      function: {
        name: string;
      };
    };
```
RESPONSE SCHEMA
```typescript
// Definitions of subtypes are below
type Response = {
  id: string;
  // Depending on whether you set "stream" to "true" and
  // whether you passed in "messages" or a "prompt", you
  // will get a different output shape
  choices: (NonStreamingChoice | StreamingChoice | NonChatChoice)[];
  created: number; // Unix timestamp
  model: string;
  object: 'chat.completion' | 'chat.completion.chunk';
  system_fingerprint?: string; // Only present if the provider supports it
  // Usage data is always returned for non-streaming.
  // When streaming, you will get one usage object at
  // the end accompanied by an empty choices array.
  usage?: ResponseUsage;
};

// If the provider returns usage, we pass it down
// as-is. Otherwise, we count using the GPT-4 tokenizer.
type ResponseUsage = {
  /** Including images and tools if any */
  prompt_tokens: number;
  /** The tokens generated */
  completion_tokens: number;
  /** Sum of the above two fields */
  total_tokens: number;
};

// Subtypes:
type NonChatChoice = {
  finish_reason: string | null;
  text: string;
  error?: ErrorResponse;
};

type NonStreamingChoice = {
  finish_reason: string | null;
  native_finish_reason: string | null;
  message: {
    content: string | null;
    role: string;
    tool_calls?: ToolCall[];
  };
  error?: ErrorResponse;
};

type StreamingChoice = {
  finish_reason: string | null;
  native_finish_reason: string | null;
  delta: {
    content: string | null;
    role?: string;
    tool_calls?: ToolCall[];
  };
  error?: ErrorResponse;
};

type ErrorResponse = {
  code: number; // See "Error Handling" section
  message: string;
  metadata?: Record<string, unknown>; // Contains additional error information such as provider details, the raw error message, etc.
};

type ToolCall = {
  id: string;
  type: 'function';
  function: FunctionCall;
};
```
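To make the request shape concrete, here is a minimal payload that satisfies the schema above (the model name and parameter values are only examples):

```python
import json

# Minimal chat request: either "messages" or "prompt" is required
payload = {
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [
        {"role": "system", "content": "Talk like a pirate"},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
    "max_tokens": 500,   # Range: [1, context_length)
    "temperature": 0.7,  # Range: [0, 2]
    "stream": False,
}
body = json.dumps(payload)  # ready to POST to the chat completions endpoint
```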
OpenRouterModelClient
```python
def OpenRouterModelClient(
    model: str,
    context_size: Optional = None,                # For OpenRouter this parameter is ignored, we inherit the remote model config
    base_url: str = 'https://openrouter.ai/api/v1',
    api_key: Optional = None,                     # If not provided, the mandatory key will be pulled from WordslabEnv
):
```
Helper class that provides a standard way to create an ABC using inheritance.
```python
model = "anthropic/claude-sonnet-4.5"
orclient = OpenRouterModelClient(model)
```
openrouter: testing model anthropic/claude-sonnet-4.5 ... ok
```python
prompt = 'Why is the sky blue?'
orclient(user_prompt=prompt)
```
The sky is blue due to a phenomenon called Rayleigh scattering.
Here’s how it works:
Sunlight contains all colors - White sunlight is actually made up of all the colors of the rainbow (different wavelengths of light).
Light interacts with the atmosphere - When sunlight enters Earth’s atmosphere, it collides with gas molecules (mainly nitrogen and oxygen).
Blue light scatters more - Shorter wavelengths (blue and violet) scatter much more easily than longer wavelengths (red and orange). Blue light gets scattered in all directions throughout the sky.
Why not violet? - Violet scatters even more than blue, but our eyes are more sensitive to blue, and some violet light is absorbed by the upper atmosphere, so we perceive the sky as blue.
At sunset/sunrise, the sky turns red/orange because sunlight travels through more atmosphere to reach your eyes, scattering away most of the blue light and leaving the longer red and orange wavelengths visible.
```python
prompt = 'Why is the sky blue?'
orclient(system_prompt="Talk like a pirate", user_prompt=prompt)
```
Arrr, ye be askin’ a fine question there, matey!
The sky be blue because of a bit o’ science called “Rayleigh scatterin’,” savvy? When the sun’s light comes sailin’ through our atmosphere, it be carryin’ all the colors o’ the rainbow mixed together, aye.
Now here be the trick - the tiny molecules in the air be like little scallywags that scatter the light in all directions. But the blue light, bein’ shorter and choppier like waves in a storm, gets scattered MORE than the other colors. Red and orange light? Those long wavelengths sail right on through like a ship with the wind at its back!
So when ye look up at the heavens, yer eyes be seein’ all that scattered blue light bouncin’ around the sky from every direction. That be why the whole sky looks blue instead of just where the sun be!
At sunset though, arrr, the light travels through more atmosphere - like a longer voyage across the seas - and all that blue gets scattered away completely, leavin’ only the reds and oranges to paint the sky. Beautiful as a Caribbean sunset, it is!
Tips tricorn hat
Any more questions for this old sea dog? ⚓
```python
prompt = 'Why is the sky blue?'
orclient(user_prompt=prompt, assistant_prefill="Once upon a time ")
```
Once upon a time in the kingdom of Light, photons embarked on a journey from the Sun to Earth. As they traveled through the atmosphere, they encountered tiny molecules of nitrogen and oxygen.
These molecules were much smaller than the wavelengths of visible light, which caused something magical called Rayleigh scattering. This type of scattering has a special property: it scatters shorter wavelengths (blue and violet light) much more effectively than longer wavelengths (red and orange light) — specifically, shorter wavelengths scatter about 10 times more!
Here’s what happens:
🔵 Blue light gets scattered in all directions by air molecules 🟣 Violet light scatters even more, but our eyes are less sensitive to it 🔴 Red/orange light passes through with less scattering
So when you look up at the sky (away from the Sun), you’re seeing all that scattered blue light coming from every direction. This creates the beautiful blue canopy above us.
Fun fact: At sunrise and sunset, sunlight travels through more atmosphere to reach your eyes. Most of the blue light gets scattered away before it reaches you, leaving the warm reds and oranges — creating those stunning twilight colors! 🌅
```python
prompt = 'What is the smallest number palindrome greater than 130?'
orclient(user_prompt=prompt, think=1024, max_new_tokens=2000, seed=42, temperature=0.7)
```
[Thinking] … thought in 92 words
Looking at numbers greater than 130:
131: reads as 1-3-1 → This is the same forwards and backwards ✓
131 is the smallest palindrome greater than 130.
```python
prompt = "Using only the provided tools to make no mistake, what is (11545468+78782431)*418742?"
tools = Tools([add, multiply])
orclient(user_prompt=prompt, tools=tools, think=2014)
```
[Thinking] … thought in 35 words
I’ll solve this step by step using the provided tools.
First, let me add 11545468 and 78782431:
[Tool call] … add returned 90327899
Now let me multiply the result by 418742:
[Tool call] … multiply returned 37824085083058
The answer is 37,824,085,083,058.
Images
```python
model = "openai/gpt-5.2"
orclient = OpenRouterModelClient(model)
```
openrouter: testing model openai/gpt-5.2 ... ok
```python
prompt = "Describe this picture in a structured way"
images = Images("puppy.jpg")
orclient(user_prompt=prompt, user_images=images)
```
Structured Description of the Image
1) Overview
Scene type: Outdoor animal portrait/action shot
Main subject: A light-colored puppy (appears to be a golden retriever-type) running toward the camera
Setting: Open grassy field with a softly blurred background
2) Subject Details
Animal: Young dog/puppy
Fur color & texture: Cream to pale golden, fluffy coat
Face: Dark eyes, black nose, mouth open with tongue visible (panting/happy expression)
Ears: Floppy, slightly darker golden tone than the face
Accessories: Thin collar/strap visible around the neck area
3) Action & Pose
Motion: Running forward toward the viewer
Body position: Front paw lifted mid-step; posture suggests energetic movement
Overall tone: Warm and gentle, with a natural outdoor feel
7) Mood / Impression
Mood: Joyful, lively, friendly
Implied context: A puppy playing or running freely in a field
```python
prompt = "Describe both images in a short paragraph"
images.add_web_url("https://i.pinimg.com/736x/3c/fa/a2/3cfaa27aeff09adff6c2e6fbc5fd0dfa.jpg")
orclient(user_prompt=prompt, user_images=images)
```
Both images feature a fluffy golden retriever puppy outdoors in a grassy field. In the first image, the puppy is running toward the camera with its mouth open and tongue out, looking playful and energetic against a softly blurred green background. In the second image, the puppy is sitting calmly in warm golden-hour light, gazing upward with a relaxed expression while the sun hangs low in an orange sky behind it.
Structured outputs
```python
prompt = "I have two cats named Luna and Loki, Luna is 2 years old and yellow, Loki is 2 years older and the same color as the sky"
orclient(user_prompt=prompt, output_model=PetList)
orclient.response
```
Models providers
Design concepts
User centric workflow
identify your self-hosted inference or inference as a service options
understand your task type, properties, privacy needs and scale
find the best model for your task, given your constraints
prepare and start your self hosted inference or connect to your inference as a service provider
monitor your resource usage and cost
Self-hosted inference or inference as a service
Model families
- architecture name
- parameter size
- training type: base / instruct / thinking
- version: release date
- quantization
Model constraints
- model capabilities
  - modalities in/out
  - context length
  - instruction
  - thinking
  - tools
- model usage
  - prompt template and special tokens
  - languages supported
  - recommended use cases
  - prompting guidelines
- model license
  - use case restrictions
  - commercial usage restrictions
  - outputs usage restrictions
  - model transparency
Self-hosted inference constraints
- model requirements
  - size on disk -> download time / load time in VRAM
  - size in VRAM -> max context length / num parallel sequences
  - tensor flops -> input tokens/sec
  - memory bandwidth -> output tokens/sec
- inference machine constraints
  - download speed
  - disk size and speed
  - GPU VRAM, memory bandwidth, tensor flops
- rented machine constraints
  - GPU availability
  - price when you use (per GPU)
  - price when you don't use (per GB of storage)
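The "size in VRAM" and "memory bandwidth -> output tokens/sec" constraints can be sketched with a back-of-envelope calculation. This is a rough model with strong simplifying assumptions: weights dominate VRAM (KV cache and activations are ignored), and decoding is memory-bandwidth bound, so output speed is roughly bandwidth divided by model size.

```python
def estimate_inference(params_b, bytes_per_param, mem_bandwidth_gbs):
    """Rough estimates: VRAM footprint (GB) and output tokens/sec.

    Assumes weights dominate VRAM and decoding is bandwidth bound,
    so every generated token streams the full weights from memory.
    """
    size_gb = params_b * bytes_per_param          # size in VRAM (GB)
    tokens_per_sec = mem_bandwidth_gbs / size_gb  # output tokens/sec
    return size_gb, tokens_per_sec

# Example: an 8B model in q4 (~0.5 byte/param) on a GPU with 1000 GB/s
size_gb, tps = estimate_inference(8, 0.5, 1000)
print(f"{size_gb:.0f} GB in VRAM, ~{tps:.0f} output tokens/sec")
# 4 GB in VRAM, ~250 output tokens/sec
```

Real throughput will be lower (attention overhead, batching, KV cache), but the estimate is useful for ruling out machine/model combinations quickly.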
Inference as a service constraints
- router constraints
  - … same as provider constraints below …
- provider constraints
  - terms of service
  - privacy options
  - inference quotas
  - service availability
- per model provider constraints
  - model capabilities exposed
  - input/output tokens cost
  - input/output tokens/sec
As of December 2025, there is no API to get the Ollama catalog of models; web scraping is the only solution.
import httpx
import re
from html import unescape

def updated_to_months(updated):
    """
    Convert strings like: "1 year ago", "2 years ago", "1 month ago",
    "3 weeks ago", "7 days ago", "yesterday", "4 hours ago"
    into integer months.
    """
    if not updated:
        return None
    updated = updated.lower().strip()
    # handle 'yesterday' explicitly
    if updated == "yesterday":
        return 0
    # years → months
    m = re.match(r'(\d+)\s+year', updated)
    if m:
        return int(m.group(1)) * 12
    # months
    m = re.match(r'(\d+)\s+month', updated)
    if m:
        return int(m.group(1))
    # weeks
    m = re.match(r'(\d+)\s+week', updated)
    if m:
        return max(0, int(m.group(1)) // 4)
    # days
    m = re.match(r'(\d+)\s+day', updated)
    if m:
        return 0
    # hours / minutes / seconds → treat as < 1 month
    if any(unit in updated for unit in ["hour", "minute", "second"]):
        return 0
    return None

def pulls_to_int(pulls_str):
    """
    Convert a pulls string like: '5M', '655.8K', '49K', '73.7M', '957.4K', '27.7M'
    into an integer.
    """
    if not pulls_str:
        return None
    pulls_str = pulls_str.strip().upper()
    match = re.match(r'([\d,.]+)\s*([KM]?)', pulls_str)
    if not match:
        return None
    number, suffix = match.groups()
    # Remove commas and convert to float
    number = float(number.replace(',', ''))
    if suffix == 'M':
        number *= 1_000_000
    elif suffix == 'K':
        number *= 1_000
    return int(number)

def parse_model_list_regex(html):
    models = []
    # --- Extract each <li x-test-model>...</li> block ---
    li_blocks = re.findall(r'<li[^>]*x-test-model[^>]*>(.*?)</li>', html, flags=re.DOTALL)
    for block in li_blocks:
        # name from <a href="/library/...">
        name = None
        m = re.search(r'href="/library/([^"]+)"', block)
        if m:
            name = m.group(1)
        # description <p class="max-w-lg ...">...</p>
        description = ""
        m = re.search(r'<p[^>]*text-neutral-800[^>]*>(.*?)</p>', block, flags=re.DOTALL)
        if m:
            description = re.sub(r'<.*?>', '', m.group(1)).strip()
            description = unescape(description)
        # capabilities (x-test-capability)
        capabilities = re.findall(r'<span[^>]*x-test-capability[^>]*>(.*?)</span>', block, flags=re.DOTALL)
        capabilities = [c.strip() for c in capabilities]
        # check for the special 'cloud' span
        cloud = False
        if re.search(r'<span[^>]*>cloud</span>', block, flags=re.DOTALL):
            cloud = True
        # sizes (x-test-size)
        sizes = re.findall(r'<span[^>]*x-test-size[^>]*>(.*?)</span>', block, flags=re.DOTALL)
        sizes = [s.strip() for s in sizes]
        # pulls <span x-test-pull-count>5M</span>
        pulls = None
        m = re.search(r'<span[^>]*x-test-pull-count[^>]*>(.*?)</span>', block)
        if m:
            pulls = m.group(1).strip()
        # tag count <span x-test-tag-count>5</span>
        tag_count = None
        m = re.search(r'<span[^>]*x-test-tag-count[^>]*>(.*?)</span>', block)
        if m:
            tag_count = m.group(1).strip()
        # updated text <span x-test-updated>...</span>
        updated = None
        m = re.search(r'<span[^>]*x-test-updated[^>]*>(.*?)</span>', block)
        if m:
            updated = m.group(1).strip()
        models.append({
            "name": name,
            "description": description,
            "capabilities": capabilities,
            "cloud": cloud,
            "sizes": sizes,
            "pulls": pulls_to_int(pulls),
            # guard against a missing tag count instead of crashing on int(None)
            "tag_count": int(tag_count) if tag_count else None,
            "updated_months": updated_to_months(updated),
            "url": f"https://ollama.com/library/{name}" if name else None
        })
    return models

def list_models(contains=None):
    """
    Extract model names and properties from https://ollama.com/library
    Optionally filter by substring.
    """
    html = httpx.get("https://ollama.com/library").text
    models = parse_model_list_regex(html)
    if contains:
        models = [m for m in models if contains.lower() in m["name"].lower()]
    models = sorted(models, key=lambda m: m["name"])
    return models

def list_recent_models_from_family(familyfilter):
    return [
        f"{m['name']}"
        f"{m['capabilities'] if len(m['capabilities']) > 0 else ''}"
        f"{m['sizes'] if len(m['sizes']) > 0 else ''}"
        f"{' [cloud]' if m['cloud'] else ''}"
        for m in list_models(familyfilter)
        if m["updated_months"] is not None and m["updated_months"] < 12
    ]

def list_tags(model):
    """
    Extract valid quantized tags only, without HTML noise,
    and apply the same exclusions as original greps.
    """
    html = httpx.get(f"https://ollama.com/library/{model}/tags").text
    # Capture ONLY the tag part after model:..., e.g. 3b-instruct-q4_K_M
    raw_tags = re.findall(rf'{re.escape(model)}:([A-Za-z0-9._-]*q[A-Za-z0-9._-]*)', html)
    # Re-add full prefix model:<tag>
    tags = [f"{model}:{t}" for t in raw_tags]
    # Exclude text|base|fp|q4_[01]|q5_[01]
    tags = [t for t in tags if not re.search(r'(text|base|fp|q[45]_[01])', t)]
    # Deduplicate
    return set(tags)
list_models()[:5]
[{'name': 'gpt-oss',
'description': 'OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.',
'capabilities': ['tools', 'thinking'],
'cloud': True,
'sizes': ['20b', '120b'],
'pulls': 5000000,
'tag_count': 5,
'updated_months': 1,
'url': 'https://ollama.com/library/gpt-oss'},
{'name': 'qwen3-vl',
'description': 'The most powerful vision-language model in the Qwen model family to date.',
'capabilities': ['vision', 'tools'],
'cloud': True,
'sizes': ['2b', '4b', '8b', '30b', '32b', '235b'],
'pulls': 656300,
'tag_count': 59,
'updated_months': 1,
'url': 'https://ollama.com/library/qwen3-vl'},
{'name': 'ministral-3',
'description': 'The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware.',
'capabilities': ['vision', 'tools'],
'cloud': True,
'sizes': ['3b', '8b', '14b'],
'pulls': 49100,
'tag_count': 16,
'updated_months': 0,
'url': 'https://ollama.com/library/ministral-3'},
{'name': 'deepseek-r1',
'description': 'DeepSeek-R1 is a family of open reasoning models with performance approaching that of leading models, such as O3 and Gemini 2.5 Pro.',
'capabilities': ['tools', 'thinking'],
'cloud': False,
'sizes': ['1.5b', '7b', '8b', '14b', '32b', '70b', '671b'],
'pulls': 73700000,
'tag_count': 35,
'updated_months': 5,
'url': 'https://ollama.com/library/deepseek-r1'},
{'name': 'qwen3-coder',
'description': "Alibaba's performant long context models for agentic and coding tasks.",
'capabilities': ['tools'],
'cloud': True,
'sizes': ['30b', '480b'],
'pulls': 958100,
'tag_count': 10,
'updated_months': 2,
'url': 'https://ollama.com/library/qwen3-coder'}]
Certain endpoints stream responses as JSON objects. Streaming can be disabled by providing {"stream": false} for these endpoints.
Structured outputs
Structured outputs are supported by providing a JSON schema in the format parameter. The model will generate a response that matches the schema. See the structured outputs example below.
JSON mode
Enable JSON mode by setting the format parameter to json. This will structure the response as a valid JSON object.
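The two options differ only in the value of the format field: the literal string "json" for JSON mode, or a full JSON schema for structured outputs. A minimal sketch of both request payloads (the model name and schema are assumptions for illustration):

```python
# JSON schema used for structured outputs (hypothetical pet record)
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# JSON mode: format is the literal string "json"
json_mode_request = {
    "model": "gemma3",
    "prompt": "Describe a pet as JSON.",
    "format": "json",
    "stream": False,
}

# Structured outputs: format is a JSON schema the response must match
structured_request = {
    "model": "gemma3",
    "prompt": "Describe a pet.",
    "format": schema,
    "stream": False,
}

# Either dict can then be POSTed to a local server, e.g.:
# httpx.post("http://localhost:11434/api/generate", json=structured_request)
```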
Parameters
- model: (required) the model name
- prompt: the prompt to generate a response for
- suffix: the text after the model response
- images: (optional) a list of base64-encoded images (for multimodal models such as llava)
- think: (for thinking models) should the model think before responding?
Advanced parameters (optional):
- format: the format to return a response in. Format can be json or a JSON schema
- options: additional model parameters listed in the documentation for the Modelfile, such as temperature
- system: system message (overrides what is defined in the Modelfile)
- template: the prompt template to use (overrides what is defined in the Modelfile)
- stream: if false, the response will be returned as a single response object rather than a stream of objects
- raw: if true, no formatting will be applied to the prompt. You may choose to use the raw parameter if you are specifying a full templated prompt in your request to the API
- keep_alive: controls how long the model will stay loaded into memory following the request (default: 5m)
The final response in the stream also includes additional data about the generation:
- total_duration: time spent generating the response
- load_duration: time spent in nanoseconds loading the model
- prompt_eval_count: number of tokens in the prompt
- prompt_eval_duration: time spent in nanoseconds evaluating the prompt
- eval_count: number of tokens in the response
- eval_duration: time in nanoseconds spent generating the response
- response: empty if the response was streamed, if not streamed, this will contain the full response
When streaming is off, the full response is received in a single reply.
To calculate how fast the response is generated in tokens per second (tokens/s), divide eval_count by eval_duration and multiply by 10^9 (since durations are reported in nanoseconds).
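The calculation above can be written directly against the fields of the final streamed response (the sample values below are hypothetical):

```python
def tokens_per_second(eval_count, eval_duration_ns):
    # eval_duration is reported in nanoseconds, so scale by 10^9
    return eval_count / eval_duration_ns * 1e9

# Example with hypothetical values taken from a final response object
print(f"{tokens_per_second(282, 4_709_213_000):.1f} tokens/s")
```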
Images
To submit images to multimodal models, provide a list of base64-encoded images:
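A minimal sketch of building such a request (the model name, file path, and the final ollama.generate call are assumptions; only the base64 encoding step runs here):

```python
import base64

def encode_image(path):
    """Read an image file and return its base64 string, as the API expects."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

# Hypothetical call against a multimodal model (requires a running server):
# ollama.generate(model='llava', prompt='What is in this picture?',
#                 images=[encode_image('photo.jpg')])
```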
ollama.generate(model='gemma3', prompt='Why is the sky blue?')
ollama.chat(model='gemma3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
ollama.embed(model='gemma3', input='The sky is blue because of rayleigh scattering')
ollama.embed(model='gemma3', input=['The sky is blue because of rayleigh scattering', 'Grass is green because of chlorophyll'])
ollama.ps()
ProcessResponse(models=[])
ollama.web_search??
Signature: ollama.web_search(query: str, max_results: int = 3) -> ollama._types.WebSearchResponse
Source:
def web_search(self, query: str, max_results: int = 3) -> WebSearchResponse:
    """
    Performs a web search

    Args:
      query: The query to search for
      max_results: The maximum number of results to return (default: 3)

    Returns:
      WebSearchResponse with the search results

    Raises:
      ValueError: If OLLAMA_API_KEY environment variable is not set
    """
    if not self._client.headers.get('authorization', '').startswith('Bearer '):
        raise ValueError('Authorization header with Bearer token is required for web search')
    return self._request(
        WebSearchResponse,
        'POST',
        'https://ollama.com/api/web_search',
        json=WebSearchRequest(
            query=query,
            max_results=max_results,
        ).model_dump(exclude_none=True),
    )
File: /home/workspace/wordslab-notebooks-lib/.venv/lib/python3.12/site-packages/ollama/_client.py
Type: method
ollama.web_fetch??
Signature: ollama.web_fetch(url: str) -> ollama._types.WebFetchResponse
Source:
def web_fetch(self, url: str) -> WebFetchResponse:
    """
    Fetches the content of a web page for the provided URL.

    Args:
      url: The URL to fetch

    Returns:
      WebFetchResponse with the fetched result
    """
    if not self._client.headers.get('authorization', '').startswith('Bearer '):
        raise ValueError('Authorization header with Bearer token is required for web fetch')
    return self._request(
        WebFetchResponse,
        'POST',
        'https://ollama.com/api/web_fetch',
        json=WebFetchRequest(
            url=url,
        ).model_dump(exclude_none=True),
    )
File: /home/workspace/wordslab-notebooks-lib/.venv/lib/python3.12/site-packages/ollama/_client.py
Type: method