ChatTurn
```python
def ChatTurn(refresh_display: Callable = None):
```
Initialize self. See help(type(self)) for accurate signature.
```python
turns = ChatTurns(refresh_notebook_display(hide_thinking=False, hide_tool_calls=False))
turn = turns.new_turn()
turn.append_thinking("I think a lot longer.\nIn sentences.\n\nWith line breaks.")
time.sleep(0.5)
turn.append_content("I need to call 2 tools.")
time.sleep(0.5)
turn.append_tool_call("myfunc", {"param1": "value1", "param2": "value2"})
turn.start_tool_call("myfunc")
time.sleep(0.5)
turn.end_tool_call("myfunc", 17.43)
turn1 = turn
turn = turns.new_turn()
turn.append_thinking("Ok, I got the first result, now call the second tool.")
time.sleep(0.5)
turn.append_tool_call("myfunc2", {})
turn.start_tool_call("myfunc2")
time.sleep(0.5)
turn.end_tool_call("myfunc2", "The weather is nice today but clouds and wind are coming for tomorrow and the rest of the week will be awful")
turn = turns.new_turn()
turn.append_thinking("Ok, I got the second result, now I can answer the question.")
time.sleep(0.5)
turn.append_content("This is the incredible result.")
```
[Thinking]
I think a lot longer. In sentences.
With line breaks.
I need to call 2 tools.
[Tool call]
- model wants to call myfunc with parameters {'param1': 'value1', 'param2': 'value2'}
- agent called myfunc at 22:02:46
- myfunc returned 17.43 in 0.503 sec
[Thinking]
Ok, I got the first result, now call the second tool.
[Tool call]
- model wants to call myfunc2 with parameters {}
- agent called myfunc2 at 22:02:47
- myfunc2 returned The weather is nice today but clouds and wind are coming for tomorrow and the rest of the week will... in 0.503 sec
[Thinking]
Ok, I got the second result, now I can answer the question.
This is the incredible result.
turn1.thinking, turn1.content
('I think a lot longer.\nIn sentences.\n\nWith line breaks.',
'I need to call 2 tools.')
[Tool call]
- model wants to call myfunc with parameters {'param1': 'value1', 'param2': 'value2'}
- agent called myfunc at 22:02:46
- myfunc returned 17.43 in 0.503 sec
Native tool calling
Use python functions as tools callable by Large Language Models.
The python functions must be fully documented:
- type annotations are mandatory on all parameters and on the return type
- a docstring after the function definition is mandatory, and it should explain the return value
- a descriptive comment after each parameter is also mandatory
- the expected format is: one parameter per line, with a traditional python comment at the end of the line
```python
def add(a: int,   # The first number
        b: int    # The second number
        ) -> int: # The sum of the two numbers
    """Add two numbers"""
    return a + b

def multiply(a: int,   # The first number
             b: int    # The second number
             ) -> int: # The product of the two numbers
    """Multiply two numbers"""
    return a * b
```
Tool description format for ollama API
Here is the code used to process the tools parameter:
```python
for unprocessed_tool in tools or []:
    yield convert_function_to_tool(unprocessed_tool) if callable(unprocessed_tool) else Tool.model_validate(unprocessed_tool)
```
So we can pass either a list of python functions or a list of dictionaries conforming to a specific tool schema.
Here are the expectations for the python functions documentation:
```python
def convert_function_to_tool(func: Callable) -> Tool:
    ...

def _parse_docstring(doc_string: Union[str, None]) -> dict[str, str]:
    ...
    for line in doc_string.splitlines():
        ...
        if lowered_line.startswith('args:'):
            key = 'args'
        elif lowered_line.startswith(('returns:', 'yields:', 'raises:')):
            key = '_'
        ...
    for line in parsed_docstring['args'].splitlines():
        ...
        if ':' in line:
            # Split the line on either:
            # 1. A parenthetical expression like (integer) - captured in group 1
            # 2. A colon :
            # Followed by optional whitespace. Only split on first occurrence.
            ...
```
This is much less robust and readable than what toolslm.funccall.get_schema does, so we will preprocess the list of python functions ourselves.
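As a sketch of what such a preprocessing step can look like, here is a minimal converter built only on `inspect` type annotations (the `TYPE_MAP` and `function_to_tool_schema` names are hypothetical, and the inline parameter comments are deliberately ignored here for brevity):

```python
import inspect

def add(a: int,   # The first number
        b: int    # The second number
        ) -> int: # The sum of the two numbers
    """Add two numbers"""
    return a + b

# Map python annotations to JSON Schema type names (illustrative subset)
TYPE_MAP = {int: "integer", float: "number", str: "string", bool: "boolean"}

def function_to_tool_schema(func):
    """Build an OpenAI-style "wrapped function" tool schema from an annotated function."""
    sig = inspect.signature(func)
    properties = {
        name: {"type": TYPE_MAP.get(param.annotation, "string")}
        for name, param in sig.parameters.items()
    }
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
            },
        },
    }

schema = function_to_tool_schema(add)
```

A fuller version would also parse the inline comments to fill each parameter's `description` field, which is what the library's own preprocessing is for.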
Now let’s see what tool description schema is expected by ollama.
pydantic Tool.model_validate() accepts:
- dict
- Pydantic model instances
- Objects with attributes (ORM-style, if configured)
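This behavior can be tried on any pydantic model; a minimal sketch with a hypothetical `Point` model, assuming pydantic v2 is installed:

```python
from pydantic import BaseModel

class Point(BaseModel):
    x: int
    y: int

# model_validate accepts a plain dict...
p = Point.model_validate({"x": 1, "y": 2})
# ...and an existing model instance
q = Point.model_validate(p)
```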
So this is the schema expected by the openai completions API, as we will see below.
Tool description formats for the openai API
The legacy openai completions API: client.chat.completions.create(...) expects tools to be described in a json format that uses the “wrapped function” schema:
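A representative tool description in this wrapped-function JSON format, using the `add` function from above as an illustration (the field values are just an example, not a schema produced by the library):

```python
# OpenAI-style "wrapped function" tool description for the `add` example
tool_schema = {
    "type": "function",
    "function": {
        "name": "add",
        "description": "Add two numbers",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "integer", "description": "The first number"},
                "b": {"type": "integer", "description": "The second number"},
            },
            "required": ["a", "b"],
        },
    },
}
```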
Execute tools implemented as python functions with Large Language Models. The python functions must be fully documented:
- type annotations are mandatory on all parameters and on the return type
- a docstring after the function definition is mandatory
- a descriptive comment after each parameter and the return type is also mandatory
- the expected format is: one parameter per line, with a traditional python comment at the end of the line
```python
images.add_web_url("https://i.pinimg.com/736x/3c/fa/a2/3cfaa27aeff09adff6c2e6fbc5fd0dfa.jpg")
for image, is_url in images.get_base64_data_or_url():
    print(image[:50], is_url)
```
Helper class that provides a standard way to create an ABC using inheritance.
ollama model client
OllamaModelClient
```python
def OllamaModelClient(
    model: str,
    context_size: int = 32768,                # This is the default value for the ollama server in wordslab-notebooks
    base_url: str = 'http://localhost:11434',
    api_key: Optional = None,                 # If not provided, the optional key will be pulled from WordslabEnv
):
```
Helper class that provides a standard way to create an ABC using inheritance.
ollama: loading model qwen3:30b with context size 65000 ... ok
```python
prompt = "In one sentence: why is the sky blue?"
oclient(user_prompt=prompt, think=True, max_new_tokens=1000, seed=42, temperature=2)
```
[Thinking] … thought in 205 words
Sunlight scatters in Earth’s atmosphere, with shorter blue wavelengths scattering more effectively than other colors, causing the sky to appear blue.
```python
system = "Talk like a pirate"
prompt = "In one sentence: why is the sky blue?"
oclient(system_prompt=system, user_prompt=prompt)
```
[Thinking] … thought in 264 words
Arrr! The sun’s light scatters in the air, and the short blue waves bounce all about, making the sky shine like a doubloon!
```python
prompt = "In one sentence: why is the sky blue?"
assistant = "Once upon a time "
oclient(user_prompt=prompt, assistant_prefill=assistant)
```
Once upon a time 200 years ago, a scientist named Rayleigh discovered that the sky is blue because of a phenomenon called Rayleigh scattering, where the atmosphere scatters shorter blue wavelengths of sunlight more than longer red wavelengths, making the sky appear blue during the day.
```python
prompt = "Using only the provided tools to make no mistake, what is (11545468+78782431)*418742?"
tools = Tools([add, multiply])
oclient(user_prompt=prompt, tools=tools, think=True)
```
[Thinking] … thought in 2925 words
[Tool call] … add returned 90327899
[Tool call] … multiply returned 37824085083058
[Thinking] … thought in 116 words
The result of \((11545468 + 78782431) \times 418742\) is 37824085083058.
```python
class Book:
    def __init__(self, title: str, pages: int):
        self.title = title
        self.pages = pages
    def __repr__(self):
        return f"Book Title : {self.title}\nNumber of Pages : {self.pages}"

book = Book("War and Peace", 950)
book
```
Book Title : War and Peace
Number of Pages : 950
```python
def find_page(book: Book,   # The book to find the halfway point of
              percent: int, # Percent of a book to read to, e.g. halfway == 50,
              ) -> int:
    "The page number corresponding to `percent` completion of a book"
    return round(book.pages * (percent / 100.0))

find_page(book, 50)
```
475
```python
prompt = "Using only the provided tools to make no mistake, how many pages do I have to read to get halfway through my 950 page copy of War and Peace"
tools = Tools([find_page])
oclient(user_prompt=prompt, tools=tools, think=True)
```
[Thinking] … thought in 285 words
[Tool call] … find_page returned 475
[Thinking] … thought in 94 words
To reach the halfway point of your 950-page copy of War and Peace, you need to read 475 pages.
Images
```python
model = "devstral-small-2:24b"
oclient = OllamaModelClient(model)
```
ollama: loading model devstral-small-2:24b with context size 32768 ... ok
```python
prompt = "Describe this picture in a structured way"
images = Images("puppy.jpg")
oclient(user_prompt=prompt, user_images=images)
```
Here is a structured description of the image:
Subject:
A young Golden Retriever puppy.
Appearance:
Coat: Light golden fur, slightly wavy and fluffy.
Eyes: Dark, round, and expressive.
Ears: Floppy and medium-sized.
Mouth: Open slightly, showing a playful expression.
Body: Compact and sturdy, typical of a puppy.
Action:
The puppy appears to be in motion, possibly running or playing.
Front paws are lifted off the ground, suggesting movement.
Background:
Outdoor setting with a grassy field.
Blurred background, indicating focus on the puppy.
Lighting and Mood:
Natural daylight, likely during the day.
Warm and cheerful atmosphere, enhanced by the puppy’s joyful expression.
Additional Details:
A leash is visible around the puppy’s neck, suggesting it might be on a walk or playtime.
The grass is slightly tall and appears well-maintained.
This structured description captures the key elements of the image in a clear and organized manner.
```python
prompt = "Describe both images in a short paragraph"
images.add_web_url("https://i.pinimg.com/736x/3c/fa/a2/3cfaa27aeff09adff6c2e6fbc5fd0dfa.jpg")
oclient(user_prompt=prompt, user_images=images)
```
The first image shows a golden retriever puppy standing in a grassy field, looking directly at the camera with a happy expression. The puppy has a fluffy coat and appears to be in motion, possibly running or playing. The second image features the same golden retriever puppy sitting in a grassy field during sunset, with the sun visible in the background. The puppy is looking up and to the side, also with a joyful expression, and its fur is illuminated by the warm, golden light of the setting sun. Both images capture the playful and cheerful nature of the puppy in different settings.
Structured outputs
```python
class Pet(BaseModel):
    model_config = ConfigDict(extra="forbid")
    name: str
    animal: str
    age: int
    color: str | None

class PetList(BaseModel):
    model_config = ConfigDict(extra="forbid")
    pets: list[Pet]
```
```python
prompt = "I have two cats named Luna and Loki, Luna is 2 years old and yellow, Loki is 2 years older and the same color as the sky"
oclient(user_prompt=prompt, output_model=PetList)
oclient.response
```
```python
model = env.default_model_code
oclient = OllamaModelClient(model, context_size=65000)
```
ollama: loading model qwen3:30b with context size 65000 ... ok
```python
prompt = "what are the features in the latest github release of ollama"
oclient(user_prompt=prompt, think=True, web_search=True)
```
[Thinking] … thought in 357 words
[Tool call] … web_search returned results=[WebSearchResult(content='Releases ·oll...
[Thinking] … thought in 585 words
The latest GitHub release of Ollama is v0.14.0-rc2 (pre-release), which includes the following key features:
New Features:
Experimental CLI:
ollama run --experimental now includes an agent loop and the bash tool for interactive workflows.
Anthropic API Compatibility:
Support for the /v1/messages API endpoint (compatible with Anthropic models).
Model Version Requirements:
New REQUIRES command in Modelfile to declare required Ollama version for a model.
VRAM Improvements:
Fixes for integer underflow on low VRAM systems during memory estimation.
More accurate VRAM measurements for AMD iGPUs.
App Enhancements:
Ollama’s app now highlights Swift source code.
Improved error handling for embeddings returning NaN or -Inf.
Linux Install Optimization:
Linux install bundles now use zst compression (smaller downloads).
Experimental Image Generation:
New support for image generation models via MLX (experimental).
Notes:
Stable Release: The latest stable release is v0.13.5 (Dec 18, 2025), which added support for bert architecture models, DeepSeek-V3.1 tool parsing, and Google’s FunctionGemma model.
Pre-release: v0.14.0-rc2 is the most recent release candidate, but it is not yet finalized (marked as “Pre-release” on GitHub).
```python
prompt = "read https://docs.ollama.com/capabilities/web-search and summarize in one sentence what tools i can use to implement a search agent with ollama"
oclient(user_prompt=prompt, think=True, web_search=True)
```
Ollama provides the web_search and web_fetch APIs as core tools to implement a search agent, enabling model-based web queries and page content retrieval for accurate, up-to-date information.
```python
model = "google/gemini-3-flash-preview"
messages = [{'role': 'user', 'content': 'What is the smallest number palindrome greater than 130?'}]
stream = client.chat.completions.create(model=model, messages=messages, stream=True,
                                        extra_body={"reasoning": {"enabled": True}})
for chunk in stream:
    delta = chunk.choices[0].delta
    print(delta)
```
ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning="**Identifying a Solution**\n\nI've homed in on the core challenge: pinpointing the smallest palindrome exceeding 130. The constraints are clear, and I'm strategizing how to efficiently generate and validate candidate numbers. I am starting by looking at the numbers directly after 130.\n\n\n", reasoning_details=[{'index': 0, 'type': 'reasoning.text', 'text': "**Identifying a Solution**\n\nI've homed in on the core challenge: pinpointing the smallest palindrome exceeding 130. The constraints are clear, and I'm strategizing how to efficiently generate and validate candidate numbers. I am starting by looking at the numbers directly after 130.\n\n\n", 'format': 'google-gemini-v1'}], annotations=[])
ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning="**Determining the Answer**\n\nI've directly confirmed that 131 fulfills all criteria. No need to look further: it's the smallest palindrome that's larger than 130. I have confirmed that this number satisfies the relevant constraints, and I consider this the final result.\n\n\n", reasoning_details=[{'index': 0, 'type': 'reasoning.text', 'text': "**Determining the Answer**\n\nI've directly confirmed that 131 fulfills all criteria. No need to look further: it's the smallest palindrome that's larger than 130. I have confirmed that this number satisfies the relevant constraints, and I consider this the final result.\n\n\n", 'format': 'google-gemini-v1'}], annotations=[])
ChoiceDelta(content='The smallest palindrome number greater than 130 is **131**.\n\nA palindrome is a number that reads the same forwards and backwards. Since 131 reads as "1-3-1" in both directions and is the very next integer after 130 that follows this rule, it is the', function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning=None, reasoning_details=[], annotations=[])
ChoiceDelta(content=' correct answer.', function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning=None, reasoning_details=[{'index': 0, 'type': 'reasoning.encrypted', 'data': 'CiIBjz1rXx1wDxrqK4iEiYLEfo4BC0TbRcSKhvVO48q1Ge9HCmgBjz1rX93sCp42z//JmKxDWvv3kYw3fkxVXSsbAkJ9/OsR5GxzvX3NRxq1GmWU6PzNix6QQ9aFfZmZqEwc3u/anTQiQwCLH+N77Rh7vLxAvr0VCSVLJ83rqljqFi93fMyjr0pjX9OE5gpZAY89a1/PpcHwJxd6EjqwLvSFlebK6nVnQDn90G86P3/tRXy2sZxYogVEh44KsQZ4r+0B/2ClLCNJf+EvIBCEs6zXHTMLx6AJIDU6SumITgIppnPdYWwoUuoKeAGPPWtfeJ7lICOOm3dtb2YzhnJiSGPI5m96K1G/2lwo04wSze+2vbS0FGKQnAYkOR21tRqDV+JGOKon/MdNhKTfjVd+TNPUfcQCNQh+dmlVajIJRyql4sYeYSg6Wz+AA0A0dNv6GWIKaXmKCBS087b+FUT1ZMBuSgpgAY89a1+ZR6BSJsZTg5YUGg11YzeDIsaqhs+FvHy/1XB4uPp3SW+mqIfysxta4e4y7oeOuTNButgOKnDeaJfPG8VSjSh0lT0R1dHEifHAIFbiF2IKQJXW07ns6L1E9syYCn8Bjz1rXwn3gb5Q8D+qPic5yZTziFidiTxpKH3uhjyYqTrZR8hRGnmVVk05a8E/5J81UkxOevZ5yAxLiFCdimXI5yr7LriL4bDiHajdSXVxdOiKHzb6Pqx3MiLySYUt1ToD3XUAIZQBmfDVw+Qd6SQdagvsi86NKMv793xrUMlECroBAY89a18BugGYHx3fQYckccMCOS91fxOFH7hBj24O746sJhbrBMLmQSPk0du421Zmd8Lx1Ns21/SwpHCfnQ3AEA9BZ9XfahR51tE96d9DjXO7kMsgGDPoLsDApRTnRenQbPJnpXqoQYZbsnE/H1B6Tb4wcE5xJpcBKLRvXhg+c14T5NYpFEcE3dHaNUH6xXj68ZLBl2Q5FtzqOd6tyxSLRhQJfa00yDFnpn+Lbo0zLSRqbP+ovAWIq79BCpMBAY89a1/ENqGNTYpM7VJzpDZus8SoUClqlwaOwrOOHYfz6NaB3CP3coADgPnltU0hZrGfE0w4xHdM85pJaT/2HHv9GQH2o2R4nAGwT3KInxBeAVo0TtE2M9aN78wDhveiyqPUsHO1uybcRorF3uUwEIAT8NX93ykz3FwcVZvO/aYtE9A7mVN4qYgXfjvIFwIBjFJs', 'format': 'google-gemini-v1'}], annotations=[])
ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None)
```python
messages = [{'role': 'user', 'content': "Using only the provided tools to make no mistake, what is (11545468+78782431)*418742?"}]
tools = Tools([add, multiply])
stream = client.chat.completions.create(model=model, messages=messages, tools=tools.get_schemas(),
                                        stream=True, extra_body={"reasoning": {"enabled": True}})
for chunk in stream:
    delta = chunk.choices[0].delta
    print(delta)
```
ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning="**Calculating the Total Sum**\n\nI'm currently focused on the first step: summing the initial numbers. I've successfully employed the `add` function, and the intermediate result is now readily available. It's a significant figure, and I'm ready to proceed to the next stage after a brief review.\n\n\n", reasoning_details=[{'index': 0, 'type': 'reasoning.text', 'text': "**Calculating the Total Sum**\n\nI'm currently focused on the first step: summing the initial numbers. I've successfully employed the `add` function, and the intermediate result is now readily available. It's a significant figure, and I'm ready to proceed to the next stage after a brief review.\n\n\n", 'format': 'google-gemini-v1'}], annotations=[])
ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning="**Initiating Multiplication Operations**\n\nI've got the total sum from the previous stage, a substantial number. Now, the plan is to multiply this sum by 418,742. I'm preparing to invoke the `multiply` function, and I'm ready to observe the final result soon.\n\n\n", reasoning_details=[{'index': 0, 'type': 'reasoning.text', 'text': "**Initiating Multiplication Operations**\n\nI've got the total sum from the previous stage, a substantial number. Now, the plan is to multiply this sum by 418,742. I'm preparing to invoke the `multiply` function, and I'm ready to observe the final result soon.\n\n\n", 'format': 'google-gemini-v1'}], annotations=[])
ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls=[ChoiceDeltaToolCall(index=0, id='tool_add_bxj7qiqYR0YykpEmfzoC', function=ChoiceDeltaToolCallFunction(arguments='{"b":78782431,"a":11545468}', name='add'), type='function')], reasoning=None, reasoning_details=[{'index': 0, 'id': 'tool_add_bxj7qiqYR0YykpEmfzoC', 'type': 'reasoning.encrypted', 'data': 'CiQBjz1rX1bExybTM6VtK0kASX6FTxfBBwyi932MRRrEuYtvkdsKUQGPPWtfr55kYc/lU0ZhVkHjN6wZ74W1LVMPXkSF3tFEoDwQtfzpuzYnV7xzD1qz5CnzNvZm5i9eY0qQKJhzhXAC3FTmjZ6D2OdODRvDX7uUGwo0AY89a1/zpxr22ksclwFiLTfaAWYr57nJPyB5mHc8h6PIE3uu5ggyUE4VsZoaqVxv2q03jQp6AY89a187jC4w08IR8/fDOrJwQEOLQO5yNEUlQphL8lckJvAltj2ULGXY06WBV8yi1n3YGUZEAgLU4ihc/o8+/nVuk+OnKvicbju9XY82ZeP35D8yFVPEFMyHifVptjjPry8rCpr4N84UipqVfc0rmtFPBbE6+6TTFIUKaQGPPWtf6XkgaitG1kvzHAeW+I55A2dWbhyG/8Z15RjCkArCtKMEEf1r9v7G0ke7kNuIW8jhHF5H/wm/20m69zrsCp1obV+jbZR8T6lb3Km4EXJOldkm/U1lsHZRI8Lp2mkzreV1QbkqHwo2AY89a1/PJ5AClPojS2Y4jbD3oN8tblRCmUNTysi69fLd/4tqapYHz4CLAo+LMhFvzLN9N4qZ', 'format': 'google-gemini-v1'}], annotations=[])
ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None)
Openrouter API reference
Note: as of January 2026 - the OpenAI-compatible Responses API (Beta) is in beta stage and may have breaking changes. Use with caution in production environments.
=> we will use the completions API for now
https://openrouter.ai/docs/api/reference/overview
REQUEST SCHEMA
```typescript
// Definitions of subtypes are below
type Request = {
  // Either "messages" or "prompt" is required
  messages?: Message[];
  prompt?: string;

  // If "model" is unspecified, uses the user's default
  model?: string; // See "Supported Models" section

  // Allows to force the model to produce specific output format.
  // See models page and note on this docs page for which models support it.
  response_format?: { type: 'json_object' };

  stop?: string | string[];
  stream?: boolean; // Enable streaming

  // See LLM Parameters (openrouter.ai/docs/api/reference/parameters)
  max_tokens?: number;  // Range: [1, context_length)
  temperature?: number; // Range: [0, 2]

  // Tool calling
  // Will be passed down as-is for providers implementing OpenAI's interface.
  // For providers with custom interfaces, we transform and map the properties.
  // Otherwise, we transform the tools into a YAML template. The model responds with an assistant message.
  // See models supporting tool calling: openrouter.ai/models?supported_parameters=tools
  tools?: Tool[];
  tool_choice?: ToolChoice;

  // Advanced optional parameters
  seed?: number;  // Integer only
  top_p?: number; // Range: (0, 1]
  top_k?: number; // Range: [1, Infinity) Not available for OpenAI models
  frequency_penalty?: number;  // Range: [-2, 2]
  presence_penalty?: number;   // Range: [-2, 2]
  repetition_penalty?: number; // Range: (0, 2]
  logit_bias?: { [key: number]: number };
  top_logprobs: number; // Integer only
  min_p?: number; // Range: [0, 1]
  top_a?: number; // Range: [0, 1]

  // Reduce latency by providing the model with a predicted output
  // https://platform.openai.com/docs/guides/latency-optimization#use-predicted-outputs
  prediction?: { type: 'content'; content: string };

  // OpenRouter-only parameters
  // See "Prompt Transforms" section: openrouter.ai/docs/guides/features/message-transforms
  transforms?: string[];
  // See "Model Routing" section: openrouter.ai/docs/guides/features/model-routing
  models?: string[];
  route?: 'fallback';
  // See "Provider Routing" section: openrouter.ai/docs/guides/routing/provider-selection
  provider?: ProviderPreferences;

  user?: string; // A stable identifier for your end-users. Used to help detect and prevent abuse.

  // Debug options (streaming only)
  debug?: {
    echo_upstream_body?: boolean; // If true, returns the transformed request body sent to the provider
  };
};

// Subtypes:
type TextContent = {
  type: 'text';
  text: string;
};

type ImageContentPart = {
  type: 'image_url';
  image_url: {
    url: string;     // URL or base64 encoded image data
    detail?: string; // Optional, defaults to "auto"
  };
};

type ContentPart = TextContent | ImageContentPart;

type Message =
  | {
      role: 'user' | 'assistant' | 'system';
      // ContentParts are only for the "user" role:
      content: string | ContentPart[];
      // If "name" is included, it will be prepended like this
      // for non-OpenAI models: `{name}: {content}`
      name?: string;
    }
  | {
      role: 'tool';
      content: string;
      tool_call_id: string;
      name?: string;
    };

type FunctionDescription = {
  description?: string;
  name: string;
  parameters: object; // JSON Schema object
};

type Tool = {
  type: 'function';
  function: FunctionDescription;
};

type ToolChoice =
  | 'none'
  | 'auto'
  | {
      type: 'function';
      function: {
        name: string;
      };
    };
```
RESPONSE SCHEMA
```typescript
// Definitions of subtypes are below
type Response = {
  id: string;
  // Depending on whether you set "stream" to "true" and
  // whether you passed in "messages" or a "prompt", you
  // will get a different output shape
  choices: (NonStreamingChoice | StreamingChoice | NonChatChoice)[];
  created: number; // Unix timestamp
  model: string;
  object: 'chat.completion' | 'chat.completion.chunk';
  system_fingerprint?: string; // Only present if the provider supports it
  // Usage data is always returned for non-streaming.
  // When streaming, you will get one usage object at
  // the end accompanied by an empty choices array.
  usage?: ResponseUsage;
};

// If the provider returns usage, we pass it down
// as-is. Otherwise, we count using the GPT-4 tokenizer.
type ResponseUsage = {
  /** Including images and tools if any */
  prompt_tokens: number;
  /** The tokens generated */
  completion_tokens: number;
  /** Sum of the above two fields */
  total_tokens: number;
};

// Subtypes:
type NonChatChoice = {
  finish_reason: string | null;
  text: string;
  error?: ErrorResponse;
};

type NonStreamingChoice = {
  finish_reason: string | null;
  native_finish_reason: string | null;
  message: {
    content: string | null;
    role: string;
    tool_calls?: ToolCall[];
  };
  error?: ErrorResponse;
};

type StreamingChoice = {
  finish_reason: string | null;
  native_finish_reason: string | null;
  delta: {
    content: string | null;
    role?: string;
    tool_calls?: ToolCall[];
  };
  error?: ErrorResponse;
};

type ErrorResponse = {
  code: number; // See "Error Handling" section
  message: string;
  metadata?: Record<string, unknown>; // Contains additional error information such as provider details, the raw error message, etc.
};

type ToolCall = {
  id: string;
  type: 'function';
  function: FunctionCall;
};
```
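To make the request shape concrete, here is a minimal payload that satisfies the schema above (the model name and parameter values are only examples):

```python
import json

# Minimal chat request: either "messages" or "prompt" is required
payload = {
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [
        {"role": "system", "content": "Talk like a pirate"},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
    "max_tokens": 500,   # Range: [1, context_length)
    "temperature": 0.7,  # Range: [0, 2]
    "stream": False,
}
body = json.dumps(payload)  # ready to POST to the chat completions endpoint
```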
OpenRouterModelClient
```python
def OpenRouterModelClient(
    model: str,
    context_size: Optional = None,                # For OpenRouter this parameter is ignored, we inherit the remote model config
    base_url: str = 'https://openrouter.ai/api/v1',
    api_key: Optional = None,                     # If not provided, the mandatory key will be pulled from WordslabEnv
):
```
Helper class that provides a standard way to create an ABC using inheritance.
```python
model = "anthropic/claude-sonnet-4.5"
orclient = OpenRouterModelClient(model)
```
openrouter: testing model anthropic/claude-sonnet-4.5 ... ok
```python
prompt = 'Why is the sky blue?'
orclient(user_prompt=prompt)
```
The sky is blue due to a phenomenon called Rayleigh scattering.
Here’s how it works:
Sunlight contains all colors - White sunlight is actually made up of all the colors of the rainbow (different wavelengths of light).
Light interacts with the atmosphere - When sunlight enters Earth’s atmosphere, it collides with gas molecules (mainly nitrogen and oxygen).
Blue light scatters more - Shorter wavelengths (blue and violet) scatter much more easily than longer wavelengths (red and orange). Blue light gets scattered in all directions throughout the sky.
Why not violet? - Violet scatters even more than blue, but our eyes are more sensitive to blue, and some violet light is absorbed by the upper atmosphere, so we perceive the sky as blue.
At sunset/sunrise, the sky turns red/orange because sunlight travels through more atmosphere to reach your eyes, scattering away most of the blue light and leaving the longer red and orange wavelengths visible.
```python
prompt = 'Why is the sky blue?'
orclient(system_prompt="Talk like a pirate", user_prompt=prompt)
```
Arrr, ye be askin’ a fine question there, matey!
The sky be blue because of a bit o’ science called “Rayleigh scatterin’,” savvy? When the sun’s light comes sailin’ through our atmosphere, it be carryin’ all the colors o’ the rainbow mixed together, aye.
Now here be the trick - the tiny molecules in the air be like little scallywags that scatter the light in all directions. But the blue light, bein’ shorter and choppier like waves in a storm, gets scattered MORE than the other colors. Red and orange light? Those long wavelengths sail right on through like a ship with the wind at its back!
So when ye look up at the heavens, yer eyes be seein’ all that scattered blue light bouncin’ around the sky from every direction. That be why the whole sky looks blue instead of just where the sun be!
At sunset though, arrr, the light travels through more atmosphere - like a longer voyage across the seas - and all that blue gets scattered away completely, leavin’ only the reds and oranges to paint the sky. Beautiful as a Caribbean sunset, it is!
Tips tricorn hat
Any more questions for this old sea dog? ⚓
```python
prompt = 'Why is the sky blue?'
orclient(user_prompt=prompt, assistant_prefill="Once upon a time ")
```
Once upon a time in the kingdom of Light, photons embarked on a journey from the Sun to Earth. As they traveled through the atmosphere, they encountered tiny molecules of nitrogen and oxygen.
These molecules were much smaller than the wavelengths of visible light, which caused something magical called Rayleigh scattering. This type of scattering has a special property: it scatters shorter wavelengths (blue and violet light) much more effectively than longer wavelengths (red and orange light) — specifically, shorter wavelengths scatter about 10 times more!
Here’s what happens:
🔵 Blue light gets scattered in all directions by air molecules 🟣 Violet light scatters even more, but our eyes are less sensitive to it 🔴 Red/orange light passes through with less scattering
So when you look up at the sky (away from the Sun), you’re seeing all that scattered blue light coming from every direction. This creates the beautiful blue canopy above us.
Fun fact: At sunrise and sunset, sunlight travels through more atmosphere to reach your eyes. Most of the blue light gets scattered away before it reaches you, leaving the warm reds and oranges — creating those stunning twilight colors! 🌅
```python
prompt = 'What is the smallest number palindrome greater than 130?'
orclient(user_prompt=prompt, think=1024, max_new_tokens=2000, seed=42, temperature=0.7)
```
[Thinking] … thought in 92 words
Looking at numbers greater than 130:
131: reads as 1-3-1 → This is the same forwards and backwards ✓
131 is the smallest palindrome greater than 130.
```python
prompt = "Using only the provided tools to make no mistake, what is (11545468+78782431)*418742?"
tools = Tools([add, multiply])
orclient(user_prompt=prompt, tools=tools, think=2014)
```
[Thinking] … thought in 35 words
I’ll solve this step by step using the provided tools.
First, let me add 11545468 and 78782431:
[Tool call] … add returned 90327899
Now let me multiply the result by 418742:
[Tool call] … multiply returned 37824085083058
The answer is 37,824,085,083,058.
Images
```python
model = "openai/gpt-5.2"
orclient = OpenRouterModelClient(model)
```
openrouter: testing model openai/gpt-5.2 ... ok
```python
prompt = "Describe this picture in a structured way"
images = Images("puppy.jpg")
orclient(user_prompt=prompt, user_images=images)
```
Structured Description of the Image
1) Overview
Scene type: Outdoor animal portrait/action shot
Main subject: A light-colored puppy (appears to be a golden retriever-type) running toward the camera
Setting: Open grassy field with a softly blurred background
2) Subject Details
Animal: Young dog/puppy
Fur color & texture: Cream to pale golden, fluffy coat
Face: Dark eyes, black nose, mouth open with tongue visible (panting/happy expression)
Ears: Floppy, slightly darker golden tone than the face
Accessories: Thin collar/strap visible around the neck area
3) Action & Pose
Motion: Running forward toward the viewer
Body position: Front paw lifted mid-step; posture suggests energetic movement
Overall tone: Warm and gentle, with a natural outdoor feel
7) Mood / Impression
Mood: Joyful, lively, friendly
Implied context: A puppy playing or running freely in a field
```python
prompt = "Describe both images in a short paragraph"
images.add_web_url("https://i.pinimg.com/736x/3c/fa/a2/3cfaa27aeff09adff6c2e6fbc5fd0dfa.jpg")
orclient(user_prompt=prompt, user_images=images)
```
Both images feature a fluffy golden retriever puppy outdoors in a grassy field. In the first image, the puppy is running toward the camera with its mouth open and tongue out, looking playful and energetic against a softly blurred green background. In the second image, the puppy is sitting calmly in warm golden-hour light, gazing upward with a relaxed expression while the sun hangs low in an orange sky behind it.
Structured outputs
```python
prompt = "I have two cats named Luna and Loki, Luna is 2 years old and yellow, Loki is 2 years older and the same color as the sky"
orclient(user_prompt=prompt, output_model=PetList)
orclient.response
```
Models providers
Design concepts
User centric workflow
identify your self-hosted inference or inference as a service options
understand your task type, properties, privacy needs and scale
find the best model for your task, given your constraints
prepare and start your self hosted inference or connect to your inference as a service provider
monitor your resource usage and cost
Self-hosted inference or inference as a service
Model families
- architecture name
- parameter size
- training type: base / instruct / thinking
- version: release date
- quantization
Model constraints
- model capabilities
  - modalities in/out
  - context length
  - instruction
  - thinking
  - tools
- model usage
  - prompt template and special tokens
  - languages supported
  - recommended use cases
  - prompting guidelines
- model license
  - use case restrictions
  - commercial usage restrictions
  - outputs usage restrictions
  - model transparency
Self-hosted inference constraints
- model requirements
  - size on disk -> download time / load time in VRAM
  - size in VRAM -> max context length / num parallel sequences
  - tensor flops -> input tokens/sec
  - memory bandwidth -> output tokens/sec
- inference machine constraints
  - download speed
  - disk size and speed
  - GPU VRAM, memory bandwidth, tensor flops
- rented machine constraints
  - GPU availability
  - price when you use (per GPU)
  - price when you don't use (per GB of storage)
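The "size in VRAM" and "memory bandwidth -> output tokens/sec" constraints can be sketched with a back-of-envelope calculation. This is a rough model with strong simplifying assumptions: weights dominate VRAM (KV cache and activations are ignored), and decoding is memory-bandwidth bound, so output speed is roughly bandwidth divided by model size.

```python
def estimate_inference(params_b, bytes_per_param, mem_bandwidth_gbs):
    """Rough estimates: VRAM footprint (GB) and output tokens/sec.

    Assumes weights dominate VRAM and decoding is bandwidth bound,
    so every generated token streams the full weights from memory.
    """
    size_gb = params_b * bytes_per_param          # size in VRAM (GB)
    tokens_per_sec = mem_bandwidth_gbs / size_gb  # output tokens/sec
    return size_gb, tokens_per_sec

# Example: an 8B model in q4 (~0.5 byte/param) on a GPU with 1000 GB/s
size_gb, tps = estimate_inference(8, 0.5, 1000)
print(f"{size_gb:.0f} GB in VRAM, ~{tps:.0f} output tokens/sec")
# 4 GB in VRAM, ~250 output tokens/sec
```

Real throughput will be lower (attention overhead, batching, KV cache), but the estimate is useful for ruling out machine/model combinations quickly.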
Inference as a service constraints
- router constraints
  - … same as provider constraints below …
- provider constraints
  - terms of service
  - privacy options
  - inference quotas
  - service availability
- per model provider constraints
  - model capabilities exposed
  - input/output tokens cost
  - input/output tokens/sec
As of December 2025, there is no API to get the Ollama catalog of models; web scraping is the only solution.
import httpx
import re
from html import unescape

def updated_to_months(updated):
    """
    Convert strings like: "1 year ago", "2 years ago", "1 month ago",
    "3 weeks ago", "7 days ago", "yesterday", "4 hours ago"
    into integer months.
    """
    if not updated:
        return None
    updated = updated.lower().strip()
    # handle 'yesterday' explicitly
    if updated == "yesterday":
        return 0
    # years → months
    m = re.match(r'(\d+)\s+year', updated)
    if m:
        return int(m.group(1)) * 12
    # months
    m = re.match(r'(\d+)\s+month', updated)
    if m:
        return int(m.group(1))
    # weeks
    m = re.match(r'(\d+)\s+week', updated)
    if m:
        return max(0, int(m.group(1)) // 4)
    # days
    m = re.match(r'(\d+)\s+day', updated)
    if m:
        return 0
    # hours / minutes / seconds → treat as < 1 month
    if any(unit in updated for unit in ["hour", "minute", "second"]):
        return 0
    return None

def pulls_to_int(pulls_str):
    """
    Convert a pulls string like: '5M', '655.8K', '49K', '73.7M', '957.4K', '27.7M'
    into an integer.
    """
    if not pulls_str:
        return None
    pulls_str = pulls_str.strip().upper()
    match = re.match(r'([\d,.]+)\s*([KM]?)', pulls_str)
    if not match:
        return None
    number, suffix = match.groups()
    # Remove commas and convert to float
    number = float(number.replace(',', ''))
    if suffix == 'M':
        number *= 1_000_000
    elif suffix == 'K':
        number *= 1_000
    return int(number)

def parse_model_list_regex(html):
    models = []
    # --- Extract each <li x-test-model>...</li> block ---
    li_blocks = re.findall(r'<li[^>]*x-test-model[^>]*>(.*?)</li>', html, flags=re.DOTALL)
    for block in li_blocks:
        # name from <a href="/library/...">
        name = None
        m = re.search(r'href="/library/([^"]+)"', block)
        if m:
            name = m.group(1)
        # description <p class="max-w-lg ...">...</p>
        description = ""
        m = re.search(r'<p[^>]*text-neutral-800[^>]*>(.*?)</p>', block, flags=re.DOTALL)
        if m:
            description = re.sub(r'<.*?>', '', m.group(1)).strip()
            description = unescape(description)
        # capabilities (x-test-capability)
        capabilities = re.findall(r'<span[^>]*x-test-capability[^>]*>(.*?)</span>', block, flags=re.DOTALL)
        capabilities = [c.strip() for c in capabilities]
        # check for the special 'cloud' span
        cloud = False
        if re.search(r'<span[^>]*>cloud</span>', block, flags=re.DOTALL):
            cloud = True
        # sizes (x-test-size)
        sizes = re.findall(r'<span[^>]*x-test-size[^>]*>(.*?)</span>', block, flags=re.DOTALL)
        sizes = [s.strip() for s in sizes]
        # pulls <span x-test-pull-count>5M</span>
        pulls = None
        m = re.search(r'<span[^>]*x-test-pull-count[^>]*>(.*?)</span>', block)
        if m:
            pulls = m.group(1).strip()
        # tag count <span x-test-tag-count>5</span>
        tag_count = None
        m = re.search(r'<span[^>]*x-test-tag-count[^>]*>(.*?)</span>', block)
        if m:
            tag_count = m.group(1).strip()
        # updated text <span x-test-updated>...</span>
        updated = None
        m = re.search(r'<span[^>]*x-test-updated[^>]*>(.*?)</span>', block)
        if m:
            updated = m.group(1).strip()
        models.append({
            "name": name,
            "description": description,
            "capabilities": capabilities,
            "cloud": cloud,
            "sizes": sizes,
            "pulls": pulls_to_int(pulls),
            # guard against a missing tag count instead of crashing on int(None)
            "tag_count": int(tag_count) if tag_count else None,
            "updated_months": updated_to_months(updated),
            "url": f"https://ollama.com/library/{name}" if name else None
        })
    return models

def list_models(contains=None):
    """
    Extract model names and properties from https://ollama.com/library
    Optionally filter by substring.
    """
    html = httpx.get("https://ollama.com/library").text
    models = parse_model_list_regex(html)
    if contains:
        models = [m for m in models if contains.lower() in m["name"].lower()]
    models = sorted(models, key=lambda m: m["name"])
    return models

def list_recent_models_from_family(familyfilter):
    return [
        f"{m['name']}"
        f"{m['capabilities'] if len(m['capabilities']) > 0 else ''}"
        f"{m['sizes'] if len(m['sizes']) > 0 else ''}"
        f"{' [cloud]' if m['cloud'] else ''}"
        for m in list_models(familyfilter)
        if m["updated_months"] is not None and m["updated_months"] < 12
    ]

def list_tags(model):
    """
    Extract valid quantized tags only, without HTML noise,
    and apply the same exclusions as original greps.
    """
    html = httpx.get(f"https://ollama.com/library/{model}/tags").text
    # Capture ONLY the tag part after model:..., e.g. 3b-instruct-q4_K_M
    raw_tags = re.findall(rf'{re.escape(model)}:([A-Za-z0-9._-]*q[A-Za-z0-9._-]*)', html)
    # Re-add full prefix model:<tag>
    tags = [f"{model}:{t}" for t in raw_tags]
    # Exclude text|base|fp|q4_[01]|q5_[01]
    tags = [t for t in tags if not re.search(r'(text|base|fp|q[45]_[01])', t)]
    # Deduplicate
    return set(tags)
list_models()[:5]
[{'name': 'gpt-oss',
'description': 'OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.',
'capabilities': ['tools', 'thinking'],
'cloud': True,
'sizes': ['20b', '120b'],
'pulls': 5000000,
'tag_count': 5,
'updated_months': 1,
'url': 'https://ollama.com/library/gpt-oss'},
{'name': 'qwen3-vl',
'description': 'The most powerful vision-language model in the Qwen model family to date.',
'capabilities': ['vision', 'tools'],
'cloud': True,
'sizes': ['2b', '4b', '8b', '30b', '32b', '235b'],
'pulls': 656300,
'tag_count': 59,
'updated_months': 1,
'url': 'https://ollama.com/library/qwen3-vl'},
{'name': 'ministral-3',
'description': 'The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware.',
'capabilities': ['vision', 'tools'],
'cloud': True,
'sizes': ['3b', '8b', '14b'],
'pulls': 49100,
'tag_count': 16,
'updated_months': 0,
'url': 'https://ollama.com/library/ministral-3'},
{'name': 'deepseek-r1',
'description': 'DeepSeek-R1 is a family of open reasoning models with performance approaching that of leading models, such as O3 and Gemini 2.5 Pro.',
'capabilities': ['tools', 'thinking'],
'cloud': False,
'sizes': ['1.5b', '7b', '8b', '14b', '32b', '70b', '671b'],
'pulls': 73700000,
'tag_count': 35,
'updated_months': 5,
'url': 'https://ollama.com/library/deepseek-r1'},
{'name': 'qwen3-coder',
'description': "Alibaba's performant long context models for agentic and coding tasks.",
'capabilities': ['tools'],
'cloud': True,
'sizes': ['30b', '480b'],
'pulls': 958100,
'tag_count': 10,
'updated_months': 2,
'url': 'https://ollama.com/library/qwen3-coder'}]
Certain endpoints stream responses as JSON objects. Streaming can be disabled by providing {"stream": false} for these endpoints.
Structured outputs
Structured outputs are supported by providing a JSON schema in the format parameter. The model will generate a response that matches the schema. See the structured outputs example below.
JSON mode
Enable JSON mode by setting the format parameter to json. This will structure the response as a valid JSON object.
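The two options differ only in the value of the format field: the literal string "json" for JSON mode, or a full JSON schema for structured outputs. A minimal sketch of both request payloads (the model name and schema are assumptions for illustration):

```python
# JSON schema used for structured outputs (hypothetical pet record)
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# JSON mode: format is the literal string "json"
json_mode_request = {
    "model": "gemma3",
    "prompt": "Describe a pet as JSON.",
    "format": "json",
    "stream": False,
}

# Structured outputs: format is a JSON schema the response must match
structured_request = {
    "model": "gemma3",
    "prompt": "Describe a pet.",
    "format": schema,
    "stream": False,
}

# Either dict can then be POSTed to a local server, e.g.:
# httpx.post("http://localhost:11434/api/generate", json=structured_request)
```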
Parameters
- model: (required) the model name
- prompt: the prompt to generate a response for
- suffix: the text after the model response
- images: (optional) a list of base64-encoded images (for multimodal models such as llava)
- think: (for thinking models) should the model think before responding?
Advanced parameters (optional):
- format: the format to return a response in. Format can be json or a JSON schema
- options: additional model parameters listed in the documentation for the Modelfile, such as temperature
- system: system message (overrides what is defined in the Modelfile)
- template: the prompt template to use (overrides what is defined in the Modelfile)
- stream: if false, the response will be returned as a single response object rather than a stream of objects
- raw: if true, no formatting will be applied to the prompt. You may choose to use the raw parameter if you are specifying a full templated prompt in your request to the API
- keep_alive: controls how long the model will stay loaded into memory following the request (default: 5m)
The final response in the stream also includes additional data about the generation:
- total_duration: time spent generating the response
- load_duration: time spent in nanoseconds loading the model
- prompt_eval_count: number of tokens in the prompt
- prompt_eval_duration: time spent in nanoseconds evaluating the prompt
- eval_count: number of tokens in the response
- eval_duration: time in nanoseconds spent generating the response
- response: empty if the response was streamed, if not streamed, this will contain the full response
When streaming is off, the full response is received in a single reply.
To calculate how fast the response is generated in tokens per second (tokens/s), divide eval_count by eval_duration and multiply by 10^9 (since durations are reported in nanoseconds).
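The calculation above can be written directly against the fields of the final streamed response (the sample values below are hypothetical):

```python
def tokens_per_second(eval_count, eval_duration_ns):
    # eval_duration is reported in nanoseconds, so scale by 10^9
    return eval_count / eval_duration_ns * 1e9

# Example with hypothetical values taken from a final response object
print(f"{tokens_per_second(282, 4_709_213_000):.1f} tokens/s")
```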
Images
To submit images to multimodal models, provide a list of base64-encoded images:
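A minimal sketch of building such a request (the model name, file path, and the final ollama.generate call are assumptions; only the base64 encoding step runs here):

```python
import base64

def encode_image(path):
    """Read an image file and return its base64 string, as the API expects."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

# Hypothetical call against a multimodal model (requires a running server):
# ollama.generate(model='llava', prompt='What is in this picture?',
#                 images=[encode_image('photo.jpg')])
```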
ollama.generate(model='gemma3', prompt='Why is the sky blue?')
ollama.chat(model='gemma3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
ollama.embed(model='gemma3', input='The sky is blue because of rayleigh scattering')
ollama.embed(model='gemma3', input=['The sky is blue because of rayleigh scattering', 'Grass is green because of chlorophyll'])
ollama.ps()
ProcessResponse(models=[])
ollama.web_search??
Signature: ollama.web_search(query: str, max_results: int = 3) -> ollama._types.WebSearchResponse
Source:
def web_search(self, query: str, max_results: int = 3) -> WebSearchResponse:
    """
    Performs a web search

    Args:
      query: The query to search for
      max_results: The maximum number of results to return (default: 3)

    Returns:
      WebSearchResponse with the search results

    Raises:
      ValueError: If OLLAMA_API_KEY environment variable is not set
    """
    if not self._client.headers.get('authorization', '').startswith('Bearer '):
        raise ValueError('Authorization header with Bearer token is required for web search')
    return self._request(
        WebSearchResponse,
        'POST',
        'https://ollama.com/api/web_search',
        json=WebSearchRequest(
            query=query,
            max_results=max_results,
        ).model_dump(exclude_none=True),
    )
File: /home/workspace/wordslab-notebooks-lib/.venv/lib/python3.12/site-packages/ollama/_client.py
Type: method
ollama.web_fetch??
Signature: ollama.web_fetch(url: str) -> ollama._types.WebFetchResponse
Source:
def web_fetch(self, url: str) -> WebFetchResponse:
    """
    Fetches the content of a web page for the provided URL.

    Args:
      url: The URL to fetch

    Returns:
      WebFetchResponse with the fetched result
    """
    if not self._client.headers.get('authorization', '').startswith('Bearer '):
        raise ValueError('Authorization header with Bearer token is required for web fetch')
    return self._request(
        WebFetchResponse,
        'POST',
        'https://ollama.com/api/web_fetch',
        json=WebFetchRequest(
            url=url,
        ).model_dump(exclude_none=True),
    )
File: /home/workspace/wordslab-notebooks-lib/.venv/lib/python3.12/site-packages/ollama/_client.py
Type: method