User Documentation

HPRC User Guide

Everything you need to write templates that embed LLM prompts and render them to HTML — with runnable examples throughout.

What & why

Most “LLM + web” code today is imperative glue: you receive a request, hand-build prompt strings in Python, manually sequence dependent calls, await them, and splice the text into a template. The orchestration lives far from the page it produces.

HPRC inverts that. The template is the source of truth:

The page author writes prompts inline, next to where their output will appear.
The application developer supplies only data and policy — a bindings dict, named rules, allowlisted tools, an LLM provider and a cache — via one HPRCConfig.
HPRC does the orchestration: condition evaluation, fill resolution, dependency ordering, concurrency, caching and serialization.

Mental model. <prompt> blocks are tacit (they execute but never render). <response> placeholders are where answers appear. Everything else is ordinary HTML.

Install

Install from PyPI:

shellbash

pip install hprc-framework                # core (pydantic only) — then: import hprc
pip install "hprc-framework[anthropic]"   # + Claude provider
pip install "hprc-framework[openai]"      # + OpenAI / Ollama provider
pip install "hprc-framework[fastapi]"     # + FastAPI/uvicorn for the web example
pip install "hprc-framework[all]"         # + every provider SDK

The only hard runtime dependency is pydantic>=2; provider SDKs are optional and lazily imported. For development, clone the repo and install editable:

shellbash

git clone https://github.com/HPRCFramework/hprc-framework
cd hprc-framework
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]" && python -m pytest -q

Quick start

A complete, offline example using the deterministic MockLLMClient (no API key needed).

quickstart.pypython

import asyncio
import hprc
from hprc import HPRCConfig, MockLLMClient, MemoryCache

TEMPLATE = """<!DOCTYPE html>
<html><body>
  <h1>Hello <fill>customer.name</fill></h1>

  <prompt id="summary" model="gpt-5" condition="is_premium" cache="24h">
    Customer <fill>customer.name</fill> is interested in <param>product</param>.
    Write a one-line account summary.
  </prompt>

  <prompt id="upsell" model="gpt-5">
    Given: <include response="summary"/>
    Suggest one upsell.
  </prompt>

  <section><h2>Summary</h2><response id="summary"/></section>
  <section><h2>Upsell</h2><response id="upsell"/></section>
</body></html>"""

async def main():
    config = HPRCConfig(
        llm_client=MockLLMClient(),
        rules={"is_premium": lambda ctx: ctx["customer"]["tier"] == "premium"},
        cache=MemoryCache(),
    )
    html = await hprc.render_template_string(
        template_html=TEMPLATE,
        request={"query": {"product": "WidgetPro"}, "path": {}, "method": "GET"},
        bindings={"customer": {"name": "Ada", "tier": "premium"}},
        config=config,
    )
    print(html)

asyncio.run(main())

HPRC runs summary first (because upsell includes its response), feeds the answer into upsell, and renders both — while the <prompt> blocks themselves never appear in the output.

Entry points

All three are coroutines — drive them from an event loop.

Function	Use when
`render_template(template_path, request, bindings, config)`	You have a `.sprep.html` file on disk.
`render_template_string(template_html, request, bindings, config)`	You have the template as a string.
`render_string(template, request, bindings, config)`	You already parsed a `TemplateDefinition` (e.g. to cache parsing).

request and bindings default to empty; config defaults to a HPRCConfig() (Mock client + MemoryCache).

Template syntax reference

`<prompt>` — executable, tacit

Defines a prompt to send to the model. Never rendered. Its body is the prompt text, with embedded <fill>, <param> and <include> directives resolved at render time.

all attributeshtml

<prompt
    id="summary"           <!-- required, unique within the template -->
    model="gpt-5"          <!-- passed to the LLM client -->
    condition="is_premium" <!-- named rule; prompt skipped if false -->
    temperature="0.2"      <!-- float, passed through -->
    max_tokens="500"       <!-- int, passed through -->
    async="yes"            <!-- "no" = run sequentially, not concurrently -->
    cache="24h"            <!-- TTL: s/m/h/d/w, int seconds, or "0" to disable -->
    tools="crm_lookup,pricing_engine">  <!-- allowlisted tool names -->

    Customer: <fill>customer.name</fill>
    Product:  <param>product</param>
    Summarize the account.
</prompt>

Attribute	Default	Meaning
`id`	— (required)	Links this prompt to its `<response>` and to `<include>`s.
`model`	`None`	Model identifier handed to the client.
`condition`	none	Named rule gating execution (see Rules).
`temperature`	`None`	Float generation parameter.
`max_tokens`	`None`	Integer token cap.
`async`	`no`	Sequential by default; `async="yes"` opts this prompt into concurrent execution with other async prompts in its level.
`cache`	none	TTL spec; absent/empty/`0` means no caching.
`tools`	none	Comma-separated allowlist of registered tool names.

`<response>` — placeholder, with `render`

Marks where a prompt's generated text is inserted, bound by id.

htmlhtml

<response id="summary" render="yes"></response>   <!-- visible (default) -->
<response id="summary" render="no"/>             <!-- generated but hidden -->

Hidden but available. A render="no" response is still executed and can be pulled into other prompts via <include response="…"/> — perfect for an intermediate generation you don't want on the page. Truthy values for render: yes/true/1/on; everything else is hidden.

`<fill>` — data, dot-notation

Resolves a dotted path against the bindings (and the request namespace). Missing paths resolve to an empty string. In rendered HTML the value is HTML-escaped; inside a prompt body it is inserted raw.

html → resolveshtml

<fill>customer.name</fill>            → bindings["customer"]["name"]
<fill>customer.profile.tier</fill>    → nested dict / object attribute
<fill>items.0</fill>                  → list index 0

`<param>` — request query shortcut

Shorthand for a request query parameter. These two are equivalent:

htmlhtml

<param>product</param>
<fill>request.query.product</fill>

`request` namespace

HPRC normalizes any request object into a stable shape addressable from templates:

htmlhtml

<fill>request.query.product</fill>       <!-- ?product=... -->
<fill>request.path.customer_id</fill>    <!-- /customer/{customer_id} -->
<fill>request.method</fill>              <!-- "GET" -->

Accepted request inputs: a FastAPI/Starlette Request (uses query_params/path_params/method); a plain dict {"query":…, "path":…, "method":…}; any object exposing those attributes; or None.

`<include>` — compose prompts

Pulls one prompt's output into another prompt's text. This is how you express dependencies; HPRC detects them and orders execution automatically.

htmlhtml

<include prompt="summary"/>     <!-- inserts summary's CONSTRUCTED PROMPT TEXT -->
<include response="summary"/>   <!-- inserts summary's GENERATED RESPONSE -->

Validated at parse time. An <include> pointing at an undefined prompt id raises a ValueError when the template is parsed — typos fail loudly rather than silently dropping.

Rules — conditional execution

Templates reference rules by name only — there is no expression language. You register the predicates in Python; each receives the bindings and returns a bool.

pythonpython

RULES = {
    "is_premium_customer": lambda ctx: ctx["customer"]["tier"] == "premium",
    "has_balance":         lambda ctx: ctx["account"]["balance"] > 0,
}
config = HPRCConfig(llm_client=MockLLMClient(), rules=RULES)

A rule is a per-prompt gate — it does not decide whether the renderer runs (that always happens); it decides whether one prompt runs. At the start of a render, for a prompt with condition="is_premium_customer" HPRC looks that name up in HPRCConfig.rules, calls it with the bindings dict (the function's ctx argument is your bindings, not the request), and coerces the result to a bool. Truthy → the prompt runs and its <response> is filled. Falsy → the prompt is skipped (no model call) and its <response> renders empty (and any <include response> of it resolves to ""). A blank/absent condition always runs.

Each prompt has at most one condition, evaluated independently — there is no and/or in templates. Put compound logic inside one rule function (lambda ctx: ctx["a"] and ctx["b"]); the rules dict can hold many named rules that different prompts select by name.

A missing/unregistered rule name fails loudly — the render raises RuleError naming the prompt and rule, so a typo in condition= is caught rather than silently skipped. A rule that runs but raises (e.g. the bindings lack a key it reads) is treated as "condition not met" and skips just that prompt, so a data gap degrades gracefully.

Tools — the model can call your functions

Register tools by name; a prompt opts into a subset via its tools attribute. When that prompt runs, HPRC executes a single tool iteration: it calls the model with the tools; if the model asks to call one or more, HPRC runs your registered function(s) with the model's arguments, feeds the result(s) back, and asks the model once more — that response is rendered. If the model is still calling a tool after that iteration, the prompt renders empty. Implemented for OpenAI/Ollama and Anthropic; Gemini forwards schemas but doesn't run the iteration yet. (A multi-step agent loop is on the roadmap.)

pythonpython

def crm_lookup(customer: str) -> str:
    "Look up CRM notes for a customer."      # docstring → tool description
    return f"CRM notes for {customer}"

TOOLS = {"crm_lookup": crm_lookup, "pricing_engine": pricing_engine}
config = HPRCConfig(llm_client=MockLLMClient(), tools=TOOLS)
# template:  <prompt id="s" tools="crm_lookup,pricing_engine"> ... </prompt>

A tool value may be a bare callable (its docstring becomes the description) or a fully built ToolDefinition(name, func, description, parameters).

Dependency graphs

HPRC scans each prompt body for <include> directives, builds a directed graph, and computes ordered execution levels. You never wire ordering by hand.

pythonpython

from hprc import parse, build_graph, topological_levels

tpl = parse('<prompt id="a">A</prompt>'
            '<prompt id="b">uses <include response="a"/></prompt>')
build_graph(tpl.prompts)          # {'a': set(), 'b': {'a'}}
topological_levels(build_graph(tpl.prompts))   # [['a'], ['b']]

A cycle raises DependencyError. Independent prompts land in the same level and run concurrently.

Async execution

Prompts run sequentially by default — one at a time, which is simple and predictable. To run independent prompts together, opt them in with async="yes". Within each dependency level, the renderer runs the async="yes" prompts concurrently with asyncio.gather while the rest run one by one — so a level can be sequential, concurrent, or a mix:

conceptpython

for level in topological_levels(graph):
    concurrent = []
    for pid in level:
        if prompt.is_async:        # async="yes" → run concurrently
            concurrent.append(execute(prompt))
        else:                       # default → awaited one at a time
            await execute(prompt)
    if concurrent:
        await asyncio.gather(*concurrent)

Dependent prompts always wait for the responses they include, regardless of async.

Cache support

Per-prompt caching is opt-in via the cache attribute with a human-friendly TTL.

htmlhtml

<prompt id="summary" cache="24h">...</prompt>   <!-- 30m, 24h, 2d, 1w, "3600" -->
<prompt id="live"    cache="0">...</prompt>     <!-- "0" / absent = no caching -->

The cache key is a SHA-256 over everything that can change the output: the fully-resolved prompt text (which already embeds fills, params and included responses), model, temperature, max_tokens, and the sorted tool names (so tool order doesn't matter). On a hit, the LLM is not called.

python — second render is a cache hitpython

client = MockLLMClient()
cfg = HPRCConfig(llm_client=client, cache=MemoryCache())
tpl = '<prompt id="a" cache="24h">hello</prompt><x><response id="a"/></x>'

await hprc.render_template_string(tpl, config=cfg)
await hprc.render_template_string(tpl, config=cfg)
assert len(client.calls) == 1   # second render served from cache

Backends implement the Cache ABC (async get(key), async set(key, value, ttl)):

MemoryCache — in-process TTL cache; accepts an injectable time_func for deterministic expiry in tests; has clear().
NullCache — stores nothing; every lookup misses.

The abstraction is deliberately minimal so a Redis-backed cache can be dropped in without touching the renderer.

Providers

Every backend implements one coroutine. The renderer is provider-blind.

the interfacepython

class LLMClient(ABC):
    @abstractmethod
    async def generate(self, prompt, model=None, temperature=None,
                       max_tokens=None, tools=None) -> str: ...

Client = provider, model = sub-selection. The client you put in HPRCConfig fixes the provider; a prompt's model="…" only picks the variant within it. Each provider SDK is imported lazily, so import hprc needs none of them — install only the extras you use.

Shipped providers

Client	Backend	Extra
`MockLLMClient`	deterministic, offline; echoes the request, records `.calls`; accepts a `responder`	—
`OpenAIClient(api_key, default_model)`	official async `openai` client; tools → OpenAI function-tool schemas	`openai`
`AnthropicClient(api_key, default_model, default_max_tokens)`	`anthropic` Messages API (Claude); supplies the required `max_tokens`; joins text blocks	`anthropic`
`GeminiClient(api_key, default_model)`	`google-genai` async API; params under `config`, `max_tokens` → `max_output_tokens`	`gemini`
`OllamaClient(base_url, default_model)`	local OpenAI-compatible endpoint (Ollama / LM Studio)	—
`MultiProviderClient({name: client}, default=…)`	routes by a `"provider:model"` prefix	—

pick a providerpython

from hprc import OpenAIClient, AnthropicClient, GeminiClient, OllamaClient

HPRCConfig(llm_client=OpenAIClient(api_key=os.environ["OPENAI_API_KEY"]))
HPRCConfig(llm_client=AnthropicClient(api_key=os.environ["ANTHROPIC_API_KEY"]))
HPRCConfig(llm_client=GeminiClient(api_key=os.environ["GOOGLE_API_KEY"]))
HPRCConfig(llm_client=OllamaClient(base_url="http://localhost:11434/v1",
                                   default_model="llama3"))

Routing by `model` + portable aliases

To let the model value also choose the provider, use MultiProviderClient (prefix routing) and/or model_aliases (logical names resolved before each call — and folded into the cache key):

pythonpython

from hprc import MultiProviderClient, OpenAIClient, AnthropicClient

config = HPRCConfig(
    llm_client=MultiProviderClient(
        {"openai": OpenAIClient(), "anthropic": AnthropicClient()},
        default="openai",
    ),
    model_aliases={"summarizer": "anthropic:claude-sonnet-4-6"},
)
# template:  <prompt model="summarizer">  -> routed to Anthropic as claude-sonnet-4-6
#            <prompt model="gpt-5">        -> no prefix -> default (OpenAI)

Adding another provider

Subclass LLMClient, implement generate, and lazy-import the SDK inside _get_client — no changes to the renderer or templates. The mocked conformance suite in tests/test_providers.py shows the contract every client must satisfy (request mapping + text extraction); copy a case for your adapter.

`HPRCConfig` API

One object bundles the four pluggable seams. All fields have sensible defaults.

Field	Type	Default
`llm_client`	`LLMClient`	`MockLLMClient()`
`rules`	`{name: predicate(bindings)->bool}`	`{}`
`tools`	`{name: callable \| ToolDefinition}`	`{}` (normalized on init)
`cache`	`Cache`	`MemoryCache()`
`model_aliases`	`{logical: concrete}`	`{}`

Bare callables in tools are auto-wrapped into ToolDefinitions; passing cache=None falls back to a fresh MemoryCache; model_aliases maps logical model names to concrete ones (optionally "provider:model" for a MultiProviderClient), resolved before each call.

FastAPI example

The shipped examples/fastapi_app.py shows the full developer workflow. Run it:

shellbash

pip install -e ".[fastapi]"
uvicorn examples.fastapi_app:app --reload
# open http://127.0.0.1:8000/customer/42?product=WidgetPro

the controller (essence)python

@app.get("/customer/{customer_id}", response_class=HTMLResponse)
async def customer_page(customer_id: str, request: Request):
    bindings = {
        "customer": load_customer(customer_id),
        "account":  load_account(customer_id),
    }
    config = HPRCConfig(
        llm_client=MockLLMClient(),                 # or OpenAIClient(...)
        rules={"is_premium_customer":
               lambda ctx: ctx["customer"]["tier"] == "premium"},
        tools={"crm_lookup": crm_lookup, "pricing_engine": pricing_engine},
        cache=MemoryCache(),
    )
    html = await hprc.render_template(
        template_path="examples/templates/customer.sprep.html",
        request=request, bindings=bindings, config=config,
    )
    return HTMLResponse(html)

If OPENAI_API_KEY is set, the example automatically uses OpenAIClient; otherwise it falls back to the offline mock so the demo runs with no key. Note: the developer writes no prompt orchestration — HPRC runs summary then upsell for you.

Standalone (no framework)

shellbash

python examples/standalone.py

Demonstrates the entire flow — bindings, a named rule, two dependent prompts and a hidden response — with the offline mock client.

Running tests

shellbash

pip install -e ".[dev]"
python -m pytest -q
# 66 passed

Coverage spans parsing, fill/param/rule resolution, the dependency graph, async concurrency (including async="no"), cache behaviour (including TTL expiry and cache="0"), hidden responses, tacit prompts, tool registration, and a mocked provider conformance suite (OpenAI / Anthropic / Gemini / Ollama / multi-provider routing / aliases). asyncio_mode = "auto" is set, so async tests need no decorator.

Gotchas & FAQ

Why does my prompt text appear in the output?

The <prompt> block itself is never rendered, but its <response> is — and the MockLLMClient echoes the prompt as its answer. With a real provider (or a scripted responder) you'll see the model's output instead.

Can two `<response>` elements share an id?

Yes — rendering the same response in two places is supported and simply emits the same text twice. (Duplicate prompt ids, however, are rejected.)

What happens to a skipped prompt's response?

It resolves to an empty string, so its <response> and any <include response> of it render as empty.

Is output escaped?

Values emitted by <fill>/<param> in the document are HTML-escaped. Model responses inserted at <response> are emitted as-is — sanitize upstream if responses may contain untrusted markup.

Does HPRC call my tools?

Yes — when a prompt declares tools and the model asks to call one or more, HPRC runs your registered function(s) with the model's arguments and feeds the result(s) back, then renders the model's next response. It's a single iteration, not an open-ended agent framework. (Gemini's isn't implemented yet; OpenAI/Ollama and Anthropic are. A multi-step agent loop is on the roadmap.)

Status: open source (Apache-2.0), early release v0.1.0 (Alpha). Created by Rajesh Ramani.

HPRC Framework · SPREP Templates (Simple Prompt Response Embedded Pages) · User Guide · see also Architecture & Design.

HPRC User Guide

What & why

Install

Quick start

Entry points

Template syntax reference

<prompt> — executable, tacit

<response> — placeholder, with render

<fill> — data, dot-notation

<param> — request query shortcut

request namespace

<include> — compose prompts

Rules — conditional execution

Tools — the model can call your functions

Dependency graphs

Async execution

Cache support

Providers

Shipped providers

Routing by model + portable aliases

Adding another provider

HPRCConfig API

FastAPI example

Standalone (no framework)

Running tests

Gotchas & FAQ

Why does my prompt text appear in the output?

Can two <response> elements share an id?

What happens to a skipped prompt's response?

Is output escaped?

Does HPRC call my tools?

`<prompt>` — executable, tacit

`<response>` — placeholder, with `render`

`<fill>` — data, dot-notation

`<param>` — request query shortcut

`request` namespace

`<include>` — compose prompts

Routing by `model` + portable aliases

`HPRCConfig` API

Can two `<response>` elements share an id?