Architecture & Design

How HPRC works

A guided tour of the framework: what you provide, what HPRC does for you, how a page is rendered, and how the pieces fit together.

1 · The big picture

HPRC turns a SPREP template — an HTML page with embedded <prompt> and <response> elements — into finished HTML, running the prompts through a language model along the way. There are two roles:

You (the application): write the template, supply your data and a little configuration, and call the renderer.
HPRC (the framework): does all the orchestration — resolving data, deciding which prompts run and in what order, calling the model, caching, and assembling the final HTML.

How HPRC works: you provide a SPREP template, bindings, a web request and an HPRCConfig; the HPRC Renderer orchestrates everything — calling the language model (via an adapter) and returning the final HTML with responses inserted.

You supply data + a template + config; HPRC drives the model and returns HTML.

The key inversion: instead of writing Python that builds prompt strings and splices model output into a page, you put the prompts in the page and let HPRC run them. The <prompt> blocks are tacit — they execute but never appear in the output; only their <response> answers do.

2 · Who provides what

You provide a handful of clearly-separated things; HPRC handles the rest. The most important one for newcomers is bindings — the plain dictionary of your own data that becomes available to both the page and the prompts.

Piece	Who provides it	What it is / what it's for
SPREP template (`.sprep.html`)	You (the page author)	The HTML page with `<prompt>` (what to ask), `<response>` (where the answer goes), and `<fill>` (where data goes).
`bindings` (your data)	You (your Python code)	A plain dictionary your business logic produces — e.g. `{"customer": {"age": 55}}`. Read in both the HTML and the prompts via `<fill>customer.age</fill>`. This is how your own data gets into a page.
The web request	Your web framework	The incoming request (query string, path params, method). HPRC normalizes it into a `request.*` namespace, read via `<fill>request.query.product</fill>` or the shortcut `<param>product</param>`.
Model & settings	You (in the template)	Per prompt: `model`, `temperature`, `max_tokens` — written as attributes on each `<prompt>`.
Rules (who gets what)	You (Python functions)	Named yes/no functions registered in `HPRCConfig.rules`. A prompt references one by name in `condition="is_premium_customer"` to decide whether it runs. The logic lives in Python, not the template.
Tools (extra abilities)	You (Python functions)	Named callables registered in `HPRCConfig.tools`. A prompt lists the ones it's allowed to use in `tools="crm_lookup"`; HPRC passes those to the model.
LLM client (the model adapter)	You (in `HPRCConfig`)	The object that actually talks to a model provider. HPRC ships ready-made ones (OpenAI, Claude, Gemini, local, mock) — see §4.
Cache	You (in `HPRCConfig`)	Where prompt responses are stored to avoid repeat model calls. Defaults to an in-memory cache; pluggable (e.g. Redis).
Everything else	HPRC	Parsing, deciding which prompts run, working out their order, running them, caching, and assembling the final HTML — you don't write any of this.

3 · Inside one render (the internal mechanism)

You don't call these steps. You call one function — render_template(...) — and HPRC runs the pipeline below internally. It's documented here so you understand how a page is produced, not because you invoke any of it directly.

The render pipeline, seven steps: 1 parse the template (find prompts, responses, fills); 2 normalize the request (query/path/method); 3 evaluate rules (skip prompts whose condition is false); 4 build a dependency graph (which prompts feed which, via include); 5 order into levels (independent prompts can share a level); 6 run each prompt (resolve data, check cache, call model); 7 serialize (insert responses, drop prompts, final HTML).

One render pass, start to finish. Steps 4–6 are what remove the need for hand-written orchestration.

Parse — read the template into a tree and find every <prompt> and <response>. Ordinary HTML is kept as-is.
Normalize the request — turn whatever request object your web framework passed (or a plain dict) into a uniform request.* namespace.
Evaluate rules — for each prompt that has a condition, run the named Python rule. If it returns false, that prompt is skipped entirely.
Build a dependency graph — a prompt that pulls in another's answer (<include response="X"/>) depends on it. HPRC discovers these links automatically; you never sequence prompts by hand.
Order into levels — group prompts so that everything in a level is independent. Levels run in order; a cycle is reported as an error.
Run each prompt — fill in its data (<fill>, <param>, <include>), look in the cache, and if needed call the model through the adapter. Prompts are sequential by default; async="yes" opts them into concurrency within a level.
Serialize — walk the page and emit the final HTML: drop the <prompt> blocks, insert each answer at its <response>, and fill remaining <fill>/<param> values (HTML-escaped).

The renderer (hprc/renderer.py) coordinates all of this; everything passing through it is plain data described in hprc/models.py.

4 · LLM adapters (works with any provider)

HPRC never hard-codes a model vendor. Every provider is reached through one small interface — LLMClient — with a single method, generate(...). The renderer only ever calls that method, so it doesn't know or care which provider is behind it.

Ready-made adapters ship for the common providers; you can write your own for anything else.

One interface, any model: the Renderer calls generate(prompt, model, temperature, tools) on a single LLMClient interface. Behind it sit self-contained adapters — OpenAIClient, AnthropicClient (Claude), GeminiClient, OllamaClient (local / LM Studio), MockLLMClient (offline, for tests), and YourClient (a ~30-line custom adapter). Swap, add, or implement new providers without changing the renderer.

Each box is one self-contained adapter. The renderer talks only to the interface.

Adapter (shipped)	Talks to	Install extra
`OpenAIClient`	OpenAI (GPT models)	`pip install "hprc-framework[openai]"`
`AnthropicClient`	Anthropic (Claude)	`…[anthropic]`
`GeminiClient`	Google (Gemini)	`…[gemini]`
`OllamaClient`	local models (Ollama / LM Studio)	— (OpenAI-compatible)
`MockLLMClient`	nothing — deterministic offline echo	— (built in)
`MultiProviderClient`	routes to several of the above by a `"provider:model"` prefix	—

You pick one when you build the config — that fixes the provider; a prompt's model="…" then selects the specific model within it:

choose a providerpython

from hprc import HPRCConfig, OpenAIClient, AnthropicClient, GeminiClient

HPRCConfig(llm_client=OpenAIClient(api_key=...))      # GPT
HPRCConfig(llm_client=AnthropicClient(api_key=...))   # Claude
HPRCConfig(llm_client=GeminiClient(api_key=...))      # Gemini

Writing an adapter for another provider

Subclass LLMClient and implement one method: take HPRC's standard inputs, call the vendor's API, return the text. Lazy-import the SDK so HPRC keeps no hard dependency on it. Nothing else in HPRC changes.

my_provider.pypython

from hprc import LLMClient

class MyProviderClient(LLMClient):
    def __init__(self, api_key, default_model="my-model-1"):
        self._api_key, self._default, self._client = api_key, default_model, None

    def _get_client(self):
        if self._client is None:
            from myprovider import AsyncClient   # imported only when used
            self._client = AsyncClient(api_key=self._api_key)
        return self._client

    async def generate(self, prompt, model=None, temperature=None,
                       max_tokens=None, tools=None) -> str:
        resp = await self._get_client().complete(
            model=model or self._default, input=prompt,
            temperature=temperature, max_tokens=max_tokens,
        )
        return resp.text            # return the model's text

# then:  HPRCConfig(llm_client=MyProviderClient(api_key=...))

The contract every adapter must satisfy (request mapping + text extraction) is exercised by tests/test_providers.py against fake SDKs — copy a case for your adapter.

5 · Tools — the model can call your functions

Tools let a prompt give the model abilities it can call while the page renders — look something up, run a calculation, hit an API. The current architecture supports a single tool iteration: the model may request one or more tool calls, HPRC executes them and feeds the results back, and the model's next response is rendered. (A multi-step / agent loop is on the roadmap.)

How you wire tools

python + templatepython

def crm_lookup(customer_id: str) -> str:            # a normal Python function
    return "no open tickets, renewed last month"

config = HPRCConfig(tools={"crm_lookup": crm_lookup})   # name -> function

# in the template, a prompt allow-lists the tools it may call:
#   <prompt id="summary" tools="crm_lookup"> … </prompt>

How the loop runs (during the render)

When a prompt declares tools, HPRC calls the model with the prompt + the tool schemas (each adapter maps them to that vendor's format). Then:

Single tool iteration: HPRC sends the prompt plus available tools to the language model; the model asks HPRC to call crm_lookup(customer_id=1001); HPRC runs your registered tool function, gets the result, sends it back, and the model returns the final text. One iteration — execute the tool call(s), feed back, render the next turn.

HPRC executes the tool with the model's arguments and feeds the result back, then renders the model's final text.

If the model just answers (no tool call), that text is the response — one model call, done.
If the model calls one or more tools, HPRC runs your registered function(s) with the arguments the model supplied, returns the result(s), and asks the model once more — that next response is the answer. This is a single iteration of tool execution.
If the model is still calling a tool after that iteration, HPRC renders nothing for that prompt (an empty response) rather than show a half-finished turn.

Details worth knowing

Allow-list enforced: a prompt can only call the tools it lists. If the model asks for one that isn't allowed, HPRC returns an error string to the model (it can recover) rather than execute it.
No caching: tool-executing prompts are not cached, since tool outputs are dynamic.
Provider support: implemented for OpenAIClient (and OllamaClient) and AnthropicClient. GeminiClient forwards tool schemas but its execution loop isn't implemented yet (it falls back to a single text generation).
Single iteration, not an agent: HPRC runs exactly one round of tool execution per prompt — it does not keep looping. A multi-step agent loop and MCP support are on the roadmap.

Why bounded — render-time vs. live. HPRC's renderer is synchronous: a request comes in, the page is produced, and the HTML is returned. That render must terminate quickly and predictably — you can't return a page while a loop is still running, and an unbounded agent in the render path would blow up latency, timeouts, and server load. So the render path stays bounded (fills, prompts, a single tool iteration). Anything long-running — a widget that keeps making model/tool calls in real time — belongs to a separate execution model (a "live region" streamed over SSE/WebSocket after the page is sent), not the renderer. Those future tracks — a bounded multi-step agent loop, live/streaming regions, and progressive render — are recorded in the project roadmap.

6 · Design principles (and why they matter)

No code or expressions in templates

A SPREP template can't contain logic — there are no if statements, no expressions, no function calls. When the page needs a decision ("only run this prompt for premium customers"), the template just names a rule (condition="is_premium_customer") and you define that rule as a Python function. Tools work the same way: the template lists allowed names; the actual functions live in your app. Why: templates stay simple and safe to read or hand to a designer, and your business logic stays in Python where it can be tested.

A single tool iteration, not an agent

When a prompt lists tools, HPRC executes the ones the model calls in one iteration, feeds the results back, and renders the model's next response — it does not keep looping (see §5). It is not an autonomous agent. Why: every render terminates predictably and you stay in control. A multi-step agent loop and MCP are on the roadmap.

Layered code with no cycles

The library is organized in layers that only depend downward, so there are no circular dependencies. At the bottom are plain data structures (models.py). Small, independent services sit on top — the parser, cache, rules, tools, and the provider adapters — and each depends only on those data structures, not on each other. The renderer is the single coordinator that ties them together. Why: every piece is easy to understand, test, and swap out on its own (you can replace the cache or add a provider without touching the renderer).

7 · Module layout

Arrows point from a module to what it depends on. Notice that everything ultimately rests on the data models, and only the renderer wires the services together.

flowchart TD
    INIT["__init__.py
(public API)"] --> REN
    REN["renderer.py
orchestrator"] --> CFG["config.py
HPRCConfig"]
    REN --> PAR["parser.py"]
    REN --> DG["dependency_graph.py"]
    REN --> RUL["rules.py"]
    REN --> RC["request_context.py"]
    REN --> TLS["tools.py"]
    REN --> CA["cache.py"]
    CFG --> LLM["llm.py
(adapters)"]
    CFG --> CA
    CFG --> TLS
    PAR --> MOD["models.py
(pure data)"]
    DG --> MOD
    LLM --> MOD
    TLS --> MOD
    CA --> MOD
    RC --> MOD

Data models at the base · leaf services in the middle · the renderer on top.

8 · Worked example — two dependent prompts

A customer page that (1) summarizes the account, but only for premium customers, and (2) suggests an upsell based on that summary. The second prompt includes the first's answer, so HPRC runs them in order automatically.

customer.sprep.html (excerpt)html

<prompt id="summary" model="gpt-4o" condition="is_premium_customer"
        cache="24h" tools="crm_lookup">
  You are a concise account analyst. In two sentences, summarize this customer's
  account and flag anything notable.
  Name: <fill>customer.name</fill> · Tier: <fill>customer.tier</fill>
  Balance: <fill>account.balance</fill> · Last order: <fill>account.last_order</fill>
</prompt>

<prompt id="upsell" model="gpt-4o">
  Based on this account summary:
  <include response="summary"/>
  Recommend exactly one upsell relevant to "<param>product</param>", in one sentence.
</prompt>

<section><h2>Account summary</h2><response id="summary"/></section>
<section><h2>Suggested next step</h2><response id="upsell"/></section>

Dependency graph: summary → upsell. (If the model decides to call crm_lookup, HPRC executes it and feeds the result back before producing the summary — see §5.) Here is what happens at render time:

sequenceDiagram
    autonumber
    participant App as Your app
    participant R as HPRC Renderer
    participant Rule as is_premium_customer
    participant LLM as LLM adapter
    App->>R: render_template(template, bindings, request, config)
    R->>Rule: does this customer qualify?
    Rule-->>R: yes → run "summary"
    Note over R: order = summary, then upsell (upsell includes summary)
    R->>LLM: generate(summary prompt, model=gpt-4o, tools=[crm_lookup])
    LLM-->>R: "Premium account in good standing; large order last week."
    R->>LLM: generate(upsell prompt, with the summary text embedded)
    LLM-->>R: "Suggest the Pro plan to match their growing usage."
    R-->>App: final HTML — prompts removed, both answers inserted

HPRC discovered the order from the include — you never sequenced the calls.

Had a third, independent prompt been present, it could share summary's level and (with async="yes") run at the same time.

Roadmap

The render path stays synchronous and bounded; these are the planned additions, grouped by theme. Recently shipped: provider adapters, single-iteration tool execution, response gathering, and the context → bindings rename.

Tooling & MCP

A multi-step (bounded) agent loop — beyond today's single tool iteration.
MCP (Model Context Protocol) integration — discover & execute MCP tools via the same allow-list.
Gemini tool execution; tool results as embeddable content.

Conversational / multi-turn app owns persistence

prior_context — pass in an existing message chain (roles preserved).
<system> role — a system-message directive prepended to prompt calls.
RenderResult.messages — return the turns so the app can persist history and replay it. HPRC itself stays stateless.

Response shaping

Structured output → fills — a prompt returns JSON whose fields fill many placeholders (<fill>card.title</fill>); the template owns layout, values stay escaped.
format attribute — text (safe default), markdown (→ sanitized HTML), html (sanitized, opt-in).
<each> — iteration for rendering lists.

Data composition

Raw/trusted HTML injection (<fill raw> + sanitizer).
Async data providers in the dependency graph (DB/API fetches running alongside prompts).
retrieved_context (RAG) + a retriever seam.

Execution models — render-time vs. live

Live regions / real-time widgets — a separate post-render model: the render emits a placeholder + client hook, and a companion endpoint runs the loop and streams updates (SSE/WebSocket). See the §5 callout for why this stays off the render path.
Progressive / streaming render — flush each <response> as it completes.

Naming taxonomy (adopted): bindings = page data · prior_context = conversation · retrieved_context = RAG · request = the web request. Principle: *_context feeds the model's context window; bindings is page data.

Source map

File	Responsibility
`hprc/models.py`	Plain data structures (parsed template, prompt/response definitions, render bindings).
`hprc/parser.py`	Turn template text into that data; find prompts/responses; validate.
`hprc/request_context.py`	Normalize the web request; resolve dotted paths like `customer.age`.
`hprc/rules.py`	Look up and run named rules (no expressions in templates).
`hprc/dependency_graph.py`	Find `include` links, build the graph, order into levels.
`hprc/tools.py`	Resolve allow-listed tool names to registered functions.
`hprc/cache.py`	TTL parsing, cache keys, in-memory cache (pluggable).
`hprc/llm.py`	The `LLMClient` interface and the shipped adapters.
`hprc/config.py`	`HPRCConfig` — bundles client, rules, tools, cache.
`hprc/renderer.py`	The orchestrator + the public `render_*` entry points.

HPRC Framework · PREP Templates (Simple Prompt Response Embedded Pages) · Architecture & Design · see also the User Guide.