User Documentation

HPRC User Guide

Everything you need to write templates that embed LLM prompts and render them to HTML — with runnable examples throughout.

What & why

Most “LLM + web” code today is imperative glue: you receive a request, hand-build prompt strings in Python, manually sequence dependent calls, await them, and splice the text into a template. The orchestration lives far from the page it produces.

HPRC inverts that. The template is the source of truth:

  • The page author writes prompts inline, next to where their output will appear.
  • The application developer supplies only data and policy — a bindings dict, named rules, allowlisted tools, an LLM provider and a cache — via one HPRCConfig.
  • HPRC does the orchestration: condition evaluation, fill resolution, dependency ordering, concurrency, caching and serialization.

Mental model. <prompt> blocks are tacit (they execute but never render). <response> placeholders are where answers appear. Everything else is ordinary HTML.

Install

Install from PyPI:

shellbash
pip install hprc-framework                # core (pydantic only) — then: import hprc
pip install "hprc-framework[anthropic]"   # + Claude provider
pip install "hprc-framework[openai]"      # + OpenAI / Ollama provider
pip install "hprc-framework[fastapi]"     # + FastAPI/uvicorn for the web example
pip install "hprc-framework[all]"         # + every provider SDK

The only hard runtime dependency is pydantic>=2; provider SDKs are optional and lazily imported. For development, clone the repo and install editable:

shellbash
git clone https://github.com/HPRCFramework/hprc-framework
cd hprc-framework
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]" && python -m pytest -q

Quick start

A complete, offline example using the deterministic MockLLMClient (no API key needed).

quickstart.pypython
import asyncio
import hprc
from hprc import HPRCConfig, MockLLMClient, MemoryCache

TEMPLATE = """<!DOCTYPE html>
<html><body>
  <h1>Hello <fill>customer.name</fill></h1>

  <prompt id="summary" model="gpt-5" condition="is_premium" cache="24h">
    Customer <fill>customer.name</fill> is interested in <param>product</param>.
    Write a one-line account summary.
  </prompt>

  <prompt id="upsell" model="gpt-5">
    Given: <include response="summary"/>
    Suggest one upsell.
  </prompt>

  <section><h2>Summary</h2><response id="summary"/></section>
  <section><h2>Upsell</h2><response id="upsell"/></section>
</body></html>"""

async def main():
    config = HPRCConfig(
        llm_client=MockLLMClient(),
        rules={"is_premium": lambda ctx: ctx["customer"]["tier"] == "premium"},
        cache=MemoryCache(),
    )
    html = await hprc.render_template_string(
        template_html=TEMPLATE,
        request={"query": {"product": "WidgetPro"}, "path": {}, "method": "GET"},
        bindings={"customer": {"name": "Ada", "tier": "premium"}},
        config=config,
    )
    print(html)

asyncio.run(main())

HPRC runs summary first (because upsell includes its response), feeds the answer into upsell, and renders both — while the <prompt> blocks themselves never appear in the output.

Entry points

All three are coroutines — drive them from an event loop.

FunctionUse when
render_template(template_path, request, bindings, config)You have a .sprep.html file on disk.
render_template_string(template_html, request, bindings, config)You have the template as a string.
render_string(template, request, bindings, config)You already parsed a TemplateDefinition (e.g. to cache parsing).

request and bindings default to empty; config defaults to a HPRCConfig() (Mock client + MemoryCache).

Template syntax reference

<prompt> — executable, tacit

Defines a prompt to send to the model. Never rendered. Its body is the prompt text, with embedded <fill>, <param> and <include> directives resolved at render time.

all attributeshtml
<prompt
    id="summary"           <!-- required, unique within the template -->
    model="gpt-5"          <!-- passed to the LLM client -->
    condition="is_premium" <!-- named rule; prompt skipped if false -->
    temperature="0.2"      <!-- float, passed through -->
    max_tokens="500"       <!-- int, passed through -->
    async="yes"            <!-- "no" = run sequentially, not concurrently -->
    cache="24h"            <!-- TTL: s/m/h/d/w, int seconds, or "0" to disable -->
    tools="crm_lookup,pricing_engine">  <!-- allowlisted tool names -->

    Customer: <fill>customer.name</fill>
    Product:  <param>product</param>
    Summarize the account.
</prompt>
AttributeDefaultMeaning
id— (required)Links this prompt to its <response> and to <include>s.
modelNoneModel identifier handed to the client.
conditionnoneNamed rule gating execution (see Rules).
temperatureNoneFloat generation parameter.
max_tokensNoneInteger token cap.
asyncnoSequential by default; async="yes" opts this prompt into concurrent execution with other async prompts in its level.
cachenoneTTL spec; absent/empty/0 means no caching.
toolsnoneComma-separated allowlist of registered tool names.

<response> — placeholder, with render

Marks where a prompt's generated text is inserted, bound by id.

htmlhtml
<response id="summary" render="yes"></response>   <!-- visible (default) -->
<response id="summary" render="no"/>             <!-- generated but hidden -->

Hidden but available. A render="no" response is still executed and can be pulled into other prompts via <include response="…"/> — perfect for an intermediate generation you don't want on the page. Truthy values for render: yes/true/1/on; everything else is hidden.

<fill> — data, dot-notation

Resolves a dotted path against the bindings (and the request namespace). Missing paths resolve to an empty string. In rendered HTML the value is HTML-escaped; inside a prompt body it is inserted raw.

html → resolveshtml
<fill>customer.name</fill>            → bindings["customer"]["name"]
<fill>customer.profile.tier</fill>    → nested dict / object attribute
<fill>items.0</fill>                  → list index 0

<param> — request query shortcut

Shorthand for a request query parameter. These two are equivalent:

htmlhtml
<param>product</param>
<fill>request.query.product</fill>

request namespace

HPRC normalizes any request object into a stable shape addressable from templates:

htmlhtml
<fill>request.query.product</fill>       <!-- ?product=... -->
<fill>request.path.customer_id</fill>    <!-- /customer/{customer_id} -->
<fill>request.method</fill>              <!-- "GET" -->

Accepted request inputs: a FastAPI/Starlette Request (uses query_params/path_params/method); a plain dict {"query":…, "path":…, "method":…}; any object exposing those attributes; or None.

<include> — compose prompts

Pulls one prompt's output into another prompt's text. This is how you express dependencies; HPRC detects them and orders execution automatically.

htmlhtml
<include prompt="summary"/>     <!-- inserts summary's CONSTRUCTED PROMPT TEXT -->
<include response="summary"/>   <!-- inserts summary's GENERATED RESPONSE -->

Validated at parse time. An <include> pointing at an undefined prompt id raises a ValueError when the template is parsed — typos fail loudly rather than silently dropping.

Rules — conditional execution

Templates reference rules by name only — there is no expression language. You register the predicates in Python; each receives the bindings and returns a bool.

pythonpython
RULES = {
    "is_premium_customer": lambda ctx: ctx["customer"]["tier"] == "premium",
    "has_balance":         lambda ctx: ctx["account"]["balance"] > 0,
}
config = HPRCConfig(llm_client=MockLLMClient(), rules=RULES)

A rule is a per-prompt gate — it does not decide whether the renderer runs (that always happens); it decides whether one prompt runs. At the start of a render, for a prompt with condition="is_premium_customer" HPRC looks that name up in HPRCConfig.rules, calls it with the bindings dict (the function's ctx argument is your bindings, not the request), and coerces the result to a bool. Truthy → the prompt runs and its <response> is filled. Falsy → the prompt is skipped (no model call) and its <response> renders empty (and any <include response> of it resolves to ""). A blank/absent condition always runs.

Each prompt has at most one condition, evaluated independently — there is no and/or in templates. Put compound logic inside one rule function (lambda ctx: ctx["a"] and ctx["b"]); the rules dict can hold many named rules that different prompts select by name.

A missing/unregistered rule name fails loudly — the render raises RuleError naming the prompt and rule, so a typo in condition= is caught rather than silently skipped. A rule that runs but raises (e.g. the bindings lack a key it reads) is treated as "condition not met" and skips just that prompt, so a data gap degrades gracefully.

Tools — the model can call your functions

Register tools by name; a prompt opts into a subset via its tools attribute. When that prompt runs, HPRC executes a single tool iteration: it calls the model with the tools; if the model asks to call one or more, HPRC runs your registered function(s) with the model's arguments, feeds the result(s) back, and asks the model once more — that response is rendered. If the model is still calling a tool after that iteration, the prompt renders empty. Implemented for OpenAI/Ollama and Anthropic; Gemini forwards schemas but doesn't run the iteration yet. (A multi-step agent loop is on the roadmap.)

pythonpython
def crm_lookup(customer: str) -> str:
    "Look up CRM notes for a customer."      # docstring → tool description
    return f"CRM notes for {customer}"

TOOLS = {"crm_lookup": crm_lookup, "pricing_engine": pricing_engine}
config = HPRCConfig(llm_client=MockLLMClient(), tools=TOOLS)
# template:  <prompt id="s" tools="crm_lookup,pricing_engine"> ... </prompt>

A tool value may be a bare callable (its docstring becomes the description) or a fully built ToolDefinition(name, func, description, parameters).

Dependency graphs

HPRC scans each prompt body for <include> directives, builds a directed graph, and computes ordered execution levels. You never wire ordering by hand.

pythonpython
from hprc import parse, build_graph, topological_levels

tpl = parse('<prompt id="a">A</prompt>'
            '<prompt id="b">uses <include response="a"/></prompt>')
build_graph(tpl.prompts)          # {'a': set(), 'b': {'a'}}
topological_levels(build_graph(tpl.prompts))   # [['a'], ['b']]

A cycle raises DependencyError. Independent prompts land in the same level and run concurrently.

Async execution

Prompts run sequentially by default — one at a time, which is simple and predictable. To run independent prompts together, opt them in with async="yes". Within each dependency level, the renderer runs the async="yes" prompts concurrently with asyncio.gather while the rest run one by one — so a level can be sequential, concurrent, or a mix:

conceptpython
for level in topological_levels(graph):
    concurrent = []
    for pid in level:
        if prompt.is_async:        # async="yes" → run concurrently
            concurrent.append(execute(prompt))
        else:                       # default → awaited one at a time
            await execute(prompt)
    if concurrent:
        await asyncio.gather(*concurrent)

Dependent prompts always wait for the responses they include, regardless of async.

Cache support

Per-prompt caching is opt-in via the cache attribute with a human-friendly TTL.

htmlhtml
<prompt id="summary" cache="24h">...</prompt>   <!-- 30m, 24h, 2d, 1w, "3600" -->
<prompt id="live"    cache="0">...</prompt>     <!-- "0" / absent = no caching -->

The cache key is a SHA-256 over everything that can change the output: the fully-resolved prompt text (which already embeds fills, params and included responses), model, temperature, max_tokens, and the sorted tool names (so tool order doesn't matter). On a hit, the LLM is not called.

python — second render is a cache hitpython
client = MockLLMClient()
cfg = HPRCConfig(llm_client=client, cache=MemoryCache())
tpl = '<prompt id="a" cache="24h">hello</prompt><x><response id="a"/></x>'

await hprc.render_template_string(tpl, config=cfg)
await hprc.render_template_string(tpl, config=cfg)
assert len(client.calls) == 1   # second render served from cache

Backends implement the Cache ABC (async get(key), async set(key, value, ttl)):

  • MemoryCache — in-process TTL cache; accepts an injectable time_func for deterministic expiry in tests; has clear().
  • NullCache — stores nothing; every lookup misses.

The abstraction is deliberately minimal so a Redis-backed cache can be dropped in without touching the renderer.

Providers

Every backend implements one coroutine. The renderer is provider-blind.

the interfacepython
class LLMClient(ABC):
    @abstractmethod
    async def generate(self, prompt, model=None, temperature=None,
                       max_tokens=None, tools=None) -> str: ...

Client = provider, model = sub-selection. The client you put in HPRCConfig fixes the provider; a prompt's model="…" only picks the variant within it. Each provider SDK is imported lazily, so import hprc needs none of them — install only the extras you use.

Shipped providers

ClientBackendExtra
MockLLMClientdeterministic, offline; echoes the request, records .calls; accepts a responder
OpenAIClient(api_key, default_model)official async openai client; tools → OpenAI function-tool schemasopenai
AnthropicClient(api_key, default_model, default_max_tokens)anthropic Messages API (Claude); supplies the required max_tokens; joins text blocksanthropic
GeminiClient(api_key, default_model)google-genai async API; params under config, max_tokensmax_output_tokensgemini
OllamaClient(base_url, default_model)local OpenAI-compatible endpoint (Ollama / LM Studio)
MultiProviderClient({name: client}, default=…)routes by a "provider:model" prefix
pick a providerpython
from hprc import OpenAIClient, AnthropicClient, GeminiClient, OllamaClient

HPRCConfig(llm_client=OpenAIClient(api_key=os.environ["OPENAI_API_KEY"]))
HPRCConfig(llm_client=AnthropicClient(api_key=os.environ["ANTHROPIC_API_KEY"]))
HPRCConfig(llm_client=GeminiClient(api_key=os.environ["GOOGLE_API_KEY"]))
HPRCConfig(llm_client=OllamaClient(base_url="http://localhost:11434/v1",
                                   default_model="llama3"))

Routing by model + portable aliases

To let the model value also choose the provider, use MultiProviderClient (prefix routing) and/or model_aliases (logical names resolved before each call — and folded into the cache key):

pythonpython
from hprc import MultiProviderClient, OpenAIClient, AnthropicClient

config = HPRCConfig(
    llm_client=MultiProviderClient(
        {"openai": OpenAIClient(), "anthropic": AnthropicClient()},
        default="openai",
    ),
    model_aliases={"summarizer": "anthropic:claude-sonnet-4-6"},
)
# template:  <prompt model="summarizer">  -> routed to Anthropic as claude-sonnet-4-6
#            <prompt model="gpt-5">        -> no prefix -> default (OpenAI)

Adding another provider

Subclass LLMClient, implement generate, and lazy-import the SDK inside _get_client — no changes to the renderer or templates. The mocked conformance suite in tests/test_providers.py shows the contract every client must satisfy (request mapping + text extraction); copy a case for your adapter.

HPRCConfig API

One object bundles the four pluggable seams. All fields have sensible defaults.

FieldTypeDefault
llm_clientLLMClientMockLLMClient()
rules{name: predicate(bindings)->bool}{}
tools{name: callable | ToolDefinition}{} (normalized on init)
cacheCacheMemoryCache()
model_aliases{logical: concrete}{}

Bare callables in tools are auto-wrapped into ToolDefinitions; passing cache=None falls back to a fresh MemoryCache; model_aliases maps logical model names to concrete ones (optionally "provider:model" for a MultiProviderClient), resolved before each call.

FastAPI example

The shipped examples/fastapi_app.py shows the full developer workflow. Run it:

shellbash
pip install -e ".[fastapi]"
uvicorn examples.fastapi_app:app --reload
# open http://127.0.0.1:8000/customer/42?product=WidgetPro
the controller (essence)python
@app.get("/customer/{customer_id}", response_class=HTMLResponse)
async def customer_page(customer_id: str, request: Request):
    bindings = {
        "customer": load_customer(customer_id),
        "account":  load_account(customer_id),
    }
    config = HPRCConfig(
        llm_client=MockLLMClient(),                 # or OpenAIClient(...)
        rules={"is_premium_customer":
               lambda ctx: ctx["customer"]["tier"] == "premium"},
        tools={"crm_lookup": crm_lookup, "pricing_engine": pricing_engine},
        cache=MemoryCache(),
    )
    html = await hprc.render_template(
        template_path="examples/templates/customer.sprep.html",
        request=request, bindings=bindings, config=config,
    )
    return HTMLResponse(html)

If OPENAI_API_KEY is set, the example automatically uses OpenAIClient; otherwise it falls back to the offline mock so the demo runs with no key. Note: the developer writes no prompt orchestration — HPRC runs summary then upsell for you.

Standalone (no framework)

shellbash
python examples/standalone.py

Demonstrates the entire flow — bindings, a named rule, two dependent prompts and a hidden response — with the offline mock client.

Running tests

shellbash
pip install -e ".[dev]"
python -m pytest -q
# 66 passed

Coverage spans parsing, fill/param/rule resolution, the dependency graph, async concurrency (including async="no"), cache behaviour (including TTL expiry and cache="0"), hidden responses, tacit prompts, tool registration, and a mocked provider conformance suite (OpenAI / Anthropic / Gemini / Ollama / multi-provider routing / aliases). asyncio_mode = "auto" is set, so async tests need no decorator.

Gotchas & FAQ

Why does my prompt text appear in the output?

The <prompt> block itself is never rendered, but its <response> is — and the MockLLMClient echoes the prompt as its answer. With a real provider (or a scripted responder) you'll see the model's output instead.

Can two <response> elements share an id?

Yes — rendering the same response in two places is supported and simply emits the same text twice. (Duplicate prompt ids, however, are rejected.)

What happens to a skipped prompt's response?

It resolves to an empty string, so its <response> and any <include response> of it render as empty.

Is output escaped?

Values emitted by <fill>/<param> in the document are HTML-escaped. Model responses inserted at <response> are emitted as-is — sanitize upstream if responses may contain untrusted markup.

Does HPRC call my tools?

Yes — when a prompt declares tools and the model asks to call one or more, HPRC runs your registered function(s) with the model's arguments and feeds the result(s) back, then renders the model's next response. It's a single iteration, not an open-ended agent framework. (Gemini's isn't implemented yet; OpenAI/Ollama and Anthropic are. A multi-step agent loop is on the roadmap.)

Status: open source (Apache-2.0), early release v0.1.0 (Alpha). Created by Rajesh Ramani.

HPRC Framework · SPREP Templates (Simple Prompt Response Embedded Pages) · User Guide · see also Architecture & Design.