User Documentation
HPRC User Guide
Everything you need to write templates that embed LLM prompts and render them to HTML — with runnable examples throughout.
What & why
Most “LLM + web” code today is imperative glue: you receive a request, hand-build prompt strings in Python, manually sequence dependent calls, await them, and splice the text into a template. The orchestration lives far from the page it produces.
HPRC inverts that. The template is the source of truth:
- The page author writes prompts inline, next to where their output will appear.
- The application developer supplies only data and policy (a bindings dict, named rules, allowlisted tools, an LLM provider and a cache) via one HPRCConfig.
- HPRC does the orchestration: condition evaluation, fill resolution, dependency ordering, concurrency, caching and serialization.
Mental model. <prompt> blocks are
tacit (they execute but never render). <response>
placeholders are where answers appear. Everything else is ordinary HTML.
Install
Install from PyPI:
pip install hprc-framework # core (pydantic only) — then: import hprc
pip install "hprc-framework[anthropic]" # + Claude provider
pip install "hprc-framework[openai]" # + OpenAI / Ollama provider
pip install "hprc-framework[fastapi]" # + FastAPI/uvicorn for the web example
pip install "hprc-framework[all]" # + every provider SDK
The only hard runtime dependency is pydantic>=2; provider SDKs are
optional and lazily imported. For development, clone the repo and install editable:
git clone https://github.com/HPRCFramework/hprc-framework
cd hprc-framework
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]" && python -m pytest -q
Quick start
A complete, offline example using the deterministic MockLLMClient (no API
key needed).
import asyncio
import hprc
from hprc import HPRCConfig, MockLLMClient, MemoryCache
TEMPLATE = """<!DOCTYPE html>
<html><body>
<h1>Hello <fill>customer.name</fill></h1>
<prompt id="summary" model="gpt-5" condition="is_premium" cache="24h">
Customer <fill>customer.name</fill> is interested in <param>product</param>.
Write a one-line account summary.
</prompt>
<prompt id="upsell" model="gpt-5">
Given: <include response="summary"/>
Suggest one upsell.
</prompt>
<section><h2>Summary</h2><response id="summary"/></section>
<section><h2>Upsell</h2><response id="upsell"/></section>
</body></html>"""
async def main():
    config = HPRCConfig(
        llm_client=MockLLMClient(),
        rules={"is_premium": lambda ctx: ctx["customer"]["tier"] == "premium"},
        cache=MemoryCache(),
    )
    html = await hprc.render_template_string(
        template_html=TEMPLATE,
        request={"query": {"product": "WidgetPro"}, "path": {}, "method": "GET"},
        bindings={"customer": {"name": "Ada", "tier": "premium"}},
        config=config,
    )
    print(html)

asyncio.run(main())
HPRC runs summary first (because upsell includes its
response), feeds the answer into upsell, and renders both — while the
<prompt> blocks themselves never appear in the output.
Entry points
All three are coroutines — drive them from an event loop.
| Function | Use when |
|---|---|
| render_template(template_path, request, bindings, config) | You have a .sprep.html file on disk. |
| render_template_string(template_html, request, bindings, config) | You have the template as a string. |
| render_string(template, request, bindings, config) | You already parsed a TemplateDefinition (e.g. to cache parsing). |
request and bindings default to empty; config
defaults to an HPRCConfig() (MockLLMClient plus MemoryCache).
Template syntax reference
<prompt> — executable, tacit
Defines a prompt to send to the model. Never rendered. Its body is
the prompt text, with embedded <fill>, <param>
and <include> directives resolved at render time.
<prompt
id="summary" <!-- required, unique within the template -->
model="gpt-5" <!-- passed to the LLM client -->
condition="is_premium" <!-- named rule; prompt skipped if false -->
temperature="0.2" <!-- float, passed through -->
max_tokens="500" <!-- int, passed through -->
async="yes" <!-- "no" = run sequentially, not concurrently -->
cache="24h" <!-- TTL: s/m/h/d/w, int seconds, or "0" to disable -->
tools="crm_lookup,pricing_engine"> <!-- allowlisted tool names -->
Customer: <fill>customer.name</fill>
Product: <param>product</param>
Summarize the account.
</prompt>
| Attribute | Default | Meaning |
|---|---|---|
| id | (required) | Links this prompt to its <response> and to <include>s. |
| model | None | Model identifier handed to the client. |
| condition | none | Named rule gating execution (see Rules). |
| temperature | None | Float generation parameter. |
| max_tokens | None | Integer token cap. |
| async | no | Sequential by default; async="yes" opts this prompt into concurrent execution with other async prompts in its level. |
| cache | none | TTL spec; absent/empty/0 means no caching. |
| tools | none | Comma-separated allowlist of registered tool names. |
<response> — placeholder, with render
Marks where a prompt's generated text is inserted, bound by id.
<response id="summary" render="yes"></response> <!-- visible (default) -->
<response id="summary" render="no"/> <!-- generated but hidden -->
Hidden but available. A render="no" response is still
executed and can be pulled into other prompts via
<include response="…"/> — perfect for an intermediate generation
you don't want on the page. Truthy values for render:
yes/true/1/on; everything else is hidden.
<fill> — data, dot-notation
Resolves a dotted path against the bindings (and the request namespace).
Missing paths resolve to an empty string. In rendered HTML the value is
HTML-escaped; inside a prompt body it is inserted raw.
<fill>customer.name</fill> → bindings["customer"]["name"]
<fill>customer.profile.tier</fill> → nested dict / object attribute
<fill>items.0</fill> → list index 0
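The lookup rules above can be sketched in a few lines of plain Python. This is an illustration only, not HPRC's resolver: the function name resolve_path is hypothetical, and the exact precedence between dict keys, list indices and attributes is an assumption.

```python
from html import escape
from typing import Any


def resolve_path(path: str, scope: Any) -> Any:
    """Walk a dotted path through dicts, lists and object attributes.

    Hypothetical helper: returns "" as soon as any segment is missing.
    """
    current = scope
    for part in path.split("."):
        if isinstance(current, dict):
            if part not in current:
                return ""
            current = current[part]
        elif isinstance(current, (list, tuple)):
            # Numeric segments index into sequences: items.0 -> items[0]
            if not part.isdigit() or int(part) >= len(current):
                return ""
            current = current[int(part)]
        elif hasattr(current, part):
            current = getattr(current, part)
        else:
            return ""
    return current


bindings = {
    "customer": {"name": "Ada & Co", "profile": {"tier": "premium"}},
    "items": ["first", "second"],
}

print(resolve_path("customer.profile.tier", bindings))
print(resolve_path("items.0", bindings))
print(repr(resolve_path("customer.missing", bindings)))
# In rendered HTML the value would be escaped; inside a prompt body it would not.
print(escape(str(resolve_path("customer.name", bindings))))
```

Returning an empty string for every kind of miss keeps the template author's experience uniform: a gap in the data produces a blank, never an exception.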
<param> — request query shortcut
Shorthand for a request query parameter. These two are equivalent:
<param>product</param>
<fill>request.query.product</fill>
request namespace
HPRC normalizes any request object into a stable shape addressable from templates:
<fill>request.query.product</fill> <!-- ?product=... -->
<fill>request.path.customer_id</fill> <!-- /customer/{customer_id} -->
<fill>request.method</fill> <!-- "GET" -->
Accepted request inputs: a FastAPI/Starlette Request (uses
query_params/path_params/method); a plain dict
{"query":…, "path":…, "method":…}; any object exposing those attributes;
or None.
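One way to picture the normalization is a small function that reduces each accepted input to the same three keys. The sketch below is an assumption about the shape, not HPRC's code: normalize_request is a hypothetical name, and the empty-string method for a None request is a guess.

```python
from types import SimpleNamespace
from typing import Any


def _first_attr(obj: Any, *names: str) -> Any:
    """Return the first attribute that exists on obj, else None."""
    for name in names:
        if hasattr(obj, name):
            return getattr(obj, name)
    return None


def normalize_request(request: Any) -> dict:
    """Reduce a request-like input to {"query", "path", "method"} (hypothetical helper)."""
    if request is None:
        return {"query": {}, "path": {}, "method": ""}
    if isinstance(request, dict):
        query = request.get("query") or {}
        path = request.get("path") or {}
        method = request.get("method") or ""
    else:
        # Starlette-style attribute names first, then the plain ones.
        query = _first_attr(request, "query_params", "query") or {}
        path = _first_attr(request, "path_params", "path") or {}
        method = _first_attr(request, "method") or ""
    return {"query": dict(query), "path": dict(path), "method": str(method)}


# A stand-in object with the same attribute names a Starlette Request exposes.
fake_request = SimpleNamespace(
    query_params={"product": "WidgetPro"},
    path_params={"customer_id": "42"},
    method="GET",
)
print(normalize_request(fake_request))
print(normalize_request(None))
```

Because every input collapses to the same dict, a template that reads request.query.product behaves identically under FastAPI, in a unit test with a plain dict, or with no request at all.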
<include> — compose prompts
Pulls one prompt's constructed text or generated response into another prompt's body. This is how you express dependencies; HPRC detects them and orders execution automatically.
<include prompt="summary"/> <!-- inserts summary's CONSTRUCTED PROMPT TEXT -->
<include response="summary"/> <!-- inserts summary's GENERATED RESPONSE -->
Validated at parse time. An <include> pointing
at an undefined prompt id raises a ValueError when the template is
parsed — typos fail loudly rather than silently dropping.
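A rough sketch of this kind of parse-time check, using regular expressions over the template text. It is an illustration under simplifying assumptions (double-quoted attributes, self-closing include tags); check_includes and both patterns are hypothetical and HPRC's parser may work quite differently.

```python
import re

# Simplified patterns: they assume double-quoted attributes and <include .../> form.
PROMPT_RE = re.compile(r'<prompt\b[^>]*\bid="([^"]+)"[^>]*>(.*?)</prompt>', re.DOTALL)
INCLUDE_RE = re.compile(r'<include\s+(?:prompt|response)="([^"]+)"\s*/>')


def check_includes(template: str) -> dict:
    """Map each prompt id to the ids it includes; raise ValueError on unknown ids."""
    bodies = dict(PROMPT_RE.findall(template))
    deps = {}
    for prompt_id, body in bodies.items():
        targets = set(INCLUDE_RE.findall(body))
        unknown = targets - set(bodies)
        if unknown:
            raise ValueError(
                f"prompt {prompt_id!r} includes undefined id(s): {sorted(unknown)}"
            )
        deps[prompt_id] = targets
    return deps


good = '<prompt id="a">A</prompt><prompt id="b">uses <include response="a"/></prompt>'
print(check_includes(good))

try:
    check_includes('<prompt id="b">uses <include response="sumary"/></prompt>')
except ValueError as exc:
    print("rejected:", exc)
```

Failing at parse time means the typo surfaces on the first render in development, before any model call is made.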
Rules — conditional execution
Templates reference rules by name only — there is no expression language. You register the predicates in Python; each receives the bindings and returns a bool.
RULES = {
    "is_premium_customer": lambda ctx: ctx["customer"]["tier"] == "premium",
    "has_balance": lambda ctx: ctx["account"]["balance"] > 0,
}
config = HPRCConfig(llm_client=MockLLMClient(), rules=RULES)
A rule is a per-prompt gate — it does not decide whether the
renderer runs (that always happens); it decides whether one prompt
runs. At the start of a render, for a prompt with
condition="is_premium_customer" HPRC looks that name up in
HPRCConfig.rules, calls it with the bindings dict (the
function's ctx argument is your bindings, not the request), and
coerces the result to a bool. Truthy → the prompt runs and its
<response> is filled. Falsy → the prompt is skipped
(no model call) and its <response> renders empty (and any
<include response> of it resolves to ""). A blank/absent
condition always runs.
Each prompt has at most one condition, evaluated independently — there is
no and/or in templates. Put compound logic
inside one rule function (lambda ctx: ctx["a"] and ctx["b"]); the
rules dict can hold many named rules that different prompts select by name.
A missing/unregistered rule name fails loudly — the render raises
RuleError naming the prompt and rule, so a typo in condition=
is caught rather than silently skipped. A rule that runs but raises (e.g. the
bindings lack a key it reads) is treated as "condition not met" and skips just that
prompt, so a data gap degrades gracefully.
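The three outcomes described above (no condition, unknown rule name, rule that raises) can be summarized in one small function. This is a sketch of the described behaviour, not HPRC's source: should_run is a hypothetical name, and the RuleError class here is a local stand-in rather than the one exported by the library.

```python
class RuleError(Exception):
    """Local stand-in for the error the guide describes; not imported from hprc."""


def should_run(prompt_id, condition, rules, bindings):
    """Decide whether one prompt executes (hypothetical helper)."""
    name = (condition or "").strip()
    if not name:
        return True  # blank or absent condition: always run
    if name not in rules:
        # A typo in condition= should fail loudly.
        raise RuleError(f"prompt {prompt_id!r} references unknown rule {name!r}")
    try:
        return bool(rules[name](bindings))
    except Exception:
        # The rule itself blew up (e.g. missing key): treat as "condition not met".
        return False


rules = {"is_premium": lambda ctx: ctx["customer"]["tier"] == "premium"}

print(should_run("summary", "is_premium", rules, {"customer": {"tier": "premium"}}))
print(should_run("summary", "is_premium", rules, {}))  # rule raises KeyError
print(should_run("summary", None, rules, {}))
```

The asymmetry is deliberate in the guide's design: a wrong rule name is an authoring bug and should stop the render, while a gap in the data is a runtime condition and should only skip the affected prompt.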
Tools — the model can call your functions
Register tools by name; a prompt opts into a subset via its tools
attribute. When that prompt runs, HPRC executes a single tool iteration:
it calls the model with the tools; if the model asks to call one or more, HPRC runs your
registered function(s) with the model's arguments, feeds the result(s) back, and asks the
model once more — that response is rendered. If the model is still calling a tool after
that iteration, the prompt renders empty. Implemented for OpenAI/Ollama and Anthropic;
Gemini forwards schemas but doesn't run the iteration yet. (A multi-step agent loop is on
the roadmap.)
def crm_lookup(customer: str) -> str:
    "Look up CRM notes for a customer."  # docstring → tool description
    return f"CRM notes for {customer}"

# pricing_engine is a second tool function defined the same way (not shown here)
TOOLS = {"crm_lookup": crm_lookup, "pricing_engine": pricing_engine}
config = HPRCConfig(llm_client=MockLLMClient(), tools=TOOLS)
# template: <prompt id="s" tools="crm_lookup,pricing_engine"> ... </prompt>
A tool value may be a bare callable (its docstring becomes the description) or a fully
built ToolDefinition(name, func, description, parameters).
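To make "a single tool iteration" concrete, here is a provider-free sketch of the control flow. Everything about the message shape is an assumption made for illustration: the model is a plain callable returning a dict with either "text" or "tool_calls", which is not how any real SDK or HPRC itself represents replies.

```python
def crm_lookup(customer: str) -> str:
    "Look up CRM notes for a customer."
    return f"CRM notes for {customer}"


def run_with_tools(model_call, prompt, tools):
    """One tool iteration: call the model, run requested tools, call once more."""
    first = model_call(prompt, tool_results=None)
    if not first.get("tool_calls"):
        return first.get("text", "")
    results = []
    for call in first["tool_calls"]:
        func = tools[call["name"]]  # only registered tools are reachable
        results.append({"name": call["name"], "result": func(**call["arguments"])})
    second = model_call(prompt, tool_results=results)
    if second.get("tool_calls"):
        return ""  # still asking for tools after the single iteration: render empty
    return second.get("text", "")


def fake_model(prompt, tool_results):
    """Hypothetical model: asks for crm_lookup once, then answers."""
    if tool_results is None:
        return {"tool_calls": [{"name": "crm_lookup", "arguments": {"customer": "Ada"}}]}
    return {"text": "Summary based on: " + tool_results[0]["result"]}


def greedy_model(prompt, tool_results):
    """Hypothetical model that never stops calling tools."""
    return {"tool_calls": [{"name": "crm_lookup", "arguments": {"customer": "Ada"}}]}


print(run_with_tools(fake_model, "Summarize Ada", {"crm_lookup": crm_lookup}))
print(repr(run_with_tools(greedy_model, "Summarize Ada", {"crm_lookup": crm_lookup})))
```

Capping the exchange at one round trip bounds both latency and cost per prompt, which is the trade-off the guide accepts until a multi-step loop is available.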
Dependency graphs
HPRC scans each prompt body for <include> directives, builds a
directed graph, and computes ordered execution levels. You never wire ordering by
hand.
from hprc import parse, build_graph, topological_levels
tpl = parse('<prompt id="a">A</prompt>'
            '<prompt id="b">uses <include response="a"/></prompt>')
build_graph(tpl.prompts) # {'a': set(), 'b': {'a'}}
topological_levels(build_graph(tpl.prompts)) # [['a'], ['b']]
A cycle raises DependencyError. Independent prompts land in the same
level; those marked async="yes" run concurrently, the rest one at a time.
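For readers curious how levels can be derived from such a graph, the following is a minimal level-by-level variant of Kahn's algorithm. It is a sketch, not the library's topological_levels: the function name levels is hypothetical, DependencyError is a local stand-in class, and it assumes every dependency is itself a key in the graph.

```python
class DependencyError(Exception):
    """Local stand-in for the error the guide mentions; not imported from hprc."""


def levels(graph):
    """Group nodes into execution levels; graph maps node -> set of dependencies."""
    remaining = {node: set(deps) for node, deps in graph.items()}
    ordered = []
    while remaining:
        # Everything whose dependencies are already satisfied forms the next level.
        ready = sorted(node for node, deps in remaining.items() if not deps)
        if not ready:
            raise DependencyError(f"dependency cycle among: {sorted(remaining)}")
        ordered.append(ready)
        for node in ready:
            del remaining[node]
        for deps in remaining.values():
            deps.difference_update(ready)
    return ordered


print(levels({"a": set(), "b": {"a"}, "c": set(), "d": {"b", "c"}}))

try:
    levels({"x": {"y"}, "y": {"x"}})
except DependencyError as exc:
    print("rejected:", exc)
```

Sorting each level makes the order deterministic, which helps when comparing renders or writing tests.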
Async execution
Prompts run sequentially by default — one at a time, which is simple
and predictable. To run independent prompts together, opt them in with
async="yes". Within each dependency level, the renderer runs the
async="yes" prompts concurrently with asyncio.gather while the
rest run one by one — so a level can be sequential, concurrent, or a mix:
for level in topological_levels(graph):
    concurrent = []
    for pid in level:
        prompt = prompts[pid]              # look up the prompt for this id
        if prompt.is_async:                # async="yes" → run concurrently
            concurrent.append(execute(prompt))
        else:                              # default → awaited one at a time
            await execute(prompt)
    if concurrent:
        await asyncio.gather(*concurrent)
Dependent prompts always wait for the responses they include, regardless of
async.
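The mixed sequential/concurrent behaviour within one level can be observed with a small runnable experiment. This is a sketch modelled on the loop above, not the renderer itself: prompts are plain dicts, and execute just logs start and end events around a yield to the event loop.

```python
import asyncio


async def run_level(level, execute):
    """Run one dependency level: sequential prompts inline, async ones gathered."""
    pending = []
    for prompt in level:
        if prompt["is_async"]:
            pending.append(execute(prompt))  # coroutine object; starts when gathered
        else:
            await execute(prompt)
    if pending:
        await asyncio.gather(*pending)


async def demo():
    events = []

    async def execute(prompt):
        events.append(f"start {prompt['id']}")
        await asyncio.sleep(0)  # yield control, as a real network call would
        events.append(f"end {prompt['id']}")

    level = [
        {"id": "a", "is_async": True},
        {"id": "b", "is_async": False},
        {"id": "c", "is_async": True},
    ]
    await run_level(level, execute)
    return events


events = asyncio.run(demo())
print(events)
```

In this sketch the sequential prompt b finishes before either async prompt begins, and a and c overlap: c starts before a has ended. Whether HPRC's renderer schedules the two groups in exactly this order is not something the sketch can confirm.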
Cache support
Per-prompt caching is opt-in via the cache attribute with a
human-friendly TTL.
<prompt id="summary" cache="24h">...</prompt> <!-- 30m, 24h, 2d, 1w, "3600" -->
<prompt id="live" cache="0">...</prompt> <!-- "0" / absent = no caching -->
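A TTL spec of this shape is straightforward to parse. The sketch below shows one possible reading of the rules (unit suffixes s/m/h/d/w, bare integers as seconds, empty or zero meaning "no caching"). The name parse_ttl, the case-insensitive matching and the ValueError on malformed input are assumptions, not documented HPRC behaviour.

```python
import re
from typing import Optional

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_ttl(spec: Optional[str]) -> Optional[int]:
    """Return the TTL in seconds, or None when caching is disabled."""
    if spec is None or not spec.strip():
        return None
    match = re.fullmatch(r"(\d+)([smhdw]?)", spec.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised cache TTL: {spec!r}")
    # A missing suffix means the number is already in seconds.
    seconds = int(match.group(1)) * _UNITS.get(match.group(2), 1)
    return seconds or None  # "0" disables caching


for spec in ["30m", "24h", "2d", "1w", "3600", "0", None]:
    print(repr(spec), "->", parse_ttl(spec))
```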
The cache key is a SHA-256 over everything that can change the output: the
fully-resolved prompt text (which already embeds fills, params and included
responses), model, temperature, max_tokens, and
the sorted tool names (so tool order doesn't matter). On a hit, the
LLM is not called.
client = MockLLMClient()
cfg = HPRCConfig(llm_client=client, cache=MemoryCache())
tpl = '<prompt id="a" cache="24h">hello</prompt><x><response id="a"/></x>'
await hprc.render_template_string(tpl, config=cfg)
await hprc.render_template_string(tpl, config=cfg)
assert len(client.calls) == 1 # second render served from cache
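A key with the properties described above can be built by hashing a canonical serialization of the inputs. This is an illustrative sketch: the function name cache_key and the JSON payload layout are assumptions, so keys produced here will not match the ones HPRC computes.

```python
import hashlib
import json


def cache_key(prompt_text, model=None, temperature=None, max_tokens=None, tools=()):
    """SHA-256 over everything that can change the output (hypothetical layout)."""
    payload = json.dumps(
        {
            "prompt": prompt_text,
            "model": model,
            "temperature": temperature,
            "max_tokens": max_tokens,
            "tools": sorted(tools),  # sorted so tool order does not matter
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


same_a = cache_key("hello", model="gpt-5", tools=["pricing_engine", "crm_lookup"])
same_b = cache_key("hello", model="gpt-5", tools=["crm_lookup", "pricing_engine"])
other = cache_key("hello", model="gpt-5", temperature=0.2)

print(same_a == same_b)  # tool order is irrelevant
print(same_a == other)   # different parameters give a different key
```

Hashing the fully resolved text means two customers with different bindings never share a cache entry, while repeated renders for the same customer do.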
Backends implement the Cache ABC (async get(key),
async set(key, value, ttl)):
- MemoryCache: in-process TTL cache; accepts an injectable time_func for deterministic expiry in tests; has clear().
- NullCache: stores nothing; every lookup misses.
The abstraction is deliberately minimal so a Redis-backed cache can be dropped in without touching the renderer.
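As an illustration of how small a backend can be, here is a dict-backed TTL cache with an injectable clock. It is a sketch under assumptions: the Cache class below is a local stand-in mirroring the two methods named above (the real ABC should be imported from hprc), DictTTLCache is a hypothetical name, and returning None on a miss is a guess at the contract.

```python
import asyncio
import time
from abc import ABC, abstractmethod
from typing import Callable, Optional


class Cache(ABC):
    """Local stand-in for the two-method interface the guide describes."""

    @abstractmethod
    async def get(self, key: str) -> Optional[str]: ...

    @abstractmethod
    async def set(self, key: str, value: str, ttl: int) -> None: ...


class DictTTLCache(Cache):
    """Hypothetical backend: values expire ttl seconds after being stored."""

    def __init__(self, time_func: Callable[[], float] = time.monotonic):
        self._now = time_func
        self._items = {}

    async def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._now() >= expires_at:
            del self._items[key]  # drop the expired entry lazily
            return None
        return value

    async def set(self, key, value, ttl):
        self._items[key] = (value, self._now() + ttl)


async def demo():
    clock = [0.0]  # fake clock the test can advance
    cache = DictTTLCache(time_func=lambda: clock[0])
    await cache.set("k", "cached text", ttl=60)
    fresh = await cache.get("k")
    clock[0] = 61.0
    stale = await cache.get("k")
    return fresh, stale


fresh, stale = asyncio.run(demo())
print(fresh, stale)
```

Injecting the clock is what makes expiry testable without sleeping; a Redis-backed variant would implement the same two coroutines and delegate expiry to the server.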
Providers
Every backend implements one coroutine. The renderer is provider-blind.
from abc import ABC, abstractmethod

class LLMClient(ABC):
    @abstractmethod
    async def generate(self, prompt, model=None, temperature=None,
                       max_tokens=None, tools=None) -> str: ...
Client = provider, model = sub-selection. The client you
put in HPRCConfig fixes the provider; a prompt's model="…"
only picks the variant within it. Each provider SDK is imported lazily, so
import hprc needs none of them — install only the extras you use.
Shipped providers
| Client | Backend | Extra |
|---|---|---|
| MockLLMClient | deterministic, offline; echoes the request, records .calls; accepts a responder | none |
| OpenAIClient(api_key, default_model) | official async openai client; tools → OpenAI function-tool schemas | openai |
| AnthropicClient(api_key, default_model, default_max_tokens) | anthropic Messages API (Claude); supplies the required max_tokens; joins text blocks | anthropic |
| GeminiClient(api_key, default_model) | google-genai async API; params under config, max_tokens → max_output_tokens | gemini |
| OllamaClient(base_url, default_model) | local OpenAI-compatible endpoint (Ollama / LM Studio) | none |
| MultiProviderClient({name: client}, default=…) | routes by a "provider:model" prefix | none |
import os
from hprc import OpenAIClient, AnthropicClient, GeminiClient, OllamaClient
HPRCConfig(llm_client=OpenAIClient(api_key=os.environ["OPENAI_API_KEY"]))
HPRCConfig(llm_client=AnthropicClient(api_key=os.environ["ANTHROPIC_API_KEY"]))
HPRCConfig(llm_client=GeminiClient(api_key=os.environ["GOOGLE_API_KEY"]))
HPRCConfig(llm_client=OllamaClient(base_url="http://localhost:11434/v1",
                                   default_model="llama3"))
Routing by model + portable aliases
To let the model value also choose the provider, use
MultiProviderClient (prefix routing) and/or model_aliases
(logical names resolved before each call — and folded into the cache key):
from hprc import MultiProviderClient, OpenAIClient, AnthropicClient
config = HPRCConfig(
    llm_client=MultiProviderClient(
        {"openai": OpenAIClient(), "anthropic": AnthropicClient()},
        default="openai",
    ),
    model_aliases={"summarizer": "anthropic:claude-sonnet-4-6"},
)
# template: <prompt model="summarizer"> -> routed to Anthropic as claude-sonnet-4-6
#           <prompt model="gpt-5">      -> no prefix -> default (OpenAI)
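The two-step resolution (alias first, then prefix routing) can be sketched as a pure function. This illustrates the idea only: resolve_model is a hypothetical name, and the handling of a colon whose prefix is not a registered provider (passed through to the default untouched) is an assumption rather than documented behaviour.

```python
def resolve_model(model, aliases, providers, default):
    """Return (provider_name, concrete_model) for a template's model value."""
    if model is None:
        return default, None
    concrete = aliases.get(model, model)  # logical name -> concrete name
    prefix, sep, rest = concrete.partition(":")
    if sep and prefix in providers:
        return prefix, rest               # "provider:model" routes explicitly
    return default, concrete              # no known prefix: use the default provider


aliases = {"summarizer": "anthropic:claude-sonnet-4-6"}
providers = {"openai", "anthropic"}

print(resolve_model("summarizer", aliases, providers, "openai"))
print(resolve_model("gpt-5", aliases, providers, "openai"))
print(resolve_model("llama3:8b", aliases, providers, "openai"))
```

Resolving the alias before the call, and before the cache key is computed, means that repointing "summarizer" at a different model naturally invalidates old cache entries.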
Adding another provider
Subclass LLMClient, implement generate, and lazy-import the SDK
inside _get_client — no changes to the renderer or templates. The mocked
conformance suite in tests/test_providers.py shows the contract every
client must satisfy (request mapping + text extraction); copy a case for your adapter.
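The shape of such an adapter might look like the sketch below. It is deliberately self-contained and therefore not a real provider: the LLMClient class is a local stand-in copying the signature shown earlier in this guide, TitleCaseClient is a made-up name, and the stdlib string module plays the role of a heavy SDK so the lazy-import pattern can be shown without any third-party dependency.

```python
import asyncio
from abc import ABC, abstractmethod


class LLMClient(ABC):
    """Local stand-in mirroring the generate() signature shown in this guide."""

    @abstractmethod
    async def generate(self, prompt, model=None, temperature=None,
                       max_tokens=None, tools=None) -> str: ...


class TitleCaseClient(LLMClient):
    """Hypothetical adapter: 'generates' by title-casing the prompt."""

    def __init__(self, default_model="titlecase-1"):
        self.default_model = default_model
        self._client = None

    def _get_client(self):
        # Lazy import: the dependency is loaded on first use, not at import time.
        if self._client is None:
            import string  # stands in for a provider SDK
            self._client = string
        return self._client

    async def generate(self, prompt, model=None, temperature=None,
                       max_tokens=None, tools=None) -> str:
        sdk = self._get_client()
        # A real adapter would map model/temperature/max_tokens/tools onto the
        # SDK request here, then extract plain text from the SDK response.
        return sdk.capwords(prompt)


result = asyncio.run(TitleCaseClient().generate("hello world"))
print(result)
```

Keeping the import inside _get_client is what lets a package expose many adapters while requiring only the SDKs a given deployment actually uses.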
HPRCConfig API
One object bundles the four pluggable seams (LLM client, rules, tools, cache) plus optional model aliases. All fields have sensible defaults.
| Field | Type | Default |
|---|---|---|
| llm_client | LLMClient | MockLLMClient() |
| rules | {name: predicate(bindings)->bool} | {} |
| tools | {name: callable or ToolDefinition} | {} (normalized on init) |
| cache | Cache | MemoryCache() |
| model_aliases | {logical: concrete} | {} |
Bare callables in tools are auto-wrapped into ToolDefinitions;
passing cache=None falls back to a fresh MemoryCache;
model_aliases maps logical model names to concrete ones (optionally
"provider:model" for a MultiProviderClient), resolved before
each call.
FastAPI example
The shipped examples/fastapi_app.py shows the full developer workflow.
Run it:
pip install -e ".[fastapi]"
uvicorn examples.fastapi_app:app --reload
# open http://127.0.0.1:8000/customer/42?product=WidgetPro
@app.get("/customer/{customer_id}", response_class=HTMLResponse)
async def customer_page(customer_id: str, request: Request):
    bindings = {
        "customer": load_customer(customer_id),
        "account": load_account(customer_id),
    }
    config = HPRCConfig(
        llm_client=MockLLMClient(),  # or OpenAIClient(...)
        rules={"is_premium_customer":
               lambda ctx: ctx["customer"]["tier"] == "premium"},
        tools={"crm_lookup": crm_lookup, "pricing_engine": pricing_engine},
        cache=MemoryCache(),
    )
    html = await hprc.render_template(
        template_path="examples/templates/customer.sprep.html",
        request=request, bindings=bindings, config=config,
    )
    return HTMLResponse(html)
If OPENAI_API_KEY is set, the example automatically uses
OpenAIClient; otherwise it falls back to the offline mock so the demo runs
with no key. Note: the developer writes no prompt orchestration —
HPRC runs summary then upsell for you.
Standalone (no framework)
python examples/standalone.py
Demonstrates the entire flow — bindings, a named rule, two dependent prompts and a hidden response — with the offline mock client.
Running tests
pip install -e ".[dev]"
python -m pytest -q
# 66 passed
Coverage spans parsing, fill/param/rule resolution, the dependency graph, async
concurrency (including async="no"), cache behaviour (including TTL
expiry and cache="0"), hidden responses, tacit prompts, tool
registration, and a mocked provider conformance suite (OpenAI / Anthropic / Gemini /
Ollama / multi-provider routing / aliases). asyncio_mode = "auto" is set, so async tests need no
decorator.
Gotchas & FAQ
Why does my prompt text appear in the output?
The <prompt> block itself is never rendered, but its
<response> is — and the MockLLMClient echoes the
prompt as its answer. With a real provider (or a scripted
responder) you'll see the model's output instead.
Can two <response> elements share an id?
Yes — rendering the same response in two places is supported and simply emits the same text twice. (Duplicate prompt ids, however, are rejected.)
What happens to a skipped prompt's response?
It resolves to an empty string, so its <response> and any
<include response> of it render as empty.
Is output escaped?
Values emitted by <fill>/<param> in the document
are HTML-escaped. Model responses inserted at <response> are emitted
as-is — sanitize upstream if responses may contain untrusted markup.
Does HPRC call my tools?
Yes. When a prompt declares tools and the model asks to call one or more, HPRC runs your registered function(s) with the model's arguments, feeds the result(s) back, and renders the model's next response. It's a single iteration, not an open-ended agent framework. (The iteration is implemented for OpenAI/Ollama and Anthropic but not yet for Gemini. A multi-step agent loop is on the roadmap.)
Status: open source (Apache-2.0), early release v0.1.0 (Alpha). Created by Rajesh Ramani.