Chat & LLM gateway
Two endpoints, two audiences. Don't confuse them.
| Endpoint | Auth | Used by |
|---|---|---|
| POST /api/chat | JWT | The Juglans web UI's chat panel. |
| POST /api/llm/chat/completions | jg_a_* API key | Your code, juglans-lang's [ai.providers.juglans], anything OpenAI-compatible. |
Agent integrators want the second one. The first is documented here only so you don't waste time wiring it up — it does not accept agent API keys.
POST /api/chat (JWT only — context, not for you)
This is what the Juglans dashboard uses to drive the in-browser chat against an agent. It expects a session JWT, attaches per-conversation history, and runs the agent's configured chat workflow (chat.jg, juglans.toml, system_prompt.jgx) server-side.
If you call it with Authorization: Bearer jg_a_… you get a 401. Skip it. For programmatic chat, use the LLM proxy below.
POST /api/llm/chat/completions
OpenAI-compatible chat completions, with one important twist: the model field is an alias that the gateway resolves against a server-side registry. Juglans uses the alias to figure out which provider to call, which API key to use, and how to price the request. Callers always speak OpenAI format — even when the upstream is Claude or Gemini — and the gateway does the protocol translation transparently.
Aliases
Aliases are caller-facing names you put in the model field. They map 1-to-1 onto (provider, upstream-model, price) tuples in the gateway's registry.
| Alias | Upstream | Notes |
|---|---|---|
| juglans/juglans-test | DeepSeek deepseek-chat (OpenAI-compatible) | Default for [ai.providers.juglans] in juglans-lang scaffolds. |
Additional aliases are platform-managed. Talk to your platform admin for the current list; if you run the admin yourself, manage them at /admin → Models.
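As a rough illustration of the alias-to-tuple mapping described above, the lookup can be sketched as a dictionary keyed by alias. This is a minimal sketch, not the gateway's implementation: `ModelEntry`, `REGISTRY`, `ModelNotFound`, and `resolve_alias` are hypothetical names, and the prices are placeholders rather than real values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """One registry row: which provider to call and how to price it."""
    provider_kind: str            # "openai_compatible", "anthropic", or "gemini"
    upstream_model: str           # the provider's native model name
    input_price_per_mtok: float   # placeholder value, not a real price
    output_price_per_mtok: float  # placeholder value, not a real price


class ModelNotFound(Exception):
    """Corresponds to the documented 400 "model not found" response."""


# Hypothetical in-memory stand-in; the real registry lives server-side.
REGISTRY = {
    "juglans/juglans-test": ModelEntry("openai_compatible", "deepseek-chat", 0.0, 0.0),
}


def resolve_alias(alias: str) -> ModelEntry:
    """Look up an alias and fail loudly on a miss (no default fallback)."""
    try:
        return REGISTRY[alias]
    except KeyError:
        raise ModelNotFound(f"model not found: {alias}") from None
```

The point of the sketch is that the caller only ever supplies the alias; the provider kind, upstream model name, and pricing all come from the registry row.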
Request
POST /api/llm/chat/completions HTTP/1.1
Host: api.juglans.ai
Authorization: Bearer jg_a_3f8a9c1...
Content-Type: application/json
{
"model": "juglans/juglans-test",
"messages": [
{ "role": "system", "content": "You are Nora, a market-making agent." },
{ "role": "user", "content": "What's the spread on BTC-PERP right now?" }
],
"temperature": 0.2,
"stream": false
}
The model field must match an alias from the registry. Unknown aliases return 400 {"error":"model not found: …"} — the platform never silently falls back to a default.
Standard OpenAI fields (temperature, top_p, max_tokens, tools, tool_choice, response_format, stop, stream) are forwarded as-is. The body shape stays OpenAI no matter what's behind the alias.
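If you would rather not pull in an SDK, the same request can be assembled with the Python standard library. The sketch below only builds the request object and does not send it; `build_chat_request` is a hypothetical helper name and the key is a placeholder.

```python
import json
import urllib.request

# URL taken from the request example above.
GATEWAY_URL = "https://api.juglans.ai/api/llm/chat/completions"


def build_chat_request(api_key: str, model: str, messages: list, **options) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-format chat completion request."""
    body = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "jg_a_example",  # placeholder, not a real key
    "juglans/juglans-test",
    [{"role": "user", "content": "Hello"}],
    temperature=0.2,
    stream=False,
)
# Sending would be a single call: urllib.request.urlopen(req)
```

Extra keyword arguments such as temperature or stop land in the body unchanged, which matches the forwarding behaviour described above.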
Multi-provider routing
Behind the alias the gateway can dispatch to any of three provider kinds. The wire-level handshake differs, but you never see it — translation lives inside the gateway.
| Kind | Examples | Translation |
|---|---|---|
| openai_compatible | OpenAI, DeepSeek, Moonshot, Qwen-compat, Zhipu, xAI, Mistral, local vLLM | Body passthrough; only model is rewritten to the upstream's native name. |
| anthropic | Claude 3.5 / 3.7 / 4 | System message extracted to top-level system, stop → stop_sequences, default max_tokens=4096 if omitted, Anthropic SSE events synthesized back into OpenAI chat.completion.chunk frames. |
| gemini | Google Gemini 1.5 / 2.0 / 2.5 | Roles mapped (assistant → model), system → systemInstruction, sampling fields → generationConfig, URL switched to :streamGenerateContent?alt=sse when streaming, response events synthesized back to OpenAI shape. |
Non-text capabilities (tool calls, vision, thinking blocks, prompt caching) are not translated in the current release — keep your requests text-only for now if the upstream isn't already OpenAI-compatible.
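To make the request-side translations in the table concrete, here is a minimal text-only sketch of both mappings. It is an illustration under assumptions, not the gateway's code: `to_anthropic` and `to_gemini` are hypothetical names, message content is assumed to be a plain string, and joining multiple system messages with blank lines is a guess.

```python
def _stop_list(stop):
    """OpenAI accepts a string or a list for stop; upstreams want a list."""
    return [stop] if isinstance(stop, str) else list(stop)


def to_anthropic(body: dict, upstream_model: str) -> dict:
    """Translate an OpenAI-format chat body into an Anthropic Messages body."""
    system_parts = [m["content"] for m in body["messages"] if m["role"] == "system"]
    out = {
        "model": upstream_model,
        "messages": [m for m in body["messages"] if m["role"] != "system"],
        # Anthropic requires max_tokens; 4096 is the documented default.
        "max_tokens": body.get("max_tokens", 4096),
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)  # join strategy is an assumption
    if "stop" in body:
        out["stop_sequences"] = _stop_list(body["stop"])
    for key in ("temperature", "top_p", "stream"):
        if key in body:
            out[key] = body[key]
    return out


def to_gemini(body: dict) -> dict:
    """Translate an OpenAI-format chat body into a Gemini generateContent body."""
    contents, system_parts = [], []
    for m in body["messages"]:
        if m["role"] == "system":
            system_parts.append({"text": m["content"]})
        else:
            role = "model" if m["role"] == "assistant" else "user"
            contents.append({"role": role, "parts": [{"text": m["content"]}]})
    out = {"contents": contents}
    if system_parts:
        out["systemInstruction"] = {"parts": system_parts}
    config = {}
    renames = {"temperature": "temperature", "top_p": "topP", "max_tokens": "maxOutputTokens"}
    for src, dst in renames.items():
        if src in body:
            config[dst] = body[src]
    if "stop" in body:
        config["stopSequences"] = _stop_list(body["stop"])
    if config:
        out["generationConfig"] = config
    return out
```

Note that the Gemini body carries no model field: the upstream model name goes into the request URL, which is why `to_gemini` does not take one.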
Response (non-streaming)
{
"id": "chatcmpl-7a3d4f1c",
"object": "chat.completion",
"created": 1738180800,
"model": "deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "BTC-PERP is showing roughly a 1 bp spread..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 42,
"completion_tokens": 67,
"total_tokens": 109
}
}
The model field in the response depends on the provider:
- openai_compatible: passthrough; whatever the upstream returned (e.g. deepseek-chat).
- anthropic: the model name from the upstream messages envelope (e.g. claude-3-5-sonnet-20241022).
- gemini: the upstream model name configured in the registry (Gemini's native response doesn't carry a model field, so we surface the alias's mapped name).
finish_reason is normalized into the OpenAI vocabulary, but the available values still depend on what the upstream emits:
| Provider | Can emit | Cannot emit |
|---|---|---|
| openai_compatible | passthrough | – |
| anthropic | stop, length, tool_calls | content_filter |
| gemini | stop, length, content_filter | tool_calls |
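One way to picture the normalization is a pair of lookup tables from each provider's native stop reasons to the OpenAI vocabulary. The native values below come from the Anthropic and Gemini APIs, but the exact mapping the gateway uses is not documented here, so treat the tables, the fallback to stop, and the `normalize_finish_reason` name as assumptions.

```python
# Anthropic stop_reason -> OpenAI finish_reason (assumed mapping).
ANTHROPIC_STOP_REASONS = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}

# Gemini finishReason -> OpenAI finish_reason (assumed mapping).
GEMINI_FINISH_REASONS = {
    "STOP": "stop",
    "MAX_TOKENS": "length",
    "SAFETY": "content_filter",
}


def normalize_finish_reason(kind: str, upstream_reason: str) -> str:
    """Map a provider-native stop reason into the OpenAI vocabulary."""
    if kind == "openai_compatible":
        return upstream_reason  # passthrough
    if kind == "anthropic":
        table = ANTHROPIC_STOP_REASONS
    elif kind == "gemini":
        table = GEMINI_FINISH_REASONS
    else:
        raise ValueError(f"unknown provider kind: {kind}")
    # Falling back to "stop" for unlisted reasons is an assumption.
    return table.get(upstream_reason, "stop")
```

The tables also show why the "cannot emit" column exists: neither assumed Anthropic value maps to content_filter, and no assumed Gemini value maps to tool_calls.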
Streaming
Set "stream": true to receive an SSE stream of chat.completion.chunk frames, terminated by data: [DONE]\n\n. The last chunk before [DONE] carries usage.
curl -N https://api.juglans.ai/api/llm/chat/completions \
-H "Authorization: Bearer jg_a_3f8a9c1..." \
-H "Content-Type: application/json" \
-d '{
"model": "juglans/juglans-test",
"messages": [{"role":"user","content":"Hi"}],
"stream": true
}'
data: {"id":"...","choices":[{"delta":{"role":"assistant"}}]}
data: {"id":"...","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"...","choices":[{"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":3,"completion_tokens":1,"total_tokens":4}}
data: [DONE]
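On the client side, consuming this stream amounts to reading data: lines, concatenating the content deltas, and keeping the usage object from the last chunk. The following is a minimal sketch that works on an iterable of already-decoded lines; `collect_stream` is a hypothetical helper, and it assumes each data: payload fits on a single line.

```python
import json


def collect_stream(lines):
    """Fold OpenAI-format SSE lines into (text, finish_reason, usage)."""
    text, finish_reason, usage = [], None, None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text.append(choice.get("delta", {}).get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
        usage = chunk.get("usage") or usage
    return "".join(text), finish_reason, usage
```

Any line iterator works as input, for example the decoded lines of an HTTP response body read incrementally.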
Two gateway-specific notes:
- When stream: true is set on an openai_compatible upstream, the gateway auto-injects stream_options.include_usage = true into the request so the terminal chunk always carries usage. You don't need to set it yourself. (No-op for non-streaming requests and for Anthropic / Gemini, which carry usage natively.)
- For Anthropic and Gemini upstreams the OpenAI-format SSE you see is synthesized from the native event stream (Anthropic message_start / content_block_delta / message_delta / message_stop; Gemini incremental candidates[0].content.parts[0].text chunks). Usage arrives on the terminal chunk regardless of provider.
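The two notes above can be sketched roughly as follows. This is an illustrative approximation rather than the gateway's code: `inject_include_usage`, `anthropic_to_openai_chunks`, and the chunk id are hypothetical, the events are assumed to be already-parsed dictionaries, and only text deltas are handled.

```python
import copy

# Assumed mapping from Anthropic stop_reason to OpenAI finish_reason.
_STOP = {"end_turn": "stop", "stop_sequence": "stop",
         "max_tokens": "length", "tool_use": "tool_calls"}


def inject_include_usage(body: dict, kind: str) -> dict:
    """Force stream_options.include_usage on for streaming OpenAI-style upstreams."""
    if kind != "openai_compatible" or not body.get("stream"):
        return body  # no-op for non-streaming requests and other provider kinds
    out = copy.deepcopy(body)  # leave the caller's request untouched
    out.setdefault("stream_options", {})["include_usage"] = True
    return out


def _chunk(chunk_id, delta, finish_reason=None):
    choice = {"index": 0, "delta": delta, "finish_reason": finish_reason}
    return {"id": chunk_id, "object": "chat.completion.chunk", "choices": [choice]}


def anthropic_to_openai_chunks(events, chunk_id="chatcmpl-sketch"):
    """Yield OpenAI-style chunks from parsed Anthropic stream events."""
    prompt_tokens = completion_tokens = 0
    finish_reason = None
    for ev in events:
        kind = ev["type"]
        if kind == "message_start":
            prompt_tokens = ev["message"]["usage"]["input_tokens"]
            yield _chunk(chunk_id, {"role": "assistant"})
        elif kind == "content_block_delta" and ev["delta"].get("type") == "text_delta":
            yield _chunk(chunk_id, {"content": ev["delta"]["text"]})
        elif kind == "message_delta":
            finish_reason = _STOP.get(ev["delta"].get("stop_reason"), "stop")
            completion_tokens = ev["usage"]["output_tokens"]
        elif kind == "message_stop":
            final = _chunk(chunk_id, {}, finish_reason)
            final["usage"] = {
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": prompt_tokens + completion_tokens,
            }
            yield final  # terminal chunk carries usage, as in the sample stream
```

Holding the finish reason and token counts until message_stop is what lets the terminal chunk carry both finish_reason and usage, matching the frame shape shown in the sample output.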
Usage tracking
Every request, successful or failed, writes a row to llm_usage_events:
agent_id, project_id, model_id, key_id,
prompt_tokens, completion_tokens, cost_usd,
streaming, upstream_status, duration_ms, created_at
cost_usd is computed at write time from the model's current input_price_per_mtok / output_price_per_mtok, so historical accounting stays stable when prices later change. There's no per-call cost field in the response yet; admins roll the events up daily / per-model from /admin → Usage. (The schema also has a request_id column reserved for upstream-supplied request ids; today's gateway leaves it NULL.)
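The write-time cost calculation can be sketched as below. It assumes that "mtok" means one million tokens and that prices are in USD; the `compute_cost_usd` name and the prices in the usage line are made up for illustration.

```python
from decimal import Decimal

MTOK = Decimal(1_000_000)  # assumption: "mtok" = one million tokens


def compute_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_mtok: Decimal,
    output_price_per_mtok: Decimal,
) -> Decimal:
    """Price a single request using the per-million-token rates in force right now."""
    return (
        Decimal(prompt_tokens) * input_price_per_mtok
        + Decimal(completion_tokens) * output_price_per_mtok
    ) / MTOK


# Token counts from the sample response above; the prices are hypothetical.
cost = compute_cost_usd(42, 67, Decimal("0.27"), Decimal("1.10"))
```

Because the result is stored on the event row rather than recomputed on read, a later price change affects only new rows.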
Errors
| Status | Body | Cause |
|---|---|---|
| 400 | {"error":"invalid JSON body: …"} | Body isn't valid JSON. |
| 400 | {"error":"missing or non-string `model` field"} | Request is missing model or it isn't a string. |
| 400 | {"error":"model not found: …"} | The alias isn't in the registry (or it's disabled, or its provider is disabled). |
| 401 | – | Bad / missing agent API key. |
| 403 | – | Agent frozen. |
| 503 | {"error":"no active upstream key for this provider"} | Provider exists but has no active key in the pool. |
| 503 | {"error":"provider kind not yet supported: …"} | Provider's kind is recognized in the DB but not wired in the router. (Shouldn't happen in current builds.) |
| 502 | {"error":"…"} | Upstream LLM unreachable or returned a transport error. |
| upstream status | depends on provider | For anthropic / gemini upstreams, provider errors are translated into OpenAI's {"error":{"message","type"}} envelope with the upstream HTTP status preserved. For openai_compatible upstreams, the body is passed through verbatim: DeepSeek, OpenAI, etc. already speak the OpenAI error shape, so the gateway doesn't re-wrap it. |
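Since the table shows two body shapes (a flat {"error":"…"} string from the gateway itself and the nested OpenAI envelope for upstream errors), client code may want a single helper that copes with both. A minimal sketch, with `error_message` as a hypothetical name:

```python
import json


def error_message(raw_body: str) -> str:
    """Extract a readable message from either documented error shape."""
    try:
        payload = json.loads(raw_body)
    except (json.JSONDecodeError, TypeError):
        return raw_body or ""  # empty or non-JSON body (e.g. 401 / 403)
    error = payload.get("error") if isinstance(payload, dict) else None
    if isinstance(error, str):
        return error  # gateway shape: {"error": "..."}
    if isinstance(error, dict):
        return str(error.get("message", ""))  # OpenAI envelope
    return raw_body  # valid JSON, but not a recognized error shape
```

Falling back to the raw body keeps unexpected responses visible in logs instead of silently swallowing them.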
Wiring it into juglans-lang
# juglans.toml
[ai.providers.juglans]
api_key = "jg_a_3f8a9c1..."
api_base = "https://api.juglans.ai/api/llm"
model = "juglans/juglans-test"
let answer = chat([
msg("system", "You are a helpful agent."),
msg("user", "Summarize today's positions."),
])
print(answer)
The library sends requests to <api_base>/chat/completions and attaches the bearer header on every call.
Wiring it into the OpenAI SDK
The gateway is wire-compatible with the OpenAI SDK — pass the alias as model.
from openai import OpenAI
client = OpenAI(
api_key="jg_a_3f8a9c1...",
base_url="https://api.juglans.ai/api/llm",
)
resp = client.chat.completions.create(
model="juglans/juglans-test", # any alias from the registry
messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
import OpenAI from "openai";
const client = new OpenAI({
apiKey: "jg_a_3f8a9c1...",
baseURL: "https://api.juglans.ai/api/llm",
});
const resp = await client.chat.completions.create({
model: "juglans/juglans-test",
messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);
Switching providers is a model-string change — same client, same key. If your admin registers a Claude alias as e.g. juglans/sonnet-fast, swap the model field and you get Anthropic behind the same OpenAI SDK call.