Chat & LLM gateway

Two endpoints, two audiences. Don't confuse them.

Endpoint	Auth	Used by
`POST /api/chat`	JWT	The Juglans web UI's chat panel.
`POST /api/llm/chat/completions`	`jg_a_*` API key	Your code, juglans-lang's `[ai.providers.juglans]`, anything OpenAI-compatible.

Agent integrators want the second one. The first is documented here only so you don't waste time wiring it up — it does not accept agent API keys.

`POST /api/chat` (JWT only — context, not for you)

This is what the Juglans dashboard uses to drive the in-browser chat against an agent. It expects a session JWT, attaches per-conversation history, and runs the agent's configured chat workflow (chat.jg, juglans.toml, system_prompt.jgx) server-side.

If you call it with Authorization: Bearer jg_a_… you get a 401. Skip it. For programmatic chat, use the LLM proxy below.

`POST /api/llm/chat/completions`

OpenAI-compatible chat completions, with one important twist: the model field is an alias that the gateway resolves against a server-side registry. Juglans uses the alias to figure out which provider to call, which API key to use, and how to price the request. Callers always speak OpenAI format — even when the upstream is Claude or Gemini — and the gateway does the protocol translation transparently.

Aliases

Aliases are caller-facing names you put in the model field. They map 1-to-1 onto (provider, upstream-model, price) tuples in the gateway's registry.

Alias	Upstream	Notes
`juglans/juglans-test`	DeepSeek `deepseek-chat` (OpenAI-compatible)	Default for `[ai.providers.juglans]` in juglans-lang scaffolds.

Additional aliases are platform-managed. Talk to your platform admin for the current list; if you run the admin yourself, manage them at /admin → Models.
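One way to picture the registry is as a lookup table keyed by alias. The sketch below is illustrative only: `ModelEntry`, `REGISTRY`, `resolve_alias`, and the prices are hypothetical names and placeholder values, not the gateway's actual schema. It shows the two properties described here: each alias resolves to exactly one (provider, upstream model, price) entry, and an unknown alias is an error rather than a fallback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    provider_kind: str            # "openai_compatible", "anthropic", or "gemini"
    upstream_model: str           # native model name sent to the provider
    input_price_per_mtok: float   # USD per million prompt tokens
    output_price_per_mtok: float  # USD per million completion tokens


# Hypothetical in-memory registry; the prices are made-up placeholders.
REGISTRY = {
    "juglans/juglans-test": ModelEntry("openai_compatible", "deepseek-chat", 1.0, 2.0),
}


class ModelNotFound(Exception):
    """Corresponds to the 400 'model not found' response."""


def resolve_alias(alias: str) -> ModelEntry:
    """Look up an alias; unknown aliases raise instead of falling back."""
    try:
        return REGISTRY[alias]
    except KeyError:
        raise ModelNotFound(f"model not found: {alias}") from None


entry = resolve_alias("juglans/juglans-test")
print(entry.provider_kind, entry.upstream_model)
```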

Request

POST /api/llm/chat/completions HTTP/1.1
Host: api.juglans.ai
Authorization: Bearer jg_a_3f8a9c1...
Content-Type: application/json

{
  "model": "juglans/juglans-test",
  "messages": [
    { "role": "system", "content": "You are Nora, a market-making agent." },
    { "role": "user", "content": "What's the spread on BTC-PERP right now?" }
  ],
  "temperature": 0.2,
  "stream": false
}

The model field must match an alias from the registry. Unknown aliases return 400 {"error":"model not found: …"} — the platform never silently falls back to a default.

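Standard OpenAI fields (temperature, top_p, max_tokens, tools, tool_choice, response_format, stop, stream) are accepted in OpenAI shape no matter what's behind the alias. For openai_compatible upstreams they are forwarded as-is; for anthropic and gemini upstreams only the text-oriented fields are translated in the current release (see Multi-provider routing).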
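If you are not using an SDK, the request can be assembled with the standard library alone. This is a minimal sketch: `build_chat_request` is a hypothetical helper name, and the key is the same placeholder used in the example request. It builds the request without sending it, so you can inspect the headers and body first.

```python
import json
import urllib.request

GATEWAY_URL = "https://api.juglans.ai/api/llm/chat/completions"


def build_chat_request(api_key, model, messages, **options):
    """Return an unsent urllib Request for the chat completions endpoint."""
    if not isinstance(model, str) or not model:
        raise ValueError("model must be a non-empty alias string")
    # Extra keyword arguments (temperature, max_tokens, stop, ...) go into
    # the body unchanged, using their OpenAI field names.
    body = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "jg_a_3f8a9c1...",  # placeholder key
    "juglans/juglans-test",
    [{"role": "user", "content": "Hello"}],
    temperature=0.2,
    max_tokens=256,
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` performs the call; the response body is the JSON document described under Response.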

Multi-provider routing

Behind the alias the gateway can dispatch to any of three provider kinds. The wire-level handshake differs, but you never see it — translation lives inside the gateway.

Kind	Examples	Translation
`openai_compatible`	OpenAI, DeepSeek, Moonshot, Qwen-compat, Zhipu, xAI, Mistral, local vLLM	Body passthrough; only `model` is rewritten to the upstream's native name.
`anthropic`	Claude 3.5 / 3.7 / 4	System message extracted to top-level `system`, `stop` → `stop_sequences`, default `max_tokens=4096` if omitted, Anthropic SSE events synthesized back into OpenAI `chat.completion.chunk` frames.
`gemini`	Google Gemini 1.5 / 2.0 / 2.5	Roles mapped (`assistant` → `model`), system → `systemInstruction`, sampling fields → `generationConfig`, URL switched to `:streamGenerateContent?alt=sse` when streaming, response events synthesized back to OpenAI shape.

Non-text capabilities (tool calls, vision, thinking blocks, prompt caching) are not translated in the current release — keep your requests text-only for now if the upstream isn't already OpenAI-compatible.
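To make the translation rows concrete, here is a rough sketch of the text-only mapping for the `anthropic` and `gemini` kinds. It is not the gateway's code: the function names are hypothetical, joining multiple system messages with newlines is an assumption, and message content is assumed to be a plain string.

```python
def split_system(messages):
    """Separate system text from the remaining conversation turns."""
    system = "\n".join(m["content"] for m in messages if m["role"] == "system")
    turns = [m for m in messages if m["role"] != "system"]
    return system, turns


def as_list(stop):
    """OpenAI allows `stop` to be a string or a list; both upstreams take a list."""
    return [stop] if isinstance(stop, str) else list(stop)


def to_anthropic(body, upstream_model):
    """OpenAI chat body -> Anthropic Messages body (text only)."""
    system, turns = split_system(body["messages"])
    out = {
        "model": upstream_model,
        "messages": turns,
        "max_tokens": body.get("max_tokens", 4096),  # default when omitted
    }
    if system:
        out["system"] = system
    if body.get("stop") is not None:
        out["stop_sequences"] = as_list(body["stop"])
    for key in ("temperature", "top_p", "stream"):
        if key in body:
            out[key] = body[key]
    return out


def to_gemini(body):
    """OpenAI chat body -> Gemini generateContent body (text only)."""
    system, turns = split_system(body["messages"])
    out = {
        "contents": [
            {
                "role": "model" if m["role"] == "assistant" else "user",
                "parts": [{"text": m["content"]}],
            }
            for m in turns
        ]
    }
    if system:
        out["systemInstruction"] = {"parts": [{"text": system}]}
    config = {}
    renames = {"temperature": "temperature", "top_p": "topP",
               "max_tokens": "maxOutputTokens"}
    for src, dst in renames.items():
        if src in body:
            config[dst] = body[src]
    if body.get("stop") is not None:
        config["stopSequences"] = as_list(body["stop"])
    if config:
        out["generationConfig"] = config
    return out


request_body = {
    "model": "juglans/sonnet-fast",
    "messages": [
        {"role": "system", "content": "You are a helpful agent."},
        {"role": "user", "content": "Hello"},
    ],
    "stop": "END",
}
print(to_anthropic(request_body, "claude-3-5-sonnet-20241022"))
print(to_gemini(request_body))
```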

Response (non-streaming)

{
  "id": "chatcmpl-7a3d4f1c",
  "object": "chat.completion",
  "created": 1738180800,
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "BTC-PERP is showing roughly a 1 bp spread..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 67,
    "total_tokens": 109
  }
}

The model field in the response depends on the provider:

openai_compatible — passthrough; whatever the upstream returned (e.g. deepseek-chat).
anthropic — the model name from the upstream messages envelope (e.g. claude-3-5-sonnet-20241022).
gemini — the upstream model name configured in the registry (Gemini's native response doesn't carry a model field, so we surface the alias's mapped name).

finish_reason is normalized into the OpenAI vocabulary, but the available values still depend on what the upstream emits:

Provider	Can emit	Cannot emit
`openai_compatible`	passthrough	—
`anthropic`	`stop`, `length`, `tool_calls`	`content_filter`
`gemini`	`stop`, `length`, `content_filter`	`tool_calls`
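The normalization can be pictured as a small lookup per provider kind. The upstream values below (`end_turn`, `tool_use`, `MAX_TOKENS`, `SAFETY`, and so on) are the reason codes the Anthropic and Gemini APIs are known to return, but the exact table the gateway uses is an assumption, as is the fallback to `stop` for unrecognized values.

```python
# Assumed mapping tables; the gateway's real tables may cover more values.
FINISH_REASON_TABLES = {
    "anthropic": {
        "end_turn": "stop",
        "stop_sequence": "stop",
        "max_tokens": "length",
        "tool_use": "tool_calls",
    },
    "gemini": {
        "STOP": "stop",
        "MAX_TOKENS": "length",
        "SAFETY": "content_filter",
    },
}


def normalize_finish_reason(kind, upstream_reason):
    """Map an upstream stop reason into the OpenAI vocabulary."""
    if kind == "openai_compatible":
        return upstream_reason  # passthrough
    if kind not in FINISH_REASON_TABLES:
        raise ValueError(f"unknown provider kind: {kind}")
    # Falling back to "stop" for unrecognized values is an assumption.
    return FINISH_REASON_TABLES[kind].get(upstream_reason, "stop")


print(normalize_finish_reason("anthropic", "tool_use"))  # tool_calls
print(normalize_finish_reason("gemini", "SAFETY"))       # content_filter
```

Note that neither assumed table can produce the value listed under "Cannot emit" for its provider, which matches the table above.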

Streaming

Pass "stream": true. You get an SSE stream of chat.completion.chunk frames, terminated by data: [DONE]\n\n. The last chunk before [DONE] carries usage.

curl -N https://api.juglans.ai/api/llm/chat/completions \
  -H "Authorization: Bearer jg_a_3f8a9c1..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "juglans/juglans-test",
    "messages": [{"role":"user","content":"Hi"}],
    "stream": true
  }'

data: {"id":"...","choices":[{"delta":{"role":"assistant"}}]}

data: {"id":"...","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"...","choices":[{"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":3,"completion_tokens":1,"total_tokens":4}}

data: [DONE]
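On the client side, consuming this stream comes down to reading `data:` lines until `[DONE]`. The sketch below assumes you already have the response as an iterable of decoded text lines; `collect_stream` is a hypothetical helper, not part of any SDK. It folds the deltas into the full text and keeps the finish reason and usage from whichever chunk carries them.

```python
import json


def collect_stream(lines):
    """Fold OpenAI-style SSE lines into (text, finish_reason, usage)."""
    text, finish_reason, usage = [], None, None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text.append(choice.get("delta", {}).get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
        usage = chunk.get("usage") or usage
    return "".join(text), finish_reason, usage


sample = [
    'data: {"id":"...","choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"id":"...","choices":[{"delta":{"content":"Hello"}}]}',
    "",
    'data: {"id":"...","choices":[{"delta":{},"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":3,"completion_tokens":1,"total_tokens":4}}',
    "",
    "data: [DONE]",
]
text, finish_reason, usage = collect_stream(sample)
print(text, finish_reason, usage["total_tokens"])  # Hello stop 4
```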

Two gateway-specific notes:

When stream: true is set on an openai_compatible upstream, the gateway auto-injects stream_options.include_usage = true into the request so the terminal chunk always carries usage. You don't need to set it yourself. (No-op for non-streaming requests and for Anthropic / Gemini, which carry usage natively.)
For Anthropic and Gemini upstreams the OpenAI-format SSE you see is synthesized from the native event stream (Anthropic message_start / content_block_delta / message_delta / message_stop; Gemini incremental candidates[0].content.parts[0].text chunks). Usage arrives on the terminal chunk regardless of provider.
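For intuition, the Anthropic side of that synthesis might look roughly like the generator below. It is a sketch, not the gateway's implementation: `synthesize_chunks` and the chunk id are hypothetical, the event payloads follow Anthropic's published streaming shapes as best understood here, and only text deltas are handled.

```python
# Assumed mapping from Anthropic stop reasons to OpenAI finish reasons.
STOP_REASONS = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def synthesize_chunks(events, chunk_id="chatcmpl-sketch"):
    """Yield OpenAI-style chunk dicts from (event_name, payload) pairs."""
    usage = {"prompt_tokens": 0, "completion_tokens": 0}
    finish_reason = None

    def chunk(delta, finish=None):
        return {
            "id": chunk_id,
            "object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": delta, "finish_reason": finish}],
        }

    for name, data in events:
        if name == "message_start":
            usage["prompt_tokens"] = data["message"]["usage"]["input_tokens"]
            yield chunk({"role": "assistant"})
        elif name == "content_block_delta":
            if data["delta"].get("type") == "text_delta":
                yield chunk({"content": data["delta"]["text"]})
        elif name == "message_delta":
            finish_reason = STOP_REASONS.get(data["delta"].get("stop_reason"), "stop")
            usage["completion_tokens"] = data["usage"]["output_tokens"]
        elif name == "message_stop":
            # The terminal chunk carries the finish reason and the usage totals.
            final = chunk({}, finish_reason)
            total = usage["prompt_tokens"] + usage["completion_tokens"]
            final["usage"] = {**usage, "total_tokens": total}
            yield final


events = [
    ("message_start", {"message": {"usage": {"input_tokens": 3}}}),
    ("content_block_delta", {"delta": {"type": "text_delta", "text": "Hello"}}),
    ("message_delta", {"delta": {"stop_reason": "end_turn"},
                       "usage": {"output_tokens": 1}}),
    ("message_stop", {}),
]
for c in synthesize_chunks(events):
    print(c)
```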

Usage tracking

Every request, successful or failed, writes a row to llm_usage_events:

agent_id, project_id, model_id, key_id,
prompt_tokens, completion_tokens, cost_usd,
streaming, upstream_status, duration_ms, created_at

cost_usd is computed at write time from the model's current input_price_per_mtok / output_price_per_mtok, so historical accounting stays stable when prices later change. There's no per-call cost field in the response yet; admins roll the events up daily / per-model from /admin → Usage. (The schema also has a request_id column reserved for upstream-supplied request ids; today's gateway leaves it NULL.)
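The per-row cost calculation is presumably a straightforward per-million-token product. The sketch below assumes that formula (it is inferred from the `_per_mtok` field names, not confirmed), and the prices in the example are placeholders. `Decimal` is used so that small amounts do not pick up floating-point noise.

```python
from decimal import Decimal

MTOK = Decimal(1_000_000)


def cost_usd(prompt_tokens, completion_tokens,
             input_price_per_mtok, output_price_per_mtok):
    """Assumed formula: tokens / 1e6 * price, summed over input and output."""
    input_cost = Decimal(prompt_tokens) * Decimal(input_price_per_mtok)
    output_cost = Decimal(completion_tokens) * Decimal(output_price_per_mtok)
    return (input_cost + output_cost) / MTOK


# Token counts from the example response; placeholder prices passed as
# strings so they convert to Decimal exactly.
print(cost_usd(42, 67, "1.00", "2.00"))
```

Because the result is stored on the row at write time, a later change to either price affects only new rows.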

Errors

Status	Body	Cause
`400`	`{"error":"invalid JSON body: …"}`	Body isn't valid JSON.
`400`	{"error":"missing or non-string `model` field"}	Request is missing `model` or it isn't a string.
`400`	`{"error":"model not found: …"}`	The alias isn't in the registry (or it's disabled, or its provider is disabled).
`401`	—	Bad / missing agent API key.
`403`	—	Agent frozen.
`503`	`{"error":"no active upstream key for this provider"}`	Provider exists but has no `active` key in the pool.
`503`	`{"error":"provider kind not yet supported: …"}`	Provider's `kind` is recognized in the DB but not wired in the router. (Shouldn't happen in current builds.)
`502`	`{"error":"…"}`	Upstream LLM unreachable or returned a transport error.
upstream status	depends on provider	For `anthropic` / `gemini` upstreams, provider errors are translated into OpenAI's `{"error":{"message","type"}}` envelope with the upstream HTTP status preserved. For `openai_compatible` upstreams, the body is passed through verbatim — DeepSeek / OpenAI / etc. already speak the OpenAI error shape, so the gateway doesn't re-wrap it.
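Since error bodies arrive in three shapes (a flat `{"error": "..."}` string from the gateway, the nested OpenAI envelope for upstream errors, and no body at all for 401 / 403), client code benefits from one helper that handles all of them. `error_message` below is a hypothetical helper sketched for that purpose.

```python
import json


def error_message(status, raw_body):
    """Return a readable message for a non-2xx gateway response."""
    try:
        err = json.loads(raw_body).get("error") if raw_body else None
    except (ValueError, AttributeError):
        err = None  # not JSON, or JSON that is not an object
    if isinstance(err, dict):
        # OpenAI envelope: {"error": {"message": ..., "type": ...}}
        return f'{status}: {err.get("message", "unknown error")}'
    if isinstance(err, str):
        # Gateway-originated error: {"error": "..."}
        return f"{status}: {err}"
    return f"{status}: (no error body)"


print(error_message(400, '{"error":"model not found: juglans/typo"}'))
print(error_message(429, '{"error":{"message":"rate limited","type":"rate_limit_error"}}'))
print(error_message(401, ""))
```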

Wiring it into juglans-lang

# juglans.toml
[ai.providers.juglans]
api_key  = "jg_a_3f8a9c1..."
api_base = "https://api.juglans.ai/api/llm"
model    = "juglans/juglans-test"

let answer = chat([
  msg("system", "You are a helpful agent."),
  msg("user", "Summarize today's positions."),
])
print(answer)

The library sends requests to <api_base>/chat/completions and attaches the key as a bearer token on every call.

Wiring it into the OpenAI SDK

The gateway is wire-compatible with the OpenAI SDK — pass the alias as model.

from openai import OpenAI

client = OpenAI(
    api_key="jg_a_3f8a9c1...",
    base_url="https://api.juglans.ai/api/llm",
)

resp = client.chat.completions.create(
    model="juglans/juglans-test",   # any alias from the registry
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "jg_a_3f8a9c1...",
  baseURL: "https://api.juglans.ai/api/llm",
});

const resp = await client.chat.completions.create({
  model: "juglans/juglans-test",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);

Switching providers is a model-string change — same client, same key. If your admin registers a Claude alias as e.g. juglans/sonnet-fast, swap the model field and you get Anthropic behind the same OpenAI SDK call.