Chat & LLM gateway

Two endpoints, two audiences. Don't confuse them.

Endpoint | Auth | Used by
POST /api/chat | JWT | The Juglans web UI's chat panel.
POST /api/llm/chat/completions | jg_a_* API key | Your code, juglans-lang's [ai.providers.juglans], anything OpenAI-compatible.

Agent integrators want the second one. The first is documented here only so you don't waste time wiring it up — it does not accept agent API keys.

POST /api/chat (JWT only — context, not for you)

This is what the Juglans dashboard uses to drive the in-browser chat against an agent. It expects a session JWT, attaches per-conversation history, and runs the agent's configured chat workflow (chat.jg, juglans.toml, system_prompt.jgx) server-side.

If you call it with Authorization: Bearer jg_a_… you get a 401. Skip it. For programmatic chat, use the LLM proxy below.


POST /api/llm/chat/completions

OpenAI-compatible chat completions, with one important twist: the model field is an alias that the gateway resolves against a server-side registry. Juglans uses the alias to figure out which provider to call, which API key to use, and how to price the request. Callers always speak OpenAI format — even when the upstream is Claude or Gemini — and the gateway does the protocol translation transparently.

Aliases

Aliases are caller-facing names you put in the model field. They map 1-to-1 onto (provider, upstream-model, price) tuples in the gateway's registry.

Alias | Upstream | Notes
juglans/juglans-test | DeepSeek deepseek-chat (OpenAI-compatible) | Default for [ai.providers.juglans] in juglans-lang scaffolds.

Additional aliases are platform-managed. Talk to your platform admin for the current list; if you run the admin yourself, manage them at /admin → Models.
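The sketch below is a toy model of the alias registry. It is meant for reasoning about lookups and is not the gateway's schema. The field names provider_kind, enabled and provider_enabled are hypothetical, and so are the prices. input_price_per_mtok and output_price_per_mtok reuse the names from the usage-tracking section. A lookup fails rather than falling back to a default, and a disabled alias or provider counts as missing. That matches the 400 "model not found" behavior this endpoint documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical registry row: one alias -> (provider, upstream model, price)."""
    provider: str
    provider_kind: str          # "openai_compatible" | "anthropic" | "gemini"
    upstream_model: str         # native name sent to the upstream API
    input_price_per_mtok: str   # USD per million prompt tokens (placeholder)
    output_price_per_mtok: str  # USD per million completion tokens (placeholder)
    enabled: bool = True
    provider_enabled: bool = True


class ModelNotFound(Exception):
    """Mirrors the gateway's 400 {"error": "model not found: ..."}."""


REGISTRY = {
    "juglans/juglans-test": ModelEntry(
        provider="deepseek",
        provider_kind="openai_compatible",
        upstream_model="deepseek-chat",
        input_price_per_mtok="1.00",   # made-up price
        output_price_per_mtok="2.00",  # made-up price
    ),
}


def resolve(alias: str) -> ModelEntry:
    """Look up an alias; never substitute a default model."""
    entry = REGISTRY.get(alias)
    if entry is None or not entry.enabled or not entry.provider_enabled:
        raise ModelNotFound(f"model not found: {alias}")
    return entry
```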

Request

POST /api/llm/chat/completions HTTP/1.1
Host: api.juglans.ai
Authorization: Bearer jg_a_3f8a9c1...
Content-Type: application/json

{
  "model": "juglans/juglans-test",
  "messages": [
    { "role": "system", "content": "You are Nora, a market-making agent." },
    { "role": "user", "content": "What's the spread on BTC-PERP right now?" }
  ],
  "temperature": 0.2,
  "stream": false
}

The model field must match an alias from the registry. Unknown aliases return 400 {"error":"model not found: …"} — the platform never silently falls back to a default.

Standard OpenAI fields (temperature, top_p, max_tokens, tools, tool_choice, response_format, stop, stream) are forwarded as-is. The body shape stays OpenAI no matter what's behind the alias.
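The request above can be built with nothing but the Python standard library. The sketch assumes the base URL used throughout this page. Any extra keyword arguments (temperature, stop, stream, …) are copied into the body unchanged, which mirrors the passthrough described above. build_chat_request and send are illustrative helper names and are not part of any Juglans SDK.

```python
import json
import urllib.request

# Base URL as used in the examples on this page; adjust for your deployment.
API_BASE = "https://api.juglans.ai/api/llm"


def build_chat_request(api_key: str, model: str, messages: list,
                       **openai_fields) -> urllib.request.Request:
    """Build (without sending) a POST to <API_BASE>/chat/completions."""
    # model is the registry alias, not the upstream model name.
    body = {"model": model, "messages": messages, **openai_fields}
    return urllib.request.Request(
        url=f"{API_BASE}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send a non-streaming request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

Non-2xx responses make urlopen raise urllib.error.HTTPError, so the error body is available through e.read().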

Multi-provider routing

Behind the alias the gateway can dispatch to any of three provider kinds. The wire-level handshake differs, but you never see it — translation lives inside the gateway.

Kind | Examples | Translation
openai_compatible | OpenAI, DeepSeek, Moonshot, Qwen-compat, Zhipu, xAI, Mistral, local vLLM | Body passthrough; only model is rewritten to the upstream's native name.
anthropic | Claude 3.5 / 3.7 / 4 | System message extracted to top-level system; stop → stop_sequences; max_tokens defaults to 4096 if omitted; Anthropic SSE events synthesized back into OpenAI chat.completion.chunk frames.
gemini | Google Gemini 1.5 / 2.0 / 2.5 | Roles mapped (assistant → model); system → systemInstruction; sampling fields → generationConfig; URL switched to :streamGenerateContent?alt=sse when streaming; response events synthesized back into OpenAI shape.

Non-text capabilities (tool calls, vision, thinking blocks, prompt caching) are not translated in the current release — keep your requests text-only for now if the upstream isn't already OpenAI-compatible.
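The sketch below covers the text-only request mappings from the table above for anthropic and gemini. It was reconstructed from the listed rules and is not the gateway's source. Two choices are assumptions: joining several system messages with a blank line, and the exact Gemini URL (the public generativelanguage.googleapis.com v1beta endpoint). The target field names (system, stop_sequences, max_tokens, systemInstruction, generationConfig, topP, maxOutputTokens, stopSequences) follow the providers' public REST APIs.

```python
DEFAULT_ANTHROPIC_MAX_TOKENS = 4096  # default stated in the table above


def _stop_list(stop):
    """OpenAI allows stop as a string or a list; both providers want a list."""
    return [stop] if isinstance(stop, str) else list(stop)


def openai_to_anthropic(body: dict, upstream_model: str) -> dict:
    """OpenAI chat body -> Anthropic Messages body (text only)."""
    system = [m["content"] for m in body["messages"] if m["role"] == "system"]
    out = {
        "model": upstream_model,
        "messages": [m for m in body["messages"] if m["role"] != "system"],
        "max_tokens": body.get("max_tokens", DEFAULT_ANTHROPIC_MAX_TOKENS),
    }
    if system:
        out["system"] = "\n\n".join(system)  # joining rule is an assumption
    if "stop" in body:
        out["stop_sequences"] = _stop_list(body["stop"])
    for key in ("temperature", "top_p", "stream"):
        if key in body:
            out[key] = body[key]
    return out


def openai_to_gemini(body: dict) -> dict:
    """OpenAI chat body -> Gemini generateContent body (text only)."""
    contents, system_parts = [], []
    for m in body["messages"]:
        if m["role"] == "system":
            system_parts.append({"text": m["content"]})
        else:
            role = "model" if m["role"] == "assistant" else "user"
            contents.append({"role": role, "parts": [{"text": m["content"]}]})
    out = {"contents": contents}
    if system_parts:
        out["systemInstruction"] = {"parts": system_parts}
    config = {}
    for src, dst in (("temperature", "temperature"), ("top_p", "topP"),
                     ("max_tokens", "maxOutputTokens"), ("stop", "stopSequences")):
        if src in body:
            config[dst] = _stop_list(body[src]) if src == "stop" else body[src]
    if config:
        out["generationConfig"] = config
    # "stream" is not copied: Gemini selects streaming via the URL instead.
    return out


def gemini_url(upstream_model: str, stream: bool) -> str:
    base = f"https://generativelanguage.googleapis.com/v1beta/models/{upstream_model}"
    return base + (":streamGenerateContent?alt=sse" if stream else ":generateContent")
```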

Response (non-streaming)

{
  "id": "chatcmpl-7a3d4f1c",
  "object": "chat.completion",
  "created": 1738180800,
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "BTC-PERP is showing roughly a 1 bp spread..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 67,
    "total_tokens": 109
  }
}
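The following is a minimal sketch for pulling out the fields most callers need from a non-streaming response like the one above. ChatResult and parse_completion are illustrative names. Note that model holds the upstream name (deepseek-chat here) and not the alias, so record the alias yourself if you need it for logging.

```python
from dataclasses import dataclass


@dataclass
class ChatResult:
    content: str
    finish_reason: str
    model: str          # upstream model name, not the alias you sent
    total_tokens: int


def parse_completion(resp: dict) -> ChatResult:
    """Extract text, finish reason, model and token count from chat.completion."""
    choice = resp["choices"][0]
    return ChatResult(
        content=choice["message"].get("content") or "",
        finish_reason=choice["finish_reason"],
        model=resp["model"],
        total_tokens=resp.get("usage", {}).get("total_tokens", 0),
    )
```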

The model field in the response depends on the provider:

  • openai_compatible — passthrough; whatever the upstream returned (e.g. deepseek-chat).
  • anthropic — the model name from the upstream messages envelope (e.g. claude-3-5-sonnet-20241022).
  • gemini — the upstream model name configured in the registry (Gemini's native response doesn't carry a model field, so we surface the alias's mapped name).

finish_reason is normalized into the OpenAI vocabulary, but the available values still depend on what the upstream emits:

Provider | Can emit | Cannot emit
openai_compatible | whatever the upstream returns (passthrough) | n/a
anthropic | stop, length, tool_calls | content_filter
gemini | stop, length, content_filter | tool_calls
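The sketch below shows one plausible normalization that is consistent with the table above. The native values (Anthropic stop_reason, Gemini finishReason) are the providers' public vocabularies. Whether the gateway uses exactly this table, and the "stop" fallback for unlisted values, are assumptions. On the client side, finish_reason == "length" is usually the only branch you need (truncation).

```python
# Native provider stop reasons -> OpenAI finish_reason (assumed mapping).
NATIVE_TO_OPENAI = {
    "anthropic": {
        "end_turn": "stop",
        "stop_sequence": "stop",
        "max_tokens": "length",
        "tool_use": "tool_calls",
    },
    "gemini": {
        "STOP": "stop",
        "MAX_TOKENS": "length",
        "SAFETY": "content_filter",
        "RECITATION": "content_filter",
    },
}


def normalize_finish_reason(provider_kind: str, native: str) -> str:
    """Return an OpenAI-style finish_reason for a provider's native value."""
    if provider_kind == "openai_compatible":
        return native  # already OpenAI vocabulary; passed through
    table = NATIVE_TO_OPENAI[provider_kind]  # KeyError for unknown kinds
    return table.get(native, "stop")  # fallback for unlisted values is a guess


def was_truncated(finish_reason: str) -> bool:
    """True when the model stopped because it hit the token limit."""
    return finish_reason == "length"
```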

Streaming

Pass "stream": true. You get an SSE stream of chat.completion.chunk frames, terminated by data: [DONE]\n\n. The last chunk before [DONE] carries usage.

curl -N https://api.juglans.ai/api/llm/chat/completions \
  -H "Authorization: Bearer jg_a_3f8a9c1..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "juglans/juglans-test",
    "messages": [{"role":"user","content":"Hi"}],
    "stream": true
  }'

Sample output:

data: {"id":"...","choices":[{"delta":{"role":"assistant"}}]}

data: {"id":"...","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"...","choices":[{"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":3,"completion_tokens":1,"total_tokens":4}}

data: [DONE]
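A minimal, dependency-free sketch of consuming the stream above follows. It reads data: lines, concatenates delta.content, remembers the last finish_reason, picks up usage from whichever chunk carries it, and stops at [DONE]. collect_stream is an illustrative name. It assumes each frame is a single data: line, as in the sample.

```python
import json


def collect_stream(lines):
    """Fold SSE lines of chat.completion.chunk frames into (text, finish_reason, usage)."""
    parts, finish_reason, usage = [], None, None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), finish_reason, usage
```

With urllib, an open response iterates as byte lines, so collect_stream(line.decode("utf-8") for line in resp) works on a live stream.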

Two gateway-specific notes:

  1. When stream: true is set on an openai_compatible upstream, the gateway auto-injects stream_options.include_usage = true into the request so the terminal chunk always carries usage. You don't need to set it yourself. (No-op for non-streaming requests and for Anthropic / Gemini, which carry usage natively.)
  2. For Anthropic and Gemini upstreams the OpenAI-format SSE you see is synthesized from the native event stream (Anthropic message_start / content_block_delta / message_delta / message_stop; Gemini incremental candidates[0].content.parts[0].text chunks). Usage arrives on the terminal chunk regardless of provider.
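Here is a toy version of what the gateway does to an openai_compatible body before forwarding it. It combines note 1 above with the passthrough rule from the routing table: swap the alias for the upstream model name and, for streaming requests only, set stream_options.include_usage. Forcing the flag to true even when a caller set it to false is an assumption. prepare_openai_compatible is a hypothetical name.

```python
import copy


def prepare_openai_compatible(body: dict, upstream_model: str) -> dict:
    """Return the body the gateway would forward to an openai_compatible upstream."""
    out = copy.deepcopy(body)  # never mutate the caller's dict
    out["model"] = upstream_model  # alias -> upstream's native name
    if out.get("stream"):
        options = dict(out.get("stream_options") or {})
        options["include_usage"] = True  # forced on (assumption)
        out["stream_options"] = options
    return out
```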

Usage tracking

Every request, successful or failed, writes a row to llm_usage_events:

agent_id, project_id, model_id, key_id,
prompt_tokens, completion_tokens, cost_usd,
streaming, upstream_status, duration_ms, created_at

cost_usd is computed at write time from the model's current input_price_per_mtok / output_price_per_mtok, so historical accounting stays stable when prices later change. There's no per-call cost field in the response yet; admins roll the events up daily / per-model from /admin → Usage. (The schema also has a request_id column reserved for upstream-supplied request ids; today's gateway leaves it NULL.)
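As a worked example, take the sample response above (42 prompt tokens, 67 completion tokens) and made-up prices of $1.00 and $2.00 per million tokens. That gives cost_usd = (42 × 1.00 + 67 × 2.00) / 1,000,000 = $0.000176. The sketch below shows that formula plus a per-day, per-model roll-up of the kind admins run. It uses Decimal to avoid float drift. The function names are hypothetical, and it assumes created_at is an ISO-8601 string whose first ten characters are the date.

```python
from collections import defaultdict
from decimal import Decimal

PER_MTOK = Decimal(1_000_000)


def cost_usd(prompt_tokens: int, completion_tokens: int,
             input_price_per_mtok: str, output_price_per_mtok: str) -> Decimal:
    """Price one call from per-million-token prices read at write time."""
    return (Decimal(prompt_tokens) * Decimal(input_price_per_mtok)
            + Decimal(completion_tokens) * Decimal(output_price_per_mtok)) / PER_MTOK


def daily_cost_by_model(events) -> dict:
    """Sum cost_usd per (date, model_id) over rows shaped like llm_usage_events."""
    totals = defaultdict(Decimal)
    for event in events:
        day = event["created_at"][:10]  # assumes ISO-8601 timestamps
        totals[(day, event["model_id"])] += event["cost_usd"]
    return dict(totals)
```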

Errors

Status | Body | Cause
400 | {"error":"invalid JSON body: …"} | Body isn't valid JSON.
400 | {"error":"missing or non-string `model` field"} | Request is missing model or it isn't a string.
400 | {"error":"model not found: …"} | The alias isn't in the registry (or it's disabled, or its provider is disabled).
401 | | Bad or missing agent API key.
403 | | Agent frozen.
503 | {"error":"no active upstream key for this provider"} | Provider exists but has no active key in the pool.
503 | {"error":"provider kind not yet supported: …"} | Provider's kind is recognized in the DB but not wired in the router. (Shouldn't happen in current builds.)
502 | {"error":"…"} | Upstream LLM unreachable or returned a transport error.
Upstream status | Depends on provider | For anthropic / gemini upstreams, provider errors are translated into OpenAI's {"error":{"message","type"}} envelope with the upstream HTTP status preserved. For openai_compatible upstreams, the body is passed through verbatim: DeepSeek, OpenAI, etc. already speak the OpenAI error shape, so the gateway doesn't re-wrap it.
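Callers therefore see two error body shapes. Gateway-generated errors use {"error": "..."}, while translated or passed-through provider errors use {"error": {"message": ..., "type": ...}}. The minimal sketch below extracts a message from either shape. GatewayError and the helper names are illustrative. The table lists no body for 401/403, so the sketch also tolerates empty or non-JSON bodies.

```python
import json


class GatewayError(Exception):
    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def error_message(body: bytes) -> str:
    """Return the error text from either the flat or the OpenAI-style envelope."""
    try:
        data = json.loads(body)
    except ValueError:  # empty or non-JSON body
        return body.decode("utf-8", errors="replace")
    err = data.get("error") if isinstance(data, dict) else None
    if isinstance(err, dict):
        return str(err.get("message", ""))
    if isinstance(err, str):
        return err
    return ""


def raise_for_gateway_status(status: int, body: bytes) -> None:
    """Raise GatewayError for any 4xx/5xx response."""
    if status >= 400:
        raise GatewayError(status, error_message(body))
```

With urllib, this fits inside an except urllib.error.HTTPError as e: handler as raise_for_gateway_status(e.code, e.read()).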

Wiring it into juglans-lang

# juglans.toml
[ai.providers.juglans]
api_key  = "jg_a_3f8a9c1..."
api_base = "https://api.juglans.ai/api/llm"
model    = "juglans/juglans-test"

Then call it from juglans-lang:

let answer = chat([
  msg("system", "You are a helpful agent."),
  msg("user", "Summarize today's positions."),
])
print(answer)

The lib points at <api_base>/chat/completions and attaches the bearer header on every call.

Wiring it into the OpenAI SDK

The gateway is wire-compatible with the OpenAI SDK — pass the alias as model.

from openai import OpenAI

client = OpenAI(
    api_key="jg_a_3f8a9c1...",
    base_url="https://api.juglans.ai/api/llm",
)

resp = client.chat.completions.create(
    model="juglans/juglans-test",   # any alias from the registry
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

The same call from JavaScript / TypeScript (ES module, top-level await):

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "jg_a_3f8a9c1...",
  baseURL: "https://api.juglans.ai/api/llm",
});

const resp = await client.chat.completions.create({
  model: "juglans/juglans-test",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);

Switching providers is a model-string change — same client, same key. If your admin registers a Claude alias as e.g. juglans/sonnet-fast, swap the model field and you get Anthropic behind the same OpenAI SDK call.