# LLM Endpoints

Direct LLM chat with streaming support.

## Stream Chat

`POST /llm/stream`

Stream an LLM response as server-sent events (SSE).

```bash
curl -X POST http://localhost:3000/llm/stream \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

Response (SSE stream):

```
data: {"type":"content","text":"Hello"}
data: {"type":"content","text":" ! How"}
data: {"type":"content","text":" can I"}
data: {"type":"content","text":" help?"}
data: {"type":"done"}
```

### With System Prompt

```bash
curl -X POST http://localhost:3000/llm/stream \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is TypeScript?"}
    ]
  }'
```

## Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `messages` | array | Yes | Chat messages |
| `model` | string | No | Override model |
| `temperature` | number | No | Sampling temperature (0-2) |
| `maxTokens` | number | No | Max response tokens |
| `stream` | boolean | No | Enable streaming (default: `true`) |
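
Taken together, the request body can be typed like this (a sketch; `ChatRequest` is a hypothetical name, and `Message` is defined in the next section):

```typescript
// Typed sketch of the request body, derived from the table above.
interface ChatRequest {
  messages: Message[];   // required: the chat history
  model?: string;        // override the configured default model
  temperature?: number;  // sampling temperature, 0-2
  maxTokens?: number;    // cap on response tokens
  stream?: boolean;      // defaults to true
}
```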

## Message Format

```typescript
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}
```

## List Models

`GET /llm/models`

List available models.

```bash
curl http://localhost:3000/llm/models \
  -H "Authorization: Bearer <token>"
```

Response:

```json
{
  "models": [
    {
      "id": "gpt-4o",
      "provider": "openai",
      "name": "GPT-4o"
    },
    {
      "id": "claude-sonnet-4-20250514",
      "provider": "anthropic",
      "name": "Claude Sonnet"
    }
  ],
  "default": "claude-sonnet-4-20250514"
}
```
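
A typed sketch of fetching this list (field names follow the response above; `listModels` is a hypothetical helper):

```typescript
// Sketch: fetch the model catalog and the configured default.
interface ModelInfo {
  id: string;
  provider: string;
  name: string;
}

async function listModels(
  token: string,
): Promise<{ models: ModelInfo[]; default: string }> {
  const res = await fetch("http://localhost:3000/llm/models", {
    headers: { Authorization: `Bearer ${token}` },
  });
  return res.json();
}
```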

## Non-Streaming

`POST /llm/chat`

Non-streaming chat (returns the complete response).

```bash
curl -X POST http://localhost:3000/llm/chat \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "stream": false
  }'
```

Response:

```json
{
  "content": "Hello! How can I help you today?",
  "model": "claude-sonnet-4-20250514",
  "usage": {
    "inputTokens": 10,
    "outputTokens": 15
  }
}
```
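
A sketch of the same call from TypeScript (`ChatResponse` mirrors the JSON body above; the helper name is hypothetical):

```typescript
// Sketch: non-streaming request returning the full response at once.
interface ChatResponse {
  content: string;
  model: string;
  usage: { inputTokens: number; outputTokens: number };
}

async function chat(token: string, content: string): Promise<ChatResponse> {
  const res = await fetch("http://localhost:3000/llm/chat", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      messages: [{ role: "user", content }],
      stream: false,
    }),
  });
  if (!res.ok) throw new Error(`Chat failed: ${res.status}`);
  return res.json();
}
```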

## Error Responses

### Model Not Available

```json
{
  "error": "Model not available",
  "message": "The requested model is not configured"
}
```

### Rate Limited

```json
{
  "error": "Rate limited",
  "message": "LLM rate limit exceeded",
  "retryAfter": 60
}
```
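
A hedged sketch of honoring `retryAfter` on the client. Two assumptions not stated above: that rate-limited responses use HTTP status 429, and that `retryAfter` is in seconds:

```typescript
// Sketch: retry once after the server-suggested backoff interval.
async function chatWithRetry(token: string, body: unknown): Promise<Response> {
  for (let attempt = 0; attempt < 2; attempt++) {
    const res = await fetch("http://localhost:3000/llm/chat", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });
    if (res.status !== 429) return res; // assumed rate-limit status

    // retryAfter is interpreted as seconds here (an assumption).
    const err = await res.json();
    await new Promise((r) => setTimeout(r, (err.retryAfter ?? 60) * 1000));
  }
  throw new Error("Still rate limited after retry");
}
```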

## Providers

Configured in `config.json`:

```json
{
  "agent": {
    "llm": {
      "provider": "anthropic",
      "model": "claude-sonnet-4-20250514",
      "apiKey": "{{env.ANTHROPIC_API_KEY}}"
    }
  }
}
```

Supported providers:

- `openai` - GPT-4, GPT-4o, GPT-3.5
- `anthropic` - Claude 3.5, Claude 3
- `google` - Gemini Pro, Gemini Flash
- `lmstudio` - Local models
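
Switching providers means swapping the `provider`, `model`, and `apiKey` fields. A hypothetical sketch for OpenAI, using the `gpt-4o` model from the list above (the `OPENAI_API_KEY` variable name is an assumption; mirror whatever your environment defines):

```json
{
  "agent": {
    "llm": {
      "provider": "openai",
      "model": "gpt-4o",
      "apiKey": "{{env.OPENAI_API_KEY}}"
    }
  }
}
```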
