Documentation

Everything you need to integrate Soto in under five minutes.

Authentication

Every API call must send an API key as a bearer token in the Authorization header. Create keys on the API Keys page. Each key is shown only once, so save it somewhere safe.

POST /v1/embed

Returns one pooled embedding per input string (mean + max + std of chunk summaries → 576 dims). Accepts up to 64 strings per request, each up to 8,192 characters long.

curl

curl https://api.soto.dev/v1/embed \
  -H "Authorization: Bearer $SOTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "transfer 500 to my savings account"}'

Python

import os

import requests

r = requests.post(
    "https://api.soto.dev/v1/embed",
    headers={"Authorization": f"Bearer {os.environ['SOTO_API_KEY']}"},
    json={"text": ["hello world", "another doc"]},
)
print(r.json())  # {"embeddings": [[...], [...]], "dim": 576, "model": "soto-v8"}
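
The request limits above (64 strings per request, 8,192 characters per string) mean larger workloads need to be split client-side. The following is a minimal sketch of one way to do that. The helper name batch_texts is hypothetical, and it assumes the character limit corresponds to Python's len() of the string, which the documentation does not confirm.

```python
from typing import List

MAX_STRINGS_PER_REQUEST = 64  # per-request limit stated in this section
MAX_CHARS_PER_STRING = 8192   # per-string limit stated in this section


def batch_texts(texts: List[str]) -> List[List[str]]:
    """Split texts into lists of at most 64 strings each.

    Raises ValueError if any string exceeds the per-string limit
    (assumption: the limit is measured as len(text)).
    """
    for i, text in enumerate(texts):
        if len(text) > MAX_CHARS_PER_STRING:
            raise ValueError(
                f"text at index {i} has {len(text)} chars; "
                f"limit is {MAX_CHARS_PER_STRING}"
            )
    return [
        texts[start:start + MAX_STRINGS_PER_REQUEST]
        for start in range(0, len(texts), MAX_STRINGS_PER_REQUEST)
    ]
```

Each returned batch can be sent as the "text" value in the Python example above. Raising on oversized strings rather than truncating them is a design choice of this sketch, so that no input is silently shortened.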

POST /v1/classify

Returns the top-k intents for the input text. Currently supports the banking77 task (77-class banking intent). More tasks are coming; LoRA adapters let us add new domains without retraining the encoder.

curl

curl https://api.soto.dev/v1/classify \
  -H "Authorization: Bearer $SOTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "transfer 500 to my savings account", "top_k": 3}'

JavaScript

const r = await fetch("https://api.soto.dev/v1/classify", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SOTO_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ text: "my card was declined", top_k: 3 }),
});
const { results } = await r.json();
// [{label: "card_declined", score: 0.89}, ...]
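
A common next step is to act on the best label only when its score is high enough. The sketch below assumes the response shape shown in the JavaScript comment above (a results list of label/score objects). The helper name top_intent and the 0.5 default threshold are hypothetical, not documented values, and the code does not assume the list arrives sorted.

```python
from typing import Dict, List, Optional


def top_intent(results: List[Dict], min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label, or None if nothing clears min_score.

    min_score is an illustrative cutoff; tune it for your own traffic.
    """
    if not results:
        return None
    # Use max() rather than results[0] so ordering is not assumed.
    best = max(results, key=lambda item: item["score"])
    return best["label"] if best["score"] >= min_score else None
```

Returning None lets the caller fall back to a default flow, such as asking the user to rephrase, instead of acting on a low-confidence intent.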

GET /v1/usage

Returns the current month's inference count and your tier's limit. Use it to show users their usage or to stop sending calls before you hit your quota.

curl -H "Authorization: Bearer $SOTO_API_KEY" https://api.soto.dev/v1/usage
# {"tier":"free","month_used":238,"month_limit":10000,"remaining":9762}
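
To stop calls before the quota runs out, the remaining field can be checked before sending work. This is a minimal sketch against the response shape shown above. The function name has_quota is hypothetical, and how a multi-string embed request counts toward the quota is not stated here, so the caller supplies the number of inferences it expects to consume.

```python
def has_quota(usage: dict, planned_inferences: int = 1) -> bool:
    """Return True if planned_inferences fit within the remaining monthly quota.

    usage is the parsed JSON body of GET /v1/usage.
    """
    return usage["remaining"] >= planned_inferences
```

Fetching usage once and caching it for a short time, rather than calling /v1/usage before every request, keeps this check cheap; the cache duration is left to the integrator.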

Rate limits & errors

Status  Meaning
401     Missing or revoked API key
400     Invalid body; see the details field
429     Monthly quota exceeded; upgrade or wait for reset
502     Inference backend unavailable; retry with backoff
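
Of the status codes above, only 502 is described as worth retrying. The sketch below shows one way to retry with exponential backoff while returning every other status immediately. The names call_with_retry and send, the attempt count, and the delay values are all assumptions of this example; send stands for any zero-argument callable that performs one request and returns a (status code, parsed body) pair.

```python
import time
from typing import Callable, Tuple

RETRYABLE = {502}  # the only status listed above as "retry with backoff"


def call_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,        # illustrative default, not a documented value
    base_delay: float = 0.5,      # seconds; doubles after each failed attempt
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() up to max_attempts times, backing off after each 502."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts - 1):
        status, body = send()
        if status not in RETRYABLE:
            # 401, 400, and 429 will not succeed on retry; return them as-is.
            return status, body
        sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    # Final attempt: return whatever comes back, even if it is another 502.
    return send()
```

The sleep parameter is injectable so the backoff schedule can be tested without waiting. Adding random jitter to the delay is a common refinement when many clients retry at once.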