Documentation
Everything you need to integrate Soto in under five minutes.
Authentication
Every API call must send an API key as a bearer token in the Authorization header. Create keys on the API Keys page. Each key is shown only once, so save it somewhere safe.
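As a small sketch of the header described above, the helper below builds it once so every call can reuse it. The function name `auth_headers` is hypothetical, and reading the key from a `SOTO_API_KEY` environment variable simply mirrors the examples on this page.

```python
import os


def auth_headers(env=os.environ):
    """Build request headers carrying the bearer API key.

    `env` is any mapping; it defaults to the process environment.
    """
    key = env.get("SOTO_API_KEY", "").strip()
    if not key:
        # Fail early instead of sending a request that would come back as 401.
        raise RuntimeError("SOTO_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing before the request is sent keeps a missing key from showing up later as a 401.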
POST /v1/embed
Returns one pooled embedding per input string: the mean, max, and standard deviation of the chunk summaries, pooled into a 576-dimensional vector. Accepts up to 64 strings per request, each up to 8,192 characters.
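If you have more inputs than one request allows, you can split them client-side. The sketch below assumes the limits stated above. The helper name `batch_inputs` is hypothetical, and it counts characters with Python's `len()`, which may not match how the API counts them.

```python
MAX_BATCH = 64    # strings per request, per the limit stated above
MAX_CHARS = 8192  # characters per string, per the limit stated above


def batch_inputs(texts, max_batch=MAX_BATCH, max_chars=MAX_CHARS):
    """Split texts into request-sized batches, rejecting over-long strings."""
    for i, text in enumerate(texts):
        if len(text) > max_chars:
            raise ValueError(
                f"input {i} has {len(text)} chars; limit is {max_chars}"
            )
    # Consecutive slices of at most max_batch items each, in input order.
    return [texts[i:i + max_batch] for i in range(0, len(texts), max_batch)]
```

Each returned batch can then be sent as the `text` list of one request.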
curl
curl https://api.soto.dev/v1/embed \
  -H "Authorization: Bearer $SOTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "transfer 500 to my savings account"}'
Python
import os

import requests

r = requests.post(
    "https://api.soto.dev/v1/embed",
    headers={"Authorization": f"Bearer {os.environ['SOTO_API_KEY']}"},
    json={"text": ["hello world", "another doc"]},
)
print(r.json())  # {"embeddings": [[...], [...]], "dim": 576, "model": "soto-v8"}
POST /v1/classify
Returns the top-k intents for the input text. Currently supports the banking77 task (77-class banking intent classification). More tasks are coming: LoRA adapters let us add new domains without retraining the encoder.
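A common way to consume top-k results is to act on the best label only when its score is high enough. The sketch below assumes each result is an object with `label` and `score` fields, as in the sample response in this section. The helper name `best_intent` and the `min_score` threshold are hypothetical, not part of the API.

```python
def best_intent(results, min_score=0.5):
    """Return the highest-scoring label, or None if nothing clears min_score."""
    if not results:
        return None
    # Do not rely on the response order; pick the maximum explicitly.
    top = max(results, key=lambda r: r["score"])
    return top["label"] if top["score"] >= min_score else None
```

Returning `None` gives the caller a clear signal to fall back to a default flow, such as asking the user to rephrase.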
curl
curl https://api.soto.dev/v1/classify \
  -H "Authorization: Bearer $SOTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "transfer 500 to my savings account", "top_k": 3}'
JavaScript
const r = await fetch("https://api.soto.dev/v1/classify", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SOTO_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ text: "my card was declined", top_k: 3 }),
});
const { results } = await r.json();
// [{label: "card_declined", score: 0.89}, ...]
GET /v1/usage
Returns the current month's inference count and tier limit. Use it to show users their usage or to stop making calls before you hit your quota.
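One way to stop calls before the quota runs out is to check the usage response first. The sketch below assumes the response body carries a `remaining` field, as in this section's sample response. The helper names are hypothetical, and how many inferences a batched request consumes is an assumption you should verify for your tier.

```python
def remaining_quota(usage):
    """Read the remaining inference count from a /v1/usage response body."""
    return int(usage["remaining"])


def can_send(usage, needed=1):
    """True if the quota still covers `needed` more inferences."""
    return remaining_quota(usage) >= needed
```

For example, a client might call `can_send(usage, needed=len(batch))` before each batch and pause once it returns `False`.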
curl -H "Authorization: Bearer $SOTO_API_KEY" https://api.soto.dev/v1/usage
# {"tier":"free","month_used":238,"month_limit":10000,"remaining":9762}
Rate limits & errors
| Status | Meaning |
|---|---|
| 401 | Missing or revoked API key |
| 400 | Invalid body — see details field |
| 429 | Monthly quota exceeded — upgrade or wait for reset |
| 502 | Inference backend unavailable — retry with backoff |
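
The table suggests retrying only on 502, since a retry cannot fix a bad key, a bad body, or an exhausted quota. Here is a minimal sketch of that policy with exponential backoff. The `send` callable, the attempt count, and the delays are all hypothetical choices, not documented behavior of the API.

```python
import time

RETRYABLE = {502}  # backend unavailable; 400, 401 and 429 are returned as-is


def call_with_backoff(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical zero-argument callable returning (status, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    # Every attempt hit a retryable status; hand the last response back.
    return status, body
```

With `requests`, `send` could be a small lambda that posts the request and returns `(r.status_code, r.json())`. Passing `sleep` in as a parameter makes the delays easy to stub out in tests.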