Model Cascade

Optimize performance and cost with tiered model selection.

Njira-AI supports a tiered model selection strategy, allowing you to choose the right balance of speed, cost, and intelligence for your policy evaluations.

Tiers

We offer three tiers of model capability. The specific models backing these tiers are updated regularly to the latest state-of-the-art versions.

  • FAST: High speed, low latency, lower cost. Typical use case: real-time chat filtering, simple PII detection. Example models: gpt-5-mini, claude-haiku-4.5.
  • STANDARD (default): Balanced performance and intelligence. Typical use case: complex policy logic, context-aware safety. Example models: gpt-5.2, claude-sonnet-4.5.
  • STRONG: Maximum reasoning capability. Typical use case: nuanced threat detection, security auditing. Example models: gpt-5.2-pro, claude-opus-4.5.

Selecting a Tier

You can specify the desired tier for each request using a custom header or metadata.

Via Headers (Proxy Mode)

When using the Njira Gateway proxy, pass the X-Njira-Tier header:

curl https://gateway.njira.ai/v1/chat/completions \
  -H "Authorization: Bearer <your-key>" \
  -H "X-Njira-Tier: fast" \
  ...
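
The same header works from any HTTP client. The sketch below uses Python's requests library and assumes the gateway exposes an OpenAI-compatible /v1/chat/completions endpoint, as in the curl call above; the model name and message body are illustrative.

import requests

response = requests.post(
    "https://gateway.njira.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer <your-key>",
        "X-Njira-Tier": "fast",  # tier selection via the custom header
    },
    json={
        "model": "gpt-5-mini",   # illustrative; see the tier list above
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=30,
)
print(response.json())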

Via SDK

If you are using the internal SDK or integrating directly with the API, include the tier in the metadata dictionary:

response = njira.audit(
    content="...",
    metadata={"tier": "strong"}
)
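
Because the tier is set per request, you can also combine tiers in a simple cascade: screen everything with FAST and re-run only low-confidence results through STRONG. The sketch below assumes the same njira client object as above; the confidence attribute it reads is a hypothetical field used for illustration, not a documented part of the audit response.

def audit_with_cascade(content, threshold=0.8):
    # First pass: cheap, low-latency screening on the FAST tier.
    result = njira.audit(content=content, metadata={"tier": "fast"})

    # Escalate to the STRONG tier only when the fast pass is unsure.
    # NOTE: `confidence` is an illustrative field name, not a documented attribute.
    if result.confidence < threshold:
        result = njira.audit(content=content, metadata={"tier": "strong"})

    return result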

Default Behavior

If no tier is specified, the system defaults to STANDARD. This ensures a safe baseline for all traffic.

Provider Specifics

The actual model used depends on the LLM provider configured for your organization. Each provider maps the tiers in FAST, STANDARD, STRONG order:

  • OpenAI: Maps to gpt-5-mini, gpt-5.2, gpt-5.2-pro.
  • Anthropic: Maps to claude-haiku, claude-sonnet, claude-opus.
  • Gemini: Maps to gemini-3-flash, gemini-3-pro, gemini-3-pro (STANDARD and STRONG share the same model).
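
For illustration only, the same mapping can be written as a lookup table; the actual routing is performed server-side by the gateway, and these names simply mirror the list above.

# Illustrative tier-to-model mapping; routing is handled by the gateway.
TIER_MODELS = {
    "openai": {"fast": "gpt-5-mini", "standard": "gpt-5.2", "strong": "gpt-5.2-pro"},
    "anthropic": {"fast": "claude-haiku", "standard": "claude-sonnet", "strong": "claude-opus"},
    "gemini": {"fast": "gemini-3-flash", "standard": "gemini-3-pro", "strong": "gemini-3-pro"},
}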