Observability & Telemetry Manual

An implementation guide for capturing LLM spans, tool executions, and custom policy events.

Njira Tracing provides end-to-end observability for your AI agents. This guide details how to instrument your application to capture every LLM call, tool invocation, and routing decision in a structured format.

Span Taxonomy

The core SDK provides a span API. If you are using framework adapters (e.g., LangChain, CrewAI), the adapters automatically map framework-internal events onto Njira's structured spans:

Span Type   Operational Purpose                       Example Payload
---------   -------------------                       ---------------
llm         Auditing LLM inputs and outputs.          Chat completions, embeddings
tool        Auditing capability execution.            Web search, database queries
chain       Grouping multi-step reasoning workflows.  ReAct loops, complex workflows
retriever   Auditing RAG context injection.           Vector search results
custom      Tracking user-defined business logic.     Authentication checks, payment processing
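
The taxonomy above can be mirrored in application code as a union type, which keeps manually created spans consistent with the table. A minimal sketch; `SpanType` and `isSpanType` are illustrative helpers, not SDK exports:

```typescript
// Hypothetical model of the span taxonomy; these names are illustrative only.
type SpanType = "llm" | "tool" | "chain" | "retriever" | "custom";

const SPAN_TYPES: ReadonlySet<string> = new Set([
  "llm", "tool", "chain", "retriever", "custom",
]);

// Guard untyped strings (e.g. from config) before creating a span with them.
function isSpanType(value: string): value is SpanType {
  return SPAN_TYPES.has(value);
}
```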

Implementation Reference (TypeScript)

When manually creating spans, wrap your execution in a try/catch block so the span is always closed: call endSpan on the success path and record the exception on the failure path before re-throwing.

const spanId = njira.trace.startSpan({
  name: "llm-call",
  type: "llm",
  input: { prompt },
  tags: { model: "gpt-5.2" },
});

try {
  const output = await callLLM(prompt);
  // Attach the output and token metrics before closing
  njira.trace.endSpan(spanId, { output, metrics: { tokens: 150 } });
} catch (err) {
  // CRITICAL: Record the exception before throwing upstream
  njira.trace.error(spanId, err as Error);
  throw err;
}
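
The try/catch pattern above can be factored into a reusable wrapper so callers cannot forget to close a span. A minimal sketch against a generic tracer surface; the `Tracer` interface and `withSpan` helper are illustrative, not SDK APIs:

```typescript
// Illustrative tracer surface matching the calls used in the example above.
interface Tracer {
  startSpan(opts: { name: string; type: string; input?: unknown }): string;
  endSpan(spanId: string, data: { output?: unknown }): void;
  error(spanId: string, err: Error): void;
}

// Runs fn inside a span: ends the span on success, records the error and
// re-throws on failure, so every code path closes the span exactly once.
async function withSpan<T>(
  tracer: Tracer,
  opts: { name: string; type: string; input?: unknown },
  fn: () => Promise<T>,
): Promise<T> {
  const spanId = tracer.startSpan(opts);
  try {
    const output = await fn();
    tracer.endSpan(spanId, { output });
    return output;
  } catch (err) {
    tracer.error(spanId, err as Error);
    throw err;
  }
}
```

With a helper like this, the earlier example collapses to a single call: `withSpan(tracer, { name: "llm-call", type: "llm", input: { prompt } }, () => callLLM(prompt))`.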

Implementation Reference (Python)

span_id = njira.start_span(
    name="llm-call",
    span_type="llm",
    input_data={"prompt": prompt},
    tags={"model": "gpt-5.2"},
)

try:
    output = await call_llm(prompt)
    njira.end_span(span_id, output=output, metrics={"tokens": 150})
except Exception as e:
    njira.span_error(span_id, e)
    raise

Custom Telemetry Events

Use events for key milestones and debug breadcrumbs that don't warrant a full span (that is, where there is no duration worth measuring). Events are your primary operator tool for adding context to traces.

njira.trace.event("policy_decision", { 
  verdict: "allow", 
  policy: "payments_guard",
  latency_ms: 12 
});

Common operator use cases:

  • Logging manual shadow policy evaluations.
  • Recording dynamic tool routing decisions ("Agent chose to use Web Search").
  • Tagging user-visible rejection reasons ("Why was this user blocked?").
  • Marking checkpoint states in long-running async chains.

Injecting Metrics

Attach numeric metrics to spans to populate dashboards and trigger alerts.

njira.trace.endSpan(spanId, {
  output: result,
  metrics: {
    latency_ms: 245,
    input_tokens: 100,
    output_tokens: 50,
    cost_usd: 0.0015
  }
});
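
Of these, cost_usd typically has to be derived from the token counts before the span is closed. A minimal sketch, assuming illustrative per-1K-token rates; substitute your provider's actual pricing:

```typescript
// Illustrative rates per 1K tokens; real pricing varies by model and provider.
const RATES_PER_1K = { input: 0.01, output: 0.03 };

// Estimates span cost in USD from token counts, rounded to 6 decimal places
// so dashboard aggregations stay tidy.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  const cost =
    (inputTokens / 1000) * RATES_PER_1K.input +
    (outputTokens / 1000) * RATES_PER_1K.output;
  return Math.round(cost * 1e6) / 1e6;
}
```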

Operational Triage in the Console

Navigate to the Traces tab in the Njira Console to investigate agent behavior:

  1. Search by request_id, user_id, or time range to locate a specific session.
  2. Drill down into individual traces to view the parent/child span tree.
  3. Inspect payloads: View exact inputs, outputs, and enforcement verdicts for each span.
  4. Replay: Run a historical trace through a new draft policy version (see the Policy Management runbook).

Instrumentation Best Practices

  • Connect your context: Ensure context propagation is configured correctly (see context-propagation.md); otherwise your spans will appear orphaned and be of little use during an incident.
  • Use deterministic names: Span names like execute_stripe_charge are searchable; tool-1 is not.
  • Tag liberally: Attach tenant_id or environment tags to spans for easier aggregate filtering.
  • Truncate extreme payloads: Do not pass multi-megabyte base64 images into input; truncate them or pass metadata to avoid exceeding trace storage limits.
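
The truncation advice above can be implemented as a small guard applied to payloads before they reach startSpan. A minimal sketch; the helper name and the 8 KB limit are illustrative assumptions, not Njira constants:

```typescript
// Assumed payload cap; pick a limit well under your trace storage quota.
const MAX_PAYLOAD_CHARS = 8_192;

// Keeps a prefix of oversized payloads and records how much was dropped,
// so the trace stays debuggable without blowing the storage limit.
function truncatePayload(value: string, limit = MAX_PAYLOAD_CHARS): string {
  if (value.length <= limit) return value;
  return `${value.slice(0, limit)}…[truncated ${value.length - limit} chars]`;
}
```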