# Serverless Deployment Runbook

A pre-flight guide to guaranteeing trace delivery in short-lived V8 and Lambda runtimes.

Serverless environments (Vercel Edge, AWS Lambda, Cloudflare Workers) require explicit lifecycle management to ensure Traces and Spans are delivered to Njira before the compute instance is frozen or destroyed.
## The Operational Risk

In standard containerized setups (e.g., Kubernetes), Njira flushes traces asynchronously in the background. In serverless runtimes:

- The process may terminate or freeze immediately after the HTTP response is returned.
- Background async tasks (such as sending traces) are killed mid-flight.
- Buffered events are silently lost if not explicitly flushed.
## The Solution: Explicit Flush

To prevent data loss, you must explicitly instruct the SDK to `flush()` the event buffer at the end of each request lifecycle.
## Pre-Flight Checklist

Before deploying a serverless handler to production, verify:

- Has an explicit `await njira.flush()` or the SDK middleware been applied to the endpoint?
- Is the flush wrapped in a `finally` block to guarantee execution during errors?
- Is `timeoutMs` (or `timeout_ms`) configured so a slow network doesn't hang the upstream response?
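Taken together, a bare handler that satisfies all three checks looks roughly like the following sketch. The `njira` object here is a local stub standing in for the real SDK client (only `flush({ timeoutMs })` is assumed, as described in this runbook), so the example is self-contained:

```typescript
// Stub standing in for the Njira SDK client; the real import replaces this.
const flushCalls: number[] = [];
const njira = {
  flush: async (opts: { timeoutMs: number }) => {
    flushCalls.push(opts.timeoutMs); // record each flush for illustration
  },
};

// Checklist applied: the flush is awaited, lives in a `finally` block,
// and carries a timeout so a slow network cannot hang the response.
async function handler(event: { fail?: boolean }) {
  try {
    if (event.fail) throw new Error("agent error");
    return { statusCode: 200 };
  } finally {
    await njira.flush({ timeoutMs: 2000 }); // runs on success AND on error
  }
}

// The flush fires on both the happy path and the error path:
await handler({}).catch(() => {});
await handler({ fail: true }).catch(() => {});
```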
## Implementation Reference (TypeScript)

### With Middleware (Recommended)

The SDK middleware handles flushing automatically by hooking into the response lifecycle:

- Express/Fastify: calls `trace.flush()` on the response `finish` event.
- Next.js: calls `trace.flush()` immediately before returning the final response.
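The Express/Fastify hook can be sketched as a plain middleware function. This is an illustration of the `finish`-event pattern, not the SDK's actual middleware; `flush` is a local stand-in for `njira.trace.flush()`, and a bare `EventEmitter` simulates the response object:

```typescript
import { EventEmitter } from "node:events";

// Stand-in for njira.trace.flush(); counts invocations for illustration.
let flushed = 0;
const flush = async () => { flushed += 1; };

function njiraMiddleware(_req: unknown, res: EventEmitter, next: () => void) {
  // `finish` fires after the response has been handed off to the socket,
  // so flushing here does not delay the client-visible response.
  res.once("finish", () => { void flush(); });
  next();
}

// Simulate one request/response cycle:
const res = new EventEmitter();
njiraMiddleware({}, res, () => {});
res.emit("finish");
```

Note that on serverless platforms a fire-and-forget flush like this still depends on the runtime keeping the instance alive; that is why the platform-specific sections below await the flush or hand it to `waitUntil`.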
### Manual Flush (AWS Lambda / Bare Handlers)

If you are not using standard HTTP middleware, you must instrument the handler manually:
```typescript
export const handler = async (event: APIGatewayEvent) => {
  try {
    const result = await processEvent(event);
    return { statusCode: 200, body: JSON.stringify(result) };
  } finally {
    // CRITICAL: Await flush before the cloud provider freezes the instance
    await njira.trace.flush({ timeoutMs: 2000 });
  }
};
```
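To build intuition for what the `timeoutMs` bound buys you, here is a minimal sketch of the underlying pattern: race the network send against a timer so the handler never blocks past its budget. This illustrates the semantics only; it is not the actual Njira SDK internals:

```typescript
// Race a send against a timer: resolves "sent" if delivery wins,
// "timeout" if the budget expires first.
async function flushWithTimeout(
  send: () => Promise<void>,
  timeoutMs: number
): Promise<"sent" | "timeout"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  try {
    return await Promise.race([send().then(() => "sent" as const), timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer); // avoid a dangling timer
  }
}

// A fast send completes; a slow one is abandoned once the budget expires.
const fast = await flushWithTimeout(() => Promise.resolve(), 100);
const slow = await flushWithTimeout(
  () => new Promise<void>((resolve) => setTimeout(resolve, 200)),
  50
);
```

The trade-off is deliberate: a timed-out flush drops the buffered traces, but it guarantees your upstream response latency is bounded.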
### Vercel Edge / Cloudflare Workers

Edge functions have stricter execution limits. In Cloudflare Workers, pass the flush promise to `ctx.waitUntil()` so the runtime keeps the worker alive until the telemetry is delivered:
```typescript
export default {
  async fetch(request, env, ctx) {
    const result = await handleRequest(request);
    // Extend the worker lifecycle just long enough to send the telemetry
    ctx.waitUntil(njira.trace.flush());
    return result;
  }
};
```
## Implementation Reference (Python)

### With Middleware (Recommended)

The FastAPI middleware flushes automatically after the response is produced, running as a background task before the framework yields the ASGI thread.
### Manual Flush (AWS Lambda)

```python
def handler(event, context):
    try:
        result = process_event(event)
        return {"statusCode": 200, "body": json.dumps(result)}
    finally:
        # Note: AWS Lambda runtime requires a synchronous flush
        njira.flush_sync(timeout_ms=2000)
```
## Buffer & Timeout Configuration

| Variable | Description | Default |
|---|---|---|
| `NJIRA_BUFFER_SIZE` | Max events before auto-flush | `100` |
| `NJIRA_FLUSH_INTERVAL_MS` | Auto-flush interval (ms) | `5000` |
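The SDK reads these variables internally, but the equivalent resolution logic is simple to sketch. The `intFromEnv` helper below is illustrative (not an Njira API): it parses the variable if present and falls back to the documented default otherwise.

```typescript
// Resolve an integer setting from the environment with a fallback default.
function intFromEnv(name: string, fallback: number): number {
  const raw = process.env[name];
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Defaults taken from the table above:
const bufferSize = intFromEnv("NJIRA_BUFFER_SIZE", 100);
const flushIntervalMs = intFromEnv("NJIRA_FLUSH_INTERVAL_MS", 5000);
console.log({ bufferSize, flushIntervalMs });
```

For serverless workloads the auto-flush interval rarely fires before the instance freezes, which is precisely why the explicit `flush()` calls above are required.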
## Triage: Traces are missing in production

If your application works locally but traces disappear when deployed to the cloud, run through this triage flow:

- **Verify flush placement:** Ensure `flush()` is truly the last SDK operation executed. If you log a `trace.event()` after flushing, it will sit in the buffer and be destroyed.
- **Check for timeout silencing:** Set `timeoutMs` to `2000` and check your cloud provider logs (CloudWatch / Vercel Logs). If you see `Njira Flush Timeout` warnings, the network path from your worker to the Njira endpoint is excessively slow or blocked by a VPC firewall.
- **Inspect the `finally` block:** Ensure the flush resides in a `finally` block. If your agent throws an exception, the handler might return a 500 early, bypassing a flush invocation placed at the bottom of the function.
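The flush-placement pitfall from the first triage step can be reproduced with a toy buffer. Nothing here is Njira API; `record`/`flush` are hypothetical stand-ins that make the ordering visible:

```typescript
// Toy event buffer: anything recorded *after* the final flush never leaves
// the instance before it is frozen.
const buffer: string[] = [];
const sent: string[] = [];

const record = (name: string) => { buffer.push(name); };
const flush = async () => { sent.push(...buffer.splice(0)); };

record("agent.start");
record("agent.finish");
await flush();          // both buffered events are delivered
record("late.event");   // recorded after the flush: stranded in the buffer
// `sent` holds the first two events; "late.event" is lost when the
// instance freezes.
```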