Aim 🥅
Understand what OpenTelemetry is and how I can use it in my code. Most iterated example: https://github.com/ps428/otel-examples/tree/main/examples/cf-worker-otel
TL;DR 🩳
- OTel is great for generating high-quality telemetry data for troubleshooting.
- In most cases, logs and traces (with their spans) are sufficient.
- Prefer the SDK approach; otherwise, for Cloudflare Workers, just follow this example: https://github.com/kubeshop/tracetest/blob/main/examples/testing-cloudflare-workers/src/index.ts
- Use centralized semantics for better logs/spans: https://github.com/ps428/otel-examples/tree/main/examples/cf-worker-otel
What's on the plate 🍽?
- Why do we need OTel?
- Problem: with many services and inconsistent logs, it is hard to tell where an issue (an error, latency, or anything else) originates.
- The solution is better OBSERVABILITY! (or telemetry)
- Observability has 3 parts (signals):
- Logs: text records of specific events (like a payment made, a search done, or history fetched). They say nothing about compute resources or anything of that sort.
- Metrics: Numerical data of system state/resources
- Tracing: tracks a request across services; the most detailed signal for localising the source of an error. Each trace has multiple spans, and I would define a span as a single unit of work. A span can have many child spans, so imagine a tree: the root node starts with the trace and ends with the trace, and each node is a span.
- OTel is a universal standard for telemetry; it provides high-quality data.
- How to get this done? The heart of OTel has 3 parts:
- Signals: data points collected by OTel: logs, metrics & traces
- Context: Info about env & conditions for signals: when, where and what happened.
- Semantic conventions: Standard naming for telemetry data. Consistent definitions of things.
- OTel just ensures that the data collected is high quality.
- Instrumentation: adding observability code to app.
- OTel has:
- SDK: libraries for each language (py, ts, etc.) that implement the API
- API: the interfaces used to instrument code with traces, logs & metrics
- Exporters: send data to an observability backend like Baselime
- Automatic instrumentation: generates traces for supported libraries without manual code changes
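To make the "trace is a tree of spans" picture concrete, here is a minimal sketch of a toy data model. It is not the OpenTelemetry SDK's data model; the type names (`SpanRecord`, `SpanNode`), the helper functions, and the sample span names are all made up for illustration.

```typescript
// Toy model: every span in a trace shares the traceId and points at its parent.
interface SpanRecord {
  traceId: string;
  spanId: string;
  parentSpanId?: string; // undefined for the root span
  name: string;
  startMs: number;
  endMs: number;
}

interface SpanNode {
  span: SpanRecord;
  children: SpanNode[];
}

// Links flat span records into a tree and returns the root node.
function buildTrace(spans: SpanRecord[]): SpanNode {
  const nodes: Record<string, SpanNode> = {};
  for (const span of spans) {
    nodes[span.spanId] = { span, children: [] };
  }
  let root: SpanNode | undefined;
  for (const span of spans) {
    const node = nodes[span.spanId];
    if (span.parentSpanId === undefined) {
      root = node;
    } else if (nodes[span.parentSpanId]) {
      nodes[span.parentSpanId].children.push(node);
    }
  }
  if (!root) throw new Error('trace has no root span');
  return root;
}

// Renders the tree as indented lines, one per span, with its duration.
function renderTrace(node: SpanNode, depth: number = 0): string[] {
  const indent = new Array(depth + 1).join('  ');
  const duration = node.span.endMs - node.span.startMs;
  const lines = [`${indent}${node.span.name} (${duration} ms)`];
  for (const child of node.children) {
    for (const line of renderTrace(child, depth + 1)) lines.push(line);
  }
  return lines;
}

// Hypothetical spans for one request.
const exampleSpans: SpanRecord[] = [
  { traceId: 't1', spanId: 'a', name: 'GET /checkout', startMs: 0, endMs: 120 },
  { traceId: 't1', spanId: 'b', parentSpanId: 'a', name: 'db.save_order', startMs: 10, endMs: 60 },
  { traceId: 't1', spanId: 'c', parentSpanId: 'a', name: 'api.charge_card', startMs: 60, endMs: 115 },
];

const exampleLines = renderTrace(buildTrace(exampleSpans));
// exampleLines:
//   GET /checkout (120 ms)
//     db.save_order (50 ms)
//     api.charge_card (55 ms)
```

Reading the rendered tree top-down is roughly what a trace viewer does: the slow child span under the root is where to look first.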
How to use?
Best resource: https://github.com/ps428/otel-examples/tree/main/examples/cf-worker-otel
- The SDK must be started for our custom instrumentation to take effect.
- It is probably best to keep a separate module for telemetry config, so instrumentation stays separate from the main code.
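One way to keep telemetry concerns out of the main code is a small module that owns the naming rules. The sketch below is only an illustration of that idea: the file name, the `APP_ATTR` keys, and the `spanName` helper are hypothetical, and official attribute keys would come from `@opentelemetry/semantic-conventions` rather than from this file.

```typescript
// telemetry/semantics.ts (hypothetical file name)
// Single place for span-name and attribute-key conventions used by the app.

// Hypothetical app-specific attribute keys.
export const APP_ATTR = {
  USER_ID: 'app.user.id',
  POST_ID: 'app.post.id',
} as const;

// Allowed operation prefixes; extend as the app grows.
export type Operation = 'db' | 'api' | 'queue';

// Builds span names in the `operation.resource` form, e.g. `db.save_post`.
export function spanName(operation: Operation, resource: string): string {
  const normalized = resource
    .trim()
    .replace(/[^A-Za-z0-9]+/g, '_') // spaces, dashes, etc. become underscores
    .replace(/^_+|_+$/g, '') // drop leading/trailing underscores
    .toLowerCase();
  if (normalized === '') throw new Error('resource must not be empty');
  return `${operation}.${normalized}`;
}

// Builds the attribute bag for a post-related span.
export function postAttributes(userId: string, postId: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  attrs[APP_ATTR.USER_ID] = userId;
  attrs[APP_ATTR.POST_ID] = postId;
  return attrs;
}
```

Call sites then use `spanName('db', 'save post')` instead of hand-typed strings, so a naming change happens in one file.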
Getting started (auto instrumentation):
ref: https://opentelemetry.io/docs/languages/js/getting-started/nodejs/
- just add these packages:
npm install @opentelemetry/sdk-node \
@opentelemetry/api \
@opentelemetry/auto-instrumentations-node \
@opentelemetry/sdk-metrics \
@opentelemetry/sdk-trace-node
- Basic instrumentation setup (instrumentation.ts):
/*instrumentation.ts*/
import { NodeSDK } from '@opentelemetry/sdk-node';
import { ConsoleSpanExporter } from '@opentelemetry/sdk-trace-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import {
  PeriodicExportingMetricReader,
  ConsoleMetricExporter,
} from '@opentelemetry/sdk-metrics';

// Print spans and metrics to the console; swap the exporters for a real backend later.
const sdk = new NodeSDK({
  traceExporter: new ConsoleSpanExporter(),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new ConsoleMetricExporter(),
  }),
  // Auto-instrument supported libraries (http, express, ...).
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
- Have a basic endpoint in TS code (app.ts):
/*app.ts*/
import express, { Express } from 'express';
import { rollTheDice } from './dice';

const PORT: number = parseInt(process.env.PORT || '8080');
const app: Express = express();

app.get('/rolldice', (req, res) => {
  // Parse ?rolls=N; anything missing or non-numeric becomes NaN.
  const rolls = req.query.rolls ? parseInt(req.query.rolls.toString()) : NaN;
  if (isNaN(rolls)) {
    res
      .status(400)
      .send("Request parameter 'rolls' is missing or not a number.");
    return;
  }
  res.send(JSON.stringify(rollTheDice(rolls, 1, 6)));
});

app.listen(PORT, () => {
  console.log(`Listening for requests on http://localhost:${PORT}`);
});
- Run the main code with the --require flag:
$ npx ts-node --require ./instrumentation.ts app.ts
Listening for requests on http://localhost:8080
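The `app.ts` above imports `rollTheDice` from `./dice`, but that file is not shown in these notes. A minimal sketch of what it could look like, assuming the function takes a roll count plus inclusive `min`/`max` bounds (as the call `rollTheDice(rolls, 1, 6)` suggests) and returns an array of results:

```typescript
/*dice.ts*/
// Returns one random integer between min and max, both inclusive.
function rollOnce(min: number, max: number): number {
  return Math.floor(Math.random() * (max - min + 1) + min);
}

// Rolls the dice `rolls` times and returns every result.
export function rollTheDice(rolls: number, min: number, max: number): number[] {
  const result: number[] = [];
  for (let i = 0; i < rolls; i++) {
    result.push(rollOnce(min, max));
  }
  return result;
}
```

With auto instrumentation loaded, a request to `/rolldice?rolls=3` should produce an HTTP server span without any tracing code in this file.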
Manual instrumentation:
ref: https://opentelemetry.io/docs/languages/js/instrumentation/
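- After creating a new project, add these libraries (the SDK packages include the ones imported by instrumentation.ts):
npm install @opentelemetry/api @opentelemetry/resources @opentelemetry/semantic-conventions
npm install @opentelemetry/sdk-node @opentelemetry/sdk-trace-node @opentelemetry/sdk-metrics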
- "Before any other module in your application is loaded, you must initialize the SDK. If you fail to initialize the SDK or initialize it too late, no-op implementations will be provided to any library that acquires a tracer or meter from the API."
/*instrumentation.ts*/
import { NodeSDK } from '@opentelemetry/sdk-node';
import { ConsoleSpanExporter } from '@opentelemetry/sdk-trace-node';
import {
  PeriodicExportingMetricReader,
  ConsoleMetricExporter,
} from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import {
  ATTR_SERVICE_NAME,
  ATTR_SERVICE_VERSION,
} from '@opentelemetry/semantic-conventions';

const sdk = new NodeSDK({
  // The resource describes the service that emits the telemetry.
  resource: new Resource({
    [ATTR_SERVICE_NAME]: 'yourServiceName',
    [ATTR_SERVICE_VERSION]: '1.0',
  }),
  traceExporter: new ConsoleSpanExporter(),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new ConsoleMetricExporter(),
  }),
});

sdk.start();
- Then just run it:
npx ts-node --require ./instrumentation.ts app.ts
- The tracing setup is already in place from the above steps.
- By default, BatchSpanProcessor is used: it processes spans in batches before they are exported to a tool like Baselime.
- SimpleSpanProcessor processes spans as they are created: if you create 5 spans, each will be processed and exported before the next span is created in code.
- In most cases, stick with BatchSpanProcessor over SimpleSpanProcessor.
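The difference between the two processors can be sketched with toy classes. These are not the SDK's implementations: the real BatchSpanProcessor also flushes on a timer and bounds its queue, and the names `ToySimpleProcessor`, `ToyBatchProcessor`, `FinishedSpan`, and `ExportFn` are made up here.

```typescript
type FinishedSpan = { name: string };
type ExportFn = (batch: FinishedSpan[]) => void;

// Exports each span as soon as it ends: one export call per span.
class ToySimpleProcessor {
  private readonly exportFn: ExportFn;

  constructor(exportFn: ExportFn) {
    this.exportFn = exportFn;
  }

  onEnd(span: FinishedSpan): void {
    this.exportFn([span]);
  }
}

// Buffers ended spans and exports them in groups of up to maxBatchSize.
class ToyBatchProcessor {
  private buffer: FinishedSpan[] = [];
  private readonly exportFn: ExportFn;
  private readonly maxBatchSize: number;

  constructor(exportFn: ExportFn, maxBatchSize: number) {
    this.exportFn = exportFn;
    this.maxBatchSize = maxBatchSize;
  }

  onEnd(span: FinishedSpan): void {
    this.buffer.push(span);
    if (this.buffer.length >= this.maxBatchSize) this.flush();
  }

  // Sends whatever is still buffered, e.g. on shutdown.
  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.exportFn(batch);
  }
}
```

With a batch size of 3, ending 5 spans triggers 5 export calls on the simple processor but only 1 on the batch processor, plus 1 more when `flush()` runs. Fewer export calls is the main reason batching is the usual choice outside of debugging.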
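Acquiring a tracer
- Once the SDK has started, call trace.getTracer(name, version) from @opentelemetry/api; the name identifies the instrumentation scope (usually the module or library being instrumented).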
Best practices:
- Logging:
- use structured JSON
- add trace and span ids
- Semantic conventions:
- operation.resource: db.save_post, api.get_news
- Throw exceptions well (https://opentelemetry.io/docs/specs/otel/trace/exceptions/):
- Record the exception on the span, set the error status, then rethrow (Java example):
Span span = myTracer.startSpan(/*...*/);
try {
  // Code that does the actual work which the Span represents
} catch (Throwable e) {
  span.recordException(e);
  span.setAttribute(AttributeKey.stringKey("error.type"), e.getClass().getCanonicalName());
  span.setStatus(StatusCode.ERROR, e.getMessage());
  throw e;
} finally {
  span.end();
}
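Since the rest of these notes use TypeScript, here is a rough TypeScript counterpart of the Java pattern, combined with the "structured JSON plus trace and span ids" logging practice. It is a sketch under assumptions: `SpanLike` is a hand-written interface that mirrors the method names of `Span` in `@opentelemetry/api` so the snippet stays dependency-free, `runInSpan` and `logLine` are hypothetical helpers, and the helper is synchronous for brevity (async work would have to be awaited before `end()` is called).

```typescript
// Structural subset of the span methods used below (assumed to match
// the like-named methods on Span in @opentelemetry/api).
interface SpanLike {
  spanContext(): { traceId: string; spanId: string };
  recordException(error: Error): void;
  setAttribute(key: string, value: string): unknown;
  setStatus(status: { code: number; message?: string }): unknown;
  end(): void;
}

// Numeric value of SpanStatusCode.ERROR in @opentelemetry/api.
const STATUS_ERROR = 2;

// Builds one structured JSON log line that carries the trace and span ids.
function logLine(span: SpanLike, level: string, message: string): string {
  const ctx = span.spanContext();
  return JSON.stringify({
    level,
    message,
    trace_id: ctx.traceId,
    span_id: ctx.spanId,
  });
}

// Runs `work`; on failure records the exception, marks the span as errored,
// logs a correlated line, and rethrows. The span is always ended.
function runInSpan<T>(span: SpanLike, work: () => T, log: (line: string) => void): T {
  try {
    return work();
  } catch (err) {
    const error = err instanceof Error ? err : new Error(String(err));
    span.recordException(error);
    span.setAttribute('error.type', error.name);
    span.setStatus({ code: STATUS_ERROR, message: error.message });
    log(logLine(span, 'error', error.message));
    throw error;
  } finally {
    span.end();
  }
}
```

Because every log line carries `trace_id` and `span_id`, a log search result can be joined back to the exact span that failed.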
Best example with Cloudflare Workers:
My example code: https://github.com/ps428/otel-examples/tree/main/examples/cf-worker-otel
ref: https://github.com/kubeshop/tracetest/blob/main/examples/testing-cloudflare-workers/src/index.ts
import { trace, SpanStatusCode } from '@opentelemetry/api'
import { instrument, ResolveConfigFn } from '@microlabs/otel-cf-workers'

const tracer = trace.getTracer('pokemon-api')

export interface Env {
  DB: D1Database
  TRACETEST_URL: string
}

export async function addPokemon(pokemon: any, env: Env) {
  return await env.DB.prepare(
    "INSERT INTO Pokemon (name) VALUES (?) RETURNING *"
  ).bind(pokemon.name).all()
}

export async function getPokemon(pokemon: any, env: Env) {
  return await env.DB.prepare(
    "SELECT * FROM Pokemon WHERE id = ?;"
  ).bind(pokemon.id).all();
}

async function formatPokeApiResponse(response: any) {
  const { headers } = response
  const contentType = headers.get("content-type") || ""
  if (contentType.includes("application/json")) {
    const data = await response.json()
    const { name, id } = data

    // Add manual instrumentation
    const span = trace.getActiveSpan()
    if (span) {
      span.setStatus({ code: SpanStatusCode.OK, message: String("Pokemon fetched successfully!") })
      span.setAttribute('pokemon.name', name)
      span.setAttribute('pokemon.id', id)
    }

    return { name, id }
  }
  return response.text()
}

const handler = {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    try {
      const { pathname, searchParams } = new URL(request.url)

      // Import a Pokemon
      if (pathname === "/api/pokemon" && request.method === "POST") {
        const queryId = searchParams.get('id')
        const requestUrl = `https://pokeapi.co/api/v2/pokemon/${queryId || '6'}`
        const response = await fetch(requestUrl)
        const resPokemon = await formatPokeApiResponse(response)

        // Add manual instrumentation
        return tracer.startActiveSpan('D1: Add Pokemon', async (span) => {
          const addedPokemon = await addPokemon(resPokemon, env)
          span.setStatus({ code: SpanStatusCode.OK, message: String("Pokemon added successfully!") })
          span.setAttribute('pokemon.name', String(addedPokemon?.results[0].name))
          span.end()
          return Response.json(addedPokemon)
        })
      }

      return new Response("Hello Worker!")
    } catch (err) {
      return new Response(String(err))
    }
  },
}

// Exporter and service settings are resolved per request from the Worker env.
const config: ResolveConfigFn = (env: Env, _trigger) => {
  return {
    exporter: {
      url: env.TRACETEST_URL,
      headers: { },
    },
    service: { name: 'pokemon-api' },
  }
}

export default instrument(handler, config)
References 📘
- What is OpenTelemetry? - Explanation and Demo: https://www.youtube.com/watch?v=LzLULxhyIpU&t=259s
- Best practices - Grafana: https://grafana.com/blog/2023/12/18/opentelemetry-best-practices-a-users-guide-to-getting-started-with-opentelemetry/
- Semantic conventions, too much data here: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/README.md
- Nodejs examples: https://opentelemetry.io/docs/languages/js/getting-started/nodejs/
- This looks neat: https://tracetest.io/blog/crafting-observable-cloudflare-workers-with-opentelemetry
- git for this: https://github.com/kubeshop/tracetest/tree/main/examples/testing-cloudflare-workers
- Exceptions: https://opentelemetry.io/docs/specs/otel/trace/exceptions/
- In depth in logs, traces and spans with example code: https://www.youtube.com/watch?v=NbVVZlSsvvM
- Final app: https://github.com/ps428/otel-examples/tree/main/examples/cf-worker-otel