Pranav Soni

Entrepreneur | Software Engineer | Learner

Exploring OpenTelemetry

Published on 2025-03-08
Tags: cloudflare, infrastructure, observability, TypeScript

Aim 🥅

Understand what OpenTelemetry is and how I can use it in my code. Most iterated example: https://github.com/ps428/otel-examples/tree/main/examples/cf-worker-otel

TL;DR 🩳

  1. OTel is great for generating high-quality data for troubleshooting.
  2. In most cases, logs and traces (with their spans) are sufficient.
  3. Use the SDK approach; for Cloudflare Workers, just follow this: https://github.com/kubeshop/tracetest/blob/main/examples/testing-cloudflare-workers/src/index.ts
  4. Use centralized semantics for better logs/spans: https://github.com/ps428/otel-examples/tree/main/examples/cf-worker-otel

What's on the plate 🍽?

  • Why do we need OTel?
    • Problem: with too many services and inconsistent logs, it is hard to tell where an issue (an error, latency, anything else) is coming from.
    • The solution is better OBSERVABILITY! (or telemetry)
  • Observability has 3 parts (signals):
    • Logs: text records of specific events (like a payment made, a search done, or history fetched). They are not concerned with compute resources or anything of that sort.
    • Metrics: numerical data about system state/resources.
    • Tracing: tracks a request across services; the most detailed signal for localising the source of an error. Each trace has multiple spans, and I would define a span as a single unit of work. A span can have many child spans, so imagine a tree: the root span marks the start of the trace, the trace ends when the root span ends, and each node is a span.
  • OTel is a universal standard for telemetry; it provides high-quality data.
  • How to get this done? The heart of OTel has 3 parts:
    • Signals: data points collected by OTel: logs, metrics & traces.
    • Context: info about the environment & conditions for signals: when, where and what happened.
    • Semantic conventions: standard naming for telemetry data, i.e. consistent definitions of things.
  • OTel just ensures that the data collected is high quality.
  • Instrumentation: adding observability code to the app.
  • OTel has:
    • SDK: language-specific libraries (Python, TypeScript, etc.) that implement the API
    • API: provides a way to instrument the code with traces, logs & metrics
    • Exporters: send data to an observability backend like Baselime
  • Automatic instrumentation: generates traces automatically for supported libraries, without changes to the application code
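
To make the "trace is a tree of spans" idea concrete, here is a toy TypeScript model. It is my own illustration: the SpanNode shape and the span names are made up and are not the OTel data model. It prints the tree and picks the slowest leaf span, which is roughly what you do by eye in a trace viewer:

```typescript
// Toy model of a trace as a tree of spans (illustrative shape, not the OTel data model).
interface SpanNode {
  name: string;
  startMs: number;
  endMs: number;
  children: SpanNode[];
}

// Hypothetical trace: the root span covers the whole request.
const checkoutTrace: SpanNode = {
  name: 'GET /checkout',
  startMs: 0,
  endMs: 120,
  children: [
    { name: 'auth.verify_token', startMs: 5, endMs: 20, children: [] },
    {
      name: 'db.save_order',
      startMs: 25,
      endMs: 110,
      children: [{ name: 'db.connect', startMs: 25, endMs: 95, children: [] }],
    },
  ],
};

function duration(span: SpanNode): number {
  return span.endMs - span.startMs;
}

// One line per span, indented by depth in the tree.
function render(span: SpanNode, indent = ''): string[] {
  const lines = [`${indent}${span.name} (${duration(span)}ms)`];
  for (const child of span.children) {
    lines.push(...render(child, indent + '  '));
  }
  return lines;
}

// Leaf spans are where the time is actually spent; return the slowest one.
function slowestLeaf(span: SpanNode): SpanNode {
  if (span.children.length === 0) return span;
  return span.children
    .map(slowestLeaf)
    .reduce((a, b) => (duration(b) > duration(a) ? b : a));
}

console.log(render(checkoutTrace).join('\n'));
console.log('slowest leaf:', slowestLeaf(checkoutTrace).name);
```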

How to use?

Best resource: https://github.com/ps428/otel-examples/tree/main/examples/cf-worker-otel

  • It is important to start the SDK before using our custom instrumentation.
  • Perhaps we should keep a separate module for telemetry config, so instrumentation is separated from the main code.

Getting started (auto instrumentation):

ref: https://opentelemetry.io/docs/languages/js/getting-started/nodejs/

  1. Just add these packages:
npm install @opentelemetry/sdk-node \
  @opentelemetry/api \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/sdk-metrics \
  @opentelemetry/sdk-trace-node
  2. Basic instrumentation setup (instrumentation.ts):
/*instrumentation.ts*/
import { NodeSDK } from '@opentelemetry/sdk-node';
import { ConsoleSpanExporter } from '@opentelemetry/sdk-trace-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import {
  PeriodicExportingMetricReader,
  ConsoleMetricExporter,
} from '@opentelemetry/sdk-metrics';

const sdk = new NodeSDK({
  traceExporter: new ConsoleSpanExporter(),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new ConsoleMetricExporter(),
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
  3. Have a basic endpoint in TS code (app.ts):
/*app.ts*/
import express, { Express } from 'express';
import { rollTheDice } from './dice';

const PORT: number = parseInt(process.env.PORT || '8080');
const app: Express = express();

app.get('/rolldice', (req, res) => {
  const rolls = req.query.rolls ? parseInt(req.query.rolls.toString()) : NaN;
  if (isNaN(rolls)) {
    res
      .status(400)
      .send("Request parameter 'rolls' is missing or not a number.");
    return;
  }
  res.send(JSON.stringify(rollTheDice(rolls, 1, 6)));
});

app.listen(PORT, () => {
  console.log(`Listening for requests on http://localhost:${PORT}`);
});

  4. Run the main code with the --require flag:
$ npx ts-node --require ./instrumentation.ts app.ts
Listening for requests on http://localhost:8080
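
Note that app.ts imports rollTheDice from ./dice, which is not shown above. Here is a minimal sketch of what dice.ts could look like, assuming rollTheDice(rolls, min, max) returns an array of random integers between min and max inclusive (the helper name rollOnce is my own):

```typescript
/* dice.ts (sketch, assuming the signature used in app.ts) */

// One roll: a random integer between min and max, both inclusive.
function rollOnce(min: number, max: number): number {
  return Math.floor(Math.random() * (max - min + 1) + min);
}

// Roll the dice `rolls` times and collect the results.
export function rollTheDice(rolls: number, min: number, max: number): number[] {
  const result: number[] = [];
  for (let i = 0; i < rolls; i++) {
    result.push(rollOnce(min, max));
  }
  return result;
}
```

With this file next to app.ts, a request to /rolldice?rolls=3 should respond with a JSON array of three numbers between 1 and 6.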

Manual instrumentation:

ref: https://opentelemetry.io/docs/languages/js/instrumentation/

  1. After creating a new project, add these libraries (instrumentation.ts below also imports from sdk-trace-node and sdk-metrics):
npm install @opentelemetry/api @opentelemetry/resources @opentelemetry/semantic-conventions
npm install @opentelemetry/sdk-node @opentelemetry/sdk-trace-node @opentelemetry/sdk-metrics
  2. Initialize the SDK first. Quoting the docs: "Before any other module in your application is loaded, you must initialize the SDK. If you fail to initialize the SDK or initialize it too late, no-op implementations will be provided to any library that acquires a tracer or meter from the API."
/*instrumentation.ts*/
import { NodeSDK } from '@opentelemetry/sdk-node';
import { ConsoleSpanExporter } from '@opentelemetry/sdk-trace-node';
import {
  PeriodicExportingMetricReader,
  ConsoleMetricExporter,
} from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import {
  ATTR_SERVICE_NAME,
  ATTR_SERVICE_VERSION,
} from '@opentelemetry/semantic-conventions';

const sdk = new NodeSDK({
  resource: new Resource({
    [ATTR_SERVICE_NAME]: 'yourServiceName',
    [ATTR_SERVICE_VERSION]: '1.0',
  }),
  traceExporter: new ConsoleSpanExporter(),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new ConsoleMetricExporter(),
  }),
});

sdk.start();
  3. Then just run it:
npx ts-node --require ./instrumentation.ts app.ts
  4. We already have tracing set up from the above steps. There are two span processors to know about:
    1. BatchSpanProcessor (the default): processes spans in batches before they are exported to a tool like Baselime.
    2. SimpleSpanProcessor: processes spans one at a time; each span is processed and exported as soon as it ends. If you create 5 spans, that means 5 separate exports.

In most cases, stick with BatchSpanProcessor over SimpleSpanProcessor.
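
To see why batching is usually the better default, here is a toy model of the two strategies. These classes are my own simplification and are not the SDK's BatchSpanProcessor or SimpleSpanProcessor; the batch size of 3 is an arbitrary assumption, and the timer-based flush of the real batch processor is left out:

```typescript
// Toy model only: these classes imitate the idea, they are NOT the SDK's processors.
type FinishedSpan = { name: string };
type Exporter = (batch: FinishedSpan[]) => void;

// Exports every span on its own, as soon as it ends.
class ToySimpleProcessor {
  private exporter: Exporter;
  constructor(exporter: Exporter) {
    this.exporter = exporter;
  }
  onEnd(span: FinishedSpan): void {
    this.exporter([span]);
  }
}

// Queues ended spans and exports them together once the batch is full.
class ToyBatchProcessor {
  private exporter: Exporter;
  private maxBatchSize: number;
  private queue: FinishedSpan[] = [];
  constructor(exporter: Exporter, maxBatchSize: number) {
    this.exporter = exporter;
    this.maxBatchSize = maxBatchSize;
  }
  onEnd(span: FinishedSpan): void {
    this.queue.push(span);
    if (this.queue.length >= this.maxBatchSize) this.flush();
  }
  flush(): void {
    if (this.queue.length === 0) return;
    this.exporter(this.queue);
    this.queue = [];
  }
}

// End `count` spans on a processor and report how many export calls happened.
function countExportCalls(kind: 'simple' | 'batch', count: number): number {
  let calls = 0;
  const exporter: Exporter = () => {
    calls += 1;
  };
  const processor =
    kind === 'simple' ? new ToySimpleProcessor(exporter) : new ToyBatchProcessor(exporter, 3);
  for (let i = 0; i < count; i++) {
    processor.onEnd({ name: `span-${i}` });
  }
  // Flush whatever is left in the queue, as a shutdown would.
  if (processor instanceof ToyBatchProcessor) processor.flush();
  return calls;
}

console.log(countExportCalls('simple', 5)); // 5 export calls, one per span
console.log(countExportCalls('batch', 5)); // 2 export calls: a batch of 3, then the remaining 2
```

Fewer export calls means fewer network round trips to the backend, which is the main reason to prefer batching outside of local debugging.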

Acquiring a tracer

Once the SDK is started, get a tracer from the API with trace.getTracer('your-scope-name') and create spans with tracer.startActiveSpan(...), as the Cloudflare Workers example further down does with trace.getTracer('pokemon-api').

Best practices:

  • Logging:
    • use structured JSON
    • add trace and span ids
  • Semantic conventions:
    • operation.resource: db.save_post, api.get_news
    • Record exceptions properly (https://opentelemetry.io/docs/specs/otel/trace/exceptions/). The Java example from the spec:
Span span = myTracer.startSpan(/*...*/);
try {
  // Code that does the actual work which the Span represents
} catch (Throwable e) {
  span.recordException(e);
  span.setAttribute(AttributeKey.stringKey("error.type"), e.getClass().getCanonicalName());
  span.setStatus(StatusCode.ERROR, e.getMessage());
  throw e;
} finally {
  span.end();
}
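
Since the rest of this post is TypeScript, here is a rough TypeScript take on the same pattern, combined with the logging practices above (structured JSON plus trace and span ids). It is a sketch under assumptions: SpanLike, STATUS_ERROR, logJson and runInSpan are my own names, and SpanLike is a local stand-in that mirrors only the few methods of the real Span from @opentelemetry/api used here, so the snippet runs without any dependencies. It is synchronous to mirror the Java snippet; an async variant would await the work inside the try block.

```typescript
// Local stand-ins for the small part of the OpenTelemetry API used below, so the
// sketch runs without dependencies. A real Span from '@opentelemetry/api' offers
// these methods too, and SpanStatusCode.ERROR has the value 2.
const STATUS_ERROR = 2;

interface SpanLike {
  spanContext(): { traceId: string; spanId: string };
  recordException(error: Error): void;
  setAttribute(key: string, value: string): void;
  setStatus(status: { code: number; message?: string }): void;
  end(): void;
}

// Structured JSON log line carrying the trace and span ids of the given span.
function logJson(level: string, message: string, span: SpanLike): string {
  const { traceId, spanId } = span.spanContext();
  const line = JSON.stringify({ level, message, trace_id: traceId, span_id: spanId });
  console.log(line);
  return line;
}

// Runs `work` inside the span. On failure it records the exception, sets
// error.type and an ERROR status, logs, and rethrows. The span always ends.
function runInSpan<T>(span: SpanLike, work: () => T): T {
  try {
    return work();
  } catch (err) {
    const error = err instanceof Error ? err : new Error(String(err));
    span.recordException(error);
    span.setAttribute('error.type', error.name);
    span.setStatus({ code: STATUS_ERROR, message: error.message });
    logJson('error', error.message, span);
    throw error;
  } finally {
    span.end();
  }
}
```

In real code the span would come from tracer.startActiveSpan(...) rather than being passed in by hand.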

Best example with Cloudflare Workers:

My example code: https://github.com/ps428/otel-examples/tree/main/examples/cf-worker-otel
ref: https://github.com/kubeshop/tracetest/blob/main/examples/testing-cloudflare-workers/src/index.ts

import { trace, SpanStatusCode } from '@opentelemetry/api'
import { instrument, ResolveConfigFn } from '@microlabs/otel-cf-workers'
const tracer = trace.getTracer('pokemon-api')

export interface Env {
  DB: D1Database
  TRACETEST_URL: string
}

export async function addPokemon(pokemon: any, env: Env) {
  return await env.DB.prepare(
    "INSERT INTO Pokemon (name) VALUES (?) RETURNING *"
  ).bind(pokemon.name).all()
}

export async function getPokemon(pokemon: any, env: Env) {
  return await env.DB.prepare(
    "SELECT * FROM Pokemon WHERE id = ?;"
  ).bind(pokemon.id).all();
}

async function formatPokeApiResponse(response: any) {
  const { headers } = response
  const contentType = headers.get("content-type") || ""
  if (contentType.includes("application/json")) {
    const data = await response.json()
    const { name, id } = data

    // Add manual instrumentation
    const span = trace.getActiveSpan()
    if(span) {
      span.setStatus({ code: SpanStatusCode.OK, message: String("Pokemon fetched successfully!") })
      span.setAttribute('pokemon.name', name)
      span.setAttribute('pokemon.id', id)
    }

    return { name, id }
  }
  return response.text()
}

const handler = {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    try {
      const { pathname, searchParams } = new URL(request.url)

      // Import a Pokemon
      if (pathname === "/api/pokemon" && request.method === "POST") {
        const queryId = searchParams.get('id')
        const requestUrl = `https://pokeapi.co/api/v2/pokemon/${queryId || '6'}`
        const response = await fetch(requestUrl)
        const resPokemon = await formatPokeApiResponse(response)

        // Add manual instrumentation.
        // `await` here so a rejection is handled by the outer catch below.
        return await tracer.startActiveSpan('D1: Add Pokemon', async (span) => {
          try {
            const addedPokemon = await addPokemon(resPokemon, env)

            span.setStatus({ code: SpanStatusCode.OK, message: "Pokemon added successfully!" })
            span.setAttribute('pokemon.name', String(addedPokemon?.results[0].name))

            return Response.json(addedPokemon)
          } catch (err) {
            // Record the failure on the span before the outer catch handles it
            span.recordException(err as Error)
            span.setStatus({ code: SpanStatusCode.ERROR, message: String(err) })
            throw err
          } finally {
            // Always end the span, even when the insert fails
            span.end()
          }
        })
      }

      return new Response("Hello Worker!")
    } catch (err) {
      return new Response(String(err), { status: 500 })
    }
  },
}

const config: ResolveConfigFn = (env: Env, _trigger) => {
  return {
    exporter: {
      url: env.TRACETEST_URL,
      headers: {},
    },
    service: { name: 'pokemon-api' },
  }
}

export default instrument(handler, config)
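
The worker above spells attribute keys like 'pokemon.name' as string literals in several places. As a sketch of the "centralized semantics" idea from the TL;DR, the keys and span names could live in one module. Everything here is hypothetical (the file name semantics.ts, the constants, spanName and pokemonAttributes are my own names, not taken from the linked repo):

```typescript
/* semantics.ts (hypothetical module name) */

// One place for attribute keys, so every span and log spells them the same way.
export const ATTR_POKEMON_NAME = 'pokemon.name';
export const ATTR_POKEMON_ID = 'pokemon.id';

// Span names follow the operation.resource pattern, e.g. db.save_post.
export type Operation = 'db' | 'api';

export function spanName(operation: Operation, resource: string): string {
  // Lowercase the resource and collapse anything non-alphanumeric into underscores.
  const normalized = resource.trim().toLowerCase().replace(/[^a-z0-9]+/g, '_');
  return `${operation}.${normalized}`;
}

// Builds the attribute bag once; it could then be passed to span.setAttributes(...).
export function pokemonAttributes(name: string, id: number): Record<string, string | number> {
  return {
    [ATTR_POKEMON_NAME]: name,
    [ATTR_POKEMON_ID]: id,
  };
}

console.log(spanName('db', 'Add Pokemon')); // db.add_pokemon
console.log(pokemonAttributes('charizard', 6));
```

The worker could then call tracer.startActiveSpan(spanName('db', 'Add Pokemon'), ...) instead of hard-coding 'D1: Add Pokemon', so names stay consistent across services.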

References 📘

  • What is OpenTelemetry? - Explanation and Demo: https://www.youtube.com/watch?v=LzLULxhyIpU&t=259s
  • Best practices - Grafana: https://grafana.com/blog/2023/12/18/opentelemetry-best-practices-a-users-guide-to-getting-started-with-opentelemetry/
  • Semantic conventions, too much data here: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/README.md
  • Node.js examples: https://opentelemetry.io/docs/languages/js/getting-started/nodejs/
  • This looks neat: https://tracetest.io/blog/crafting-observable-cloudflare-workers-with-opentelemetry
    • Repo for this: https://github.com/kubeshop/tracetest/tree/main/examples/testing-cloudflare-workers
  • Exceptions: https://opentelemetry.io/docs/specs/otel/trace/exceptions/
  • In-depth look at logs, traces and spans, with example code: https://www.youtube.com/watch?v=NbVVZlSsvvM
  • Final app: https://github.com/ps428/otel-examples/tree/main/examples/cf-worker-otel