OpenTelemetry Observability Baseline
Instrumentation standards that connect logs, metrics, and traces to shorten the time to root cause

Introduction
Incident response speed depends on the quality of context, not the volume of logs. Adopting OpenTelemetry barely shortens root-cause analysis if the trace_id does not propagate across service boundaries. This article presents an observability baseline that ensures traceability at minimal cost.
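
To make the propagation requirement concrete, here is a minimal TypeScript sketch of reading and writing the W3C Trace Context `traceparent` header, the default propagation format in OpenTelemetry. The names `parseTraceparent`, `formatTraceparent`, and `TraceContext` are illustrative and not part of any library. In a real service the OpenTelemetry SDK propagators do this work, and the sketch only handles the version `00` layout.

```typescript
// W3C Trace Context header: version-traceid-spanid-flags, all lowercase hex.
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

interface TraceContext {
  traceId: string; // 32 hex chars, shared by every span in the request
  spanId: string; // 16 hex chars, identifies the calling span
  sampled: boolean; // lowest bit of the flags field
}

// Parse an incoming header; returns null if it is missing or malformed.
function parseTraceparent(header: string | undefined): TraceContext | null {
  if (!header) return null;
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  // Version "ff" and all-zero IDs are invalid per the specification.
  if (version === "ff" || /^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

// Build the header for an outgoing call: same trace ID, caller's span ID as parent.
function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}
```

If each service forwards the incoming trace ID unchanged and replaces only the span ID with its own, the spans on both sides of the boundary join into a single trace.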

Problem definition
Observability efforts tend to fail because they try to collect every signal at once.
- Without a span naming convention, dashboards fill up with free-form strings that cannot be queried.
- Error events carry no business key, so the extent of a failure's impact cannot be traced.
- The sampling policy is static, so important traces are dropped during peak traffic.
At the start, tracing only three critical paths is enough. In exchange, tag standards and ID propagation rules must be strictly enforced.
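
A tag standard is easier to enforce when it is expressed as a check that can run in tests or CI. The sketch below assumes a hypothetical convention: span names are lowercase, dot-separated segments, and every error span carries the business keys `order.id` and `tenant.id`. Both the pattern and the key names are placeholders for whatever a team agrees on.

```typescript
// Hypothetical convention: lowercase dot-separated segments, e.g. "checkout.payment".
const SPAN_NAME_RE = /^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$/;

// Hypothetical business keys that every error span must carry.
const REQUIRED_ERROR_KEYS = ["order.id", "tenant.id"];

type Attributes = Record<string, string | number | boolean>;

// Returns the list of rule violations; an empty list means the span conforms.
function checkSpan(name: string, attributes: Attributes, isError: boolean): string[] {
  const problems: string[] = [];
  if (!SPAN_NAME_RE.test(name)) {
    problems.push(`span name "${name}" is not lowercase dot-separated`);
  }
  if (isError) {
    for (const key of REQUIRED_ERROR_KEYS) {
      if (!(key in attributes)) problems.push(`error span is missing "${key}"`);
    }
  }
  return problems;
}
```

A name such as `http.request` passes, while a free-form name such as `GET /users/123` is flagged before it reaches a dashboard.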

Key concepts
| Perspective | Design criteria | Verification points |
|---|---|---|
| Traces | Per-request correlation | Span linkage rate across service boundaries |
| Metrics | SLO-centric indicators | False-positive alert rate |
| Logs | Structured logs with trace_id | Time required for root-cause analysis |
| Sampling | Dynamic sampling based on traffic and errors | Signal density relative to cost |
Observability is a design problem, not a collection problem. Defining up front which questions the data must answer cuts unnecessary collection costs.
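
The sampling row of the table can be illustrated with a small decision function. This is a sketch under stated assumptions, not how any particular sampler is implemented: the budget `TARGET_TRACES_PER_SECOND` and the names `sampleRate` and `shouldKeep` are hypothetical, and because `isError` is known only after a request finishes, the logic belongs in a tail-sampling stage rather than at span creation.

```typescript
interface SamplingInput {
  traceId: string; // 32 hex chars
  isError: boolean; // known only once the trace has completed
  requestsPerSecond: number; // currently observed traffic
}

// Hypothetical budget: how many successful traces per second to keep.
const TARGET_TRACES_PER_SECOND = 50;

// Keep probability for successful traces: 1.0 at low traffic, shrinking as traffic grows.
function sampleRate(requestsPerSecond: number): number {
  if (requestsPerSecond <= TARGET_TRACES_PER_SECOND) return 1;
  return TARGET_TRACES_PER_SECOND / requestsPerSecond;
}

// Deterministic decision: map the last 8 hex chars of the trace ID to [0, 1)
// so that every evaluation of the same trace ID reaches the same verdict.
function shouldKeep(input: SamplingInput): boolean {
  if (input.isError) return true; // never drop failed requests
  const bucket = parseInt(input.traceId.slice(-8), 16) / 0x100000000;
  return bucket < sampleRate(input.requestsPerSecond);
}
```

At 5,000 requests per second with a budget of 50, this keeps roughly 1% of successful traces while retaining every failed one.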
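
Code example 1: HTTP service tracing
import { SpanStatusCode, trace } from "@opentelemetry/api";

// The tracer name identifies the instrumentation scope, not the service.
const tracer = trace.getTracer("8space-api");

// Runs fn inside an active span and records whether it succeeded or failed.
export async function tracedHandler(requestId: string, fn: () => Promise<unknown>) {
  return tracer.startActiveSpan("http.request", async (span) => {
    span.setAttribute("request.id", requestId);
    // service.name is a resource attribute: configure it once on the SDK
    // resource (for example OTEL_SERVICE_NAME=blog-api), not on each span.
    try {
      const result = await fn();
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error) {
      span.recordException(error as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: "handler failed" });
      throw error;
    } finally {
      // Always end the span, otherwise it is never exported.
      span.end();
    }
  });
}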
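
Code example 2: Collector pipeline
receivers:
  otlp:
    protocols:
      http:
      grpc:
processors:
  batch:
  attributes:
    actions:
      - key: deployment.environment
        value: production
        action: upsert
exporters:
  prometheus:
    # Assumed listen address for Prometheus scrapes; adjust to your deployment.
    endpoint: 0.0.0.0:8889
  otlphttp:
    # traces_endpoint takes the full URL; plain "endpoint" would have
    # /v1/traces appended to it by the exporter.
    traces_endpoint: https://otel.example.com/v1/traces
# Components take effect only when referenced from a pipeline.
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes, batch]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
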
Architecture flow

Tradeoffs
- Raising the sampling rate improves analysis quality, but storage costs grow quickly.
- Adding many tags improves searchability, but high cardinality can degrade backend performance.
- Narrowing the initial scope speeds up adoption, but failures outside the critical paths require separate analysis.
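
The cardinality tradeoff can be monitored with a simple audit over a sample of span tags. The sketch below is illustrative only: the budget `MAX_DISTINCT_VALUES` and the function names are assumptions, and a real threshold depends on the storage backend.

```typescript
// Hypothetical budget: maximum distinct values tolerated per tag key.
const MAX_DISTINCT_VALUES = 100;

type Tags = Record<string, string>;

// Count the distinct values seen for each tag key across a sample of spans.
function distinctValuesPerKey(samples: Tags[]): Map<string, number> {
  const seen = new Map<string, Set<string>>();
  for (const tags of samples) {
    for (const key of Object.keys(tags)) {
      const values = seen.get(key) ?? new Set<string>();
      values.add(tags[key]);
      seen.set(key, values);
    }
  }
  const counts = new Map<string, number>();
  seen.forEach((values, key) => counts.set(key, values.size));
  return counts;
}

// Return the tag keys whose distinct-value count exceeds the budget.
function highCardinalityKeys(samples: Tags[], limit = MAX_DISTINCT_VALUES): string[] {
  const offenders: string[] = [];
  distinctValuesPerKey(samples).forEach((count, key) => {
    if (count > limit) offenders.push(key);
  });
  return offenders;
}
```

Keys such as a raw user ID tend to show up in the result, which is a signal to move them from indexed tags to log fields or span events.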

Summary
The OpenTelemetry baseline is not about measuring everything; it is about first establishing a minimum operable standard. Standardizing ID propagation, tag rules, and dynamic sampling gives incident response a consistent foundation for getting faster.

Image source
- Cover: source link
- License: CC BY-SA 3.0 / Author: Unknown
- Note: The freely licensed image was downloaded from Wikimedia Commons and optimized to a 1600px JPG.