OpenTelemetry Configuration
1. Overview
EMS services emit traces, metrics, and (via Logback appender) correlated logs to a central OpenTelemetry Collector running in the cluster’s observability namespace. The collector forwards to the observability backend (currently an in-house stack; future: Tempo/Loki/Prometheus or a vendor SaaS).
Two instrumentation paths:
- Java agent (opentelemetry-javaagent): auto-instruments HTTP servers, JDBC, Hazelcast, Kafka, etc. at bytecode level. Baked into the service’s Docker image via Jib’s extraDirectories.
- SDK / starter (opentelemetry-spring-boot-starter): enables manual @WithSpan annotations, @Counted and @Timed method-level instrumentation, and custom span creation via Tracer.
Both are active simultaneously in admin-service. The starter’s Spring Web auto-instrumentation is disabled in admin-service’s application-otlp.yml because the agent already instruments HTTP at bytecode level, and leaving both on would produce duplicate spans. Manual spans and JDBC spans cover the rest.
See also: SpringApplication Bootstrap, Hazelcast Configuration.
2. Topology
Collector endpoint in-cluster: http://opentelemetry-collector.observability.svc.cluster.local:4317 (gRPC) or :4318 (HTTP). Deployed via ~/dev/idl-xnl-jhb-rc01/argocd/opentelemetry-collector.yml.
3. Admin-Service Configuration
Full detail — this is the reference implementation.
3.1. Profile-gated
The Spring-side configuration is active only when the otlp profile is enabled. The ArgoCD prod manifest sets config.profiles: "prod,kubernetes,otlp" (or similar; see ArgoCD Deployment Patterns).
3.2. POM configuration
The agent is always baked into the image (regardless of which Spring profile is active); the otlp profile only adds the Spring-side dependencies that activate application-otlp.yml.
In the main <build> section (always runs):
- Maven plugin: maven-dependency-plugin copy execution, bound to the package phase. Downloads io.opentelemetry.javaagent:opentelemetry-javaagent:<version> into ${agent-extraction-root} (= ${project.build.directory}/jib-agents) as ${opentelemetry-javaagent-filename} (= opentelemetry-javaagent.jar).
- Jib <extraDirectories> then copies that directory into the image at ${agent-install-location} (= /javaagent). The image always has /javaagent/opentelemetry-javaagent.jar regardless of profile.
- Jib <jvmFlags> always include -javaagent:/javaagent/opentelemetry-javaagent.jar plus -Dotel.{logs,traces,metrics}.exporter=otlp.
- Jib <environment> sets OTEL_SERVICE_NAME=${project.artifactId} so traces are tagged with the service name.
Under the otlp profile (Spring-side only):
- BOM: io.opentelemetry.instrumentation:opentelemetry-instrumentation-bom:<version>
- Dependency: io.opentelemetry.instrumentation:opentelemetry-spring-boot-starter
- Dependency: io.opentelemetry.instrumentation:opentelemetry-instrumentation-annotations
- Dependency: io.opentelemetry.instrumentation:opentelemetry-logback-appender-1.0:<version>-alpha
The agent operates at bytecode level and works without any Spring-side dependency. The Spring starter adds @WithSpan annotation support, the logback appender for trace correlation, and SDK-level autoconfiguration. With the otlp profile inactive, the agent is still loaded but it falls back to its own auto-instrumentation only — no Spring @WithSpan, no logback trace correlation.
Anti-pattern (do not copy from older internal scaffolds): downloading the agent into src/main/jib/opt/otel/ with Jib <extraDirectories> pointing at a different path. The path mismatch silently drops the agent from the image; the JVM emits -javaagent: file not found on startup and OTel emits nothing. Source-tree pollution is also wrong — agent jars belong in target/.
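A missing agent is easy to overlook because the service itself keeps working. One way to surface it early is a startup self-check against the JVM's input arguments. The sketch below is hypothetical: the class JavaagentCheck and its method names are illustrative and not part of admin-service; it relies only on the standard java.lang.management API.

```java
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper (not part of admin-service): looks for a -javaagent flag. */
final class JavaagentCheck {

    private static final String FLAG_PREFIX = "-javaagent:";

    private JavaagentCheck() {}

    /** Returns the jar path of the first -javaagent flag, without any "=options" suffix. */
    static Optional<String> agentPath(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith(FLAG_PREFIX))
                .map(arg -> arg.substring(FLAG_PREFIX.length()))
                .map(value -> {
                    int eq = value.indexOf('=');
                    return eq >= 0 ? value.substring(0, eq) : value;
                })
                .findFirst();
    }

    /** True when this JVM was started with a -javaagent flag whose jar exists on disk. */
    static boolean agentJarPresent() {
        List<String> args = ManagementFactory.getRuntimeMXBean().getInputArguments();
        return agentPath(args)
                .map(path -> Files.isRegularFile(Path.of(path)))
                .orElse(false);
    }
}
```

A service could log a warning at startup when JavaagentCheck.agentJarPresent() returns false, which makes a path mismatch visible in the first lines of kubectl logs.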
See Jib Docker Build § OTel javaagent for the full downloader + Jib XML.
3.3. Runtime config (application-otlp.yml)
spring:
  jpa:
    properties:
      hibernate.generate_statistics: true # admin-service only; drop for non-JPA services
management:
  metrics:
    export:
      otlp:
        enabled: true
otel:
  java:
    global-autoconfigure:
      enabled: true
  exporter:
    otlp:
      endpoint: 'http://opentelemetry-collector.observability.svc.cluster.local:4317'
    jaeger:
      enabled: false
    zipkin:
      enabled: false
  springboot:
    resource:
      enable: true
  resource:
    attributes:
      'service.version': '${project.version}'
      'deployment.environment': production
  instrumentation:
    annotations:
      enabled: true
    logback-appender.enabled: true
    spring-web.enabled: false
    spring-webmvc.enabled: false
    spring-webflux.enabled: false
Key decisions:
- global-autoconfigure enabled: picks up Spring Boot auto-config
- Jaeger and Zipkin exporters disabled: we only export OTLP to the collector
- Resource attributes include service.version (from project.version) and deployment.environment; both are surfaced in the backend UI for filtering
- Spring Web/WebMVC/WebFlux auto-instrumentation disabled: the javaagent already instruments these at bytecode level; keeping the Spring-SDK version enabled produces duplicate spans
- Logback appender enabled: every log line emitted through Logback carries the current trace/span context so logs correlate in the backend
3.4. Javaagent at runtime
Jib bakes the javaagent at /javaagent/opentelemetry-javaagent.jar (the path is parameterised by the Maven properties agent-install-location and opentelemetry-javaagent-filename so it stays consistent across services). The container’s jvmFlags include:
<jvmFlag>-javaagent:${agent-install-location}/${opentelemetry-javaagent-filename}</jvmFlag>
<jvmFlag>-Dotel.logs.exporter=otlp</jvmFlag>
<jvmFlag>-Dotel.traces.exporter=otlp</jvmFlag>
<jvmFlag>-Dotel.metrics.exporter=otlp</jvmFlag>
Plus environment:
<environment>
  <OTEL_SERVICE_NAME>${project.artifactId}</OTEL_SERVICE_NAME>
</environment>
The exporter selectors (-Dotel.{logs,traces,metrics}.exporter=otlp) are needed because the agent’s default behaviour for some signal types changed across versions. Setting them explicitly avoids surprise.
Customisation (sampling rate, instrumentation toggles, custom resource attributes) is done via env vars (OTEL_TRACES_SAMPLER=parentbased_traceidratio, OTEL_TRACES_SAMPLER_ARG=0.1, etc.) injected by the Helm chart from the ArgoCD valuesObject — not via an agent.properties file. The properties-file approach is supported by the agent but adds an extra config artefact for no benefit.
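Every agent setting that exists as a system property (such as the -Dotel.traces.exporter flag above) also has an environment-variable form: the name is upper-cased and each dot or hyphen becomes an underscore. A minimal sketch of that rule, with the hypothetical helper class OtelEnvNames introduced purely for illustration:

```java
import java.util.Locale;

/** Hypothetical helper: derives the env-var form of an OTel system property name. */
final class OtelEnvNames {

    private OtelEnvNames() {}

    /** Upper-cases the name and replaces '.' and '-' with '_'. */
    static String toEnvVar(String propertyName) {
        return propertyName
                .toUpperCase(Locale.ROOT)
                .replace('.', '_')
                .replace('-', '_');
    }
}
```

For example, OtelEnvNames.toEnvVar("otel.traces.sampler.arg") yields OTEL_TRACES_SAMPLER_ARG, the variable named in the paragraph above.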
4. Registration-portal and admin-portal
registration-portal currently has no OTel configured — otlp profile absent, no dependencies, no agent. This is a known gap; adding it is a backlog item.
admin-portal should launch with OTel enabled from day one. Clone admin-service’s otlp profile config:
- Same POM dependencies
- Same application-otlp.yml, except service.version adjusted per-service
- Same Jib extraDirectory for the javaagent
- Same jvmFlag -javaagent:/javaagent/opentelemetry-javaagent.jar
Gateway-specific spans worth adding manually:
- POST /api/session/tenant: wrap the token-exchange call in a span with attributes user.sub, tenant.requested, tenant.current-before
- AdminServiceJwtRelayFilter: wrap the proxy call in a span with an admin-service.endpoint attribute
- TenantResolutionFilter: a short span identifying the resolution source (domain/header/session)
These make debugging multi-tenant auth issues tractable in trace view.
5. Metrics
Micrometer + OTel bridge emit the standard JVM + HTTP + Hazelcast metrics. Additional EMS-custom metrics live in admin-service/src/main/java/…/config/MetricsConfiguration.java — OtlpMetricsNamingConvention keeps names dot-separated (OTel style) rather than underscore-separated (Prometheus style).
Selected metrics:
- http.server.requests: rate, p95/p99 latency, status-code distribution per URI (automatic)
- jdbc.connections.active / jdbc.connections.max: HikariCP pool state
- hazelcast.partition.is-migrating: cluster rebalance indicator
- jvm.memory.used / jvm.gc.pause: standard JVM
- Custom: ems.import.duration / ems.import.rows: import-specific timers, see ImportAsyncConfiguration
Dashboards live in the observability backend; owner: Solution Architect / Ops.
6. Tracing Patterns
6.1. Business-flow spans
Group multiple API calls that belong to the same user journey under a "business flow" span. See design-journal/2026-03/end-to-end-distributed-tracing.adoc for the design. Pattern:
import io.opentelemetry.instrumentation.annotations.WithSpan;

@WithSpan("membership-registration")
public void registerMembership(...) {
    // child spans from auto-instrumented HTTP + JDBC calls roll up under this span
}
Useful for showing "registration took 3.4s" with breakdown across the participant, payment, and email sub-operations.
6.2. W3C traceparent propagation
Frontend → gateway → admin-service all propagate the traceparent header. registration-portal’s interceptor (when OTel lands for it) should extract any existing trace from the browser’s performance-navigation entries and attach to it; otherwise it should generate a new root.
Cross-cluster propagation (e.g. admin-service → WordPress → RunSignup) honours the same convention where supported.
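The extract-or-create behaviour can be sketched in plain Java. In the services themselves the OTel propagators handle this; the record TraceParent below is a hypothetical illustration of the W3C header layout (version 00, 32-hex trace ID, 16-hex parent ID, 2-hex flags with bit 0x01 meaning sampled), not code from the repository.

```java
import java.security.SecureRandom;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of W3C traceparent handling (version 00 only). */
record TraceParent(String traceId, String parentId, boolean sampled) {

    private static final Pattern FORMAT =
            Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");
    private static final String ZERO_TRACE_ID = "0".repeat(32);
    private static final String ZERO_PARENT_ID = "0".repeat(16);
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Parses a header value; empty when malformed or when either ID is all zeros. */
    static Optional<TraceParent> parse(String header) {
        if (header == null) {
            return Optional.empty();
        }
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        String traceId = m.group(1);
        String parentId = m.group(2);
        if (traceId.equals(ZERO_TRACE_ID) || parentId.equals(ZERO_PARENT_ID)) {
            return Optional.empty();
        }
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) != 0;
        return Optional.of(new TraceParent(traceId, parentId, sampled));
    }

    /** Creates a new root context with random, non-zero IDs. */
    static TraceParent newRoot(boolean sampled) {
        String traceId;
        do {
            traceId = randomHex(16);
        } while (traceId.equals(ZERO_TRACE_ID));
        String parentId;
        do {
            parentId = randomHex(8);
        } while (parentId.equals(ZERO_PARENT_ID));
        return new TraceParent(traceId, parentId, sampled);
    }

    /** Continues the incoming trace when the header is valid, else starts a new root. */
    static TraceParent extractOrCreate(String header) {
        return parse(header).orElseGet(() -> newRoot(true));
    }

    /** Serialises back to the header form; only the sampled flag is carried over. */
    String toHeader() {
        return "00-" + traceId + "-" + parentId + "-" + (sampled ? "01" : "00");
    }

    private static String randomHex(int bytes) {
        byte[] buf = new byte[bytes];
        RANDOM.nextBytes(buf);
        StringBuilder sb = new StringBuilder(bytes * 2);
        for (byte b : buf) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

A downstream hop would normally keep the trace ID and substitute its own span ID as the parent ID before forwarding; the sketch omits that step to stay focused on parsing and root creation.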
7. Logs-to-Traces Correlation
opentelemetry-logback-appender-1.0 emits every log line with the current trace_id and span_id as structured attributes. In the backend UI, clicking a trace shows the correlated log lines. In kubectl logs, the trace/span IDs appear in the log pattern (%X{trace_id}, %X{span_id} via MDC).
No action needed per-log-line; the appender handles it globally when active.
8. Sampling
Current default: 100% (all traces exported). Low volume; backend handles it. When volume grows:
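- Head-based sampling: the decision is made when the trace starts, before its outcome is known, so it can only use what is visible up front (trace ID, route). Suited to dropping 90% of low-interest traces (health checks, readiness probes).
- Tail-based sampling: the collector decides after collecting the full trace, based on total duration and error status. Suited to keeping 100% of error traces and a high percentage of slow traces.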
Tune at the collector, not at the service. Services always emit; collector filters.
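The difference between the two approaches can be sketched in plain Java: a head-based decision sees only the trace ID, while a tail-based decision can also see error status and total duration. The class SamplingSketch, its parameters, and its thresholds are hypothetical and simplified (it keeps every slow trace rather than a percentage); the real policies would live in the collector's configuration.

```java
/** Hypothetical illustration of head-based versus tail-based sampling decisions. */
final class SamplingSketch {

    private SamplingSketch() {}

    /**
     * Head-based: decide from the 32-hex trace ID alone, before any span has finished.
     * Uses the low 64 bits of the ID so that every hop reaches the same decision.
     */
    static boolean headSample(String traceIdHex, double ratio) {
        if (ratio >= 1.0) {
            return true;
        }
        if (ratio <= 0.0) {
            return false;
        }
        long low = Long.parseUnsignedLong(traceIdHex.substring(16), 16);
        long threshold = (long) (ratio * Long.MAX_VALUE);
        // Clear the sign bit so the comparison runs on a non-negative value.
        return (low & Long.MAX_VALUE) < threshold;
    }

    /**
     * Tail-based: decide once the whole trace is known. Errors and slow traces are
     * always kept; everything else falls back to a baseline ratio.
     */
    static boolean tailSample(String traceIdHex, boolean hasError, long durationMillis,
                              long slowThresholdMillis, double baselineRatio) {
        if (hasError) {
            return true;
        }
        if (durationMillis >= slowThresholdMillis) {
            return true;
        }
        return headSample(traceIdHex, baselineRatio);
    }
}
```

Because the head-based path never sees hasError or durationMillis, it cannot guarantee that failing or slow traces survive; that guarantee needs the tail-based path.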
9. Known Gaps
- registration-portal lacks OTel: backlog item; not a blocker but reduces end-to-end trace visibility.
- Frontend instrumentation: browser-originated OTLP proxied via gateway is designed (see design-journal/2026-03/end-to-end-distributed-tracing.adoc) but not implemented. Adds browser-to-admin-service trace root.
- No SLO tracking: metrics exist but service-level objectives are not formalised. Future work.
10. Reference
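| File | Role |
|---|---|
| application-otlp.yml | OTel runtime config for admin-service |
| admin-service/src/main/java/…/config/MetricsConfiguration.java | Custom metrics registry + OTel bridge |
| OtlpMetricsNamingConvention | OTel-style naming convention |
| pom.xml (admin-service) | Dependencies + javaagent download |
| ~/dev/idl-xnl-jhb-rc01/argocd/opentelemetry-collector.yml | OTel Collector ArgoCD Application |
| design-journal/2026-03/end-to-end-distributed-tracing.adoc | Cross-cutting tracing design including frontend |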