Under Review · v0.1.0-alpha

Observability Overview

AxCom emits three observability signals from the same running process: metrics, structured logs, and distributed traces. Each signal answers a different kind of operational question, and each is stored in its own backend.

| Signal | What it answers | Backend |
|---|---|---|
| Metrics | "Is the service healthy right now? What are the trends?" | Prometheus → Grafana |
| Logs | "What happened, and why?" | Loki (via OTel Collector) → Grafana |
| Traces | "Which code path did this request take, and where was it slow?" | Tempo (via OTel Collector) → Grafana |

Signal Routing

All signals converge in Grafana, making cross-signal navigation possible: click a spike in a metric graph, jump to the log stream for that time window, then click a trace ID in the logs to open the full trace.


How Each Signal is Produced

Metrics

The pkg/metrics package registers all Prometheus metrics on the default registry during package initialization (that is, as soon as it is imported) using promauto. Metrics are grouped into five subsystems:

| Subsystem | Prefix | What it covers |
|---|---|---|
| HTTP | ecom_engine_http_* | Request rate, latency, in-flight count |
| Database | ecom_engine_db_pool_* | PostgreSQL connection pool stats |
| Cache | ecom_engine_cache_* | L1/L2 hit rates, operation latency, evictions |
| Rate limiting | ecom_engine_ratelimit_* | Allow/deny counts, backend fallbacks |
| Runtime & process | ecom_engine_runtime_* / ecom_engine_process_* | Go heap, GC, goroutines, CPU %, RSS |

Prometheus scrapes app:8080/metrics every 15 seconds. Recording rules in prometheus/rules/recording-rules.yml pre-compute expensive expressions on each evaluation cycle, so dashboards read the stored results instead of re-evaluating the expressions on every load.
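To make the scrape concrete, the following is a minimal, stdlib-only sketch of what a /metrics response looks like on the wire. It is not how pkg/metrics works (the real package uses promauto and the Prometheus client library); the counter name ecom_engine_http_requests_total, its help text, and the helper functions are assumptions chosen to match the HTTP prefix in the table above.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// requestsTotal is a hypothetical counter used only for illustration.
// The real service registers its metrics through promauto in pkg/metrics.
var requestsTotal atomic.Int64

// metricsHandler renders the counter in the Prometheus text exposition format:
// a HELP line, a TYPE line, then one sample line per series.
func metricsHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprintln(w, "# HELP ecom_engine_http_requests_total Total HTTP requests handled.")
	fmt.Fprintln(w, "# TYPE ecom_engine_http_requests_total counter")
	fmt.Fprintf(w, "ecom_engine_http_requests_total %d\n", requestsTotal.Load())
}

// scrape performs one in-process GET against the handler and returns the body,
// standing in for a single Prometheus scrape.
func scrape() string {
	rec := httptest.NewRecorder()
	metricsHandler(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Body.String()
}

func main() {
	requestsTotal.Add(3) // pretend three requests were served
	fmt.Print(scrape())
}
```

Each scrape returns the current counter value; Prometheus derives rates from the difference between successive scrapes, which is why the 15-second interval bounds the resolution of any rate-based panel.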

See Metrics for the full metric catalog and PromQL examples.

Logs

The pkg/logger package writes structured logs using Go's log/slog. In production (LOG_FORMAT=json) every log line is a JSON object conforming to Elastic Common Schema (ECS) 8.11, ready for Loki ingestion.

All log methods have *Ctx variants that automatically inject trace.id and span.id from the active OpenTelemetry span, which enables direct log-to-trace navigation in Grafana.

See Logs for the full schema and correlation guide.

Traces

The pkg/telemetry package bootstraps the OpenTelemetry SDK and registers a global TracerProvider. Traces are exported via OTLP/HTTP to the OTel Collector, which forwards them to Tempo.

Sampling is controlled by OTEL_TRACE_SAMPLE (default: 1% in production). W3C TraceContext and Baggage propagators are registered globally, so incoming trace context from upstream services is respected automatically.
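For readers unfamiliar with W3C TraceContext, the sketch below parses a version 00 traceparent header using only the standard library. In the service itself this work is done by the registered OpenTelemetry propagator; the traceParent type and parseTraceParent function here are hypothetical and exist only to show which fields an upstream service passes along.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// traceParent holds the fields of a version 00 traceparent header.
type traceParent struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters (the upstream parent span)
	Sampled bool   // lowest bit of the trace-flags byte
}

// parseTraceParent parses "00-<trace-id>-<parent-id>-<flags>".
// Simplified sketch: only version 00 is accepted.
func parseTraceParent(h string) (traceParent, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return traceParent{}, errors.New("unsupported or malformed traceparent")
	}
	if len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return traceParent{}, errors.New("wrong field length")
	}
	for _, f := range parts[1:] {
		if _, err := hex.DecodeString(f); err != nil || f != strings.ToLower(f) {
			return traceParent{}, errors.New("fields must be lowercase hex")
		}
	}
	// All-zero identifiers are invalid per the specification.
	if parts[1] == strings.Repeat("0", 32) || parts[2] == strings.Repeat("0", 16) {
		return traceParent{}, errors.New("all-zero trace-id or parent-id")
	}
	flags, _ := hex.DecodeString(parts[3]) // already validated above
	return traceParent{
		TraceID: parts[1],
		SpanID:  parts[2],
		Sampled: flags[0]&0x01 == 1,
	}, nil
}

func main() {
	tp, err := parseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Printf("%+v %v\n", tp, err)
}
```

The sampled flag is what lets an upstream decision carry through: a parent-aware sampler can keep a trace that the caller already chose to record, independent of the local sampling ratio. Whether AxCom's sampler is parent-aware is not stated here; see Traces for the actual configuration.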

See Traces for span conventions and sampling configuration.


Dashboards at a Glance

Eight Grafana dashboards are provisioned automatically on first start. Start at Service Health during an incident and drill into the focused dashboard that matches the symptom.

| Dashboard | UID | When to use |
|---|---|---|
| Service Health | ecom-engine-health | On-call first look, SLO monitoring |
| HTTP Traffic | ecom-engine-http | Request rate / latency / 5xx investigation |
| Database | ecom-engine-db | Connection pool saturation |
| Cache | ecom-engine-cache | Hit rate drops, Redis pool issues |
| Business Events | ecom-engine-business | Order/payment drops, cart abandonment |
| Security | ecom-engine-security | Auth failures, rate limit health |
| Logs | ecom-engine-logs | Ad-hoc log search with level/trace filtering |
| Runtime & Process | ecom-engine-runtime | Go heap, GC, goroutines, CPU/memory |

See Dashboards for a full description of every panel.


Alert Layers

Alerts are defined in two independent layers so that alerting keeps working during a partial outage of the monitoring stack:

| Layer | File | Fires even if |
|---|---|---|
| Prometheus alerting rules | prometheus/rules/alerting-rules.yml | Grafana is down |
| Grafana unified alerting | grafana/provisioning/alerting/*.yml | Prometheus is unreachable (Loki-sourced alerts) |

See Alerts for the full alert catalog and response playbooks.


Deployment

The monitoring stack runs as a separate Docker Compose layer and can be added to any deployment scenario. See Ops & Deploy → Monitoring for setup instructions.