Under Review · v0.1.0-alpha

Monitoring

The monitoring stack adds metrics, logs, and distributed traces to any app deployment, all viewable in Grafana.

This page covers deployment and setup only. For what the signals mean, the full metric catalog, dashboard descriptions, and alert playbooks, see the Observability section.

Two options are available:

| Option | Stack | Cost |
| --- | --- | --- |
| Scenario 5: Self-hosted | OTelCol + Prometheus + Loki + Tempo + Grafana | VPS resources only |
| Scenario 6: Grafana Cloud | OTelCol → Grafana Cloud OTLP endpoint | Grafana Cloud free tier (10k metrics, 50 GB logs, 50 GB traces/month) |

Both share the same OTel Collector pipeline — only the exporter config differs.


Signal Routing

Metrics: prometheus/client_golang is already wired into the app. Prometheus scrapes app:8080/metrics every 15 seconds. No app changes are needed.

Logs: The OTel Collector tails the Docker JSON log files and ships structured log lines to Loki. The app's JSON log output is parsed automatically.

Traces: Once the app's OTel SDK is initialized, traces are pushed via OTLP to the Collector and forwarded to Tempo. The infrastructure is ready; SDK instrumentation is the next step.
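To confirm that the metrics endpoint is actually exposing something before Prometheus scrapes it, a small helper can count the metric families in the response. This is a minimal sketch: the function name count_metric_families is hypothetical, and the usage line assumes the app's port 8080 is published on localhost, which this page does not state.

```shell
# Count metric families in Prometheus text exposition format read from stdin.
# Each family is announced by exactly one "# TYPE <name> <type>" line.
count_metric_families() {
  # grep -c prints 0 and exits non-zero when nothing matches; keep the exit code clean.
  grep -c '^# TYPE ' || true
}

# Usage (assumes port 8080 is reachable from the host):
#   curl -fsS http://localhost:8080/metrics | count_metric_families
```

A result of 0 suggests the endpoint is reachable but the metrics handler is not registered.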


Scenario 5 — Self-Hosted

Prerequisites

Complete the steps on the Prerequisites page first. The monitoring stack requires:

  • The ecom-net Docker network
  • A monitoring/.env file (copy it from .env.example)
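Both requirements can be checked before starting the stack. The sketch below only reports what is missing and does not create anything; the function name check_monitoring_prereqs is hypothetical, and it assumes it is pointed at the monitoring directory.

```shell
# Report missing monitoring prerequisites; returns non-zero if any are missing.
# $1: path to the monitoring directory (defaults to the current directory).
check_monitoring_prereqs() {
  dir="${1:-.}"
  missing=0
  # The stack expects the ecom-net network to exist already.
  if ! docker network inspect ecom-net >/dev/null 2>&1; then
    echo "missing: ecom-net Docker network"
    missing=1
  fi
  # The stack reads its settings from .env in the monitoring directory.
  if [ ! -f "$dir/.env" ]; then
    echo "missing: $dir/.env (copy it from .env.example)"
    missing=1
  fi
  return "$missing"
}

# Usage:
#   check_monitoring_prereqs ecom-backend/deployments/monitoring
```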

Start

cd ecom-backend/deployments/monitoring
cp .env.example .env # edit if you want to change ports or Grafana credentials
docker compose up -d

Services

| Service | Default URL | Purpose |
| --- | --- | --- |
| Grafana | http://localhost:3000 | Dashboards, alerts |
| Prometheus | http://localhost:9090 | Metric storage and query |
| Loki | http://localhost:3100 | Log storage and query |
| Tempo | http://localhost:3200 | Trace storage and query |

Default Grafana credentials: admin / admin (change in .env).
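After the stack starts, each service can be probed on its readiness endpoint. This is a sketch that assumes the default URLs above are unchanged in .env; the function name check_stack is hypothetical. The paths used are the standard health and readiness endpoints of each project (/api/health for Grafana, /-/ready for Prometheus, /ready for Loki and Tempo).

```shell
# Probe each monitoring service and print one status line per service.
# Returns non-zero if any service is not ready.
check_stack() {
  failed=0
  for target in \
    "grafana http://localhost:3000/api/health" \
    "prometheus http://localhost:9090/-/ready" \
    "loki http://localhost:3100/ready" \
    "tempo http://localhost:3200/ready"
  do
    name=${target%% *}   # text before the first space
    url=${target#* }     # text after the first space
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "$name: ok"
    else
      echo "$name: not ready ($url)"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   check_stack
```

Loki and Tempo can take a short while to report ready after startup, so a failure in the first few seconds is not necessarily a problem.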

Connect the App

Add to your app deployment's environment (or .env.dev):

OTEL_ENABLED=true
OTEL_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4318

Because both stacks are on ecom-net, the app can reach the OTel Collector by service name otelcol.
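Port 4318 is the conventional OTLP/HTTP port. Assuming the app's exporter follows the standard OpenTelemetry environment-variable conventions (this page does not confirm it), the base endpoint is expanded into one URL per signal by appending /v1/traces, /v1/metrics, or /v1/logs. The helper below, whose name otlp_signal_url is hypothetical, shows the resulting URLs, which can be useful when debugging connectivity.

```shell
# Build the per-signal OTLP/HTTP URL from a base endpoint.
# $1: base endpoint, e.g. http://otelcol:4318
# $2: signal name: traces, metrics, or logs
otlp_signal_url() {
  base=${1%/}   # drop a single trailing slash, if present
  echo "$base/v1/$2"
}

# Usage:
#   otlp_signal_url "$OTEL_EXPORTER_OTLP_ENDPOINT" traces
```

With the endpoint shown above, the traces URL would be http://otelcol:4318/v1/traces.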

Pre-built Dashboards

Grafana is provisioned with eight dashboards automatically on first start:

| Dashboard | When to use |
| --- | --- |
| Service Health | On-call first look, SLO monitoring |
| HTTP Traffic | Request rate / latency / 5xx investigation |
| Database | DB connection pool saturation |
| Cache | Cache hit rate drops, Redis pool issues |
| Business Events | Order/payment drops, cart abandonment |
| Security | Auth failures, rate limiting health |
| Logs | Ad-hoc log search with level/trace filtering |
| Runtime | Go heap, GC, goroutines, CPU/memory |

Start at Service Health during an incident, then drill into the relevant focused dashboard.

Prometheus Alerts

Metric-based alerts are defined in prometheus/rules/alerting-rules.yml and fire even when Grafana is down:

| Alert | Condition |
| --- | --- |
| HighHttpErrorRate | 5xx rate > 5% for 5m |
| HighP99Latency | p99 > 2s for 5m |
| DbPoolExhausted | pool utilization > 90% for 2m |
| LowCacheHitRate | Redis hit rate < 50% for 10m |
| HighCpuUsage | CPU > 80% for 5m |
| GoroutineLeak | goroutines > 500 for 15m |
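Because these rules are evaluated by Prometheus itself, their state can be read from the Prometheus HTTP API without going through Grafana. The sketch below lists the names of alerts that are currently pending or firing. It assumes the default Prometheus URL from the Services table; the function name list_active_alerts is hypothetical, and the grep-based extraction is a shortcut that avoids a JSON parser rather than a robust way to read the response.

```shell
# Print the unique names of alerts that are currently pending or firing.
# $1: Prometheus base URL (defaults to http://localhost:9090)
list_active_alerts() {
  base=${1:-http://localhost:9090}
  curl -fsS "$base/api/v1/alerts" \
    | grep -o '"alertname":"[^"]*"' \
    | cut -d'"' -f4 \
    | sort -u
}

# Usage:
#   list_active_alerts
```

Empty output means no rule is currently pending or firing.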

Grafana alerts (in grafana/provisioning/alerting/) extend these with Loki-based log alerts:

| Alert | Condition |
| --- | --- |
| ErrorLogSpike | > 50 error logs in 5m |
| PaymentErrorSpike | > 10 payment error logs in 5m |
| AuthFailureSpike | > 30 auth failure logs in 5m |

Scenario 6 — Grafana Cloud

Use Grafana Cloud as the observability backend. Only the OTel Collector runs locally; all storage and dashboards are managed by Grafana Cloud.

Get Grafana Cloud Credentials

  1. Sign up or log in at grafana.com
  2. Go to your stack → OpenTelemetry tile → Configure
  3. Note your:
    • OTLP endpoint (e.g. https://otlp-gateway-prod-us-east-0.grafana.net/otlp)
    • Instance ID (numeric stack ID)
    • API key (create one with MetricsPublisher + LogsPublisher + TracesPublisher roles)

Configure

Add to ecom-backend/deployments/monitoring/.env:

GRAFANA_CLOUD_OTLP_ENDPOINT=https://otlp-gateway-prod-us-east-0.grafana.net/otlp
GRAFANA_CLOUD_INSTANCE_ID=123456
GRAFANA_CLOUD_API_KEY=glc_eyJ...

Start (Cloud Override)

cd ecom-backend/deployments/monitoring
docker compose -f docker-compose.yml -f docker-compose.cloud.yml up -d

This starts only the OTel Collector. Prometheus, Loki, Tempo, and Grafana are all provided by Grafana Cloud — nothing to manage locally.

What the Cloud Collector Does

The cloud config (otelcol/otelcol-cloud.yml) runs a prometheus receiver that scrapes app:8080/metrics and a filelog receiver for logs. All three signals (metrics + logs + traces) are forwarded to the single Grafana Cloud OTLP endpoint over HTTPS with Basic Auth.
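For Grafana Cloud's OTLP gateway, Basic Auth uses the instance ID as the username and the API key as the password. The sketch below shows how the resulting header is built, which helps when testing credentials by hand. The function name basic_auth_header is hypothetical, and how otelcol/otelcol-cloud.yml sets the header internally is not shown on this page.

```shell
# Build an HTTP Basic Auth header from an instance ID and an API key.
# $1: instance ID (username)
# $2: API key (password)
basic_auth_header() {
  # base64 may wrap long output; strip newlines so the header stays on one line.
  token=$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')
  printf 'Authorization: Basic %s\n' "$token"
}

# Usage (values come from monitoring/.env):
#   basic_auth_header "$GRAFANA_CLOUD_INSTANCE_ID" "$GRAFANA_CLOUD_API_KEY"
```

Treat the output as a secret: the encoding is reversible, so it exposes the API key to anyone who can read it.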


Stopping / Upgrading

# Stop monitoring (does not stop the app)
docker compose down

# Upgrade images and restart
docker compose pull && docker compose up -d

# Remove all monitoring data
docker compose down -v

Removing volumes deletes all local Prometheus metrics, Loki logs, and Tempo traces. Only do this if you want a clean slate.