Skip to main content

metrics

Under Reviewv0.1.0-alpha

The metrics package registers all Prometheus metrics for the application in a single place. Metrics are auto-registered with the default Prometheus registry on package import (via promauto).

Import path: ecom-engine/pkg/metrics

Note: This package imports internal/infra/db to access the PoolStatsProvider interface for database pool metrics. This is the one exception to the pkg/ rule of no internal/ imports — it exists to keep all metric registrations in one canonical location.

For the full metric catalog, PromQL examples, recording rules, Grafana dashboards, and alert playbooks, see the Observability section.


Metric namespace

All metrics share the namespace ecom_engine. The full metric name follows the pattern:

ecom_engine_<subsystem>_<name>

HTTP metrics

These are updated by the metrics middleware in the gateway layer. They cover every completed HTTP request.

HTTPRequestsTotal

ecom_engine_http_requests_total

Type: Counter Labels: method, route, status

Counts completed HTTP requests partitioned by HTTP method (GET, POST, etc.), route template (e.g. /api/v1/products/:id), and HTTP status code.

Use this to compute request rates and error rates per endpoint.

HTTPRequestDuration

ecom_engine_http_request_duration_seconds

Type: Histogram Labels: method, route Buckets: 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s

Tracks request latency in seconds. Use the _bucket, _sum, and _count suffixes to compute percentile latency (p50, p95, p99) in Prometheus.

HTTPRequestsInFlight

ecom_engine_http_requests_in_flight

Type: Gauge Labels: none

Current number of HTTP requests actively being processed. Useful for detecting traffic spikes and connection saturation.


Database pool metrics

DBPoolCollector

A prometheus.Collector implementation that reads live PostgreSQL connection pool statistics from any infradb.PoolStatsProvider.

Construct and register it at startup:

import (
"github.com/prometheus/client_golang/prometheus"
"ecom-engine/pkg/metrics"
)

collector := metrics.NewDBPoolCollector(pgPool) // pgPool implements PoolStatsProvider
prometheus.MustRegister(collector)

Metrics exposed:

MetricTypeDescription
ecom_engine_db_pool_max_connsGaugeConfigured maximum connections in pool
ecom_engine_db_pool_total_connsGaugeCurrent total connections (acquired + idle)
ecom_engine_db_pool_acquired_connsGaugeConnections currently in use
ecom_engine_db_pool_idle_connsGaugeConnections available for acquisition
ecom_engine_db_pool_acquire_count_totalCounterCumulative successful acquisitions
ecom_engine_db_pool_empty_acquire_count_totalCounterAcquisitions that waited because pool was empty
ecom_engine_db_pool_acquire_duration_seconds_totalCounterCumulative time waiting to acquire connections

Cache metrics

Used by the cache infrastructure layer to track L1 (in-memory) and L2 (Redis) cache operations.

CacheRequestsTotal

ecom_engine_cache_requests_total

Type: Counter Labels: layer (L1/L2), operation (get/set/delete/increment), result (hit/miss/error)

Use this to calculate the cache hit rate per layer:

rate(ecom_engine_cache_requests_total{result="hit"}[5m])
/
rate(ecom_engine_cache_requests_total[5m])

CacheOperationDuration

ecom_engine_cache_operation_duration_seconds

Type: Histogram Labels: layer, operation Buckets: 0.1ms, 0.5ms, 1ms, 5ms, 10ms, 25ms, 50ms, 100ms, 500ms

Tracks cache operation latency. Buckets are skewed toward sub-millisecond ranges expected of L1/L2 caches.

CacheStampedeDedupTotal

ecom_engine_cache_stampede_dedup_total

Type: Counter

Counts requests collapsed by singleflight — concurrent callers for the same missing key that shared a single fetch call. A high value means stampede protection is actively firing under load.

CacheNegativeCacheTotal

ecom_engine_cache_negative_cache_total

Type: Counter

Counts null sentinels written to cache when the fetch function confirmed a record does not exist. A sudden spike may indicate a cache penetration attack (repeated lookups for non-existent keys).

CacheMemoryItems

ecom_engine_cache_memory_items

Type: Gauge

Current number of items in the L1 in-memory cache. Alert if this approaches CacheMemoryMaxItems.

CacheMemoryMaxItems

ecom_engine_cache_memory_max_items

Type: Gauge

Configured maximum item capacity of the L1 cache. Exposed as a gauge for dashboard ratio calculations:

ecom_engine_cache_memory_items / ecom_engine_cache_memory_max_items

CacheMemoryEvictionsTotal

ecom_engine_cache_memory_evictions_total

Type: Counter Labels: reason (expired, lfu)

Evictions from the L1 cache. expired means TTL elapsed; lfu means the item was evicted due to capacity (Least Frequently Used policy).


Redis pool metrics

CacheRedisPoolCollector

A prometheus.Collector that exposes connection pool statistics from the go-redis client.

collector := metrics.NewCacheRedisPoolCollector(redisClient)
prometheus.MustRegister(collector)

Metrics exposed:

MetricTypeDescription
ecom_engine_cache_redis_pool_total_connsGaugeTotal connections in Redis pool
ecom_engine_cache_redis_pool_idle_connsGaugeIdle connections available
ecom_engine_cache_redis_pool_stale_conns_totalCounterStale connections removed
ecom_engine_cache_redis_pool_hits_totalCounterPool hits (free connection found)
ecom_engine_cache_redis_pool_misses_totalCounterPool misses (new connection dialed)
ecom_engine_cache_redis_pool_timeouts_totalCounterCallers that timed out waiting for a connection