1. Structured Logging
Bad:
Order failed
Good:
{
  "timestamp": "2026-02-19T14:32:17Z",
  "level": "ERROR",
  "service": "pos-backend",
  "store_id": "4523",
  "order_id": "12345",
  "error": "Connection timeout",
  "error_code": "TIMEOUT_ERR_001",
  "context": {
    "customer_id": "987654",
    "total": 42.50,
    "payment_method": "credit_card"
  }
}
Why: Structured logs are searchable, filterable, and correlatable.
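One way to produce log lines like the "Good" example is a custom formatter on Python's standard logging module. This is a minimal sketch, not a definitive implementation: the JsonFormatter class, the build_logger helper, the "fields" key passed through extra=, and the "message" field name are all assumptions made for illustration.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "service": record.name,  # assumption: logger name doubles as service name
            "message": record.getMessage(),
        }
        # Anything passed as extra={"fields": {...}} becomes an attribute on the record.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def build_logger(name="pos-backend", stream=None):
    """Hypothetical helper: return a logger that emits one JSON object per line."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.error(
        "Connection timeout",
        extra={"fields": {"store_id": "4523", "order_id": "12345",
                          "error_code": "TIMEOUT_ERR_001"}},
    )
```

Because every field is a JSON key, a log backend can filter on store_id or order_id directly instead of parsing free text.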
2. Correlation IDs
What: Unique identifier that follows a request through all services.
Example:
Trace ID: 7d5d747b-e160-e280-5049-099d984bcfe0
[pos-frontend] trace_id=7d5d747b → Order received
[pos-backend] trace_id=7d5d747b → Processing order
[payment-svc] trace_id=7d5d747b → Authorizing payment
[order-svc] trace_id=7d5d747b → Creating order
Why: Easily trace a single transaction across multiple services.
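The propagation described above could be sketched in Python with contextvars and a logging filter. The names here (trace_id_var, start_trace, outgoing_headers, TraceIdFilter) and the X-Trace-Id header are hypothetical; real systems often use the W3C Trace Context traceparent header instead.

```python
import contextvars
import logging
import uuid

# Holds the trace ID for the request currently being handled.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


def start_trace(incoming_trace_id=None):
    """Reuse the caller's trace ID if one arrived, otherwise mint a new one."""
    trace_id = incoming_trace_id or str(uuid.uuid4())
    trace_id_var.set(trace_id)
    return trace_id


def outgoing_headers():
    """Headers to attach to downstream calls (header name is an assumption)."""
    return {"X-Trace-Id": trace_id_var.get()}


class TraceIdFilter(logging.Filter):
    """Stamp every log record with the current trace ID."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(
        logging.Formatter("[%(name)s] trace_id=%(trace_id)s → %(message)s")
    )
    log = logging.getLogger("pos-backend")
    log.addHandler(handler)
    log.setLevel(logging.INFO)

    start_trace("7d5d747b-e160-e280-5049-099d984bcfe0")
    log.info("Processing order")
```

Each service calls start_trace with the ID it received and passes outgoing_headers() on every downstream request, so all log lines for one transaction share the same ID.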
3. Metric Cardinality
Problem: Too many unique label-value combinations = high cardinality = performance issues
Bad (high cardinality):
order_count{customer_id="12345", order_id="98765", ...}
# Millions of unique combinations!
Good (low cardinality):
order_count{store_id="4523", status="success"}
# Dozens of stores × 3 statuses = manageable
Why: High cardinality metrics are expensive to store and query.
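The arithmetic behind the comparison can be sketched as a product of per-label value counts. The series_count helper and the counts below (50 stores, 1,000,000 customers, 5,000,000 orders) are illustrative assumptions, not figures from a real system, and the result is an upper bound since not every combination necessarily occurs.

```python
from math import prod
from typing import Mapping


def series_count(label_cardinalities: Mapping[str, int]) -> int:
    """Upper bound on time series: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Assumed counts, for illustration only.
low = series_count({"store_id": 50, "status": 3})
high = series_count({"customer_id": 1_000_000, "order_id": 5_000_000})

if __name__ == "__main__":
    print(f"low cardinality:  {low} series")
    print(f"high cardinality: up to {high} series")
```

Identifiers such as customer_id and order_id usually fit better in logs or trace attributes, where a unique value per event is expected.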
4. Sampling Traces
Problem: Tracing every request = massive data volume
Solution: Sample traces intelligently:
- 100% of errors (always trace failed requests)
- 100% of slow requests (>1s)
- 1% of successful requests (statistical sample)
Why: Get comprehensive error visibility while controlling costs.
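The sampling rules above could be expressed as a single decision function. This is a sketch under stated assumptions: should_keep_trace and its parameters are hypothetical names, and the decision needs the request's outcome and duration, so it has to run after the request completes (tail-based sampling).

```python
import random

SLOW_THRESHOLD_SECONDS = 1.0   # "slow" means longer than 1s
SUCCESS_SAMPLE_RATE = 0.01     # keep 1% of fast, successful requests


def should_keep_trace(is_error, duration_seconds, rng=random.random):
    """Return True if the finished request's trace should be kept.

    rng is injectable so the random branch can be tested deterministically.
    """
    if is_error:
        return True                       # 100% of errors
    if duration_seconds > SLOW_THRESHOLD_SECONDS:
        return True                       # 100% of slow requests
    return rng() < SUCCESS_SAMPLE_RATE    # statistical sample of the rest


if __name__ == "__main__":
    print(should_keep_trace(is_error=True, duration_seconds=0.2))
    print(should_keep_trace(is_error=False, duration_seconds=2.5))
```

Errors and slow requests are checked first so they are never subject to the random draw.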