
Byte Edge | Reading Module

ClickStack Deep Dive for IM



Overview

Reading time: ~50 minutes

ClickStack is ClickHouse's official open-source observability platform that provides a complete edge observability solution.

What is ClickStack?

ClickStack = ClickHouse + HyperDX + OpenTelemetry

| Component     | Purpose                                              |
|---------------|------------------------------------------------------|
| ClickHouse    | Columnar database engine (stores telemetry data)     |
| HyperDX       | Intelligent UI and API layer (unified query experience) |
| OpenTelemetry | Standard ingestion pipeline (OTLP protocol)          |

Key Insight: ClickStack is not just ClickHouse the database - it's a complete, production-ready observability platform optimized for high-volume telemetry workloads.

Industry Adoption:

  • OpenAI: ChatGPT infrastructure monitoring
  • Anthropic: Claude infrastructure observability
  • Tesla: 1 billion events/second processing
  • Shopify: In-house observability platform

Analogy:

  • ClickHouse = PostgreSQL (database)
  • HyperDX = pgAdmin (UI to query the database)
  • ClickStack = The full integrated platform

Why ClickStack for Edge Observability?

Traditional Observability Challenges

Problem: Most observability platforms (Datadog, New Relic, Splunk) are designed for cloud environments with:

  • Always-on connectivity ✅
  • Centralized infrastructure ✅
  • Cost scales with data volume (acceptable for cloud) ✅

Edge Reality: Restaurants, stores, remote locations have:

  • Intermittent connectivity ❌
  • Distributed infrastructure ❌
  • High data volume = prohibitive cloud costs ❌

ClickStack's Edge Advantages

  1. Self-hosted: Runs entirely on edge infrastructure (no cloud dependency)
  2. High performance: Sub-second queries on billions of events
  3. Cost efficient: Storage cost = local disk (not per-GB ingestion fees)
  4. Offline capable: Works without internet connectivity
  5. Unified querying: SQL + Lucene syntax for both power users and beginners
  6. JSON columns: Dynamic schema support without pre-defining fields
  7. High cardinality: Handle millions of unique label combinations efficiently

What is ClickHouse?

Elevator Pitch

ClickHouse is an open-source columnar database optimized for analytics and time-series data.

Why Columnar?

Row-based databases (PostgreSQL, MySQL):

| id  | timestamp           | level | message               |
|-----|---------------------|-------|-----------------------|
| 1   | 2026-02-19 14:00:00 | INFO  | Order created         |
| 2   | 2026-02-19 14:00:01 | ERROR | Payment failed        |
| 3   | 2026-02-19 14:00:02 | INFO  | Order confirmed       |

Storage: Row 1 [1, 2026-02-19 14:00:00, INFO, Order created]
         Row 2 [2, 2026-02-19 14:00:01, ERROR, Payment failed]
         Row 3 [3, 2026-02-19 14:00:02, INFO, Order confirmed]

Columnar databases (ClickHouse):

Storage: Column id        [1, 2, 3]
         Column timestamp [2026-02-19 14:00:00, 2026-02-19 14:00:01, ...]
         Column level     [INFO, ERROR, INFO]
         Column message   [Order created, Payment failed, ...]

Why This Matters for Telemetry

Query: "Count ERROR logs in the last hour"

Row-based: Read ALL columns for ALL rows, filter by level

  • Must scan: id, timestamp, level, message for millions of rows
  • Slow ❌

Columnar: Read only the level and timestamp columns

  • Scan only relevant columns
  • 10-100x faster ✅

Key insight: Telemetry queries typically filter/aggregate on a few columns (timestamp, level, service) → columnar is perfect.
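
The scan difference can be sketched in a few lines of Python. This is a toy model, not ClickHouse internals: the same rows stored both ways, counting how many values each layout must touch to answer the ERROR-count query.

```python
from datetime import datetime, timedelta

# Toy model (not ClickHouse internals): the same 1,000 log rows stored
# row-wise and column-wise, and how many values a filter must touch.
rows = [
    {"id": i,
     "timestamp": datetime(2026, 2, 19, 14, 0, 0) + timedelta(seconds=i),
     "level": "ERROR" if i % 10 == 0 else "INFO",
     "message": f"event {i}"}
    for i in range(1000)
]

# Columnar layout: one list per column.
columns = {key: [r[key] for r in rows] for key in rows[0]}

# Query: count ERROR logs. A row store reads every field of every row...
row_fields_read = sum(len(r) for r in rows)          # 4 fields x 1000 rows
row_count = sum(1 for r in rows if r["level"] == "ERROR")

# ...while a column store reads only the `level` column.
col_fields_read = len(columns["level"])              # 1 field x 1000 rows
col_count = sum(1 for v in columns["level"] if v == "ERROR")

assert row_count == col_count == 100
print(f"row store touched {row_fields_read} values, "
      f"column store touched {col_fields_read}")     # 4000 vs 1000
```

Here the column store reads a quarter of the data; with wide telemetry rows (dozens of attributes) the gap is far larger, which is where the 10-100x figure comes from.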


ClickHouse Features for Telemetry

1. Blazing Fast Queries

  • Compression: Similar values in a column compress well (e.g., "INFO" repeated 1M times)
  • Parallelization: Queries use all CPU cores
  • Vectorized execution: Process thousands of rows per CPU instruction

Result: Query billions of log entries in seconds.
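
The compression point can be made concrete with a toy run-length encoding. ClickHouse actually uses general-purpose codecs such as LZ4 and ZSTD rather than this sketch, but the intuition is the same: a column of repetitive values collapses dramatically.

```python
from itertools import groupby

# Toy sketch: why a repetitive column compresses well. ClickHouse uses
# LZ4/ZSTD codecs, not this RLE toy, but the intuition carries over.
level_column = ["INFO"] * 900 + ["ERROR"] * 100

# Run-length encode: store (value, run_length) pairs instead of every value.
encoded = [(value, len(list(run))) for value, run in groupby(level_column)]

print(encoded)                                   # [('INFO', 900), ('ERROR', 100)]
print(len(level_column), "values ->", len(encoded), "runs")
```

A row store interleaves `level` with ids, timestamps, and messages, so these runs never form; the columnar layout is what makes the compression possible.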


2. Time-Series Optimized

Partitioning by date:

CREATE TABLE logs (
  timestamp DateTime,
  level String,
  message String,
  service String
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)  -- One partition per day
ORDER BY (timestamp, service);

Benefit: When querying "last 2 hours," ClickHouse only scans relevant partitions, ignoring the rest.
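
Partition pruning can be sketched as a dictionary of per-day buckets (the row lists are placeholders). A "last 2 hours" query only opens the partitions whose day overlaps the time range; all older partitions are never read.

```python
from datetime import date, datetime, timedelta

# Toy sketch of partition pruning: rows bucketed one partition per day,
# matching PARTITION BY toYYYYMMDD(timestamp). Row contents are placeholders.
partitions = {
    date(2026, 2, 17): ["...rows for the 17th..."],
    date(2026, 2, 18): ["...rows for the 18th..."],
    date(2026, 2, 19): ["...rows for the 19th..."],
}

now = datetime(2026, 2, 19, 1, 0)        # 01:00 on the 19th
start = now - timedelta(hours=2)         # 23:00 on the 18th

# Only partitions overlapping [start, now] are scanned; the 17th is skipped.
scanned = [day for day in partitions
           if start.date() <= day <= now.date()]
print(scanned)
```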


3. TTL (Time-To-Live) for Auto-Cleanup

ALTER TABLE logs MODIFY TTL timestamp + INTERVAL 90 DAY;

Benefit: Automatically delete logs older than 90 days → no manual cleanup, storage stays manageable.


4. Materialized Views for Pre-Aggregation

-- Pre-compute error counts per service per hour
CREATE MATERIALIZED VIEW error_counts_hourly
ENGINE = SummingMergeTree()
ORDER BY (service, hour)
AS SELECT
  service,
  toStartOfHour(timestamp) AS hour,
  countIf(level = 'ERROR') AS error_count
FROM logs
GROUP BY service, hour;

Benefit: Instead of scanning millions of logs to count errors, query the pre-aggregated view (instant results).
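
What a summing materialized view does can be sketched as a counter that is updated on every insert, so reads become a key lookup instead of a scan. This is an illustration of the idea, not ClickHouse's implementation.

```python
from collections import Counter
from datetime import datetime

# Toy sketch of a SummingMergeTree materialized view: maintain running
# per-(service, hour) error counts as rows arrive, so reading the count
# is a lookup rather than a scan over raw logs. Illustration only.
error_counts_hourly = Counter()

def ingest(service: str, timestamp: datetime, level: str) -> None:
    """Insert a log row and incrementally update the pre-aggregated view."""
    if level == "ERROR":
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        error_counts_hourly[(service, hour)] += 1

for minute in range(60):
    ingest("payment-service", datetime(2026, 2, 19, 14, minute), "ERROR")
    ingest("pos-backend", datetime(2026, 2, 19, 14, minute), "INFO")

# Reading the "view" is one lookup per key, regardless of raw log volume.
hour = datetime(2026, 2, 19, 14, 0)
print(error_counts_hourly[("payment-service", hour)])   # 60
print(error_counts_hourly[("pos-backend", hour)])       # 0
```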


What is HyperDX?

Elevator Pitch

HyperDX is the frontend layer of ClickStack: an open-source observability UI (comparable to Datadog) that provides unified querying across all telemetry signals.

Acquired by ClickHouse Inc. in early 2025, HyperDX is now the official UI for ClickStack.

Key Philosophy: No Signal Silos

Traditional observability: Separate tabs for Logs, Metrics, Traces

  • ❌ Forces you to switch between different tools
  • ❌ Manual correlation between signals
  • ❌ Fragmented investigation workflow

HyperDX approach: Unified search across all signals

  • ✅ Single query syntax for logs, metrics, and traces
  • ✅ Automatic correlation (click log → see related trace)
  • ✅ Symptom-to-root-cause workflow

Key Features

  1. Unified Search Experience
     • Query logs, metrics, and traces with one syntax
     • Both SQL (powerful analytics) and Lucene (simple text search) supported
     • Automatic correlation between signals
  2. Dual Query Syntax
     • Lucene: error payment (simple, fast)
     • SQL: SELECT * FROM logs WHERE severity='ERROR' AND body LIKE '%payment%' (powerful, flexible)
     • Choose based on your expertise and use case
  3. Log Search & Filtering
     • Full-text search
     • Structured field filtering
     • Time range selection
     • Pattern detection (clustering similar logs)
  4. Metrics Dashboards
     • Custom dashboards
     • Visualization (line charts, bar charts, heatmaps)
     • Alerting (trigger on thresholds)
  5. Distributed Tracing
     • Trace visualization (waterfall diagrams)
     • Service dependency maps
     • Latency analysis
  6. Correlation
     • Jump from logs → traces → metrics
     • Unified view of all telemetry
     • Client-side + backend telemetry in one view

ClickStack Architecture

┌─────────────────────────────────────────────────────────┐
│  Applications (Edge K8s cluster)                         │
│  - pos-backend                                           │
│  - payment-service                                       │
│  - order-service                                         │
└─────┬───────────────────────────────────────────────────┘
      │ (logs, metrics, traces via OpenTelemetry SDK)
      │
┌─────▼───────────────────────────────────────────────────┐
│  OpenTelemetry Collector (ClickStack Component)          │
│  - Receives telemetry via OTLP protocol (standard)       │
│  - Enriches data (adds store_id, environment tags)       │
│  - Processors: batch, filter, transform                  │
│  - Routes to ClickHouse using native exporter            │
└─────┬───────────────────────────────────────────────────┘
      │
┌─────▼───────────────────────────────────────────────────┐
│  ClickHouse Database (ClickStack Component)              │
│  - logs table (log entries with JSON columns)            │
│  - traces table (spans)                                  │
│  - metrics table (time-series data)                      │
│  - Native JSON type for dynamic fields                   │
│  - High cardinality support (billions of labels)         │
└─────┬───────────────────────────────────────────────────┘
      │
┌─────▼───────────────────────────────────────────────────┐
│  HyperDX API + UI (ClickStack Component)                 │
│  - Web UI: http://clickstack.store-4523.local:8080       │
│  - REST API: Query logs/metrics/traces programmatically  │
│  - Query engine: Dual syntax (SQL + Lucene)              │
│  - Unified search: Logs + Metrics + Traces in one view   │
└──────────────────────────────────────────────────────────┘

Note: ClickStack is the integrated platform. The three components work together seamlessly.


ClickStack's Unique Features

1. Native JSON Column Type

Problem: Traditional observability requires pre-defining every field

  • ❌ "Add a new field? Update the schema first"
  • ❌ Dynamic fields stored as strings = slow queries
  • ❌ Nested JSON requires complex parsing

ClickStack Solution: Native JSON columns

-- Each path in JSON automatically becomes its own column
attributes JSON  -- Dynamically expands to: attributes.order_id, attributes.customer_id, etc.

Performance Gains:

  • 10x faster searches (only read relevant fields)
  • 100x less data scanned (skip irrelevant columns)
  • No manual column management

Real Example:

-- Old approach: String column, slow scan
SELECT * FROM logs WHERE JSONExtractString(attributes, 'order_id') = '12345';

-- ClickStack approach: Native column, fast lookup
SELECT * FROM logs WHERE attributes.order_id = '12345';

2. Dual Query Syntax: SQL + Lucene

HyperDX provides two query modes:

Lucene Syntax (Simple & Fast)

error payment
service:payment-service status:ERROR
store_id:4523 AND (payment OR authorization)

When to use: Quick searches, finding specific events, exploring data

SQL Syntax (Powerful & Analytical)

SELECT service_name, count(*) as error_count
FROM logs
WHERE severity_text = 'ERROR'
  AND timestamp >= now() - INTERVAL 1 HOUR
GROUP BY service_name
ORDER BY error_count DESC;

When to use: Aggregations, complex filtering, analytical queries

Key Insight: You can start with Lucene, then switch to SQL for deeper analysis.
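
How a simple `field:value` query maps onto a structured filter can be sketched as follows. This is a deliberately simplified matcher: real Lucene syntax also supports AND/OR grouping, quoting, and ranges, none of which are handled here.

```python
# Hedged sketch: map a bare-bones Lucene-style query onto a structured
# filter. Handles only whitespace-separated terms and field:value pairs;
# real Lucene adds AND/OR, parentheses, quoting, and range queries.
def matches(query: str, log: dict) -> bool:
    for term in query.split():
        if ":" in term:
            # field:value term -> exact match on that structured field
            field, value = term.split(":", 1)
            if str(log.get(field)) != value:
                return False
        else:
            # bare term -> substring match against the log body
            if term not in log.get("body", ""):
                return False
    return True

log = {"service": "payment-service", "status": "ERROR",
       "body": "payment authorization error"}

print(matches("service:payment-service status:ERROR", log))  # True
print(matches("error payment", log))                         # True
print(matches("service:pos-backend", log))                   # False
```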


3. High Cardinality Support

Problem: Traditional time-series databases (Prometheus, InfluxDB) struggle with high cardinality

  • ❌ Millions of unique label combinations = performance degradation
  • ❌ Must sample or limit labels

ClickStack Solution: Everything in one big table

-- Even with billions of unique combinations, no problem
SELECT * FROM metrics
WHERE labels.customer_id = '12345'
  AND labels.order_id = '98765'
  AND labels.payment_method = 'credit_card';

Real-World Scale:

  • Tesla: 1 billion events/second, 1 quintillion rows
  • OpenAI: ChatGPT infrastructure monitoring
  • Anthropic: Claude infrastructure (high cardinality labels)

4. Unified Signal Correlation

HyperDX automatically correlates:

  • Logs with trace_id → Shows related trace
  • Traces with spans → Shows all logs for that request
  • Metrics with labels → Shows related logs and traces

Workflow:

  1. See error spike in metrics dashboard
  2. Click spike → Jump to logs filtered to that time
  3. Click log → See full distributed trace
  4. Identify bottleneck in trace → Jump back to logs for that service

This is the power of ClickStack's unified approach.
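
Under the hood, correlation is just shared keys across tables. The sketch below (toy data, illustrative field names) shows the core move: from one error log, fetch every span and every other log sharing its trace_id, then rank spans by duration to find the bottleneck.

```python
# Toy sketch of signal correlation via trace_id. HyperDX automates these
# joins; conceptually they are lookups on a shared key across tables.
logs = [
    {"trace_id": "t1", "service": "pos-backend", "body": "order received"},
    {"trace_id": "t1", "service": "payment-service", "body": "payment failed"},
    {"trace_id": "t2", "service": "pos-backend", "body": "order received"},
]
traces = [
    {"trace_id": "t1", "span_name": "POST /order", "duration_ms": 1240},
    {"trace_id": "t1", "span_name": "charge_card", "duration_ms": 1180},
]

# Start from the error log, then pivot to everything sharing its trace.
error_log = logs[1]
related_spans = [s for s in traces if s["trace_id"] == error_log["trace_id"]]
related_logs = [l for l in logs if l["trace_id"] == error_log["trace_id"]]

# The slowest span points at the bottleneck operation.
bottleneck = max(related_spans, key=lambda s: s["duration_ms"])
print(bottleneck["span_name"])   # POST /order
print(len(related_logs))         # 2
```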


Telemetry Schema in ClickHouse

Logs Table

CREATE TABLE logs (
  timestamp DateTime64(3),         -- Millisecond precision
  trace_id String,                 -- Correlation ID (links to traces)
  span_id String,                  -- Span ID (links to specific trace span)
  severity_text String,            -- INFO, WARN, ERROR, DEBUG
  severity_number Int8,            -- Numeric severity (for sorting)
  service_name String,             -- pos-backend, payment-service, etc.
  body String,                     -- Log message

  -- Resource attributes (describe the source)
  resource_store_id String,        -- Store #4523
  resource_environment String,     -- production, staging
  resource_k8s_pod_name String,    -- pos-backend-abc123
  resource_k8s_namespace String,   -- pos

  -- Log attributes (structured data from application)
  attributes Map(String, String),  -- Key-value pairs (e.g., order_id, customer_id)

  INDEX idx_trace_id trace_id TYPE bloom_filter GRANULARITY 1
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (timestamp, service_name, severity_number);

Traces Table

CREATE TABLE traces (
  timestamp DateTime64(3),
  trace_id String,                 -- Unique trace ID
  span_id String,                  -- Unique span ID
  parent_span_id String,           -- Parent span (for hierarchy)
  span_name String,                -- Operation name (e.g., "POST /order")
  span_kind String,                -- SERVER, CLIENT, INTERNAL
  service_name String,
  duration_ns UInt64,              -- Span duration in nanoseconds
  status_code String,              -- OK, ERROR

  -- Span attributes
  attributes Map(String, String),  -- http.method, http.status_code, etc.

  -- Resource attributes
  resource_store_id String,
  resource_environment String,

  INDEX idx_trace_id trace_id TYPE bloom_filter GRANULARITY 1
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (timestamp, trace_id, span_id);

Metrics Table

CREATE TABLE metrics (
  timestamp DateTime64(3),
  metric_name String,              -- cpu_usage_percent, order_count, etc.
  value Float64,                   -- Metric value

  -- Metric attributes (labels)
  attributes Map(String, String),  -- service, host, status, etc.

  -- Resource attributes
  resource_store_id String,
  resource_environment String
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (timestamp, metric_name);

Common Query Patterns for IM

Pattern 1: Recent Error Logs

SELECT
  timestamp,
  service_name,
  body,
  attributes['order_id'] AS order_id,
  attributes['customer_id'] AS customer_id
FROM logs
WHERE
  timestamp >= now() - INTERVAL 2 HOUR
  AND severity_text = 'ERROR'
  AND resource_store_id = '4523'
ORDER BY timestamp DESC
LIMIT 100;

Use case: "Show me all errors at Store #4523 in the last 2 hours"


Pattern 2: Error Rate by Service

SELECT
  service_name,
  countIf(severity_text = 'ERROR') AS error_count,
  count() AS total_count,
  (error_count / total_count) * 100 AS error_rate_pct
FROM logs
WHERE
  timestamp >= now() - INTERVAL 1 HOUR
  AND resource_store_id = '4523'
GROUP BY service_name
ORDER BY error_rate_pct DESC;

Use case: "Which service has the highest error rate?"


Pattern 3: Trace Lookup by ID

SELECT
  span_id,
  parent_span_id,
  span_name,
  service_name,
  duration_ns / 1000000 AS duration_ms,
  status_code,
  attributes
FROM traces
WHERE
  trace_id = '7d5d747b-e160-e280-5049-099d984bcfe0'
ORDER BY timestamp ASC;

Use case: "Show me the full trace for this order"


Pattern 4: Slow Traces (P99 Latency)

SELECT
  trace_id,
  span_name,
  service_name,
  max(duration_ns) / 1000000 AS max_duration_ms
FROM traces
WHERE
  timestamp >= now() - INTERVAL 1 HOUR
  AND resource_store_id = '4523'
  AND span_kind = 'SERVER'
GROUP BY trace_id, span_name, service_name
HAVING max_duration_ms > 1000  -- Slower than 1 second
ORDER BY max_duration_ms DESC
LIMIT 20;

Use case: "What are the slowest requests in the last hour?"


Pattern 5: Correlated Logs for a Trace

SELECT
  timestamp,
  service_name,
  severity_text,
  body
FROM logs
WHERE
  trace_id = '7d5d747b-e160-e280-5049-099d984bcfe0'
ORDER BY timestamp ASC;

Use case: "Show me all logs related to this trace"


Pattern 6: Metric Trend (CPU Usage)

SELECT
  toStartOfMinute(timestamp) AS minute,
  avg(value) AS avg_cpu
FROM metrics
WHERE
  metric_name = 'cpu_usage_percent'
  AND resource_store_id = '4523'
  AND attributes['service'] = 'pos-backend'
  AND timestamp >= now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute ASC;

Use case: "Graph CPU usage for pos-backend over the last hour"


HyperDX UI Walkthrough

1. Log Search

UI Location: HyperDX → Logs

Features:

  • Full-text search: Search across all log messages
  • Field filters: Filter by service, level, store_id, etc.
  • Time picker: Last 15m, 1h, 4h, custom range
  • Live tail: Stream logs in real-time

Example:

Query: "payment failed"
Filters:
  - service_name = payment-service
  - severity_text = ERROR
  - resource_store_id = 4523
Time range: Last 2 hours

Result: All error logs containing "payment failed" from the payment service at store #4523.


2. Trace View

UI Location: HyperDX → Traces

Features:

  • Waterfall diagram: Visualize span hierarchy and timing
  • Service map: See service dependencies
  • Filter by duration: Find slow traces
  • Filter by status: Find failed traces

Example:

Filters:
  - duration > 1000ms
  - status = ERROR
  - service_name = pos-backend
Time range: Last 1 hour

Result: All failed traces from pos-backend that took longer than 1 second.


3. Metrics Dashboard

UI Location: HyperDX → Metrics

Features:

  • Custom dashboards: Create charts for key metrics
  • Visualization types: Line chart, bar chart, heatmap, gauge
  • Alerts: Set thresholds and get notified

Example Dashboard:

  • Panel 1: CPU usage (line chart)
  • Panel 2: Error rate per service (bar chart)
  • Panel 3: Request latency P50/P99 (line chart)
  • Panel 4: Active orders (gauge)

4. Correlation: Logs ↔ Traces ↔ Metrics

Workflow:

  1. See spike in error rate (Metrics Dashboard)
  2. Click spike → Jump to Logs filtered to that time range
  3. Click error log → See associated trace_id
  4. Click trace_id → View full trace waterfall
  5. Identify slow span → See which service caused delay

This is the power of unified observability!


ClickStack vs Datadog: Feature Comparison

| Feature             | ClickStack (Edge)                 | Datadog (Cloud)                          |
|---------------------|-----------------------------------|------------------------------------------|
| Log search          | ✅ Unified (SQL + Lucene)         | ✅ Full-text + structured                |
| Distributed tracing | ✅ Waterfall, service map         | ✅ Waterfall, service map, flame graphs  |
| Metrics dashboards  | ✅ Custom dashboards              | ✅ Custom dashboards + anomaly detection |
| Alerting            | ✅ Basic threshold alerts         | ✅ Advanced ML-based alerts              |
| APM                 | ✅ Basic (via OpenTelemetry)      | ✅ Full APM (profiling, code hotspots)   |
| Log patterns        | ✅ Pattern detection + clustering | ✅ Pattern detection + clustering        |
| Query syntax        | ✅ SQL + Lucene (dual mode)       | ⚠️ Proprietary syntax only               |
| High cardinality    | ✅ Billions of labels             | ⚠️ Limited by pricing                    |
| JSON support        | ✅ Native JSON columns            | ⚠️ Parsed at query time                  |
| Deployment          | Self-hosted (edge)                | SaaS (cloud)                             |
| Cost                | Fixed (hardware)                  | Variable (per GB ingested)               |
| Internet required   | ❌ No (works offline)             | ✅ Yes                                   |
| Retention           | 30-90 days (disk space)           | 15-30 days (default)                     |
| Query speed         | ⚡ Sub-second (local)             | ~1-5 seconds (network latency)           |
| Industry adoption   | Tesla, OpenAI, Anthropic          | Most Fortune 500                         |

Key takeaway: ClickStack excels at high-volume, edge deployment scenarios. Datadog is better for fleet-wide analysis and advanced features like ML-based anomaly detection.


When to Use ClickStack vs Datadog

Use ClickStack (Edge) When:

  1. Store-specific investigation: Debugging issues at a specific location
  2. High-volume verbose logs: Full transaction logs, debug traces (not sent to Datadog)
  3. Offline scenarios: Store internet is down, need local observability
  4. Historical deep dives: Need data beyond Datadog's retention window
  5. SQL power users: Complex analytics queries on telemetry data
  6. High cardinality queries: Millions of unique label combinations
  7. Cost optimization: Avoid per-GB Datadog ingestion fees

Use Datadog (Cloud) When:

  1. Fleet-wide analysis: Query across all stores simultaneously
  2. Cross-store correlation: Is this affecting multiple locations?
  3. Long-term trend analysis: Months of aggregated data
  4. Advanced features: ML anomaly detection, forecasting, APM profiling
  5. Collaboration: Share links with team (cloud-based access)
  6. Alerting: Sophisticated alert routing and escalation

Use Both When:

  1. Initial investigation: Start in ClickStack (fast, granular, local)
  2. Fleet correlation: Export relevant data to Datadog for cross-store analysis
  3. Edge + cloud debugging: Correlate edge telemetry with cloud service telemetry
  4. Post-incident review: Combine edge + cloud data for comprehensive analysis
  5. Compliance: Keep full audit trail at edge, send summaries to cloud

Conditional Export: ClickStack → Datadog

Export Mechanism

Option 1: API-based export (ClickStack native)

# Export logs from ClickStack to Datadog
clickstack export \
  --source-store 4523 \
  --time-range "2026-02-19T14:00:00Z/2026-02-19T16:00:00Z" \
  --service pos-backend \
  --severity ERROR,WARN \
  --destination datadog \
  --format otlp

Option 2: OpenTelemetry Collector (dual export)

# Configure the OTel Collector to send to both ClickHouse and Datadog.
# Exporters have no `enabled` flag; an exporter runs only when listed
# in a pipeline, so "enabling" Datadog export means adding it below.
exporters:
  clickhouse:
    endpoint: tcp://localhost:9000

  datadog:
    api:
      key: ${DD_API_KEY}

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [clickhouse]  # add `datadog` here only when export is needed

Option 3: Datadog Agent on Edge

  • Run Datadog Agent on edge server (disabled by default)
  • Enable agent only when export is needed
  • Configure filters to send specific logs/metrics

Option 4: Batch Export Job

  • Scheduled job runs daily
  • Export aggregated metrics (error counts, P99 latency, etc.)
  • Keep verbose logs at the edge, send summaries to cloud
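
The batch-export idea can be sketched as a small aggregation step: raw logs stay on local disk, and only a compact per-service summary crosses the WAN. Field names here are illustrative, not a real ClickStack or Datadog schema.

```python
from collections import defaultdict

# Hedged sketch of option 4: keep raw logs at the edge and ship only a
# daily summary (counts per service) to the cloud. Field names are
# illustrative, not a real ClickStack or Datadog schema.
raw_logs = [
    {"service": "pos-backend", "severity": "ERROR"},
    {"service": "pos-backend", "severity": "INFO"},
    {"service": "payment-service", "severity": "ERROR"},
    {"service": "payment-service", "severity": "ERROR"},
]

def daily_summary(logs):
    """Collapse raw logs into per-service totals and error counts."""
    counts = defaultdict(lambda: {"total": 0, "errors": 0})
    for log in logs:
        entry = counts[log["service"]]
        entry["total"] += 1
        if log["severity"] == "ERROR":
            entry["errors"] += 1
    return dict(counts)

summary = daily_summary(raw_logs)
print(summary["payment-service"])   # {'total': 2, 'errors': 2}
print(summary["pos-backend"])       # {'total': 2, 'errors': 1}
```

The summary is what a scheduled job would POST to the cloud backend; the per-GB ingestion cost then scales with the number of services, not with raw log volume.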

Export Triggers (Byte Edge Implementation)

Trigger 1: Incident Declared

trigger: incident_declared
store_id: 4523
lookback: 2h
export:
  logs: [ERROR, WARN]
  traces: [status=ERROR]
  metrics: [cpu_usage, memory_usage, error_rate]

Trigger 2: Alert Threshold

trigger: error_rate > 5%
store_id: 4523
lookback: 30m
export:
  logs: [ERROR]
  traces: [status=ERROR, duration>1s]

Trigger 3: Manual Request

# IM engineer manually triggers export
hyperdx-export --store 4523 --time-range "last 2h" --all

Installation & Setup (Overview)

Prerequisites

  • Kubernetes cluster (edge server)
  • Persistent storage (local disk or NFS)
  • 8GB+ RAM for ClickHouse
  • 4GB+ RAM for HyperDX UI
  • 2GB+ RAM for OpenTelemetry Collector

Deployment (Helm Chart)

# Add ClickStack Helm repo
helm repo add clickstack https://clickhouse.com/clickstack

# Install complete ClickStack (ClickHouse + HyperDX + OTLP Collector)
helm install clickstack clickstack/clickstack \
  --namespace observability \
  --set clickhouse.persistence.size=100Gi \
  --set clickhouse.retention.logs=90d \
  --set clickhouse.jsonColumns.enabled=true \
  --set hyperdx.ingress.enabled=true \
  --set hyperdx.auth.enabled=true \
  --set otel.collector.enabled=true \
  --set otel.collector.endpoint=0.0.0.0:4318

Application Instrumentation

// Example: Instrument Node.js app with OpenTelemetry (standard OTLP)
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

const provider = new NodeTracerProvider();
provider.addSpanProcessor(
  new BatchSpanProcessor(
    new OTLPTraceExporter({
      url: 'http://clickstack-collector:4318/v1/traces'  // ClickStack OTLP endpoint
    })
  )
);
provider.register();

Key point for IM: You won't be deploying this, but understanding the architecture helps with troubleshooting.

ClickStack vs Individual Components: ClickStack provides a single Helm chart that deploys all three components (ClickHouse, HyperDX, OpenTelemetry) with optimized configurations.


Key Takeaways

  1. ClickStack: Complete observability platform = ClickHouse + HyperDX + OpenTelemetry
  2. ClickHouse: Columnar database, optimized for time-series telemetry data with native JSON support
  3. HyperDX: Unified observability UI with dual query syntax (SQL + Lucene)
  4. OpenTelemetry: Standard ingestion pipeline (OTLP protocol)
  5. Columnar advantage: 10-100x faster queries for analytics workloads
  6. Native JSON columns: Dynamic schema, 10x faster searches, 100x less data scanned
  7. High cardinality: Handle billions of unique label combinations (Tesla: 1B events/sec)
  8. Edge deployment: Runs locally on edge servers, works offline
  9. Dual query syntax: Simple Lucene for exploration, powerful SQL for analytics
  10. Conditional export: Store locally, export to Datadog when needed
  11. Industry adoption: OpenAI (ChatGPT), Anthropic (Claude), Tesla, Shopify
  12. IM workflow: Investigate in ClickStack (fast, local), export to Datadog (collaborate, fleet-wide)

Discussion Questions

Before moving to Module 5, think about:

  1. What would you do if ClickStack itself is down at a store?
  2. How would you handle a store running out of disk space for telemetry?
  3. What telemetry would you export to Datadog after a payment outage?
  4. How would you troubleshoot if ClickHouse queries are slow?
  5. When would you use Lucene syntax vs SQL for querying?
  6. How does ClickStack's JSON column type improve query performance?
  7. What's the benefit of OpenTelemetry's OTLP protocol for edge deployments?

Next Steps

✅ Complete Module 1: Edge Computing
✅ Complete Module 2: Kubernetes Overview
✅ Complete Module 3: Observability & Telemetry
✅ Complete Module 4: ClickStack Deep Dive
⬜ Read Module 5: Local Demo Setup
⬜ Complete Module 6: Hands-On Exercises

Estimated time to next module: 1 day (prepare local environment)


Additional Resources

  • ClickStack Docs: https://clickhouse.com/docs/en/observability/clickstack
  • ClickStack GitHub: https://github.com/clickhouse/clickstack
  • HyperDX GitHub: https://github.com/hyperdxio/hyperdx
  • OpenTelemetry Docs: https://opentelemetry.io/docs/
  • ClickStack Open House: https://www.youtube.com/watch?v=clickstack-openhouse (Tesla, OpenAI, Anthropic use cases)
