Byte Edge | Reading Module

Local Edge Lab Setup



Overview

Time to complete: ~3-4 hours (including downloads)



You'll set up a local Kubernetes cluster on your laptop with:

  • ClickStack (complete observability platform), comprising:
      • ClickHouse (columnar database with native JSON support)
      • HyperDX (unified UI with SQL + Lucene query modes)
      • OpenTelemetry Collector (OTLP ingestion pipeline)
  • Sample application (generates logs, metrics, traces via OpenTelemetry)

Goal: Simulate a restaurant edge environment locally so you can practice incident investigation using ClickStack's dual query modes and unified signal correlation.


Prerequisites

System Requirements

  • OS: Windows 10/11, macOS, or Linux
  • RAM: 16GB minimum (20GB+ recommended)
  • Disk: 20GB free space
  • CPU: 4+ cores

Tools to Install

1. Docker Desktop

What: Runs containers locally
Download: https://www.docker.com/products/docker-desktop

# Verify installation
docker --version
# Expected: Docker version 24.0.0 or later

Windows users: Enable WSL 2 backend in Docker Desktop settings.
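If you'd rather check the version requirement in a script than by eye, here is a minimal sketch. The ver_line sample string is illustrative only; in practice you would capture the real `docker --version` output.

```shell
#!/bin/sh
# Extract the major version from `docker --version`-style output and verify
# it meets the 24.x minimum. The sample string stands in for real output.
ver_line="Docker version 24.0.7, build afdd53b"
major=$(echo "$ver_line" | sed -n 's/^Docker version \([0-9][0-9]*\)\..*/\1/p')
if [ "$major" -ge 24 ]; then
  echo "Docker $major.x is new enough"
else
  echo "Docker $major.x is too old; upgrade to 24+"
fi
```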


2. kubectl

What: Kubernetes command-line tool
Download: https://kubernetes.io/docs/tasks/tools/

Windows (PowerShell):

choco install kubernetes-cli
# or download from https://kubernetes.io/docs/tasks/tools/install-kubectl-windows/

macOS:

brew install kubectl

Linux:

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

Verify:

kubectl version --client

3. kind (Kubernetes in Docker)

What: Runs a local Kubernetes cluster in Docker containers
Why: Lightweight, fast, perfect for local testing

Windows/macOS/Linux:

# macOS
brew install kind

# Windows (PowerShell)
choco install kind

# Linux
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind

Verify:

kind version

Alternative: Use minikube if you prefer (similar setup process).


4. Helm

What: Package manager for Kubernetes (like npm for K8s)
Download: https://helm.sh/docs/intro/install/

macOS:

brew install helm

Windows (PowerShell):

choco install kubernetes-helm

Linux:

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Verify:

helm version

Step 1: Create Local Kubernetes Cluster

Create kind Cluster Config

Create file kind-config.yaml:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: byte-edge-demo
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 8080
    protocol: TCP
  - containerPort: 443
    hostPort: 8443
    protocol: TCP

Create Cluster

# Create cluster (takes 2-5 minutes)
kind create cluster --config kind-config.yaml

# Verify cluster is running
kubectl cluster-info
kubectl get nodes

# Expected output:
# NAME                            STATUS   ROLES           AGE   VERSION
# byte-edge-demo-control-plane    Ready    control-plane   2m    v1.27.3

Checkpoint: You should have a running K8s cluster!


Step 2: Deploy ClickHouse

Option A: Use Byte Edge Team's Helm Chart

If you have access to byte-edge/demo-helm-chart repo:

# Clone the repo
git clone https://github.com/yum-byte-edge/demo-helm-chart.git
cd demo-helm-chart

# Install ClickHouse
helm install clickhouse ./charts/clickhouse \
  --namespace observability \
  --create-namespace \
  --set persistence.size=10Gi \
  --set resources.requests.memory=4Gi

Option B: Use Public ClickHouse Helm Chart

If you don't have access yet:

# Add ClickHouse Helm repo
helm repo add clickhouse https://docs.clickhouse.com/charts
helm repo update

# Install ClickHouse
helm install clickhouse clickhouse/clickhouse \
  --namespace observability \
  --create-namespace \
  --set persistence.enabled=true \
  --set persistence.size=10Gi \
  --set auth.username=default \
  --set auth.password=clickhouse123

Verify ClickHouse

# Check if ClickHouse pod is running (may take 2-3 minutes)
kubectl get pods -n observability

# Expected output:
# NAME                          READY   STATUS    RESTARTS   AGE
# clickhouse-0                  1/1     Running   0          2m

# Port-forward ClickHouse's HTTP interface (8123) to your laptop
# (port 9000 is the native TCP protocol; curl needs the HTTP port)
kubectl port-forward -n observability svc/clickhouse 8123:8123 &

# Test connection over HTTP
echo "SELECT 1" | curl 'http://localhost:8123/' --data-binary @-
# Expected: 1

Checkpoint: ClickHouse is running and queryable!


Step 3: Deploy HyperDX

Option A: Use Byte Edge Team's Helm Chart

# Install HyperDX (from byte-edge/demo-helm-chart repo)
helm install hyperdx ./charts/hyperdx \
  --namespace observability \
  --set clickhouse.host=clickhouse.observability.svc.cluster.local \
  --set clickhouse.port=9000 \
  --set clickhouse.user=default \
  --set clickhouse.password=clickhouse123

Option B: Use Public HyperDX Helm Chart

# Add HyperDX Helm repo
helm repo add hyperdx https://charts.hyperdx.io
helm repo update

# Install HyperDX
helm install hyperdx hyperdx/hyperdx \
  --namespace observability \
  --set clickhouse.enabled=false \
  --set clickhouse.host=clickhouse.observability.svc.cluster.local \
  --set clickhouse.port=9000 \
  --set clickhouse.database=default \
  --set clickhouse.user=default \
  --set clickhouse.password=clickhouse123

Verify HyperDX

# Check pods
kubectl get pods -n observability

# Expected output:
# NAME                          READY   STATUS    RESTARTS   AGE
# clickhouse-0                  1/1     Running   0          5m
# hyperdx-api-xxx               1/1     Running   0          2m
# hyperdx-collector-xxx         1/1     Running   0          2m
# hyperdx-ui-xxx                1/1     Running   0          2m

# Port-forward to access HyperDX UI
# (local port 8080 may already be held by the kind port mapping from Step 1;
#  if so, pick another local port, e.g. 9090:80)
kubectl port-forward -n observability svc/hyperdx-ui 8080:80

# Open in browser: http://localhost:8080
# You should see the HyperDX login page!

Checkpoint: HyperDX UI is accessible!


Step 4: Deploy Sample Application

We'll deploy a mock POS backend that generates telemetry.

Create Sample App Manifest

Create file sample-pos-app.yaml:

---
apiVersion: v1
kind: Namespace
metadata:
  name: pos

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pos-backend
  namespace: pos
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pos-backend
  template:
    metadata:
      labels:
        app: pos-backend
    spec:
      containers:
      - name: pos-backend
        image: ghcr.io/open-telemetry/demo:latest
        ports:
        - containerPort: 8080
        env:
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://hyperdx-collector.observability.svc.cluster.local:4318"
        - name: OTEL_SERVICE_NAME
          value: "pos-backend"
        - name: OTEL_RESOURCE_ATTRIBUTES
          value: "store.id=demo-4523,environment=local"

---
apiVersion: v1
kind: Service
metadata:
  name: pos-backend-service
  namespace: pos
spec:
  selector:
    app: pos-backend
  ports:
  - port: 80
    targetPort: 8080
  type: ClusterIP
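A note on the OTEL_RESOURCE_ATTRIBUTES value above: the OpenTelemetry SDK parses it as a comma-separated list of key=value pairs and attaches each pair as a resource attribute to every log, metric, and trace the app emits. A minimal sketch of how that value splits:

```shell
#!/bin/sh
# OTEL_RESOURCE_ATTRIBUTES is parsed as comma-separated key=value pairs;
# each pair becomes a resource attribute on every emitted signal.
OTEL_RESOURCE_ATTRIBUTES="store.id=demo-4523,environment=local"
echo "$OTEL_RESOURCE_ATTRIBUTES" | tr ',' '\n'
# → store.id=demo-4523
# → environment=local
```

These attributes are what later let you filter telemetry by store.id in HyperDX.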

Deploy Sample App

# Deploy the app
kubectl apply -f sample-pos-app.yaml

# Check if pods are running
kubectl get pods -n pos

# Expected output:
# NAME                           READY   STATUS    RESTARTS   AGE
# pos-backend-xxx                1/1     Running   0          1m
# pos-backend-yyy                1/1     Running   0          1m

# View logs (should show startup logs)
kubectl logs -n pos -l app=pos-backend --tail=20

Checkpoint: Sample app is running and sending telemetry!


Step 5: Generate Traffic

Let's generate some traffic to create logs/metrics/traces.

Load Generator Script

Create file generate-traffic.sh:

#!/bin/bash

echo "Generating traffic to pos-backend..."

# Port-forward to pos-backend
kubectl port-forward -n pos svc/pos-backend-service 8081:80 &
PORT_FORWARD_PID=$!

# Wait for port-forward to be ready
sleep 3

# Generate traffic
for i in {1..100}; do
  echo "Request $i"

  # Successful request
  curl -s http://localhost:8081/api/orders \
    -H "Content-Type: application/json" \
    -d '{"order_id": "'$i'", "total": 42.50}' > /dev/null

  # Simulate some failures (one extra failing request every 5th iteration)
  if [ $((i % 5)) -eq 0 ]; then
    curl -s http://localhost:8081/api/orders/invalid > /dev/null
  fi

  sleep 0.5
done

# Kill port-forward
kill $PORT_FORWARD_PID

echo "Traffic generation complete!"

Run Load Generator

# Make script executable
chmod +x generate-traffic.sh

# Run it
./generate-traffic.sh

Result: You should see ~120 requests sent: 100 successful plus ~20 failures (about a 17% overall error rate).
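The arithmetic behind those numbers comes straight from the modulo check in generate-traffic.sh: every iteration sends one successful request, and every 5th iteration adds one failing request. A standalone sketch of the count:

```shell
#!/bin/sh
# Reproduce the request count from generate-traffic.sh: one successful
# request per iteration, plus a failing one whenever i % 5 == 0.
ok=0; fail=0
for i in $(seq 1 100); do
  ok=$((ok + 1))
  if [ $((i % 5)) -eq 0 ]; then
    fail=$((fail + 1))
  fi
done
echo "successful=$ok failing=$fail total=$((ok + fail))"
# → successful=100 failing=20 total=120
```

So the failures are 20% of the 100 business requests, but roughly 17% (20/120) of all requests sent.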


Step 6: Investigate in HyperDX

Access HyperDX UI

# Port-forward HyperDX UI (if not already running)
kubectl port-forward -n observability svc/hyperdx-ui 8080:80

Open: http://localhost:8080

Login

  • Default credentials (if not configured): admin / admin
  • Or create a new account

Explore Logs

  1. Navigate to Logs section
  2. Set time range: Last 15 minutes
  3. Filter by service_name = pos-backend
  4. Search for errors: severity_text = ERROR

Expected: You should see ~20 error logs from the failed requests.

Explore Traces

  1. Navigate to Traces section
  2. Filter by service_name = pos-backend
  3. Sort by duration (descending) to find slow traces
  4. Click on a trace to see the waterfall diagram

Expected: You should see traces with different durations.

Create a Dashboard

  1. Navigate to Dashboards
  2. Create new dashboard: "POS Backend Health"
  3. Add panels:
  • Error rate: count(logs) where severity_text = ERROR
  • Request count: count(logs)
  • P99 latency: quantile(0.99, traces.duration_ms)

Checkpoint: You can now investigate telemetry in HyperDX!


Step 7: Practice Incident Investigation

Simulate Incident: High Error Rate

Scenario: POS backend is returning errors for 50% of requests.

1. Generate Error Traffic

Create simulate-incident.sh:

#!/bin/bash

kubectl port-forward -n pos svc/pos-backend-service 8081:80 &
PORT_FORWARD_PID=$!
sleep 3

echo "Simulating incident: High error rate..."

for i in {1..50}; do
  # 50% errors
  if [ $((i % 2)) -eq 0 ]; then
    curl -s http://localhost:8081/api/orders/invalid > /dev/null
  else
    curl -s http://localhost:8081/api/orders \
      -d '{"order_id": "'$i'"}' > /dev/null
  fi
  sleep 0.2
done

kill $PORT_FORWARD_PID
echo "Incident simulation complete!"

Then make it executable and run it:

chmod +x simulate-incident.sh
./simulate-incident.sh

2. Investigate in HyperDX

Your mission: Identify the root cause of high error rate.

Steps:

  1. Open HyperDX UI
  2. Go to Logs → Filter last 5 minutes
  3. Group by severity_text → See spike in ERROR logs
  4. Click on an ERROR log → View details
  5. Note the trace_id
  6. Go to Traces → Search by trace_id
  7. View waterfall → Identify which service/span failed

Expected finding: Requests to /api/orders/invalid endpoint are returning 404 errors.


Step 8: Export to "Cloud" (Simulated)

Since we don't have real Datadog in this local setup, we'll simulate export.

Extract Logs from ClickHouse

# Query error logs (simulating export to Datadog); kubectl exec runs
# clickhouse-client inside the pod, so no port-forward is needed
kubectl exec -it -n observability clickhouse-0 -- \
  clickhouse-client --query \
  "SELECT timestamp, service_name, severity_text, body
   FROM default.otel_logs
   WHERE severity_text = 'ERROR'
   AND timestamp > now() - INTERVAL 1 HOUR
   ORDER BY timestamp DESC
   LIMIT 100
   FORMAT JSONEachRow" > exported-logs.json

cat exported-logs.json

Result: You've extracted error logs from the edge (ClickHouse) that could be sent to Datadog.
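Each line of exported-logs.json is one JSON object (that is what FORMAT JSONEachRow produces). A quick shell summary counting error lines per service, a sketch that assumes only the service_name field from the query above and no spaces inside the JSON (ClickHouse emits compact JSON):

```shell
#!/bin/sh
# Summarize a JSONEachRow export: count lines per service_name.
# Uses grep/sort/uniq only, so it works without jq installed.
grep -o '"service_name":"[^"]*"' exported-logs.json \
  | sort | uniq -c | sort -rn
```

This assumes exported-logs.json exists from the previous command; the top line of the output is the service producing the most errors.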


Troubleshooting

Issue: Pods Stuck in Pending

Cause: Insufficient resources
Fix:

# Check resource usage
kubectl top nodes
kubectl top pods -A

# If resources are exhausted, restart Docker Desktop or increase limits

Issue: ClickHouse Won't Start

Cause: Memory limits too low
Fix:

# Reduce memory request
helm upgrade clickhouse clickhouse/clickhouse \
  --namespace observability \
  --set resources.requests.memory=2Gi

Issue: Can't Access HyperDX UI

Cause: Port-forward not running or port conflict
Fix:

# Kill existing port-forwards (match only kubectl port-forward processes,
# not every kubectl process)
pkill -f "kubectl port-forward"

# Restart port-forward on different port
kubectl port-forward -n observability svc/hyperdx-ui 9090:80
# Then access: http://localhost:9090
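If you are not sure which local ports are free, this heuristic sketch tries to open a TCP connection to each candidate and prints the first port nothing answers on. It relies on bash's /dev/tcp pseudo-device, so run it with bash:

```shell
#!/bin/bash
# Print the first port in 9090-9099 that nothing is listening on locally.
# The /dev/tcp open fails (connection refused) when the port is free.
for p in $(seq 9090 9099); do
  if ! (exec 3<>/dev/tcp/127.0.0.1/"$p") 2>/dev/null; then
    echo "$p"
    break
  fi
done
```

Use the printed port as the local side of your next port-forward.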

Issue: No Telemetry in HyperDX

Cause: Sample app not sending telemetry to HyperDX collector
Fix:

# Check if hyperdx-collector is running
kubectl get pods -n observability

# Check collector logs
kubectl logs -n observability -l app=hyperdx-collector

# Verify sample app environment variables
kubectl describe pod -n pos -l app=pos-backend | grep OTEL

Cleanup

When you're done with the demo:

# Delete the kind cluster (removes everything)
kind delete cluster --name byte-edge-demo

# Or, just delete the deployments (keep cluster for later)
kubectl delete namespace pos
kubectl delete namespace observability

Key Takeaways

  1. Local K8s: kind makes it easy to run K8s on your laptop
  2. HyperDX + ClickHouse: Full observability stack deployed locally
  3. OpenTelemetry: Standard way to instrument apps (logs, metrics, traces)
  4. Port-forwarding: Access services running in K8s from your laptop
  5. Incident investigation: Use HyperDX to query logs, traces, metrics
  6. Export simulation: Extract data from ClickHouse (simulate export to Datadog)

Next Steps

✅ Complete Modules 1-4 (Concepts)
✅ Complete Module 5 (Local Setup)
⬜ Complete Module 6: Hands-On Exercises
⬜ Access KFC US lab environment
⬜ Shadow call with Byte Edge engineer

You're now ready for hands-on incident investigation practice!


Appendix: Useful Commands Cheat Sheet

# Cluster management
kind create cluster --name <name>
kind delete cluster --name <name>
kubectl cluster-info
kubectl get nodes

# Namespace operations
kubectl get namespaces
kubectl create namespace <name>
kubectl delete namespace <name>

# Pod operations
kubectl get pods -n <namespace>
kubectl describe pod <pod> -n <namespace>
kubectl logs <pod> -n <namespace>
kubectl logs -f <pod> -n <namespace>  # Follow logs
kubectl exec -it <pod> -n <namespace> -- /bin/bash

# Service operations
kubectl get svc -n <namespace>
kubectl describe svc <service> -n <namespace>
kubectl port-forward -n <namespace> svc/<service> <local-port>:<remote-port>

# Deployment operations
kubectl get deployments -n <namespace>
kubectl describe deployment <deployment> -n <namespace>
kubectl rollout restart deployment/<deployment> -n <namespace>

# Helm operations
helm list -A
helm install <release> <chart> --namespace <namespace>
helm upgrade <release> <chart> --namespace <namespace>
helm uninstall <release> --namespace <namespace>

# Resource monitoring
kubectl top nodes
kubectl top pods -n <namespace>
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

Resources

  • kind docs: https://kind.sigs.k8s.io/docs/user/quick-start/
  • kubectl cheat sheet: https://kubernetes.io/docs/reference/kubectl/cheatsheet/
  • HyperDX docs: https://www.hyperdx.io/docs
  • ClickHouse docs: https://clickhouse.com/docs/en/intro
  • OpenTelemetry: https://opentelemetry.io/docs/

Time checkpoint: If you've completed this module, you've spent ~3-4 hours and now have a working local edge environment! 🎉
