How to Monitor Kubernetes Using Dynatrace

Kubernetes is powerful — and notoriously hard to observe. This guide walks you through deploying Dynatrace on a Kubernetes cluster, configuring OneAgent, setting up alerting, and leveraging Davis® AI for automated incident analysis.

☸ K8s 1.26+ · 📦 Dynatrace SaaS

99.9%: visibility into all K8s layers
<2 min: OneAgent deployment time
3x: faster MTTR vs manual tools

Running Kubernetes at scale introduces layers of complexity — pods crash, nodes go OOM, networking flaps, and PVCs get orphaned. Traditional monitoring tools weren't built for this dynamic, ephemeral environment. Dynatrace was.

With full-stack auto-instrumentation, automatic topology discovery via Smartscape, and AI-powered root cause analysis, Dynatrace gives you complete visibility into your cluster without endless manual configuration.

00 Architecture Overview

Before diving in, understand how Dynatrace integrates with Kubernetes. It's a layered approach that covers every component from the node down to individual containers.

Dynatrace ↔ Kubernetes Integration Architecture

Dynatrace SaaS / Managed
  ▲ metrics · traces · logs · events
Dynatrace Operator · ActiveGate
  ▼ instrumentation
OneAgent DaemonSet · OneAgent CSI Driver
  ▼ observes
Node / kubelet · Pods / Containers · Services / Ingress

The Dynatrace Operator manages the entire lifecycle of OneAgent on your cluster. ActiveGate acts as a proxy and pre-aggregator for cluster-level Kubernetes API data. OneAgent runs as a DaemonSet to instrument every node.

01 Prerequisites

Before deploying Dynatrace, make sure your environment meets these requirements:

1. Kubernetes cluster running v1.26+

   EKS, GKE, AKS, or self-managed. Dynatrace supports all major distributions, including OpenShift and Rancher.

2. kubectl configured with cluster-admin access

   The Operator requires permissions to create namespaces, DaemonSets, ClusterRoles, and CRDs.

3. Helm 3.x installed

   We'll use Helm to install the Dynatrace Operator. Install via brew install helm or the official Helm docs.

4. Dynatrace environment URL + API token

   Log into your Dynatrace tenant and navigate to Settings → Access Tokens → Generate new token with the required scopes.

📋 Required API Token Scopes

Your API token needs: metrics.ingest, logs.ingest, DataExport, InstallerDownload, and entities.read. For the Operator to manage cluster configuration, also add settings.write.
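Before deploying, you can sanity-check that your token actually carries these scopes. One way is the Access Tokens lookup endpoint (a sketch: YOUR_ENV_ID and the token values are placeholders, and the token used in the Authorization header needs the apiTokens.read scope):

```shell
# Look up the scopes attached to an API token via the Access Tokens API
curl -s -X POST "https://YOUR_ENV_ID.live.dynatrace.com/api/v2/apiTokens/lookup" \
  -H "Authorization: Api-Token YOUR_ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"token": "YOUR_API_TOKEN"}'
# The JSON response includes a "scopes" array; confirm the scopes listed above are present
```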

02 Install the Dynatrace Operator

The Dynatrace Operator is the recommended installation method for Kubernetes. It handles OneAgent deployment, configuration, and updates automatically.

Add the Helm repository

bash
# Add Dynatrace Helm repo
helm repo add dynatrace https://raw.githubusercontent.com/Dynatrace/dynatrace-operator/main/config/helm/repos/stable

# Update your local Helm cache
helm repo update

# Create a dedicated namespace
kubectl create namespace dynatrace

Create API credentials as a Kubernetes Secret

bash
kubectl create secret generic dynatrace-tokens \
  --namespace dynatrace \
  --from-literal=apiToken=YOUR_API_TOKEN \
  --from-literal=dataIngestToken=YOUR_DATA_INGEST_TOKEN

Install the Operator via Helm

bash
helm install dynatrace-operator dynatrace/dynatrace-operator \
  -n dynatrace \
  --atomic

# Verify the operator pod is running
kubectl get pods -n dynatrace

# Expected output:
# NAME                                   READY   STATUS    RESTARTS   AGE
# dynatrace-operator-5d8b9f9d8c-xk2rp   1/1     Running   0          45s
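Beyond the pod check, it can help to confirm the Operator registered its custom resources before moving on (a sketch; the CRD name follows the upstream project's convention):

```shell
# The Operator installs the DynaKube CRD; confirm it exists
kubectl get crd dynakubes.dynatrace.com

# List everything in the namespace, including the webhook and CSI driver pods
kubectl get pods -n dynatrace -o wide
```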

03 Configure the DynaKube Custom Resource

The DynaKube is a Custom Resource Definition (CRD) that tells the Operator exactly how to deploy and configure Dynatrace on your cluster. This is the heart of your configuration.

yaml · dynakube.yaml
apiVersion: dynatrace.com/v1beta1
kind: DynaKube
metadata:
  name: my-cluster
  namespace: dynatrace
spec:
  # Your Dynatrace environment URL
  apiUrl: https://YOUR_ENV_ID.live.dynatrace.com/api

  # Reference to your secret
  tokens: dynatrace-tokens

  # OneAgent - full-stack monitoring per node
  oneAgent:
    classicFullStack:
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Exists

  # ActiveGate for K8s API monitoring
  activeGate:
    capabilities:
      - kubernetes-monitoring
      - routing
      - metric-ingest
    resources:
      requests:
        memory: "512Mi"
        cpu: "250m"
      limits:
        memory: "1Gi"
        cpu: "500m"

  # Enrich telemetry with Kubernetes metadata (pod, namespace, workload)
  metadataEnrichment:
    enabled: true

bash
# Apply the DynaKube configuration
kubectl apply -f dynakube.yaml

# Watch OneAgent pods come up on every node
kubectl get pods -n dynatrace -w

# You should see one oneagent-* pod per node:
# oneagent-7k2p9    1/1   Running   0   2m
# oneagent-bx8wq    1/1   Running   0   2m
# oneagent-jmn4r    1/1   Running   0   2m
✓ Pro Tip: ClassicFullStack vs CloudNativeFullStack

Use classicFullStack for broad compatibility: it installs OneAgent directly on each node's host via a privileged DaemonSet. If you're on a modern cluster and want a smaller host footprint, cloudNativeFullStack uses the CSI driver to inject the agent binary per pod instead of a full host-level install, which is better for constrained or locked-down environments.
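For comparison, the cloudNativeFullStack variant of the DynaKube from Step 3 would look like this (a sketch: metadata, apiUrl, tokens, and activeGate stay as shown above; only the oneAgent section changes):

```yaml
# DynaKube fragment: cloudNativeFullStack instead of classicFullStack
spec:
  oneAgent:
    cloudNativeFullStack:
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Exists
```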

04 Enable Kubernetes Monitoring in Dynatrace UI

The Operator handles agent deployment, but you also need to enable Kubernetes cluster monitoring in the Dynatrace platform itself.

1. Navigate to Kubernetes in your Dynatrace environment

   Go to Infrastructure → Kubernetes. Your cluster should appear automatically after the Operator connects.

2. Connect your cluster

   Click your cluster name → Settings. Toggle on "Monitor Kubernetes namespaces, pods, and workloads" and enable events ingest.

3. Configure namespace filtering (optional)

   If you don't want to monitor kube-system or other system namespaces, add exclusion rules in Kubernetes → Settings → Namespace exclusions.

4. Verify data is flowing

   Back on the Kubernetes overview, you should see nodes, pods, and workload health within 2–5 minutes of setup.
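If the cluster doesn't show up, the Operator and ActiveGate are the first places to look (a sketch; the ActiveGate StatefulSet name is assumed to be derived from the DynaKube name used in this guide):

```shell
# The Operator logs usually explain connection failures (bad token, blocked egress)
kubectl logs -n dynatrace deployment/dynatrace-operator --tail=50

# ActiveGate handles the Kubernetes API connection; check its pod is healthy
kubectl get statefulset -n dynatrace my-cluster-activegate
```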

05 Key Kubernetes Metrics to Monitor

Dynatrace automatically collects hundreds of metrics from your cluster. Here are the most critical ones to build alerts around:

| Metric | Description | Alert Threshold |
| --- | --- | --- |
| builtin:kubernetes.node.cpu.usage | Node CPU utilization % | > 85% for 5 min |
| builtin:kubernetes.node.memory.usage | Node memory usage % | > 90% for 5 min |
| builtin:kubernetes.pod.phase | Pod phase (Running/Pending/Failed) | Failed > 0 |
| builtin:kubernetes.workload.pods_ready | Ready pods vs desired | Ready < Desired |
| builtin:kubernetes.container.restarts | Container restart count | > 5 in 10 min |
| builtin:kubernetes.node.condition | Node readiness condition | NotReady = alert |
| builtin:kubernetes.pvc.usage | Persistent volume usage % | > 80% capacity |
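These metric keys can be queried directly via the Metrics API v2, which is handy for eyeballing realistic thresholds before wiring up alerts (a sketch: YOUR_ENV_ID and the token are placeholders, and the token needs the metrics.read scope):

```shell
# Query node CPU usage for the last hour via the Metrics API v2
curl -s -G "https://YOUR_ENV_ID.live.dynatrace.com/api/v2/metrics/query" \
  -H "Authorization: Api-Token YOUR_API_TOKEN" \
  --data-urlencode "metricSelector=builtin:kubernetes.node.cpu.usage" \
  --data-urlencode "from=now-1h"
# Returns a JSON result with one datapoint series per node
```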

06 Configure Alerting with Metric Events

Dynatrace's Anomaly Detection handles many alerts automatically via Davis® AI. But you can also define custom metric-based alerts for your specific SLOs.

Create a custom metric alert for pod restarts

Settings API · JSON body
{
  "schemaId": "builtin:anomaly-detection.metric-events",
  "value": {
    "enabled": true,
    "summary": "K8s Pod Crash Loop Detected",
    "queryDefinition": {
      "type": "METRIC_KEY",
      "metricKey": "builtin:kubernetes.container.restarts",
      "aggregation": "MAX",
      "dimensionFilter": []
    },
    "modelProperties": {
      "type": "STATIC_THRESHOLD",
      "threshold": 5,
      "alertCondition": "ABOVE",
      "violatingSamples": 3,
      "samples": 5
    },
    "eventTemplate": {
      "eventType": "ERROR_EVENT",
      "title": "Pod {dims:k8s.pod.name} is crash-looping",
      "description": "Container has restarted {value} times in the window"
    }
  }
}
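The JSON above is a Settings 2.0 object. One way to apply it is the settings objects endpoint, which accepts an array of objects each carrying a schemaId, scope, and value (a sketch; the abbreviated "value" below should be replaced with the full body shown above):

```shell
# Create the metric event via the Settings 2.0 API (token needs settings.write)
curl -s -X POST "https://YOUR_ENV_ID.live.dynatrace.com/api/v2/settings/objects" \
  -H "Authorization: Api-Token YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[{
        "schemaId": "builtin:anomaly-detection.metric-events",
        "scope": "environment",
        "value": { "enabled": true, "summary": "K8s Pod Crash Loop Detected" }
      }]'
# The response reports a per-object status (200 on success)
```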
⚠ Alert Fatigue Warning

Start with Davis® AI anomaly detection enabled before adding manual thresholds. Dynatrace's adaptive baselining learns normal behavior per-entity, significantly reducing false positives compared to static thresholds alone.

07 Leverage Davis® AI for Root Cause Analysis

This is where Dynatrace truly separates itself. Once your cluster is instrumented, Davis® continuously analyzes relationships between all monitored entities — and automatically determines what caused a problem.

🔍

Automatic Root Cause

Davis® correlates events across pods, nodes, services, and infrastructure to pinpoint the exact origin of a problem, not just its symptoms.

🌐

Smartscape Topology

Real-time dependency map of every K8s entity and their relationships. See instantly what depends on what across namespaces.

📉

Adaptive Baselining

Learns what "normal" looks like for each pod, deployment, and service — then alerts only on genuine anomalies, not noisy static thresholds.

💼

Business Impact Scoring

Davis® prioritizes problems by their downstream impact, so your on-call team always knows which alert to handle first.

💡 Davis® AI in Action — Example

A memory leak in one pod causes OOMKill → cascading restarts → latency spike in a downstream service → user-facing error rate increases. Davis® traces this entire chain automatically and surfaces it as a single root cause problem, not 6 separate alerts.

08 Build Kubernetes Dashboards

Dynatrace ships with pre-built Kubernetes dashboards you can use immediately. Go to Dashboards → Browse → Kubernetes to find templates for cluster overview, node capacity, and workload health.

Sample DQL query for a custom tile

Use Dynatrace Query Language (DQL) in Notebooks or Dashboards to build custom views:

DQL
// Top 10 pods by memory usage in the last 30 minutes
timeseries mem = avg(dt.kubernetes.container.memory_working_set),
  by: { k8s.pod.name, k8s.namespace.name },
  from: now()-30m
| sort arrayAvg(mem) desc
| limit 10

09 Best Practices & Common Pitfalls

Do: Use namespace exclusions

Exclude kube-system, kube-public, and your monitoring namespace from full instrumentation to reduce noise and agent overhead. They still appear in topology, but won't generate application-level alerts.

Do: Define SLOs in Dynatrace

Use Dynatrace SLO definitions to track service availability targets directly in the platform. Navigate to Service Level Objectives and create SLOs tied to your K8s service metrics — Davis® will alert if burn rate threatens your error budget.

Don't: Deploy OneAgent on spot/preemptible nodes without graceful drain

OneAgent needs a few seconds to flush data on shutdown. Add a preStop lifecycle hook with a 10-second sleep to your pods on spot nodes to avoid data gaps during node termination.
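The preStop hook mentioned above can be sketched as a pod spec fragment for your own workloads on spot nodes (the container name and image are placeholders, not part of the Dynatrace install):

```yaml
# Pod spec fragment: give in-flight telemetry time to flush before termination
containers:
  - name: app
    image: my-app:latest   # placeholder
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 10"]
```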

Don't: Ignore resource requests/limits on the Operator

In resource-constrained clusters, the Operator and ActiveGate can be throttled. Always set explicit resources.requests and resources.limits in your DynaKube spec as shown in Step 3.

✓ Quick Verification Checklist

After setup, verify:

1. All node OneAgent pods are in Running state.
2. Your cluster appears in Dynatrace → Infrastructure → Kubernetes.
3. Pods and namespaces are visible in the Kubernetes view.
4. A test problem event triggers and appears in the Davis® AI Problems feed.
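The cluster-side checks can be run in one pass from the terminal (a sketch; namespace and resource names follow this guide):

```shell
# DynaKube status reflects overall deployment health
kubectl get dynakube -n dynatrace

# One oneagent-* pod per node, all Running
kubectl get pods -n dynatrace

# Compare node count against the OneAgent pod count above
kubectl get nodes
```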