Running Kubernetes at scale introduces layers of complexity — pods crash, nodes go OOM, networking flaps, and PVCs get orphaned. Traditional monitoring tools weren't built for this dynamic, ephemeral environment. Dynatrace was.
With full-stack auto-instrumentation, automatic topology discovery via Smartscape, and AI-powered root cause analysis, Dynatrace gives you complete visibility into your cluster without endless manual configuration.
00 Architecture Overview
Before diving in, understand how Dynatrace integrates with Kubernetes. It's a layered approach that covers every component from the node down to individual containers.
The Dynatrace Operator manages the entire lifecycle of OneAgent on your cluster. ActiveGate acts as a proxy and pre-aggregator for cluster-level Kubernetes API data. OneAgent runs as a DaemonSet to instrument every node.
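For a concrete picture, here's roughly what those components look like on a cluster once the install below completes. Names and kinds vary a little between Operator versions and DynaKube names, so treat this as illustrative:

# Illustrative: workloads the Operator typically creates in the dynatrace
# namespace (exact names/kinds depend on Operator version and DynaKube name)
kubectl get deployments,daemonsets,statefulsets -n dynatrace

# dynatrace-operator       Deployment    manages the lifecycle
# dynatrace-webhook        Deployment    injects and validates configuration
# my-cluster-oneagent      DaemonSet     one instrumentation pod per node
# my-cluster-activegate    StatefulSet   K8s API monitoring + routing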
01 Prerequisites
Before deploying Dynatrace, make sure your environment meets these requirements:
Kubernetes cluster running v1.26+
EKS, GKE, AKS, or self-managed. Dynatrace supports all major distributions including OpenShift and Rancher.
kubectl configured with cluster-admin access
The Operator requires permissions to create namespaces, DaemonSets, ClusterRoles, and CRDs.
Helm 3.x installed
We'll use Helm to install the Dynatrace Operator. Install it with brew install helm on macOS, or follow the official Helm docs for other platforms.
Dynatrace environment URL + API token
Log into your Dynatrace tenant. Navigate to Settings → Access Tokens → Generate new token with the required scopes.
Your API token needs: metrics.ingest, logs.ingest, DataExport, InstallerDownload, and entities.read. For the Operator to manage cluster configuration, also add settings.write.
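Before moving on, it's worth a quick sanity check that your tooling meets the requirements above. A minimal sketch using standard kubectl and Helm commands:

# Confirm cluster version (needs v1.26+)
kubectl version

# Confirm you can create the cluster-scoped objects the Operator needs
kubectl auth can-i create namespaces
kubectl auth can-i create clusterroles
kubectl auth can-i create customresourcedefinitions

# Confirm Helm 3.x
helm version --short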
02 Install the Dynatrace Operator
The Dynatrace Operator is the recommended installation method for Kubernetes. It handles OneAgent deployment, configuration, and updates automatically.
Add the Helm repository
# Add Dynatrace Helm repo
helm repo add dynatrace https://raw.githubusercontent.com/Dynatrace/dynatrace-operator/main/config/helm/repos/stable

# Update your local Helm cache
helm repo update

# Create a dedicated namespace
kubectl create namespace dynatrace
Create API credentials as a Kubernetes Secret
kubectl create secret generic dynatrace-tokens \
  --namespace dynatrace \
  --from-literal=apiToken=YOUR_API_TOKEN \
  --from-literal=dataIngestToken=YOUR_DATA_INGEST_TOKEN
Install the Operator via Helm
helm install dynatrace-operator dynatrace/dynatrace-operator \
  -n dynatrace \
  --atomic

# Verify the operator pod is running
kubectl get pods -n dynatrace

# Expected output:
# NAME                                  READY   STATUS    RESTARTS   AGE
# dynatrace-operator-5d8b9f9d8c-xk2rp   1/1     Running   0          45s
03 Configure the DynaKube Custom Resource
The DynaKube is a Custom Resource Definition (CRD) that tells the Operator exactly how to deploy and configure Dynatrace on your cluster. This is the heart of your configuration.
apiVersion: dynatrace.com/v1beta1
kind: DynaKube
metadata:
  name: my-cluster
  namespace: dynatrace
spec:
  # Your Dynatrace environment URL
  apiUrl: https://YOUR_ENV_ID.live.dynatrace.com/api

  # Reference to your secret
  tokens: dynatrace-tokens

  # OneAgent - full-stack monitoring per node
  oneAgent:
    classicFullStack:
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Exists

  # ActiveGate for K8s API monitoring
  activeGate:
    capabilities:
      - kubernetes-monitoring
      - routing
      - metric-ingest
    resources:
      requests:
        memory: "512Mi"
        cpu: "250m"
      limits:
        memory: "1Gi"
        cpu: "500m"

  # Enrich ingested telemetry with Kubernetes metadata
  metadataEnrichment:
    enabled: true
# Apply the DynaKube configuration
kubectl apply -f dynakube.yaml

# Watch OneAgent pods come up on every node
kubectl get pods -n dynatrace -w

# You should see one oneagent-* pod per node:
# oneagent-7k2p9   1/1   Running   0   2m
# oneagent-bx8wq   1/1   Running   0   2m
# oneagent-jmn4r   1/1   Running   0   2m
Use classicFullStack for broad compatibility. If you're on a modern cluster and want a minimal footprint, cloudNativeFullStack uses the CSI driver to provide the agent binary to each pod instead of installing the full OneAgent on the host — better for constrained or locked-down environments. A sketch of the alternative spec follows.
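For reference, a minimal sketch of the cloudNativeFullStack variant — swap it in for the classicFullStack block from Step 3. Field availability depends on your Operator version, so check the DynaKube CRD reference before relying on it:

  # CSI-driver-based injection instead of a full host install
  oneAgent:
    cloudNativeFullStack:
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Exists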
04 Enable Kubernetes Monitoring in Dynatrace UI
The Operator handles agent deployment, but you also need to enable Kubernetes cluster monitoring in the Dynatrace platform itself.
Navigate to Kubernetes in your Dynatrace environment
Go to Infrastructure → Kubernetes. Your cluster should appear automatically after the Operator connects.
Connect your cluster
Click on your cluster name → Settings. Toggle on "Monitor Kubernetes namespaces, pods, and workloads" and enable events ingest.
Configure namespace filtering (optional)
If you don't want to monitor kube-system or other system namespaces, add exclusion rules in Kubernetes → Settings → Namespace exclusions.
Verify data is flowing
Back on the Kubernetes overview, you should see nodes, pods, and workload health within 2–5 minutes of setup.
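If the cluster doesn't appear, checking from the cluster side usually narrows it down. A few standard kubectl checks — apiUrl and token problems tend to surface in the Operator logs, though the exact log wording varies by version:

# Are all Dynatrace pods healthy?
kubectl get pods -n dynatrace

# Operator logs often surface apiUrl/token problems
kubectl logs -n dynatrace deployment/dynatrace-operator --tail=50

# Check the DynaKube status for connection state
kubectl describe dynakube my-cluster -n dynatrace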
05 Key Kubernetes Metrics to Monitor
Dynatrace automatically collects hundreds of metrics from your cluster. Here are the most critical ones to build alerts around:
| Metric | Description | Alert Threshold |
|---|---|---|
| builtin:kubernetes.node.cpu.usage | Node CPU utilization % | > 85% for 5 min |
| builtin:kubernetes.node.memory.usage | Node memory usage % | > 90% for 5 min |
| builtin:kubernetes.pod.phase | Pod phase (Running/Pending/Failed) | Failed > 0 |
| builtin:kubernetes.workload.pods_ready | Ready pods vs desired | Ready < Desired |
| builtin:kubernetes.container.restarts | Container restart count | > 5 in 10 min |
| builtin:kubernetes.node.condition | Node readiness condition | NotReady = alert |
| builtin:kubernetes.pvc.usage | Persistent volume usage % | > 80% capacity |
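To spot-check that any of these metrics are actually populated before wiring alerts to them, you can query the Metrics API v2 directly. A sketch — it assumes your token also carries the metrics.read scope, which isn't in the Step 1 list:

# Query the last 30 minutes of node CPU usage (sketch; URL-encode the
# selector if your shell requires it)
curl -s -H "Authorization: Api-Token YOUR_API_TOKEN" \
  "https://YOUR_ENV_ID.live.dynatrace.com/api/v2/metrics/query?metricSelector=builtin:kubernetes.node.cpu.usage&from=now-30m"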
06 Configure Alerting with Metric Events
Dynatrace's Anomaly Detection handles many alerts automatically via Davis® AI. But you can also define custom metric-based alerts for your specific SLOs.
Create a custom metric alert for pod restarts
{
"schemaId": "builtin:anomaly-detection.metric-events",
"value": {
"enabled": true,
"summary": "K8s Pod Crash Loop Detected",
"queryDefinition": {
"type": "METRIC_KEY",
"metricKey": "builtin:kubernetes.container.restarts",
"aggregation": "MAX",
"dimensionFilter": []
},
"modelProperties": {
"type": "STATIC_THRESHOLD",
"threshold": 5,
"alertCondition": "ABOVE",
"violatingSamples": 3,
"samples": 5
},
"eventTemplate": {
"eventType": "ERROR_EVENT",
"title": "Pod {dims:k8s.pod.name} is crash-looping",
"description": "Container has restarted {value} times in the window"
}
}
}
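You can push this payload with the Settings 2.0 API instead of clicking through the UI. A sketch, assuming metric-event.json holds the object above wrapped in the shape the API expects — an array of objects, each with an explicit scope alongside schemaId and value — and that your token has the settings.write scope from Step 1:

# metric-event.json wraps the object above in an array and adds a scope:
# [{ "schemaId": "builtin:anomaly-detection.metric-events",
#    "scope": "environment",
#    "value": { ...the "value" block above... } }]
curl -X POST "https://YOUR_ENV_ID.live.dynatrace.com/api/v2/settings/objects" \
  -H "Authorization: Api-Token YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d @metric-event.json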
Start with Davis® AI anomaly detection enabled before adding manual thresholds. Dynatrace's adaptive baselining learns normal behavior per-entity, significantly reducing false positives compared to static thresholds alone.
07 Leverage Davis® AI for Root Cause Analysis
This is where Dynatrace truly separates itself. Once your cluster is instrumented, Davis® continuously analyzes relationships between all monitored entities — and automatically determines what caused a problem.
Automatic Root Cause
Davis® correlates events across pods, nodes, services, and infrastructure to pinpoint the exact origin of a problem, not just its symptoms.
Smartscape Topology
Real-time dependency map of every K8s entity and their relationships. See instantly what depends on what across namespaces.
Adaptive Baselining
Learns what "normal" looks like for each pod, deployment, and service — then alerts only on genuine anomalies, not noisy static thresholds.
Business Impact Scoring
Davis® prioritizes problems by their downstream impact — so your on-call team always knows which alert to handle first.
A memory leak in one pod causes OOMKill → cascading restarts → latency spike in a downstream service → user-facing error rate increases. Davis® traces this entire chain automatically and surfaces it as a single root cause problem, not 6 separate alerts.
08 Build Kubernetes Dashboards
Dynatrace ships with pre-built Kubernetes dashboards you can use immediately. Go to Dashboards → Browse → Kubernetes to find templates for cluster overview, node capacity, and workload health.
Sample DQL query for a custom tile
Use Dynatrace Query Language (DQL) in Notebooks or Dashboards to build custom views:
// Top 10 pods by memory usage in the last 30 minutes
timeseries mem = avg(dt.kubernetes.container.memory_working_set),
  by: { k8s.pod.name, k8s.namespace.name },
  from: now()-30m
| sort arrayAvg(mem) desc
| limit 10
09 Best Practices & Common Pitfalls
Do: Use namespace exclusions
Exclude kube-system, kube-public, and your monitoring namespace from full instrumentation to reduce noise and agent overhead. They still appear in topology, but won't generate application-level alerts.
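Besides the UI exclusion rules from Step 4, the DynaKube spec supports a namespaceSelector that limits which namespaces receive injection. A sketch, assuming you label the namespaces you do want monitored — the field is documented in the DynaKube reference, and its semantics vary by deployment mode, so verify against your Operator version:

spec:
  # Only namespaces carrying this label get application-level
  # instrumentation; kube-system et al. simply never receive it
  namespaceSelector:
    matchLabels:
      monitoring: enabled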
Do: Define SLOs in Dynatrace
Use Dynatrace SLO definitions to track service availability targets directly in the platform. Navigate to Service Level Objectives and create SLOs tied to your K8s service metrics — Davis® will alert if burn rate threatens your error budget.
Don't: Deploy OneAgent on spot/preemptible nodes without graceful drain
OneAgent needs a few seconds to flush data on shutdown. Add a preStop lifecycle hook with a 10-second sleep to your pods on spot nodes to avoid data gaps during node termination.
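A minimal sketch of that hook on a workload's container spec — the 10-second figure follows the advice above; tune it to how long your pods need to flush:

# Fragment of a Deployment's container spec
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]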
Don't: Ignore resource requests/limits on the Operator
In resource-constrained clusters, the Operator and ActiveGate can be throttled. Always set explicit resources.requests and resources.limits in your DynaKube spec as shown in Step 3.
After setup, verify:
1. All node OneAgent pods are in Running state.
2. Your cluster appears in Dynatrace → Infrastructure → Kubernetes.
3. Pods and namespaces are visible in the Kubernetes view.
4. A test problem event triggers and appears in the Davis® AI Problems feed.