Pod DNS Error

Introduction

  • Pod-dns-error injects chaos to disrupt DNS resolution in Kubernetes pods.
  • It causes loss of access to services by blocking DNS resolution of hostnames/domains.

Scenario: DNS error for the target pod

Uses

View the uses of the experiment

coming soon

Prerequisites

Verify the prerequisites
  • Ensure that the Kubernetes version is > 1.16.
  • Ensure that the Litmus Chaos Operator is running by executing kubectl get pods in the operator namespace (typically, litmus). If not, install from here.
  • Ensure that the pod-dns-error experiment resource is available in the cluster by executing kubectl get chaosexperiments in the desired namespace. If not, install from here.

Default Validations

View the default validations

The application pods should be in running state before and after chaos injection.

Minimal RBAC configuration example (optional)

NOTE

If you are using this experiment as part of a Litmus workflow that is constructed, scheduled, and executed from chaos-center, then you may already be using the litmus-admin RBAC, which is pre-installed in the cluster as part of the agent setup.

View the Minimal RBAC permissions

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-dns-error-sa
  namespace: default
  labels:
    name: pod-dns-error-sa
    app.kubernetes.io/part-of: litmus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-dns-error-sa
  namespace: default
  labels:
    name: pod-dns-error-sa
    app.kubernetes.io/part-of: litmus
rules:
  # Create and monitor the experiment & helper pods
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create","delete","get","list","patch","update", "deletecollection"]
  # Performs CRUD operations on the events inside chaosengine and chaosresult
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create","get","list","patch","update"]
  # Fetch configmaps details and mount it to the experiment pod (if specified)
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get","list"]
  # Track and get the runner, experiment, and helper pods log
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get","list","watch"]
  # for creating and managing the execution of commands inside the target container
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["get","list","create"]
  # deriving the parent/owner details of the pod (if parent is any of {deployment, statefulset, daemonset})
  - apiGroups: ["apps"]
    resources: ["deployments","statefulsets","replicasets", "daemonsets"]
    verbs: ["list","get"]
  # deriving the parent/owner details of the pod(if parent is deploymentConfig)
  - apiGroups: ["apps.openshift.io"]
    resources: ["deploymentconfigs"]
    verbs: ["list","get"]
  # deriving the parent/owner details of the pod(if parent is deploymentConfig)
  - apiGroups: [""]
    resources: ["replicationcontrollers"]
    verbs: ["get","list"]
  # deriving the parent/owner details of the pod(if parent is argo-rollouts)
  - apiGroups: ["argoproj.io"]
    resources: ["rollouts"]
    verbs: ["list","get"]
  # for configuring and monitor the experiment job by the chaos-runner pod
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create","list","get","delete","deletecollection"]
  # for creation, status polling and deletion of litmus chaos resources used within a chaos workflow
  - apiGroups: ["litmuschaos.io"]
    resources: ["chaosengines","chaosexperiments","chaosresults"]
    verbs: ["create","list","get","patch","update","delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-dns-error-sa
  namespace: default
  labels:
    name: pod-dns-error-sa
    app.kubernetes.io/part-of: litmus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-dns-error-sa
subjects:
  - kind: ServiceAccount
    name: pod-dns-error-sa
    namespace: default
Use this sample RBAC manifest to create a chaosServiceAccount in the desired (app) namespace. This example consists of the minimum necessary role permissions to execute the experiment.

Experiment tunables

Check the experiment tunables

Optional Fields

| Variables | Description | Notes |
|---|---|---|
| TARGET_CONTAINER | Name of the container which is subjected to dns-error | None |
| TOTAL_CHAOS_DURATION | The time duration for chaos insertion (seconds) | Default (60s) |
| TARGET_HOSTNAMES | List of the target hostnames or keywords, e.g. '["litmuschaos"]' | If not provided, all hostnames/domains will be targeted |
| MATCH_SCHEME | Determines whether the DNS query has to match exactly with one of the targets or can have any of the targets as a substring | Can be either exact or substring; if not provided, it will be set as exact |
| PODS_AFFECTED_PERC | The percentage of total pods to target | Defaults to 0 (corresponds to 1 replica), provide numeric value only |
| CONTAINER_RUNTIME | Container runtime interface for the cluster | Defaults to containerd, supported values: docker |
| SOCKET_PATH | Path of the container runtime socket file | Defaults to /run/containerd/containerd.sock |
| LIB | The chaos lib used to inject the chaos | Default value: litmus, supported values: litmus |
| LIB_IMAGE | Image used to run the netem command | Defaults to litmuschaos/go-runner:latest |
| RAMP_TIME | Period to wait before and after injection of chaos (in seconds) | |
| SEQUENCE | Defines the sequence of chaos execution for multiple target pods | Default value: parallel. Supported: serial, parallel |
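
As a rough illustration of how several of the optional fields above fit together, the following sketch sets TARGET_CONTAINER, PODS_AFFECTED_PERC, SEQUENCE and RAMP_TIME in a single ChaosEngine. The container name (nginx) and the chosen values are assumptions for illustration only; the engine name, app label and service account reuse the naming of the other examples on this page.

```yaml
# illustrative sketch: combines several optional tunables from the table
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: pod-dns-error-sa
  experiments:
  - name: pod-dns-error
    spec:
      components:
        env:
        # name of the container subjected to the dns error (assumed name)
        - name: TARGET_CONTAINER
          value: 'nginx'
        # percentage of total pods to target, numeric value only
        - name: PODS_AFFECTED_PERC
          value: '50'
        # order of chaos execution across multiple target pods: serial or parallel
        - name: SEQUENCE
          value: 'serial'
        # period to wait before and after chaos injection (seconds)
        - name: RAMP_TIME
          value: '10'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```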

Experiment Examples

Common and Pod specific tunables

Refer to the common attributes and pod-specific tunables to tune the tunables shared by all experiments and those specific to pod experiments.

Target Host Names

It defines the list of target hostnames (or keywords) subjected to chaos, for example '["litmuschaos"]'. It can be tuned with the TARGET_HOSTNAMES ENV. If TARGET_HOSTNAMES is not provided, all hostnames/domains will be targeted.

Use the following example to tune this:

# contains the target host names for the dns error
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: pod-dns-error-sa
  experiments:
  - name: pod-dns-error
    spec:
      components:
        env:
        ## list of target host names or keywords
        ## if not provided, all hostnames/domains will be targeted
        - name: TARGET_HOSTNAMES
          value: '["litmuschaos"]'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
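
Because the value is written as a list, it should be possible to supply more than one target. The fragment below is a minimal sketch under that assumption; it shows only the env section, and "example.com" is a hypothetical second entry added for illustration.

```yaml
# env fragment only; the rest of the ChaosEngine matches the example above
env:
# list with more than one target; "example.com" is a hypothetical entry
- name: TARGET_HOSTNAMES
  value: '["litmuschaos","example.com"]'
- name: TOTAL_CHAOS_DURATION
  value: '60'
```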

Match Scheme

It determines whether the DNS query has to match exactly with one of the targets or can have any of the targets as a substring. It can be tuned with the MATCH_SCHEME ENV. It supports exact or substring values and is set to exact if not provided.

Use the following example to tune this:

# contains match scheme for the dns error
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: pod-dns-error-sa
  experiments:
  - name: pod-dns-error
    spec:
      components:
        env:
        ## it supports 'exact' and 'substring' values
        - name: MATCH_SCHEME
          value: 'exact'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
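
To illustrate the difference between the two schemes, the fragment below pairs MATCH_SCHEME with a keyword in TARGET_HOSTNAMES. Going by the description above, with substring any queried name that contains litmuschaos (for instance a hypothetical docs.litmuschaos.io) would be expected to fail resolution, while with exact only a query for litmuschaos itself would match. Only the env section is shown, and the behaviour described is a reading of the tunable descriptions rather than a verified result.

```yaml
# env fragment only; the rest of the ChaosEngine matches the example above
env:
# keyword to look for in DNS queries
- name: TARGET_HOSTNAMES
  value: '["litmuschaos"]'
# match any query that contains the keyword as a substring
- name: MATCH_SCHEME
  value: 'substring'
- name: TOTAL_CHAOS_DURATION
  value: '60'
```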

Container Runtime Socket Path

The CONTAINER_RUNTIME and SOCKET_PATH ENVs set the container runtime and the path of its socket file.

  • CONTAINER_RUNTIME: It supports the docker runtime only.
  • SOCKET_PATH: It contains the path of the runtime socket file. Defaults to /run/containerd/containerd.sock.

Use the following example to tune this:

## provide the container runtime and socket file path
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: pod-dns-error-sa
  experiments:
  - name: pod-dns-error
    spec:
      components:
        env:
        # runtime for the container
        # supports docker
        - name: CONTAINER_RUNTIME
          value: 'containerd'
        # path of the socket file
        - name: SOCKET_PATH
          value: '/run/containerd/containerd.sock'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
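
For a cluster whose nodes run docker, the same two ENVs would point at the docker socket instead. The fragment below is a sketch under the assumption that the docker daemon listens on its conventional socket path, /var/run/docker.sock; verify the actual path on your nodes before using it. Only the env section is shown.

```yaml
# env fragment only; the rest of the ChaosEngine matches the example above
env:
# runtime for the container
- name: CONTAINER_RUNTIME
  value: 'docker'
# path of the socket file (assumed conventional docker socket location)
- name: SOCKET_PATH
  value: '/var/run/docker.sock'
- name: TOTAL_CHAOS_DURATION
  value: '60'
```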