Last Updated: 2021-08-14
In this tutorial, you will get an overview of the chaos experimentation flow using the LitmusChaos framework. We will inject a pod-delete fault into a sample microservices application called podtato-head and verify whether the service remains available throughout the chaos duration.
Litmus can be installed using either of the methods below.
Install Litmus using Helm
1. Create a Litmus namespace in Kubernetes:
kubectl create ns litmus
2. Add the Litmus Helm repo:
helm repo add litmuschaos https://litmuschaos.github.io/litmus-helm/
3. Install Litmus:
helm install litmuschaos --namespace litmus litmuschaos/litmus
Install Litmus using kubectl
1. Create a Litmus namespace in Kubernetes:
kubectl create ns litmus
2. Install Litmus from the manifest:
kubectl apply -f https://litmuschaos.github.io/litmus/2.12.0/litmus-2.12.0.yaml
Check whether the Litmus control plane components, comprising the Web UI (frontend), the GraphQL server, the authentication server and MongoDB, have been created and are running successfully:
kubectl get pods -n litmus
NAME                                        READY   STATUS    RESTARTS   AGE
litmusportal-auth-server-7594bf7c46-td8jv   1/1     Running   0          47m
litmusportal-frontend-5fb5bb84f7-tcf95      1/1     Running   0          47m
litmusportal-server-858584d5b9-r8kkn        1/1     Running   0          47m
mongo-0                                     1/1     Running   0          47m
Note: It might take a couple of minutes for the components to enter the running state.
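If you prefer to check readiness from a script rather than by eye, the small shell function below is one way to do it. It is a sketch that assumes the default column layout of kubectl get pods (a header row, then NAME, READY, STATUS, ...); the function name pods_not_ready is our own and is not part of Litmus or kubectl.

```shell
# Print the names of pods that are not yet fully ready.
# Reads the plain-text table printed by "kubectl get pods" on stdin and
# skips its header row. A pod counts as ready when STATUS is Running and
# both halves of READY (ready/total containers) are equal.
pods_not_ready() {
  awk 'NR > 1 {
    split($2, ready, "/")   # READY looks like "<ready>/<total>"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}
```

Usage: kubectl get pods -n litmus | pods_not_ready. Empty output means every listed pod is Running and fully ready. kubectl also has a built-in alternative that blocks until the pods are ready: kubectl wait --for=condition=Ready pods --all -n litmus --timeout=300s.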
In this section, we will use the ChaosCenter dashboard to set up your first chaos project, verify the installation of chaos delegate services on the cluster and prepare for chaos scenario execution.
Obtain the endpoint of the litmusportal-frontend-service, either via its nodePort or its LoadBalancer URL, and open it in your preferred browser. In the case of a nodePort, use the external IP of any one of the cluster nodes together with the assigned nodePort.
kubectl get svc -n litmus
NAME                               TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)                         AGE
litmusportal-auth-server-service   NodePort       10.76.20.219   <none>           9003:31241/TCP,3030:31248/TCP   49m
litmusportal-frontend-service      LoadBalancer   10.76.21.27    34.136.203.195   9091:32733/TCP                  49m
litmusportal-server-service        NodePort       10.76.23.74    <none>           9002:30707/TCP,8000:30327/TCP   49m
mongo-headless-service             ClusterIP      None           <none>           27017/TCP                       49m
mongo-service                      ClusterIP      10.76.26.27    <none>           27017/TCP                       49m
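To script this step, the sketch below reads the same table and prints a URL for the frontend. It assumes the frontend service is named litmusportal-frontend-service, as in the output above, and that it exposes a single port; frontend_url and its optional node-IP argument are hypothetical helpers of ours.

```shell
# Build the ChaosCenter URL from "kubectl get svc -n litmus" output (stdin).
# LoadBalancer: use EXTERNAL-IP with the service port.
# Otherwise: use the node IP given as the first argument with the nodePort.
frontend_url() {
  local node_ip="${1:-}"
  awk -v node_ip="$node_ip" '$1 == "litmusportal-frontend-service" {
    split($5, port, "[:/]")   # "9091:32733/TCP" -> 9091, 32733, TCP
    if ($2 == "LoadBalancer" && $4 != "<pending>" && $4 != "<none>")
      printf "http://%s:%s\n", $4, port[1]        # external IP + service port
    else
      printf "http://%s:%s\n", node_ip, port[2]   # node IP + nodePort
  }'
}
```

For the sample output above, kubectl get svc -n litmus | frontend_url prints http://34.136.203.195:9091. For a nodePort setup, pass the external IP of a node (listed by kubectl get nodes -o wide) as the first argument. If you only need the LoadBalancer address, kubectl get svc litmusportal-frontend-service -n litmus -o jsonpath='{.status.loadBalancer.ingress[0].ip}' returns it directly.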
Use the default credentials to log in:
Username: admin
Password: litmus
You will then be asked to configure a new password.
You have been assigned a default project with Owner permissions.
Once the project is created, the cluster is automagically registered as a chaos target via the installation of the Chaos Delegate. It is represented as "Self-Agent" in the Delegate console of the ChaosCenter dashboard. Verify that the Chaos Delegate components are running alongside the control plane:
kubectl get pods -n litmus
NAME                                        READY   STATUS    RESTARTS   AGE
chaos-exporter-567c7cc8b-96pkf              1/1     Running   0          36m
chaos-operator-ce-89b4475f6-smnnl           1/1     Running   0          36m
event-tracker-64566957b7-ks5pm              1/1     Running   0          36m
litmusportal-auth-server-7594bf7c46-td8jv   1/1     Running   0          47m
litmusportal-frontend-5fb5bb84f7-tcf95      1/1     Running   0          47m
litmusportal-server-858584d5b9-r8kkn        1/1     Running   0          47m
mongo-0                                     1/1     Running   0          47m
subscriber-7b7c598677-d2lf9                 1/1     Running   0          36m
workflow-controller-5c7fc6fcdd-st7n4        1/1     Running   0          36m
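The delegate check can be scripted as well. The sketch below reports which Chaos Delegate components do not yet have a Running pod. The component list is copied from the sample output above and is an assumption that may not hold for other Litmus versions; missing_delegate_components is a hypothetical helper.

```shell
# Chaos Delegate components seen in the sample output (assumed list).
DELEGATE_COMPONENTS="chaos-exporter chaos-operator-ce event-tracker subscriber workflow-controller"

# Print each component that has no Running pod in the
# "kubectl get pods -n litmus" output supplied on stdin.
missing_delegate_components() {
  local pods component
  pods=$(cat)
  for component in $DELEGATE_COMPONENTS; do
    # Deployment pods are named <deployment>-<replicaset-hash>-<suffix>
    if ! printf '%s\n' "$pods" |
        awk -v c="$component" '$3 == "Running" && $1 ~ ("^" c "-[a-z0-9]+-[a-z0-9]+$") { found = 1 }
                               END { exit !found }'; then
      echo "$component"
    fi
  done
}
```

Usage: kubectl get pods -n litmus | missing_delegate_components. For the output above it prints nothing, because every component has a Running pod.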
At this point, we are ready to run chaos experiments!
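In this section, we will execute a pre-defined chaos scenario to inject a pod-delete/kill fault into a sample microservices application. The chaos scenario is configured to install the podtato-head application, inject the pod-delete fault while an HTTP probe checks that the application stays reachable, and then revert the chaos resources and remove the application.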
Click "Schedule Chaos Scenario" on the ChaosCenter home page to get started.
Select "Self-Agent" as the target cluster for chaos execution.
Select the option to create a new chaos scenario from the pre-defined chaos scenario templates. Among the available templates, select "podtato-head".
View the description of the chaos scenario.
View the preview of the chaos scenario.
(Optional) At this stage, you can also edit the chaos scenario YAML to suit your needs.
(Optional) You can also add a steady-state hypothesis for the chaos scenario.
(Optional) Verify the hypothesis for the chaos scenario.
Assign weights to the chaos experiments that are part of the chaos scenario using the slider. This is useful when there are multiple experiments in the chaos scenario. These weights influence the "Resilience Score" calculation for the chaos scenario.
Schedule the chaos scenario for immediate, one-time execution.
Click "Finish" to launch the chaos scenario.
The chaos scenario is now running.
Check the Chaos Scenario Progress
Click "Show the chaos scenario" to open the scenario graph and check the progress of each step.
Each chaos scenario consists of a set of well-defined steps. The most common ones install the ChaosExperiment template (which holds the low-level details of the fault to be executed) and create the ChaosEngine custom resource (which maps a given application instance to the fault and specifies the steady-state hypothesis). The ChaosEngine, in turn, triggers the creation of the chaos pods.
In the current example, Litmus deploys the sample multi-replica podtato-head application before pulling the pod-delete ChaosExperiment template. In the next step, it creates the ChaosEngine, which launches the chaos injection via dedicated pods.
The ChaosEngine spec embedded into the chaos scenario spec looks like this:
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: podtato-head-pod-delete-chaos
  namespace: litmus
  labels:
    instance_id: b9d98f68-ac25-4690-adf3-6b83cc7febce
    context: litmus_podtato-head
    workflow_name: podtato-head-1628954811
spec:
  appinfo:
    appns: litmus
    applabel: 'app=podtato-head'
    appkind: 'deployment'
  annotationCheck: 'false'
  engineState: 'active'
  chaosServiceAccount: litmus-admin
  jobCleanUpPolicy: 'retain'
  components:
    runner:
      imagePullPolicy: Always
  experiments:
    - name: pod-delete
      spec:
        probe:
          - name: "check-podtato-head-access-url"
            type: "httpProbe"
            httpProbe/inputs:
              url: "http://podtato.litmus.svc.cluster.local:9000"
              insecureSkipVerify: false
              method:
                get:
                  criteria: "=="
                  responseCode: "200"
            mode: "Continuous"
            runProperties:
              probeTimeout: 1
              interval: 1
              retry: 1
        components:
          env:
            # set chaos duration (in sec) as desired
            - name: TOTAL_CHAOS_DURATION
              value: '30'
            # set chaos interval (in sec) as desired
            - name: CHAOS_INTERVAL
              value: '10'
            # pod failures without '--force' & default terminationGracePeriodSeconds
            - name: FORCE
              value: 'false'
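The two timing variables in this spec determine roughly how many times a pod is deleted. Assuming the pod-delete fault removes a pod about once every CHAOS_INTERVAL seconds until TOTAL_CHAOS_DURATION seconds have elapsed (our simplified reading of the experiment, not a documented guarantee), the number of rounds is about the ceiling of their ratio. estimate_chaos_rounds is a hypothetical helper used only to show the arithmetic.

```shell
# Rough estimate of pod-delete rounds, assuming one deletion roughly every
# CHAOS_INTERVAL seconds until TOTAL_CHAOS_DURATION has elapsed.
# Arguments: TOTAL_CHAOS_DURATION and CHAOS_INTERVAL, both in seconds.
estimate_chaos_rounds() {
  local total="$1" interval="$2"
  if [ "$interval" -le 0 ]; then
    echo "CHAOS_INTERVAL must be a positive number of seconds" >&2
    return 1
  fi
  # Integer ceiling division: ceil(total / interval)
  echo $(( (total + interval - 1) / interval ))
}
```

With the values above, estimate_chaos_rounds 30 10 prints 3, so expect to see around three replica deletions during the run.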
Watch the action on the cluster as a replica of the podtato-head application is killed:
kubectl get pods -n litmus
NAME                                        READY   STATUS              RESTARTS   AGE
chaos-exporter-6bb94bc74f-wkbmw             1/1     Running             0          40m
chaos-operator-ce-6bc9c65776-642gl          1/1     Running             0          40m
event-tracker-c8c7cdfb5-tkmqd               1/1     Running             0          40m
litmusportal-frontend-5c96c9468c-84bb8      1/1     Running             0          40m
litmusportal-auth-server-7594bf7c46-td8jv   1/1     Running             0          47m
litmusportal-server-6cb8456cb-lh2s9         2/2     Running             0          40m
mongo-0                                     1/1     Running             0          40m
pod-delete-fgnx23-zvght                     1/1     Running             0          26s
podtato-hats-5f6c4d9ff-v7ckw                1/1     Running             0          11m
podtato-hats-new-787797c7fd-fgfg2           1/1     Running             0          11m
podtato-head-1628954811-1662063892          2/2     Running             0          33s
podtato-head-1628954811-693279641           0/2     Completed           0          77s
podtato-head-1628954811-740786171           0/2     Completed           0          71s
podtato-left-arm-647c44c49f-kz9wm           1/1     Running             0          11m
podtato-left-leg-5544c7c88c-kxgxf           1/1     Running             0          11m
podtato-main-7bcb959bd8-4trlh               1/1     Running             0          9m28s
podtato-main-7bcb959bd8-clj8t               1/1     Terminating         0          18s
podtato-main-7bcb959bd8-nqdqg               0/1     ContainerCreating   0          0s
podtato-main-pod-delete-chaos9wxpv-runner   1/1     Running             0          29s
podtato-right-arm-98bdff545-jst9f           1/1     Running             0          11m
podtato-right-leg-68bb97548f-j2p26          1/1     Running             0          11m
subscriber-75bd7565cf-shl7t                 1/1     Running             1          40m
workflow-controller-78dbff5b5d-xcthg        1/1     Running             0          40m
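To follow the fault from a terminal, it helps to tally the application's pods by status. The sketch below counts the pods of one Deployment by STATUS. It assumes the usual <deployment>-<replicaset-hash>-<suffix> pod naming, which also keeps the chaos runner pod (podtato-main-pod-delete-chaos9wxpv-runner) out of the count; summarize_app_pods is a hypothetical helper.

```shell
# Count the pods of one Deployment by STATUS, reading "kubectl get pods"
# output from stdin. Only names shaped like <deployment>-<hash>-<suffix>
# match, so helper pods such as the chaos runner are excluded.
summarize_app_pods() {
  local deployment="$1"
  awk -v d="$deployment" '$1 ~ ("^" d "-[a-z0-9]+-[a-z0-9]+$") { n[$3]++ }
    END { for (s in n) print s, n[s] }' | LC_ALL=C sort
}
```

Fed the snapshot above, kubectl get pods -n litmus | summarize_app_pods podtato-main prints "ContainerCreating 1", "Running 1" and "Terminating 1" on separate lines, which suggests one podtato-main replica keeps serving while the deleted one is replaced.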
Once the experiment completes, the chaos resources (ChaosEngine CR & accompanying chaos pods) are removed as part of a "revert-chaos" step. The sample application is also removed from the cluster.
View the Experiment Results
The scenario dashboard can be used to view the results of the chaos experiment. Click the "pod-delete" node on the graph to launch a results console. Click the "Chaos Results" tab to view details on the success or failure of the steady-state hypothesis constraints (availability of the podtato-head website throughout the pod deletion period) and the experiment verdict.
Check the Resilience Score achieved by the chaos scenario: open the "Show Statistics" drop-down to view the experiment statistics and the resilience score of the scenario.
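For reference, the sketch below computes a weighted Resilience Score from per-experiment results. It assumes the score is the weighted average of each experiment's probe success percentage, using the weights assigned with the slider while scheduling the scenario. This is our reading of the Litmus documentation rather than the ChaosCenter implementation, so verify it against your Litmus version; resilience_score and its weight:percentage argument format are hypothetical.

```shell
# Weighted Resilience Score, assuming
#   score = sum(weight_i * probe_success_i) / sum(weight_i)
# Arguments: one "weight:probe_success_percentage" pair per experiment.
resilience_score() {
  printf '%s\n' "$@" | awk -F: '
    { weighted += $1 * $2; total += $1 }
    END {
      if (total == 0) exit 1   # no weights given: the score is undefined
      printf "%.2f\n", weighted / total
    }'
}
```

For example, resilience_score 10:100 5:50 prints 83.33. With a single experiment, as in this tutorial, the score reduces to that experiment's probe success percentage (resilience_score 10:100 prints 100.00).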
In this tutorial, we have covered the steps involved in the installation and setup of LitmusChaos and the execution of a sample chaos experiment. Refer to the Litmus documentation to learn more about the platform.
Please visit us on our Litmus Slack channel (in the Kubernetes workspace) and tell us how you like LitmusChaos and this tutorial! We are happy to hear your thoughts & suggestions!
Also, make sure to follow us on Twitter to get the latest news on LitmusChaos, our tutorials and newest releases!