
AWS SSM Chaos By ID

Introduction

  • AWS SSM Chaos By ID contains chaos to disrupt the state of infra resources. The experiment induces chaos on AWS EC2 instances using the Amazon SSM Run Command. This is carried out through SSM docs, which define the actions that Systems Manager performs on your managed instances (those with the SSM agent installed).
  • It causes chaos (such as stress, network, disk, or IO) on AWS EC2 instances with the given instance ID(s), using SSM docs, for a specified chaos duration.
  • By default, the experiment uses SSM docs for stress-chaos; you can add your own SSM docs through a configMap (.spec.definition.configMaps) in the chaosexperiment CR.
  • It tests deployment sanity (replica availability & uninterrupted service) and recovery workflows of the target application pod (if provided).
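
As a rough sketch of the custom SSM docs option mentioned above, the fragment below mounts a ConfigMap into the experiment through .spec.definition.configMaps and points the DOCUMENT_* variables (listed under Optional Fields) at it. The ConfigMap name, mount path, document name, and file name are all hypothetical, and the assumption that DOCUMENT_PATH takes a file path should be checked against your Litmus version.

```yaml
# Fragment of the ChaosExperiment CR; other fields omitted
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: aws-ssm-chaos-by-id
spec:
  definition:
    configMaps:
    - name: custom-ssm-doc          # hypothetical ConfigMap holding the SSM document
      mountPath: /mnt/ssm-docs      # hypothetical mount path
    env:
    - name: DOCUMENT_NAME
      value: 'My-Custom-SSM-Doc'    # hypothetical document name
    - name: DOCUMENT_FORMAT
      value: 'YAML'
    - name: DOCUMENT_TYPE
      value: 'Command'
    - name: DOCUMENT_PATH
      value: '/mnt/ssm-docs/custom-ssm-doc.yml'   # hypothetical file inside the mount
```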

Scenario: AWS SSM Chaos


Uses

View the uses of the experiment

coming soon

Prerequisites

Verify the prerequisites
  • Ensure that the Kubernetes version is > 1.16
  • Ensure that the Litmus Chaos Operator is running by executing kubectl get pods in the operator namespace (typically, litmus). If not, install from here
  • Ensure that the aws-ssm-chaos-by-id experiment resource is available in the cluster by executing kubectl get chaosexperiments in the desired namespace. If not, install from here
  • Ensure that you have the required AWS access and that your target EC2 instances have an IAM instance profile attached. To learn more, check out the Systems Manager Docs.
  • Create a Kubernetes secret holding the AWS access configuration (key) in the CHAOS_NAMESPACE. A sample secret file looks like:

    apiVersion: v1
    kind: Secret
    metadata:
      name: cloud-secret
    type: Opaque
    stringData:
      cloud_config.yml: |-
        # Add the cloud AWS credentials respectively
        [default]
        aws_access_key_id = XXXXXXXXXXXXXXXXXXX
        aws_secret_access_key = XXXXXXXXXXXXXXX
    
  • If you change the secret key name (from cloud_config.yml), also update the AWS_SHARED_CREDENTIALS_FILE ENV value in experiment.yaml to use the same name.
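
To make the link between the secret key name and the credentials path concrete, here is a sketch of the relevant part of experiment.yaml. It assumes the key was renamed to aws_credentials.yml (a hypothetical name) and that the secret is mounted at /tmp/, which is consistent with the documented default of /tmp/cloud_config.yml; verify the actual mount path in your own experiment.yaml.

```yaml
# Fragment of experiment.yaml (ChaosExperiment CR); other fields omitted
spec:
  definition:
    env:
    # file name must match the key used inside the secret
    - name: AWS_SHARED_CREDENTIALS_FILE
      value: /tmp/aws_credentials.yml   # hypothetical renamed key
    secrets:
    - name: cloud-secret
      mountPath: /tmp/                  # assumed mount path
```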

Default Validations

View the default validations
  • EC2 instance should be in healthy state.

Minimal RBAC configuration example (optional)

NOTE

If you are using this experiment as part of a litmus workflow constructed, scheduled & executed from chaos-center, then you may be using the litmus-admin RBAC, which is pre-installed in the cluster as part of the agent setup.

View the Minimal RBAC permissions
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-ssm-chaos-by-id-sa
  namespace: default
  labels:
    name: aws-ssm-chaos-by-id-sa
    app.kubernetes.io/part-of: litmus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aws-ssm-chaos-by-id-sa
  labels:
    name: aws-ssm-chaos-by-id-sa
    app.kubernetes.io/part-of: litmus
rules:
# Create and monitor the experiment & helper pods
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create","delete","get","list","patch","update", "deletecollection"]
# Performs CRUD operations on the events inside chaosengine and chaosresult
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create","get","list","patch","update"]
# Fetch configmaps & secrets details and mount it to the experiment pod (if specified)
- apiGroups: [""]
  resources: ["secrets","configmaps"]
  verbs: ["get","list"]
# Track and get the runner, experiment, and helper pods log
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get","list","watch"]
# for creating and managing the execution of commands inside the target container
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["get","list","create"]
# for configuring and monitoring the experiment job by the chaos-runner pod
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["create","list","get","delete","deletecollection"]
# for creation, status polling and deletion of litmus chaos resources used within a chaos workflow
- apiGroups: ["litmuschaos.io"]
  resources: ["chaosengines","chaosexperiments","chaosresults"]
  verbs: ["create","list","get","patch","update","delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: aws-ssm-chaos-by-id-sa
  labels:
    name: aws-ssm-chaos-by-id-sa
    app.kubernetes.io/part-of: litmus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: aws-ssm-chaos-by-id-sa
subjects:
- kind: ServiceAccount
  name: aws-ssm-chaos-by-id-sa
  namespace: default

Use this sample RBAC manifest to create a chaosServiceAccount in the desired (app) namespace. This example consists of the minimum necessary role permissions to execute the experiment.

Experiment tunables

Check the experiment tunables

Mandatory Fields

Variables | Description | Notes
EC2_INSTANCE_ID | Instance ID of the target EC2 instance. Multiple IDs can be provided as comma-separated values | Multiple IDs can be provided as id1,id2
REGION | The region name of the target instance |

Optional Fields



Variables | Description | Notes
TOTAL_CHAOS_DURATION | The total time duration for chaos insertion (sec) | Defaults to 30s
CHAOS_INTERVAL | The interval (in sec) between successive chaos injections | Defaults to 60s
AWS_SHARED_CREDENTIALS_FILE | Path to the AWS secret credentials file | Defaults to /tmp/cloud_config.yml
DOCUMENT_NAME | Name of the added SSM docs (if not using the default docs) | Defaults to LitmusChaos-AWS-SSM-Doc
DOCUMENT_FORMAT | Format of the SSM docs. It can be YAML or JSON | Defaults to YAML
DOCUMENT_TYPE | Document type of the added SSM docs (if not using the default docs) | Defaults to Command
DOCUMENT_PATH | Document path, if the docs were added using configmaps | Defaults to the litmus ssm docs path
INSTALL_DEPENDENCIES | Whether to install the dependencies used to run stress-ng with the default docs. It can be either True or False | Defaults to True
NUMBER_OF_WORKERS | Number of workers used to run stress-chaos with the default SSM docs | Defaults to 1
MEMORY_PERCENTAGE | Memory consumption, in percentage, on the instance for the default SSM docs | Defaults to 80
CPU_CORE | Number of CPU cores used to run stress-chaos on EC2 with the default SSM docs | Defaults to 0, which means all available CPU cores on the instance are consumed
SEQUENCE | Sequence of chaos execution for multiple instances | Defaults to parallel. Supported: serial, parallel
RAMP_TIME | Period to wait before and after injection of chaos (in sec) |
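
As an illustration of how the stress-related tunables above combine, the ChaosEngine fragment below overrides a few of them for the default SSM docs. The instance ID is a placeholder and the values are arbitrary examples chosen for this sketch, not recommended settings.

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: aws-ssm-chaos-by-id-sa
  experiments:
  - name: aws-ssm-chaos-by-id
    spec:
      components:
        env:
        - name: EC2_INSTANCE_ID
          value: 'instance-01'                      # placeholder instance ID
        - name: REGION
          value: '<region of the EC2_INSTANCE_ID>'
        # stress two CPU cores instead of all of them (default 0 = all cores)
        - name: CPU_CORE
          value: '2'
        # target 50% memory consumption instead of the default 80
        - name: MEMORY_PERCENTAGE
          value: '50'
        - name: NUMBER_OF_WORKERS
          value: '2'
        # keep installing the stress-ng dependencies used by the default docs
        - name: INSTALL_DEPENDENCIES
          value: 'True'
```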

Experiment Examples

Common and AWS-SSM specific tunables

Refer to the common attributes and the AWS-SSM specific tunables to tune the tunables common to all experiments and those specific to AWS-SSM.

Stress Instances By ID

It contains a comma-separated list of instance IDs subjected to SSM chaos. It can be tuned via the EC2_INSTANCE_ID ENV.

Use the following example to tune this:

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: aws-ssm-chaos-by-id-sa
  experiments:
  - name: aws-ssm-chaos-by-id
    spec:
      components:
        env:
        # comma-separated list of ec2 instance id(s)
        # all instances should belong to the same region (REGION)
        - name: EC2_INSTANCE_ID
          value: 'instance-01,instance-02'
        # region of the ec2 instance
        - name: REGION
          value: '<region of the EC2_INSTANCE_ID>'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
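
When several instance IDs are listed, the SEQUENCE tunable from the Optional Fields table selects serial or parallel execution. The variant below is a sketch that targets the instances serially and adds a ramp time; the instance IDs are placeholders and the RAMP_TIME value is an arbitrary example.

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: aws-ssm-chaos-by-id-sa
  experiments:
  - name: aws-ssm-chaos-by-id
    spec:
      components:
        env:
        - name: EC2_INSTANCE_ID
          value: 'instance-01,instance-02'          # placeholder instance IDs
        - name: REGION
          value: '<region of the EC2_INSTANCE_ID>'
        # run chaos on multiple instances in serial (default is parallel)
        - name: SEQUENCE
          value: 'serial'
        # wait 60s between successive chaos injections
        - name: CHAOS_INTERVAL
          value: '60'
        # wait 30s before and after chaos injection (example value)
        - name: RAMP_TIME
          value: '30'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```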