Where To Start

How to get started practicing Chaos Engineering in 5 simple steps

  1. Identify your top 5 critical services

  2. Choose one of these critical services

  3. Whiteboard the service with your team

  4. Select the Recommended Gremlin Scenario (based on your use cases)

  5. Determine the magnitude: number of servers/length of time
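The five steps above can be captured as a small experiment plan. The sketch below is only one possible way to record it and is not part of any Gremlin API: the `ExperimentPlan` class, its field names, the `blast_radius` helper, and the example service name `checkout` are all hypothetical, chosen for illustration. It treats magnitude as the two dimensions named in step 5, the number of servers affected and the length of time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentPlan:
    """Hypothetical record of one planned experiment (not a Gremlin type)."""

    service: str           # the critical service chosen in step 2
    scenario: str          # the scenario selected in step 4
    fleet_size: int        # total servers running the service
    target_servers: int    # magnitude: number of servers to affect
    duration_seconds: int  # magnitude: length of time

    def __post_init__(self) -> None:
        # Reject plans that target no servers or more servers than exist.
        if not 0 < self.target_servers <= self.fleet_size:
            raise ValueError("target_servers must be between 1 and fleet_size")
        if self.duration_seconds <= 0:
            raise ValueError("duration_seconds must be positive")

    def blast_radius(self) -> float:
        """Fraction of the fleet the experiment touches."""
        return self.target_servers / self.fleet_size


# Example: one server out of an assumed fleet of 20, for one minute.
plan = ExperimentPlan(
    service="checkout",                 # hypothetical service name
    scenario="Unavailable dependency",  # one of the use cases listed in this guide
    fleet_size=20,
    target_servers=1,
    duration_seconds=60,
)
print(f"{plan.service}: {plan.blast_radius():.0%} of fleet for {plan.duration_seconds}s")
```

Writing the magnitude down explicitly makes it easy to review with the team during the whiteboard session before anything is run.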

What use cases are you focusing on for the next 12 months?

  1. Slow dependency

  2. Unavailable dependency

  3. Starved resources

  4. Auto-scaling for peak traffic

  5. Host failure within a fleet or cluster

  6. Disaster recovery (region failover)

  7. Alerting and monitoring observability

  8. Time-based issues (certificates and DST)

  9. Training for on-call

  10. Application request failures (ALFI)

  11. Playbook validation

What business goals are you focusing on for the next 12 months?

  1. Regulatory compliance

  2. Increase brand trust and reduce customer churn

  3. Validate disaster recovery

  4. Reduce incidents

  5. Automate Chaos Engineering experiments in a build pipeline

  6. Reduce downtime for customers

  7. Migration to a new technology platform

  8. Migration to a Public Cloud

What is your business justification for practicing Chaos Engineering?

  1. Utilize Chaos Engineering as part of your automated test coverage

  2. Lower costs from timely migration or DR testing

  3. Increase engineering velocity

  4. Reduce customer churn

  5. Reduce lost revenue from outages