...

  1. Identify your top 5 critical services

  2. Choose one of these critical services:

    1. Monitoring & alerting (Datadog & PagerDuty)

    2. Cache (Redis or Memcached)

    3. Payments

  3. Whiteboard the service with your team

  4. Select the Gremlin Scenario:

    1. Validate Autoscaling

    2. Unavailable Dependency

    3. Host/Container Failure

  5. Determine the magnitude: number of servers/length of time
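
The five steps above can be written down as a small plan record before anything is run. This is only a sketch: the `ExperimentPlan` class, its field names, and the scenario identifiers are hypothetical, chosen for illustration, and are not part of Gremlin's API.

```python
from dataclasses import dataclass

# Scenario names mirror the list above; the identifiers themselves are hypothetical.
SCENARIOS = {
    "validate_autoscaling",
    "unavailable_dependency",
    "host_container_failure",
}


@dataclass(frozen=True)
class ExperimentPlan:
    service: str           # the critical service chosen in step 2
    scenario: str          # the scenario selected in step 4
    host_count: int        # magnitude: number of servers (step 5)
    duration_seconds: int  # magnitude: length of time (step 5)

    def __post_init__(self):
        # Reject plans that are incomplete or have no bounded magnitude.
        if self.scenario not in SCENARIOS:
            raise ValueError(f"unknown scenario: {self.scenario}")
        if self.host_count < 1:
            raise ValueError("host_count must be at least 1")
        if self.duration_seconds < 1:
            raise ValueError("duration_seconds must be at least 1")

    def summary(self) -> str:
        return (
            f"{self.scenario} on {self.service}: "
            f"{self.host_count} host(s) for {self.duration_seconds}s"
        )


plan = ExperimentPlan("cache", "unavailable_dependency", host_count=1, duration_seconds=60)
print(plan.summary())
```

Writing the magnitude down explicitly makes it easier to start small (one host, one minute) and widen the blast radius only after the team has reviewed the first run.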

...

What is the value of Chaos Engineering?

Find your monitoring gaps, improve signal-to-noise
“We’ll get paged if that breaks”, until you don’t.
A false sense of security is worse than nothing.

Validate Upstream & Downstream Dependencies
Validate that each new service can fail independently.
Protect against cascading failures and knock-on effects.
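
One way to picture "failing independently" is a caller that degrades gracefully when a dependency is unavailable instead of passing the failure on. The sketch below assumes a hypothetical `get_with_fallback` helper and a simulated cache outage; it is not taken from any particular library.

```python
def get_with_fallback(fetch, fallback_value):
    """Call fetch(); if the dependency is unreachable, return a fallback instead."""
    try:
        return fetch()
    except (ConnectionError, TimeoutError):
        # The dependency failed; contain the failure here rather than cascade it.
        return fallback_value


def broken_cache_lookup():
    # Simulates the "Unavailable Dependency" scenario for a cache.
    raise ConnectionError("cache unreachable")


def healthy_cache_lookup():
    return "cached-value"


degraded = get_with_fallback(broken_cache_lookup, fallback_value=None)
normal = get_with_fallback(healthy_cache_lookup, fallback_value=None)
print(degraded, normal)
```

A chaos experiment then checks that the real service behaves like the degraded path here when the dependency is actually taken away.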

Train your teams
We run fire drills and train firefighters and first responders.
Are you investing in your operations teams?

Get A Good Night’s Sleep
Too often we can’t get a good night’s sleep because our pager wakes us up in the middle of the night. Use Chaos Engineering to reduce incidents and increase the time you spend sleeping in your bed!

...