Getting Started with Chaos Engineering

Quick Start Guide

  1. Read about the basic concepts of Chaos Engineering (concise version)

  2. Obtain your Gremlin team credentials (as team manager)

  3. Install a Gremlin agent

    • Note: use the Gremlin team credentials provided by your Gremlin team manager

  4. Log in to app.gremlin.com

    • Note: if you don't have a Gremlin user account, ask your Gremlin team manager for one

  5. Read "Infrastructure Layer Attacks"

  6. Create and run your first Chaos Engineering experiment

  7. Consult the Gremlin documentation (self-learning) for more details

Beginner's Guide

  1. Run through the "Quick Start Guide" above

  2. Read an introduction to Chaos Engineering

  3. Read "How to Map Out Your Application’s Critical Path"

  4. Read "What is Fault Injection?"

  5. Read "Introduction to Game Days"

  6. Read "How to Run a Game Day"

  7. Register for a free hands-on beginner's 101 bootcamp training (2h)
    (strongly recommended for all new Gremlin users by the Chaos Engineering Working Group)

    • introduction to the basic concept of Chaos Engineering:

      1. how to approach introducing chaos engineering to a new product / system

      2. how to identify good components to test and think about what tests to run

    • exercising game days with various experiments and outcomes

      1. illustrating Gremlin tools, practices, and metrics in Chaos Engineering

    • optional:
      become a Gremlin-certified Chaos Engineering Practitioner (30 min)

Elaborated Introduction to Chaos Engineering

  1. Run through the "Beginner's Guide" above

  2. Read "Chaos Engineering: The History, Principles, and Practice"

  3. Read "Chaos Engineering Adoption Guide"

  4. Read "Introduction to Game Days"

  5. Read "Planning Your Own Game Day"

    • Game Day workbook

  6. Nominate your Game Day Crew

  7. Example: Internal Chaos Engineering Report - 10x Reduction In Incidents

  8. Watch the webinar "Automating Chaos Engineering in your CI/CD Environments" (45 min, 15-Dec-2020)

Manager's Guide to Chaos Engineering

  1. Read "Why run a Chaos Day?" (aka Game Day)

  2. Read "How to Convince Your Organization to Adopt Chaos Engineering"

  3. Suggested addition: the KPIs article (https://www.gremlin.com/blog/the-kpis-of-improved-reliability/), since managers and executives will be the ones defining those KPIs