Paving the Road for Proactive Reliability
At Expedia Group, we have thousands of engineers and microservices. Over the years, heterogeneity in our infrastructure and technologies created inefficiencies and a need for a set of automated best practices for our engineering teams. Over the past two years, using a data-driven approach, we’ve created a set of platforms that help teams adopt good reliability practices, including chaos engineering, release safety, and automatic failover between cloud regions. In this talk we will cover the platforms we’ve built, including how we used data to drive our investment decisions. We’ll also describe how they are integrated with other internal systems such as observability and continuous delivery. Finally, we’ll explain how, with the right buy-in from leadership, we got teams to adopt a proactive reliability mindset, helping us prevent incidents and better prepare our teams for them.