Aaron Brady

Senior Production Engineer


Presentation: Move to the Cloud, Double in Size, or Automate MySQL Scaling: Pick Three

Hear about Shopify's experience of moving to a cloud-based platform using Kubernetes while simultaneously scaling up and introducing automation into their systems. Learn the tools and processes Shopify used to complete their migration and rapidly gain confidence with brand-new systems in production. This talk specifically covers databases, which are an often overlooked part of the picture when organizations adopt Kubernetes for their applications.

Travis Dunn

Chief Technology Officer


Presentation: How DraftKings Solves the Microservices Murder Mystery with Circuit Breakers

When you adopt a microservices architecture in the cloud, things can and do go wrong. After DraftKings migrated from a monolithic architecture to one based on microservices, they faced some unique load challenges—including massive traffic spikes anytime a popular athlete scores, as users rush to their phones to check their fantasy sports scores. Learn how DraftKings developed a custom circuit breaker framework called Ground Fault that has improved resilience, monitoring, and problem diagnosis.
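
For readers new to the pattern, here is a minimal sketch of a generic circuit breaker. It is not DraftKings' Ground Fault framework, whose design the abstract does not describe. The class name, parameters (`max_failures`, `reset_after`, `clock`), and exception type are all hypothetical, chosen only for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical minimal breaker: closed -> open -> half-open -> closed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before tripping
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None             # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of piling load onto a struggling dependency.
                raise CircuitOpenError("dependency is failing; call skipped")
            # Cool-down elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            raise
        # Any success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(fetch_scores, player_id)` where `fetch_scores` is a stand-in for any downstream call, and treat `CircuitOpenError` as a signal to serve a fallback.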

Liz Fong-Jones

Staff Site Reliability Engineer


Presentation: Solving Reliability Fears with Service Level Objectives

Service level objectives and error budgets are the cornerstone of Site Reliability Engineering and a critical tool for organizations to find an appropriate balance between reliability and rates of feature development. In this talk, you will learn from the Google Customer Reliability Engineering team how to set and measure useful service level indicators and objectives for needs ranging from interactive, latency-sensitive, query-based systems to batch throughput-oriented systems. You will learn how to set high-signal-to-noise-ratio alerting based on the error budget, and how to make longer-term changes to development priorities if your budget is overspent or underspent.
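
As a rough illustration of the arithmetic behind an error budget, the sketch below computes how many bad events an SLO allows and how fast a recent window is spending that budget. The function names and the example numbers are assumptions made for this sketch, not material from the talk.

```python
def error_budget(slo_target, total_events, bad_events):
    """Return (allowed_bad, remaining) for one SLO period.

    slo_target is a fraction below 1.0, e.g. 0.999 for "99.9% of requests good".
    """
    allowed_bad = (1.0 - slo_target) * total_events
    remaining = allowed_bad - bad_events
    return allowed_bad, remaining


def burn_rate(slo_target, window_total, window_bad):
    """Budget spend rate in a window; 1.0 means spending exactly on budget."""
    if window_total == 0:
        return 0.0
    observed_error_rate = window_bad / window_total
    allowed_error_rate = 1.0 - slo_target
    return observed_error_rate / allowed_error_rate


# Illustrative numbers only: a 99.9% SLO over 1,000,000 requests allows
# about 1,000 bad requests; 400 observed failures leaves about 600.
allowed, remaining = error_budget(0.999, 1_000_000, 400)

# A recent window with a 1% error rate burns the budget about 10x too fast,
# the kind of signal an alert based on the error budget can page on.
recent = burn_rate(0.999, 10_000, 100)
```

Alerting on burn rate rather than on raw error counts is one way to get the high signal-to-noise ratio the abstract mentions: a brief blip barely moves the budget, while a sustained elevated rate does.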

Calvin French-Owen

CTO & Co-Founder


Presentation: Fool Me Once: Building a Culture of "Shared Outages"

We all understand that outages happen. But the leading cause of regressions isn't poorly designed or implemented testing, alerting, or monitoring. Instead it’s a lack of “shared knowledge.” In this talk, Calvin French-Owen walks through the play-by-play of how his team responded to an actual production outage. He also explores Segment's incident response culture and dives deep into the tools they use to achieve a culture of shared knowledge.

Stacy Gorelik

Director of Engineering

Flatiron Health

Presentation: Nuts and Bolts of Building a Platform Team

In most companies, the move to building platforms that serve internal rather than external customers comes at a time when products fail to scale and development costs rise. But there isn't a one-size-fits-all solution when it comes to kickstarting the process of platformization. In this talk, Stacy Gorelik uses the story of building out a platform organization at Flatiron Health to illustrate strategies for fostering effective platform teams and shifting them from firefighting to innovation.

Sovit Jain

Sr. Engineering Manager

Capital One

Brian Chan

Master Software Engineer

Capital One

Presentation: Using Serverless Technologies for Banking Statement Workflows

Serverless technologies are playing an increasingly significant role in modernization efforts. At Capital One, we had the opportunity to completely redesign the engine that generates customers' banking statements as part of the bank's modernization journey. In this talk, we'll share our goals and lessons learned as we built an entirely new solution, which handles everything from the loading of data into Amazon RDS to customer-driven synchronous API calls, using serverless technologies like AWS Lambda and Step Functions. The new, loosely coupled system has allowed us to not only adopt more modern technologies, but to automate most of the formerly manual business processes involved.

Kirk Kaiser



Workshop: Hands on with Distributed Tracing and Datadog APM

Tracing is a specialized form of logging that is designed to work effectively in large, distributed environments. When done right, tracing follows the path of a request across process and service boundaries. This provides a big step up in application observability and can help a developer understand why certain requests are slow or why they behaved unexpectedly. This tutorial will familiarize users with the benefits of tracing and describe a general toolkit for emitting traces from applications in a minimally intrusive way. We will walk through a simple example app, which receives an HTTP request, and gradually instrument it to be observable via traces. We will discuss language constructs that can generate traces (namely decorators, monkey-patching, and context managers) and give users pointers on how they might add tracing to their own applications and libraries. In the process, users will become familiar with the existing standards for modeling traces, and some of the challenges involved in adhering to this model in a distributed, asynchronous environment.

Brent Langston

Senior Developer Advocate

Amazon Web Services

Workshop: Kubernetes the AWSome Way!

Kubernetes is a popular cloud-native, open-source orchestration platform for container management, scaling, and automated deployment. It includes a rich set of features such as service discovery, multi-tenancy, stateful containers, resource usage monitoring, and rolling updates. This workshop will get you started with operating a Kubernetes cluster on AWS and explains how to deploy applications to that cluster.
In this code-driven workshop, you will learn how to package, deploy, scale and monitor your application using Kubernetes and the AWS cloud.

Tiffany Low

Tech Lead, Shared Services


Willie Yao

Eng. Manager, Observability


Presentation: Journey to a Service-Oriented Architecture at 1,000-Engineer-Scale

In the evolution of a company’s architecture, there is an inflection point at which the increase in productivity and scalability of a decoupled service-oriented architecture outweighs the investment necessary to adopt a new architecture. At Airbnb, we are in the midst of a multi-year plan to feature-freeze our monolith at a time when we’ve already scaled to 1,000 engineers and have to support the high-velocity feature development and growth of 3 different businesses (Homes, Experiences, and Lux). We will share Airbnb’s learnings from adopting a service-oriented architecture under an environment of demanding development velocity and high organizational headwind.

Brian Lucas

Senior Staff Engineer


Presentation: Avoiding Continuous Disintegration

How do companies quickly deliver changes to complex applications while maintaining a high level of quality? Optimizely's Brian Lucas will discuss how industry leaders like Amazon, Netflix, and Facebook use experimentation to efficiently test releases before making them generally available. He will cover how integrating experimentation techniques, such as feature flagging and traffic splitting, into continuous delivery processes lets engineering teams implement quality safeguards while still being the first to bring new features to customers.
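
One common way to implement traffic splitting behind a feature flag is deterministic hashing, sketched below. This is a generic illustration, not Optimizely's implementation. The function names, the flag key, and the 10,000-bucket granularity are assumptions made for the example.

```python
import hashlib


def bucket(flag_key, user_id, buckets=10_000):
    """Map a (flag, user) pair to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def is_enabled(flag_key, user_id, rollout_percent):
    """True if this user falls inside the rollout (0 to 100 percent)."""
    # 10,000 buckets give 0.01% resolution: percent * 100 buckets are "on".
    return bucket(flag_key, user_id) < rollout_percent * 100


# Hypothetical flag: expose the new checkout flow to 10% of users.
show_new_checkout = is_enabled("new-checkout", "user-1234", 10)
```

Because the bucket depends only on the flag key and the user ID, a user sees the same variant on every request, and raising the percentage only adds users without reshuffling those already included.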

Ed Luna

Solutions Engineer


Workshop: Continuous Automation and Compliance with Chef & InSpec

This workshop will provide an overview of the capabilities of Chef Automate for compliance automation. We will cover how to initially configure the Chef Automate compliance server, perform compliance scans against Windows and Linux nodes, remediate compliance issues with Chef, and run compliance reports. Learn how to use InSpec to create and modify compliance profiles. Attendees should bring a Wi-Fi-enabled laptop to the workshop. The laptop should have SSH/SCP (OpenSSH, PuTTY/WinSCP, or equivalent) and the Microsoft Remote Desktop Client (RDP). Most Windows laptops come with RDP pre-installed.

Johan Mjönes

Senior Software Engineer

EA Dice

Presentation: Surviving Blockbuster Game Releases at EA

AAA game launches are huge, worldwide events that don't give you the luxury of gradual load increases or even hockey-stick growth. DICE, the studio behind the Battlefield series and Star Wars: Battlefront, has significant experience in this area with yearly AAA releases. Hear Johan talk about the challenges of surviving the initial peak of a blockbuster game launch while still remaining elastic enough to be cost-effective as the peaks in demand subside.

Daniel Rieder

Sr. Manager, Capacity & Performance


Anatoly Mikhaylov

Senior Operations Engineer


Presentation: Rediscovering the Hidden Capacity Within Your Systems

How much capacity to deploy is a critical business decision, but how well do you understand what that capacity is being used for? Find out how Zendesk analyzed customer behavior to discover capacity that we didn’t know we had. This talk covers techniques for analyzing and characterizing customer traffic. We will explore strategies for partnering with customers to improve their experience while reducing the need to deploy more capacity.

Brian Scanlan

Engineering Manager


Presentation: Volunteers, Not Conscripts: Fixing Out-of-hours On-call

Uptime matters. At Intercom, we believe that keeping our product online and working well at all times is critical to the success of our business. Out-of-hours on-call is inherently disruptive to your life as an engineer. You need to be ready to respond quickly and competently to an alert about something being broken. This means having a decent Internet connection, a computer, power for the computer, whatever you’re using for 2FA, and passwords available. Over time, we realized that we had ended up with an on-call setup that we weren’t proud of, and had a number of problems to solve. There were too many people on-call at any one time. The quality of alarms and runbooks was inconsistent across teams, and review processes for new and existing alarms were ad hoc. We decided to solve these problems by creating a new virtual team that would take over all out-of-hours on-call work, consisting of volunteers, not conscripts, from teams across the engineering organization. This talk goes into the process we applied, the positive impact on our on-call, and lessons learned.

Matt Williams



Workshop: Datadog 101

Transform yourself from a monitoring novice to a Datadog expert with hands-on training led by the engineers who build, maintain, and support Datadog. We’ll share best practices for building insightful dashboards and visualizations, tips for effective alerting, and dive into container monitoring with Autodiscovery. Attendees will leave with hands-on experience using these techniques that they can bring home to their own environments for more effective monitoring.

Bruce Wong

Director, Engineering


James Burns

Software Architect


Presentation: Another Journey of Chaos Engineering

Chaos engineering is here to stay. There's a thriving community, numerous open source projects, a few books, even a startup. Companies are hiring chaos engineers and creating entire teams focused on chaos engineering. This talk is about strategies for launching a chaos engineering movement at your company, as well as the challenges and results you can expect. Learn how chaos engineering and observability together let teams move faster and more safely. Bruce Wong and James Burns share their experiences launching the practice of chaos engineering first at Netflix, later at Twilio, and most recently at Stitch Fix.

More speakers and workshops will be announced soon!