Senior Production Engineer
Presentation: Move to the Cloud, Double in Size, or Automate MySQL Scaling: Pick Three
Hear about Shopify's experience of moving to a cloud-based platform using Kubernetes while simultaneously scaling up and introducing automation into their systems. Learn the tools and processes Shopify used to complete their migration and rapidly gain confidence with brand-new systems in production. This talk specifically covers databases, which are an often overlooked part of the picture when organizations adopt Kubernetes for their applications.
Chief Technology Officer
Presentation: How DraftKings Solves the Microservices Murder Mystery with Circuit Breakers
When you adopt a microservices architecture in the cloud, things can and do go wrong. After DraftKings migrated from a monolithic architecture to one based on microservices, they faced some unique load challenges—including massive traffic spikes anytime a popular athlete scores, as users rush to their phones to check their fantasy sports scores. Learn how DraftKings developed a custom circuit breaker framework called Ground Fault that has improved resilience, monitoring, and problem diagnosis.
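For readers unfamiliar with the pattern, the following is a minimal sketch of a generic circuit breaker. It is not Ground Fault, whose design is not described in this abstract; the class name, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are all assumptions made for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical, simplified breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self._clock = clock                         # injectable so tests can fake time
        self._failures = 0
        self._opened_at = None                      # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast instead of piling more load onto a struggling dependency.
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: half-open, so this one trial call is let through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # A success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each downstream request, for example `breaker.call(fetch_scores, player_id)`, where `fetch_scores` is a hypothetical function, and treat `CircuitOpenError` as a signal to serve a cached or degraded response.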
Staff Site Reliability Engineer
Presentation: Solving Reliability Fears with Service Level Objectives
Service level objectives and error budgets are the cornerstone of Site Reliability Engineering and a critical tool for organizations to find an appropriate balance between reliability and rates of feature development. In this talk, you will learn from the Google Customer Reliability Engineering team how to set and measure useful service level indicators and objectives for needs ranging from interactive, latency-sensitive, query-based systems to batch throughput-oriented systems. You will learn how to set high-signal-to-noise-ratio alerting based on the error budget, and how to make longer-term changes to development priorities if your budget is overspent or underspent.
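As a rough illustration of the arithmetic behind error budgets, here is a small sketch. The function names, the event-count formulation, and the example numbers are assumptions chosen for illustration, not material from the talk.

```python
def error_budget(slo_target, total_events, bad_events):
    """Summarize error-budget usage for an event-based SLO.

    slo_target is the fraction of events that must be good, e.g. 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    # The budget is the number of bad events the SLO tolerates in the period.
    allowed_bad = (1.0 - slo_target) * total_events
    consumed = bad_events / allowed_bad if allowed_bad else float("inf")
    return {
        "allowed_bad": allowed_bad,
        "consumed": consumed,         # fraction of the budget already spent
        "remaining": 1.0 - consumed,  # negative means the budget is overspent
    }


def burn_rate(slo_target, window_total, window_bad):
    """Ratio of the observed error rate in a window to the rate the SLO permits.

    A value of 1.0 spends the budget exactly over the SLO period; higher
    values exhaust it proportionally sooner.
    """
    if window_total == 0:
        return 0.0
    return (window_bad / window_total) / (1.0 - slo_target)


# Example: a 99.9% SLO over 1,000,000 requests tolerates about 1,000 failures,
# so 250 failures spend roughly a quarter of the budget.
report = error_budget(0.999, 1_000_000, 250)
```

Alerting on a sustained high `burn_rate` rather than on individual errors is one common way to keep alerts high-signal; the threshold to page on is a policy choice that this sketch does not prescribe.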
CTO & Co-Founder
Presentation: Fool Me Once: Building a Culture of "Shared Outages"
We all understand that outages happen. But the leading cause of regressions isn't poorly designed or implemented testing, alerting, or monitoring. Instead, it’s a lack of “shared knowledge.” In this talk, Calvin French-Owen walks through the play-by-play of how his team responded to an actual production outage. He also explores Segment's incident response culture and dives deep into the tools they use to achieve a culture of shared knowledge.
Director of Engineering
Presentation: Nuts and Bolts of Building a Platform Team
In most companies, the move to building platforms that serve internal rather than external customers comes at a time when products fail to scale and development costs rise. But there isn't a one-size-fits-all solution when it comes to kickstarting the process of platformization. In this talk, Stacy Gorelik uses the story of building out a platform organization at Flatiron Health to illustrate strategies for fostering effective platform teams and shifting them from firefighting to innovation.
Presentation: Using Serverless Technologies for Banking Statement Workflows
Serverless technologies are playing an increasingly significant role in modernization efforts. At Capital One, we had the opportunity to completely redesign the engine that generates customers' banking statements as part of the bank's modernization journey. In this talk, we'll share our goals and lessons learned as we built an entirely new solution, which handles everything from the loading of data into Amazon RDS to customer-driven synchronous API calls, using serverless technologies like AWS Lambda and Step Functions. The new, loosely coupled system has allowed us to not only adopt more modern technologies, but to automate most of the formerly manual business processes involved.
Workshop: Hands on with Distributed Tracing and Datadog APM
Tracing is a specialized form of logging that is designed to work effectively in large, distributed environments. When done right, tracing follows the path of a request across process and service boundaries. This provides a big step up in application observability and can help a developer understand why certain requests are slow or why they behaved unexpectedly. This tutorial will familiarize users with the benefits of tracing and describe a general toolkit for emitting traces from applications in a minimally intrusive way. We will walk through a simple example app, which receives an HTTP request, and gradually instrument it to be observable via traces. We will discuss language constructs that can generate traces (namely decorators, monkey-patching, and context managers) and give users pointers on how they might add tracing to their own applications and libraries. In the process, users will become familiar with the existing standards for modeling traces, and some of the challenges involved in adhering to this model in a distributed, asynchronous environment.
Senior Developer Advocate
Amazon Web Services
Workshop: Kubernetes the AWSome Way!
Kubernetes is a popular cloud-native open-source orchestration platform for container management, scaling, and automated deployment. It includes a rich set of features such as service discovery, multi-tenancy, stateful containers, resource usage monitoring, and rolling updates. This workshop will get you started with operating a Kubernetes cluster on AWS. It also explains how to deploy applications to this cluster.
In this code-driven workshop, you will learn how to package, deploy, scale and monitor your application using Kubernetes and the AWS cloud.
Presentation: Journey to a Service-Oriented Architecture at 1,000-Engineer-Scale
In the evolution of a company’s architecture, there is an inflection point at which the increase in productivity and scalability of a decoupled service-oriented architecture outweighs the investment necessary to adopt a new architecture. At Airbnb, we are in the midst of a multi-year plan to feature-freeze our monolith at a time when we’ve already scaled to 1,000 engineers and have to support the high-velocity feature development and growth of three different businesses (Homes, Experiences, and Lux). We will share Airbnb’s learnings from adopting a service-oriented architecture in an environment of demanding development velocity and strong organizational headwinds.
Senior Staff Engineer
Presentation: Avoiding Continuous Disintegration
How do companies quickly deliver changes to complex applications while maintaining a high level of quality? Optimizely's Brian Lucas will discuss how industry leaders like Amazon, Netflix, and Facebook use experimentation to efficiently test releases before making them generally available. He will cover how integrating experimentation techniques, such as feature flagging and traffic splitting, into continuous delivery processes lets engineering teams implement quality safeguards while still being the first to bring new features to customers.
Workshop: Continuous Automation and Compliance with Chef & InSpec
This workshop will provide an overview of the capabilities of Chef Automate for compliance automation. We will cover how to initially configure the Chef Automate compliance server, perform compliance scans against Windows and Linux nodes, remediate compliance issues with Chef, and run compliance reports. Learn how to use InSpec to create and modify compliance profiles. Attendees should bring a Wi-Fi-enabled laptop to the workshop. The laptop should have SSH/SCP (OpenSSH, PuTTY/WinSCP, or equivalent) and the Microsoft Remote Desktop Client (RDP). Most Windows laptops come with RDP pre-installed.
Senior Software Engineer
Presentation: Surviving Blockbuster Game Releases at EA
AAA game launches are huge, worldwide events that don't give you the luxury of gradual load increases or even hockey-stick growth. DICE, the studio behind the Battlefield series and Star Wars: Battlefront, has significant experience in this area with yearly AAA releases. Hear Johan talk about the challenges of surviving the initial peak of a blockbuster game launch while still remaining elastic enough to be cost-effective as the peaks in demand subside.
Sr. Manager, Capacity & Performance
Senior Operations Engineer
Presentation: Rediscovering the Hidden Capacity Within Your Systems
How much capacity to deploy is a critical business decision, but how well do you understand what that capacity is being used for? Find out how Zendesk analyzed customer behavior to discover capacity that we didn’t know we had. This talk covers techniques for analyzing and characterizing customer traffic. We will explore strategies for partnering with customers to improve their experience while reducing the need to deploy more capacity.
Presentation: Volunteers, Not Conscripts: Fixing Out-of-hours On-call
Uptime matters. At Intercom, we believe that keeping our product online and working well at all times is critical to the success of our business. Out-of-hours on-call is inherently disruptive to your life as an engineer. You need to be ready to respond quickly and competently to an alert about something being broken. This means having a decent Internet connection, a computer, power for the computer, whatever you’re using for 2FA, and passwords available. However, we realized that we had ended up with an on-call setup that we weren’t proud of, and had a number of problems to solve. There were too many people on-call at any one time. The quality of alarms and runbooks was inconsistent across teams, and there were only ad hoc review processes for new and existing alarms. We decided to attempt to solve these problems by creating a new virtual team that would take over all out-of-hours on-call work, consisting of volunteers, not conscripts, from teams across the engineering organization. This talk goes into the process we applied, the positive impact on our on-call, and lessons learned.
Workshop: Datadog 101
Transform yourself from a monitoring novice to a Datadog expert with hands-on training led by the engineers who build, maintain, and support Datadog. We’ll share best practices for building insightful dashboards and visualizations, tips for effective alerting, and dive into container monitoring with Autodiscovery. Attendees will leave with hands-on experience using these techniques that they can bring home to their own environments for more effective monitoring.
Presentation: Another Journey of Chaos Engineering
Chaos engineering is here to stay. There's a thriving community, numerous open source projects, a few books, even a startup. Companies are hiring chaos engineers and creating entire teams focused on chaos engineering. This talk is about strategies for launching a chaos engineering movement at your company, as well as the challenges and results you can expect. Learn how chaos engineering and observability together let teams move faster and safer. Bruce Wong and James Burns share their experiences launching the practice of chaos engineering first at Netflix, later at Twilio, and most recently at Stitch Fix.
More speakers and workshops will be announced soon!