Move to the Cloud, Double in Size, or Automate MySQL Scaling: Pick Three

July 12
12:50pm–01:30pm

Aaron Brady
Senior Production Engineer
Shopify

In 2017 the Shopify Datastores team had three requirements:

Move to the cloud:
Shopify was moving from our self-hosted platform (Docker and dedicated servers) to Kubernetes and virtual machines.

For Datastores, the MySQL team, that meant trading long-lived expensive hardware (pets) for disposable virtual machines (cattle) and challenging our assumptions about the stability of the network, performance of disks, and life cycle of our machines.

We had to provide a database service that was more resilient to network and hardware events that happen in a public cloud, and take advantage of opportunities that come from obtaining machines with an API call (instead of a purchase order).

Maintain all of our existing systems, while they nearly double in size:
Creating a new Shopify in the cloud didn't mean the existing infrastructure magically went away. It was a gradual process of moving out of the data center in a sustainable way.

We had to maintain and improve the existing infrastructure to keep serving our customers, including through our best Black Friday weekend sales ever, while building out its replacement.

Automate all of the things:
Our databases in the cloud were going to be made of smaller machines, but far more of them. Shopify is sharded into several smaller datasets, and the number of machines has increased 10x since the start of our move (including our natural growth).

It was no longer going to be possible to do maintenance by hand. Formerly rare, labor-intensive tasks like creating a new shard went from a quarterly event to something we would do ten times in a single day.

It's not enough to just build a Shopify in the cloud. We had to move 12 years of existing merchant data without stopping the service, and we had to automate that work while running in multiple locations. We had to do all of this without burning out the team from pager fatigue or losing data, and with a better level of confidence and monitoring than we had in the data centers.

This is the story of our journey, the tools that we had to build to get here, and the processes we used to gain confidence with them.
