Deploying to 10 or 20 servers can be complicated. But what are some of the tools, tips and patterns for successfully deploying to thousands of servers around the world? Last month I participated in an online panel on the subject of Huge Scale Deployments, as part of Continuous Discussions (#c9d9), a series of community panels about Agile, Continuous Delivery and DevOps hosted by Electric Cloud. You can watch a recording of the panel:
The discussion progressed through a short deck of slides with big-ticket titles. Plenty of interesting things were said (mostly by other people), but my main insight was that the practices discussed are applicable at any scale. Here are a few edited extracts from my musings on the video.
How do you practice huge scale deployments? How did the teams at Amazon make it work?
“I was a software developer at a little-known Amazon development center in Edinburgh, Scotland. … at Amazon, teams are responsible for provisioning environments, continuous deployment, everything on their own. Because we knew that we were picking up the pieces if it went wrong, we were going in making sure our deployment was good in early stages of the pipeline, because we had responsibility. When people have responsibility they make sure that things don’t break.”
“With regard to incentives – at Amazon there’s no need to dangle carrots, the teams know that everything that happens, designing, provisioning for expected loads, all the way to fixing problems in production, is their responsibility. So everyone practices ‘if it hurts do it more often’. It’s a natural human tendency to back off if there’s pain, but in software development, if something causes you pain you need to really work at that, you’re just not good enough at it yet.”
Fidelity of environments
“It would really be nice to have fidelity of environments, but what I’m seeing more and more with customers is that they have virtualised, containerised environments in dev … and then you get to this legacy of hardware and external services still managed using spreadsheets, by teams that call themselves DevOps now, but essentially they are infrastructure teams, with sign-offs and lots of paper being pushed around. Everybody knows you need high fidelity environments, otherwise you get pain. But the stakeholders are still not paying for it. I try to explain to customers that this pain is also costing a lot of money … that’s not an easy thing to change.”
Fidelity of process
“Here too I would say, ‘yes, please, it would be nice’. I was consulting with a client … parts of the environment were owned by [different] teams … with different processes – that’s painful. Let’s have a single automated process to deploy to any environment.”
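To make the idea of a single automated process a little more concrete, here is a minimal sketch in Python. It is not taken from the panel or from any real deployment tool: the `Environment` class, the `STEPS` list, the host names and the version string are all hypothetical. The point is only that the steps are defined once, and the environment is nothing more than data passed in.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Environment:
    """Hypothetical description of a target environment: just data."""
    name: str
    hosts: Tuple[str, ...]


# One ordered list of steps, shared by every environment (illustrative names).
STEPS = ("stop service", "install {version}", "start service", "health check")


def deployment_plan(env: Environment, version: str) -> List[str]:
    """Build the ordered list of actions for one environment.

    The same steps run in the same order everywhere; only the host
    list and the environment name differ.
    """
    plan = []
    for host in env.hosts:
        for step in STEPS:
            plan.append(f"[{env.name}] {host}: {step.format(version=version)}")
    return plan


# Assumed example environments; real ones would come from configuration.
DEV = Environment("dev", ("dev-1",))
PROD = Environment("prod", ("prod-1", "prod-2"))

if __name__ == "__main__":
    for line in deployment_plan(PROD, "1.4.2"):
        print(line)
```

Because dev and production go through the same function, a step that breaks shows up in the early stages of the pipeline rather than on release day.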
“Feature toggles – these are extremely important for Continuous Delivery. But a lesson learned the hard way is that if you’re going to use feature toggling, remember to tidy up, clean up your feature toggles afterwards when you don’t need them anymore. [Otherwise you’ll end up in] a mess, people don’t know what needs to be turned on, things are turned off in production by mistake. So remember to clean up after yourself.”
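One way to make the clean-up hard to forget is to give every toggle a removal date and have the build complain once it has passed. The sketch below is only an illustration under that assumption: the `Toggle` class, the `remove_by` field, the toggle names and the dates are all made up, and it does not reflect any particular feature-toggle library.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Toggle:
    """Hypothetical feature toggle with a planned removal date."""
    name: str
    enabled: bool
    remove_by: date  # date by which the toggle should be deleted from the code


def stale_toggles(toggles: Iterable[Toggle], today: date) -> List[str]:
    """Return the names of toggles whose removal date has passed, sorted."""
    return sorted(t.name for t in toggles if t.remove_by < today)


# Assumed example registry; names and dates are purely illustrative.
TOGGLES = [
    Toggle("new_checkout", True, date(2015, 3, 1)),
    Toggle("beta_search", False, date(2015, 9, 1)),
]

if __name__ == "__main__":
    overdue = stale_toggles(TOGGLES, date.today())
    if overdue:
        # Failing a pipeline stage here is one option; a warning is another.
        raise SystemExit(f"Toggles overdue for removal: {', '.join(overdue)}")
```

Run as an early pipeline stage, a check like this turns "remember to clean up" from a good intention into something the team is reminded of on every build.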