Tuesday, September 23, 2008

Rolling Restarts, Migrations, and Deployments

I was looking into a question on Stack Overflow (sweet site!) and it brought to mind the awesomeness and challenges of restarting a load-balanced array of servers. The awesome part is that you can do a deployment without any downtime by taking down a server, updating it, bringing it back up, and moving on to the next one. One issue is that you end up with multiple versions of the code live simultaneously. This can get a little weird if a user loads a page from an updated server and their next request is load balanced to a server that isn't up to date yet.

The solution we came up with (before we heard about SeeSaw) was to take half of the mongrels offline in the load balancer. Shut them down. Update them. Start them up. Put those mongrels back online in the load balancer and take the other half off. Shut the second half down. Update the second half. Start them up and put them back online. This greatly minimizes the time during which two different versions of the application are running simultaneously. I wrote a Windows batch file to do this. (Deploying on Windows is not recommended, btw.)
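To make the ordering concrete, here is a minimal sketch of the half-and-half sequence as a plain list of steps. It is not the batch file mentioned above. The function name, the action names, and the server names are all made up for illustration, and actually talking to the load balancer or the mongrels is left out entirely.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (action, server name); both are hypothetical labels

# Order of actions applied to each half. These names are illustrative only.
ACTIONS = ("disable_in_balancer", "stop", "update", "start", "enable_in_balancer")


def half_and_half_plan(servers: List[str]) -> List[Step]:
    """Return the ordered steps for updating the servers in two halves."""
    if len(servers) < 2:
        # With a single server there is no second half to keep serving traffic.
        raise ValueError("need at least two servers to avoid downtime")

    mid = (len(servers) + 1) // 2
    halves = [servers[:mid], servers[mid:]]

    plan: List[Step] = []
    for half in halves:
        # Finish each action for the whole half before moving to the next one,
        # so the first half is back online before the second half goes offline.
        for action in ACTIONS:
            plan.extend((action, server) for server in half)
    return plan


if __name__ == "__main__":
    for action, server in half_and_half_plan(["mongrel1", "mongrel2", "mongrel3", "mongrel4"]):
        print(action, server)
```

The point of building the plan first is that you can read it over (or log it) before anything gets shut down. Each step would then be handed to whatever actually controls your balancer and servers.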

A truly awesome solution to this would be a load balancer that is somehow aware of the version level of the balanced set and just makes the switch for you. Until that is invented, Apache mod_proxy_balancer is easy enough to control remotely.

It is very important to note that database migrations can make the whole approach a little dangerous, because old and new code share one database while the deployment is in progress. If you have only additive migrations, you can run those at any time before the deployment. If you are removing columns, you need to do it after the deployment. If you are renaming columns, it is better to split the rename in two: a migration that creates the new column and copies the data into it, run before deployment, and a separate script that removes the old column, run after deployment. In fact, it may be dangerous to use your regular migrations on a production database in general if you don't make a specific effort to organize them. All of this points to making more frequent deliveries so each update is lower risk and less complex, but that's a subject for another response.
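The before/after rules above can be sketched as a small sorting function. This is only an illustration, not a real migration tool: the tuple format for operations and the name `split_for_deploy` are assumptions invented for this example, and nothing here touches a database.

```python
from typing import List, Tuple

# Hypothetical operation format, invented for this sketch:
#   ("add_column", table, column)
#   ("remove_column", table, column)
#   ("rename_column", table, old_column, new_column)
Op = Tuple[str, ...]


def split_for_deploy(ops: List[Op]) -> Tuple[List[Op], List[Op]]:
    """Split schema changes into (run before deployment, run after deployment)."""
    before: List[Op] = []
    after: List[Op] = []
    for op in ops:
        kind = op[0]
        if kind == "add_column":
            # Additive: old code simply ignores the new column.
            before.append(op)
        elif kind == "remove_column":
            # Destructive: wait until no running code reads the column.
            after.append(op)
        elif kind == "rename_column":
            _, table, old, new = op
            # A rename becomes add + copy up front, then a removal afterwards.
            before.append(("add_column", table, new))
            before.append(("copy_column", table, old, new))
            after.append(("remove_column", table, old))
        else:
            raise ValueError(f"unknown operation: {kind}")
    return before, after


if __name__ == "__main__":
    before, after = split_for_deploy([
        ("add_column", "users", "nickname"),
        ("rename_column", "users", "login", "username"),
        ("remove_column", "users", "legacy_flag"),
    ])
    print("before deployment:", before)
    print("after deployment:", after)
```

One thing the sketch glosses over: rows written by the old code between the copy and the end of the deployment will not have the new column filled in, so the copy step may need to be run again before the old column is removed.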
