NOTE: Databases can be fickle beasts and I take no responsibility for any damage this article may cause. Always backup your data and check that your backup plan works!

Databases can be hard to manage when you have multiple instances to keep up to date. You might have several development databases (usually one on each developer's machine), a test database, one or more production databases, and you're really in trouble if the client databases are located offsite on a client's network inside their firewall. Needless to say you really need to have some way to keep all of these databases consistent. This is where database versioning helps out.

Database versioning (like source code versioning) helps us to keep a record of what the database looked like at a given point of time and allow us to update existing databases to that state. It's almost painfully simple to do and saves so many headaches.

Why Should You Do This?

The first reason you should be versioning your database is to allow all the developers to have the same database schema. If any of your Agile projects are like mine, your clients are always asking for changes and new features that require some kind of schema change. Usually, they're small but we still have to get these changes out to everyone on the development team and it's a real pain to have to reset your development database or email out changes to everyone every time a schema change is required.

It's also helpful to have for deployments to testing and production. Because you already have a way to quickly alter the database updating your testing and production databases is simple and (hopefully) painless. It's always a good idea to backup the database before you run your script (I run mysqldump before my script updates the database in addition to the daily backups) and to occasionally pull down a copy of the database to test the changes on the live data. I recommend doing this at least one every other push to production (if you have long times between pushes you should check it every time).

Finally, it's good for testing. If you're extremely lucky you can force clients to update to the latest version of your software but it's not always that easy. They might have a really good reason to stay at the older version and it might really suck to loose them. By versioning your database you can revert your code to the tag/branch that client is using, find and fix the bug in that version, and create a patch/hotfix for that client.

Just a Bunch of SQL Files

The simplest way to do this is to create an SQL file for each group of changes you want to perform to the database. I try to stick to what I would keep in a commit so that the file doesn't get too large but make sure you don't have hundreds of files with single changes in them if you can help it.

I organize them based on the year, month, and day the file was created and then a zero padding serial number. I used to have one giant year folder but that can become unwieldily if you have a lot of files.

If you have a lot of developers making changes to the schema it can be helpful to include your initials in the filename (2013/03/25-0001.smk.sql) so you don't have to deal with a lot of problems related to duplicate filenames when you pull down changes.

We don't usually use views and stored procedures but you would want to save these in separate files that contain the entire call to create the view or stored procedure. Because there isn't any actual data stored inside the stored procedures and views you don't have to worry about losing them so you can delete them all, perform the database schema changes, and then recreate them.

A Way to Keep Track

The second thing that you need to do is create a database table that is outside of this process (don't version control it) to keep track of what files have been applied to the database. Make sure it's name is completely outside your normal database naming scheme so you don't have conflicts. I recommend picking something crazy like databaseHistoryListing42. We ended up using this name because our database tables were named with underscores so this shouldn't ever conflict with anything else we want to do.

This table really only needs to have a single column named filename that contains all the files that have been applied to the database. You could add additional fields like when the change was applied but we haven't ever needed them.

You could keep track of this information in an external file but if it's all in the database you can easily restore a client's database to your development machine to track down a tricky bug.

A Script To Apply Them

The last thing is a simple script to apply these changes. I highly recommend you BACKUP your database BEFORE you apply and changes files in the script so you can restore if something goes wrong. This has saved our butt repeatedly.

The basic process is:

  1. Enable maintenance mode (if you have one)
  2. Backup the database (if it feels like I'm beating a dead horse then good)
  3. Delete the stored procedures and views
  4. Apply any changes that haven't already been applied
  5. Recreate the stored procedures and views
  6. Get out of maintenance mode

Now Go Do It

Now that you have an understanding of why, go out and do it. It may not help you but always remember the person who might have to take over your project next.