Friday, April 8, 2011

so, you wanna deploy your schema?

Maintaining production databases is pretty easy when your product is running at a small number of sites, on systems you control, on identical platforms. Typically you'll maintain a set of SQL scripts that do the dirty work, deploy them to each site and run them at upgrade time. Depending on how savvy you are, you might even write SQL that checks whether or not each mod has already been applied, and moves on gracefully if so; non-destructive re-runnable SQL is always nice.

But what do you do when your product may be running at thousands of different sites, self-managed by end-users, on a variety of database platforms? Maintaining a bunch of SQL scripts - one for each platform - would be painful and near-impossible to test robustly.

Here's my solution, for what it's worth. I'm sure others have come up with something more elegant, but maybe you'll find this useful or interesting.

1. Establish syntax for a schema definition file

If we're going to create schemas we can deploy on any database platform, we're going to need a generic way of describing them, that can then be translated into platform-specific SQL at runtime by our management code. Whatever entities/data types/etc. we settle on are going to need to be supported by all platforms, too; so, simpler is better.

I'll spare you the full spec I came up with, but here's a snippet. There's no support for stored procedures or triggers, because I don't believe in burying business rules in the schema. Ever. As for views ... well, more on those later. For now, let's just say they'll appear in a different layer of this framework.

add table _user_config
add column _user_config.line_id int notnull primarykey identity
add column _user_config.setting_name varchar 50 notnull
add column _user_config.username varchar 50 notnull
add column _user_config.val text
add index _user_config.idx_user_config_setting unique setting_name,username
add index _user_config.idx_user_config_username username
add reference _user_config.ref_user_config_setting_name setting_name _config setting_name ondelete:cascade onupdate:cascade
add reference _user_config.ref_user_config_username username _users username ondelete:cascade onupdate:cascade

As you can see, it bears some resemblance to typical CREATE TABLE syntax, but each entity in the final schema occupies its own line. This is to make it easier to identify changes later on; e.g. if I were to add a column to this table, I could simply add a new "add column" line in place, and have management code detect it as an unapplied entry on existing installs (ADD COLUMN) or create it as part of a CREATE TABLE query on new installs.

There are "drop" directives too, and rules to aid in parsing, e.g. entities should be dropped after being added, so the schema manager will know not to create them in the first place on new installs.

2. Write schema management code

Sounds simple enough, right? It actually ended up being a non-trivial exercise (for me, anyway), but my schema_Manager class is now working. It has a Prepare method that works sequentially through a schema definition file, identifies new entries via a lookup table stored in a core database (yes, there are smarts to get that deployed before trying to store stuff in it), compiles them into instructions ready to pass to a platform-specific "provider" class, then uses the relevant provider class to build SQL ready to execute. An Execute method does the dirty work, handles errors and populates the lookup table.

3. Write platform-specific providers

For now, I've only written a MySQL provider, but additional ones should take about 2 hours each to add. All they do is build CREATE TABLE, ALTER COLUMN, etc. SQL for the schema manager to execute.

That's it! Obviously there's plenty of detail I'm not boring you with (maybe if I ever go open source with this project?), but hopefully that gives you some idea of how I've tackled schema deployment.

Feel free to hit me with questions or comments.

No comments:

Post a Comment