The White Lily (thewhitelily) wrote 2008-08-19 11:13 pm
Microsoft Sync Services, part 1: The Problem
Warning: this is the first in a series of technical entries regarding the details of data synchronisation. Look away if you don’t care.
I’ve been working on and off over the past six months at including offline capabilities in our web-based athlete management system. The idea is that a coach or a doctor should be able to unplug their machine from the network, go out to a training camp or on a scouting trip and use the system completely as normal, then come back and push the changes they’ve made into the main system and pull back any changes made by other people in the meantime.
The obvious technology to use, given we’re in a .NET environment, is the new Microsoft Sync Services framework based around SQL Server CE. There are plenty of examples around, and now it all comes with a wizard: bidirectional synchronization should be as easy as ABC. Click through the wizard, add a web service on one end and a reference on the other, run the autogenerated upgrade scripts on your server database, and go!
Reality is never so simple.
The main problem is that our database structure is extremely fluid. We have a sophisticated database convert structure which makes various data and schema changes depending on each user’s product type and licence options. These changes also come through very frequently – we work on an extremely agile development cycle ranging from days down to hours. We work in the fast-paced sports industry: our clients rely on being able to ring us with a problem ten minutes before the game starts and get an upgrade with at least an interim solution before the whistle. The convert structure works wonders for us in this agile situation: it runs as the freshly upgraded application starts and makes any necessary database changes, so that we can make code changes completely free from the worry of database differences – if the user doesn’t have the new structure, it doesn’t matter, because they can’t be running the new code which relies on it.
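The idea behind the convert structure can be sketched roughly like this. Everything here is illustrative – the interface, class, and method names are hypothetical stand-ins for our real framework, which is considerably more involved:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical sketch of a startup convert runner. Each convert is a
// step that lifts the schema/data from version N-1 to version N.
public interface IConvert
{
    int TargetVersion { get; }
    void Apply(IDbConnection connection);
}

public class ConvertRunner
{
    private readonly List<IConvert> _converts;

    public ConvertRunner(IEnumerable<IConvert> converts)
    {
        // Apply converts in strict version order.
        _converts = converts.OrderBy(c => c.TargetVersion).ToList();
    }

    public void Upgrade(IDbConnection connection, int codeVersion)
    {
        int dbVersion = ReadSchemaVersion(connection);
        foreach (var convert in _converts)
        {
            if (convert.TargetVersion <= dbVersion) continue; // already applied
            if (convert.TargetVersion > codeVersion) break;   // newer than this build
            convert.Apply(connection);
            WriteSchemaVersion(connection, convert.TargetVersion);
        }
    }

    // Reads/writes a single-row version table; details omitted.
    private static int ReadSchemaVersion(IDbConnection connection) { return 0; }
    private static void WriteSchemaVersion(IDbConnection connection, int version) { }
}
```

Because the runner executes at application startup, the code that ships in a given build can always assume the schema version that build expects.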
It’s not just changes to the schema as new features are added, either – there’s also data massaging as bugs are discovered or meanings evolve. For example, at any point we may discover a bug that was awarding every penalty immediately following a scrum to the wrong team and introduce a convert to switch them back. In other words, there can sometimes be quite subtle but pervasive and necessary changes to the meaning of the data which must take place to all of the data or be known to have taken place for none.
Extending the converts over the server database was moderately easy – any database marked as a sync server uses an inherited version of our DDL class to keep the additional structure for tracking, tombstone tables and triggers up to date with every change made, ensuring that nothing gets lost in the cracks – otherwise, it just converts the data as normal.
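The tombstone-and-trigger structure the server-side DDL class maintains can be sketched roughly as follows. The table and column names are purely illustrative, not our real schema, and a real implementation would also track updates, not just deletes:

```csharp
// Hedged sketch: the extra per-table structure a sync server needs is a
// tombstone table to remember deleted rows, plus a trigger to fill it.
// Sync Services uses tombstones to ship deletes to clients that were
// offline when the delete happened.
public static class TombstoneDdl
{
    public static string For(string table)
    {
        return $@"
CREATE TABLE {table}_Tombstone (
    Id        INT      NOT NULL PRIMARY KEY,
    DeletedAt DATETIME NOT NULL DEFAULT GETUTCDATE()
);
GO
CREATE TRIGGER {table}_DeleteTracking ON {table}
AFTER DELETE AS
    INSERT INTO {table}_Tombstone (Id)
    SELECT Id FROM deleted;";
    }
}
```

The inherited DDL class would emit this (or an ALTER equivalent) for every synchronized table whenever a convert touches the schema, which is what keeps the tracking structure from falling out of step with the data.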
More difficult is the local cache database, for several reasons:
- Any relevant converts must be performed on the data from the local cache before it is merged back into the server, or the data will end up as a hopeless hodgepodge of half-conversions. (As a side note, this also means that the local application must be running at the same version as the server application before any synchronization can take place. This is a problem for another day: for the moment, application version match is just a precondition to performing a sync.)
- If the client performs an upgrade of the application without performing a synchronization of the data at the same time, the client cache will nonetheless need to be fully converted, or the assumptions made by the new version of the program about the structure and meaning of the data will be invalid.
- Given the above, it seems likely that these data conversions will need to take place on both the server and every client at disparate times and interleaved with changes users have made and expect to be synchronized.
- This means that as the synchronization process currently works, it will be littered with false data conflicts – and the most recently touched record will bear no relation to the reality of which if any of the changes have been made by the user and actually need to be synchronized. The solution here is that operations performed during a conversion cannot be considered ‘real’ touches to the database and must leave the ‘last updated’ tracking on each record untouched.
- As a final complication, when Sync Services is used to synchronize a SQL Server CE database, the database schema is marked as read-only, which blocks some essential DDL operations. It is simply an unacceptable solution for me to assert that we can no longer use, for example, “drop column” statements in converts.
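One way to stop convert operations registering as real touches – a sketch under assumptions, not our actual solution – is to flag the connection before the convert runs and have the tracking triggers check that flag. On the server side, SQL Server’s CONTEXT_INFO is one mechanism for this; the helper below is illustrative:

```csharp
using System;
using System.Data.SqlClient;

public static class ConvertTracking
{
    // Runs a convert with a per-connection flag set, so tracking triggers
    // can tell convert writes apart from genuine user edits.
    public static void RunWithoutTracking(SqlConnection conn, Action<SqlConnection> convert)
    {
        using (var cmd = new SqlCommand("SET CONTEXT_INFO 0x1", conn))
            cmd.ExecuteNonQuery();
        try
        {
            convert(conn);
        }
        finally
        {
            using (var cmd = new SqlCommand("SET CONTEXT_INFO 0x0", conn))
                cmd.ExecuteNonQuery();
        }
    }
}
// Inside each tracking trigger, bail out when the flag is set, e.g.:
//   IF SUBSTRING(CONTEXT_INFO(), 1, 1) = 0x1 RETURN;
```

The client side is harder, which is exactly where the subclassing described below comes in: SQL Server CE has no triggers, so the provider itself has to know the difference.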
I’ve chosen first to investigate a slightly less radical path. I’m subclassing just one class in the framework: the one which handles the client-side database. I’m rewriting it to do exactly what it used to do – but without marking the schema as read-only, and without recording operations performed during a DB convert as a real ‘touch’ on the data. Hopefully that means I get the rest – the business logic, the comms, the server-side handling, and so on – for free. If any of them do things I need them not to, I’ll subclass or just plain duplicate them as necessary and gradually build our own proprietary framework. Precisely what Object Orientation’s all about, eh?
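The starting skeleton looks something like this. The method signatures follow my reading of the Sync Services for ADO.NET 1.0 documentation – verify them against your installed version, since this is a sketch rather than tested code:

```csharp
using System.Data;
using Microsoft.Synchronization.Data;
using Microsoft.Synchronization.Data.SqlServerCe;

// A pure pass-through subclass of the stock client provider. On its own it
// changes nothing; it exists so that every call crossing into the framework
// can be breakpointed and instrumented, and later overridden for real.
public class ConvertAwareClientSyncProvider : SqlCeClientSyncProvider
{
    public override SyncContext GetChanges(
        SyncGroupMetadata groupMetadata, SyncSession syncSession)
    {
        // Breakpoint here to inspect the framework's inputs (anchors,
        // table metadata) before the base class does its work...
        return base.GetChanges(groupMetadata, syncSession);
    }

    public override SyncContext ApplyChanges(
        SyncGroupMetadata groupMetadata, DataSet dataSet, SyncSession syncSession)
    {
        // ...and here to inspect what came back from the server before
        // it is applied to the local cache.
        return base.ApplyChanges(groupMetadata, dataSet, syncSession);
    }
}
```

Once the pass-through version behaves identically to the original, each override can be replaced one at a time with a reimplementation that skips the read-only schema flag and ignores convert-time touches.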
Unfortunately, as far as I can tell, I’m one of the first people crazy enough to write their own ClientSyncProvider. The only example available is the generally used SqlCeClientSyncProvider, a complicated framework class for which the source code is unavailable, the documentation sparse and the community microscopic. Under a thousand hits? It’s practically a googlewhack.
So, after working my way through the available doco, forums, and blogs, the most reliable way I have of checking whether I’m completely reverse engineering the behaviour is to construct a skeleton subclass with breakpoints before and after every passthrough to the superclass, and examine the pre- and post-conditions of every framework object parameter and visible state variable to work out how things have changed; hope there are no private variables held by the superclass; tear my hair out at the errors that occur on each run, in search of any semblance of sense; and, if all else fails, make stuff up in the vain hope it’ll make good things happen.
It's been a week and a half of intensive, soul-destroying reverse engineering, but I think I'm out the other side – at least in replicating the existing behaviour and plumbing it all up. Sometimes I've wished I’d gone with Google Gears as the sync framework instead. Then I remember: JavaScript. Life can always be worse.
But perhaps some good can come of this, apart from the solution for our clients and a better product for our company. Perhaps there’s someone else out there who wants to use the Sync Services framework but finds it's just a trifle too restrictive in some aspect. Perhaps the unspecified 'they' could find my experiences in subclassing ClientSyncProvider helpful. I know just writing about it is helpful to sorting out the problem in my mind. And why not, I might say, try my hand at a spot of technical blogging?
Coming up: what the ClientSyncProvider doco should tell you, and a short history of SyncAnchor.

no subject
Insufficient and inadequate documentation for these complex frameworks is, I think, looming as one of the big problems, and a generally unrecognised one. There is very much an emphasis in the broad development community on, well, developing. A constant battle I have waged is the knee-jerk tendency of developers (myself included) to constantly build new products, but that is a different rant I may tangentially lurch off on at some time. Maintaining legacy software is very hard. It gets even harder if the documentation is misleading, missing or just plain wrong.
Oh, and I feel your pain. Data synchronisation is a very, very non-trivial problem.
no subject
I also know nothing of .Net, and it's been a while since I've held an IT job, but I can sympathise, too. Sparse documentation is becoming more and more common, as new frameworks that are the next greatest thing get rushed out the door with all the documentation perfectly up to scratch so long as your needs exactly match the samples.
I don't envy your reverse engineering task. I always hated that class of error that occurs because of hidden or monolithic state, making it tedious and life-threatening to discover where the real problem is.
no subject
Fortunately my needs are pretty low-key compared with what you're trying to achieve (I'm talking three machines, one portable hard drive, and one user -- about as low-key as it gets) but even so, I too feel your pain!
If I had any hair to spare
(Anonymous) 2008-08-27 02:43 am (UTC)
TPWFL