Wednesday, September 20, 2006

Announcing HoboCopy

For a while now, I've been unhappy with the state of my backup strategy. I have a simple script that runs ntbackup every night on a rotating, 14-day schedule: one full backup followed by six incremental backups, then repeat with a different target. In this way, I can always recover to any day in (at least) the last week.
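The rotation above can be sketched in a few lines. This is only an illustration of the schedule, not my actual script: the target names, the `plan_for` function, and the idea of counting days from a fixed start date are all assumptions made for the example.

```python
from datetime import date

CYCLE_DAYS = 14
# Hypothetical names for the two backup targets that alternate each week.
TARGETS = ("target_a", "target_b")


def plan_for(day, start):
    """Return (backup kind, target) for `day`, counting from `start`."""
    offset = (day - start).days % CYCLE_DAYS
    # Days 0-6 of the cycle go to the first target, days 7-13 to the second.
    target = TARGETS[offset // 7]
    # The first day of each week is a full backup; the other six are incremental.
    kind = "full" if offset % 7 == 0 else "incremental"
    return kind, target


if __name__ == "__main__":
    start = date(2006, 9, 1)  # arbitrary start date for the demo
    for n in range(CYCLE_DAYS):
        day = date.fromordinal(start.toordinal() + n)
        print(day.isoformat(), *plan_for(day, start))
```

Because the two targets alternate, the week that is being overwritten always has the previous week's full set sitting untouched on the other target.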

 

So why am I unhappy? Two reasons:

 


  1. ntbackup isn't exactly robust - I've seen all sorts of failures, and restoring files is sort of flaky.

  2. ntbackup locks its information up in a proprietary format. I just want my files, dammit, not some .bkf file.

 

Being a cheap bastard with decent programming skills, I started looking at what it would take to wire up my own solution. I thought about using robocopy, the king of copy tools. But it has one major drawback: it can't copy any files that are locked by another program. Some of my friends solve this problem by shutting down all their programs every night, but I knew that there was no way I (or my wife) was going to remember to do that, and of course the one time I'd forget was the day my hard drive would tank. Plus, what about programs like SQL Server that I don't want to ever shut down?

 

Obviously, it's possible to copy files that are in use. After all, ntbackup does it. So I started poking around, and came across VSS. The good VSS (Volume Shadow Copy Service), not the unbelievably crappy one (Visual SourceSafe). The Volume Shadow Copy Service is a very cool piece of Windows XP/2003 that lets you "snapshot" a hard drive, creating what's essentially a point-in-time image of what's on the disk. You can then copy files from that image at your leisure. Better, it's done in an efficient manner: nothing is actually copied unless someone does a write, and even then only the affected blocks are copied. Which is good, because otherwise you'd need 50GB free to snapshot 50GB of data.
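To make the copy-on-write idea concrete, here is a toy model in Python. It is not how VSS is implemented; the `Volume` and `Snapshot` classes are invented for illustration, and a "block" is just a list element. The point is only that a snapshot costs nothing up front and saves a block the first time that block is overwritten.

```python
class Volume:
    """A toy disk: a list of blocks plus any snapshots taken of it."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        # Taking a snapshot copies no data at all.
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Before overwriting, let each snapshot keep the old block.
        for snap in self.snapshots:
            snap.preserve(index, self.blocks[index])
        self.blocks[index] = data


class Snapshot:
    """A point-in-time view that stores only blocks changed since it was taken."""

    def __init__(self, volume):
        self._volume = volume
        self._saved = {}

    def preserve(self, index, old_data):
        # Only the first overwrite matters; later writes must not replace it.
        self._saved.setdefault(index, old_data)

    def read(self, index):
        # Changed blocks come from the saved copies, the rest from the live disk.
        if index in self._saved:
            return self._saved[index]
        return self._volume.blocks[index]

    @property
    def blocks_copied(self):
        return len(self._saved)
```

Writing to one block out of a million leaves `blocks_copied` at 1, which is the whole reason you don't need 50GB free to snapshot 50GB of data.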

 

But it's even better than that. VSS includes an API that programs like SQL Server 2005 can use to find out when a snapshot is about to occur. When so notified, VSS-aware programs can flush their state to disk, so you get a consistent backup. Of course, not every program is aware of VSS. For those, you get what the docs call a "crash consistent" snapshot. Translation: whatever the hell was on the disk. In my book, that's still better than not backing up at all. After all, my computer seems to do just fine after a BSOD, which is no different.

 

Armed with the Platform SDK docs, I set out to achieve my goal of a VSS-based backup utility. I would have liked to have used robocopy to do the copy, but I couldn't get it to copy using source paths like \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1\data\backup\SmtpSend\Properties\Resources.resx, which is how you get to the files in the snapshot.
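For the curious, mapping an ordinary path onto its snapshot equivalent is just string surgery: drop the drive root and hang the rest off the shadow copy device. A minimal sketch follows, where `to_snapshot_path` is a made-up helper and the `C:` drive letter in the usage is an assumption (the example path above doesn't say which drive it came from).

```python
from pathlib import PureWindowsPath


def to_snapshot_path(snapshot_device, original):
    """Re-root `original` (e.g. C:\\data\\x.txt) under the snapshot device."""
    path = PureWindowsPath(original)
    # Strip the drive and root ("C:\\"), keeping only the part below it.
    relative = path.relative_to(path.anchor)
    device = snapshot_device.rstrip("\\")
    if str(relative) == ".":
        # The original was the drive root itself.
        return device + "\\"
    return device + "\\" + str(relative)


if __name__ == "__main__":
    device = r"\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1"
    # Assumed original location; only the part after the drive root matters.
    print(to_snapshot_path(device, r"C:\data\backup\SmtpSend\Properties\Resources.resx"))
```

`PureWindowsPath` does no file system access, so this runs on any platform; actually opening the resulting path only works on Windows while the snapshot exists.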

 

Next, I tried to write a managed tool that would do the copy. That wouldn't have been too bad - recursive file copy is pretty simple to implement. Unfortunately, the VSS API is completely broken for CLR interop - the main object you need to access implements one COM interface, IVssBackupComponents, but if you query that object for that interface, it returns E_NOINTERFACE. Which is wrong, wrong, wrong. It also means that there's no way to use the object from straight managed code.

 

So I decided to write the tool in unmanaged C++. Maybe I could have done it in managed C++, or a combination of C++ and C#, but in the end I decided to make this a project that would help me get a little rust off my C++ skills. Plus, the documentation suggests that paths starting with \\?\GLOBALROOT aren't safe to use from managed code, although it worked fine when I tested it. The end result is HoboCopy. You pass it a source directory and a destination directory, and it makes a recursive copy using VSS. I've run it against my entire hard drive, and it's able to copy everything except files I didn't have permission to copy (e.g. some stuff under Documents and Settings), and for those situations I've got the /skipdenied switch. Some day I'll add a switch that enables the backup privilege (SE_BACKUP_NAME, a.k.a. SeBackupPrivilege) to make that behavior even better. Unless one of you wants to submit a patch. :)
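The recursive copy with a skip-on-access-denied option boils down to something like the sketch below. To be clear, this is Python, not HoboCopy's C++ source, and it knows nothing about VSS: `copy_tree`, its `skip_denied` flag, and the injectable `copy_file` parameter are names made up for illustration of what a switch like /skipdenied does.

```python
import os
import shutil


def copy_tree(src, dst, skip_denied=False, copy_file=shutil.copy2):
    """Recursively copy src into dst; return the files skipped as access-denied.

    `copy_file` is injectable only so the behavior is easy to try out.
    """
    skipped = []
    os.makedirs(dst, exist_ok=True)
    for entry in os.scandir(src):
        target = os.path.join(dst, entry.name)
        if entry.is_dir():
            # Recurse and collect whatever the subdirectory had to skip.
            skipped += copy_tree(entry.path, target, skip_denied, copy_file)
        else:
            try:
                copy_file(entry.path, target)
            except PermissionError:
                if not skip_denied:
                    raise
                # Note the file and keep going instead of aborting the run.
                skipped.append(entry.path)
    return skipped
```

Only individual file copies are guarded here; a directory that can't be listed at all would still raise. Returning the skipped paths instead of silently swallowing them means the caller can log what didn't make it into the backup.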

 

Speaking of patches. As of now, the project is hosted at SourceForge under the MIT license, so have at it! (Make sure you download the right one - there are separate binaries for XP and W2K3.) Just go easy on me if you look at the source - it's been years since I wrote much C++. :)

Tuesday, September 12, 2006

FolderShare + Bazaar = Tasty

As a consultant, I face the occasional problem of having to work on source code that lives behind my clients' firewalls. Often, getting access to their VPN is difficult, like when the engagement is very short and the VPN red tape copious.

 

I've tried emailing zips, setting up a WebDAV share, and various other strategies, all of which had one shortcoming or another. The best solution is, of course, a source control system of some sort, but even that I've had problems with (e.g. the combination of the client's firewall and my ISP blocks the port(s) we need).

 

For a couple of weeks now, I've been trying a new strategy. Using FolderShare, we create a shared folder that we both have write access to. Then we turn it into a Bazaar repository. On each end, we pull from and push to this folder/repository, making it look a lot like a source control server. And so far, it works great. It even runs over HTTPS, which should make any suspicious sysadmins slightly happier.

 

I imagine one could do the same thing with CVS or Subversion [1], although I haven't tried it. Bazaar seemed like a reasonable choice for two reasons: a) it's more "folder-y" than most SCC systems, so it seemed like a natural fit, and b) Sam Ruby thinks it's cool, so I wanted to try it out. :)

 

[1] You might even be able to make it work with SourceSafe. But then you'd have to use SourceSafe, and why would you use something that's both worse and more expensive?

Friday, September 8, 2006

Inside MSDN - Consuming MSDN Web Services

Well, it's early September, and you know what that means: time for the October issue of MSDN Magazine. I've been checking the website frequently to see when the new issue would be released, because I've got an article in it! You can read it here. It's the latest installment of "Inside MSDN", a column that talks about how the suite of applications and services that make up MSDN itself is built. My article covers how to use the MTPS Content Service, which I announced here on CraigBlog a few months ago.

 

This is my second MSDN article, but my first solo one. I wrote the first one with Tim on an obscure feature of COM+. ("What's COM+?" you say? Yeah, I feel the same way.) 

 

I hope I can be excused for thinking it's sort of cool to have published in MSDN Magazine. :)