Friday, 8 February 2013

Rewriting History - in an SVN Repository

I've been thinking about moving one of my personal projects from SVN source control to git, mainly because of git's ability to commit offline, which has been 'coming soon' in SVN forever.

Git has a fairly nice import function to pull an SVN repository into git, but it's really designed around the standard SVN repository layout of /trunk, /branches and /tags. My repository doesn't look like that - for some reason (most likely because it was the first time I had ever used SVN, many years ago) I created it without any of those conventional folders. Bad idea #1...

My SVN repo also has a lot of separate projects in it as folders at the top level - this is probably bad idea #2. Then, a couple of years back I needed to branch one of my projects, so I created a top level /trunk folder, moved all the folders into that, created a top level /branches and made a branch there. Bad idea #3 I'm thinking.

So, my repository is a mess and the project I'm most interested in - SharpCap - can be found in /trunk/SharpCap and /branches/SharpCap/[branch]. Yuck.

Maybe I could pull the whole lot into git and prune and chop it into shape afterwards - however my git-fu is still weak and I'm not going to try that yet. Instead I decided to try to rewrite the history of my SVN repo to make it fit the standard layout more nicely. Of course I'm doing this on a *copy* of the SVN repository, not the real thing.

Firstly, I'm working on Linux - Ubuntu 12.04 LTS to be precise. I expect all the steps below can be run on Windows, but you'll be messing about installing perl and python and goodness knows what else - easier on Linux by far.

Anyway, first the tools for the job: svndumpfilter2, svndumpfilter3 and svn-dump-reloc.

For the svndumpfilter tools you just need to download the script files. svn-dump-reloc can be installed via cpan - install perl via apt-get if you don't have it, then run cpan, work through the initial questions and then do install SALVA/SVN-DumpReloc-0.02.tar.gz.





The basic technique is to use svnadmin dump to dump the repository to a file, use one of the filter tools to modify the dump, then use svnadmin load to bring the modified dump back into a new repository. So, assuming that your repository is at /home/svn/myrepo, you might do this...

svnadmin dump /home/svn/myrepo > /tmp/myrepo.dmp
cat /tmp/myrepo.dmp | svndumpfilter3 --untangle /home/svn/myrepo /path/to/keep/in/repository > /tmp/filtered.dmp
mkdir /tmp/filtered && svnadmin create /tmp/filtered
svnadmin load /tmp/filtered < /tmp/filtered.dmp

What's going on there? First we dump the original repository to a file, then we pass it through svndumpfilter3 to only keep a particular path or paths, then load back into a new repository. svndumpfilter3 needs to know about the location of the actual repository the dump file came from - in some cases it goes back to the repository to dig out extra information to help it deal with moves, copies, etc in the repository.

So, in my case the svndumpfilter command is

cat /tmp/myrepo.dmp | svndumpfilter3 --untangle /home/svn/myrepo /SharpCap /trunk/SharpCap /branches/SharpCap > /tmp/filtered.dmp

This pulls out the bits of the repository I'm interested in and throws out the rest. 

Now, it's not quite as simple as it looks to reload this dump into a repository - if you try it, you'll find it just fails. This is because we haven't included the creation of /trunk or /branches in our filter, so the first revision that tries to add something inside one of those folders will fail to load because the folder is missing. You'll get an error like this:

svnadmin: File not found: transaction '307-8j', path 'branches/SharpCap'

Here's how to step around that by creating the parent folders first.

mkdir /tmp/filtered && svnadmin create /tmp/filtered
svn mkdir -m "make trunk" file:///tmp/filtered/trunk
svn mkdir -m "make branches" file:///tmp/filtered/branches
svnadmin load /tmp/filtered < /tmp/filtered.dmp
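If you're keeping more paths than I am, working out by hand which parent folders need creating gets tedious. Here's a rough sketch of a helper for that. parent_dirs is a made-up name, not part of any of the tools above, and it only looks at the path strings you give it, not at the repository. It prints every ancestor folder of the paths you pass in, parents before children, each one once.

```shell
# Hypothetical helper: print the ancestor folders of each repository path,
# parents before children, without duplicates.
parent_dirs() {
    for path in "$@"; do
        path=${path#/}      # drop the leading slash
        dir=
        # Walk the path one component at a time, stopping before the last one.
        while [ "${path#*/}" != "$path" ]; do
            dir=${dir:+$dir/}${path%%/*}
            path=${path#*/}
            printf '%s\n' "$dir"
        done
    done | awk '!seen[$0]++'
}

# Possible usage (left commented out because it needs the repository from above):
# parent_dirs /SharpCap /trunk/SharpCap /branches/SharpCap | while read -r d; do
#     svn mkdir -m "make $d" "file:///tmp/filtered/$d"
# done
```

For my three paths this prints just trunk and branches, which matches the two svn mkdir commands above.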


Now, while svndumpfilter3 seems to be the best choice for pruning the repository (it gets confused much less often than svndumpfilter2 or the original svndumpfilter), it doesn't have an option to drop empty revisions from the dump file. If you've pruned out a significant chunk of a repository, you'll most likely want to get rid of those, and this is where svndumpfilter2 comes in handy.

cat /tmp/filtered.dmp | svndumpfilter2 --drop-empty-revs --renumber-revs /tmp/filtered trunk branches SharpCap > /tmp/renumbered.dmp

So, what we've done there is reload the 'filtered' dump into a temporary repository - this is because svndumpfilter2 needs a repository to work with - and then process the dump again to drop the empty revisions. By specifying 'trunk', 'branches' and 'SharpCap' to svndumpfilter2 I have told it to include everything in the source dump (add tags too, if you use those), so I'm just using it to renumber revisions rather than filter anything here.
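To check that the empty revisions really have gone, you can count them before and after. This is a minimal sketch, assuming that a revision counts as empty when no Node-path record follows its Revision-number header, and that no file content inside the dump happens to contain lines starting with those words. count_empty_revs is a hypothetical helper name, and revision 0 is skipped because it never contains any nodes.

```shell
# Hypothetical helper: count revisions (other than r0) with no node records.
# Reads a dump from the named file(s), or from stdin if none are given.
count_empty_revs() {
    awk '
        # A new revision header: settle the previous revision first.
        /^Revision-number: / {
            if (have_rev && rev != 0 && nodes == 0) empty++
            have_rev = 1; rev = $2 + 0; nodes = 0
        }
        /^Node-path: / { nodes++ }
        END {
            if (have_rev && rev != 0 && nodes == 0) empty++
            print empty + 0
        }
    ' "$@"
}

# Possible usage:
# count_empty_revs /tmp/filtered.dmp
# count_empty_revs /tmp/renumbered.dmp
```

If the second count isn't zero, something other than the filter is producing empty revisions.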


With me so far? Good... Now for the tricky bit - we need to re-arrange the history of the repository folder structure. Basically the mapping I want to do is as follows:

/SharpCap -> /trunk
/trunk/SharpCap -> /trunk
/branches/SharpCap/<branch> -> /branches/<branch>
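To make the mapping concrete, here it is applied to plain path strings, including /trunk/SharpCap, which is where the project lived after my earlier reshuffle. This is only an illustration of the intended result. map_path is a made-up function and says nothing about how any real relocation tool works internally.

```shell
# Hypothetical illustration of the mapping on a single path string.
map_path() {
    case $1 in
        /trunk/SharpCap|/trunk/SharpCap/*)
            printf '/trunk%s\n' "${1#/trunk/SharpCap}" ;;
        /SharpCap|/SharpCap/*)
            printf '/trunk%s\n' "${1#/SharpCap}" ;;
        /branches/SharpCap|/branches/SharpCap/*)
            printf '/branches%s\n' "${1#/branches/SharpCap}" ;;
        *)
            # Anything else is left where it is.
            printf '%s\n' "$1" ;;
    esac
}

# Possible usage (the file and branch names are invented):
# map_path /SharpCap/some/file            prints /trunk/some/file
# map_path /branches/SharpCap/mybranch    prints /branches/mybranch
```

Each pattern matches the folder itself or something beneath it, so a sibling folder whose name merely starts with SharpCap is left alone.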

My first try was just to use svn-dump-reloc three times, i.e.

cat /tmp/renumbered.dmp | svn-dump-reloc '/trunk/SharpCap' '/trunk' | svn-dump-reloc '/SharpCap' '/trunk' | svn-dump-reloc '/branches/SharpCap' '/branches' > moved.dmp

This should move any item that was historically in /trunk/SharpCap into /trunk, and the same with anything in /SharpCap. The final bit should move anything in /branches/SharpCap into /branches. Unfortunately the dump won't load. 

The reason the dump won't load is that I have a very interesting revision in it right now. Before the relocation, that revision used to say 'copy the contents of /SharpCap to /trunk/SharpCap and then delete /SharpCap'.  After the relocation it now says 'copy the contents of /trunk to /trunk and then delete /trunk'. Ooops. In the actual dumpfile, the revision looked like this :

Revision-number: 149
Prop-content-length: 126
Content-length: 126

K 8
svn:date
V 27
2010-09-14T19:39:50.182787Z
K 7
svn:log
V 26
move main folders to trunk
K 10
svn:author
V 5
robin
PROPS-END

Node-path: trunk
Node-kind: dir
Node-action: add
Node-copyfrom-rev: 148
Node-copyfrom-path: trunk

Node-path: trunk
Node-action: delete

So, of course, the next revision after that one that tries to do anything to /trunk fails with the old

svnadmin: File not found: transaction '159-4f', path '/trunk'

error. Basically all I needed to do was to get rid of this revision from my dump file - vi was quite sufficient to do that on my (~150MB) dump. For bigger dumps you might need to use a more powerful editor or work out another way to remove it - the svnadmin dump command allows you to specify a revision range, so you could load the renumbered.dmp file and then dump it in two parts, -r 0:148 and -r 150:HEAD sort of thing (adding --incremental to the second dump, so that it records only changes rather than starting with a full copy of the tree), then cat the two together.
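If you'd rather not hunt for the offending revision by eye, the same job can be roughed out with awk. This is a line-oriented sketch under some loud assumptions: no file content in the dump contains a line starting with Revision-number:, Node-path: or Node-copyfrom-path:, and your awk passes binary data through unchanged (worth checking on a copy first). find_self_copies and drop_revision are hypothetical names of my own.

```shell
# Hypothetical helper: list revisions where a node is copied from its own path.
find_self_copies() {
    awk '
        /^Revision-number: /    { rev = $2 }
        /^Node-path: /          { path = substr($0, 12); sub(/^\//, "", path) }
        /^Node-copyfrom-path: / {
            from = substr($0, 21); sub(/^\//, "", from)
            if (from == path && !hit[rev]++) print rev
        }
    ' "$@"
}

# Hypothetical helper: copy a dump from stdin to stdout, leaving out
# everything from the header of revision $1 up to the next revision header.
drop_revision() {
    awk -v rev="$1" '
        /^Revision-number: / { skip = ($2 == rev) }
        !skip { print }
    '
}

# Possible usage:
# find_self_copies moved.dmp
# drop_revision 149 < moved.dmp > fixed.dmp
```

Dropping a revision this way leaves a gap in the revision numbers of the dump file, just as deleting it in vi does.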

So, finally I have an svn repository that has 'always' had the structure I want. Now to try loading it into git...