[Geoserver-devel] Removal of svn:externals: stale directory and DVCS support

Today I removed the svn:externals that GeoServer app-schema-test used to access test data from the GeoTools repository. This was always a bit nasty: the right way to depend on other projects is through Maven artifacts, and I now get all the GeoTools app-schema test data from the GeoTools test jar. Getting rid of the svn:externals lets me handle inter-project test data dependencies through my local Maven repository, without manual copying or going through the svn repo.

One consequence of this change is that developers will be left with a stale external directory in an existing GeoServer svn working directory:
src/extension/app-schema/app-schema-test/src/test/resources/test-data
The presence of this directory will cause build failures. Please remove it if you have it. I like to look for non-version-controlled files with "svn stat". I just had to do this on our buildbot:
rm -rf src/extension/app-schema/app-schema-test/src/test/resources/test-data
Sorry for any inconvenience.
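
For anyone who wants to script the same cleanup (for example on a build machine), here is a minimal sketch. It assumes it is run against the root of a GeoServer svn working copy. The function names list_unversioned and remove_stale_test_data are made up for illustration. The filter relies on "svn status" marking unversioned items with "?" in the first column.

```shell
#!/bin/sh
# Path of the former svn:externals directory, relative to the working copy root.
STALE_DIR="src/extension/app-schema/app-schema-test/src/test/resources/test-data"

# Read `svn status` output on stdin and print only the unversioned paths
# (lines whose first column is "?").
list_unversioned() {
    sed -n 's/^?[[:space:]]*//p'
}

# Remove the stale directory under the given working copy root, if present.
# Safe to run repeatedly: it does nothing when the directory is already gone.
remove_stale_test_data() {
    root="${1:-.}"
    if [ -d "$root/$STALE_DIR" ]; then
        rm -rf "$root/$STALE_DIR"
        echo "removed $root/$STALE_DIR"
    else
        echo "nothing to remove"
    fi
}

# Typical use from the working copy root:
#   svn status | list_unversioned
#   remove_stale_test_data .
```

Listing the unversioned paths first makes it easy to confirm that the test-data directory is the only leftover before deleting anything.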

Another advantage of removing svn:externals is better support for DVCS tools such as git, especially through bridges such as git-svn. I have reliable reports that the svn:externals in GeoServer cause problems with DVCS interoperability.
http://www.mail-archive.com/geoserver-devel@lists.sourceforge.net/msg08952.html
See also the GeoServer log snippet below, in which Andrea convinces Ben of the merits of a DVCS and the problems with svn:externals, and gives some useful advice on how to use git-svn. I recommend it to anyone contemplating branch maintenance. It touches on two questions:

- Can we use maven resource handling as a substitute for svn:externals?

- Will this make it easier to use git for GeoServer branch maintenance?

Kind regards,
Ben.

Appendix A: Discussion in GeoServer on freenode, in which Andrea convinces Ben of the merits of a DVCS and the problems with svn:externals, and gives some useful advice on how to use git-svn:

[2010-05-26 14:44:56] <aaime> bencaradocdavies, wondering, is the use of app-schema resolver going to eliminate the usage of svn externals?
[2010-05-26 14:46:55] <bencaradocdavies> aaime, no, but it will remove their size, as we should (in time) be able to remove all the schemas.
[2010-05-26 14:47:20] <aaime> sigh, ok
[2010-05-26 14:47:31] <bencaradocdavies> aaime, We still like to share mapping files between geotools and geoserver as they are such a pain to write.
[2010-05-26 14:48:07] <aaime> I wonder if a jar dependency could be used... but yeah, it's my own personal problem, I work much better using git but it does not work with externals
[2010-05-26 14:48:13] <aaime> so I have to keep them up to date manually
[2010-05-26 14:48:35] <bencaradocdavies> aha, now I see the motivation. I was looking at git earlier this week.
[2010-05-26 14:49:41] <bencaradocdavies> Let me think about it. I suppose we could ship test resources in a test jar, depend on that, then load them off the classpath. That would work.
[2010-05-26 14:49:55] <aaime> Eh, lately I would have killed myself with it
[2010-05-26 14:50:17] <bencaradocdavies> It was the schemas that were preventing this, as we had no support for loading them off the classpath while encoding. Now we do.
[2010-05-26 14:50:19] <aaime> I've been working on some major new features and generally speaking have 4-5 concurrent branches of each trunk
[2010-05-26 14:50:47] <bencaradocdavies> Ooh manual externals == pain.
[2010-05-26 14:51:01] <aaime> each one of them needs to go through review so the old develop-commit-develop-commit cycle would have been broken solid for me if it wasn't for git local branching abilities
[2010-05-26 14:51:31] |<-- walterdeane has left freenode (Quit: walterdeane)
[2010-05-26 14:51:32] <aaime> generally speaking the current attention on patch review made svn not a viable option anymore...
[2010-05-26 14:51:54] <aaime> I cannot keep 5 different patch sets up to date with svn easily while I wait for a review on them...
[2010-05-26 14:52:08] <aaime> but with git that is doable and not hard
[2010-05-26 14:52:31] <bencaradocdavies> I have been getting away with svn diff | patch -p2 for branch maintenance, but it rapidly becomes horrible as branch and trunk diverge. I'd love to use git instead, so I am learning it.
[2010-05-26 14:53:07] <aaime> yeah, the nice thing about git is that somehow it manages to rebuild the commit history with minimal conflicts
[2010-05-26 14:53:10] <aaime> unlike patch
[2010-05-26 14:53:53] <aaime> so you take a bit patch you did not work for a couple of weeks on, rebase, and most of the time the history gets rewritten with your commits at the end and with no conflicts
[2010-05-26 14:54:02] <aaime> (bit -> big)
[2010-05-26 14:54:13] <bencaradocdavies> And patch is completely useless when deleting or renaming files, and requires manual intervention when new files are added. A little operator error can easily break the build when a file is not svn added.
[2010-05-26 14:54:59] <aaime> another thing I like is the fast ability to put aside what you're working on, switch to a branch that has completely different work going, and come back
[2010-05-26 14:55:08] <bencaradocdavies> Getting ready to drink the DVCS Kool-Aid. :-)
[2010-05-26 14:55:08] <aaime> with just a refresh in eclipse... it's fast
[2010-05-26 14:55:24] <aaime> the thing is that I'm not rooting for git's D abilities
[2010-05-26 14:55:36] <aaime> but for its abilities to be a great svn client
[2010-05-26 14:56:12] <bencaradocdavies> What is the size of a local repo for GT or GS?
[2010-05-26 14:56:26] <aaime> I don't give a damn about the distributed part, I like the local branching, speed and history management abilities
[2010-05-26 14:56:36] <aaime> smaller than a svn checkout
[2010-05-26 14:56:44] <aaime> but I don't have full history
[2010-05-26 14:56:54] <aaime> getting gt full history would take days
[2010-05-26 14:57:05] <aaime> getting full gs history took like 8 hours
[2010-05-26 14:57:18] <aaime> but you can instruct it to start from a certain revision, say one year ago
[2010-05-26 14:57:23] <aaime> most of the time it's all you need
[2010-05-26 14:58:38] <bencaradocdavies> That is nice. I was worried I'd be forced to get the entire history and fill up my poor little 60 GB SSD
[2010-05-26 14:58:46] <aaime> nah, it's pretty efficient
[2010-05-26 14:58:57] <aaime> a GS with full history is just slightly larger than a SVN checkout
[2010-05-26 14:59:11] <aaime> (as far as I can remember)
[2010-05-26 14:59:23] <aaime> but it's just too painful to grab (at least with my connection)
[2010-05-26 14:59:55] <bencaradocdavies> He he, we have 10 Gbps all the way to the US. :-D
[2010-05-26 15:00:18] * aaime barely reaches 4Mbits
[2010-05-26 15:00:46] <bencaradocdavies> *pain*
[2010-05-26 15:00:59] <aaime> now you know why I hate externals
[2010-05-26 15:01:03] <aaime> they take forever to be handled
[2010-05-26 15:01:33] <bencaradocdavies> It isn't your speed, it is the remote server per-query overhead. They take forever for me too.
[2010-05-26 15:02:05] <bencaradocdavies> GeoTools has no externals and is much quicker.
[2010-05-26 15:02:06] <aaime> I see... I thought it was a latency issue
[2010-05-26 15:02:21] <aaime> yep, it is, despite being located on a slower server
[2010-05-26 15:02:34] <bencaradocdavies> Well, I am assuming it is the per-query overhead.
[2010-05-26 15:02:56] <aaime> can be, not sure, afaik svn has a very chatty protocol
[2010-05-26 15:03:00] <bencaradocdavies> I think svn is optimised for one query with a lot of content.
[2010-05-26 15:03:27] <aaime> he he
[2010-05-26 15:03:33] <aaime> it's slow anyways :-)
[2010-05-26 15:03:44] <bencaradocdavies> I have this nasty feeling that each external is a new connection. Or at least that is how it feels.
[2010-05-26 15:03:57] <aaime> cloning a git repo is an entirely different world -> designed for speed
[2010-05-26 15:04:37] <aaime> (but making a git clone of an svn repo is of course just as slow as using svn)
[2010-05-26 15:04:57] <aaime> (actually, much slower since git tries to get the whole history)

Timestamps are AWST (UTC+8).
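
The git-svn advice in the log (start the clone from a recent revision rather than fetching full history, then keep local branches current by rebasing) might be sketched as follows. This is only an illustration, not a record of the commands Andrea uses: the helper names, the DRY_RUN switch, and the SVN_URL and START_REV placeholders are assumptions. The underlying commands "git svn clone -r REV:HEAD" and "git svn rebase" are standard git-svn.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Clone an svn URL with git-svn, fetching history only from revision $2 onward.
clone_from_revision() {
    url="$1"; start_rev="$2"; dest="$3"
    run git svn clone -r "$start_rev:HEAD" "$url" "$dest"
}

# Fetch new svn revisions and replay local commits on top of them.
update_from_svn() {
    run git svn rebase
}

# Example (placeholders, not real values):
#   clone_from_revision "$SVN_URL" "$START_REV" geoserver
#   cd geoserver && git checkout -b my-feature
#   ... commit locally while waiting for review ...
#   update_from_svn
```

Running with DRY_RUN=1 first shows exactly what would be executed, which is useful given how long an initial fetch can take.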

--
Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com>
Software Engineering Team Leader
CSIRO Earth Science and Resource Engineering
Australian Resources Research Centre