Software Configuration Management

Welcome to my SCM blog. It promises to be a no-nonsense resource for configuration management matters.

Location: San Francisco, California, United States

Saturday, March 25, 2006

Subversion Network Protocol De-constructed

So I have been chewing on the FUD on the Subversion site lately. Now don't get me wrong, I think Subversion has many positives going for it (more on this later), but what beats me is the following: for an SCM whose sole objective was to be a replacement for CVS, how can it regress on things that CVS does well? I am not making this up, see http://subversion.tigris.org/faq.html#why :

Why does this project exist?

To take over the CVS user base. Specifically, we're writing a new
version control system that is very similar to CVS, but fixes many
things that are broken. See our front page.


For instance :
  1. CVS opens a single client-server connection for any operation. In contrast, Subversion is hungry for TCP connections. Using the SVN RA protocol, a checkout can trigger 4-5 new TCP connections. On a WAN, if you have a Subversion repository being accessed across a link from, say, China to the USA, this can be a killer: setting up each TCP connection can take 1.5 round trips before any real work starts. The SVN DeltaV HTTP protocol fares better, but it too can open multiple TCP connections for an operation.
  2. Chatty protocol. And I thought CVS was chatty; wait till you sniff the Subversion network protocol. With the SVN RA protocol I noticed repeated invocations on new connections, and authentication happens again on each connection (if you are using SSH, this really slows you down). Commands like get-latest-rev and set-path are sent multiple times, sometimes multiple times on the same connection. Harmless? It's not an error, yeah, but why be so wasteful? I mean, it's not like this is legacy code that was written 20 years ago. Why not have a clean network protocol model?
  3. What's with this Lisp-like list syntax on the wire:
    ( 2 ( edit-pipeline ) 14:svn://zen/mod )
    ( CRAM-MD5 ( ) )
    38:rachel 6195af60930ad5367267948ddacf30ca
    ( get-latest-rev ( ) )
    ( update ( ( 19 ) 0: true ) )
    ( set-path ( 0: 16 false ( ) ) ) ( set-path ( 11:dir1/f1.txt 19 false ( ) ) ) ( set-path ( 9:dir1/foo2 19 false ( ) ) ) ( finish-report ( ) )
    ( success ( ) )
    Why not use a more compact representation like CVS does? I used to complain that the CVS protocol takes 4-5 round trips to execute a command; Subversion can literally take 14-15 round trips in a checkout. Agreed, you don't check out the whole repository often, but why the sloppiness in the initial design?
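
To make the syntax in item 3 concrete, here is a minimal Python sketch of a reader for it. It is inferred only from the trace above, not from the Subversion sources: I am assuming that items are separated by whitespace, that parentheses delimit lists, that bare digits are numbers, and that `N:` introduces a string of exactly N characters. The function names `parse` and `parse_items` are my own.

```python
def parse_items(data, pos=0, depth=0):
    """Read items until the matching ')' (or end of input at depth 0).

    Returns (items, next_position). Assumed grammar, inferred from the
    trace: lists in parentheses, bare numbers, length-prefixed strings
    ("11:dir1/f1.txt") and bare words ("set-path", "false").
    """
    items = []
    n = len(data)
    while pos < n:
        ch = data[pos]
        if ch.isspace():
            pos += 1
        elif ch == "(":
            sub, pos = parse_items(data, pos + 1, depth + 1)
            items.append(sub)
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced ')'")
            return items, pos + 1
        elif ch.isdigit():
            end = pos
            while end < n and data[end].isdigit():
                end += 1
            number = int(data[pos:end])
            if end < n and data[end] == ":":
                # Length-prefixed string: take exactly `number` characters.
                start = end + 1
                if start + number > n:
                    raise ValueError("string runs past end of input")
                items.append(data[start:start + number])
                pos = start + number
            else:
                items.append(number)
                pos = end
        else:
            # Bare word: runs until whitespace or a parenthesis.
            end = pos
            while end < n and not data[end].isspace() and data[end] not in "()":
                end += 1
            items.append(data[pos:end])
            pos = end
    if depth != 0:
        raise ValueError("unbalanced '('")
    return items, pos


def parse(data):
    """Parse one line of the wire syntax into nested Python lists."""
    items, _ = parse_items(data)
    return items


print(parse("( set-path ( 11:dir1/f1.txt 19 false ( ) ) )"))
# [['set-path', ['dir1/f1.txt', 19, 'false', []]]]
print(parse("( update ( ( 19 ) 0: true ) )"))
# [['update', [[19], '', 'true']]]
```

In this sketch words and strings both come back as Python `str`, so the distinction between them is lost; a real client would need to keep it.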

If the goal was to build a better CVS, the Subversion team should have started by looking at how CVS handles its network protocol and learned from it.
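
To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It plugs in the upper ends of the counts quoted in this post (1 connection and 5 round trips for CVS, 5 connections and 15 round trips for Subversion) and charges 1.5 round trips per TCP connection. The 250 ms round-trip time is purely my assumption for a long-haul link, and the function name is made up for illustration. It ignores authentication, SSH setup and the data transfer itself, and it assumes the round-trip counts do not already include connection setup.

```python
def estimate_latency_ms(rtt_ms, connections, round_trips, handshake_rtts=1.5):
    """Latency spent purely waiting on the network, in milliseconds.

    connections    -- TCP connections opened for the operation
    round_trips    -- protocol-level request/response exchanges
    handshake_rtts -- round trips charged per TCP connection setup
    """
    return rtt_ms * (connections * handshake_rtts + round_trips)


RTT_MS = 250  # assumed round-trip time, not a measurement

cvs = estimate_latency_ms(RTT_MS, connections=1, round_trips=5)
svn = estimate_latency_ms(RTT_MS, connections=5, round_trips=15)

print(f"CVS: {cvs:.0f} ms, Subversion: {svn:.0f} ms")
# CVS: 1625 ms, Subversion: 5625 ms
```

Under these assumptions the gap is a few seconds of pure waiting per operation, before a single byte of file content moves.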

More ramblings on Subversion to follow soon.

1 Comment:

Anonymous said...

Doesn't the Subversion HTTP protocol offer a better model? I mean, it uses HTTP/1.1 pipelining to connect to Apache/SVN, thereby reducing the need for multiple connections. Of course, setting up Apache means more of a learning curve for an SCM admin, and configuring it properly (security) is more challenging.

2:17 PM  
