Centralized vs. distributed version control systems
Martin Žatkulák
In the world of programming and engineering, it is necessary to keep track of changes.
This allows developers to revert unwanted changes to an application during the
development process.
Version Control Systems (VCS) are standalone applications that make this job
easier by providing systematic management of multiple revisions of the same unit of
information, whether it is a simple document or a more complex digital artifact such as the
source code of an application or the blueprint of an electronic model. A VCS allows developers
to work collaboratively on the same application from different locations using repositories.
1 Centralized and distributed nature
Git was designed from the ground up as a distributed version control system. Being
a distributed version control system means that multiple redundant repositories and
branching are first-class concepts of the tool.
In a distributed VCS like Git, every user has a complete copy of the repository data
stored locally, which makes access to file history extremely fast and allows
full functionality when disconnected from the network. It also means every user has a
complete backup of the repository. Have 20 users? You probably have more than 20
complete backups of the repository, as some users tend to keep more than one repository
for the same project. If any repository is lost due to system failure, only the changes
that were unique to that repository are lost. If users frequently push and fetch changes
with each other, this tends to be a very small loss, if any.
In a centralized VCS like Subversion, only the central repository has the complete
history. This means that users must communicate over the network with the central
repository to obtain the history of a file. It also means that having 20 users does not
automatically imply 20 active backups. Backups must be maintained independently of
the VCS. If the central repository is lost due to system failure, it must be restored from
backup, and all changes since the last backup are likely to be lost. Depending on the
backup policies in place, this could be several person-weeks’ worth of work.
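The difference in what a failure costs can be illustrated with a small model. The sketch below is not how either tool works internally; it treats each repository as a plain set of commit identifiers, and the repository names and commit labels are made up for illustration. A commit is lost only if no surviving repository holds it.

```python
from typing import Dict, Set


def lost_commits(failed: str, repos: Dict[str, Set[str]]) -> Set[str]:
    """Return the commits that existed only in the failed repository."""
    surviving: Set[str] = set()
    for name, commits in repos.items():
        if name != failed:
            surviving |= commits
    return repos[failed] - surviving


# Distributed case (hypothetical data): every user holds the shared history
# plus, possibly, a few commits that have not been pushed or fetched yet.
distributed = {
    "alice": {"c1", "c2", "c3"},
    "bob": {"c1", "c2", "c4"},
    "carol": {"c1", "c2"},
}

# Centralized case (hypothetical data): one server and a backup that lags behind.
centralized = {
    "server": {"c1", "c2", "c3", "c4"},
    "backup": {"c1", "c2"},
}

print(sorted(lost_commits("alice", distributed)))
print(sorted(lost_commits("server", centralized)))
```

In this toy data, losing one user's clone costs only that user's unshared commit, while losing the server costs everything committed since the last backup.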
2 Repositories
Since Subversion only supports a single repository there is little doubt about where
something is stored. Once a user knows the repository URL they can reasonably assume
that all materials and all branches related to that project are always available at that
location. Backup to tape/CD/DVD is also simple as there is exactly one location that
needs to be backed up regularly.
Since Git is distributed by nature, not everything related to a project may be stored
in the same location. Therefore, there may be some degree of confusion about where to
obtain a particular branch, unless the repository location is always explicitly specified. There
may also be some confusion about which repositories are backed up to tape/CD/DVD
regularly, and which aren’t.
3 Access control
Because Git is distributed, you inherently do not have to give commit access to other
people. Instead, you decide when to merge what from whom. (There
are different control mechanisms available in case you do want a repository that
multiple people can push to.)
Since Subversion has a single central repository it is possible to specify read and
write access controls in a single location and have them be enforced across the entire
project.
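To make the idea of a single, central place for access rules concrete, here is a minimal sketch of path-based permission lookup. It is an illustrative model only, not Subversion's actual authorization code or file format: the rule table, the user names, and the functions permission and can_write are all hypothetical. The most specific matching path that mentions the user (or the wildcard "*") decides.

```python
from typing import Dict, Optional

# Hypothetical rule table: path prefix -> {user: permission}.
# "r" means read only, "rw" means read and write; "*" matches any user.
RULES: Dict[str, Dict[str, str]] = {
    "/": {"*": "r"},
    "/trunk": {"alice": "rw", "bob": "rw"},
    "/branches/release": {"alice": "rw", "bob": "r"},
}


def permission(user: str, path: str) -> str:
    """Return the permission from the most specific rule that applies to the user."""
    best_prefix: Optional[str] = None
    best_perm = ""
    for prefix, users in RULES.items():
        # A rule applies to the path itself and to everything below it.
        applies = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if not applies:
            continue
        perm = users.get(user, users.get("*"))
        if perm is None:
            continue  # this rule says nothing about the user
        if best_prefix is None or len(prefix) > len(best_prefix):
            best_prefix, best_perm = prefix, perm
    return best_perm


def can_write(user: str, path: str) -> bool:
    """True if the user may commit changes to the given path."""
    return "w" in permission(user, path)


print(can_write("alice", "/trunk/src/main.c"))
print(can_write("bob", "/branches/release/notes.txt"))
print(can_write("carol", "/trunk/src/main.c"))
```

Because every commit passes through the one central repository, a table like this only has to exist, and be enforced, in one place.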
4 Branch handling and accountability
Branches in Git are a core concept used every day by every user. In Subversion they
are more cumbersome and used sparingly. Branching can still be done with Subversion,
and within a company or organization this might make sense because a manager could
check the progress of new features by looking at the commits on branches and merges to
the trunk. In open source projects, however, you typically do not track what others are
doing or how far they have progressed; you are simply thankful for any contribution.
The reason branches are so central to Git is that every developer’s working directory is
itself a branch. Even if two developers are modifying two different, unrelated files at the
same time, it’s easy to view these two working directories as different branches
stemming from the same common base revision of the project.
Consequently Git:
• Automatically tracks the project revision the branch started from. Knowing
the starting point of a branch is necessary in order to successfully merge the
branch back to the main trunk that it came from.
• Automatically records branch merge events. Merge records always include
the following details:
– Who performed the merge.
– What branch(es) and revision(s) were merged. (All changes made on the
branch(es) remain attributed to the original authors and the original timestamps
of those changes.)
– What additional changes were made to complete the merge successfully. (Any
changes made during the merge that go beyond those made on the branch(es)
being merged are attributed to the user performing the merge.)
– When the merge was done.
– Why the merge was done.
• Automatically starts the next merge at the last merge. Knowing what revision
was last merged is necessary in order to successfully merge the same branches
together again in the future.
This is quite contrary to Subversion’s handling of branches. As of Subversion 1.5:
• Automatically tracks the project revision the branch started from. Like
Git, Subversion remembers where a branch originated.
• Incomplete merge event record. Although Subversion records a merge as a
commit and thus associates a username and a timestamp with it (like Git), there are
some serious flaws in this record:
– All changes made on the branch appear to be made by the merging user. This
means that from a historical perspective every line of code modified on the
branch will appear in the trunk as though it was written by the user who
merged the branch. This is wrong if there were other users working on that
branch.
– It’s impossible to see only merge-related changes. If the merging user had
to modify 12 lines of code to complete the merge successfully, you can’t tell
what those 12 lines were, or how those 12 lines differ from the versions on the
branches being merged.
In Subversion, branches and tags are all copies. This is a smart idea, but it is sometimes
inconvenient: many newcomers check out the whole repository by mistake, or are confused
when updating or merging a moved branch. Branch paths and file paths live in the same
namespace, yet they have different semantics and should be handled in different
ways.
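Why recording merges matters for Git can be seen in a small model of a commit graph. The sketch below is a simplification under stated assumptions, not Git's actual implementation: the history is a hypothetical dictionary mapping each commit to its parents, and a merge commit is simply a commit with two parents. The common base of two branch tips is then the most recent commit reachable from both.

```python
from typing import Dict, List, Set

# Hypothetical history: commit -> list of parent commits.
# A is the root; B and F are on the trunk; C, D and E are on a branch.
# M is a merge commit on the trunk that merged the branch tip D.
PARENTS: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["C"],
    "M": ["B", "D"],
    "E": ["D"],
    "F": ["M"],
}


def ancestors(commit: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the commit itself plus every commit reachable through parent links."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents[current])
    return seen


def merge_bases(a: str, b: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {
        c for c in common
        if not any(c != other and c in ancestors(other, parents) for other in common)
    }


print(merge_bases("B", "D", PARENTS))  # base for the first merge
print(merge_bases("F", "E", PARENTS))  # base for merging the same branch again
```

In this model the first merge starts from the point where the branch was created, and because the merge commit M records D as a parent, a second merge of the same branch starts from D rather than from the original branch point.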
5 Performance
Git is extremely fast. Since all operations (except for push and fetch) are local there
is no network latency involved to:
• Perform a diff.
• View file history.
• Commit changes.
• Merge branches.
• Obtain any other revision of a file (not just the prior committed revision).
• Switch branches.
6 Space requirements
Git’s repository and working directory sizes are extremely small when compared to
Subversion.
For example, the Mozilla repository is reported to be almost 12 GB when stored in
Subversion using the FSFS backend. Previously, the FSFS backend also required over
240,000 files in one directory to record all 240,000 commits made over the 10-year project
history. This was fixed in Subversion 1.5, where every 1000 revisions are placed in a
separate directory. The exact same history is stored in Git in only two files totaling just
over 420 MB. Subversion thus requires roughly 30 times more disk space to store the same history.
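The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers from the text; treating "almost 12 GB" as exactly 12 × 1024 MB is an assumption made for the calculation.

```python
# Sizes as reported in the text; the GB-to-MB conversion is an assumption.
SVN_SIZE_MB = 12 * 1024   # "almost 12 GB" in Subversion (FSFS backend)
GIT_SIZE_MB = 420         # "just over 420 MB" in Git

ratio = SVN_SIZE_MB / GIT_SIZE_MB
print(f"Subversion / Git size ratio: {ratio:.1f}")

# With one file per commit, 240,000 commits used to mean 240,000 files in a
# single directory. Grouping every 1000 revisions into their own directory
# (Subversion 1.5) leaves this many directories instead:
COMMITS = 240_000
REVISIONS_PER_DIRECTORY = 1000
shard_directories = COMMITS // REVISIONS_PER_DIRECTORY
print(f"Directories after sharding: {shard_directories}")
```

The ratio comes out slightly below 30, which matches the rounded factor given in the text.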
A Subversion working directory always contains two copies of each file: one for the
user to actually work with and another hidden in .svn/ to aid operations such as status,
diff and commit. In contrast, a Git working directory requires only one small index file
that stores about 100 bytes of data per tracked file. On projects with a large number of
files this can be a substantial difference in the disk space required per working copy.
Because a full Git clone is often smaller than a full Subversion checkout, Git working
directories (including the repositories) are typically smaller than the corresponding Subversion
working directories. There are even ways in Git to share one repository across
many working directories, but in contrast to Subversion, this requires the working directories
to be co-located.
7 Line ending conversion
Subversion can easily be configured to automatically convert line endings to CRLF or
LF, depending on the native line ending used by the client’s operating system. This
conversion feature is useful when Windows and UNIX users are collaborating on the
same set of source code. It is also possible to configure a fixed line ending independent
of the native operating system. Files such as a Makefile need to use only LFs, even when
they are accessed from Windows. This can be adjusted in a global config and overridden
in user configs. Subversion additionally allows the user to specify line
ending conversion on a file-by-file basis. Binary files are checked in with a binary flag
(as with CVS, except that Subversion does this almost always automatically) and as such
never get converted or keyword substituted. However, if a binary file is not flagged as
binary when it is added (Subversion prints for every added file whether it recognized it
as binary), its content might get corrupted.
Git versions prior to 1.5.1 never convert files; they always assume that every file is
opaque and should not be modified. Git 1.5.1 and onwards make this configurable.
Users on Windows should set core.autocrlf = true so that text files are automatically
checked out with CRLF and checked in as LF. Git’s advantage over Subversion
is that you do not have to manually specify which files this conversion should be applied
to; it happens automatically (hence autocrlf).
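The kind of conversion described here can be sketched in a few lines. This is a simplified illustration, not Git's or Subversion's actual code: the function names are hypothetical, and the binary check (a NUL byte within the first 8000 bytes) is an assumed heuristic chosen for the example.

```python
def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Assumed heuristic: treat content with a NUL byte near the start as binary."""
    return b"\0" in data[:sample_size]


def to_repository(data: bytes) -> bytes:
    """On check-in, store text with LF line endings; leave binary content untouched."""
    if looks_binary(data):
        return data
    return data.replace(b"\r\n", b"\n")


def to_working_copy(data: bytes) -> bytes:
    """On checkout (Windows case), give text CRLF line endings; leave binary untouched."""
    if looks_binary(data):
        return data
    # Normalize to LF first so that existing CRLF pairs are not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


text = b"line one\r\nline two\r\n"
stored = to_repository(text)
print(stored)
print(to_working_copy(stored))
```

The sketch also shows why a wrong text/binary decision is dangerous: running either function on binary data that was mistaken for text would silently rewrite bytes inside it.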
8 User interfaces
Currently Subversion has a wider range of user interface tools than Git. For example
there are Subversion plugins available for most popular IDEs. There is a Windows
Explorer shell extension. There are a number of native Windows and Mac OS X GUI
tools available in ready-to-install packages.
Git’s primary user interface is the command line. There are two graphical
interfaces: git-gui (distributed with Git) and qgit, which is making great strides towards
providing another feature-complete graphical interface. In addition, gitk, the graphical
history browser, can be more than just a fancy log reader. git-gui and gitk usually work
out of the box on common operating systems, and qgit is being ported to Qt4, which improves
its portability. Some further user interface tools for Git are in development, namely
TortoiseGit, a port of TortoiseSVN, and Git Extensions, another Explorer shell
extension.
9 Partial checkout
With Subversion, you can check out just a subdirectory of a repository. Such a thing is
not possible with Git. For a large project, this means that you always have to download
the whole repository, even if you only need the current version of some subdirectory. At
a time when fast Internet connections are available only in most cities and traffic over
mobile Internet connections is expensive, Git can cost much more time and money in
rural areas or with mobile devices.
10 Revision numbering
First, as Subversion assigns revision numbers sequentially (starting from 1), even very
old projects such as Mozilla have short unique revision numbers (Mozilla’s are only up to
6 digits in length). Many users find this convenient when entering revisions for historical
research purposes. They also find this number easy to embed into their product,
supposedly making it easy to determine which sources were used to create a particular
executable. However, since the revision number is global to the entire repository,
including all branches, there is still a question of which branch the revision number
corresponds to, unless the last committed revision is recorded: since revisions are global
for a repository, the last committed revision makes it possible to determine which branch
was used.
As Git uses a SHA1 to uniquely identify a commit, each specific revision can only be
described by a 40 character hexadecimal string; however, this string identifies not only
the revision but also the history it was built on. In practice the first 8 characters tend
to be unique for a project; however, most users try not to rely on this over the long
term. Rather than embedding long commit SHA1s into executables, Git users generate
a uniquely named tag. This is an additional step, but a simple one.
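Where the 40 character string comes from can be shown directly. Git names an object by the SHA1 of a short header (the object type, the content length and a NUL byte) followed by the content. The sketch below reproduces that calculation with Python's hashlib; the helper names are hypothetical, and the commit text is a simplified, made-up example (real commits also carry author and committer lines).

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA1 over '<kind> <length>' + NUL + body, written as 40 hexadecimal characters."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: str, message: str) -> bytes:
    """Hypothetical, simplified commit text used only for this illustration."""
    return f"tree {tree}\nparent {parent}\n\n{message}\n".encode("utf-8")


# The identifier of a file's content depends only on that content.
empty_blob = git_object_id("blob", b"")
print(empty_blob, len(empty_blob))

# A commit names its parent, so the same change on top of a different
# parent gets a different identifier. The tree and parent values are made up.
tree = "a" * 40
first = git_object_id("commit", commit_body(tree, "1" * 40, "Fix typo"))
second = git_object_id("commit", commit_body(tree, "2" * 40, "Fix typo"))
print(first[:8], second[:8], first != second)
```

The last line prints the 8 character abbreviations mentioned above. They are usually enough to tell commits apart, but nothing guarantees that they stay unique as a project grows.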
Secondly, Subversion’s revision numbers are predictable. If the current commit is
435, the next one will be 436. It’s then very easy to go through a few sequential revisions
to, e.g., look at differences or revert to an old revision to find when a regression was
introduced. Furthermore, without looking up any additional information, you know
that commit 436 was done after 435. In Git, the same actions and knowledge require
looking at the log.
11 Conclusion
Looking at an overview of the features of Git and Subversion, we can see that Git
is preferable in most circumstances. Developers, researchers, engineers and other users
of VCS tend to lean towards Git, but the choice ultimately depends on your
personal needs. Neither is best in every case, but one is often better for what you are doing.