In my pursuit to understand Git, it’s been helpful for me to understand it from the bottom up — rather than look at it only in terms of its high-level commands. And since Git is so beautifully simple when viewed this way, I thought others might be interested to read what I’ve found, and perhaps avoid the pain I went through finding it.

The following article offers what I’ve learned on this journey so far. I hope it can help others to comprehend this wonderful system, and discover some of the joy I’ve experienced in the past few weeks. NOTE: After receiving more than fifty corrections by e-mail from very helpful readers, I’ve updated the PDF to reflect their input. The date at the front should read “Fri, 2 May 2008” if you have the latest version.

Here is a summary from the table of contents:

  • Introduction
  • Repository: Directory content tracking
  • Introducing the blob
  • Blobs are stored in trees
  • How trees are made
  • The beauty of commits
  • A commit by any other name…
  • Branching and the power of rebase
  • Index Cache: Meet the middle man
  • Taking the index cache farther
  • To reset, or not to reset
  • Last links in the chain: Stashing and the reflog
 

37 Responses to Git from the bottom up

  1. entropie says:

    Your link to the pdf file is broken when accessed via feedreader.

    http://feeds.feedburner.com/blog_assets/git.from.bottom.up.pdf

    greets

  2. John Wiegley says:

    Please check to see if the link is fixed now.

  3. jason says:

    awesome, I look forward to reading it :)

  4. piyo says:

    I read the PDF and it was helpful in solidifying my mental image of git and introducing me to other tools. Thank you.

    I think an interesting explanation to add would be blob injection, that is, two or more HEADs that have no shared history between them. I do that when I want to add meta-info about a cloned upstream repository.

    It doesn’t help me understand if git knows about “moving a function from one file to another”? Or where (or if) compression and differencing comes into play. Perhaps these are details not necessary for this level of understanding.

    Your git-stash usage as a backup system sounds novel, but I suppose it doesn’t track files that are not registered (untracked, like new files).

    Perhaps nitpicks: Is there an extraneous quote in the title or is this some special font for “t”? “Git’ …”? On page 25, “they’re” is broken across lines incorrectly.

  5. John,

    Thanks for this resource: as a newcomer to Git, the PDF was a very comprehensible and eye opening read.

    I spotted one very minor grammar mistake: on p24 it should be “This approach has two distinct advantages” rather than “This approach two distinct advantages”.

    I was a bit confused by your use of HEAD@{1} notation in the section on hard resets until I came to the section on stashing. Perhaps the material on recovering from an inadvertent hard reset could be moved into the stashing section?

    Thanks again!
    Max

  6. John Wiegley says:

    @piyo I’m not sure I follow what you mean about “blob injection”, could you share an example?

    Also, you’re right, git-stash does not track unregistered files.
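A minimal sketch of that behaviour, run in a throwaway repository. The file names and the demo identity are assumptions made up for illustration, and `--include-untracked` is only available in reasonably recent Git releases:

```shell
set -e
repo=$(mktemp -d)                 # scratch repository, safe to delete
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "initial"

echo "v2" > tracked.txt           # change to a tracked file
echo "new" > untracked.txt        # file Git has never been told about

git stash push -q                 # plain stash: only tracked changes are saved
plain_tracked=$(cat tracked.txt)
plain_untracked_exists=$( [ -f untracked.txt ] && echo yes || echo no )

git stash pop -q                  # bring the tracked change back
git stash push -q --include-untracked   # now the untracked file is stashed too
with_u_exists=$( [ -f untracked.txt ] && echo yes || echo no )
git stash pop -q                  # restores both files

echo "plain stash: tracked=$plain_tracked, untracked still present=$plain_untracked_exists"
echo "with --include-untracked: untracked still present=$with_u_exists"
```

After a plain stash the untracked file is left sitting in the working tree, which is why a stash-based backup can miss new files unless they are added first or the extra flag is used.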

    The curly “t” is a “stylistic alternate” within the OpenType font Garamond Premier Pro.

    I’ve added yours and Max’s grammatical corrections. Thank you!

    @Max I added a short note on the usage of HEAD@{1}, that it will be explained in the next section. I just thought it important to point out there so that people would connect it with the idea of restoring from an accidental reset.
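For readers who have not met the notation yet, here is a rough sketch of recovering from an accidental hard reset with `HEAD@{1}`. It assumes a scratch repository with the reflog enabled (the default for ordinary working repositories); the file name and demo identity are invented for the example:

```shell
set -e
repo=$(mktemp -d)                 # scratch repository
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"

echo "one" > file.txt
git add file.txt
git commit -q -m "first"
echo "two" > file.txt
git commit -q -am "second"
second=$(git rev-parse HEAD)      # remember the commit we are about to "lose"

git reset -q --hard HEAD~1        # the accidental reset
after_reset=$(cat file.txt)

# HEAD@{1} names where HEAD pointed before its most recent move,
# as recorded in the reflog.
git reset -q --hard "HEAD@{1}"
recovered=$(git rev-parse HEAD)

echo "after reset: $after_reset, after recovery: $(cat file.txt)"
```

Running `git reflog` at any point lists the recorded positions of HEAD, so the right entry can be checked before resetting to it.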

  7. MB says:

    Hey the pdf _looks_ very nice, how did you produce it?

    Do you know if there is something similar to this for mercurial?

    thanks in advance

  8. John Wiegley says:

    @MB The PDF was written using the word processor Mellel, the font Garamond Premier Pro from Adobe, and the drawing program OmniGraffle Pro for the diagrams.

    I know of nothing similar for Mercurial, but I haven’t really looked either.

  9. piyo says:

    Regarding what I call “blob injection”, it is a way of adding blobs that are unrelated to the current repository’s history. In other words, adding another track of blobs with no relation between the existing blobs.

    Actually, you’ve already touched on that, with your CVS and Subversion import example in “Diving into Git”. However, you speedily sew these two tracks together with git-rebase. I, however, keep the tracks separate, because I use them for separate data, like where I got this repository from and how it should be checked out. This is a limited technique probably only suitable for metadata.

    An example:

    $ git clone git://repo.or.cz/git.git
    # or any other repo
    $ mkdir checkout_info && cd checkout_info
    $ git init
    $ echo "This repository is from git://repo.or.cz/git.git" > .git_checkout_info
    $ git add .git_checkout_info && git commit -m "Checkout info"
    $ git tag checkout_info
    $ cd ../git
    $ git remote add checkout_info ../checkout_info/.git
    $ git fetch checkout_info # this is where we insert unrelated info.
    $ rm -rf ../checkout_info # no longer needed (it is a directory, so plain rm fails)
    $ gitk --all -d # notice there is nothing connecting the two tracks together.
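The same idea can be checked without a network connection or gitk. The following is only a sketch: it uses two local scratch repositories (`main_repo` and `info_repo`, both names invented here) in place of the real clone, and asks `git merge-base` whether the two tracks share an ancestor:

```shell
set -e
work=$(mktemp -d)                 # scratch area holding both repositories
cd "$work"
git init -q main_repo             # stands in for the cloned upstream repo
git init -q info_repo             # stands in for checkout_info
for r in main_repo info_repo; do
  git -C "$r" config user.email "demo@example.com"   # hypothetical identity
  git -C "$r" config user.name "Demo"
done

echo "code" > main_repo/code.txt
git -C main_repo add code.txt
git -C main_repo commit -q -m "Project history"

echo "This repository is from <upstream URL>" > info_repo/.git_checkout_info
git -C info_repo add .git_checkout_info
git -C info_repo commit -q -m "Checkout info"
info_commit=$(git -C info_repo rev-parse HEAD)

cd main_repo
git remote add checkout_info ../info_repo
git fetch -q checkout_info        # pulls in the unrelated track

# merge-base exits non-zero when the two commits have no common ancestor.
if git merge-base HEAD "$info_commit" >/dev/null; then
  shared=yes
else
  shared=no
fi
echo "shared history: $shared"
```

The fetched commit is present in the object database, yet nothing links it to the project's own commits.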

  10. John Wiegley says:

    Ah, I actually use this feature too, to keep “side-band” data which is related to the repository, but doesn’t belong in any regular working tree.

    I could have added a note about it in the PDF, but I think it might have added a bit more complexity than necessary. This is the kind of thing for you to blog about so I can link there! :)

  11. Would it be possible to see the article in a format better suited to on-screen reading, like HTML or plain text? Maybe just the words without the pictures?

  12. John Wiegley says:

    I have no easy way to convert it to HTML at present.

  13. Excellent document even for people accustomed to Git, in order to have a deeper understanding of the beast.

    Keep up the good work!

  14. Ron says:

    Thank you very much for writing this. It really increased my understanding of git.

  15. Masci says:

    Thank you for this paper, it was very useful to me!

    I was wondering if you could license your work under a cc license (or something similar) so I could translate and share your work in italian language.

  16. John Wiegley says:

    Masci, consider it done. Just e-mail me so that I’m certain to get done what you need for the translation to happen.

  17. Juergen Salecker says:

    Very well written paper, thanks a lot. I was very glad to see that a lot of the needs of developers are taken into account by Git.

    The funny stuff is that I have written a pattern language about CM and not all but most techniques are covered by GIT. If you would like to have a look into my paper, just drop me an e-mail.

    Concerning your expression “the index”, I would instead call this a “temporary repository”, which might be a better fit for its intention. But anyway, it is too late now.

    However, from my point of view a “staging area” might not really be covered by “the index”. I see a staging area developed with the use of a multi-dimensional file system (UnionFS or aufs), because then the time required to share a change with other developers is

    1) independent of the size of the code

    2) independent of the number of developers

    A significant advantage for high-speed development; I have used something like this for over a century.

    best regards

    Juergen

  18. iñigo says:

    Thank you very much for the work.

    Nice and very well thought. :-)

  19. Ben says:

    Thanks John

    Very helpful!

  20. Amr Mostafa says:

    Your bottom-up tutorial helped me grok git’s guts in an easy (and entertaining) way.

    Thank you a lot for taking the time to put together this quality document.

  21. Rich says:

    Thank you so much! This document was what I needed to actually get git. I knew it was good but without the understanding of how it works, it was a real confusion whenever anything went a bit wrong.

    Thank you.

  22. David Bruce says:

    Great exposition; many thanks.

    I do have one question, though. In “Branching and the power of rebase”, you note that “the “base” of the Z branch is A, while the base of the D branch continues back to some unlabeled commit in the past”. I’m probably being dense, but I don’t see how/why the “base”(s) of these two branches should be different: Why does the Z branch ‘stop’ at A when the D branch doesn’t?

  23. John Wiegley says:

    David, technically speaking you are right, they can both be viewed as independent branches with the same ancestor commit. I suppose I meant that Z stops at A only for the sake of considering Z a “branch off of A”. I’ll see about rewording it.

  24. Laust Rud says:

    Excellent writing, thanks!

    The url for the git-core tutorial has changed slightly – it is now at http://www.kernel.org/pub/software/scm/git/docs/gitcore-tutorial.html

  25. John Wiegley says:

    Thanks for the update. I’ll include this among the next round of edits.

  26. Thank you very much for this; as a bottom-up guy it made me understand git better than dozens of other tutorials. A small correction:

    > This means that when you create a tree from your index and store it under a commit (all of which is done by commit), you are also, inadvertently adding that commit to the reflog, which can be viewed using the following command:

    You got an extra comma after “also”.

  27. Uwe Kleine-König says:

    Hi John,

    a comment to the paragraph about reset:

    $ git reset HEAD foo.c

    actually isn’t a mixed reset. If you specify a path, HEAD is never touched.

    So

    $ git reset HEAD~3

    makes a (mixed) reset to HEAD~3, so your HEAD changes.

    $ git reset HEAD~3 foo.c

    only changes the index entry for foo.c though.
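The difference can be observed directly. This is a small sketch in a scratch repository (the demo identity is made up; `foo.c` follows the example above), comparing where HEAD points before and after each form of reset:

```shell
set -e
repo=$(mktemp -d)                 # scratch repository
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"

echo "v1" > foo.c
git add foo.c
git commit -q -m "v1"
first=$(git rev-parse HEAD)
echo "v2" > foo.c
git commit -q -am "v2"
head_before=$(git rev-parse HEAD)

# With a path: only the index entry for foo.c is rewritten.
git reset -q HEAD~1 -- foo.c
head_after_path=$(git rev-parse HEAD)
staged=$(git show :foo.c)         # content recorded in the index
worktree=$(cat foo.c)             # content in the working tree

# Without a path: a mixed reset, so HEAD itself moves.
git reset -q HEAD~1
head_after_mixed=$(git rev-parse HEAD)

echo "path reset moved HEAD:  $( [ "$head_before" = "$head_after_path" ] && echo no || echo yes )"
echo "mixed reset moved HEAD: $( [ "$head_before" = "$head_after_mixed" ] && echo no || echo yes )"
```

In both cases the working tree copy of `foo.c` is left alone; only the index (and, for the pathless form, HEAD) changes.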

  28. Rudi Farkas says:

    Hello John

    Above, you say “The date at the front should read “Fri, 2 May 2008” if you have the latest version.”

    I took this to mean that the file http://ftp.newartisans.com/pub/git.from.bottom.up.pdf
    would carry that date.
    However, the file downloaded today contains “Thu, 11 Sep 2008” just after the title.

    I am confused as to which or where is the latest version of your document.

  29. John Wiegley says:

    Thanks for letting me know, I’ll try to rectify the situation this week, as well as incorporating several recent corrections that were sent in.

  30. Tony says:

    this is an excellent write up; I’ve been reading much into git and i found this to be conceptually useful and pleasing to the visual whole of my brain.

    i am in the process of converting my company to git, and i intend on using git hooks to sync merges on specific branches to our dev server to ALL distributed production servers (web servers) transparently, verify syntax in certain files, and update our custom internal ticketing system. we will be able to develop, sync to staging, and sync to production without ever leaving the dev server.

    thank you for your investment into this document; i expect many others will find it as useful as i have.

  31. Sigi says:

    You deserve a lot of praise for this article. It’s well written and looks great. Far too few authors care about aesthetics although it makes a text so much easier to read and follow.

    I will give this to my interested co-workers.

    Thank you!

  32. Kyle Bennett says:

    John, thanks for the work you put into this. Since I am coming from SVN, I’m really struggling to “git” git, and this approach is usually what works better for me than tutorials and the like.

    One issue that is confusing me. You defined several terms at the beginning, but did not define “commit”. Coming from SVN, there appears to be a subtle but important difference. In particular, this, from page six:

    “This first commit added my greeting file to the repository. It contains one Git tree…”

    is ambiguous. On first reading, I took “it” to refer to the repository, but subsequent statements are inconsistent with that, and seem to point toward “it” referring to the commit. Which further implies that a commit is more a piece of data than an action, or simply a change in the state of the single repository tree as it is in SVN.

    I’m moving on with a big asterisk next to my tentative understanding of what a commit actually is in the system.

  33. Przemek says:

    Great article, indeed.

    @Kyle: I’m a git newbie and SVN convert too, but looking at http://eagain.net/articles/git-for-computer-scientists/ I guess that each commit has one tree.
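One way to check this for yourself is to look inside a commit object. The sketch below uses a scratch repository; the file name `greeting`, the commit message, and the demo identity are assumptions chosen to echo the quoted passage:

```shell
set -e
repo=$(mktemp -d)                 # scratch repository
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"

echo "Hello, world!" > greeting
git add greeting
git commit -q -m "Added my greeting"

type=$(git cat-file -t HEAD)      # what kind of object HEAD names
tree_lines=$(git cat-file -p HEAD | grep -c '^tree ')   # tree references in it
tree_id=$(git rev-parse 'HEAD^{tree}')

git cat-file -p HEAD              # the commit: a tree id, author, committer, message
git ls-tree "$tree_id"            # the tree: one entry, the greeting blob
```

So a commit is a stored piece of data that points at exactly one tree (plus any parent commits), rather than an action applied to a single shared repository tree.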

  34. mr.design says:

    Hi John,
    I just started to read your GFTBU, it’s so very well written and has such a good intro, really is the right way things should be taught. Brilliant. Just a feedback… thanks.-

  35. Rose says:

    This is wonderful, thank you so much for writing it!!!
