Jan 28, 2009

Understanding Git Concepts

Git is, in essence, a file system with history.

All data is stored in Git objects. Every Git object has a 40-character hexadecimal id (160 bits), generated by SHA-1 hashing a short type-and-size header followed by the object's content. There are 4 types of objects:
  • blob object: File contents are stored in blob objects. No filenames, permissions, etc.; only the contents are saved here.
  • tree object: The directory structure is stored here. A tree object's content is just a list of its children. Each list item holds a mode (permissions), a filename, and a SHA-1 hash pointing to either a blob object or another tree object. This gives us a data structure (a tree) that can represent a file system.
  • commit object: Now we need history. A commit object simply contains a pointer to a tree, pointers to zero or more parents (also commits), and some bookkeeping data such as the author, committer, and message. Commit objects form a directed acyclic graph on a layer above the blobs and trees.
  • tag object: A tag object exists only to make referencing an object convenient. It holds a pointer to any other Git object plus a tag name, so you can use the name to refer to that object (an important commit, for example) in your repo.
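To make the id scheme concrete, here is a minimal Python sketch of how an object id can be computed. The helper names git_object_id and tree_body are made up for this illustration, and the tree serialization is simplified (it sorts entries by plain name, while real Git has extra ordering rules for directories).

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the 40-hex-digit id for an object of this type and body."""
    # Git hashes a header "<type> <size>\0" followed by the raw content.
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries):
    """Serialize (mode, name, hex_id) entries the way a tree stores them.

    Simplification: entries are sorted by plain name.
    """
    out = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        # Each entry is "<mode> <name>\0" followed by the 20 raw hash bytes.
        out += f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(hex_id)
    return out

blob_id = git_object_id("blob", b"hello world\n")
tree_id = git_object_id("tree", tree_body([("100644", "hello.txt", blob_id)]))
print(blob_id)  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(tree_id)
```

The blob id should match what 'git hash-object' reports for a file containing the same bytes. Note that the filename only appears in the tree, never in the blob.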
Git objects are immutable. The other concept in Git is the reference, which is used for mutable things like branches and remotes.
  • A branch is just a file in the .git/refs/heads/ directory that contains the SHA-1 hash of the most recent commit on that branch. When you create a branch, Git just creates a file containing a 40-character hash in .git/refs/heads/; when you switch to it (for example with 'git checkout -b'), Git updates .git/HEAD to point to it. As your development moves on, Git finds the current branch through HEAD and updates the branch file in refs/heads with each new commit.
  • A remote branch is a pointer to a branch in someone else's copy of the same repo (so it is also a branch). If you get the code with 'git clone' instead of 'git init', Git automatically adds a default 'origin/master' remote branch for you. 'origin' names the location of the remote copy, and 'master' is the branch on the remote that you cloned from.
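The lookup from HEAD to a branch file to a commit hash can be sketched in a few lines of Python. This is only an illustration: resolve_ref is a hypothetical helper, the hash is a placeholder, and the sketch assumes every ref is a plain file (real repositories may also keep refs in .git/packed-refs, which is ignored here).

```python
import os
import tempfile

def resolve_ref(git_dir: str, name: str = "HEAD") -> str:
    """Follow symbolic refs ("ref: <path>") until a commit hash is found."""
    with open(os.path.join(git_dir, name), encoding="ascii") as f:
        content = f.read().strip()
    if content.startswith("ref: "):
        # HEAD usually holds "ref: refs/heads/<branch>" rather than a hash.
        return resolve_ref(git_dir, content[len("ref: "):])
    return content

# Build a throwaway directory that mimics the .git layout described above.
git_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(git_dir, "refs", "heads"))
fake_hash = "0123456789abcdef0123456789abcdef01234567"  # placeholder commit id
with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/master\n")
with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
    f.write(fake_hash + "\n")

print(resolve_ref(git_dir))  # the hash stored in refs/heads/master
```

Committing on a branch then amounts to writing a new hash into the branch file; HEAD itself does not change.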
When you check something out, Git looks up the argument you provided under .git/refs or in .git/HEAD, finds the corresponding branch, tag, or commit, reads from the commit the SHA-1 hash that points to a tree, and then traverses that tree to populate your working directory.
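The tree traversal behind a checkout might look roughly like the following Python sketch. It uses a toy in-memory object store instead of the real .git/objects directory, and the helpers (put, make_tree, parse_tree, checkout) are invented for this example. The commit body is also cut down to a tree line and a message; real commits carry author and committer lines too.

```python
import hashlib

store = {}  # toy object store: hex id -> (type, body)

def put(obj_type: str, body: bytes) -> str:
    """Store an object under the SHA-1 of its header plus body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    oid = hashlib.sha1(header + body).hexdigest()
    store[oid] = (obj_type, body)
    return oid

def make_tree(entries) -> str:
    """Store a tree built from (mode, name, hex_id) entries."""
    body = b"".join(
        f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
        for mode, name, oid in entries
    )
    return put("tree", body)

def parse_tree(body: bytes):
    """Yield (mode, name, hex_id) for each entry in a raw tree body."""
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        oid = body[nul + 1:nul + 21].hex()  # 20 raw hash bytes
        yield mode, name, oid
        pos = nul + 21

def checkout(tree_id: str, prefix: str = "") -> dict:
    """Return {path: file contents} for everything reachable from a tree."""
    files = {}
    _, body = store[tree_id]
    for _mode, name, oid in parse_tree(body):
        if store[oid][0] == "tree":
            files.update(checkout(oid, prefix + name + "/"))
        else:
            files[prefix + name] = store[oid][1]
    return files

readme = put("blob", b"hello\n")
main_py = put("blob", b"print('hi')\n")
src = make_tree([("100644", "main.py", main_py)])
root = make_tree([("100644", "README", readme), ("40000", "src", src)])
commit = put("commit", f"tree {root}\n\ninitial commit\n".encode())

# A commit body starts with "tree <id>"; read that id, then walk the tree.
root_from_commit = store[commit][1].split(b"\n", 1)[0].split(b" ")[1].decode()
working_dir = checkout(root_from_commit)
print(sorted(working_dir))  # ['README', 'src/main.py']
```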

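A fetch downloads all updates on a remote branch into your local repo; a merge (or 'git pull', which fetches and then merges) brings them into your local branch. By default, a pull on master merges in changes from origin/master, but you can fetch updates from elsewhere, such as origin/cool. After a series of fetches and merges your history graph will look like a mess; rebase helps by replaying your commits on top of the updated branch so that history stays linear. Rebase leaves orphaned objects in your repo ('git gc' will eventually clean them up) and should not be used on commits that others may already have fetched.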
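Why rebase leaves orphans follows from immutability: a commit's id depends on its parents, so moving a commit onto a new parent has to create a new commit. The toy Python model below illustrates this. It is a simplification, not how Git stores commits: the commit and reachable helpers are hypothetical, and each toy commit hashes only its parents and message.

```python
import hashlib

commits = {}  # toy store: commit id -> (parent ids, message)

def commit(parents, message):
    """Create a commit whose id is a hash of its parents and message."""
    data = ("\n".join(parents) + "\n\n" + message).encode()
    cid = hashlib.sha1(data).hexdigest()
    commits[cid] = (tuple(parents), message)
    return cid

def reachable(tips):
    """Return every commit id reachable from the given branch tips."""
    seen, stack = set(), list(tips)
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid][0])
    return seen

base = commit([], "base")
upstream = commit([base], "upstream work")  # what origin/master points to
local = commit([base], "local work")        # what master points to
refs = {"origin/master": upstream, "master": local}

# "Rebase" master onto origin/master: replay the change on a new parent.
rebased = commit([upstream], commits[local][1])
refs["master"] = rebased

orphans = set(commits) - reachable(refs.values())
print(rebased != local)     # True: a different parent gives a different id
print(orphans == {local})   # True: the old commit is no longer reachable
```

Anyone who had already fetched the old commit would still have it in their history, which is why rewriting published commits causes trouble.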

2 comments:

  1. Summarized by you? Good point.

  2. Thanks. You can read the Git Community Book for more details; it's free.
