“Foundations” is a little joke on a religious theme; our page borrows heavily from the git parable - so - why not a foundation myth?
On the first day - the repository and the working tree
I’m young and the world is fresh and I have lots of time.
I have so much time, that I decide that I want to write a book. The project I
am working on is a modest history and explanation of everything, with
provisional title “The Book”. It has a table of contents
contents.txt and a
single book chapter:
.
├── chapter1.txt
└── contents.txt
As I start to write, I begin to think I should keep track of my changes. I need some sort of version control system. How hard can it be?
I start with some names. I’m going to call this set of files that I’m working
on, the working tree. Because I currently lack any shame about body issues, I
will call my new versioning system ahole.
I decide that I need to store the state of The Book at the end of each
day. To do this, I make a new directory in my working tree, called .ahole.
This directory will store The Book as it evolves into a world-wide best-seller.
I will use the name repository for the contents of .ahole.
At the end of the day, I make a copy of all the files in the working tree, and
save it in my new
.ahole repository. In fact, what I will do is make a new subdirectory in
.ahole named for today’s date, then store a copy of the book
files in there. On unix, that might look like this:
mkdir .ahole/year0-jan-01
mkdir .ahole/year0-jan-01/files
cp * .ahole/year0-jan-01/files
So I’ve still got the contents of The Book in the working tree, but now, in the repository, I have a copy of the files that is a snapshot of The Book as of today:
.
├── .ahole
│   └── year0-jan-01
│       └── files
│           ├── chapter1.txt
│           └── contents.txt
├── chapter1.txt
└── contents.txt
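For anyone who prefers Python to shell, here is a minimal sketch of the same end-of-day snapshot. The function name snapshot and the throwaway temporary directory are assumptions made for illustration; they are not part of ahole as described above. Unlike cp *, this version would also pick up hidden files in the working tree.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot(working_tree, name):
    """Copy every top-level file in the working tree into .ahole/<name>/files."""
    working_tree = Path(working_tree)
    files_dir = working_tree / ".ahole" / name / "files"
    files_dir.mkdir(parents=True)  # raises if this snapshot already exists
    for path in working_tree.iterdir():
        if path.is_file():  # directories, including .ahole, are skipped
            shutil.copy2(path, files_dir / path.name)
    return files_dir


# Try it on a throwaway copy of The Book (made-up file contents)
book = Path(tempfile.mkdtemp())
(book / "contents.txt").write_text("1. In the beginning\n")
(book / "chapter1.txt").write_text("Let there be light.\n")
saved = snapshot(book, "year0-jan-01")
print(sorted(p.name for p in saved.iterdir()))
```

The working tree is left untouched; the snapshot is only a copy sitting under .ahole.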
On the second day - staging and commits
Today I do some more work on the book. I start work on chapter 2, and, while
I’m thinking about things, I find that I am also writing some notes to myself
about this character “Eve” that I have seen wandering around. I save those
notes in a file called
something_about_eve.txt. When I get to the end of
the day, I get ready to store my work. At the moment, my directory looks like
this:
.
├── .ahole
│   └── year0-jan-01
│       └── files
│           ├── chapter1.txt
│           └── contents.txt
├── something_about_eve.txt
├── chapter2.txt
├── chapter1.txt
└── contents.txt
For some reason I can’t put my finger on, I don’t want to put
something_about_eve.txt into the repository at the moment. In fact, in
general, I want to choose which changes I back up into the repository, and which
changes I leave for another day. In the end I come up with an idea. I’ll make
a directory in .ahole called staging_area. When I start work at the
beginning of the day, I copy the previous backed-up version of my files from the
last snapshot into staging_area. These files are now ready for storing in the
next snapshot:
mkdir .ahole/staging_area
cp .ahole/year0-jan-01/files/* .ahole/staging_area
I now have:
.
├── .ahole
│   ├── staging_area
│   │   ├── chapter1.txt
│   │   └── contents.txt
│   └── year0-jan-01
│       └── files
│           ├── chapter1.txt
│           └── contents.txt
├── something_about_eve.txt
├── chapter2.txt
├── chapter1.txt
└── contents.txt
As I work, I decide what I’m going to put into tonight’s snapshot. For example,
maybe I changed chapter1.txt and I think it’s ready to back up. I copy my
modified version of chapter1.txt from the working tree to .ahole/staging_area.
I’ll call that staging the file. I’ll also ‘stage’ the new chapter2.txt
file (copy it to the staging area). I’m not going to stage
something_about_eve.txt at the moment.
Now I’ve done that, all the stuff I want to store in the backup is ready. I just need to put it into its own backup snapshot directory. To do that, I just do something like (Unix again):
mkdir .ahole/year0-jan-02
mkdir .ahole/year0-jan-02/files
cp .ahole/staging_area/* .ahole/year0-jan-02/files
I end up with a directory that looks like this:
.
├── .ahole
│   ├── year0-jan-02
│   │   └── files
│   │       ├── chapter2.txt
│   │       ├── chapter1.txt
│   │       └── contents.txt
│   ├── staging_area
│   │   ├── chapter2.txt
│   │   ├── chapter1.txt
│   │   └── contents.txt
│   └── year0-jan-01
│       └── files
│           ├── chapter1.txt
│           └── contents.txt
├── something_about_eve.txt
├── chapter2.txt
├── chapter1.txt
└── contents.txt
I decide that I’ll use the name commit for each of the daily snapshot
directories (such as year0-jan-02). The action of adding files
to the staging area, I will call staging files for the commit. I will use
the term committing for the action of making the snapshot directory, and
copying the files from the staging area to the snapshot directory.
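The two actions just named can be sketched in Python as well. This is only an illustration under my own assumptions: the function names stage and commit, the temporary directory and the file contents are all invented here.

```python
import shutil
import tempfile
from pathlib import Path


def stage(tree, fname):
    """Staging: copy one working-tree file into the staging area."""
    staging = Path(tree) / ".ahole" / "staging_area"
    staging.mkdir(parents=True, exist_ok=True)
    shutil.copy2(Path(tree) / fname, staging / fname)


def commit(tree, name):
    """Committing: copy the whole staging area into a new snapshot directory."""
    ahole = Path(tree) / ".ahole"
    shutil.copytree(ahole / "staging_area", ahole / name / "files")


# A throwaway working tree with made-up contents
tree = Path(tempfile.mkdtemp())
for fname in ("contents.txt", "chapter1.txt", "chapter2.txt",
              "something_about_eve.txt"):
    (tree / fname).write_text("text of " + fname + "\n")

# Stage everything except the notes about Eve, then commit
for fname in ("contents.txt", "chapter1.txt", "chapter2.txt"):
    stage(tree, fname)
commit(tree, "year0-jan-02")

committed = sorted(p.name for p in (tree / ".ahole" / "year0-jan-02" / "files").iterdir())
print(committed)
```

Because the commit copies only what is in the staging area, something_about_eve.txt stays out of the snapshot even though it is sitting in the working tree.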
On the third day - history
As a result of certain events yesterday evening, I have a new friend, Eve. She
wants to help out. Of course Eve has her own computer, and I send her my
.ahole directory. I thank myself for my wisdom in not adding
something_about_eve.txt to the repository.
Eve checks out our book (reconstructs my working tree) with something like:
cp .ahole/year0-jan-02/files/* .
Now she’s got the book files as I committed them last night. She also copies the last commit files into the staging area, as I did:
├── .ahole
│   ├── staging_area
│   │   ├── chapter2.txt
│   │   ├── chapter1.txt
│   │   └── contents.txt
She works hard on a new file
chapter1_discussion.txt. It’s good to see she’s
enjoying the work. As the afternoon turns to evening, she gets ready to save
her work, so she copies chapter1_discussion.txt to the staging area.
Now she is ready to do a commit:
mkdir .ahole/year0-jan-03
mkdir .ahole/year0-jan-03/files
cp .ahole/staging_area/* .ahole/year0-jan-03/files
That is what Eve was going to do, but Eve is smart, and she immediately realizes
that there is a problem. After she has done her commit, both of us will likely
have a commit directory
.ahole/year0-jan-03 - but they will have different
contents. If she later wants to share work with me, that could get confusing.
The two of us are a little tired after all our work, and we meet for a beer. We talk about it for a while. At first we think we can just add the time to the date, because that’s likely to be unique for each of us. Then we realize that this is going to get messy too, because, if Eve does a commit on her computer, then I do a commit on mine, and she does another one on hers, the times will say that these are all in one sequence, but in fact there are two sequences, mine and Eve’s. We need some other way to keep track of the sequence of commits, one that will work even if the two of us are working independently.
In the end we decide that we are going to give the commits some unique
identifier string instead of the date. We might have a problem in making sure
that the unique identifier string is actually unique, but let’s assume we can
solve that somehow. We’ll store the contents of the working tree in the same
way as we have done up till now, in the
files subdirectory, but we’ll add a
new file to each commit, called
info.txt, that will tell us who did the
commit, and when, and, most importantly, what the previous commit was. We’ll
call the previous commit the parent.
Eve was right to predict that I had made my own commit today. I’ve been happily working on chapter 3. So, before our conversation, my directory looked like this:
.
├── .ahole
│   ├── year0-jan-03
│   │   └── files
│   │       ├── chapter3.txt
│   │       ├── chapter2.txt
│   │       ├── chapter1.txt
│   │       └── contents.txt
│   ├── staging_area
│   │   ├── chapter3.txt
│   │   ├── chapter2.txt
│   │   ├── chapter1.txt
│   │   └── contents.txt
│   ├── year0-jan-02
│   │   └── files
│   │       ├── chapter2.txt
│   │       ├── chapter1.txt
│   │       └── contents.txt
│   └── year0-jan-01
│       └── files
│           ├── chapter1.txt
│           └── contents.txt
├── something_about_eve.txt
├── chapter3.txt
├── chapter2.txt
├── chapter1.txt
└── contents.txt
but now we’ve worked out the new way, it looks like this:
.
├── .ahole
│   ├── 5d89f8
│   │   ├── info.txt
│   │   └── files
│   │       ├── chapter3.txt
│   │       ├── chapter2.txt
│   │       ├── chapter1.txt
│   │       └── contents.txt
│   ├── staging_area
│   │   ├── chapter3.txt
│   │   ├── chapter2.txt
│   │   ├── chapter1.txt
│   │   └── contents.txt
│   ├── 7ef41f
│   │   ├── info.txt
│   │   └── files
│   │       ├── chapter2.txt
│   │       ├── chapter1.txt
│   │       └── contents.txt
│   └── 6438a4
│       ├── info.txt
│       └── files
│           ├── chapter1.txt
│           └── contents.txt
├── something_about_eve.txt
├── chapter3.txt
├── chapter2.txt
├── chapter1.txt
└── contents.txt
.ahole/5d89f8/info.txt looks like this:
committer = Adam
message = Third day
date = year0-jan-03
parent = 7ef41f
Meanwhile, Eve’s directory looks like this:
.
├── .ahole
│   ├── 0a01a0
│   │   ├── info.txt
│   │   └── files
│   │       ├── chapter1_discussion.txt
│   │       ├── chapter2.txt
│   │       ├── chapter1.txt
│   │       └── contents.txt
│   ├── staging_area
│   │   ├── chapter1_discussion.txt
│   │   ├── chapter2.txt
│   │   ├── chapter1.txt
│   │   └── contents.txt
│   ├── 7ef41f
│   │   ├── info.txt
│   │   └── files
│   │       ├── chapter2.txt
│   │       ├── chapter1.txt
│   │       └── contents.txt
│   └── 6438a4
│       ├── info.txt
│       └── files
│           ├── chapter1.txt
│           └── contents.txt
├── chapter1_discussion.txt
├── chapter2.txt
├── chapter1.txt
└── contents.txt
.ahole/0a01a0/info.txt looks like this:
committer = Eve
message = Eve day 3
date = year0-jan-03
parent = 7ef41f
After a little thought, Eve and I realize that, when we make our new commit, we
are going to have to know what the current commit is, so we can use that as the
parent. When we make a new commit, we store the commit identifier in a file.
We’ll call this file
.ahole/HEAD, so, after my last commit above, the file
.ahole/HEAD will have the contents
5d89f8. We use the contents of
.ahole/HEAD to identify the last (current) commit. And of course, when we
make a new commit, we can get the parent of the new commit from the current
contents of .ahole/HEAD. Our commit procedure might look something like this:
import shutil
import uuid
from datetime import date
from pathlib import Path


def make_unique_id():
    # Placeholder (an assumption): any scheme giving a unique string will do
    return uuid.uuid4().hex[:6]


def ahole_commit(committer, message):
    # Make a unique identifier for this commit somehow
    new_id = make_unique_id()
    # Make a new directory in .ahole with the new unique name, and copy
    # the files from the staging area to the new snapshot directory
    commit_dir = Path('.ahole') / new_id
    shutil.copytree('.ahole/staging_area', commit_dir / 'files')
    # Get previous (parent) commit id from .ahole/HEAD
    head_id = Path('.ahole/HEAD').read_text().strip()
    # Make info with parent set to HEAD
    info_str = 'committer = ' + committer + '\n'
    info_str += 'message = ' + message + '\n'
    info_str += 'date = ' + str(date.today()) + '\n'
    info_str += 'parent = ' + head_id + '\n'
    # Write info to info.txt file
    (commit_dir / 'info.txt').write_text(info_str)
    # Set .ahole/HEAD to contain new commit id
    Path('.ahole/HEAD').write_text(new_id)
When we want to go back to an earlier state of the book, we can do a checkout, with something like:
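import shutil
from pathlib import Path


def clear_working_tree():
    # Delete everything in the working tree except the repository itself
    for path in Path('.').iterdir():
        if path.name == '.ahole':
            continue
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()


def ahole_checkout(commit_id):
    commit_dir = Path('.ahole') / commit_id
    # copy .ahole/$commit_id/files into working tree
    clear_working_tree()
    shutil.copytree(commit_dir / 'files', '.', dirs_exist_ok=True)
    # make .ahole/HEAD contain commit_id
    Path('.ahole/HEAD').write_text(commit_id)
    # copy commit snapshot into staging area
    shutil.rmtree('.ahole/staging_area', ignore_errors=True)
    shutil.copytree(commit_dir / 'files', '.ahole/staging_area')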
So, when we run ahole_checkout('7ef41f') we will get the copy of the working
tree corresponding to 7ef41f, and .ahole/HEAD will just contain the
string 7ef41f.
In our excitement, we immediately realize that it’s really easy to see the
history of the book now. We can easily fetch out
info.txt from the current
commit, print it, then find its parent, and fetch
info.txt from the parent,
print it, and so on.
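That walk back through the parents could be sketched like this. The names read_info and ahole_log are my own inventions, as are the ‘First day’ and ‘Second day’ messages; I am also assuming that the very first commit has an empty parent entry, and that .ahole/HEAD holds a plain commit id as it does at this point in the story.

```python
import tempfile
from pathlib import Path


def read_info(ahole, commit_id):
    """Parse .ahole/<commit_id>/info.txt into a dict of 'key = value' pairs."""
    info = {}
    for line in (Path(ahole) / commit_id / "info.txt").read_text().splitlines():
        key, _, value = line.partition(" = ")
        info[key] = value
    return info


def ahole_log(ahole):
    """Yield (commit_id, info) pairs from HEAD back to the first commit."""
    commit_id = (Path(ahole) / "HEAD").read_text().strip()
    while commit_id:
        info = read_info(ahole, commit_id)
        yield commit_id, info
        commit_id = info.get("parent", "")  # empty parent ends the walk


# Build a toy repository holding only the info.txt files for three commits
ahole = Path(tempfile.mkdtemp())
history = [("6438a4", "First day", ""),
           ("7ef41f", "Second day", "6438a4"),
           ("5d89f8", "Third day", "7ef41f")]
for commit_id, message, parent in history:
    (ahole / commit_id).mkdir()
    (ahole / commit_id / "info.txt").write_text(
        "committer = Adam\nmessage = " + message + "\nparent = " + parent + "\n")
(ahole / "HEAD").write_text("5d89f8")

for commit_id, info in ahole_log(ahole):
    print(commit_id, info["message"])
```

The history comes out newest first, because the only links we have stored point from each commit back to its parent.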
Now we are tired, but happy, and we rest.
On the fourth day - references
We wake with a strange excitement. The idea, of keeping a reference to the
current commit in
.ahole/HEAD, seems that it could be more general. I
talk to Eve over breakfast (she stayed in her own place of course, but she came
over for work). Together we work out the concept of references. A reference is:
Something that points to a commit
.ahole/HEAD is a reference - to the current commit. But what if I decide
that I want to give out some preliminary version of our book. Let’s say I want
to release the book stored in
.ahole/7ef41f/files as ‘release-0.1’. I’m
going to send this out to all my friends (to be honest, I don’t have many
friends just yet, but still). I want to be able to remember what version of the
book I sent out. I can make a reference to this commit. I’ll call this a
tag. I make a new directory in .ahole called refs, and another
directory in refs called tags, and then, in
.ahole/refs/tags/release-0.1 I just put 7ef41f - a reference to the
release commit. That way, if I ever need to go back to the version of the book
I released, I just have to read the
release-0.1 file to find the commit, and
then checkout that commit.
Wait, but, there’s a problem. If I checkout the commit in
release-0.1, that will overwrite .ahole/HEAD, and I will lose track of what commit I was
working on before.
Let’s store that in another reference. Let’s use the name ‘master’ for my main
line of development. I store where this is, by making a new file
.ahole/refs/heads/master that is a reference to the last commit. It just
contains the text ‘5d89f8’. So that I know that I am working on ‘master’, I
make .ahole/HEAD have the text ref: refs/heads/master. Now, when I make
a new commit, I first check .ahole/HEAD; if I see
ref: refs/heads/master, then first, I get the commit id in
.ahole/refs/heads/master - and I use that as the parent id for the commit.
When I’ve saved the new commit, I set
.ahole/refs/heads/master to have the
new commit id. So, I need to modify my commit procedure slightly:
import shutil
import uuid
from datetime import date
from pathlib import Path


def make_unique_id():
    # Placeholder (an assumption): any scheme giving a unique string will do
    return uuid.uuid4().hex[:6]


def ahole_commit(committer, message):
    # *** this stuff down to the next *** line is new
    # Get previous (parent) commit id from .ahole/HEAD
    head_contents = Path('.ahole/HEAD').read_text().strip()
    # Check if this is a reference, de-reference if so
    # Also, get file into which to write the new commit id
    if head_contents.startswith('ref: '):
        head_ref = head_contents.replace('ref: ', '', 1)
        head_ref_file = Path('.ahole') / head_ref
        head_id = head_ref_file.read_text().strip()
    else:
        head_ref_file = Path('.ahole/HEAD')
        head_id = head_contents
    # *** the stuff below you've seen before (until *** again)
    # Make a unique identifier for this commit somehow
    new_id = make_unique_id()
    # Make a new directory in .ahole with the new unique name, and copy
    # the files from the staging area to the new snapshot directory
    commit_dir = Path('.ahole') / new_id
    shutil.copytree('.ahole/staging_area', commit_dir / 'files')
    # Make info.txt with parent set to HEAD
    info_str = 'committer = ' + committer + '\n'
    info_str += 'message = ' + message + '\n'
    info_str += 'date = ' + str(date.today()) + '\n'
    info_str += 'parent = ' + head_id + '\n'
    # Write info to info.txt file
    (commit_dir / 'info.txt').write_text(info_str)
    # Set the file that points to the current commit, to point to our commit
    # *** a little new, in that we might be writing to .ahole/HEAD, or
    # something like .ahole/refs/heads/master, depending on what .ahole/HEAD
    # contained at the top of this routine
    head_ref_file.write_text(new_id)
So, let’s say that I’m currently on commit ‘5d89f8’.
I run my commit procedure:
ahole_commit('Adam', 'Night follows day')
The commit procedure has made a new commit ‘dfbeda’; .ahole/HEAD continues
to have text ref: refs/heads/master, but now
.ahole/refs/heads/master contains dfbeda. In this way, we keep track of which commit we are on, by
constantly updating ‘master’.
Ok - now let’s return to me checking out the released version of the book. I
first get the contents of .ahole/refs/tags/release-0.1 - it’s ‘7ef41f’.
Then I checkout the working tree for that version, using my nice
checkout procedure:
ahole_checkout('7ef41f')
The checkout procedure will make .ahole/HEAD contain the text 7ef41f.
Now I want to go back to working on my current version of the book. That’s the
set of files pointed to by .ahole/refs/heads/master. I can
check the contents of .ahole/refs/heads/master - it is dfbeda. Then I
get the current version with the normal checkout procedure:
ahole_checkout('dfbeda')
Finally, I’ll have to set .ahole/HEAD to be ref: refs/heads/master. All good.
Of course, I could automate this, by modifying my checkout procedure slightly:
import shutil
from pathlib import Path


def clear_working_tree():
    # Delete everything in the working tree except the repository itself
    for path in Path('.').iterdir():
        if path.name == '.ahole':
            continue
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()


def ahole_checkout(commit_reference):
    # If this is a reference, dereference
    head_file = Path('.ahole/refs/heads') / commit_reference
    tag_file = Path('.ahole/refs/tags') / commit_reference
    if head_file.is_file():
        # it's a head reference, maybe 'master'
        head_reference = True
        commit_id = head_file.read_text().strip()
    elif tag_file.is_file():
        # it's a tag reference
        head_reference = False
        commit_id = tag_file.read_text().strip()
    else:
        # Just a standard commit id
        head_reference = False
        commit_id = commit_reference
    commit_dir = Path('.ahole') / commit_id
    # copy .ahole/$commit_id/files into working tree
    clear_working_tree()
    shutil.copytree(commit_dir / 'files', '.', dirs_exist_ok=True)
    # make .ahole/HEAD point to commit id
    if head_reference:
        # Point HEAD at head reference; the head reference file itself
        # already contains the commit id, so it does not need rewriting
        Path('.ahole/HEAD').write_text('ref: refs/heads/' + commit_reference)
    else:
        Path('.ahole/HEAD').write_text(commit_id)
    # copy commit snapshot into staging area
    shutil.rmtree('.ahole/staging_area', ignore_errors=True)
    shutil.copytree(commit_dir / 'files', '.ahole/staging_area')
What then, is the difference, between a tag - like our release - and the moving target like ‘master’? The ‘tag’ is a static reference - it does not change when we do a commit and always points to the same commit. ‘master’ is a dynamic reference - in particular, it’s a head reference:
A head is a reference that updates when we do a commit
My head is hurting a little, after Eve explains all this, but after a little
while and a nice apple pie, I’m feeling positive about ahole.
On the fifth day - branches, merges and remotes
Yesterday was a little exhausting, so today there was some time for reflection.
As Eve and I relax with the other animals, who are all getting on very well with each other, we begin to realize that this head thing could be very useful.
For example, what if one of my very small number of friends tells me that there’s a serious conceptual error in the version of the book that I released - ‘release-0.1’. What if I want to go back and fix it - that is - do another commit on top of the released book, instead of the version of the book that I’m currently working on? I can just make a new head. I’ll do it like this:
cp .ahole/refs/tags/release-0.1 .ahole/refs/heads/working-on-0.1
Then, I look at what commit working-on-0.1 contains - of course it’s
7ef41f. I get that state of the book with my new checkout procedure:
ahole_checkout('working-on-0.1')
The checkout procedure sets .ahole/HEAD to be
ref: refs/heads/working-on-0.1. Now,
when I do a commit with
ahole_commit, that will update the file
.ahole/refs/heads/working-on-0.1 to have the new commit identifier. Despite the
apple pie being a bit bitter last night, we’re feeling good.
As we think about this, we come to think of ‘master’ and ‘working-on-0.1’ as branches - because they can each be thought of as identifying a tree or graph of commits, which can grow. All I need, to make a new branch, is make a new head reference to a commit. For example, if I want to make new branch starting at the current position of ‘master’, all I need is:
cp .ahole/refs/heads/master .ahole/refs/heads/my-new-branch
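If I want to work on this branch, I need to check it out, with:
ahole_checkout('my-new-branch')
That will get the commit identifier in .ahole/refs/heads/my-new-branch, unpack
the commit tree into the working tree, and set
.ahole/HEAD to contain the text ref: refs/heads/my-new-branch.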
I’ve got my branches, but Eve will have her own branches, and this will help us know where each of us is working.
That’s good, because Eve is now asking me if I can have a look at her changes, and whether I’ll include them in my version of the book. Unwisely I end up suggesting that women don’t contribute to books, and ask her why her hair isn’t covered with an as-yet not-invented headscarf. In the end we patch it up, and I agree to go back and try and put in her changes.
Luckily, despite the lack of basics like clothing, there is an excellent local
network, so I can see the contents of her version of the book at
/eves_computer/our_book/.ahole. She wants me to look at her ‘master’
branch. Just because the network might fail, I need to fetch what I need from
her computer to mine. So, to keep track of things, I’ll make a new directory,
.ahole/refs/remotes/eve, and I’ll copy all her heads - in this case
master - to that directory. So now, I’ve got
.ahole/refs/remotes/eve/master, and in fact, it points to the commit that
she did on the third day; this was commit ‘0a01a0’. I don’t have this
commit in my .ahole directory, so I’ll copy that from
/eves_computer/our_book/.ahole/0a01a0. I look in the info.txt file
for that commit, and check what the parent is. It is ‘7ef41f’. I check if I
have that, and yes, I have, so I can stop copying stuff from Eve’s directory.
So, what I just did was:
Copy Eve’s head references from /eves_computer/our_book/.ahole/refs/heads to .ahole/refs/remotes/eve.
For each of the references in .ahole/refs/remotes/eve, I check whether I have the referenced commit, and the parents of that commit, and, if not, I copy them to .ahole.
We decide to call that two-step sequence - a fetch.
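A fetch along these lines might be sketched as follows. The names parent_of, ahole_fetch and make_commit are hypothetical, the two repositories are throwaway temporary directories standing in for my computer and Eve’s, and I am assuming each commit still has a single parent recorded in info.txt.

```python
import shutil
import tempfile
from pathlib import Path


def parent_of(ahole, commit_id):
    """Return the parent id recorded in info.txt, or '' if there is none."""
    for line in (Path(ahole) / commit_id / "info.txt").read_text().splitlines():
        key, _, value = line.partition(" = ")
        if key == "parent":
            return value
    return ""


def ahole_fetch(mine, theirs, remote_name):
    """Copy their head references, then any commits that we are missing."""
    mine, theirs = Path(mine), Path(theirs)
    remote_refs = mine / "refs" / "remotes" / remote_name
    remote_refs.mkdir(parents=True, exist_ok=True)
    copied = []
    for ref in (theirs / "refs" / "heads").iterdir():
        shutil.copy2(ref, remote_refs / ref.name)
        commit_id = ref.read_text().strip()
        # Walk back through the parents until we reach a commit we already have
        while commit_id and not (mine / commit_id).exists():
            shutil.copytree(theirs / commit_id, mine / commit_id)
            copied.append(commit_id)
            commit_id = parent_of(mine, commit_id)
    return copied


def make_commit(ahole, commit_id, parent):
    """Toy helper: write a bare commit directory containing only info.txt."""
    (ahole / commit_id).mkdir(parents=True)
    (ahole / commit_id / "info.txt").write_text("parent = " + parent + "\n")


mine = Path(tempfile.mkdtemp()) / ".ahole"
eves = Path(tempfile.mkdtemp()) / ".ahole"
for ahole in (mine, eves):
    make_commit(ahole, "6438a4", "")
    make_commit(ahole, "7ef41f", "6438a4")
make_commit(eves, "0a01a0", "7ef41f")
(eves / "refs" / "heads").mkdir(parents=True)
(eves / "refs" / "heads" / "master").write_text("0a01a0")

copied = ahole_fetch(mine, eves, "eve")
print(copied)
```

Only the commit I was missing gets copied; the walk stops as soon as it reaches ‘7ef41f’, which I already have.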
Now I want to look at her version of the book. I have her head references and
the commits they point to, so I can checkout her latest version. I first get the
commit identifier from
.ahole/refs/remotes/eve/master - ‘0a01a0’. Then:
ahole_checkout('0a01a0')
This will put ‘0a01a0’ into
.ahole/HEAD. I can look at her version of the
book, and decide if I like it. If I do, then I can do a merge.
What is a merge? It’s the join of two commits. First I work out where Eve’s
tree diverged from mine, by going back in her history, following the parents of
the commits. In this case it’s easy, because the parent commit (‘7ef41f’) of
this commit (‘0a01a0’) is one that is also in my history (the history for my
‘master’ branch). This most recent shared commit I will call the common
ancestor. Then I work out the difference between the common ancestor commit
(‘7ef41f’) and this commit (‘0a01a0’) - let’s call that eves_diff.
I go back to my own ‘master’ - which turns out to be (from
.ahole/refs/heads/master) - ‘dfbeda’:
ahole_checkout('master')
This will change
.ahole/HEAD to be
ref: refs/heads/master - and I will
have just got the working tree from
.ahole/dfbeda/files. Then I take
eves_diff and apply it to my current working tree. If there were any
conflicts, I resolve them, but in my world, there are no conflicts. I have a
feeling there may be some later. That apple pie is making me feel a little
Finally, I make a new commit, with a new unique ID - say ‘80cc85’, with the merged working tree. But, there’s a trick: here the new commit ‘80cc85’ - has two parents, first - ‘dfbeda’ - the previous commit in my ‘master’, and second ‘0a01a0’ - the last commit in Eve’s master. Now, the next time I look at Eve’s tree, I will be able to see that I’ve got her ‘0a01a0’ commit in my own history, and won’t need to apply it again.
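Finding the common ancestor can be sketched with nothing more than a table of parents. Here the table is an in-memory dictionary standing in for the parent entries in each info.txt; the function names are my own, and this simple walk is only an assumption about how one might do it, which may not pick the best ancestor in more tangled histories.

```python
def ancestors(parents, commit_id):
    """Return commit_id plus everything reachable from it via parent links."""
    seen, todo = set(), [commit_id]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(parents.get(current, []))
    return seen


def common_ancestor(parents, mine, theirs):
    """Walk back from `theirs` to the first commit that is also in my history."""
    my_history = ancestors(parents, mine)
    todo, seen = [theirs], set()
    while todo:
        current = todo.pop(0)  # nearest commits first
        if current in my_history:
            return current
        if current not in seen:
            seen.add(current)
            todo.extend(parents.get(current, []))
    return None


# The commits from the story so far, each mapped to a list of its parents
parents = {
    "6438a4": [],
    "7ef41f": ["6438a4"],
    "5d89f8": ["7ef41f"],
    "dfbeda": ["5d89f8"],   # my 'master'
    "0a01a0": ["7ef41f"],   # Eve's 'master'
}
print(common_ancestor(parents, "dfbeda", "0a01a0"))

# The merge commit records both parents
parents["80cc85"] = ["dfbeda", "0a01a0"]
print("0a01a0" in ancestors(parents, "80cc85"))
```

Once the merge commit lists both parents, Eve’s commit is part of my own history, which is how I can tell later that her changes do not need applying again.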
On the sixth day - saving time and space with objects
I am now very happy with ahole, but Eve clearly doesn’t think we’ve got it
right.
As she’s thinking, she decides to make a couple of illustrations for The Book, so she adds some photos to her working tree:
.
├── .ahole
│   ...
├── images
│   ├── adam_with_apple.jpg
│   └── lion_with_lamb.jpg
├── chapter1_discussion.txt
├── chapter2.txt
├── chapter1.txt
└── contents.txt
As soon as she does this, she realizes what’s wrong with
ahole. The photos
are large files. At the moment, every time we make a commit, we’re copying all
the files into the commit
files directory to make the snapshot. With big
files, this is going to lead to many identical copies and lots of wasted space.
Eve realizes that what we need to do, is to make the commit use references to files, rather than the files themselves. That way, when the commit has files that have not changed, it can just point to the unchanged file, rather than carrying a wasteful copy of the file.
If the commits just store references, we need a way to store the contents of the
files, so they can be referenced. Maybe we could store the files for our
snapshots in a directory, and use some sort of unique filename so that the
commits can reference that filename? For example, maybe we could make a
directory in .ahole called objects,
and use this directory to store the contents of the files for our snapshots.
Then we could store the commits as something like a table, where the entries
would tell us how to get the matching files from the .ahole/objects directory.
We could have some structure for the commits like this:
├── .ahole
│   ├── 5d89f8
│   │   ├── info.txt
│   │   └── file_list
.ahole/5d89f8/file_list would be a list of references to files in the
.ahole/objects directory, along with the filename that the contents has when
reconstructed back into the snapshot. For example, maybe file_list would
have a series of (object reference, filename) pairs like this:
contents_version1 contents.txt
chapter1_version1 chapter1.txt
chapter2_version2 chapter2.txt
chapter3_version1 chapter3.txt
chapter1_discussion_version1 chapter1_discussion.txt
These references in the first column could match filenames in the
.ahole/objects directory:
│   ├── objects
│   │   ├── chapter1_version1
│   │   ├── chapter2_version1
│   │   ├── chapter2_version2
│   │   ├── chapter3_version1
│   │   ├── chapter1_discussion_version1
│   │   └── contents_version1
We could think of the
.ahole/objects directory as a very simple form of
database, where the keys are the filenames, and the file contents are the
values.
We think about this for a while and realize that it’s going to be annoying
trying to find unique names to use as filenames in .ahole/objects, because
there will be many versions of many files. For example, a scheme like
chapter1_version3 and so on is clearly not going to work, because when Eve
and I work independently, at some point we’re both going to have something like
chapter1_version3 in our respective
.ahole/objects directories, but
they will be different, and that will be confusing.
At this stage, Eve reveals that she has some training in computer science. Of course I have no idea what that is, or who did the training, but she’s in too much of a rush to explain that now. She proposes that we make the filenames (database keys) by doing hashes of the file contents. It turns out that hashing algorithms can take a stream of bytes such as the contents of a file, and create a string that is near-enough unique to that stream of bytes. That’s really good, because it means that, if Eve and I have an object with the same filename (hash) that means it almost certainly contains the exact same contents.
Eve recommends the ‘SHA1’ hashing algorithm, and I’m in no position to disagree with her. Now we’ve got a unique string to use as a key for each file. For example, we run the SHA1 algorithm over the current book files and we get these:
If we change the file at all, then the hash changes, and we have a new unique string and therefore we have a new unique filename with which to store the new contents. For example, the original version of chapter 2 was a bit shorter, and had a hash of ‘1cf01a1dfbe135b6132362fa8e17eaefcaf00a7f’.
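To see this for ourselves, here is a minimal sketch using the SHA1 implementation in Python’s standard hashlib module. The helper name sha1_hash and the two sample texts are made up for illustration, so the hashes printed will not match the ones quoted in this story.

```python
import hashlib


def sha1_hash(contents):
    """Return the SHA1 hash of some bytes as a 40-character hex string."""
    return hashlib.sha1(contents).hexdigest()


# Two made-up versions of a chapter
short_chapter = b"In the beginning.\n"
longer_chapter = b"In the beginning. And then some more.\n"

print(sha1_hash(short_chapter))
print(sha1_hash(longer_chapter))
# The same bytes always give the same hash
print(sha1_hash(short_chapter) == sha1_hash(b"In the beginning.\n"))
```

The same contents always hash to the same string, on my computer or on Eve’s, and any change to the contents gives a different string.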
Now we have got a nice way of making the references that will go into
.ahole/5d89f8/file_list. First we store the file versions in our
.ahole/objects directory, using their hash values as filenames:
│   ├── objects
│   │   ├── 9e398c7cf8d56e960aa7769839cc0c38b8e12f11 (chapter1 version 1)
│   │   ├── 1cf01a1dfbe135b6132362fa8e17eaefcaf00a7f (chapter2 version 1)
│   │   ├── 65735b3705284cdf4a66c2e4812ca13cbaa7cd5d (chapter2 version 2)
│   │   ├── 3c2e09cc43568f13444c075c84b047957f7995a5 (chapter1_discussion version 1)
│   │   └── f31bfa1225f9e0eb6741a0ab1122f8cd2cbedc04 (contents version 1)
Next we create
.ahole/5d89f8/file_list with one row per file in our
directory. Each row contains first - the hash value (and therefore filename in
.ahole/objects) which allows me to get the file contents, then the type of
thing this is - here a file - and lastly, the filename as it was in the
snapshot:
9e398c7cf8d56e960aa7769839cc0c38b8e12f11 file chapter1.txt
65735b3705284cdf4a66c2e4812ca13cbaa7cd5d file chapter2.txt
3c2e09cc43568f13444c075c84b047957f7995a5 file chapter1_discussion.txt
f31bfa1225f9e0eb6741a0ab1122f8cd2cbedc04 file contents.txt
Now, what about Eve’s new working tree with the photos in it? The photos are in
the images subdirectory, and we don’t have a way of storing subdirectories
yet. Aha - why not store directories in the object database too? Directories
can just be tree files like
file_list. tree files are lists, one entry
per row, where each row contains the hash reference for the file contents, the
type of thing it is (tree or file), and the filename as it was in the snapshot.
So, for Eve’s new commit, we’d first store the contents of the two photo files
in .ahole/objects:
│   ├── objects
│   │   ├── 82e6792faa893070dcd6fe3e614b6f147be1a0a9 (adam_with_apple.jpg)
│   │   ├── e8b23357995db47e70906d4c7a08114c0c0ba376 (lion_with_lamb.jpg)
│   │   ├── 9e398c7cf8d56e960aa7769839cc0c38b8e12f11 (chapter1 version 1)
etc. Then we make a new tree file called - say - ‘images_listing’ like this:
82e6792faa893070dcd6fe3e614b6f147be1a0a9 file adam_with_apple.jpg
e8b23357995db47e70906d4c7a08114c0c0ba376 file lion_with_lamb.jpg
and we make a hash for that tree file too, and put that into
.ahole/objects:
│   ├── objects
│   │   ├── be242dba385bc0689be16454e959f4b64c87abce (images_listing)
│   │   ├── 82e6792faa893070dcd6fe3e614b6f147be1a0a9 (adam_with_apple.jpg)
│   │   ├── e8b23357995db47e70906d4c7a08114c0c0ba376 (lion_with_lamb.jpg)
│   │   ├── 9e398c7cf8d56e960aa7769839cc0c38b8e12f11 (chapter1 version 1)
etc. Now maybe our whole commit listing can include files and directories for the root directory of our project, something like:
9e398c7cf8d56e960aa7769839cc0c38b8e12f11 file chapter1.txt
65735b3705284cdf4a66c2e4812ca13cbaa7cd5d file chapter2.txt
3c2e09cc43568f13444c075c84b047957f7995a5 file chapter1_discussion.txt
f31bfa1225f9e0eb6741a0ab1122f8cd2cbedc04 file contents.txt
be242dba385bc0689be16454e959f4b64c87abce tree images
Oh - but wait - that’s just a tree listing too, let’s make a hash for that, and
put it into the objects directory:
│   ├── objects
│   │   ├── e52dc9dbe358c549df65307652ff2709322812b3 (root listing)
│   │   ├── be242dba385bc0689be16454e959f4b64c87abce (images_listing)
│   │   ├── 82e6792faa893070dcd6fe3e614b6f147be1a0a9 (adam_with_apple.jpg)
Right - so now our whole commit boils down to our
info.txt file, and the
hash for the root tree (the one starting ‘e52dc’ above). We can get rid of the
files subdirectory in the commit, and add the hash for the root tree
instead - something like:
committer = Eve
message = Adding funny pictures
date = year0-jan-06
root_tree = e52dc9dbe358c549df65307652ff2709322812b3
parent = 0a01a0
Now we can solve the annoying problem of finding an unique commit id for each
commit. We just make a hash for the
info.txt file, and put that into the
.ahole/objects directory too, as a commit file:
│   ├── objects
│   │   ├── 7e0cda8c145b300b519ed28998a31f801b6d626f (latest commit)
│   │   ├── e52dc9dbe358c549df65307652ff2709322812b3 (root listing)
│   │   ├── be242dba385bc0689be16454e959f4b64c87abce (images_listing)
The unique id for the commit is the hash for its contents. In this case the commit id is ‘7e0cda8c145b300b519ed28998a31f801b6d626f’. Don’t forget that the hash is more or less unique to the contents, so this commit will have an id that is unique to the combination of the committer, message, date, root tree hash and commit parent. The root tree hash is unique to the contents of the root tree listing, and the root tree listing contains file hashes, which are in turn unique to the file contents, so the root tree hash will be unique to the file contents of the commit. Thus, the commit id is unique to all the things that go into the commit, including the contents. It’s clever isn’t it?
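The chain from file contents to commit id can be sketched end to end. Everything named here (store_object, store_tree, store_commit, the temporary directory and the file contents) is an assumption made for illustration, and the sketch hashes a whole working tree directly rather than going through any staging step.

```python
import hashlib
import tempfile
from pathlib import Path


def store_object(objects, contents):
    """Store bytes in the objects directory under their SHA1 hash."""
    key = hashlib.sha1(contents).hexdigest()
    (Path(objects) / key).write_bytes(contents)
    return key


def store_tree(objects, directory):
    """Store a directory as a tree listing, recursing into subdirectories."""
    rows = []
    for path in sorted(Path(directory).iterdir()):
        if path.name == ".ahole":
            continue  # the repository is not part of the snapshot
        if path.is_dir():
            rows.append(store_tree(objects, path) + " tree " + path.name)
        else:
            rows.append(store_object(objects, path.read_bytes()) + " file " + path.name)
    return store_object(objects, ("\n".join(rows) + "\n").encode())


def store_commit(objects, root_tree, committer, message, date, parent):
    """Store the commit information; its hash becomes the commit id."""
    info = ("committer = " + committer + "\nmessage = " + message +
            "\ndate = " + date + "\nroot_tree = " + root_tree +
            "\nparent = " + parent + "\n")
    return store_object(objects, info.encode())


# A throwaway working tree with made-up contents
book = Path(tempfile.mkdtemp())
objects = book / ".ahole" / "objects"
objects.mkdir(parents=True)
(book / "chapter1.txt").write_text("Let there be light.\n")
(book / "images").mkdir()
(book / "images" / "lion_with_lamb.jpg").write_bytes(b"not really a jpeg")

first_tree = store_tree(objects, book)
first_commit = store_commit(objects, first_tree, "Eve",
                            "Adding funny pictures", "year0-jan-06", "0a01a0")

# Change one file: its hash changes, so the root tree hash changes,
# so the commit id changes, even with identical committer, message and date
(book / "chapter1.txt").write_text("Let there be light. And more.\n")
second_tree = store_tree(objects, book)
second_commit = store_commit(objects, second_tree, "Eve",
                             "Adding funny pictures", "year0-jan-06", "0a01a0")
print(first_commit != second_commit)
```

Notice that the second snapshot adds new objects only for the changed chapter, the root tree and the commit; the photo and the images tree are already in the objects directory under the same hashes.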
We can now have three types of files in the
.ahole/objects directory -
files, trees, and commits.
OK - so things are now a little more complicated than our previous setup with
file copies, but lots of things have just got much easier. For example, we can
now get rid of the
staging_area directory. The staging area can just be a
single file containing the root tree listing of the snapshot. Let’s call that
.ahole/index. Now Eve has done her new commit, that file can just be
the root directory listing of the previous commit (the commit we have just
made):
9e398c7cf8d56e960aa7769839cc0c38b8e12f11 file chapter1.txt
65735b3705284cdf4a66c2e4812ca13cbaa7cd5d file chapter2.txt
3c2e09cc43568f13444c075c84b047957f7995a5 file chapter1_discussion.txt
f31bfa1225f9e0eb6741a0ab1122f8cd2cbedc04 file contents.txt
be242dba385bc0689be16454e959f4b64c87abce tree images
When Eve makes an edit to chapter1.txt, instead of copying the file to the
staging_area directory, she makes a hash for the new contents, she stores the
new chapter1.txt contents in the .ahole/objects directory using the hash as a
filename, and then she edits the .ahole/index file to point to her new
chapter 1 contents instead of the old. She might automate this with a small
command like
    def ahole_stage(fname):
        # Get the hash for the file contents
        with open(fname) as f:
            file_contents = f.read()
        file_hash = sha1_hash(file_contents)
        # (assuming that the new file is going in the root directory)
        new_root_entry = file_hash + ' file ' + fname
        with open('.ahole/index') as f:
            entries = f.read().splitlines()
        if new_root_entry in entries:
            # This exact file contents and filename already present
            return
        # Make an entry for these file contents in the objects database
        with open('.ahole/objects/' + file_hash, 'w') as f:
            f.write(file_contents)
        # Drop any old entry for this filename, then add the new entry
        entries = [e for e in entries if not e.endswith(' file ' + fname)]
        entries.append(new_root_entry)
        # Write index listing with new entry
        with open('.ahole/index', 'w') as f:
            f.write('\n'.join(entries) + '\n')
Making a new commit involves taking the contents of
.ahole/index and using
it to make a new commit file in
.ahole/objects. Using the structure of our
ahole_commit routine, that might look like:
    def ahole_commit(committer, message):
        # *** this stuff is the same as before ***
        # Get previous (parent) commit id from .ahole/HEAD
        with open('.ahole/HEAD') as f:
            head_contents = f.read().strip()
        # Check if this is a reference, de-reference if so
        # Also, get file into which to write the new commit id
        if head_contents.startswith('ref: '):
            head_ref = head_contents.replace('ref: ', '')
            head_ref_file = '.ahole/' + head_ref
            with open(head_ref_file) as f:
                head_id = f.read().strip()
        else:
            head_ref_file = '.ahole/HEAD'
            head_id = head_contents
        # *** the stuff below is different ***
        # Make root tree entry in objects database from .ahole/index
        with open('.ahole/index') as f:
            index_contents = f.read()
        index_hash = sha1_hash(index_contents)
        with open('.ahole/objects/' + index_hash, 'w') as f:
            f.write(index_contents)
        # Make commit information with parent set to HEAD
        info_str = 'committer = ' + committer + '\n'
        info_str += 'message = ' + message + '\n'
        # date.today() is a date object, so convert it to a string
        info_str += 'date = ' + str(date.today()) + '\n'
        info_str += 'root_tree = ' + index_hash + '\n'
        info_str += 'parent = ' + head_id + '\n'
        # Write commit file into objects database, with hash
        commit_hash = sha1_hash(info_str)
        with open('.ahole/objects/' + commit_hash, 'w') as f:
            f.write(info_str)
        # Set the current commit file to contain new id
        with open(head_ref_file, 'w') as f:
            f.write(commit_hash)
How about doing a merge? Remember that, in the bad old days, we had to compare lots of files between the branches and the common ancestor? No more. Now that we are using hash references to the files, all we need to do is look at the tree listings. If two tree listings have the same entry (filename and hash), the file is identical between the two trees, and we don’t have to load the contents to check. That makes it very fast to do comparisons between trees that haven’t changed much.
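Here is a rough sketch of that comparison. The function names `parse_listing` and `changed_names` are my own inventions for this example, and I am assuming each line of a tree listing has the form `hash type name`, as in the root listing shown earlier. This only finds which names differ; it does not attempt the merge itself.

```python
def parse_listing(listing):
    # Map each name in a tree listing to its (hash, type) pair
    entries = {}
    for line in listing.splitlines():
        if not line.strip():
            continue
        obj_hash, obj_type, name = line.split(' ', 2)
        entries[name] = (obj_hash, obj_type)
    return entries


def changed_names(listing_a, listing_b):
    # Names whose entry differs, or that appear in only one listing.
    # No file contents are read; we only compare the hashes.
    a = parse_listing(listing_a)
    b = parse_listing(listing_b)
    return sorted(name for name in set(a) | set(b)
                  if a.get(name) != b.get(name))


ours = (
    '9e398c7cf8d56e960aa7769839cc0c38b8e12f11 file chapter1.txt\n'
    '65735b3705284cdf4a66c2e4812ca13cbaa7cd5d file chapter2.txt\n'
    'be242dba385bc0689be16454e959f4b64c87abce tree images\n'
)
# Same listing, with a made-up hash standing in for an edited chapter 1
theirs = ours.replace('9e398c7cf8d56e960aa7769839cc0c38b8e12f11', '0' * 40)

print(changed_names(ours, theirs))  # ['chapter1.txt']
```

Note that the `images` tree has the same hash in both listings, so we know that nothing under `images` has changed without ever opening that directory.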
Eve was right of course. Now, if we make a new commit, when one file is changed, all we store is the contents of the file that has changed and a new tree listing with the updated hash for the changed file. That makes the storage for lots and lots of similar trees very efficient.
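To check the arithmetic, here is a small sketch that uses a Python dictionary to stand in for the .ahole/objects directory. The names `objects`, `store` and `store_snapshot` are assumptions made for this example, and the file contents are invented.

```python
import hashlib

objects = {}  # stands in for the .ahole/objects directory


def store(contents):
    # Store contents under their hash; identical contents share one entry
    obj_hash = hashlib.sha1(contents.encode('utf-8')).hexdigest()
    objects[obj_hash] = contents
    return obj_hash


def store_snapshot(files):
    # Store each file, then a root tree listing that points at the files
    lines = [store(text) + ' file ' + name
             for name, text in sorted(files.items())]
    return store('\n'.join(lines) + '\n')


first = store_snapshot({'chapter1.txt': 'In the beginning',
                        'contents.txt': 'Chapter 1'})
count_after_first = len(objects)   # 2 files + 1 tree listing

second = store_snapshot({'chapter1.txt': 'In the beginning, again',
                         'contents.txt': 'Chapter 1'})
count_after_second = len(objects)  # adds 1 changed file + 1 tree listing

print(count_after_first, count_after_second)  # 3 5
```

The second snapshot adds only two objects, because the unchanged contents.txt hashes to an entry that is already in the store.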
Someone ought to write this up and give it to the world. Wait, that’s just us.
On the seventh day - there was git
The seventh day is for resting. You are all done now, and the hard stuff is over. In a state of deep inner peace, you can think about all that you’ve discovered in ahole:
A commit refers to a snapshot of the complete set of files for your project
The staging area (index) defines what will change between your upcoming commit and the previous commit
A branch is just a pointer to a commit, that moves when you do another commit.
Version control is very easy to understand
You remind yourself that life is very good, because you don’t have to use a version control system called ahole, you can use a very similar system called git.
If you use git, you’ll notice that you have lots of ahole friends. You’ll
see git creates a .git subdirectory that contains the repository. You’ll
see a .git/objects directory containing filenames with SHA1 hashes.
You’ll see that commits have SHA1 hashes. You’ll recognize the branch
reference file .git/refs/heads/master. There is a .git/index file, and it is
the staging area. .git/index is a little more complicated than .ahole/index
because it’s adapted to helping with difficult merges, but it’s the same idea.
You now live in the garden of Eden of version control. Remember to stay away from that apple tree.
ahole might seem a bit rude to you, but I was born in the UK, and, where I come from, ‘ahole’ is roughly as rude as ‘git’.
In case you are interested, for the commit and checkout code to actually run, you would need some Python definitions. First some standard Python imports:
    from datetime import date
    from os import mkdir, listdir
Then we need some simple custom commands for deleting our working tree, and for copying files into the working tree:
    from os import remove
    from os.path import isfile, isdir
    from shutil import copyfile, copytree, rmtree

    def delete_tree(path):
        # Delete everything in path unless it's an '.ahole' directory
        for name in listdir(path):
            full_name = path + '/' + name
            # Test the full path, not the bare name
            if isfile(full_name):
                remove(full_name)
            elif isdir(full_name):
                if name != '.ahole':
                    rmtree(full_name)

    def copy_tree(src_path, dst_path):
        # Copy everything in src_path to dst_path
        for name in listdir(src_path):
            src_name = src_path + '/' + name
            dst_name = dst_path + '/' + name
            if isfile(src_name):
                copyfile(src_name, dst_name)
            elif isdir(src_name):
                copytree(src_name, dst_name)
We also need some definition of sha1_hash.
For that, you need to add:
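    import hashlib

    def sha1_hash(contents):
        # hashlib works on bytes, so encode text first
        if isinstance(contents, str):
            contents = contents.encode('utf-8')
        return hashlib.sha1(contents).hexdigest()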