Curious git

In A curious tale, you built your own content management system. Now you have done that, you know how git works – because it works in exactly the same way as your own system. You will recognize hashes for files, directories and commits, commits linked by reference to their parents, the staging area, the objects directory, and bookmarks (branches).

Armed with this deep understanding, we retrace our steps to do the same content management tasks in git.

Basic configuration

We need to tell git our name and email address before we start.

Git will use this information to fill in the author information in each commit message, so we don’t have to type it out every time.

$ git config --global user.name "Matthew Brett"
$ git config --global user.email "matthew.brett@gmail.com"

The --global flag tells git to store this information in its default configuration file for your user account. On Unix (e.g. macOS and Linux) this file is .gitconfig in your home directory. Without the --global flag, git only applies the configuration to the particular repository you are working in.

Every time we make a commit, we need to type a commit message. Git will open our text editor for us to type the message, but first it needs to know what text editor we prefer. Set your own preferred text editor here:

$ # gedit is a reasonable choice for Linux
$ # "vi" is the default.
$ git config --global core.editor gedit

Next we set the name of the default branch. We will explain branches later on, but, for now, just apply this configuration to be compatible with newer versions of Git:

$ # Set the default branch name to "main"
$ git config --global init.defaultBranch main

We also turn on the use of color, which is very helpful in making the output of git easier to read:

$ git config --global color.ui "auto"

Getting help

$ git help
usage: git [--version] [--help] [-C <path>] [-c <name>=<value>]
           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p | --paginate | -P | --no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           [--super-prefix=<path>] [--config-env=<name>=<envvar>]
           <command> [<args>]

These are common Git commands used in various situations:

start a working area (see also: git help tutorial)
   clone             Clone a repository into a new directory
   init              Create an empty Git repository or reinitialize an existing one

work on the current change (see also: git help everyday)
   add               Add file contents to the index
   mv                Move or rename a file, a directory, or a symlink
   restore           Restore working tree files
   rm                Remove files from the working tree and from the index
   sparse-checkout   Initialize and modify the sparse-checkout

examine the history and state (see also: git help revisions)
   bisect            Use binary search to find the commit that introduced a bug
   diff              Show changes between commits, commit and working tree, etc
   grep              Print lines matching a pattern
   log               Show commit logs
   show              Show various types of objects
   status            Show the working tree status

grow, mark and tweak your common history
   branch            List, create, or delete branches
   commit            Record changes to the repository
   merge             Join two or more development histories together
   rebase            Reapply commits on top of another base tip
   reset             Reset current HEAD to the specified state
   switch            Switch branches
   tag               Create, list, delete or verify a tag object signed with GPG

collaborate (see also: git help workflows)
   fetch             Download objects and refs from another repository
   pull              Fetch from and integrate with another repository or a local branch
   push              Update remote refs along with associated objects

'git help -a' and 'git help -g' list available subcommands and some
concept guides. See 'git help <command>' or 'git help <concept>'
to read about a specific subcommand or concept.
See 'git help git' for an overview of the system.

Try git help add for an example.

Note

The git help pages are famously hard to read if you don’t know how git works. One purpose of this tutorial is to explain git in such a way that it will be easier to understand the help pages.

Initializing the repository directory

We first set this nobel_prize directory to be version controlled with git. We start off the working tree with the original files for the paper:

Note

I highly recommend you type along. Why not download nobel_prize.zip and unzip the files to make the same nobel_prize directory as I have here?

nobel_prize
├── clever_analysis.py [618B]
├── expensive_data.csv [244K]
└── fancy_figure.png [183K]

To get started with git, create the git repository directory with git init:

$ cd nobel_prize
$ git init
Initialized empty Git repository in /Volumes/zorg/mb312/dev_trees/curious-git/working/nobel_prize/.git/

What happened when we did git init? Just what we were expecting: we have a new repository directory in nobel_prize called .git.

.git
├── refs
│   ├── tags
│   └── heads
├── objects
│   ├── pack
│   └── info
├── info
│   └── exclude [240B]
├── hooks
│   (13 files)
├── HEAD [21B]
├── config [137B]
└── description [73B]

The objects directory looks familiar. It has exactly the same purpose as it did for your SAP system. At the moment it contains a couple of empty directories, because we have not added any objects yet.

Updating terms for git

Working directory

The directory containing the files you are working on. In our case this is nobel_prize. It contains the repository directory, named .git.

Repository directory

Directory containing all previous commits (snapshots) and the private files git uses for working with commits. The directory is named .git by default, and almost always in practice.

git add – put stuff into the staging area

In the next few sections, we will do our first commit (snapshot).

First we will put the files for the commit into the staging area.

The command to put files into the staging area is git add.

To start, we show ourselves that the staging area is empty. We haven’t yet discussed the git implementation of the staging area, but this command shows us which files are in the staging area.

$ git ls-files --stage

As expected, there are no files in the staging area yet.

Note

git ls-files is a specialized command that you will not often need in your daily git life. I’m using it here to show you how git works.

Now we do our add:

$ git add clever_analysis.py

Sure enough:

$ git ls-files --stage
100755 6560135a5943c0509608fee6d900b775e3041197 0	clever_analysis.py

The git staging area

It is time to think about what the staging area is, in git. In your SAP system, the staging area was a directory. You also started off by using directories to store commits (snapshots). Later you found you could do without the commit directories, because you could store the files in repo/objects and the directory structure in directory_listing.txt text files.

In git, the staging area is a single file called .git/index. This file contains a directory listing that is the equivalent of the staging directory in SAP. When we add a file to the staging area, git backs up the file with its hash to .git/objects, and then changes the directory listing inside .git/index to point to this backup copy.
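The bookkeeping in this paragraph can be sketched as a toy model in Python. This is only an illustration under loud assumptions: the real .git/index is a binary file, the hash here is a plain SHA-1 of the contents (simpler than what git actually computes), and the names objects, index and toy_add are made up for this sketch.

```python
import hashlib

# Toy stand-ins: real git keeps objects as files under .git/objects and
# keeps the directory listing in the binary file .git/index.
objects = {}   # hash -> stored file contents
index = {}     # filename -> hash of the staged contents


def toy_add(filename, contents):
    """Stage `contents` under `filename` (hypothetical helper)."""
    # Simplification: plain SHA-1 of the contents; git's real hash differs.
    file_hash = hashlib.sha1(contents).hexdigest()
    objects[file_hash] = contents   # back up the file under its hash
    index[filename] = file_hash     # point the listing at the backup
    return file_hash


first = toy_add("clever_analysis.py", b"FUDGE = 42\n")
second = toy_add("clever_analysis.py", b"FUDGE = 106\n")
```

In this model, staging a file a second time leaves the older backup in objects and only changes the entry in index.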

If all that is true, then we now expect to see a) a new file .git/index containing the directory listing and b) a new file in the .git/objects directory corresponding to the hash for the clever_analysis.py file. We saw from the output of git ls-files --stage above that the hash for clever_analysis.py is 6560135a5943c0509608fee6d900b775e3041197. So – do we see these files?

First – there is now a new file .git/index that was not present in our first listing of the .git directory above:

$ ls .git/index
.git/index

Second, there is a new directory and file in .git/objects:

objects
├── pack
├── info
└── 65
    └── 60135a5943c0509608fee6d900b775e3041197 [335B]

The directory and filename in .git/objects come from the hash of clever_analysis.py. The first two digits of the hash form the directory name and the rest of the digits are the filename [3]. So, the file .git/objects/65/60135a5943c0509608fee6d900b775e3041197 is the copy of clever_analysis.py that we added to the staging area.
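The rule for turning a hash into a path is simple enough to write down in a few lines of Python. The function name object_path is invented for this sketch, and it only covers objects that git stores as individual files, as in the listing above.

```python
from pathlib import PurePosixPath


def object_path(object_hash):
    """Return the path of a stored object, relative to the working directory."""
    # The first two hex digits name the directory; the rest name the file.
    return PurePosixPath(".git", "objects", object_hash[:2], object_hash[2:])


path = object_path("6560135a5943c0509608fee6d900b775e3041197")
print(path)
```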

For extra points, what do you think would happen if we deleted the .git/index file (answer 1)?

Git objects

Git objects are nearly as simple as the objects you were writing in your SAP. The hash is not the hash of the raw file, but the hash of the raw file prepended with a short housekeeping string. See Reading git objects for details.
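As a rough illustration, here is a Python sketch of that calculation. It assumes the repository uses SHA-1 hashes (the 40-digit hashes in this tutorial suggest it does) and that the housekeeping string for a file is the word blob, a space, the size of the file in bytes, and a null byte. The function name blob_hash is made up for this example.

```python
import hashlib


def blob_hash(contents):
    """Return the git object hash for a file containing these bytes."""
    # Housekeeping string: object type, a space, the size, a null byte.
    header = b"blob " + str(len(contents)).encode("ascii") + b"\0"
    return hashlib.sha1(header + contents).hexdigest()


print(blob_hash(b"hello world\n"))
```

If you have git installed, you can compare the result with the output of git hash-object on a file with the same contents.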

We can see the contents of objects with the command git cat-file -p. For example, here are the contents of the backup we just made of clever_analysis.py:

$ git cat-file -p 6560135a5943c0509608fee6d900b775e3041197
# The brain analysis script
import numpy as np

import matplotlib.pyplot as plt

FUDGE = 42

# Load data from the brain
data = np.loadtxt('expensive_data.csv', delimiter=',')

# First column is something from world, second is something from brain
from_world, from_brain = data.T

# Process data
from_brain_processed = np.log(from_brain) * FUDGE * np.e ** np.pi

# Make plot
plt.plot(from_world, from_brain_processed, 'r:')
plt.plot(from_world, from_brain_processed, 'bx')
plt.xlabel('Data from the outside world')
plt.ylabel('Data from inside the brain')
plt.title('Important finding')
plt.savefig('fancy_figure.png')

Note

I will use git cat-file -p to display the content of nearly raw git objects, to show the simplicity of git’s internal model, but cat-file is a specialized command that you won’t use much in daily work.

Just as we expected, it is the current contents of clever_analysis.py.

The 6560135a5943c0509608fee6d900b775e3041197 object is a hashed, stored raw file. Because the object is a stored file rather than a stored directory listing text file or commit message text file, git calls this type of object a blob – for Binary Large Object. You can get the object type from the object hash with the -t flag to git cat-file:

$ git cat-file -t 6560135a5943c0509608fee6d900b775e3041197
blob

Hash values can usually be abbreviated to seven characters

We only need to give git enough hash digits for git to identify the object uniquely. 7 digits is nearly always enough, as in:

$ git cat-file -p 6560135
# The brain analysis script
import numpy as np

import matplotlib.pyplot as plt

FUDGE = 42

# Load data from the brain
data = np.loadtxt('expensive_data.csv', delimiter=',')

# First column is something from world, second is something from brain
from_world, from_brain = data.T

# Process data
from_brain_processed = np.log(from_brain) * FUDGE * np.e ** np.pi

# Make plot
plt.plot(from_world, from_brain_processed, 'r:')
plt.plot(from_world, from_brain_processed, 'bx')
plt.xlabel('Data from the outside world')
plt.ylabel('Data from inside the brain')
plt.title('Important finding')
plt.savefig('fancy_figure.png')
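The idea behind abbreviated hashes is that a prefix is good enough as long as exactly one object starts with it. A minimal Python sketch of that lookup follows; resolve_prefix is a hypothetical helper, and the list of known hashes is just an example, not a description of how git searches its object store.

```python
def resolve_prefix(prefix, known_hashes):
    """Return the single hash starting with `prefix`, else raise ValueError."""
    matches = [h for h in known_hashes if h.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"{prefix!r} matches {len(matches)} objects")
    return matches[0]


# Example hashes; any set of full hashes would do.
known = [
    "6560135a5943c0509608fee6d900b775e3041197",
    "7b37886351b3df2463fd29c87bc5184b637f0926",
    "1ed447c15c125991b8a292bdb433aaf19998a3e9",
]
full = resolve_prefix("6560135", known)
```

A prefix that matches no object, or more than one, is rejected instead of being guessed at.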

git status – showing the status of files in the working tree

The working tree is the contents of the nobel_prize directory, excluding the .git repository directory.

git status tells us about the relationship of the files in the working tree to the repository and staging area.

We have done a git add on clever_analysis.py, and that added the file to the staging area. We can see that this happened with git status:

$ git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   clever_analysis.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	expensive_data.csv
	fancy_figure.png

Sure enough, the output tells us that new file: clever_analysis.py is in the changes to be committed. It also tells us that the other two files in the working directory are untracked.

An untracked file is a file with a filename that is not listed in the staging area directory listing. Until you run git add on an untracked file, git will ignore the file and assume you don’t want to keep track of it.

Staging the other files with git add

We do want to keep track of the other files, so we stage them:

$ git add fancy_figure.png
$ git add expensive_data.csv
$ git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   clever_analysis.py
	new file:   expensive_data.csv
	new file:   fancy_figure.png

We have staged all three of our files. We have three objects in .git/objects:

objects
├── pack
├── info
├── 7b
│   └── 37886351b3df2463fd29c87bc5184b637f0926 [119K]
├── 65
│   └── 60135a5943c0509608fee6d900b775e3041197 [335B]
└── 1e
    └── d447c15c125991b8a292bdb433aaf19998a3e9 [179K]

git commit – making the snapshot

[desktop]$ git commit -m "First backup of my amazing idea"
[main (root-commit) 75206bc] First backup of my amazing idea
 3 files changed, 5023 insertions(+)
 create mode 100755 clever_analysis.py
 create mode 100644 expensive_data.csv
 create mode 100644 fancy_figure.png

Note

In the line above, I used the -m flag to specify a message at the command line. If I had not done that, git would open the editor I specified in the git config step above and ask me to enter a message. I’m using the -m flag so the commit command runs without interaction in this tutorial, but in ordinary use, I virtually never use -m, and I suggest you don’t either. Using the editor for the commit message allows you to write a more complete commit message, and gives feedback about the git status of the commit to remind you what you are about to do.

Following the logic of your SAP system, we expect that the action of making the commit will generate two new files in .git/objects, one for the directory listing text file, and another for the commit message:

objects
├── pack
├── info
├── ff
│   └── c871b48a6b9df8dc4a13e8e5da99ccf2ce458d [150B]
├── 7b
│   └── 37886351b3df2463fd29c87bc5184b637f0926 [119K]
├── 75
│   └── 206bcb33ff9ad4f15f89b52cdf95bf666d67a8 [148B]
├── 65
│   └── 60135a5943c0509608fee6d900b775e3041197 [335B]
└── 1e
    └── d447c15c125991b8a292bdb433aaf19998a3e9 [179K]

Here are the contents of the commit message text file for the new commit. Git calls this a commit object:

$ git cat-file -p 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
tree ffc871b48a6b9df8dc4a13e8e5da99ccf2ce458d
author Matthew Brett <matthew.brett@gmail.com> 1333287013 +0100
committer Matthew Brett <matthew.brett@gmail.com> 1333287013 +0100

First backup of my amazing idea
$ # What type of git object is this?
$ git cat-file -t 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
commit

As in SAP, the commit message file contains the hash of the directory tree file (tree), the author, the date and time, and the note. It would also contain the hash of the parent commit (parent), but this first commit has no parent.
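To make the layout concrete, here is a small Python sketch that picks apart the commit text printed by git cat-file -p above. It assumes the format seen in that output: header lines of the form "key value", a blank line, then the message, with the author line ending in a count of seconds since 1970 (UTC) and a time zone offset.

```python
from datetime import datetime, timedelta, timezone

commit_text = """\
tree ffc871b48a6b9df8dc4a13e8e5da99ccf2ce458d
author Matthew Brett <matthew.brett@gmail.com> 1333287013 +0100
committer Matthew Brett <matthew.brett@gmail.com> 1333287013 +0100

First backup of my amazing idea
"""

# Header lines come first; a blank line separates them from the message.
header, message = commit_text.split("\n\n", 1)
fields = {}
for line in header.splitlines():
    key, value = line.split(" ", 1)
    fields.setdefault(key, []).append(value)

# The author line ends with the timestamp and the offset from UTC.
seconds, offset = fields["author"][0].rsplit(" ", 2)[1:]
sign = -1 if offset.startswith("-") else 1
tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5])))
when = datetime.fromtimestamp(int(seconds), tz)

print(fields.get("parent", []))  # no parent lines in a root commit
print(when.isoformat())
```

The timestamp decodes to 14:30:13 on 1 April 2012 in the +0100 zone, which is the date that git log reports for this commit.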

Here are the contents of the directory listing text file for the new commit. Git calls this a tree object.

$ git cat-file -p ffc871b48a6b9df8dc4a13e8e5da99ccf2ce458d
100755 blob 6560135a5943c0509608fee6d900b775e3041197	clever_analysis.py
100644 blob 7b37886351b3df2463fd29c87bc5184b637f0926	expensive_data.csv
100644 blob 1ed447c15c125991b8a292bdb433aaf19998a3e9	fancy_figure.png
$ git cat-file -t ffc871b48a6b9df8dc4a13e8e5da99ccf2ce458d
tree

Each line in the directory listing gives the file permissions, the type of the entry in the directory (where “tree” means a sub-directory, and “blob” means a file), the file hash, and the file name (see Types of git objects).
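The four fields in each line are easy to separate in Python. This sketch parses the text that git cat-file -p printed above, not the form in which git stores the tree on disk, and it assumes the layout seen in that output: spaces between the permissions, type and hash, and a tab before the file name.

```python
listing = (
    "100755 blob 6560135a5943c0509608fee6d900b775e3041197\tclever_analysis.py\n"
    "100644 blob 7b37886351b3df2463fd29c87bc5184b637f0926\texpensive_data.csv\n"
    "100644 blob 1ed447c15c125991b8a292bdb433aaf19998a3e9\tfancy_figure.png\n"
)

entries = []
for line in listing.splitlines():
    # Permissions, type and hash are space-separated; a tab precedes the name.
    details, name = line.split("\t", 1)
    mode, object_type, object_hash = details.split(" ")
    entries.append((mode, object_type, object_hash, name))

# 100755 marks an executable file, 100644 an ordinary one.
executable = [name for mode, _, _, name in entries if mode == "100755"]
print(executable)
```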

git log – what are the commits so far?

$ git log
commit 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
Author: Matthew Brett <matthew.brett@gmail.com>
Date:   Sun Apr 1 14:30:13 2012 +0100

    First backup of my amazing idea

Notice that git log identifies each commit with its hash. This is the hash of the contents of the commit message file (the commit object). As we saw above, the hash for our commit was 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8.

We can also ask to see the parents of each commit in the log:

$ git log --parents
commit 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
Author: Matthew Brett <matthew.brett@gmail.com>
Date:   Sun Apr 1 14:30:13 2012 +0100

    First backup of my amazing idea

Why is the output of git log and git log --parents the same in this case? (answer 2).

git branch – which branch are we on?

Branches are bookmarks. They associate a name (like “my_bookmark” or “main”) with a commit (such as 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8).

Because of the init.defaultBranch setting we applied above, our default branch (bookmark) is called main. Git creates it automatically when we do our first commit.

$ git branch
* main

Asking for more verbose detail shows us that the branch is pointing to a particular commit (where the commit is given by a hash):

$ git branch -v
* main 75206bc First backup of my amazing idea

In this case git abbreviated the 40 character hash to the first 7 digits, because these are enough to uniquely identify the commit.

A branch is nothing but a name that points to a commit. In fact, git stores branches as we did in SAP, as tiny text files, where the filename is the name of the branch, and the contents is the hash of the commit that it points to:

$ ls .git/refs/heads
main
$ cat .git/refs/heads/main
75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
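Reading a branch is therefore just reading a small text file. The Python sketch below builds a stand-in for .git/refs/heads in a temporary directory, so that it runs without a real repository. The helper name read_branch is invented, and the sketch only covers the one-file-per-branch layout shown here.

```python
import tempfile
from pathlib import Path


def read_branch(repo_dir, branch):
    """Return the commit hash stored in the text file for this branch."""
    return (Path(repo_dir) / "refs" / "heads" / branch).read_text().strip()


# Build a stand-in for the .git directory in a temporary location.
with tempfile.TemporaryDirectory() as tmp:
    heads = Path(tmp) / "refs" / "heads"
    heads.mkdir(parents=True)
    (heads / "main").write_text("75206bcb33ff9ad4f15f89b52cdf95bf666d67a8\n")
    commit = read_branch(tmp, "main")

print(commit)
```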

We will soon see that, if we are working on a branch, and we do a commit, then git will update the branch to point to the new commit.

A second commit

In our second commit, we will add the first draft of the Nobel prize paper. As before, you can download this from nobel_prize.md. If you are typing along, download nobel_prize.md to the nobel_prize directory.

The staging area does not have an entry for nobel_prize.md, so git status identifies this file as untracked:

$ git status
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	nobel_prize.md

nothing added to commit but untracked files present (use "git add" to track)

We add the file to the staging area with git add:

$ git add nobel_prize.md

Now git status records this file being in the staging area, by listing it under “changes to be committed”:

$ git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file:   nobel_prize.md

Finally we move the changes from the staging area into a commit with git commit:

[desktop]$ git commit -m "Add first draft of paper"
[main 7919d37] Add first draft of paper
 1 file changed, 29 insertions(+)
 create mode 100644 nobel_prize.md

Git shows us the first 7 digits of the new commit hash in the output from git commit – these are 7919d37.

Notice that the position of the current main branch is now this last commit:

$ git branch -v
* main 7919d37 Add first draft of paper
$ cat .git/refs/heads/main
7919d37dda9044f00cf2dc0677eed18156f75404

We use git log to look at our short history.

$ git log
commit 7919d37dda9044f00cf2dc0677eed18156f75404
Author: Matthew Brett <matthew.brett@gmail.com>
Date:   Mon Apr 2 18:03:00 2012 +0100

    Add first draft of paper

commit 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
Author: Matthew Brett <matthew.brett@gmail.com>
Date:   Sun Apr 1 14:30:13 2012 +0100

    First backup of my amazing idea

We add the --parents flag to show that the second commit points back to the first via its hash. Git lists the parent hash after the commit hash:

$ git log --parents
commit 7919d37dda9044f00cf2dc0677eed18156f75404 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
Author: Matthew Brett <matthew.brett@gmail.com>
Date:   Mon Apr 2 18:03:00 2012 +0100

    Add first draft of paper

commit 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
Author: Matthew Brett <matthew.brett@gmail.com>
Date:   Sun Apr 1 14:30:13 2012 +0100

    First backup of my amazing idea

git diff – what has changed?

Our next commit will have edits to the clever_analysis.py script. We will also refresh the figure with the result of running the script.

I open the clever_analysis.py file in my text editor and adjust the fudge factor, add a new fudge factor, and apply the new factor to the data.

Now I’ve done these edits, I can ask git diff to show me how the files in my working tree differ from the files in the staging area.

Remember, the files the staging area knows about so far are the files as of the last commit.

$ git diff
diff --git a/clever_analysis.py b/clever_analysis.py
index 6560135..99cd07b 100755
--- a/clever_analysis.py
+++ b/clever_analysis.py
@@ -3,7 +3,8 @@ import numpy as np
 
 import matplotlib.pyplot as plt
 
-FUDGE = 42
+FUDGE = 106
+MORE_FUDGE = 2.0
 
 # Load data from the brain
 data = np.loadtxt('expensive_data.csv', delimiter=',')
@@ -13,6 +14,8 @@ from_world, from_brain = data.T
 
 # Process data
 from_brain_processed = np.log(from_brain) * FUDGE * np.e ** np.pi
+# Apply the new factor
+from_brain_processed = from_brain_processed / MORE_FUDGE
 
 # Make plot
 plt.plot(from_world, from_brain_processed, 'r:')

A - at the beginning of the git diff output means I have removed this line. A + at the beginning means I have added this line. As you see I have edited one line in this file, and added three more.
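You can see the same - and + convention with the difflib module from the Python standard library. This is not the code that git uses, only a sketch that produces output in a similar format for a few lines taken from the script.

```python
import difflib

old = ["import matplotlib.pyplot as plt\n", "\n", "FUDGE = 42\n", "\n"]
new = ["import matplotlib.pyplot as plt\n", "\n", "FUDGE = 106\n",
       "MORE_FUDGE = 2.0\n", "\n"]

diff = list(difflib.unified_diff(
    old, new, fromfile="a/clever_analysis.py", tofile="b/clever_analysis.py"))
print("".join(diff), end="")

# Lines starting with a single - or + are removals and additions;
# the --- and +++ lines only name the two files being compared.
removed = [line for line in diff
           if line.startswith("-") and not line.startswith("---")]
added = [line for line in diff
         if line.startswith("+") and not line.startswith("+++")]
```

An edited line shows up as one removal followed by one addition, which is why the changed FUDGE line appears twice in the diff.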

Open your text editor and edit clever_analysis.py. See if you can replicate my changes by editing the file, and checking with git diff.

Now check the status of clever_analysis.py with:

$ git status
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   clever_analysis.py

no changes added to commit (use "git add" and/or "git commit -a")

You need to git add a file to put it into the staging area

Remember that git only commits stuff that you have added to the staging area.

git status tells us that clever_analysis.py has been “modified”, and that these changes are “not staged for commit”.

There is a version of clever_analysis.py in the staging area, but it is the version of the file as of the last commit, and so that version is different from the version we have in the working tree.

If we try to do a commit, git will tell us there is nothing to commit, because there is nothing new in the staging area:

$ git commit
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   clever_analysis.py

no changes added to commit (use "git add" and/or "git commit -a")

To stage this version of clever_analysis.py we use git add:

$ git add clever_analysis.py

Git status now shows these changes as “Changes to be committed”.

$ git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   clever_analysis.py

We can update the figure by running the clever_analysis.py script. The script analyzes the data and writes the figure to the current directory. If you have Python installed, with the numpy and matplotlib packages, you can run the analysis yourself with:

$ python clever_analysis.py

If not, you can download a version of the figure I generated earlier. After you have generated or downloaded the figure:

$ git add fancy_figure.png

Do a final check with git status, then make the commit with:

[desktop]$ git commit -m "Add another fudge factor"
[main 003c54a] Add another fudge factor
 2 files changed, 4 insertions(+), 1 deletion(-)
 rewrite fancy_figure.png (96%)

The branch bookmark has moved again:

$ git branch -v
* main 003c54a Add another fudge factor

An ordinary day in gitworld

We now have the main commands for daily work with git:

  • Make some changes in the working tree;

  • Check what has changed with git status;

  • Review the changes with git diff;

  • Add changes to the staging area with git add;

  • Make the commit with git commit.

Commit four

For our next commit, we will add some more changes to the analysis script and figure, and add a new file, references.bib.

To follow along, first download references.bib.

Next, edit clever_analysis.py again, to make these changes:

$ git diff
diff --git a/clever_analysis.py b/clever_analysis.py
index 99cd07b..e5b2efa 100755
--- a/clever_analysis.py
+++ b/clever_analysis.py
@@ -5,6 +5,7 @@ import matplotlib.pyplot as plt
 
 FUDGE = 106
 MORE_FUDGE = 2.0
+NUDGE_FUDGE = 1.25
 
 # Load data from the brain
 data = np.loadtxt('expensive_data.csv', delimiter=',')
@@ -14,8 +15,8 @@ from_world, from_brain = data.T
 
 # Process data
 from_brain_processed = np.log(from_brain) * FUDGE * np.e ** np.pi
-# Apply the new factor
-from_brain_processed = from_brain_processed / MORE_FUDGE
+# Apply the new factor(s)
+from_brain_processed = from_brain_processed / MORE_FUDGE / NUDGE_FUDGE
 
 # Make plot
 plt.plot(from_world, from_brain_processed, 'r:')

Finally regenerate fancy_figure.png, or download the updated copy from here.

What will git status show now?

$ git status
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   clever_analysis.py
	modified:   fancy_figure.png

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	references.bib

no changes added to commit (use "git add" and/or "git commit -a")

The staging area does not list a file called references.bib so this file is “untracked”. The staging area does contain an entry for clever_analysis.py and fancy_figure.png, so these files are tracked. Git has checked the hashes for these files, and they are different from the hashes in the staging area, so git knows these files have changed, compared to the versions listed in the staging area.
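The comparison described here can be written as a short Python sketch. It is a toy model: the function name classify is made up, the staging area and working tree are represented as plain filename-to-hash dictionaries, and the hashes for nobel_prize.md and references.bib are placeholders. The other abbreviated hashes come from the index lines in the git diff output.

```python
def classify(index, working_tree):
    """Compare filename -> hash mappings for the index and the working tree."""
    status = {}
    for name, file_hash in working_tree.items():
        if name not in index:
            status[name] = "untracked"   # no entry in the staging area
        elif index[name] != file_hash:
            status[name] = "modified"    # entry exists, hash differs
        else:
            status[name] = "unchanged"
    return status


index = {"clever_analysis.py": "99cd07b", "fancy_figure.png": "81fc437",
         "nobel_prize.md": "aaaaaaa"}      # "aaaaaaa" is a placeholder
working_tree = {"clever_analysis.py": "e5b2efa", "fancy_figure.png": "d1f40df",
                "nobel_prize.md": "aaaaaaa",
                "references.bib": "bbbbbbb"}  # "bbbbbbb" is a placeholder

result = classify(index, working_tree)
print(result)
```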

Before we add our changes, we confirm that they are as we expect with:

$ git diff
diff --git a/clever_analysis.py b/clever_analysis.py
index 99cd07b..e5b2efa 100755
--- a/clever_analysis.py
+++ b/clever_analysis.py
@@ -5,6 +5,7 @@ import matplotlib.pyplot as plt
 
 FUDGE = 106
 MORE_FUDGE = 2.0
+NUDGE_FUDGE = 1.25
 
 # Load data from the brain
 data = np.loadtxt('expensive_data.csv', delimiter=',')
@@ -14,8 +15,8 @@ from_world, from_brain = data.T
 
 # Process data
 from_brain_processed = np.log(from_brain) * FUDGE * np.e ** np.pi
-# Apply the new factor
-from_brain_processed = from_brain_processed / MORE_FUDGE
+# Apply the new factor(s)
+from_brain_processed = from_brain_processed / MORE_FUDGE / NUDGE_FUDGE
 
 # Make plot
 plt.plot(from_world, from_brain_processed, 'r:')
diff --git a/fancy_figure.png b/fancy_figure.png
index 81fc437..d1f40df 100644
Binary files a/fancy_figure.png and b/fancy_figure.png differ

Notice that git does not try to show the line-by-line differences between the old and new figures, guessing correctly that this is a binary and not a text file.

Now we have reviewed the changes, we add them to the staging area and commit:

$ git add references.bib
$ git add clever_analysis.py
$ git add fancy_figure.png
[desktop]$ git commit -m "Change analysis and add references"
[main 2d9e1df] Change analysis and add references
 3 files changed, 13 insertions(+), 2 deletions(-)
 rewrite fancy_figure.png (89%)
 create mode 100644 references.bib

The branch bookmark has moved to point to the new commit:

$ git branch -v
* main 2d9e1df Change analysis and add references

Undoing a commit with git reset

As you found in the SAP story, this last commit doesn’t look quite right, because the commit message refers to two different types of changes. With more git experience, you will likely find that you like to break your changes into commits where the changes have a particular theme or purpose. This makes it easier to see what happened when you look over the history and the commit messages with git log.

So, as in the SAP story, you decide to undo the last commit, and replace it with two commits:

  • One commit to add the changes to the script and figure;

  • Another commit on top of the first, to add the references file.

In the SAP story, you had to delete a snapshot directory manually, and reset the staging area directory to have the contents of the previous commit. In git, all we have to do is reset the current main branch bookmark to point to the previous commit. By default, git will also reset the staging area for us. The command to move the branch bookmark is git reset.

Pointing backwards in history

The commit that we want the branch to point to is the previous commit in our commit history. We can use git log to see that this commit has hash 003c54a. So, we could do our reset with git reset 003c54a. There is a simpler and more readable way to write this common idea, of one commit back in history, and that is to add ~1 to a reference. For example, to refer to the commit that is one step back in the history from the commit pointed to by the main branch, you can write main~1. Because main points to commit 2d9e1df, you could also append the ~1 to 2d9e1df. You can imagine that main~2 will point two steps back in the commit history, and so on.
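One way to picture the ~ notation is as a walk along the parent links. The Python sketch below uses the abbreviated hashes from our history so far. The dictionaries and the function step_back are invented for the illustration, and the sketch assumes each commit has at most one parent, as is the case here.

```python
# Parent links taken from the history in this tutorial
# (abbreviated hashes; None marks the first commit).
parents = {
    "2d9e1df": "003c54a",
    "003c54a": "7919d37",
    "7919d37": "75206bc",
    "75206bc": None,
}
branches = {"main": "2d9e1df"}


def step_back(ref, n):
    """Resolve ref~n by following parent links n times."""
    # A branch name is first replaced by the commit it points to.
    commit = branches.get(ref, ref)
    for _ in range(n):
        commit = parents[commit]
        if commit is None:
            raise ValueError("walked past the first commit")
    return commit


print(step_back("main", 1))
print(step_back("2d9e1df", 2))
```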

So, a readable reset command for our purpose is:

$ git reset main~1
Unstaged changes after reset:
M	clever_analysis.py
M	fancy_figure.png

Notice that the branch pointer now points to the previous commit:

$ git branch -v
* main 003c54a Add another fudge factor

Remember in SAP that your procedure for breaking up the snapshot was to 1) delete the old snapshot and 2) reset the staging area to reflect the previous commit. After you did this, the working tree contained your changes, but the staging area did not. You could then make your new commits in the usual way, by adding to the staging area, and doing the commits.

Notice that git reset has done the same thing. It has reset the staging area to the state as of the older commit, but it has left the working tree alone. That means that git status will show us the changes in the working tree compared to the commit we have just reset to:

$ git status
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   clever_analysis.py
	modified:   fancy_figure.png

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	references.bib

no changes added to commit (use "git add" and/or "git commit -a")

We have the changes from our original fourth commit in our working tree, but we have not staged them. We are ready to make our new separate commits.
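The three pieces of state involved in the reset can be modeled in a few lines of Python. This is a toy model with made-up names (trees, state, toy_reset), it tracks only two of the files, and the hash for references.bib is a placeholder. It mimics the default behavior described above and is not a description of git's implementation.

```python
# Toy repository state; hashes are abbreviated, "bbbbbbb" is a placeholder.
trees = {  # commit -> {filename: file hash}, for the files that matter here
    "003c54a": {"clever_analysis.py": "99cd07b"},
    "2d9e1df": {"clever_analysis.py": "e5b2efa", "references.bib": "bbbbbbb"},
}
state = {
    "branch": "2d9e1df",
    "index": dict(trees["2d9e1df"]),
    "working_tree": dict(trees["2d9e1df"]),
}


def toy_reset(state, commit):
    """Mimic the default reset: move the branch, reset the index."""
    state["branch"] = commit
    state["index"] = dict(trees[commit])
    # The working tree is deliberately left untouched.


toy_reset(state, "003c54a")
print(state)
```

After the toy reset, the working tree still holds the newer script and references.bib, while the index matches the older commit. That is why git status reports one file as modified and the other as untracked.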

A new fourth commit

As we planned, we make a commit by adding only the changes from the script and figure:

$ git add clever_analysis.py
$ git add fancy_figure.png
[desktop]$ git commit -m "Change parameters of analysis"
[main d0ef727] Change parameters of analysis
 2 files changed, 3 insertions(+), 2 deletions(-)
 rewrite fancy_figure.png (89%)

Notice that git status now tells us that we still have an untracked (and therefore not staged) file in our working tree:

$ git status
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	references.bib

nothing added to commit but untracked files present (use "git add" to track)

The fifth commit

To finish our work splitting the fourth commit into two, we add and commit the references.bib file:

$ git add references.bib
[desktop]$ git commit -m "Add references"
[main c3f19b2] Add references
 1 file changed, 10 insertions(+)
 create mode 100644 references.bib

Getting a file from a previous commit – git checkout

In the SAP story, we found that the first version of the analysis script was correct, and we made a new commit after restoring this version from the first snapshot.

As you can imagine, git allows us to do that too. The command to do this is git checkout.

If you have a look at git checkout --help you will see that git checkout has two roles, described in the help as “Checkout a branch or paths to the working tree”. We will see checking out a branch later, but here we are using checkout in its second role, to restore files to the working tree.

We do this by telling git checkout which version we want, and what file we want. We want the version of clever_analysis.py as of the first commit. To find the first commit, we can use git log. To make git log a bit less verbose, I’ve added the --oneline flag, to print out one line per commit:

$ git log --oneline
c3f19b2 Add references
d0ef727 Change parameters of analysis
003c54a Add another fudge factor
7919d37 Add first draft of paper
75206bc First backup of my amazing idea

Now that we have the abbreviated commit hash for the first commit, we can check out that version to the working tree:

$ git checkout 75206bc clever_analysis.py
Updated 1 path from ffc871b

We also want the previous version of the figure:

$ git checkout 75206bc fancy_figure.png
Updated 1 path from ffc871b

Notice that the checkout also added the files to the staging area:

$ git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   clever_analysis.py
	modified:   fancy_figure.png

We are ready for our sixth commit:

$ git commit -m "Revert to original script & figure"
[main 677c9a6] Revert to original script & figure
 2 files changed, 1 insertion(+), 5 deletions(-)
 rewrite fancy_figure.png (97%)
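
To experiment with restoring a file without touching the real repository, here is a minimal sketch in a scratch repository. The file notes.txt, its contents and the author identity are assumptions for illustration. It uses the form git checkout <commit> -- <filename>, where the -- separator makes it explicit that what follows is a path, not a branch or commit.

```shell
set -e
# Scratch repository in a temporary directory (hypothetical, for illustration)
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"

echo "version one" > notes.txt
git add notes.txt
git commit -q -m "First version"
first=$(git rev-parse --short HEAD)

echo "version two" > notes.txt
git commit -q -a -m "Second version"

# Restore the file as of the first commit
git checkout -q "$first" -- notes.txt

restored=$(cat notes.txt)
staged=$(git diff --cached --name-only)
echo "working tree: $restored"
echo "staged: $staged"
```

As in the main example, the restored file lands in both the working tree and the staging area, so it is ready to commit.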

Using bookmarks – git branch

We are at the stage in the SAP story where Josephine goes away to the conference.

Let us pretend that we are Josephine, and that we have taken a copy of the nobel_prize directory to the conference. The copy includes the .git subdirectory, containing the git repository.

Now we (as Josephine) start doing some work. We don’t want to change the previous bookmark, which is main:

$ git branch -v
* main 677c9a6 Revert to original script & figure

We would like to use our own bookmark, so we can make changes without affecting anyone else. To do this we use git branch with arguments:

$ git branch josephines-branch main

The first argument is the name of the branch we want to create. The second is the commit at which the branch should start. Now we have a new branch that currently points to the same commit as main:

$ git branch -v
  josephines-branch 677c9a6 Revert to original script & figure
* main              677c9a6 Revert to original script & figure

The new branch is nothing but a text file pointing to the commit:

$ cat .git/refs/heads/josephines-branch
677c9a6ac3004e5d84d9739751a89867e09238f4
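
You can check that the text file and git agree about where a branch points. This is a minimal sketch in a scratch repository; the branch name demo-branch, the file notes.txt and the author identity are assumptions for illustration. It also assumes git's default file-based storage for references, and a branch that git has not yet moved into its packed-refs file.

```shell
set -e
# Scratch repository in a temporary directory (hypothetical, for illustration)
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"

echo "hello" > notes.txt
git add notes.txt
git commit -q -m "First commit"

# Create a new bookmark pointing at the same commit as main
git branch demo-branch main

# Read the bookmark directly from the file, then ask git for the same thing
from_file=$(cat .git/refs/heads/demo-branch)
from_git=$(git rev-parse demo-branch)
echo "file says: $from_file"
echo "git says:  $from_git"
```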

Now we have two branches, git needs to know which branch we are working on. The asterisk next to main in the output of git branch means that we are working on main at the moment. If we make another commit, it will update the main bookmark.

Git stores the current branch in the file .git/HEAD:

$ cat .git/HEAD
ref: refs/heads/main

Git commands often allow you to write HEAD meaning “the branch or commit you are currently working on”. For example, git log HEAD means “show the log starting at the branch or commit you are currently working on”. In fact, this is also the default behavior of git log.
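
The relationship between HEAD, the current branch and the default behavior of git log can be checked with a few commands. This is a minimal sketch in a scratch repository; the file notes.txt and the author identity are assumptions for illustration.

```shell
set -e
# Scratch repository in a temporary directory (hypothetical, for illustration)
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"

echo "hello" > notes.txt
git add notes.txt
git commit -q -m "First commit"

# The raw contents of the HEAD file, and git's own reading of it
head_file=$(cat .git/HEAD)
current=$(git symbolic-ref --short HEAD)
echo "HEAD file: $head_file"
echo "current branch: $current"

# HEAD resolves to the same commit as the current branch
head_commit=$(git rev-parse HEAD)
main_commit=$(git rev-parse main)

# git log with and without an explicit HEAD gives the same listing
log_default=$(git log --oneline)
log_head=$(git log --oneline HEAD)
```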

We now want to make josephines-branch current, so any new commits will update josephines-branch instead of main.

Changing the current branch with git checkout

We previously saw that git checkout <commit> <filename> will get the file <filename> as of commit <commit>, and restore it to the working tree. This was the second of the two uses of git checkout. We now come to the first and most common use of git checkout, which is to:

  • change the current branch to a given branch or commit;

  • restore the working tree and staging area to the file versions from the given commit.

We are about to do git checkout josephines-branch. When we do this, we are only going to see the first of these two effects, because main and josephines-branch point to the same commit, and so have the same file contents:

$ git checkout josephines-branch
Switched to branch 'josephines-branch'

The asterisk has now moved to josephines-branch:

$ git branch -v
* josephines-branch 677c9a6 Revert to original script & figure
  main              677c9a6 Revert to original script & figure

This is because the file HEAD now points to josephines-branch:

$ cat .git/HEAD
ref: refs/heads/josephines-branch

If we do a commit, git will update josephines-branch, not main.
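
Creating a branch and then switching to it is such a common pair of steps that git checkout has a -b flag to do both at once. Here is a minimal sketch in a scratch repository; the branch name demo-branch, the file notes.txt and the author identity are assumptions for illustration.

```shell
set -e
# Scratch repository in a temporary directory (hypothetical, for illustration)
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"

echo "hello" > notes.txt
git add notes.txt
git commit -q -m "First commit"

# Equivalent to: git branch demo-branch main; git checkout demo-branch
git checkout -q -b demo-branch main

head_file=$(cat .git/HEAD)
echo "HEAD file: $head_file"
git branch -v
```

After this, HEAD points at demo-branch, so the next commit would move demo-branch and leave main where it was.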

Making commits on branches

Josephine did some edits to the paper. If you are typing along, make these changes to nobel_prize.md:

$ git diff
diff --git a/nobel_prize.md b/nobel_prize.md
index 3ef5df2..19fd4c5 100644
--- a/nobel_prize.md
+++ b/nobel_prize.md
@@ -5,6 +5,12 @@
 The brain thinks in straight lines once you do some poorly-motivated
 corrections on some brain recordings.
 
+Other people have done brain recordings and claimed that they were
+interesting, but they are not as interesting as our recordings.
+
+In our previous work, we have done some other brain recordings, that were also
+interesting, but in a different way.
+
 == Methods
 
 We took some recordings of someone's brain while we showed them stuff

As usual, we add the file to the staging area, and check the status of the working tree:

$ git add nobel_prize.md
$ git status
On branch josephines-branch
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   nobel_prize.md

Finally we make the commit:

$ git commit -m "Expand the introduction"
[josephines-branch 0a5a1b9] Expand the introduction
 1 file changed, 6 insertions(+)

The main branch has not changed, but josephines-branch changed to point to the new commit:

$ git branch -v
* josephines-branch 0a5a1b9 Expand the introduction
  main              677c9a6 Revert to original script & figure

Now we go back to being ourselves, working in the lab. We change back to the main branch:

$ git checkout main
Switched to branch 'main'

The asterisk now points at main:

$ git branch -v
  josephines-branch 0a5a1b9 Expand the introduction
* main              677c9a6 Revert to original script & figure

If you look at the contents of nobel_prize.md in the working directory, you will see that we are back to the contents before Josephine’s changes. This is because git checkout main reverted the files to their state as of the last commit on the main branch.
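
The whole round trip (commit on a side branch, then switch back) can be sketched in a scratch repository. The branch name side, the file paper.md and the author identity are assumptions for illustration. The sketch checks that main did not move, and that the working tree returns to main's version of the file.

```shell
set -e
# Scratch repository in a temporary directory (hypothetical, for illustration)
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"

echo "draft" > paper.md
git add paper.md
git commit -q -m "First draft"
main_before=$(git rev-parse main)

# Work on a side branch
git checkout -q -b side
echo "more text" >> paper.md
git commit -q -a -m "Expand the draft"
side_tip=$(git rev-parse side)

# Back to main: the bookmark has not moved, and the file is as it was
git checkout -q main
main_after=$(git rev-parse main)
content=$(cat paper.md)
echo "paper.md on main: $content"
```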

Now we make our own changes to the script and figure. Here are the changes to the script:

$ git diff
diff --git a/clever_analysis.py b/clever_analysis.py
index 6560135..cf163af 100755
--- a/clever_analysis.py
+++ b/clever_analysis.py
@@ -4,6 +4,7 @@ import numpy as np
 import matplotlib.pyplot as plt
 
 FUDGE = 42
+HOT_FUDGE = 1.707
 
 # Load data from the brain
 data = np.loadtxt('expensive_data.csv', delimiter=',')
@@ -13,6 +14,7 @@ from_world, from_brain = data.T
 
 # Process data
 from_brain_processed = np.log(from_brain) * FUDGE * np.e ** np.pi
+from_brain_processed = from_brain_processed / HOT_FUDGE
 
 # Make plot
 plt.plot(from_world, from_brain_processed, 'r:')

If you are typing along, then you will also want to regenerate the figure with python clever_analysis.py or download the new version.

This gives:

$ git status
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   clever_analysis.py
	modified:   fancy_figure.png

no changes added to commit (use "git add" and/or "git commit -a")

As usual, we add the files and do the commit:

$ git add clever_analysis.py
$ git add fancy_figure.png
$ git commit -m "More fun with fudge"
[main 7598e4e] More fun with fudge
 2 files changed, 2 insertions(+)
 rewrite fancy_figure.png (96%)

Because HEAD currently points to main, git updated the main branch with the new commit:

$ git branch -v
  josephines-branch 0a5a1b9 Expand the introduction
* main              7598e4e More fun with fudge

Merging lines of development with git merge

We next want to get Josephine’s changes into the main branch.

We do this with git merge:

$ git merge josephines-branch
Merge made by the 'recursive' strategy.
 nobel_prize.md | 6 ++++++
 1 file changed, 6 insertions(+)

The merge created a new commit. This commit has the changes we just made to the script and figure, and the changes that Josephine made to the paper.

The commit has two parents, which are the two commits from which we merged:

$ git log --oneline --parents
9bcf78c 7598e4e 0a5a1b9 Merge branch 'josephines-branch'
0a5a1b9 677c9a6 Expand the introduction
7598e4e 677c9a6 More fun with fudge
677c9a6 c3f19b2 Revert to original script & figure
c3f19b2 d0ef727 Add references
d0ef727 003c54a Change parameters of analysis
003c54a 7919d37 Add another fudge factor
7919d37 75206bc Add first draft of paper
75206bc First backup of my amazing idea
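
To look at the two parents of a merge commit directly, you can use the HEAD^1 and HEAD^2 notation, meaning the first and second parent of HEAD. This is a minimal sketch in a scratch repository; the branch name side, the file names and the author identity are assumptions for illustration.

```shell
set -e
# Scratch repository in a temporary directory (hypothetical, for illustration)
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"

echo "paper" > paper.md
echo "script" > script.py
git add paper.md script.py
git commit -q -m "Base commit"

# One commit on a side branch, touching the paper
git checkout -q -b side
echo "introduction" >> paper.md
git commit -q -a -m "Edit paper"
side_tip=$(git rev-parse side)

# One commit on main, touching the script
git checkout -q main
echo "more analysis" >> script.py
git commit -q -a -m "Edit script"
main_tip=$(git rev-parse main)

# Merge the side branch into main, accepting the default message
git merge -q --no-edit side

parent1=$(git rev-parse HEAD^1)
parent2=$(git rev-parse HEAD^2)
git log --oneline --parents -1
```

The first parent is the commit we were on when we ran the merge (the old tip of main), and the second parent is the tip of the branch we merged in.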

The commit parents make the development history into a graph

As you saw in your SAP system, we can think of the commits as nodes in a graph. Each commit stores the identity of its parent commit(s). These pointers from each commit back to its parent(s) are the edges that link the commits (nodes) into a graph.

It is common to see a git history shown as a graph, and it is often useful to think of this graph when we are working with a git repository.

There are a lot of graphical tools to show the git history as a graph, but git log has a useful flag called --graph which shows the commits as a graph using text characters:

$ git log --oneline --graph
*   9bcf78c Merge branch 'josephines-branch'
|\  
| * 0a5a1b9 Expand the introduction
* | 7598e4e More fun with fudge
|/  
* 677c9a6 Revert to original script & figure
* c3f19b2 Add references
* d0ef727 Change parameters of analysis
* 003c54a Add another fudge factor
* 7919d37 Add first draft of paper
* 75206bc First backup of my amazing idea

This kind of display is so useful that many of us have a shortcut for this command that we use instead of the standard git log. You can make customized shortcuts to git commands by setting alias entries using git config. For example, you may want to set up an alias like this:

$ git config --global alias.slog "log --oneline --graph"

Now you can use the command git slog to mean git log --oneline --graph. Because of the --global flag, this command sets up the slog alias as the default for your user account, so you can use git slog whenever you are using git as this user on this computer.

$ git slog
*   9bcf78c Merge branch 'josephines-branch'
|\  
| * 0a5a1b9 Expand the introduction
* | 7598e4e More fun with fudge
|/  
* 677c9a6 Revert to original script & figure
* c3f19b2 Add references
* d0ef727 Change parameters of analysis
* 003c54a Add another fudge factor
* 7919d37 Add first draft of paper
* 75206bc First backup of my amazing idea
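
If you would rather try out an alias before making it a default for your account, you can leave off the --global flag, so the alias applies only to the repository you are in. This is a minimal sketch in a scratch repository; the file notes.txt and the author identity are assumptions for illustration.

```shell
set -e
# Scratch repository in a temporary directory (hypothetical, for illustration)
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"

echo "hello" > notes.txt
git add notes.txt
git commit -q -m "First commit"

# No --global flag: the alias is stored in this repository's .git/config
git config alias.slog "log --oneline --graph"

# Check what the alias expands to, and that it behaves like the full command
alias_value=$(git config --get alias.slog)
via_alias=$(git slog)
via_full=$(git log --oneline --graph)
echo "alias.slog = $alias_value"
echo "$via_alias"
```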

Other commands you need to know

This tutorial gives you the basics of working with files on your own computer, in your own repository.

You will also need to know about:

You will probably also find use for:

Git: are you ready?

If you followed this tutorial, you now have a good knowledge of how git works. This will make it much easier to understand why git commands do what they do, and what to do when things go wrong. You know all the main terms that the git manual pages use, so git’s own help will be more useful to you. You will likely lead a long life of deep personal fulfillment.

Other git-ish things to read

As you’ve seen, this tutorial makes the bold assumption that you’ll be able to understand how git works by seeing how it is built. These documents take a similar approach to varying levels of detail:

You might also try:

  • For Windows users, an Illustrated Guide to Git on Windows is useful in that it also contains some information about handling SSH (necessary to interface with git hosted on remote servers when collaborating) as well as screenshots of the Windows interface.

  • Git ready: a great website of posts on specific git-related topics, organized by difficulty.

  • QGit: an excellent Git GUI. Git ships by default with gitk and git-gui, a pair of Tk graphical clients to browse a repo and to operate in it. I personally have found qgit to be nicer and easier to use. It is available on modern Linux distros, and since it is based on Qt, it should run on OSX and Windows.

  • Git Magic: another book-size guide that has useful snippets.

  • Github Guides have tutorials on a number of topics, some specific to Github hosting but much of it of general value.

  • A port of the Hg book’s beginning: the Mercurial book has a reputation for clarity, so Carl Worth decided to port its introductory chapter to Git. It’s a nicely written intro, which is possible in good measure because of how similar the underlying models of Hg and Git ultimately are.

  • Intermediate tips: A set of tips that contains some very valuable nuggets, once you’re past the basics.

Footnotes

1

What would happen if we delete the .git/index file? Remember, the .git/index file contains the directory listing for the staging area. If we delete the file, git will assume that the directory listing is empty, and therefore that there are no files in the staging area.

2

Why is the output of git log the same as the output of git log --parents in this case? They are the same because this is the first commit, and the first commit has no parents.

3

When git stores a file in the .git/objects directory, it makes a hash from the file, takes the first two digits of the hash to make a directory name, and then stores a file in this directory with a filename from the remaining hash digits. For example, when adding a file with hash d92d079af6a7f276cc8d63dcf2549c03e7deb553, git will create the .git/objects/d9 directory if it doesn’t exist, and store the file contents as .git/objects/d9/2d079af6a7f276cc8d63dcf2549c03e7deb553. It does this so that the number of files in any one directory stays in a reasonable range. If git had to store hash filenames for every object in one flat directory, the directory would soon have a very large number of files.
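
You can see this layout for yourself with git hash-object, which computes the hash of a file and, with the -w flag, writes the object into .git/objects. This is a minimal sketch in a scratch repository; the file example.txt and its contents are assumptions for illustration, so the hash will differ from the one in the footnote.

```shell
set -e
# Scratch repository in a temporary directory (hypothetical, for illustration)
repo=$(mktemp -d) && cd "$repo"
git init -q

echo "some file contents" > example.txt

# Compute the hash and store the object in .git/objects
hash=$(git hash-object -w example.txt)

# First two hash digits give the directory, the rest give the filename
dir=$(echo "$hash" | cut -c1-2)
rest=$(echo "$hash" | cut -c3-)
object_path=".git/objects/$dir/$rest"
echo "hash: $hash"
echo "stored at: $object_path"
ls "$object_path"

# git can read the contents back from the hash alone
stored=$(git cat-file -p "$hash")
```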