Curious git¶
In A curious tale, you built your own content management system. Now
you have done that, you know how git works – because it works in exactly
the same way as your own system. You will recognize hashes for files,
directories and commits, commits linked by reference to their parents, the
staging area, the objects directory, and bookmarks (branches).
Armed with this deep understanding, we retrace our steps to do the same content management tasks in git.
Basic configuration¶
We need to tell git our name and email address before we start.
Git will use this information to fill in the author information in each commit message, so we don’t have to type it out every time.
$ git config --global user.name "Matthew Brett"
$ git config --global user.email "matthew.brett@gmail.com"
The --global flag tells git to store this information in its default configuration file for your user account. On Unix (e.g. OSX and Linux) this file is .gitconfig in your home directory. Without the --global flag, git only applies the configuration to the particular repository you are working in.
Every time we make a commit, we need to type a commit message. Git will open our text editor for us to type the message, but first it needs to know what text editor we prefer. Set your own preferred text editor here:
# gedit is a reasonable choice for Linux
# "vi" is the default.
git config --global core.editor gedit
Next we set the name of the default branch. We will explain branches later on, but, for now, just apply this configuration to be compatible with newer versions of Git:
$ # Set the default branch name to "main"
$ git config --global init.defaultBranch main
We also turn on the use of color, which is very helpful in making the output of git easier to read:
$ git config --global color.ui "auto"
Getting help¶
$ git help
usage: git [--version] [--help] [-C <path>] [-c <name>=<value>]
[--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
[-p | --paginate | -P | --no-pager] [--no-replace-objects] [--bare]
[--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
[--super-prefix=<path>] [--config-env=<name>=<envvar>]
<command> [<args>]
These are common Git commands used in various situations:
start a working area (see also: git help tutorial)
clone Clone a repository into a new directory
init Create an empty Git repository or reinitialize an existing one
work on the current change (see also: git help everyday)
add Add file contents to the index
mv Move or rename a file, a directory, or a symlink
restore Restore working tree files
rm Remove files from the working tree and from the index
sparse-checkout Initialize and modify the sparse-checkout
examine the history and state (see also: git help revisions)
bisect Use binary search to find the commit that introduced a bug
diff Show changes between commits, commit and working tree, etc
grep Print lines matching a pattern
log Show commit logs
show Show various types of objects
status Show the working tree status
grow, mark and tweak your common history
branch List, create, or delete branches
commit Record changes to the repository
merge Join two or more development histories together
rebase Reapply commits on top of another base tip
reset Reset current HEAD to the specified state
switch Switch branches
tag Create, list, delete or verify a tag object signed with GPG
collaborate (see also: git help workflows)
fetch Download objects and refs from another repository
pull Fetch from and integrate with another repository or a local branch
push Update remote refs along with associated objects
'git help -a' and 'git help -g' list available subcommands and some
concept guides. See 'git help <command>' or 'git help <concept>'
to read about a specific subcommand or concept.
See 'git help git' for an overview of the system.
Try git help add for an example.
Note
The git help pages are famously hard to read if you don’t know how git works. One purpose of this tutorial is to explain git in such a way that it will be easier to understand the help pages.
Initializing the repository directory¶
We first set this nobel_prize directory to be version controlled with git.
We start off the working tree with the original files for the paper:
Note
I highly recommend you type along. Why not download nobel_prize.zip and unzip the files to make the same nobel_prize directory as I have here?
nobel_prize
├── clever_analysis.py [618B]
├── expensive_data.csv [244K]
└── fancy_figure.png [183K]
To get started with git, create the git repository directory with git init:
$ cd nobel_prize
$ git init
Initialized empty Git repository in /Volumes/zorg/mb312/dev_trees/curious-git/working/nobel_prize/.git/
What happened when we did git init? Just what we were expecting; we have a new repository directory in nobel_prize called .git:
.git
├── refs
│ ├── tags
│ └── heads
├── objects
│ ├── pack
│ └── info
├── info
│ └── exclude [240B]
├── hooks
│ (13 files)
├── HEAD [21B]
├── config [137B]
└── description [73B]
The objects directory looks familiar. It has exactly the same purpose as it did for your SAP system. At the moment it contains a couple of empty directories, because we have not added any objects yet.
Updating terms for git¶
- Working directory
The directory containing the files you are working on. In our case this is nobel_prize. It contains the repository directory, named .git.
- Repository directory
Directory containing all previous commits (snapshots) and git private files for working with commits. The directory has name .git by default, and almost always in practice.
git add – put stuff into the staging area¶
In the next few sections, we will do our first commit (snapshot).
First we will put the files for the commit into the staging area.
The command to put files into the staging area is git add.
To start, we show ourselves that the staging area is empty. We haven’t yet discussed the git implementation of the staging area, but this command shows us which files are in the staging area.
$ git ls-files --stage
As expected, there are no files in the staging area yet.
Note
git ls-files is a specialized command that you will not often need in your daily git life. I’m using it here to show you how git works.
Now we do our add:
$ git add clever_analysis.py
Sure enough:
$ git ls-files --stage
100755 6560135a5943c0509608fee6d900b775e3041197 0 clever_analysis.py
The git staging area¶
It is time to think about what the staging area is, in git. In your SAP system, the staging area was a directory. You also started off by using directories to store commits (snapshots). Later you found you could do without the commit directories, because you could store the files in repo/objects and the directory structure in directory_listing.txt text files.
In git, the staging area is a single file called .git/index. This file contains a directory listing that is the equivalent of the staging directory in SAP. When we add a file to the staging area, git backs up the file with its hash to .git/objects, and then changes the directory listing inside .git/index to point to this backup copy.
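The two steps of an add can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: two dictionaries stand in for .git/objects and .git/index, the names toy_add and notes.txt are made up, and the real index also records file modes and other metadata. The hash function assumes git's default SHA-1 object format, described in the "Git objects" section.

```python
import hashlib
from typing import Dict


def git_blob_hash(content: bytes) -> str:
    """Hash content the way git hashes a file (assuming SHA-1 objects)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def toy_add(name: str, content: bytes,
            objects: Dict[str, bytes], index: Dict[str, str]) -> str:
    """Back up content under its hash, then point the listing at it."""
    object_hash = git_blob_hash(content)
    objects[object_hash] = content  # the backup copy, as in .git/objects
    index[name] = object_hash       # the directory listing, as in .git/index
    return object_hash


# Hypothetical stand-ins for .git/objects and .git/index.
objects: Dict[str, bytes] = {}
index: Dict[str, str] = {}
toy_add("notes.txt", b"first version\n", objects, index)
toy_add("notes.txt", b"second version\n", objects, index)
print(len(objects), "objects stored; index lists", sorted(index))
```

Adding the same filename twice leaves two backup copies in the object store, but the listing only points to the most recent one.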
If all that is true, then we now expect to see a) a new file .git/index containing the directory listing and b) a new file in the .git/objects directory corresponding to the hash for the clever_analysis.py file. We saw from the output of git ls-files --stage above that the hash for clever_analysis.py is 6560135a5943c0509608fee6d900b775e3041197. So – do we see these files?
First – there is now a new file .git/index that was not present in our first listing of the .git directory above:
$ ls .git/index
.git/index
Second, there is a new directory and file in .git/objects:
objects
├── pack
├── info
└── 65
└── 60135a5943c0509608fee6d900b775e3041197 [335B]
The directory and filename in .git/objects come from the hash of clever_analysis.py. The first two digits of the hash form the directory name and the rest of the digits are the filename [3]. So, the file .git/objects/65/60135a5943c0509608fee6d900b775e3041197 is the copy of clever_analysis.py that we added to the staging area.
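Here is a minimal sketch of that naming rule in Python. The function name object_path is made up for this illustration, and the rule applies to individually stored ("loose") objects like the one above; git can also bundle objects into the pack directory, which this sketch ignores.

```python
from pathlib import PurePosixPath


def object_path(object_hash: str, git_dir: str = ".git") -> PurePosixPath:
    """Return where a loose object with this hash lives under git_dir."""
    # First two hex digits name the directory; the rest name the file.
    return PurePosixPath(git_dir, "objects", object_hash[:2], object_hash[2:])


print(object_path("6560135a5943c0509608fee6d900b775e3041197"))
```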
For extra points, what do you think would happen if we deleted the .git/index file (answer 1)?
Git objects¶
Git objects are nearly as simple as the objects you were writing in your SAP. The hash is not the hash of the raw file, but the hash of the raw file prepended with a short housekeeping string. See Reading git objects for details.
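As a rough sketch of what that means, the Python below computes a file hash by hand. It assumes git's default SHA-1 object format, where the housekeeping string for a file is the word blob, a space, the length of the content in bytes, and a null byte. The function name git_blob_hash is made up for this illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 hash git would give a file with this content."""
    # The housekeeping string: object type, space, byte length, null byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


content = b"hello world\n"
# Hash of the raw bytes alone, for comparison; git does not use this.
raw_hash = hashlib.sha1(content).hexdigest()
blob_hash = git_blob_hash(content)
print("raw :", raw_hash)
print("blob:", blob_hash)
```

If you save the same bytes to a file and run git hash-object on it, git should report the same value as the blob line.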
We can see the contents of objects with the command git cat-file -p. For example, here are the contents of the backup we just made of clever_analysis.py:
$ git cat-file -p 6560135a5943c0509608fee6d900b775e3041197
# The brain analysis script
import numpy as np
import matplotlib.pyplot as plt
FUDGE = 42
# Load data from the brain
data = np.loadtxt('expensive_data.csv', delimiter=',')
# First column is something from world, second is something from brain
from_world, from_brain = data.T
# Process data
from_brain_processed = np.log(from_brain) * FUDGE * np.e ** np.pi
# Make plot
plt.plot(from_world, from_brain_processed, 'r:')
plt.plot(from_world, from_brain_processed, 'bx')
plt.xlabel('Data from the outside world')
plt.ylabel('Data from inside the brain')
plt.title('Important finding')
plt.savefig('fancy_figure.png')
Note
I will use git cat-file -p to display the content of nearly raw git objects, to show the simplicity of git’s internal model, but cat-file is a specialized command that you won’t use much in daily work.
Just as we expected, it is the current contents of clever_analysis.py.
The 6560135a5943c0509608fee6d900b775e3041197 object is a hashed, stored raw file. Because the object is a stored file rather than a stored directory listing text file or commit message text file, git calls this type of object a blob – for Binary Large Object. You can get the object type from the object hash with the -t flag to git cat-file:
$ git cat-file -t 6560135a5943c0509608fee6d900b775e3041197
blob
Hash values can usually be abbreviated to seven characters¶
We only need to give git enough hash digits for git to identify the object uniquely. 7 digits is nearly always enough, as in:
$ git cat-file -p 6560135
# The brain analysis script
import numpy as np
import matplotlib.pyplot as plt
FUDGE = 42
# Load data from the brain
data = np.loadtxt('expensive_data.csv', delimiter=',')
# First column is something from world, second is something from brain
from_world, from_brain = data.T
# Process data
from_brain_processed = np.log(from_brain) * FUDGE * np.e ** np.pi
# Make plot
plt.plot(from_world, from_brain_processed, 'r:')
plt.plot(from_world, from_brain_processed, 'bx')
plt.xlabel('Data from the outside world')
plt.ylabel('Data from inside the brain')
plt.title('Important finding')
plt.savefig('fancy_figure.png')
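The idea behind abbreviation can be sketched in a few lines of Python: a prefix is good enough as long as exactly one known hash starts with it. This is an illustration of the principle, not git's own lookup code; the function name resolve_prefix is made up, and the list holds three object hashes that appear in this tutorial.

```python
from typing import List


def resolve_prefix(prefix: str, known_hashes: List[str]) -> str:
    """Return the single full hash starting with prefix, else raise."""
    matches = [h for h in known_hashes if h.startswith(prefix)]
    if not matches:
        raise ValueError(f"no object starts with {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"{prefix!r} is ambiguous: {len(matches)} matches")
    return matches[0]


# Three object hashes that appear in this tutorial.
hashes = [
    "6560135a5943c0509608fee6d900b775e3041197",
    "7b37886351b3df2463fd29c87bc5184b637f0926",
    "1ed447c15c125991b8a292bdb433aaf19998a3e9",
]
print(resolve_prefix("6560135", hashes))
```

With only a handful of objects, even one or two digits would be unambiguous here. Longer prefixes matter as a repository grows.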
git status – showing the status of files in the working tree¶
The working tree is the contents of the nobel_prize directory, excluding the .git repository directory.
git status tells us about the relationship of the files in the working tree to the repository and staging area.
We have done a git add on clever_analysis.py, and that added the file to the staging area. We can see that this happened with git status:
$ git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: clever_analysis.py
Untracked files:
(use "git add <file>..." to include in what will be committed)
expensive_data.csv
fancy_figure.png
Sure enough, the output tells us that new file: clever_analysis.py is in the changes to be committed. It also tells us that the other two files in the working directory are untracked.
An untracked file is a file with a filename that is not listed in the staging area directory listing. Until you run git add on an untracked file, git will ignore the file and assume you don’t want to keep track of it.
Staging the other files with git add¶
We do want to keep track of the other files, so we stage them:
$ git add fancy_figure.png
$ git add expensive_data.csv
$ git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: clever_analysis.py
new file: expensive_data.csv
new file: fancy_figure.png
We have staged all three of our files. We have three objects in .git/objects:
objects
├── pack
├── info
├── 7b
│ └── 37886351b3df2463fd29c87bc5184b637f0926 [119K]
├── 65
│ └── 60135a5943c0509608fee6d900b775e3041197 [335B]
└── 1e
└── d447c15c125991b8a292bdb433aaf19998a3e9 [179K]
git commit – making the snapshot¶
[desktop]$ git commit -m "First backup of my amazing idea"
[main (root-commit) 75206bc] First backup of my amazing idea
3 files changed, 5023 insertions(+)
create mode 100755 clever_analysis.py
create mode 100644 expensive_data.csv
create mode 100644 fancy_figure.png
Note
In the line above, I used the -m flag to specify a message at the command line. If I had not done that, git would open the editor I specified in the git config step above and ask me to enter a message. I’m using the -m flag so the commit command runs without interaction in this tutorial, but in ordinary use, I virtually never use -m, and I suggest you don’t either. Using the editor for the commit message allows you to write a more complete commit message, and gives feedback about the git status of the commit to remind you what you are about to do.
Following the logic of your SAP system, we expect that the action of making the commit will generate two new files in .git/objects, one for the directory listing text file, and another for the commit message:
objects
├── pack
├── info
├── ff
│ └── c871b48a6b9df8dc4a13e8e5da99ccf2ce458d [150B]
├── 7b
│ └── 37886351b3df2463fd29c87bc5184b637f0926 [119K]
├── 75
│ └── 206bcb33ff9ad4f15f89b52cdf95bf666d67a8 [148B]
├── 65
│ └── 60135a5943c0509608fee6d900b775e3041197 [335B]
└── 1e
└── d447c15c125991b8a292bdb433aaf19998a3e9 [179K]
Here are the contents of the commit message text file for the new commit. Git calls this a commit object:
$ git cat-file -p 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
tree ffc871b48a6b9df8dc4a13e8e5da99ccf2ce458d
author Matthew Brett <matthew.brett@gmail.com> 1333287013 +0100
committer Matthew Brett <matthew.brett@gmail.com> 1333287013 +0100
First backup of my amazing idea
$ # What type of git object is this?
$ git cat-file -t 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
commit
As for SAP, the commit message file contains the hash for the directory tree file (tree), the hash of the parent (parent), the author, date and time, and the note. This first commit has no parent, so its commit object has no parent line.
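To make that layout concrete, here is a small Python sketch that splits the text of a commit object into its fields. It assumes the simple layout of header lines, then a blank line, then the message, and it ignores multi-line headers such as signatures. The function name parse_commit is made up for this illustration.

```python
def parse_commit(text: str) -> dict:
    """Split commit object text into its header fields and its message."""
    # Headers come first; a blank line separates them from the message.
    header_part, _, message = text.partition("\n\n")
    fields = {"parents": []}
    for line in header_part.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parents"].append(value)  # zero, one or more parents
        else:
            fields[key] = value
    fields["message"] = message.strip()
    return fields


# The text of the first commit object in this tutorial.
first_commit = (
    "tree ffc871b48a6b9df8dc4a13e8e5da99ccf2ce458d\n"
    "author Matthew Brett <matthew.brett@gmail.com> 1333287013 +0100\n"
    "committer Matthew Brett <matthew.brett@gmail.com> 1333287013 +0100\n"
    "\n"
    "First backup of my amazing idea\n"
)
commit = parse_commit(first_commit)
print("tree:", commit["tree"])
print("parents:", commit["parents"])
print("message:", commit["message"])
```

The numbers after the email address are the commit time in seconds since 1970, followed by the timezone offset.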
Here are the contents of the directory listing text file for the new commit. Git calls this a tree object.
$ git cat-file -p ffc871b48a6b9df8dc4a13e8e5da99ccf2ce458d
100755 blob 6560135a5943c0509608fee6d900b775e3041197 clever_analysis.py
100644 blob 7b37886351b3df2463fd29c87bc5184b637f0926 expensive_data.csv
100644 blob 1ed447c15c125991b8a292bdb433aaf19998a3e9 fancy_figure.png
$ git cat-file -t ffc871b48a6b9df8dc4a13e8e5da99ccf2ce458d
tree
Each line in the directory listing gives the file permissions, the type of the entry in the directory (where “tree” means a sub-directory, and “blob” means a file), the file hash, and the file name (see Types of git objects).
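As a sketch, one line of that listing can be pulled apart in Python like this. The names TreeEntry and parse_tree_line are made up for this illustration, and the code assumes ordinary filenames, with no leading spaces or characters that git would quote.

```python
from typing import NamedTuple


class TreeEntry(NamedTuple):
    mode: str         # file permissions, e.g. 100755
    kind: str         # "blob" for a file, "tree" for a sub-directory
    object_hash: str  # hash of the stored object
    name: str         # file or directory name


def parse_tree_line(line: str) -> TreeEntry:
    """Parse one line of the listing printed by git cat-file -p."""
    # Split on whitespace at most three times, so that a name containing
    # spaces stays in one piece.
    mode, kind, object_hash, name = line.split(None, 3)
    return TreeEntry(mode, kind, object_hash, name)


entry = parse_tree_line(
    "100755 blob 6560135a5943c0509608fee6d900b775e3041197 clever_analysis.py"
)
print(entry.kind, entry.name, entry.object_hash[:7])
```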
git log – what are the commits so far?¶
$ git log
commit 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
Author: Matthew Brett <matthew.brett@gmail.com>
Date: Sun Apr 1 14:30:13 2012 +0100
First backup of my amazing idea
Notice that git log identifies each commit with its hash. The hash is the hash for the contents of the commit message file (the commit object). As we saw above, the hash for our commit was 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8.
We can also ask to see the parents of each commit in the log:
$ git log --parents
commit 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
Author: Matthew Brett <matthew.brett@gmail.com>
Date: Sun Apr 1 14:30:13 2012 +0100
First backup of my amazing idea
Why is the output of git log and git log --parents the same in this case? (answer 2).
git branch – which branch are we on?¶
Branches are bookmarks. They associate a name (like “my_bookmark” or “main”) with a commit (such as 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8).
The default branch (bookmark) is called main, because of the init.defaultBranch setting we made earlier. Git creates it automatically when we do our first commit.
$ git branch
* main
Asking for more verbose detail shows us that the branch is pointing to a particular commit (where the commit is given by a hash):
$ git branch -v
* main 75206bc First backup of my amazing idea
In this case git abbreviated the 40 character hash to the first 7 digits, because these are enough to uniquely identify the commit.
A branch is nothing but a name that points to a commit. In fact, git stores branches as we did in SAP, as tiny text files, where the filename is the name of the branch, and the contents are the hash of the commit that it points to:
$ ls .git/refs/heads
main
$ cat .git/refs/heads/main
75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
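Because a branch is just a small text file, reading or moving a bookmark can be sketched in a few lines of Python. The function names read_branch and write_branch are made up, and the code works in a throwaway directory that stands in for .git, so it never touches a real repository. In real life you should let git update its own files. Git can also keep branch hashes in a single packed-refs file, which this sketch ignores.

```python
import tempfile
from pathlib import Path


def read_branch(git_dir: Path, branch: str) -> str:
    """Return the commit hash stored in the branch's text file."""
    return (git_dir / "refs" / "heads" / branch).read_text().strip()


def write_branch(git_dir: Path, branch: str, commit_hash: str) -> None:
    """Point the branch at commit_hash by rewriting its text file."""
    ref_file = git_dir / "refs" / "heads" / branch
    ref_file.parent.mkdir(parents=True, exist_ok=True)
    ref_file.write_text(commit_hash + "\n")


# A temporary directory plays the role of .git for this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    fake_git_dir = Path(tmp)
    write_branch(fake_git_dir, "main",
                 "75206bcb33ff9ad4f15f89b52cdf95bf666d67a8")
    stored = read_branch(fake_git_dir, "main")

print("main points to", stored)
```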
We will soon see that, if we are working on a branch, and we do a commit, then git will update the branch to point to the new commit.
A second commit¶
In our second commit, we will add the first draft of the Nobel prize paper. As before, you can download this from nobel_prize.md. If you are typing along, download nobel_prize.md to the nobel_prize directory.
The staging area does not have an entry for nobel_prize.md, so git status identifies this file as untracked:
$ git status
On branch main
Untracked files:
(use "git add <file>..." to include in what will be committed)
nobel_prize.md
nothing added to commit but untracked files present (use "git add" to track)
We add the file to the staging area with git add:
$ git add nobel_prize.md
Now git status records this file being in the staging area, by listing it under “changes to be committed”:
$ git status
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: nobel_prize.md
Finally we move the changes from the staging area into a commit with git commit:
[desktop]$ git commit -m "Add first draft of paper"
[main 7919d37] Add first draft of paper
1 file changed, 29 insertions(+)
create mode 100644 nobel_prize.md
Git shows us the first 7 digits of the new commit hash in the output from git commit – these are 7919d37.
Notice that the position of the current main branch is now this last commit:
$ git branch -v
* main 7919d37 Add first draft of paper
$ cat .git/refs/heads/main
7919d37dda9044f00cf2dc0677eed18156f75404
We use git log to look at our short history.
$ git log
commit 7919d37dda9044f00cf2dc0677eed18156f75404
Author: Matthew Brett <matthew.brett@gmail.com>
Date: Mon Apr 2 18:03:00 2012 +0100
Add first draft of paper
commit 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
Author: Matthew Brett <matthew.brett@gmail.com>
Date: Sun Apr 1 14:30:13 2012 +0100
First backup of my amazing idea
We add the --parents flag to show that the second commit points back to the first via its hash. Git lists the parent hash after the commit hash:
$ git log --parents
commit 7919d37dda9044f00cf2dc0677eed18156f75404 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
Author: Matthew Brett <matthew.brett@gmail.com>
Date: Mon Apr 2 18:03:00 2012 +0100
Add first draft of paper
commit 75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
Author: Matthew Brett <matthew.brett@gmail.com>
Date: Sun Apr 1 14:30:13 2012 +0100
First backup of my amazing idea
git diff – what has changed?¶
Our next commit will have edits to the clever_analysis.py script. We will also refresh the figure with the result of running the script.
I open the clever_analysis.py file in a text editor and adjust the fudge factor, add a new fudge factor, and apply the new factor to the data.
Now I’ve done these edits, I can ask git diff to show me how the files in my working tree differ from the files in the staging area.
Remember, the files the staging area knows about so far are the files as of the last commit.
$ git diff
diff --git a/clever_analysis.py b/clever_analysis.py
index 6560135..99cd07b 100755
--- a/clever_analysis.py
+++ b/clever_analysis.py
@@ -3,7 +3,8 @@ import numpy as np
import matplotlib.pyplot as plt
-FUDGE = 42
+FUDGE = 106
+MORE_FUDGE = 2.0
# Load data from the brain
data = np.loadtxt('expensive_data.csv', delimiter=',')
@@ -13,6 +14,8 @@ from_world, from_brain = data.T
# Process data
from_brain_processed = np.log(from_brain) * FUDGE * np.e ** np.pi
+# Apply the new factor
+from_brain_processed = from_brain_processed / MORE_FUDGE
# Make plot
plt.plot(from_world, from_brain_processed, 'r:')
A - at the beginning of a line in the git diff output means I have removed this line. A + at the beginning means I have added this line. As you see I have edited one line in this file, and added three more.
Open your text editor and edit clever_analysis.py. See if you can replicate my changes by editing the file, and checking with git diff.
Now check the status of clever_analysis.py with:
$ git status
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: clever_analysis.py
no changes added to commit (use "git add" and/or "git commit -a")
You need to git add a file to put it into the staging area¶
Remember that git only commits stuff that you have added to the staging area.
git status tells us that clever_analysis.py has been “modified”, and that these changes are “not staged for commit”.
There is a version of clever_analysis.py in the staging area, but it is the version of the file as of the last commit, and so that version is different from the version we have in the working tree.
If we try to do a commit, git will tell us there is nothing to commit, because there is nothing new in the staging area:
$ git commit
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: clever_analysis.py
no changes added to commit (use "git add" and/or "git commit -a")
To stage this version of clever_analysis.py we use git add:
$ git add clever_analysis.py
Git status now shows these changes as “Changes to be committed”.
$ git status
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: clever_analysis.py
We can update the figure by running the clever_analysis.py script. The script analyzes the data and writes the figure to the current directory. If you have Python installed, with the numpy and matplotlib packages, you can run the analysis yourself with:
python clever_analysis.py
If not, you can download a version of the figure I generated earlier. After you have generated or downloaded the figure:
$ git add fancy_figure.png
Do a final check with git status, then make the commit with:
[desktop]$ git commit -m "Add another fudge factor"
[main 003c54a] Add another fudge factor
2 files changed, 4 insertions(+), 1 deletion(-)
rewrite fancy_figure.png (96%)
The branch bookmark has moved again:
$ git branch -v
* main 003c54a Add another fudge factor
An ordinary day in gitworld¶
We now have the main commands for daily work with git:
Make some changes in the working tree;
Check what has changed with git status;
Review the changes with git diff;
Add changes to the staging area with git add;
Make the commit with git commit.
Commit four¶
For our next commit, we will add some more changes to the analysis script and figure, and add a new file, references.bib.
To follow along, first download references.bib.
Next, edit clever_analysis.py again, to make these changes:
$ git diff
diff --git a/clever_analysis.py b/clever_analysis.py
index 99cd07b..e5b2efa 100755
--- a/clever_analysis.py
+++ b/clever_analysis.py
@@ -5,6 +5,7 @@ import matplotlib.pyplot as plt
FUDGE = 106
MORE_FUDGE = 2.0
+NUDGE_FUDGE = 1.25
# Load data from the brain
data = np.loadtxt('expensive_data.csv', delimiter=',')
@@ -14,8 +15,8 @@ from_world, from_brain = data.T
# Process data
from_brain_processed = np.log(from_brain) * FUDGE * np.e ** np.pi
-# Apply the new factor
-from_brain_processed = from_brain_processed / MORE_FUDGE
+# Apply the new factor(s)
+from_brain_processed = from_brain_processed / MORE_FUDGE / NUDGE_FUDGE
# Make plot
plt.plot(from_world, from_brain_processed, 'r:')
Finally regenerate fancy_figure.png, or download the updated copy from here.
What will git status show now?
$ git status
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: clever_analysis.py
modified: fancy_figure.png
Untracked files:
(use "git add <file>..." to include in what will be committed)
references.bib
no changes added to commit (use "git add" and/or "git commit -a")
The staging area does not list a file called references.bib so this file is “untracked”. The staging area does contain an entry for clever_analysis.py and fancy_figure.png, so these files are tracked. Git has checked the hashes for these files, and they are different from the hashes in the staging area, so git knows these files have changed, compared to the versions listed in the staging area.
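The comparison described above can be sketched in Python. This is a toy model under stated assumptions: a dictionary of filename to hash stands in for the staging area listing, the file contents are made up, and the function name classify is hypothetical. Real git also uses cached file details stored in the index as a shortcut, so it does not need to rehash every file each time, and it reports deleted files too; the sketch skips both.

```python
import hashlib
from typing import Dict


def git_blob_hash(content: bytes) -> str:
    """Hash content the way git hashes a file (assuming SHA-1 objects)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def classify(working_tree: Dict[str, bytes],
             staged: Dict[str, str]) -> Dict[str, str]:
    """Label each working tree file relative to the staging area listing."""
    labels = {}
    for name, content in working_tree.items():
        if name not in staged:
            labels[name] = "untracked"  # filename not in the listing
        elif git_blob_hash(content) != staged[name]:
            labels[name] = "modified"   # listed, but the hash differs
        else:
            labels[name] = "unchanged"  # listed, and the hash matches
    return labels


# Made-up contents, standing in for the real files.
staged = {
    "clever_analysis.py": git_blob_hash(b"FUDGE = 106\n"),
    "nobel_prize.md": git_blob_hash(b"# Draft\n"),
}
working_tree = {
    "clever_analysis.py": b"FUDGE = 106\nNUDGE_FUDGE = 1.25\n",
    "nobel_prize.md": b"# Draft\n",
    "references.bib": b"% references go here\n",
}
labels = classify(working_tree, staged)
for name, label in sorted(labels.items()):
    print(label, name)
```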
Before we add our changes, we confirm that they are as we expect with:
$ git diff
diff --git a/clever_analysis.py b/clever_analysis.py
index 99cd07b..e5b2efa 100755
--- a/clever_analysis.py
+++ b/clever_analysis.py
@@ -5,6 +5,7 @@ import matplotlib.pyplot as plt
FUDGE = 106
MORE_FUDGE = 2.0
+NUDGE_FUDGE = 1.25
# Load data from the brain
data = np.loadtxt('expensive_data.csv', delimiter=',')
@@ -14,8 +15,8 @@ from_world, from_brain = data.T
# Process data
from_brain_processed = np.log(from_brain) * FUDGE * np.e ** np.pi
-# Apply the new factor
-from_brain_processed = from_brain_processed / MORE_FUDGE
+# Apply the new factor(s)
+from_brain_processed = from_brain_processed / MORE_FUDGE / NUDGE_FUDGE
# Make plot
plt.plot(from_world, from_brain_processed, 'r:')
diff --git a/fancy_figure.png b/fancy_figure.png
index 81fc437..d1f40df 100644
Binary files a/fancy_figure.png and b/fancy_figure.png differ
Notice that git does not try to show the line-by-line differences between the old and new figures, guessing correctly that this is a binary and not a text file.
Now we have reviewed the changes, we add them to the staging area and commit:
$ git add references.bib
$ git add clever_analysis.py
$ git add fancy_figure.png
[desktop]$ git commit -m "Change analysis and add references"
[main 2d9e1df] Change analysis and add references
3 files changed, 13 insertions(+), 2 deletions(-)
rewrite fancy_figure.png (89%)
create mode 100644 references.bib
The branch bookmark has moved to point to the new commit:
$ git branch -v
* main 2d9e1df Change analysis and add references
Undoing a commit with git reset¶
As you found in the SAP story, this last commit doesn’t look quite right, because the commit message refers to two different types of changes. With more git experience, you will likely find that you like to break your changes into commits where the changes have a particular theme or purpose. This makes it easier to see what happened when you look over the history and the commit messages with git log.
So, as in the SAP story, you decide to undo the last commit, and replace it with two commits:
One commit to add the changes to the script and figure;
Another commit on top of the first, to add the references file.
In the SAP story, you had to delete a snapshot directory manually, and reset the staging area directory to have the contents of the previous commit. In git, all we have to do is reset the current main branch bookmark to point to the previous commit. By default, git will also reset the staging area for us. The command to move the branch bookmark is git reset.
Pointing backwards in history¶
The commit that we want the branch to point to is the previous commit in our commit history. We can use git log to see that this commit has hash 003c54a. So, we could do our reset with git reset 003c54a. There is a simpler and more readable way to write this common idea, of one commit back in history, and that is to add ~1 to a reference. For example, to refer to the commit that is one step back in the history from the commit pointed to by the main branch, you can write main~1. Because main points to commit 2d9e1df, you could also append the ~1 to 2d9e1df. You can imagine that main~2 will point two steps back in the commit history, and so on.
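The meaning of ~N can be sketched in Python as "follow the parent link N times". The table below uses the abbreviated commit hashes from this tutorial's history so far; the function name resolve is made up, and the sketch only handles the simple one-parent history we have here.

```python
from typing import Dict, Optional

# Parent links for the commits made so far in this tutorial.
PARENTS: Dict[str, Optional[str]] = {
    "2d9e1df": "003c54a",
    "003c54a": "7919d37",
    "7919d37": "75206bc",
    "75206bc": None,  # the first commit has no parent
}
BRANCHES = {"main": "2d9e1df"}


def resolve(reference: str) -> str:
    """Resolve 'name' or 'name~N' to a commit by following parents."""
    name, tilde, steps = reference.partition("~")
    # A bare "~" with no number means one step back.
    count = int(steps) if steps else (1 if tilde else 0)
    # Start from the branch's commit, or treat the name as a hash.
    commit = BRANCHES.get(name, name)
    for _ in range(count):
        parent = PARENTS[commit]
        if parent is None:
            raise ValueError(f"{reference} goes back past the first commit")
        commit = parent
    return commit


for ref in ["main", "main~1", "2d9e1df~1", "main~2"]:
    print(ref, "->", resolve(ref))
```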
So, a readable reset command for our purpose is:
$ git reset main~1
Unstaged changes after reset:
M clever_analysis.py
M fancy_figure.png
Notice that the branch pointer now points to the previous commit:
$ git branch -v
* main 003c54a Add another fudge factor
Remember in SAP that your procedure for breaking up the snapshot was to 1) delete the old snapshot and 2) reset the staging area to reflect the previous commit. After you did this, the working tree contained your changes, but the staging area did not. You could make your new commits in the usual way, by adding to the staging area, and doing the commits.
Notice that git reset has done the same thing. It has reset the staging area to the state as of the older commit, but it has left the working tree alone. That means that git status will show us the changes in the working tree compared to the commit we have just reset to:
$ git status
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: clever_analysis.py
modified: fancy_figure.png
Untracked files:
(use "git add <file>..." to include in what will be committed)
references.bib
no changes added to commit (use "git add" and/or "git commit -a")
We have the changes from our original fourth commit in our working tree, but we have not staged them. We are ready to make our new separate commits.
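Here is a toy Python model of what the reset did. It is a sketch under heavy simplification: the class ToyRepo is made up, the labels v1, v2 and v3 stand in for file contents, and only the default behavior of git reset described above is modeled (move the bookmark, reset the staging area, leave the working tree alone).

```python
from typing import Dict, List

# Toy directory listings (filename -> version label) for two commits.
COMMITS: Dict[str, Dict[str, str]] = {
    "003c54a": {"clever_analysis.py": "v2", "expensive_data.csv": "v1",
                "fancy_figure.png": "v2", "nobel_prize.md": "v1"},
    "2d9e1df": {"clever_analysis.py": "v3", "expensive_data.csv": "v1",
                "fancy_figure.png": "v3", "nobel_prize.md": "v1",
                "references.bib": "v1"},
}


class ToyRepo:
    def __init__(self) -> None:
        # Just after the fourth commit, everything matches 2d9e1df.
        self.branches = {"main": "2d9e1df"}
        self.index = dict(COMMITS["2d9e1df"])
        self.working_tree = dict(COMMITS["2d9e1df"])

    def reset(self, commit: str) -> None:
        """Move the bookmark and reset the staging area listing."""
        self.branches["main"] = commit
        self.index = dict(COMMITS[commit])
        # The working tree is deliberately left untouched.

    def status(self) -> Dict[str, List[str]]:
        """Compare the working tree with the staging area listing."""
        modified = sorted(n for n, v in self.working_tree.items()
                          if n in self.index and self.index[n] != v)
        untracked = sorted(n for n in self.working_tree
                           if n not in self.index)
        return {"modified": modified, "untracked": untracked}


repo = ToyRepo()
repo.reset("003c54a")
status = repo.status()
print("modified: ", status["modified"])
print("untracked:", status["untracked"])
```

The model reports the same two modified files and one untracked file that git status shows after the reset.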
A new fourth commit¶
As we planned, we make a commit by adding only the changes from the script and figure:
$ git add clever_analysis.py
$ git add fancy_figure.png
[desktop]$ git commit -m "Change parameters of analysis"
[main d0ef727] Change parameters of analysis
2 files changed, 3 insertions(+), 2 deletions(-)
rewrite fancy_figure.png (89%)
Notice that git status now tells us that we still have an untracked (and therefore unstaged) file in our working tree:
$ git status
On branch main
Untracked files:
(use "git add <file>..." to include in what will be committed)
references.bib
nothing added to commit but untracked files present (use "git add" to track)
The fifth commit¶
To finish our work splitting the fourth commit into two, we add and commit the references.bib file:
$ git add references.bib
[desktop]$ git commit -m "Add references"
[main c3f19b2] Add references
1 file changed, 10 insertions(+)
create mode 100644 references.bib
Getting a file from a previous commit – git checkout¶
In the SAP story, we found that the first version of the analysis script was correct, and we made a new commit after restoring this version from the first snapshot.
As you can imagine, git allows us to do that too. The command to do this is git checkout. If you have a look at git checkout --help you will see that git checkout has two roles, described in the help as “Checkout a branch or paths to the working tree”. We will see checking out a branch later, but here we are using checkout in its second role, to restore files to the working tree.
We do this by telling git checkout which version we want, and what file we want. We want the version of clever_analysis.py as of the first commit. To find the first commit, we can use git log. To make git log a bit less verbose, I’ve added the --oneline flag, to print out one line per commit:
$ git log --oneline
c3f19b2 Add references
d0ef727 Change parameters of analysis
003c54a Add another fudge factor
7919d37 Add first draft of paper
75206bc First backup of my amazing idea
Now we have the abbreviated commit hash for the first commit, we can checkout that version to the working tree:
$ git checkout 75206bc clever_analysis.py
Updated 1 path from ffc871b
We also want the previous version of the figure:
$ git checkout 75206bc fancy_figure.png
Updated 1 path from ffc871b
Notice that the checkout also added the files to the staging area:
$ git status
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: clever_analysis.py
modified: fancy_figure.png
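One way to picture what git checkout <commit> <file> just did is with a toy model in Python. This is only a sketch, not how git is implemented: the dictionaries commits, working_tree and index, the function checkout_file, and the placeholder file contents are all made up for illustration. Only the abbreviated hashes come from the log above.

```python
# Toy model: each commit maps filenames to file contents.
# The contents here are placeholder strings, not the real files.
commits = {
    "75206bc": {"clever_analysis.py": "first version of script"},
    "c3f19b2": {"clever_analysis.py": "later version of script"},
}

# The working tree and staging area start out matching the latest commit.
working_tree = dict(commits["c3f19b2"])
index = dict(commits["c3f19b2"])


def checkout_file(commit, filename):
    """Copy one file from a commit into the working tree and staging area."""
    contents = commits[commit][filename]
    working_tree[filename] = contents
    index[filename] = contents  # this is why git status shows the file as staged


checkout_file("75206bc", "clever_analysis.py")
print(working_tree["clever_analysis.py"])
print(index["clever_analysis.py"])
```

In this model the checkout leaves the commits themselves untouched. It only copies an old version of the file forward, ready to go into a new commit.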
We are ready for our sixth commit:
[desktop]$ git commit -m "Revert to original script & figure"
[main 677c9a6] Revert to original script & figure
2 files changed, 1 insertion(+), 5 deletions(-)
rewrite fancy_figure.png (97%)
Using bookmarks – git branch¶
We are at the stage in the SAP story where Josephine goes away to the conference.
Let us pretend that we are Josephine, and that we have taken a copy of the
nobel_prize directory to the conference. The copy includes the .git
subdirectory, containing the git repository.
Now we (as Josephine) start doing some work. We don’t want to change the previous bookmark, which is main:
$ git branch -v
* main 677c9a6 Revert to original script & figure
We would like to use our own bookmark, so we can make changes without affecting anyone else. To do this we use git branch with arguments:
$ git branch josephines-branch main
The first argument is the name of the branch we want to create. The second is the commit at which the branch should start. Now we have a new branch that currently points to the same commit as main:
$ git branch -v
josephines-branch 677c9a6 Revert to original script & figure
* main 677c9a6 Revert to original script & figure
The new branch is nothing but a text file pointing to the commit:
$ cat .git/refs/heads/josephines-branch
677c9a6ac3004e5d84d9739751a89867e09238f4
Now that we have two branches, git needs to know which branch we are working on. The asterisk next to main in the output of git branch means that we are working on main at the moment. If we make another commit, it will update the main bookmark.
Git stores the current branch in the file .git/HEAD:
$ cat .git/HEAD
ref: refs/heads/main
Git commands often allow you to write HEAD meaning “the branch or commit you are currently working on”. For example, git log HEAD means “show the log starting at the branch or commit you are currently working on”. In fact, this is also the default behavior of git log.
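To make the idea of HEAD concrete, here is a minimal Python sketch that follows the two text files shown above: it reads HEAD, and if HEAD names a branch, it reads that branch's file to find the commit. The function name resolve_head is hypothetical, and the sketch builds a small stand-in .git directory in a temporary folder rather than touching a real repository. It assumes every branch is stored as its own file under refs/heads. Real git can also keep branches in a packed-refs file, which this sketch ignores.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir):
    """Return (ref name or None, commit hash) for a .git-style directory."""
    head = (Path(git_dir) / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]  # for example "refs/heads/main"
        commit = (Path(git_dir) / ref).read_text().strip()
        return ref, commit
    # Otherwise HEAD holds a commit hash directly, with no branch.
    return None, head


# Build a small stand-in for the .git directory shown above.
git_dir = Path(tempfile.mkdtemp()) / ".git"
(git_dir / "refs" / "heads").mkdir(parents=True)
commit = "677c9a6ac3004e5d84d9739751a89867e09238f4"
(git_dir / "refs" / "heads" / "main").write_text(commit + "\n")
(git_dir / "refs" / "heads" / "josephines-branch").write_text(commit + "\n")
(git_dir / "HEAD").write_text("ref: refs/heads/main\n")

print(resolve_head(git_dir))
```

Changing which branch is current amounts, in this sketch, to rewriting the single line in the HEAD file.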
We now want to make josephines-branch current, so any new commits will update josephines-branch instead of main.
Changing the current branch with git checkout¶
We previously saw that git checkout <commit> <filename> will get the file <filename> as of commit <commit>, and restore it to the working tree. This was the second of the two uses of git checkout. We now come to the first and most common use of git checkout, which is to:
change the current branch to a given branch or commit;
restore the working tree and staging area to the file versions from the given commit.
We are about to do git checkout josephines-branch. When we do this, we are only going to see the first of these two effects, because main and josephines-branch point to the same commit, and so have the same file contents:
$ git checkout josephines-branch
Switched to branch 'josephines-branch'
The asterisk has now moved to josephines-branch:
$ git branch -v
* josephines-branch 677c9a6 Revert to original script & figure
main 677c9a6 Revert to original script & figure
This is because the file HEAD now points to josephines-branch:
$ cat .git/HEAD
ref: refs/heads/josephines-branch
If we do a commit, git will update josephines-branch, not main.
Making commits on branches¶
Josephine made some edits to the paper. If you are typing along, make these changes to nobel_prize.md:
$ git diff
diff --git a/nobel_prize.md b/nobel_prize.md
index 3ef5df2..19fd4c5 100644
--- a/nobel_prize.md
+++ b/nobel_prize.md
@@ -5,6 +5,12 @@
The brain thinks in straight lines once you do some poorly-motivated
corrections on some brain recordings.
+Other people have done brain recordings and claimed that they were
+interesting, but they are not as interesting as our recordings.
+
+In our previous work, we have done some other brain recordings, that were also
+interesting, but in a different way.
+
== Methods
We took some recordings of someone's brain while we showed them stuff
As usual, we add the file to the staging area, and check the status of the working tree:
$ git add nobel_prize.md
$ git status
On branch josephines-branch
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: nobel_prize.md
Finally we make the commit:
[desktop]$ git commit -m "Expand the introduction"
[josephines-branch 0a5a1b9] Expand the introduction
1 file changed, 6 insertions(+)
The main branch has not changed, but josephines-branch changed to point to the new commit:
$ git branch -v
* josephines-branch 0a5a1b9 Expand the introduction
main 677c9a6 Revert to original script & figure
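The bookkeeping behind the last few steps can be sketched as a toy Python model. This is an illustration only, not git's implementation: the class ToyRepo and its methods are hypothetical names, and the model tracks only which commit each branch points to, using the abbreviated hashes from the output above.

```python
class ToyRepo:
    """Toy model: branches are names pointing to commits; HEAD names a branch."""

    def __init__(self, branch, commit):
        self.branches = {branch: commit}
        self.head = branch

    def branch(self, name, start):
        # Like "git branch <name> <start>": a new pointer to an existing commit.
        self.branches[name] = self.branches[start]

    def checkout(self, name):
        # Like "git checkout <branch>": in this toy model only HEAD changes.
        self.head = name

    def commit(self, new_hash):
        # A new commit moves only the branch that HEAD names.
        self.branches[self.head] = new_hash


repo = ToyRepo("main", "677c9a6")
repo.branch("josephines-branch", "main")
repo.checkout("josephines-branch")
repo.commit("0a5a1b9")
print(repo.branches)
```

After the commit, main still points to 677c9a6 in the model, matching the git branch -v output above.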
Now we go back to being ourselves, working in the lab. We change back to the main branch:
$ git checkout main
Switched to branch 'main'
The asterisk now points at main:
$ git branch -v
josephines-branch 0a5a1b9 Expand the introduction
* main 677c9a6 Revert to original script & figure
If you look at the contents of nobel_prize.md in the working directory, you will see that we are back to the contents before Josephine’s changes. This is because git checkout main reverted the files to their state as of the last commit on the main branch.
Now we make our own changes to the script and figure. Here are the changes to the script:
$ git diff
diff --git a/clever_analysis.py b/clever_analysis.py
index 6560135..cf163af 100755
--- a/clever_analysis.py
+++ b/clever_analysis.py
@@ -4,6 +4,7 @@ import numpy as np
import matplotlib.pyplot as plt
FUDGE = 42
+HOT_FUDGE = 1.707
# Load data from the brain
data = np.loadtxt('expensive_data.csv', delimiter=',')
@@ -13,6 +14,7 @@ from_world, from_brain = data.T
# Process data
from_brain_processed = np.log(from_brain) * FUDGE * np.e ** np.pi
+from_brain_processed = from_brain_processed / HOT_FUDGE
# Make plot
plt.plot(from_world, from_brain_processed, 'r:')
If you are typing along, then you will also want to regenerate the figure with python clever_analysis.py or download the new version.
This gives:
$ git status
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: clever_analysis.py
modified: fancy_figure.png
no changes added to commit (use "git add" and/or "git commit -a")
As usual, we add the files and do the commit:
$ git add clever_analysis.py
$ git add fancy_figure.png
[desktop]$ git commit -m "More fun with fudge"
[main 7598e4e] More fun with fudge
2 files changed, 2 insertions(+)
rewrite fancy_figure.png (96%)
Because HEAD currently points to main, git updated the main branch with the new commit:
$ git branch -v
josephines-branch 0a5a1b9 Expand the introduction
* main 7598e4e More fun with fudge
Merging lines of development with git merge¶
We next want to get Josephine’s changes into the main branch.
We do this with git merge:
$ git merge josephines-branch
Merge made by the 'recursive' strategy.
nobel_prize.md | 6 ++++++
1 file changed, 6 insertions(+)
The merge made a new commit. This commit has the changes we just made to the script and figure, and the changes that Josephine made to the paper.
The commit has two parents, which are the two commits from which we merged:
$ git log --oneline --parents
9bcf78c 7598e4e 0a5a1b9 Merge branch 'josephines-branch'
0a5a1b9 677c9a6 Expand the introduction
7598e4e 677c9a6 More fun with fudge
677c9a6 c3f19b2 Revert to original script & figure
c3f19b2 d0ef727 Add references
d0ef727 003c54a Change parameters of analysis
003c54a 7919d37 Add another fudge factor
7919d37 75206bc Add first draft of paper
75206bc First backup of my amazing idea
The commit parents make the development history into a graph¶
As you saw in your SAP system, we can think of the commits as nodes in a graph. Each commit stores the identity of its parent commit(s). The pointers from each commit back to its parent(s) link the commits (nodes) to form edges.
It is common to see a git history shown as a graph, and it is often useful to think of this graph when we are working with a git repository.
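As a small illustration of the graph idea, the Python sketch below stores the parent links from the git log --oneline --parents output above in a dictionary, and walks them to find every commit reachable from a starting commit. The names parents and ancestors are made up for this example, and the walk is a plain graph traversal, not git's own algorithm.

```python
# Parent links copied from the "git log --oneline --parents" output above.
parents = {
    "9bcf78c": ["7598e4e", "0a5a1b9"],  # the merge commit has two parents
    "0a5a1b9": ["677c9a6"],
    "7598e4e": ["677c9a6"],
    "677c9a6": ["c3f19b2"],
    "c3f19b2": ["d0ef727"],
    "d0ef727": ["003c54a"],
    "003c54a": ["7919d37"],
    "7919d37": ["75206bc"],
    "75206bc": [],  # the first commit has no parents
}


def ancestors(commit):
    """Return the set containing a commit and everything reachable from it."""
    seen = set()
    todo = [commit]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(parents[current])
    return seen


# Commits reachable from both branch tips as they were before the merge.
shared = ancestors("7598e4e") & ancestors("0a5a1b9")
print(sorted(shared))
print(len(ancestors("9bcf78c")))
```

The shared set holds the history that the two lines of development have in common. Its most recent member, 677c9a6, is the commit where josephines-branch and main diverged.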
There are a lot of graphical tools to show the git history as a graph, but git log has a useful flag called --graph which shows the commits as a graph using text characters:
$ git log --oneline --graph
* 9bcf78c Merge branch 'josephines-branch'
|\
| * 0a5a1b9 Expand the introduction
* | 7598e4e More fun with fudge
|/
* 677c9a6 Revert to original script & figure
* c3f19b2 Add references
* d0ef727 Change parameters of analysis
* 003c54a Add another fudge factor
* 7919d37 Add first draft of paper
* 75206bc First backup of my amazing idea
This kind of display is so useful that many of us have a shortcut to this command, which we use instead of the standard git log. You can make customized shortcuts to git commands by setting alias entries using git config.
For example, you may want to set up an alias like this:
$ git config --global alias.slog "log --oneline --graph"
Now you can use the command git slog to mean git log --oneline --graph. Because of the --global flag, this command sets up the slog alias as the default for your user account, so you can use git slog whenever you are using git as this user on this computer.
$ git slog
* 9bcf78c Merge branch 'josephines-branch'
|\
| * 0a5a1b9 Expand the introduction
* | 7598e4e More fun with fudge
|/
* 677c9a6 Revert to original script & figure
* c3f19b2 Add references
* d0ef727 Change parameters of analysis
* 003c54a Add another fudge factor
* 7919d37 Add first draft of paper
* 75206bc First backup of my amazing idea
Other commands you need to know¶
This tutorial gives you the basics on working with files on your own computer, and on your own repository.
You will also need to know about:
git remotes – working with other people, making backups;
tags – making static bookmarks to commits;
You will probably also find use for:
git reflog – show a list of the commits that HEAD has pointed to previously;
git rebase – rewrite the development history by altering or transplanting commits. See rebase without tears.
Git: are you ready?¶
If you followed this tutorial, you now have a good knowledge of how git works. This will make it much easier to understand why git commands do what they do, and what to do when things go wrong. You know all the main terms that the git manual pages use, so git’s own help will be more useful to you. You will likely lead a long life of deep personal fulfillment.
Other git-ish things to read¶
As you’ve seen, this tutorial makes the bold assumption that you’ll be able to understand how git works by seeing how it is built. These documents take a similar approach to varying levels of detail:
The Git parable by Tom Preston-Werner;
The visual git tutorial gives a nice visual idea of git at work;
Understanding Git Conceptually gives another review of the ideas behind git;
For more detail, see the start of the excellent Pro Git online book, or similarly the early parts of the Git community book. Pro Git’s chapters are very short and well illustrated; the community book tends to have more detail and has nice screencasts at the end of some sections;
You might also try:
For Windows users, an Illustrated Guide to Git on Windows is useful in that it also contains some information about handling SSH (necessary to interface with git hosted on remote servers when collaborating) as well as screenshots of the Windows interface.
Git ready: a great website of posts on specific git-related topics, organized by difficulty.
QGit: an excellent Git GUI. Git ships by default with gitk and git-gui, a pair of Tk graphical clients to browse a repo and to operate in it. I personally have found qgit to be nicer and easier to use. It is available on modern Linux distros, and since it is based on Qt, it should run on OSX and Windows.
Git Magic : Another book-size guide that has useful snippets.
Github Guides have tutorials on a number of topics, some specific to Github hosting but much of it of general value.
A port of the Hg book’s beginning: The Mercurial book has a reputation for clarity, so Carl Worth decided to port its introductory chapter to Git. It’s a nicely written intro, which is possible in good measure because of how similar the underlying models of Hg and Git ultimately are.
Intermediate tips: A set of tips that contains some very valuable nuggets, once you’re past the basics.
Footnotes
- 1
What would happen if we delete the .git/index file? Remember, the .git/index file contains the directory listing for the staging area. If we delete the file, git will assume that the directory listing is empty, and therefore that there are no files in the staging area.
- 2
Why is the output of git log and git log --parents the same in this case? They are the same because this is the first commit, and the first commit has no parents.
- 3
When git stores a file in the .git/objects directory, it makes a hash from the file, takes the first two digits of the hash to make a directory name, and then stores a file in this directory with a filename from the remaining hash digits. For example, when adding a file with hash d92d079af6a7f276cc8d63dcf2549c03e7deb553, git will create the .git/objects/d9 directory if it doesn’t exist, and store the file contents as .git/objects/d9/2d079af6a7f276cc8d63dcf2549c03e7deb553. It does this so that the number of files in any one directory stays in a reasonable range. If git had to store hash filenames for every object in one flat directory, the directory would soon have a very large number of files.
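The directory scheme in footnote 3 can be sketched in a few lines of Python. The function names blob_hash and object_path are made up for this example. The hash calculation assumes git's default SHA-1 object format, where the hash of a file covers a short "blob" header followed by the file contents. The sketch only computes names and does not write anything to disk.

```python
import hashlib


def blob_hash(contents):
    """SHA-1 hash for file contents (bytes), assuming git's SHA-1 format."""
    header = b"blob " + str(len(contents)).encode() + b"\0"
    return hashlib.sha1(header + contents).hexdigest()


def object_path(hash_hex):
    """Split a hash into the directory and filename under .git/objects."""
    return ".git/objects/" + hash_hex[:2] + "/" + hash_hex[2:]


# The example hash from the footnote.
print(object_path("d92d079af6a7f276cc8d63dcf2549c03e7deb553"))
# The path an empty file would get.
print(object_path(blob_hash(b"")))
```

With two hexadecimal digits for the directory name, the objects are spread over at most 256 subdirectories.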