How do git submodules work?

Git submodules can be a little confusing.

This page explains how git stores submodules. My hope is that this will make it easier to understand how to use submodules.

If you’ve read Curious git, you will recognize this way of thinking.

Why submodules?

Submodules are useful when you have a project that is under git version control, and you want to include a copy of another project that is also under git version control.

Worked example

We will call the project that we want to include myproject, and the project that uses it super.

We expect myproject to continue to develop.

super is going to start using some version of myproject. In the spirit of version control, we want to keep track of exactly which myproject version super is using.

myproject

We make a little myproject to start:

$ mkdir myproject
$ cd myproject
$ git init
Initialized empty Git repository in /Users/mb312/dev_trees/curious-git/working/myproject/.git/
$ echo "Important code and data" > some_data.txt
$ git add some_data.txt
$ git commit -m "Initial commit on myproject"
[master (root-commit) 196bbbb] Initial commit on myproject
 1 file changed, 1 insertion(+)
 create mode 100644 some_data.txt

Back to the working directory containing the repositories:

$ cd ..

super

Now a super project:

$ mkdir super
$ cd super
$ git init
Initialized empty Git repository in /Users/mb312/dev_trees/curious-git/working/super/.git/

Remember (from Curious git) that doing git add on a file adds a new copy of that file to the .git/objects directory. So, .git/objects starts off empty:

objects
├── info
└── pack

When we git add a file, there is one new file in .git/objects:

$ echo "This project will use myproject" > README.txt
$ git add README.txt

objects
├── 9c
│   └── 0042144fc489d7b528ef186af49e78c2867f91 [43B]
├── info
└── pack

Now do the first commit for super:

$ git commit -m "Initial commit on super"
[master (root-commit) 2326240] Initial commit on super
 1 file changed, 1 insertion(+)
 create mode 100644 README.txt

The commit made two new objects in the .git/objects directory:

  • a tree object giving the directory listing of the root directory;
  • a commit object giving information about the commit itself.

So, we now have three files in .git/objects:

objects
├── 23
│   └── 262403a0b913d02219ead935dd1a85d3724a0d [139B]
├── 9c
│   └── 0042144fc489d7b528ef186af49e78c2867f91 [43B]
├── f1
│   └── 3a8c8331c76ac965c43b09d11ee2d72bb053c1 [55B]
├── info
└── pack
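To check which of the three objects is which, we can ask git for the type of each one. The following is a minimal sketch rather than part of the original session: it rebuilds super in a temporary directory, uses a made-up author identity, and assumes a git recent enough to support the --batch-all-objects option of git cat-file.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Made-up identity so the commit works on a machine with no git configuration
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init -q super
cd super
echo "This project will use myproject" > README.txt
git add README.txt
git commit -q -m "Initial commit on super"

# One line per object in .git/objects: "<hash> <type> <size>"
listing=$(git cat-file --batch-check --batch-all-objects)
echo "$listing"

# Collect just the types, sorted, on one line
types=$(printf '%s\n' "$listing" | awk '{print $2}' | sort | xargs)
echo "$types"
```

The listing should show one blob (the README.txt contents), one tree (the root directory listing) and one commit.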

Adding myproject as a submodule of super

We use a git submodule to put myproject inside super. We will use the name subproject for the submodule copy of myproject, to make clear that it is the submodule copy:

$ git submodule add ../myproject subproject
Cloning into 'subproject'...
done.

What just happened?

$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	new file:   .gitmodules
	new file:   subproject

Notice that git submodule has already staged its changes, so we need the --staged flag to git diff to see what has changed:

$ git diff --staged
diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000..00b54ec
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "subproject"]
+	path = subproject
+	url = ../myproject
diff --git a/subproject b/subproject
new file mode 160000
index 0000000..196bbbb
--- /dev/null
+++ b/subproject
@@ -0,0 +1 @@
+Subproject commit 196bbbb2b7497fdc868fa61425959d23ff1c0fe5

As you saw, the output from git submodule says Cloning into subproject, and sure enough, if we look in the new subproject directory, there is a clone of myproject there:

subproject
├── .git [35B]
└── some_data.txt [24B]
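Notice that .git in this listing is a 35-byte file, not a directory. As a minimal sketch (not part of the original session), the commands below rebuild both repositories in a temporary directory with a made-up author identity and print that file. Recent git versions refuse to clone a submodule from a local path unless protocol.file.allow is set, so the sketch passes -c protocol.file.allow=always; older versions should not need it.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Made-up identity so the commits work on a machine with no git configuration
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init -q myproject
echo "Important code and data" > myproject/some_data.txt
git -C myproject add some_data.txt
git -C myproject commit -q -m "Initial commit on myproject"

git init -q super
echo "This project will use myproject" > super/README.txt
git -C super add README.txt
git -C super commit -q -m "Initial commit on super"

cd super
# Assumption: allow the local-path clone that recent git blocks by default
git -c protocol.file.allow=always submodule --quiet add ../myproject subproject

# subproject/.git is a small text file pointing at the real git directory
gitfile=$(cat subproject/.git)
echo "$gitfile"
```

The file holds a single gitdir: line. The repository data for the submodule clone lives inside the .git directory of super, under modules/subproject.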

So, git submodule has:

  1. cloned myproject into the subproject subdirectory of super;
  2. created and staged a small text file called .gitmodules that records the relationship of the subproject subdirectory to the original myproject repository;
  3. claimed to have made a new file in the super repository that records the myproject commit that the submodule contains.

It’s the last of these three that is a little strange, so we will explore it.

Storing the current commit of myproject

Why do I say that git “claimed” to have made a new file to record the myproject commit?

Remember that we had three files in the .git/objects directory of super after the first commit. After git submodule add we have four:

objects
├── 00
│   └── b54ece7789a75ca80a0edb1e1b1e532a1833d8 [64B]
├── 23
│   └── 262403a0b913d02219ead935dd1a85d3724a0d [139B]
├── 9c
│   └── 0042144fc489d7b528ef186af49e78c2867f91 [43B]
├── f1
│   └── 3a8c8331c76ac965c43b09d11ee2d72bb053c1 [55B]
├── info
└── pack

The new object has hash 00b54ece7789a75ca80a0edb1e1b1e532a1833d8, and it contains the contents of the new .gitmodules file:

$ git cat-file -p 00b54ece7789a75ca80a0edb1e1b1e532a1833d8
[submodule "subproject"]
	path = subproject
	url = ../myproject

There is only one new object in .git/objects, and it holds the contents of .gitmodules. So there is no new git object corresponding to the myproject repository. Instead of recording the subproject directory as a subdirectory (tree object) or a file (blob object), git records the current myproject commit directly in the directory listing of super. That is difficult to see at the moment, because the directory listing is still in the git staging area and not yet written into a tree object. To write the tree object, we do a commit:

$ git commit -m "Adding the submodule"
[master 7c556d2] Adding the submodule
 2 files changed, 4 insertions(+)
 create mode 100644 .gitmodules
 create mode 160000 subproject

The exotic git ls-tree command shows us the contents of the new root tree object (directory listing) for this commit:

$ git ls-tree master
100644 blob 00b54ece7789a75ca80a0edb1e1b1e532a1833d8	.gitmodules
100644 blob 9c0042144fc489d7b528ef186af49e78c2867f91	README.txt
160000 commit 196bbbb2b7497fdc868fa61425959d23ff1c0fe5	subproject

As you can see, the two real files – .gitmodules and README.txt – are listed as type blob, with the hashes of their file contents. This is the usual way git refers to a file in a directory listing (see Curious git, and Types of git objects). The new entry for subproject is of type commit. The hash is the hash of the current commit of the myproject repository, in the subproject copy:

$ cd subproject
$ git log
commit 196bbbb2b7497fdc868fa61425959d23ff1c0fe5
Author: Matthew Brett <matthew.brett@gmail.com>
Date:   Tue May 1 11:13:13 2012 +0100

    Initial commit on myproject
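The relationship between the tree entry and the submodule clone can also be checked with plumbing commands. This is a sketch under stated assumptions, not part of the original session: it rebuilds the repositories in a temporary directory, uses a made-up author identity, and passes -c protocol.file.allow=always because recent git versions block local-path submodule clones by default.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init -q myproject
echo "Important code and data" > myproject/some_data.txt
git -C myproject add some_data.txt
git -C myproject commit -q -m "Initial commit on myproject"

git init -q super
echo "This project will use myproject" > super/README.txt
git -C super add README.txt
git -C super commit -q -m "Initial commit on super"
cd super
git -c protocol.file.allow=always submodule --quiet add ../myproject subproject
git commit -q -m "Adding the submodule"

# Hash stored for "subproject" in the root tree of super
recorded=$(git rev-parse HEAD:subproject)
# Commit currently checked out inside the submodule clone
checked_out=$(git -C subproject rev-parse HEAD)
# File mode of the index entry; 160000 marks a commit entry
mode=$(git ls-files --stage subproject | cut -d' ' -f1)

# Does the object database of super itself contain that commit?
if git cat-file -e "$recorded" 2>/dev/null; then in_super=yes; else in_super=no; fi

echo "recorded=$recorded checked_out=$checked_out mode=$mode in_super=$in_super"
```

The two hashes should match, and in_super should be no: super stores only the hash of the myproject commit, not the commit object itself.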

Updating submodules from their source repositories

How do we keep the subproject copy of myproject up to date with the original myproject repository?

To show this in action, we start by going back to the original myproject repository to make another commit:

$ cd ../myproject
$ # Now in the original "myproject" directory
$ echo "More data" > some_more_data.txt
$ git add some_more_data.txt
$ git commit -m "Add some more data"
[master 43c26bf] Add some more data
 1 file changed, 1 insertion(+)
 create mode 100644 some_more_data.txt
$ git branch -v
* master 43c26bf Add some more data

Of course super has not changed, because we haven’t updated the submodule clone:

$ cd ../super
$ git status
On branch master
nothing to commit, working directory clean

The subproject directory is a full git repository clone of the original myproject. Remember that git submodule add created the directory by cloning. The myproject clone has a remote from the URL we gave to git submodule add.

$ # We're in the "super" directory
$ cd subproject
$ # Now we're in the submodule clone of "myproject"
$ git remote -v
origin	/Users/mb312/dev_trees/curious-git/working/myproject (fetch)
origin	/Users/mb312/dev_trees/curious-git/working/myproject (push)

We can do a fetch / merge to get the new commit:

$ # This is the same as "git pull"
$ git fetch origin
$ git merge origin/master
Updating 196bbbb..43c26bf
Fast-forward
 some_more_data.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 some_more_data.txt
From /Users/mb312/dev_trees/curious-git/working/myproject
   196bbbb..43c26bf  master     -> origin/master

Now what do we see in super?

$ cd ..
$ # Now we are in the "super" directory
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   subproject (new commits)

no changes added to commit (use "git add" and/or "git commit -a")
$ git diff
diff --git a/subproject b/subproject
index 196bbbb..43c26bf 160000
--- a/subproject
+++ b/subproject
@@ -1 +1 @@
-Subproject commit 196bbbb2b7497fdc868fa61425959d23ff1c0fe5
+Subproject commit 43c26bf6df6a4efade8082f3a5473b807d07c161

Git is not tracking the contents of the subproject directory, but the commit that the subproject repository has checked out. In this case, all super sees is that the commit has changed. We stage that change in the usual way:

$ git add subproject

As when we first added the submodule, a git add of the subproject directory has the effect of updating the commit that the super tree is pointing to in the staging area, but adds no new files to .git/objects.
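The claim that staging the submodule adds no new objects can be checked by counting loose objects before and after git add. The sketch below is not part of the original session. It assumes a temporary directory, a made-up author identity, and -c protocol.file.allow=always for the local-path clone. It also merges origin/HEAD rather than origin/master, so that it does not depend on the name of the default branch.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init -q myproject
echo "Important code and data" > myproject/some_data.txt
git -C myproject add some_data.txt
git -C myproject commit -q -m "Initial commit on myproject"

git init -q super
echo "This project will use myproject" > super/README.txt
git -C super add README.txt
git -C super commit -q -m "Initial commit on super"
cd super
git -c protocol.file.allow=always submodule --quiet add ../myproject subproject
git commit -q -m "Adding the submodule"

# New commit in the original myproject, then bring it into the submodule clone
echo "More data" > "$work/myproject/some_more_data.txt"
git -C "$work/myproject" add some_more_data.txt
git -C "$work/myproject" commit -q -m "Add some more data"
git -C subproject fetch -q origin
git -C subproject merge -q --ff-only origin/HEAD

# First character of "git submodule status": "+" means the checked-out
# commit differs from the one recorded in the index of super
before_flag=$(git submodule status | cut -c1)
before=$(git count-objects | cut -d' ' -f1)

git add subproject

after_flag=$(git submodule status | cut -c1)
after=$(git count-objects | cut -d' ' -f1)
echo "objects before=$before after=$after flags: [$before_flag] [$after_flag]"
```

The object count should be unchanged, and the status flag should go from + to a space once the index of super points at the new commit.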

If we do the commit, we can see the root tree listing now points subproject to the new commit of myproject:

$ git commit -m "Update myproject with more data"
[master 3379d64] Update myproject with more data
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git ls-tree master
100644 blob 00b54ece7789a75ca80a0edb1e1b1e532a1833d8	.gitmodules
100644 blob 9c0042144fc489d7b528ef186af49e78c2867f91	README.txt
160000 commit 43c26bf6df6a4efade8082f3a5473b807d07c161	subproject

Cloning a repository with submodules

What happens if we clone the super project?

$ cd ..
$ # In the directory containing "super"
$ git clone super super-cloned
Cloning into 'super-cloned'...
done.
$ cd super-cloned
$ ls
README.txt
subproject

What is in the new subproject directory?

subproject

Nothing. By default, when you git clone a project with submodules, git does not clone the submodules.

Getting the submodule repository clone takes two steps. These are:

  • initialize with git submodule init;
  • clone with git submodule update.

Initializing the submodule copies the submodule information from .gitmodules to the .git/config file of the repository. Having this as a separate step is useful when you want to use a different clone URL from the one recorded in .gitmodules. This might happen if you want to clone from a local repository instead of a slower internet repository. In this case, you can do git submodule init, edit .git/config, and then do the cloning with git submodule update.

Here’s .git/config before the init step:

$ # .git/config before submodule init
$ cat .git/config
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[remote "origin"]
	url = /Users/mb312/dev_trees/curious-git/working/super
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master
$ git submodule init
Submodule 'subproject' (/Users/mb312/dev_trees/curious-git/working/myproject) registered for path 'subproject'

.git/config after the init:

$ # .git/config after submodule init
$ cat .git/config
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[remote "origin"]
	url = /Users/mb312/dev_trees/curious-git/working/super
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master
[submodule "subproject"]
	url = /Users/mb312/dev_trees/curious-git/working/myproject

We have done init, but not update. The submodule directory is still empty:

subproject

To do the submodule clone, use git submodule update after git submodule init:

$ git submodule update
Submodule path 'subproject': checked out '43c26bf6df6a4efade8082f3a5473b807d07c161'
Cloning into 'subproject'...
done.
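The URL override described earlier can be sketched as follows. Instead of editing .git/config by hand, the sketch uses git config to set the same value. It is an illustration under stated assumptions rather than part of the original session: myproject-mirror.git is a hypothetical local mirror created just for this example, the author identity is made up, and -c protocol.file.allow=always is there because recent git versions block local-path submodule clones by default.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init -q myproject
echo "Important code and data" > myproject/some_data.txt
git -C myproject add some_data.txt
git -C myproject commit -q -m "Initial commit on myproject"

git init -q super
echo "This project will use myproject" > super/README.txt
git -C super add README.txt
git -C super commit -q -m "Initial commit on super"
(cd super &&
 git -c protocol.file.allow=always submodule --quiet add ../myproject subproject &&
 git commit -q -m "Adding the submodule")

# Hypothetical mirror standing in for a faster local copy of myproject
git clone -q --bare myproject myproject-mirror.git

git clone -q super super-cloned
cd super-cloned
git submodule --quiet init
# Override the URL that init copied from .gitmodules into .git/config
git config submodule.subproject.url "$work/myproject-mirror.git"
git -c protocol.file.allow=always submodule --quiet update

# The submodule clone now has the mirror as its origin
origin_url=$(git -C subproject config remote.origin.url)
echo "$origin_url"
```

The .gitmodules file is untouched, so other people cloning super-cloned would still get the recorded URL; only this clone uses the mirror.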

If you are happy to clone from the clone URL recorded in .gitmodules, then you can do both init and update in one step with:

$ git submodule update --init
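If you have not made the clone yet, git clone can also do the init and update steps for you with its --recurse-submodules option. The following is a minimal sketch, not part of the original session, using a temporary directory and a made-up author identity. The -c protocol.file.allow=always setting is an assumption about your git version: recent versions need it for submodules cloned from local paths.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init -q myproject
echo "Important code and data" > myproject/some_data.txt
git -C myproject add some_data.txt
git -C myproject commit -q -m "Initial commit on myproject"

git init -q super
echo "This project will use myproject" > super/README.txt
git -C super add README.txt
git -C super commit -q -m "Initial commit on super"
(cd super &&
 git -c protocol.file.allow=always submodule --quiet add ../myproject subproject &&
 git commit -q -m "Adding the submodule")

# Clone super and its submodules in one command
git -c protocol.file.allow=always clone -q --recurse-submodules super super-cloned

# The submodule directory is populated straight away
ls super-cloned/subproject
```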