How do git submodules work?¶
Git submodules can be a little confusing.
This page explains how git stores submodules. My hope is that this will make it easier to understand how to use submodules.
If you’ve read Curious git you will recognize this way of thinking.
Why submodules?¶
Submodules are useful when you have a project that is under git version control, and you want to include a copy of another project that is also under git version control.
Worked example¶
We will call the project that we need to use myproject, and the project
that uses it super.
We are expecting that myproject will continue to develop.
super is going to start using some version of myproject. In the
spirit of version control, we want to keep track of exactly which
myproject version super is using.
myproject¶
We make a little myproject to start:
$ mkdir myproject
$ cd myproject
$ git init
Initialized empty Git repository in /Volumes/zorg/mb312/dev_trees/curious-git/working/myproject/.git/
$ echo "Important code and data" > some_data.txt
$ git add some_data.txt
$ git commit -m "Initial commit on myproject"
[main (root-commit) 196bbbb] Initial commit on myproject
1 file changed, 1 insertion(+)
create mode 100644 some_data.txt
Back to the working directory containing the repositories:
$ cd ..
super¶
Now a super project:
$ mkdir super
$ cd super
$ git init
Initialized empty Git repository in /Volumes/zorg/mb312/dev_trees/curious-git/working/super/.git/
Remember (from Curious git) that doing git add on a file adds a new
copy of that file to the .git/objects directory. Before we add anything,
.git/objects contains no objects:
objects
├── info
└── pack
When we git add a file, there is one new file in .git/objects:
$ echo "This project will use myproject" > README.txt
$ git add README.txt
objects
├── 9c
│ └── 0042144fc489d7b528ef186af49e78c2867f91 [43B]
├── info
└── pack
Now do the first commit for super:
$ git commit -m "Initial commit on super"
[main (root-commit) 2326240] Initial commit on super
1 file changed, 1 insertion(+)
create mode 100644 README.txt
The commit made two new objects in the .git/objects directory:
a tree object giving the directory listing of the root directory;
a commit object giving information about the commit itself.
So, we now have three files in .git/objects:
objects
├── 23
│ └── 262403a0b913d02219ead935dd1a85d3724a0d [139B]
├── 9c
│ └── 0042144fc489d7b528ef186af49e78c2867f91 [43B]
├── f1
│ └── 3a8c8331c76ac965c43b09d11ee2d72bb053c1 [55B]
├── info
└── pack
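If you want to check the object count for yourself, here is a minimal sketch that repeats the steps above in a throwaway directory. The temporary directory and the author identity are placeholders I have made up for the example; they are not part of the original walkthrough.

```shell
#!/bin/sh
# Sketch: watch .git/objects grow from one object (after git add)
# to three (after git commit). Identity values are placeholders.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q super
cd super

echo "This project will use myproject" > README.txt
git add README.txt
# One loose object so far: the blob holding the README contents.
after_add=$(find .git/objects -type f | wc -l | tr -d ' ')

git commit -q -m "Initial commit on super"
# --batch-all-objects visits every object in the repository;
# --batch-check prints "<hash> <type> <size>" for each one.
types=$(git cat-file --batch-check --batch-all-objects \
        | awk '{print $2}' | sort | xargs)

echo "objects after add: $after_add"
echo "object types after commit: $types"
```

The second line of output should list one blob, one commit and one tree, matching the three files in the listing above.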
Adding myproject as a submodule of super¶
We use a git submodule to put myproject inside super. We will use the
name subproject for the submodule copy of myproject, to make clear
that it is the submodule copy:
$ git submodule add ../myproject subproject
Cloning into '/Volumes/zorg/mb312/dev_trees/curious-git/working/super/subproject'...
done.
What just happened?
$ git status
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: .gitmodules
new file: subproject
Notice that git submodule has already staged its changes, so we need the
--staged flag to git diff to see what has changed:
$ git diff --staged
diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000..00b54ec
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "subproject"]
+ path = subproject
+ url = ../myproject
diff --git a/subproject b/subproject
new file mode 160000
index 0000000..196bbbb
--- /dev/null
+++ b/subproject
@@ -0,0 +1 @@
+Subproject commit 196bbbb2b7497fdc868fa61425959d23ff1c0fe5
As you saw, the output from git submodule says Cloning into
subproject, and sure enough, if we look in the new subproject directory,
there is a clone of myproject there:
subproject
├── .git [35B]
└── some_data.txt [24B]
So, git submodule has:
cloned myproject to super subdirectory subproject;
created and staged a small text file called .gitmodules that records the relationship of the subproject subdirectory to the original myproject repository;
claimed to have made a new file in the super repository that records the myproject commit that the submodule contains.
It’s the last of these three that is a little strange, so we will explore.
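The three effects listed above can be checked with a short script. This is a sketch under some assumptions: the temporary directory and author identity are placeholders, and I pass `-c protocol.file.allow=always` because, as far as I know, recent Git releases (2.38.1 and later) refuse to clone a submodule from a local path unless the file transport is explicitly allowed.

```shell
#!/bin/sh
# Sketch: reproduce "git submodule add" and inspect what it staged.
# Identity values and the temporary directory are placeholders.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q myproject
echo "Important code and data" > myproject/some_data.txt
git -C myproject add some_data.txt
git -C myproject commit -q -m "Initial commit on myproject"

git init -q super
cd super
echo "This project will use myproject" > README.txt
git add README.txt
git commit -q -m "Initial commit on super"

# Assumption: newer Git needs the file transport allowed for local paths.
git -c protocol.file.allow=always submodule --quiet add ../myproject subproject

# 1. A clone of myproject now lives in subproject.
cloned_files=$(ls subproject)
# 2. Both .gitmodules and subproject are already staged.
staged=$(git diff --staged --name-only | xargs)
# 3. The staged entry for subproject has the special mode 160000.
mode=$(git ls-files --stage subproject | awk '{print $1}')

echo "files in clone: $cloned_files"
echo "staged paths:   $staged"
echo "subproject mode: $mode"
```

Mode 160000 is the same mode that appears in the `git diff --staged` output above; it marks the entry as a pointer to a commit rather than an ordinary file or directory.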
Storing the current commit of myproject¶
Why do I say that git “claimed” to have made a new file to record the
myproject commit?
Remember that we had three files in the .git/objects directory of
super after the first commit. After git submodule add we have four:
objects
├── 00
│ └── b54ece7789a75ca80a0edb1e1b1e532a1833d8 [64B]
├── 23
│ └── 262403a0b913d02219ead935dd1a85d3724a0d [139B]
├── 9c
│ └── 0042144fc489d7b528ef186af49e78c2867f91 [43B]
├── f1
│ └── 3a8c8331c76ac965c43b09d11ee2d72bb053c1 [55B]
├── info
└── pack
The new object has hash 00b54ece7789a75ca80a0edb1e1b1e532a1833d8, and it contains the contents of
the new .gitmodules file:
$ git cat-file -p 00b54ece7789a75ca80a0edb1e1b1e532a1833d8
[submodule "subproject"]
path = subproject
url = ../myproject
There is only one new object in .git/objects, and that is for
.gitmodules. There is no new git object corresponding to the
myproject repository. Instead, git records the myproject commit hash
directly in the directory listing of super, rather than recording the
subproject directory as a subdirectory (tree object) or a file (blob
object). That is difficult to see at the moment, because the directory
listing is in the git staging area and not yet written into a tree object. To
write the tree object, we do a commit:
$ git commit -m "Adding the submodule"
[main 7c556d2] Adding the submodule
2 files changed, 4 insertions(+)
create mode 100644 .gitmodules
create mode 160000 subproject
The exotic git ls-tree command shows us the contents of the new root tree
object (directory listing) for this commit:
$ git ls-tree main
100644 blob 00b54ece7789a75ca80a0edb1e1b1e532a1833d8 .gitmodules
100644 blob 9c0042144fc489d7b528ef186af49e78c2867f91 README.txt
160000 commit 196bbbb2b7497fdc868fa61425959d23ff1c0fe5 subproject
As you can see, the two real files – .gitmodules and README.txt
– are listed as type blob, with the hashes of their file contents. This
is the usual way git refers to a file in a directory listing (see
Curious git, and Types of git objects). The new entry for
subproject is of type commit. The hash is the hash of the current commit
of the myproject repository, as checked out in the subproject copy:
$ cd subproject
$ git log
commit 196bbbb2b7497fdc868fa61425959d23ff1c0fe5
Author: Matthew Brett <matthew.brett@gmail.com>
Date: Tue May 1 11:13:13 2012 +0100
Initial commit on myproject
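To make the comparison above mechanical, here is a sketch that reads the subproject entry out of the root tree of super and compares it with HEAD in the submodule clone. It also checks that the commit object is not present in the object store of super itself. The temporary directory, the identity values and the `protocol.file.allow` setting are my assumptions for the example.

```shell
#!/bin/sh
# Sketch: the tree entry for a submodule is just a commit hash.
# Identity values and the temporary directory are placeholders.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q myproject
echo "Important code and data" > myproject/some_data.txt
git -C myproject add some_data.txt
git -C myproject commit -q -m "Initial commit on myproject"

git init -q super
cd super
echo "This project will use myproject" > README.txt
git add README.txt
git commit -q -m "Initial commit on super"
# Assumption: newer Git needs the file transport allowed for local paths.
git -c protocol.file.allow=always submodule --quiet add ../myproject subproject
git commit -q -m "Adding the submodule"

# ls-tree prints "<mode> <type> <hash><TAB><name>" for the entry.
entry=$(git ls-tree HEAD subproject)
entry_type=$(echo "$entry" | awk '{print $2}')
entry_hash=$(echo "$entry" | awk '{print $3}')

# The commit that the submodule clone currently has checked out.
sub_head=$(git -C subproject rev-parse HEAD)

# cat-file -e exits non-zero when the object is absent from this repository.
if git cat-file -e "$entry_hash" 2>/dev/null; then
    in_super=yes
else
    in_super=no
fi

echo "entry type: $entry_type"
echo "hashes match: $([ "$entry_hash" = "$sub_head" ] && echo yes || echo no)"
echo "commit object stored in super: $in_super"
```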
Updating submodules from their source repositories¶
How do we keep the subproject copy of myproject up to date with the
original myproject repository?
To show this in action, we start by going back to the original myproject
repository to make another commit:
$ cd ../myproject
$ # Now in the original "myproject" directory
$ echo "More data" > some_more_data.txt
$ git add some_more_data.txt
$ git commit -m "Add some more data"
[main 43c26bf] Add some more data
1 file changed, 1 insertion(+)
create mode 100644 some_more_data.txt
$ git branch -v
* main 43c26bf Add some more data
Of course super has not changed, because we haven’t updated the submodule
clone:
$ cd ../super
$ git status
On branch main
nothing to commit, working tree clean
The subproject directory is a full git repository clone of the original
myproject. Remember that git submodule add created the directory by
cloning. The myproject clone has a remote, origin, that points to the
repository we gave to git submodule add:
$ # We're in the "super" directory
$ cd subproject
$ # Now we're in the submodule clone of "myproject"
$ git remote -v
origin /Volumes/zorg/mb312/dev_trees/curious-git/working/myproject (fetch)
origin /Volumes/zorg/mb312/dev_trees/curious-git/working/myproject (push)
We can do a fetch / merge to get the new commit:
$ # This is the same as "git pull"
$ git fetch origin
From /Volumes/zorg/mb312/dev_trees/curious-git/working/myproject
196bbbb..43c26bf main -> origin/main
$ git merge origin/main
Updating 196bbbb..43c26bf
Fast-forward
some_more_data.txt | 1 +
1 file changed, 1 insertion(+)
create mode 100644 some_more_data.txt
Now what do we see in super?
$ cd ..
$ # Now we are in the "super" directory
$ git status
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: subproject (new commits)
no changes added to commit (use "git add" and/or "git commit -a")
$ git diff
diff --git a/subproject b/subproject
index 196bbbb..43c26bf 160000
--- a/subproject
+++ b/subproject
@@ -1 +1 @@
-Subproject commit 196bbbb2b7497fdc868fa61425959d23ff1c0fe5
+Subproject commit 43c26bf6df6a4efade8082f3a5473b807d07c161
Git is not tracking the contents of the subproject directory, but the
git state of the directory. In this case, all super sees is that the
commit has changed.
$ git add subproject
As when we first added the submodule, a git add of the subproject
directory has the effect of updating the commit that the super tree is
pointing to in the staging area, but adds no new files to .git/objects.
If we do the commit, we can see the root tree listing now points
subproject to the new commit of myproject:
$ git commit -m "Update myproject with more data"
[main 3379d64] Update myproject with more data
1 file changed, 1 insertion(+), 1 deletion(-)
$ git ls-tree main
100644 blob 00b54ece7789a75ca80a0edb1e1b1e532a1833d8 .gitmodules
100644 blob 9c0042144fc489d7b528ef186af49e78c2867f91 README.txt
160000 commit 43c26bf6df6a4efade8082f3a5473b807d07c161 subproject
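The whole update cycle can be sketched as one script: make a new commit upstream, fast-forward the submodule clone, then stage and commit the new pointer in super. I also use `git diff --submodule=log`, which summarizes the submodule change as a list of commit subjects instead of two bare hashes. The temporary directory, identity values and `protocol.file.allow` setting are assumptions for the example, and the branch name is read from the repository rather than assumed.

```shell
#!/bin/sh
# Sketch: update a submodule and record the new commit in super.
# Identity values and the temporary directory are placeholders.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q myproject
echo "Important code and data" > myproject/some_data.txt
git -C myproject add some_data.txt
git -C myproject commit -q -m "Initial commit on myproject"

git init -q super
cd super
echo "This project will use myproject" > README.txt
git add README.txt
git commit -q -m "Initial commit on super"
# Assumption: newer Git needs the file transport allowed for local paths.
git -c protocol.file.allow=always submodule --quiet add ../myproject subproject
git commit -q -m "Adding the submodule"

# A new commit appears in the original myproject.
echo "More data" > ../myproject/some_more_data.txt
git -C ../myproject add some_more_data.txt
git -C ../myproject commit -q -m "Add some more data"
branch=$(git -C ../myproject symbolic-ref --short HEAD)

old=$(git ls-tree HEAD subproject | awk '{print $3}')

# Fetch and fast-forward inside the submodule clone.
git -C subproject -c protocol.file.allow=always fetch -q origin
git -C subproject merge -q --ff-only "origin/$branch"

# Summarize the pending submodule change by commit subject.
summary=$(git diff --submodule=log)

# Stage and commit the new pointer in super.
git add subproject
git commit -q -m "Update myproject with more data"
new=$(git ls-tree HEAD subproject | awk '{print $3}')
upstream=$(git -C ../myproject rev-parse HEAD)

echo "$summary"
echo "old pointer: $old"
echo "new pointer: $new"
```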
Cloning a repository with submodules¶
What happens if we clone the super project?
$ cd ..
$ # Now in the directory containing "super"
$ git clone super super-cloned
Cloning into 'super-cloned'...
done.
$ cd super-cloned
$ ls
README.txt
subproject
What is in the new subproject directory?
subproject
Nothing. When you git clone a project with submodules, git does not clone
the submodules by default.
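One way to see this state from the command line is `git submodule status`, which prefixes the hash with `-` when a submodule has not been initialized. The sketch below builds the same repositories in a temporary directory and checks both the empty directory and the prefix. The directory, identity values and `protocol.file.allow` setting are assumptions for the example.

```shell
#!/bin/sh
# Sketch: a plain clone leaves the submodule directory empty.
# Identity values and the temporary directory are placeholders.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q myproject
echo "Important code and data" > myproject/some_data.txt
git -C myproject add some_data.txt
git -C myproject commit -q -m "Initial commit on myproject"

git init -q super
cd super
echo "This project will use myproject" > README.txt
git add README.txt
git commit -q -m "Initial commit on super"
# Assumption: newer Git needs the file transport allowed for local paths.
git -c protocol.file.allow=always submodule --quiet add ../myproject subproject
git commit -q -m "Adding the submodule"

cd "$work"
git clone -q super super-cloned
cd super-cloned

# Count entries (including hidden ones) in the submodule directory.
n_files=$(ls -A subproject | wc -l | tr -d ' ')
# First character of the status line: "-" means not initialized.
status=$(git submodule status)
prefix=$(printf '%s' "$status" | cut -c1)

echo "entries in subproject: $n_files"
echo "status line: $status"
```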
Getting the submodule repository clone takes two steps. These are:
initialize with git submodule init;
clone with git submodule update.
Initializing the submodule copies the repository submodule information in
.gitmodules to the repository .git/config file. Having this as a
separate step is useful when you want to use a different clone URL from the
one recorded in .gitmodules. This might happen if you want to use a local
repository to clone from instead of a slower internet repository. In this
case, you can do git submodule init, edit .git/config, and then do the
cloning with git submodule update.
Here’s .git/config before the init step:
$ # .git/config before submodule init
$ cat .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
ignorecase = true
precomposeunicode = true
[remote "origin"]
url = /Volumes/zorg/mb312/dev_trees/curious-git/working/super
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
remote = origin
merge = refs/heads/main
$ git submodule init
Submodule 'subproject' (/Volumes/zorg/mb312/dev_trees/curious-git/working/myproject) registered for path 'subproject'
.git/config after the init:
$ # .git/config after submodule init
$ cat .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
ignorecase = true
precomposeunicode = true
[remote "origin"]
url = /Volumes/zorg/mb312/dev_trees/curious-git/working/super
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
remote = origin
merge = refs/heads/main
[submodule "subproject"]
active = true
url = /Volumes/zorg/mb312/dev_trees/curious-git/working/myproject
We have done init, but not update. The submodule directory is still
empty:
subproject
To do the submodule clone, use git submodule update after git submodule
init:
$ git submodule update
Cloning into '/Volumes/zorg/mb312/dev_trees/curious-git/working/super-cloned/subproject'...
done.
Submodule path 'subproject': checked out '43c26bf6df6a4efade8082f3a5473b807d07c161'
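The reason for keeping init and update separate was the option to change the clone URL in between. Here is a sketch of that workflow. The bare repository `mirror.git` is a hypothetical stand-in for a faster local copy, and I set the URL with `git config` instead of editing .git/config by hand. The temporary directory, identity values and `protocol.file.allow` setting are also assumptions.

```shell
#!/bin/sh
# Sketch: init, override the submodule URL, then update.
# mirror.git, the identity values and the temp dir are placeholders.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q myproject
echo "Important code and data" > myproject/some_data.txt
git -C myproject add some_data.txt
git -C myproject commit -q -m "Initial commit on myproject"

git init -q super
cd super
echo "This project will use myproject" > README.txt
git add README.txt
git commit -q -m "Initial commit on super"
# Assumption: newer Git needs the file transport allowed for local paths.
git -c protocol.file.allow=always submodule --quiet add ../myproject subproject
git commit -q -m "Adding the submodule"

cd "$work"
# Hypothetical local mirror of myproject.
git clone -q --bare myproject mirror.git
git clone -q super super-cloned
cd super-cloned

# Step 1: copy the submodule entry from .gitmodules into .git/config.
git submodule --quiet init
recorded=$(git config --get submodule.subproject.url)

# Between the steps: point this clone at the mirror instead.
git config submodule.subproject.url "$work/mirror.git"

# Step 2: clone the submodule from the URL now in .git/config.
git -c protocol.file.allow=always submodule --quiet update
origin=$(git -C subproject config --get remote.origin.url)

echo "URL recorded by init: $recorded"
echo "URL actually cloned:  $origin"
```

The override lives only in .git/config of this clone; .gitmodules, which is shared with everyone else, is unchanged.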
If you are happy to clone from the clone URL recorded in .gitmodules, then
you can do both init and update in one step with:
$ git submodule update --init
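If you know at clone time that you want the submodules, `git clone --recurse-submodules` does the clone, the init and the update in one command. A minimal sketch, again using a temporary directory, placeholder identity values and the `protocol.file.allow` setting as assumptions:

```shell
#!/bin/sh
# Sketch: clone a superproject and its submodules in one command.
# Identity values and the temporary directory are placeholders.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q myproject
echo "Important code and data" > myproject/some_data.txt
git -C myproject add some_data.txt
git -C myproject commit -q -m "Initial commit on myproject"

git init -q super
cd super
echo "This project will use myproject" > README.txt
git add README.txt
git commit -q -m "Initial commit on super"
# Assumption: newer Git needs the file transport allowed for local paths.
git -c protocol.file.allow=always submodule --quiet add ../myproject subproject
git commit -q -m "Adding the submodule"

cd "$work"
git -c protocol.file.allow=always clone -q --recurse-submodules super super-cloned

# The submodule clone should sit at the commit recorded in super's tree.
recorded=$(git -C super-cloned ls-tree HEAD subproject | awk '{print $3}')
sub_head=$(git -C super-cloned/subproject rev-parse HEAD)

echo "recorded in super:   $recorded"
echo "checked out in clone: $sub_head"
```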