How do git submodules work?
Git submodules can be a little confusing.
This page explains how git stores submodules. My hope is that this will make it easier to understand how to use submodules.
If you’ve read Curious git you will recognize this way of thinking.
Why submodules?
Submodules are useful when you have a project that is under git version control, and you want to include a copy of another project that is also under git version control.
Worked example
We will call the project that we need to use myproject, and the project that is using myproject we will call super.
We are expecting that myproject will continue to develop. super is going to start using some version of myproject. In the spirit of version control, we want to keep track of exactly which myproject version super is using.
myproject
We make a little myproject to start:
$ mkdir myproject
$ cd myproject
$ git init
Initialized empty Git repository in /Volumes/zorg/mb312/dev_trees/curious-git/working/myproject/.git/
$ echo "Important code and data" > some_data.txt
$ git add some_data.txt
$ git commit -m "Initial commit on myproject"
[main (root-commit) 196bbbb] Initial commit on myproject
1 file changed, 1 insertion(+)
create mode 100644 some_data.txt
Back to the working directory containing the repositories:
$ cd ..
super
Now a super project:
$ mkdir super
$ cd super
$ git init
Initialized empty Git repository in /Volumes/zorg/mb312/dev_trees/curious-git/working/super/.git/
Remember (from Curious git) that doing git add on a file adds a new copy of that file to the .git/objects directory. So, .git/objects starts off empty:
objects
├── info
└── pack
When we git add a file, there is one new file in .git/objects:
$ echo "This project will use myproject" > README.txt
$ git add README.txt
objects
├── 9c
│ └── 0042144fc489d7b528ef186af49e78c2867f91 [43B]
├── info
└── pack
Now do the first commit for super:
$ git commit -m "Initial commit on super"
[main (root-commit) 2326240] Initial commit on super
1 file changed, 1 insertion(+)
create mode 100644 README.txt
The commit made two new objects in the .git/objects directory:
a tree object giving the directory listing of the root directory;
a commit object giving information about the commit itself.
So, we now have three files in .git/objects:
objects
├── 23
│ └── 262403a0b913d02219ead935dd1a85d3724a0d [139B]
├── 9c
│ └── 0042144fc489d7b528ef186af49e78c2867f91 [43B]
├── f1
│ └── 3a8c8331c76ac965c43b09d11ee2d72bb053c1 [55B]
├── info
└── pack
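To check which object is the blob, which the tree and which the commit, one option is to ask git for the type of every object in the store. The sketch below rebuilds a comparable repository in a scratch directory and lists the object types with git cat-file. The temporary path and the placeholder commit identity are assumptions for illustration, so the commit hash will differ from the listing above.

```shell
#!/bin/sh
# Sketch: list the type of every object in a freshly committed repository.
set -e
# Placeholder identity, so the sketch does not rely on global git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(cd "$(mktemp -d)" && pwd -P)   # scratch directory (assumption)
cd "$work"

git init -q super
cd super
echo "This project will use myproject" > README.txt
git add README.txt
git commit -q -m "Initial commit on super"

# --batch-check prints "<hash> <type> <size>" for each object;
# --batch-all-objects walks the whole object store.
object_types=$(git cat-file --batch-check --batch-all-objects |
    awk '{print $2}' | sort | xargs echo)
echo "$object_types"   # one blob, one commit, one tree
```

For a single object, git cat-file -t followed by the hash gives the same information.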
Adding myproject as a submodule of super
We use a git submodule to put myproject inside super. We will use the name subproject for the submodule copy of myproject, to make clear that it is the submodule copy:
$ git submodule add ../myproject subproject
Cloning into '/Volumes/zorg/mb312/dev_trees/curious-git/working/super/subproject'...
done.
What just happened?
$ git status
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: .gitmodules
new file: subproject
Notice that git submodule has already staged its changes, so we need the --staged flag to git diff to see what has changed:
$ git diff --staged
diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000..00b54ec
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "subproject"]
+ path = subproject
+ url = ../myproject
diff --git a/subproject b/subproject
new file mode 160000
index 0000000..196bbbb
--- /dev/null
+++ b/subproject
@@ -0,0 +1 @@
+Subproject commit 196bbbb2b7497fdc868fa61425959d23ff1c0fe5
As you saw, the output from git submodule says Cloning into subproject, and sure enough, if we look in the new subproject directory, there is a clone of myproject there:
subproject
├── .git [35B]
└── some_data.txt [24B]
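Note that .git in this listing is a small file rather than a directory. In recent git versions the submodule clone's repository data is usually kept under the .git/modules directory of the containing repository, and the .git file in the submodule directory is a one-line pointer to it. The sketch below rebuilds the example in a scratch directory to look at that pointer. The temporary path and placeholder identity are assumptions, and the protocol.file.allow setting is only there because newer git releases refuse local-path submodule clones by default.

```shell
#!/bin/sh
# Sketch: inspect the ".git" pointer file inside a submodule directory.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(cd "$(mktemp -d)" && pwd -P)   # scratch directory (assumption)
cd "$work"

git init -q myproject
(cd myproject &&
    echo "Important code and data" > some_data.txt &&
    git add some_data.txt &&
    git commit -q -m "Initial commit on myproject")

git init -q super
cd super
echo "This project will use myproject" > README.txt
git add README.txt
git commit -q -m "Initial commit on super"

# Newer git versions block local-path submodule clones unless allowed.
git -c protocol.file.allow=always submodule add ../myproject subproject

# The submodule's ".git" is a file naming the real repository location.
gitfile=$(cat subproject/.git)
echo "$gitfile"
ls .git/modules/subproject
```

This also suggests why the clone's own objects do not show up in the .git/objects directory of super: they live in the submodule's separate object store.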
So, git submodule has:
cloned myproject to the super subdirectory subproject;
created and staged a small text file called .gitmodules that records the relationship of the subproject subdirectory to the original myproject repository;
claimed to have made a new file in the super repository that records the myproject commit that the submodule contains.
It’s the last of these three that is a little strange, so we will explore.
Storing the current commit of myproject
Why do I say that git “claimed” to have made a new file to record the myproject commit?
Remember that we had three files in the .git/objects directory of super after the first commit. After git submodule add we have four:
objects
├── 00
│ └── b54ece7789a75ca80a0edb1e1b1e532a1833d8 [64B]
├── 23
│ └── 262403a0b913d02219ead935dd1a85d3724a0d [139B]
├── 9c
│ └── 0042144fc489d7b528ef186af49e78c2867f91 [43B]
├── f1
│ └── 3a8c8331c76ac965c43b09d11ee2d72bb053c1 [55B]
├── info
└── pack
The new object has hash 00b54ece7789a75ca80a0edb1e1b1e532a1833d8, and it contains the contents of the new .gitmodules file:
$ git cat-file -p 00b54ece7789a75ca80a0edb1e1b1e532a1833d8
[submodule "subproject"]
path = subproject
url = ../myproject
There is only one new object in .git/objects, and that is for .gitmodules. Therefore there is no new git object corresponding to the myproject repository. In fact, git records the commit for myproject in the directory listing, instead of recording the subproject directory as a subdirectory (tree object) or a file (blob object). That is a bit difficult to see at the moment, because the directory listing is in the git staging area and not yet written into a tree object. To write the tree object, we do a commit:
$ git commit -m "Adding the submodule"
[main 7c556d2] Adding the submodule
2 files changed, 4 insertions(+)
create mode 100644 .gitmodules
create mode 160000 subproject
The exotic git ls-tree command shows us the contents of the new root tree object (directory listing) for this commit:
$ git ls-tree main
100644 blob 00b54ece7789a75ca80a0edb1e1b1e532a1833d8 .gitmodules
100644 blob 9c0042144fc489d7b528ef186af49e78c2867f91 README.txt
160000 commit 196bbbb2b7497fdc868fa61425959d23ff1c0fe5 subproject
As you can see, the two real files – .gitmodules and README.txt – are listed as type blob, with the hashes of their file contents. This is the usual way git refers to a file in a directory listing (see Curious git, and Types of git objects). The new entry for subproject is of type commit. The hash is the hash for the current commit of the myproject repository, in the subproject copy:
$ cd subproject
$ git log
commit 196bbbb2b7497fdc868fa61425959d23ff1c0fe5
Author: Matthew Brett <matthew.brett@gmail.com>
Date: Tue May 1 11:13:13 2012 +0100
Initial commit on myproject
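The same relationship can be checked mechanically. The sketch below rebuilds the example in a scratch directory, then compares the hash recorded in the tree of super with the commit checked out in subproject, and also confirms that the object store of super holds no object with that hash. The temporary path, the placeholder identity and the protocol.file.allow setting (needed by newer git releases for local-path submodule clones) are assumptions, not part of the original example.

```shell
#!/bin/sh
# Sketch: the submodule entry in the tree is a pointer to a commit that
# lives in the submodule's repository, not in the containing repository.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(cd "$(mktemp -d)" && pwd -P)   # scratch directory (assumption)
cd "$work"

git init -q myproject
(cd myproject &&
    echo "Important code and data" > some_data.txt &&
    git add some_data.txt &&
    git commit -q -m "Initial commit on myproject")

git init -q super
cd super
echo "This project will use myproject" > README.txt
git add README.txt
git commit -q -m "Initial commit on super"
git -c protocol.file.allow=always submodule add ../myproject subproject
git commit -q -m "Adding the submodule"

# Mode and hash recorded for "subproject" in the root tree of HEAD.
mode=$(git ls-tree HEAD subproject | awk '{print $1}')
recorded=$(git ls-tree HEAD subproject | awk '{print $3}')
# The staging area holds the same entry.
index_entry=$(git ls-files --stage subproject)
# Commit currently checked out in the submodule clone.
checked_out=$(git -C subproject rev-parse HEAD)

# Ask whether "super" itself stores an object with that hash.
if git cat-file -e "$recorded" 2>/dev/null; then
    in_super=yes
else
    in_super=no
fi

echo "mode=$mode recorded=$recorded"
echo "index: $index_entry"
echo "checked out=$checked_out stored in super: $in_super"
```

The mode 160000 is how git marks a tree entry as a pointer to a commit in another repository.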
Updating submodules from their source repositories
How do we keep the subproject copy of myproject up to date with the original myproject repository?
To show this in action, we start by going back to the original myproject repository to make another commit:
$ cd ../myproject
$ # Now in the original "myproject" directory
$ echo "More data" > some_more_data.txt
$ git add some_more_data.txt
$ git commit -m "Add some more data"
[main 43c26bf] Add some more data
1 file changed, 1 insertion(+)
create mode 100644 some_more_data.txt
$ git branch -v
* main 43c26bf Add some more data
Of course super has not changed, because we haven’t updated the submodule clone:
$ cd ../super
$ git status
On branch main
nothing to commit, working tree clean
The subproject directory is a full git repository clone of the original myproject. Remember that git submodule add created the directory by cloning. The myproject clone has a remote from the URL we gave to git submodule add.
$ # We're in the "super" directory
$ cd subproject
$ # Now we're in the submodule clone of "myproject"
$ git remote -v
origin /Volumes/zorg/mb312/dev_trees/curious-git/working/myproject (fetch)
origin /Volumes/zorg/mb312/dev_trees/curious-git/working/myproject (push)
We can do a fetch / merge to get the new commit:
$ # This is the same as "git pull"
$ git fetch origin
From /Volumes/zorg/mb312/dev_trees/curious-git/working/myproject
196bbbb..43c26bf main -> origin/main
$ git merge origin/main
Updating 196bbbb..43c26bf
Fast-forward
some_more_data.txt | 1 +
1 file changed, 1 insertion(+)
create mode 100644 some_more_data.txt
Now what do we see in super?
$ cd ..
$ # Now we are in the "super" directory
$ git status
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: subproject (new commits)
no changes added to commit (use "git add" and/or "git commit -a")
$ git diff
diff --git a/subproject b/subproject
index 196bbbb..43c26bf 160000
--- a/subproject
+++ b/subproject
@@ -1 +1 @@
-Subproject commit 196bbbb2b7497fdc868fa61425959d23ff1c0fe5
+Subproject commit 43c26bf6df6a4efade8082f3a5473b807d07c161
Git is not tracking the contents of the subproject directory, but the git state of the directory. In this case, all super sees is that the commit has changed.
$ git add subproject
As when we first added the submodule, a git add of the subproject directory has the effect of updating the commit that the super tree is pointing to in the staging area, but adds no new files to .git/objects.
If we do the commit, we can see the root tree listing now points subproject to the new commit of myproject:
$ git commit -m "Update myproject with more data"
[main 3379d64] Update myproject with more data
1 file changed, 1 insertion(+), 1 deletion(-)
$ git ls-tree main
100644 blob 00b54ece7789a75ca80a0edb1e1b1e532a1833d8 .gitmodules
100644 blob 9c0042144fc489d7b528ef186af49e78c2867f91 README.txt
160000 commit 43c26bf6df6a4efade8082f3a5473b807d07c161 subproject
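A compact way to watch the recorded commit and the checked-out commit drift apart and come back together is git submodule status, which prefixes a submodule with + when its checked-out commit differs from the one recorded in the staging area. The sketch below replays the update in a scratch directory. The temporary path, the placeholder identity and the protocol.file.allow setting are assumptions for illustration, and the branch name is read from the repository rather than assumed to be main.

```shell
#!/bin/sh
# Sketch: "git submodule status" before and after recording a new commit.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(cd "$(mktemp -d)" && pwd -P)   # scratch directory (assumption)
cd "$work"

git init -q myproject
(cd myproject &&
    echo "Important code and data" > some_data.txt &&
    git add some_data.txt &&
    git commit -q -m "Initial commit on myproject")
branch=$(git -C myproject symbolic-ref --short HEAD)

git init -q super
(cd super &&
    echo "This project will use myproject" > README.txt &&
    git add README.txt &&
    git commit -q -m "Initial commit on super" &&
    git -c protocol.file.allow=always submodule add ../myproject subproject &&
    git commit -q -m "Adding the submodule")

# New work appears in the original repository.
(cd myproject &&
    echo "More data" > some_more_data.txt &&
    git add some_more_data.txt &&
    git commit -q -m "Add some more data")

cd super
# Bring the submodule clone up to date with its origin.
git -C subproject fetch -q origin
git -C subproject merge -q "origin/$branch"

# First character of the status line: "+" means the checkout is ahead of
# (or otherwise differs from) the commit recorded in the staging area.
before=$(git submodule status | cut -c1)

git add subproject
git commit -q -m "Update myproject with more data"

# After recording the new commit the prefix is a plain space.
after=$(git submodule status | cut -c1)
echo "before: [$before] after: [$after]"
```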
Cloning a repository with submodules
What happens if we clone the super project?
$ cd ..
$ # In the directory containing "super"
$ git clone super super-cloned
Cloning into 'super-cloned'...
done.
$ cd super-cloned
$ ls
README.txt
subproject
What is in the new subproject directory?
subproject
Nothing. When you git clone a project with submodules, git does not clone the submodules.
Getting the submodule repository clone takes two steps. These are:
initialize with git submodule init;
clone with git submodule update.
Initializing the submodule copies the repository submodule information in .gitmodules to the repository .git/config file. Having this as a separate step is useful when you want to use a different clone URL from the one recorded in .gitmodules. This might happen if you want to use a local repository to clone from instead of a slower internet repository. In this case, you can do git submodule init, edit .git/config, and then do the cloning with git submodule update.
Here’s .git/config before the init step:
$ # .git/config before submodule init
$ cat .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
ignorecase = true
precomposeunicode = true
[remote "origin"]
url = /Volumes/zorg/mb312/dev_trees/curious-git/working/super
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
remote = origin
merge = refs/heads/main
$ git submodule init
Submodule 'subproject' (/Volumes/zorg/mb312/dev_trees/curious-git/working/myproject) registered for path 'subproject'
.git/config after the init:
$ # .git/config after submodule init
$ cat .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
ignorecase = true
precomposeunicode = true
[remote "origin"]
url = /Volumes/zorg/mb312/dev_trees/curious-git/working/super
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
remote = origin
merge = refs/heads/main
[submodule "subproject"]
active = true
url = /Volumes/zorg/mb312/dev_trees/curious-git/working/myproject
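The option of using a different clone URL can be sketched as follows. Instead of editing .git/config by hand, the sketch sets the submodule.subproject.url entry with git config between the init and update steps. The local-mirror repository, the scratch directory and the placeholder identity are hypothetical names introduced for illustration, and the protocol.file.allow setting is only there because newer git releases refuse local-path submodule clones by default.

```shell
#!/bin/sh
# Sketch: clone a submodule from a different URL than the one in .gitmodules.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(cd "$(mktemp -d)" && pwd -P)   # scratch directory (assumption)
cd "$work"

git init -q myproject
(cd myproject &&
    echo "Important code and data" > some_data.txt &&
    git add some_data.txt &&
    git commit -q -m "Initial commit on myproject")

git init -q super
(cd super &&
    echo "This project will use myproject" > README.txt &&
    git add README.txt &&
    git commit -q -m "Initial commit on super" &&
    git -c protocol.file.allow=always submodule add ../myproject subproject &&
    git commit -q -m "Adding the submodule")

# A second copy of myproject, standing in for a faster local mirror
# (hypothetical name).
git clone -q myproject local-mirror

git clone -q super super-cloned
cd super-cloned

# Step 1: copy the submodule entry from .gitmodules into .git/config.
git submodule init
# Override the URL that "init" wrote, without touching .gitmodules.
git config submodule.subproject.url "$work/local-mirror"
# Step 2: clone from the overridden URL and check out the recorded commit.
git -c protocol.file.allow=always submodule update

origin_url=$(git -C subproject remote get-url origin)
echo "$origin_url"
```

Because the override lives in .git/config, it affects only this clone; the .gitmodules file that other users see is unchanged.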
We have done init, but not update. The submodule directory is still empty:
subproject
To do the submodule clone, use git submodule update after git submodule init:
$ git submodule update
Cloning into '/Volumes/zorg/mb312/dev_trees/curious-git/working/super-cloned/subproject'...
done.
Submodule path 'subproject': checked out '43c26bf6df6a4efade8082f3a5473b807d07c161'
If you are happy to clone from the clone URL recorded in .gitmodules, then you can do both init and update in one step with:
$ git submodule update --init
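When starting from a fresh clone, the clone, init and update steps can also be combined by passing --recurse-submodules to git clone. The sketch below tries this in a scratch directory. The temporary path, the placeholder identity and the protocol.file.allow setting (needed by newer git releases for local-path submodule clones) are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: clone a repository and its submodules in a single command.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(cd "$(mktemp -d)" && pwd -P)   # scratch directory (assumption)
cd "$work"

git init -q myproject
(cd myproject &&
    echo "Important code and data" > some_data.txt &&
    git add some_data.txt &&
    git commit -q -m "Initial commit on myproject")

git init -q super
(cd super &&
    echo "This project will use myproject" > README.txt &&
    git add README.txt &&
    git commit -q -m "Initial commit on super" &&
    git -c protocol.file.allow=always submodule add ../myproject subproject &&
    git commit -q -m "Adding the submodule")

# Clone "super" and, in the same step, init and update its submodules.
git -c protocol.file.allow=always clone -q --recurse-submodules super super-cloned

# The submodule directory is populated straight away.
ls super-cloned/subproject
recorded=$(git -C super-cloned ls-tree HEAD subproject | awk '{print $3}')
checked_out=$(git -C super-cloned/subproject rev-parse HEAD)
echo "recorded=$recorded checked out=$checked_out"
```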