'A git submodule is a repository embedded inside another repository' where each submodule 'has its own history'.
I had been trying to work with git submodules, and though I was finally able to use them properly (with a good deal of help from others), they gave me enough trouble that I'm a bit scared of forgetting what I found out. So I thought I'd better write a blog post about it.
Acknowledgements: Thanks to the folks at #git, especially osse and jast for helping me get things working.
At first, I er… didn't consult the git docs and relied on blog posts and Stack Overflow instead. This was what caused me trouble.
A rough description. May not be accurate. See docs for accurate info.
Task | Command |
---|---|
Add a repo as submodule | git submodule add <repo-url-or-path> |
Remove a submodule | git rm <path/to/submodule> |
Initialize submodules after cloning | git submodule init |
Pull submodule changes as per parent repo | git submodule update |
Pull submodule changes as per remote repos | git submodule update --remote |
Deinitialize a submodule | git submodule deinit <path/to/submodule/> |
Show current commit of submodules | git submodule [status] |
Run a command on each submodule | git submodule foreach <command> |
We can add a repo as a submodule to another repo with
git submodule add <repo-url-or-path>
The <repo-url-or-path> can be either the URL of a repo hosted online or the path to one that we already have on our computer.
(Though online hosting services like GitHub may have led many people to think that git is meant to be used over the Internet, it needn't be. git is decentralized, but services like GitHub aren't.)
The parent repo is called the superproject in the git docs.
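Here is a minimal, self-contained sketch of adding a repo that already lives on our computer as a submodule. Everything in it is made up for illustration: the throwaway `$tmp` directory, the `Example User` identity and the repo contents. One thing worth knowing: since the security fix in git v2.38.1, git refuses to clone submodules from local paths unless `protocol.file.allow` is set to `always`, so the script sets that in a temporary HOME instead of the real `~/.gitconfig`.

```shell
#!/bin/sh
set -eu

# Throwaway workspace with its own HOME, so the real ~/.gitconfig is untouched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
git config --global user.name "Example User"
git config --global user.email "user@example.com"
# Recent git refuses to clone submodules from local paths without this.
git config --global protocol.file.allow always

# A hypothetical child repo with a single commit.
git init -q "$tmp/child-repo-1"
echo "hello from child-repo-1" > "$tmp/child-repo-1/readme.md"
git -C "$tmp/child-repo-1" add readme.md
git -C "$tmp/child-repo-1" commit -q -m "commit 1"

# The parent repo (superproject) embeds it, using its local path as the "URL".
git init -q "$tmp/parent"
cd "$tmp/parent"
git submodule add "$tmp/child-repo-1" child-repo-1
git commit -q -m "Add child-repo-1 as a submodule"

# .gitmodules now records the submodule's path and URL.
cat .gitmodules
```

With an online repo, the local path would simply be replaced by its URL, and the `protocol.file.allow` line would not be needed.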
And we can remove a submodule from the parent repo with plain git rm (git has no git submodule rm subcommand):
git rm <path/to/submodule>
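Removing a submodule cleanly takes a few steps. Below is a sketch following a commonly recommended sequence: `deinit` to unregister it and empty its directory, delete the submodule's leftover git directory under `.git/modules/`, and finally plain `git rm` (which understands submodules and also drops the entry from .gitmodules). The repo names and the `$tmp` setup are made up for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway workspace with its own HOME, so the real ~/.gitconfig is untouched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
git config --global user.name "Example User"
git config --global user.email "user@example.com"
# Recent git refuses to clone submodules from local paths without this.
git config --global protocol.file.allow always

# Set up a hypothetical parent repo with one submodule.
git init -q "$tmp/child-repo-1"
echo "hello" > "$tmp/child-repo-1/readme.md"
git -C "$tmp/child-repo-1" add readme.md
git -C "$tmp/child-repo-1" commit -q -m "commit 1"
git init -q "$tmp/parent"
cd "$tmp/parent"
git submodule add "$tmp/child-repo-1" child-repo-1
git commit -q -m "Add child-repo-1"

# 1. Unregister it from .git/config and empty its directory.
git submodule deinit -f child-repo-1
# 2. Delete the submodule's git directory kept inside the parent's .git.
rm -rf .git/modules/child-repo-1
# 3. Remove the gitlink and its .gitmodules entry, then commit.
git rm -q -f child-repo-1
git commit -q -m "Remove child-repo-1"
```

Skipping step 2 keeps the submodule's history around inside `.git/modules/`, which can be handy if the submodule might be added back later.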
After cloning a repo with submodules, you need to initialize the submodules before you can do much with them. This is something that I missed, causing a lot of confusion and costing me a lot of time.
git submodule init
After initializing the submodules, you can fetch their contents and check out the commits recorded in the parent repo with
git submodule update
(We could also use git submodule update --init to initialize any submodules that haven't been initialized yet and then update them in one go.)
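A minimal sketch of the clone-then-initialize step, with a made-up setup: a local `child-repo-1` embedded in a local `parent`, all inside a throwaway `$tmp`. It shows that a plain clone leaves the submodule directory empty and that `git submodule update --init` fills it in.

```shell
#!/bin/sh
set -eu

# Throwaway workspace with its own HOME, so the real ~/.gitconfig is untouched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
git config --global user.name "Example User"
git config --global user.email "user@example.com"
# Recent git refuses to clone submodules from local paths without this.
git config --global protocol.file.allow always

# A hypothetical parent repo with one submodule.
git init -q "$tmp/child-repo-1"
echo "hello" > "$tmp/child-repo-1/readme.md"
git -C "$tmp/child-repo-1" add readme.md
git -C "$tmp/child-repo-1" commit -q -m "commit 1"
git init -q "$tmp/parent"
git -C "$tmp/parent" submodule add "$tmp/child-repo-1" child-repo-1
git -C "$tmp/parent" commit -q -m "Add child-repo-1"

# A plain clone of the parent: the submodule directory exists but is empty.
git clone -q "$tmp/parent" "$tmp/clone"
cd "$tmp/clone"
before=$(ls -A child-repo-1)
echo "before: '$before'"

# Register the submodules, then check out the commits the parent records.
git submodule update --init
ls child-repo-1
```

For a fresh clone, `git clone --recurse-submodules <repo>` should do the clone, init and update in a single step.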
If we want the latest commits from each submodule's own remote repo, rather than just the commits recorded in the parent repo, we can use:
git submodule update --remote
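To see the difference between the two kinds of update, here's a sketch where a hypothetical `child-repo-1` gains a second commit after being added to `parent`. Plain `git submodule update` stays on the commit the parent recorded, `--remote` moves to the tip of the child's remote branch, and `git submodule status` then flags the mismatch with a `+`. The `$tmp` setup is made up, and I'm assuming a git recent enough to follow the remote's default branch (see the branch-name trouble described further down).

```shell
#!/bin/sh
set -eu

# Throwaway workspace with its own HOME, so the real ~/.gitconfig is untouched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
git config --global user.name "Example User"
git config --global user.email "user@example.com"
# Recent git refuses to clone submodules from local paths without this.
git config --global protocol.file.allow always

# A hypothetical child repo, added to a parent repo at its first commit.
git init -q "$tmp/child-repo-1"
echo "line 1" > "$tmp/child-repo-1/readme.md"
git -C "$tmp/child-repo-1" add readme.md
git -C "$tmp/child-repo-1" commit -q -m "commit 1"
git init -q "$tmp/parent"
cd "$tmp/parent"
git submodule add "$tmp/child-repo-1" child-repo-1
git commit -q -m "Add child-repo-1"

# The child's remote moves ahead; the parent still records commit 1.
echo "line 2" >> "$tmp/child-repo-1/readme.md"
git -C "$tmp/child-repo-1" commit -q -am "commit 2"

# Plain update: stays on the commit recorded in the parent.
git submodule update
pinned=$(git -C child-repo-1 rev-parse HEAD)

# --remote: fetches and checks out the tip of the remote branch.
git submodule update --remote
latest=$(git -C child-repo-1 rev-parse HEAD)

# The checked-out commit no longer matches the parent's record.
git submodule status
```

The parent doesn't record the new commit by itself; running `git add child-repo-1` and committing in the parent does that.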
git submodule [status] can be used to get some info on the current status of the submodules of a repo. (The square brackets around status mean that it's optional.)
It prints the commit hash of the currently checked-out commit of each submodule.
This hash may be preceded by one of the following symbols if something changed in the submodule:
Prefix | Meaning |
---|---|
`+` | Currently checked-out commit differs from the one recorded in the parent repo |
`-` | Submodule not initialized |
`U` | Submodule has merge conflicts |
For example, the following git submodule status output
3be52ac2a5e309b18d9a29f7a600db61d5831e91 child-repo-1 (3be52ac)
-274c8a70a146a39b512d66130d0746aa458c1528 child-repo-2
means that the currently checked-out commit of child-repo-1 is the same as the one 'registered' with the parent repo, and that child-repo-2 is not initialized.
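A small sketch that turns `git submodule status` output into readable lines based on the prefix character from the table above. It assumes submodule paths without spaces, and it builds a made-up parent repo with two submodules, cloned so that only `child-repo-1` gets initialized.

```shell
#!/bin/sh
set -eu

# Throwaway workspace with its own HOME, so the real ~/.gitconfig is untouched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
git config --global user.name "Example User"
git config --global user.email "user@example.com"
# Recent git refuses to clone submodules from local paths without this.
git config --global protocol.file.allow always

# Hypothetical child repos, each with one commit.
make_child() {
  git init -q "$tmp/$1"
  echo "$1" > "$tmp/$1/readme.md"
  git -C "$tmp/$1" add readme.md
  git -C "$tmp/$1" commit -q -m "commit 1"
}
make_child child-repo-1
make_child child-repo-2

git init -q "$tmp/parent"
cd "$tmp/parent"
git submodule add "$tmp/child-repo-1" child-repo-1
git submodule add "$tmp/child-repo-2" child-repo-2
git commit -q -m "Add two submodules"

# Clone the parent and initialize only child-repo-1.
git clone -q "$tmp/parent" "$tmp/clone"
cd "$tmp/clone"
git submodule update --init child-repo-1

# Describe each status line according to its first character.
report=$(git submodule status | while IFS= read -r line; do
  prefix=$(printf '%.1s' "$line")
  rest=${line#?}               # drop the prefix character
  path=${rest#*" "}            # drop the commit hash
  path=${path%%" ("*}          # drop a trailing "(...)" description, if any
  case "$prefix" in
    (+) echo "$path: checked-out commit differs from the parent's record" ;;
    (-) echo "$path: not initialized" ;;
    (U) echo "$path: has merge conflicts" ;;
    (*) echo "$path: matches the parent's record" ;;
  esac
done)
printf '%s\n' "$report"
```

`IFS=` matters here: without it, `read` would strip the leading space that git prints for submodules with no prefix.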
git submodule deinit
If we want git to stop considering a particular submodule when we run commands like git submodule update, we can 'deinitialize' (or unregister) it with
git submodule deinit <path/to/submodule/>
Then if we change our mind we can initialize it again:
git submodule init <path/to/submodule/>
An example of a submodule being deinit-ed and init-ed:
$ git submodule deinit child-repo-1/
Cleared directory 'child-repo-1'
Submodule 'child-repo-1' (git@codeberg.org:user-1/child-repo-1.git) unregistered for path 'child-repo-1'
$ git submodule init child-repo-1/
Submodule 'child-repo-1' (git@codeberg.org:user-1/child-repo-1.git) registered for path 'child-repo-1'
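A sketch of the deinit/init round trip on a made-up parent repo. Note that `init` only re-registers the submodule in `.git/config`; its directory stays empty until a `git submodule update`, so the script runs that as well.

```shell
#!/bin/sh
set -eu

# Throwaway workspace with its own HOME, so the real ~/.gitconfig is untouched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
git config --global user.name "Example User"
git config --global user.email "user@example.com"
# Recent git refuses to clone submodules from local paths without this.
git config --global protocol.file.allow always

# A hypothetical parent repo with one submodule.
git init -q "$tmp/child-repo-1"
echo "hello" > "$tmp/child-repo-1/readme.md"
git -C "$tmp/child-repo-1" add readme.md
git -C "$tmp/child-repo-1" commit -q -m "commit 1"
git init -q "$tmp/parent"
cd "$tmp/parent"
git submodule add "$tmp/child-repo-1" child-repo-1
git commit -q -m "Add child-repo-1"

# deinit: drop the submodule's section from .git/config and empty its directory.
git submodule deinit child-repo-1
after_deinit=$(ls -A child-repo-1)

# init: register it again; the directory is still empty at this point.
git submodule init child-repo-1
after_init=$(ls -A child-repo-1)

# update: check the recorded commit out again.
git submodule update child-repo-1
ls child-repo-1
```

`deinit` refuses to run if the submodule has local modifications; `--force` overrides that (and throws the modifications away).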
git submodule foreach
foreach can be used to execute an arbitrary shell command in the root directory of each submodule of a repo.
When run, the command has access to the following special variables:
Variable | Description |
---|---|
$name | Name of the submodule's section in .gitmodules |
$sm_path | Path to the submodule as recorded in the immediate parent repo |
$displaypath | Relative path from the current working directory to the submodule's root |
$sha1 | Submodule commit as recorded in the immediate parent repo |
$toplevel | Absolute path to the root of the immediate parent repo |
Rough description, may not be accurate!
For example, for a repo with two submodules named child-repo-1 and child-repo-2, we can do something like
$ git submodule foreach 'echo $displaypath'
Entering 'child-repo-1'
child-repo-1
Entering 'child-repo-2'
child-repo-2
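A sketch using two more of those variables on a made-up parent repo with two submodules. `--quiet` (given before `foreach`) suppresses the `Entering '...'` lines. The command is in single quotes so that our own shell doesn't expand `$sm_path` and `$sha1`; the shell that git starts inside each submodule does.

```shell
#!/bin/sh
set -eu

# Throwaway workspace with its own HOME, so the real ~/.gitconfig is untouched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
git config --global user.name "Example User"
git config --global user.email "user@example.com"
# Recent git refuses to clone submodules from local paths without this.
git config --global protocol.file.allow always

# Hypothetical child repos, each with one commit.
make_child() {
  git init -q "$tmp/$1"
  echo "$1" > "$tmp/$1/readme.md"
  git -C "$tmp/$1" add readme.md
  git -C "$tmp/$1" commit -q -m "commit 1"
}
make_child child-repo-1
make_child child-repo-2

git init -q "$tmp/parent"
cd "$tmp/parent"
git submodule add "$tmp/child-repo-1" child-repo-1
git submodule add "$tmp/child-repo-2" child-repo-2
git commit -q -m "Add two submodules"

# Print each submodule's path and the commit the parent records for it.
out=$(git submodule --quiet foreach 'echo "$sm_path $sha1"')
printf '%s\n' "$out"
```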
If a repo has a lot of submodules, it's probably better to manage them with SSH URLs rather than HTTPS ones, as the latter may ask for a password every time an update happens.
If access to the submodule repos needs authentication, SSH URLs combined with ssh-add can relieve us of the need to type in the password each time.
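If a submodule was already added with an HTTPS (or, as here, local) URL, `git submodule set-url` (added in git v2.25, if I remember right) can switch it to an SSH one. A sketch on a made-up parent repo; the Codeberg URL is only an example and nothing is fetched from it. Setting up the SSH key itself (ssh-agent, ssh-add) isn't covered.

```shell
#!/bin/sh
set -eu

# Throwaway workspace with its own HOME, so the real ~/.gitconfig is untouched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
git config --global user.name "Example User"
git config --global user.email "user@example.com"
# Recent git refuses to clone submodules from local paths without this.
git config --global protocol.file.allow always

# A hypothetical parent repo whose submodule was added via a local path.
git init -q "$tmp/child-repo-1"
echo "hello" > "$tmp/child-repo-1/readme.md"
git -C "$tmp/child-repo-1" add readme.md
git -C "$tmp/child-repo-1" commit -q -m "commit 1"
git init -q "$tmp/parent"
cd "$tmp/parent"
git submodule add "$tmp/child-repo-1" child-repo-1
git commit -q -m "Add child-repo-1"

# Switch the recorded URL to an SSH one. This only edits configuration;
# nothing is fetched from the new URL here.
git submodule set-url child-repo-1 git@codeberg.org:user-1/child-repo-1.git
git add .gitmodules
git commit -q -m "Use an SSH URL for child-repo-1"
git config -f .gitmodules submodule.child-repo-1.url
```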
When you add a repo as a submodule, its details get recorded in a .gitmodules file at the root of the parent repo.
This file contains the path to the submodule within the parent repo and the URL of the submodule repo, among other possible info.
A .gitmodules file would look something like:
[submodule "child-repo-1"]
	path = child-repo-1
	url = git@codeberg.org:user-1/child-repo-1.git
[submodule "child-repo-2"]
	path = child-repo-2
	url = git@codeberg.org:user-2/child-repo-2.git
(The URLs in this sample .gitmodules are SSH URLs. I haven't checked whether Codeberg allows cloning via SSH; only HTTPS links show up on its repo pages.)
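Since .gitmodules uses git's config file format, `git config -f` can read it directly, even outside a repo. A sketch that writes out the sample above to a temporary file (the only assumption) and lists every submodule URL:

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# The sample .gitmodules, with the tab indentation git itself uses.
cat > "$tmp/.gitmodules" <<'EOF'
[submodule "child-repo-1"]
	path = child-repo-1
	url = git@codeberg.org:user-1/child-repo-1.git
[submodule "child-repo-2"]
	path = child-repo-2
	url = git@codeberg.org:user-2/child-repo-2.git
EOF

# Print "key value" pairs for every submodule URL.
urls=$(git config -f "$tmp/.gitmodules" --get-regexp '^submodule\..*\.url$')
printf '%s\n' "$urls"
```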
Initially I was using git v2.17.0. But it seems that older versions of git had a problem where running git submodule update --remote would throw an error if the submodules didn't all use the same branch name.
Thankfully, this has been fixed in v2.28.0.
As an example, I made a set of dummy repos.
.
├── parent (master)
├── child1 (master)
│   └── readme.md (commit 1)
└── child2 (main)
    └── readme.md (commit 1)
where parent, child1 and child2 are all git repos.
The branch of parent and child1 was named 'master' and that of child2 'main'.
I had a line of text in each of the readme.md files and made a commit in both child repos.
Then I added them as submodules in the parent repo.
.
├── parent (master)
│   ├── child1@commit1
│   └── child2@commit1
├── child1 (master)
│   └── readme.md (commit 1)
└── child2 (main)
    └── readme.md (commit 1)
Then I added a second line to both readme.md files, made a second commit in each of the two child repos and pushed the changes to remote.
.
├── parent (master)
│   ├── child1@commit1
│   └── child2@commit1
├── child1 (master)
│   └── readme.md (commit 2)
└── child2 (main)
    └── readme.md (commit 2)
I went into the parent repo and tried to pull the changes with git submodule update --remote.
That didn't work with git v2.17.0 (the error message probably wouldn't be of much help in identifying the real problem for someone not well-versed in git), but it went all right with git v2.30.2.
It seems that this was because older versions relied on the branch name to get things done: when no branch was configured for a submodule, they looked for a branch named master in its remote, which doesn't exist for a repo like child2 whose branch is main.
So if you too were using a pre-v2.28.0 git and ran into this problem, you now know what to do: switch to a newer git.
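A sketch that reproduces the dummy-repo experiment above, with local repos standing in for the remotes (so nothing is actually pushed). It also includes a workaround for older git that I haven't tested on v2.17.0 myself: recording child2's branch explicitly via `submodule.<name>.branch` in .gitmodules, since older versions fell back to master when that option was unset. On newer git the extra line should be harmless. The `$tmp` setup and identity are made up.

```shell
#!/bin/sh
set -eu

# Throwaway workspace with its own HOME, so the real ~/.gitconfig is untouched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
git config --global user.name "Example User"
git config --global user.email "user@example.com"
# Recent git refuses to clone submodules from local paths without this.
git config --global protocol.file.allow always

# Create a child repo on the given branch with a one-line readme.md.
make_child() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD "refs/heads/$2"
  echo "line 1" > "$1/readme.md"
  git -C "$1" add readme.md
  git -C "$1" commit -q -m "commit 1"
}
make_child "$tmp/child1" master
make_child "$tmp/child2" main

git init -q "$tmp/parent"
cd "$tmp/parent"
git submodule add "$tmp/child1" child1
git submodule add "$tmp/child2" child2
# Workaround for older git: name the branch child2 follows explicitly.
git config -f .gitmodules submodule.child2.branch main
git add .gitmodules
git commit -q -m "Add child1 and child2 as submodules"

# A second commit in each child (the children double as the remotes here).
for c in child1 child2; do
  echo "line 2" >> "$tmp/$c/readme.md"
  git -C "$tmp/$c" commit -q -am "commit 2"
done

# Pull the newest commit of each submodule's remote branch.
git submodule update --remote
cat child1/readme.md child2/readme.md
```

`git submodule add -b main <repo>` should record the same branch setting at the time the submodule is added.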
When I realized the problem was the old git version, I tried using the latest version (v2.35.1 as of 2nd March 2022).
I tried building it without any configuration, hoping it would work. But it didn't (again because I was too lazy to read the docs).
Running git submodule gave an error saying 'submodule' is not a git command. I also tried another executable named git-submodule, but running git-submodule update gave:
./git-submodule: 22: .: git-sh-setup: not found
I obviously hadn't done something that should have been done, so I gave up and tried another computer, which had a newer git version (v2.30.2) by default.