A tryst with git submodules

Tags: / git /

About my attempts at working with git submodules (before finally getting it to work).


'A git submodule is a repository embedded inside another repository' where each submodule 'has its own history'.

I had been trying to work with git submodules, and though I was finally able to use them properly (with a good deal of help from others), they gave me enough trouble that I'm a bit scared of forgetting what I found out. So I thought I'd better make a blog post about it.

Acknowledgements: Thanks to the folks at #git, especially osse and jast for helping me get things working.

At first, I er… didn't consult the git docs and relied on blog posts and stackoverflow. 😅 This was what caused me trouble.

Overview

A rough description. May not be accurate. See docs for accurate info.

Task                                         Command
Add a repo as submodule                      git submodule add <repo-url-or-path>
Remove a submodule                           git rm <path/to/submodule/>
Initialize submodules after cloning          git submodule init
Pull submodule changes as per parent repo    git submodule update
Pull submodule changes as per remote repos   git submodule update --remote
Deinitialize a submodule                     git submodule deinit <path/to/submodule/>
Show current commit of submodules            git submodule [status]
Run a command on each submodule              git submodule foreach <command>

Description

Adding or removing submodules

We can add a repo as a submodule to another repo with

git submodule add <repo-url-or-path>

The <repo-url-or-path> could be either a repo hosted online or one that already exists on our computer.

(Though online hosting services like github may have led many people to think that git is something meant to be used over the Internet, it needn't be so. git is decentralized but services like github aren't.)

The parent repo is called the superproject in the git docs.

And we can remove a submodule from the parent repo with plain git rm (git submodule itself has no rm subcommand):

git rm <path/to/submodule/>
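A minimal sketch of adding a submodule and then removing it again, using throwaway local repos. The repo names (child, parent), the temp directory and the commit identity are made up for illustration. Removal here is done with git submodule deinit followed by git rm, which is one common recipe rather than the only way.

```shell
#!/bin/sh
set -e
# Placeholder identity so the throwaway commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# A child repo with one commit, and an empty parent repo.
git init -q child
git -C child commit -q --allow-empty -m "child: first commit"
git init -q parent
cd parent

# Add the local repo as a submodule. Recent git versions refuse local-path
# submodule clones by default, hence the protocol.file.allow override.
git -c protocol.file.allow=always submodule add "$work/child" child >/dev/null 2>&1
git commit -q -m "add child as submodule"
added=$(git submodule status)

# Remove it: unregister it, then drop it from the index and .gitmodules.
# (The clone kept under .git/modules/child is left behind.)
git submodule deinit -f child >/dev/null 2>&1
git rm -q child
git commit -q -m "remove child submodule"
remaining=$(git submodule status)

echo "after add:    $added"
echo "after remove: ${remaining:-<no submodules>}"
```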

Using a cloned repo with submodules

After cloning a repo with submodules, you need to initialize the submodules before you can do much with them. This is something that I missed, causing a lot of confusion and costing me a lot of time.

git submodule init

After initializing the submodules you can get their contents as per the last commits checked into the parent repo with

git submodule update

(We could also use git submodule update --init to initialize submodules that haven't yet been initialized and then update them.)

If we need the changes from the remote repo corresponding to the submodules and not just the changes that were included in the parent repo, we can use:

git submodule update --remote
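The clone-then-initialize flow can be tried out locally with something like the following sketch. The repo names and the temp directory are assumptions for the demo. It uses git submodule update --init, the combined form mentioned above, and compares the status prefix before and after.

```shell
#!/bin/sh
set -e
# Placeholder identity so the throwaway commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Set up a parent repo that has one submodule.
git init -q child
git -C child commit -q --allow-empty -m "child: first commit"
git init -q parent
git -C parent -c protocol.file.allow=always submodule add "$work/child" child >/dev/null 2>&1
git -C parent commit -q -m "add child as submodule"

# A fresh clone knows about the submodule but has not initialized it yet.
git clone -q parent clone
cd clone
before=$(git submodule status)

# Initialize and fetch the commit recorded in the parent repo in one step.
git -c protocol.file.allow=always submodule update --init >/dev/null 2>&1
after=$(git submodule status)

echo "before: $before"
echo "after:  $after"
```

The before line should start with - (not initialized) and the after line with a space.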

Display current status of submodules

git submodule [status] can be used to get some info on the current status of the submodules of a repo.

(The square brackets around status mean that it's optional.)

It prints the commit hash of the currently checked out commit of each submodule.

This hash may be preceded by one of the following symbols, depending on the state of the submodule:

Prefix  Meaning
+       Currently checked out commit differs from the one recorded in the parent repo
-       Submodule not initialized
U       Submodule has merge conflicts

For example, the following git submodule status output

 3be52ac2a5e309b18d9a29f7a600db61d5831e91 child-repo-1 (3be52ac)
-274c8a70a146a39b512d66130d0746aa458c1528 child-repo-2

means that the currently checked out commit of child-repo-1 is the same as the one 'registered' with the parent repo and that child-repo-2 is not initialized.
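As a rough illustration of reading those prefixes in a script, here is a small helper that looks only at the first character of a status line. The function name describe_status_line is made up, and the two inputs are the sample lines from above.

```shell
#!/bin/sh
# Map the first character of a `git submodule status` line to its meaning.
describe_status_line() {
    case "$1" in
        " "*) echo "in sync with parent repo" ;;
        "+"*) echo "checked out commit differs from parent repo" ;;
        "-"*) echo "not initialized" ;;
        "U"*) echo "merge conflict" ;;
        *)    echo "unrecognized" ;;
    esac
}

describe_status_line " 3be52ac2a5e309b18d9a29f7a600db61d5831e91 child-repo-1 (3be52ac)"
describe_status_line "-274c8a70a146a39b512d66130d0746aa458c1528 child-repo-2"
```

When feeding it real output line by line, something like git submodule status | while IFS= read -r line would be needed so that the leading space isn't stripped.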

git submodule deinit

If we need to sort of stop considering a particular submodule when we do stuff like git submodule update, we can 'deinitialize' (or unregister) it with

git submodule deinit <path/to/submodule/>

Then if we change our mind we can initialize it again:

git submodule init <path/to/submodule/>

An example of a submodule being deinit-ed and init-ed:

$ git submodule deinit child-repo-1/
Cleared directory 'child-repo-1'
Submodule 'child-repo-1' (git@codeberg.org:user-1/child-repo-1.git) unregistered for path 'child-repo-1'

$ git submodule init child-repo-1/
Submodule 'child-repo-1' (git@codeberg.org:user-1/child-repo-1.git) registered for path 'child-repo-1'

git submodule foreach

foreach can be used to execute an arbitrary shell command within the root directories of each of the submodules inside a repo.

This command has access to the following special variables when run:

Variable      Description
$name         Name of submodule section in .gitmodules
$sm_path      Path to submodule as in immediate parent repo
$displaypath  Relative path from PWD to submodule root
$sha1         Submodule commit hash as in immediate parent repo
$toplevel     Absolute path to root of immediate parent repo

Rough description, may not be accurate!

For example, for a repo with two submodules named child-repo-1 and child-repo-2, we can do something like

$ git submodule foreach 'echo $displaypath'
Entering 'child-repo-1'
child-repo-1
Entering 'child-repo-2'
child-repo-2
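A sketch using a few more of these variables, on a throwaway parent repo with a single submodule (the repo names and temp directory are assumptions for the demo). The single quotes matter: they keep the shell from expanding $name and friends before foreach gets to run the command.

```shell
#!/bin/sh
set -e
# Placeholder identity so the throwaway commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Parent repo with one submodule called child.
git init -q child
git -C child commit -q --allow-empty -m "child: first commit"
git init -q parent
cd parent
git -c protocol.file.allow=always submodule add "$work/child" child >/dev/null 2>&1

# --quiet suppresses the "Entering '...'" lines, leaving only our own output.
report=$(git submodule --quiet foreach 'echo "$name $sm_path $sha1"')
printf '%s\n' "$report"
```

This should print one line per submodule: its name, its path and the commit hash recorded in the parent repo.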

ssh vs https URLs

If there are a lot of submodules within a repo, it's probably better to use SSH URLs for managing the submodules rather than HTTPS, as the latter asks for a password each time an update happens.

SSH links combined with ssh-add can relieve us of the need to type in the password each time if access to the submodule repos needs authentication.

gitmodules file

When you add a repo as a submodule of another repo, the details of the submodule get recorded in a .gitmodules file at the root of the parent repo.

It contains the path to the submodule within the parent repo and the url associated with the submodule repo, among other possible info.

A .gitmodules file would look something like:

[submodule "child-repo-1"]
    path = child-repo-1
    url = git@codeberg.org:user-1/child-repo-1.git
[submodule "child-repo-2"]
    path = child-repo-2
    url = git@codeberg.org:user-2/child-repo-2.git

(URLs in this sample .gitmodules are SSH URLs. Haven't checked if codeberg allows cloning via SSH. Only HTTPS links are showing up in repo pages.)
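Since .gitmodules uses the same syntax as git's config files, it can be read with git config -f. A small sketch that writes the sample file above into a temp directory and queries it:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# The sample .gitmodules from above.
cat > .gitmodules <<'EOF'
[submodule "child-repo-1"]
    path = child-repo-1
    url = git@codeberg.org:user-1/child-repo-1.git
[submodule "child-repo-2"]
    path = child-repo-2
    url = git@codeberg.org:user-2/child-repo-2.git
EOF

# List the URL of every submodule section.
urls=$(git config -f .gitmodules --get-regexp '^submodule\..*\.url$')
printf '%s\n' "$urls"

# Look up a single value.
first_path=$(git config -f .gitmodules submodule.child-repo-1.path)
echo "$first_path"
```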

A problem with old git versions

Initially I was using git v2.17.0. But it seems that older versions of git had a problem where running git submodule update --remote would throw an error if the branch names of all the submodules weren't the same.

Thankfully this has been fixed in v2.28.0.

As an example I tried making a set of dummy repos.

.
├── parent (master)
├── child1 (master)
│   └── readme.md (commit 1)
└── child2 (main)
    └── readme.md (commit 1)

where parent, child1 and child2 are all git repos.

The branch name of parent and child1 was 'master' and that of child2 was 'main'.

I had a line of text in each of the readme.md files and made a commit in both child repos.

Then I added them as submodules in the parent repo.

.
├── parent (master)
│   ├── child1@commit1
│   └── child2@commit1
├── child1 (master)
│   └── readme.md (commit 1)
└── child2 (main)
    └── readme.md (commit 1)

Then I added a second line to both readme.md files, made a second commit in each of the two child repos and pushed the changes to remote.

.
├── parent (master)
│   ├── child1@commit1
│   └── child2@commit1
├── child1 (master)
│   └── readme.md (commit 2)
└── child2 (main)
    └── readme.md (commit 2)

I went into the parent repo and tried to pull the changes with git submodule update --remote.

That didn't work with git v2.17.0 (the error message probably wouldn't be of much help in identifying the real problem for someone not well-versed in git) but went all right with git v2.30.2.

It seems that this was because the older versions assumed a remote branch named master for every submodule (unless submodule.<name>.branch was set), which caused complications when a submodule's branch was named something else.

So if you too were using a pre-v2.28.0 git and ran into this problem, you now know what to do. Switch to a newer git. 😃
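If switching isn't an option, the git docs describe a submodule.<name>.branch setting in .gitmodules that tells update --remote which branch to follow for a given submodule. I haven't tried this on v2.17.0, so treat the following as an untested sketch. The .gitmodules content and the relative URLs are made up to match the dummy repos above.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical .gitmodules for the dummy parent repo.
cat > .gitmodules <<'EOF'
[submodule "child1"]
    path = child1
    url = ../child1
[submodule "child2"]
    path = child2
    url = ../child2
EOF

# Record that child2 should follow 'main' rather than the default branch.
git config -f .gitmodules submodule.child2.branch main

branch=$(git config -f .gitmodules submodule.child2.branch)
echo "child2 follows: $branch"
```

In a real repo the modified .gitmodules would then need to be committed in the parent repo.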

There was an attempt…

When I realized the problem was the old git version, I tried using the latest version (v2.35.1 as of 2nd March 2022).

Tried building without any configuration, hoping it would work. But it didn't (again because of me being too lazy to read the docs 😅).

Doing git submodule gave an error saying 'submodule' is not a git command. Tried using another executable named git-submodule as well. But doing git-submodule update gave ./git-submodule: 22: .: git-sh-setup: not found.

I obviously hadn't done something that should've been done. So I gave up and tried another computer which had a newer git version (v2.30.2) by default.