Image of Star Wars clones. Organizations need uniformity in order to be transparent, just like how the Star Wars clones are all uniform.

Photo by Omar Flores on Unsplash

I needed to find a solution to managing git hooks throughout the entire organization. I did some research and didn’t find a solution that met our needs, so a couple of colleagues and I built one. In the later sections of this article, I’ll describe what we did to solve the problem.

What are git hooks?

First, for those who aren’t aware of what git hooks are, let me take a moment to explain. Git hooks are scripts placed in the hooks folder inside the hidden .git folder of a git workspace. Git executes them automatically at specific points in its workflow, such as when git commit runs. There are hooks that execute before the commit happens and hooks that execute after. One common use case for commit hooks is to run linting and unit tests before the commit happens, or to validate that the commit message meets a certain standard. If these checks fail, the commit doesn’t happen.

$ git commit -am "badly formatted commit message"
Failed linting! Check message format and try again. Commit failed.

Managing git hooks can be a challenge. Because they live inside .git, they can’t be versioned with the other files in the repository, and without additional tooling, each developer ends up managing their own hooks.

Tools to manage git hooks in single repositories

But tools like Husky have enabled development teams to manage git hooks for everyone working in a single repository. Instead of each developer managing their own hooks, the team can synchronize them across all collaborators.

Husky works by storing the git hooks in a versionable hidden .husky folder. This folder, and the git hooks in it, can then be committed to the repository. When the dependencies are installed via npm, a prepare npm script executes Husky, which configures git to look for hooks inside the .husky folder instead of the .git/hooks folder:

.git/config

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
	hooksPath = .husky    # hooksPath overrides .git/hooks
[remote "origin"]
	url = https://github.com/$ORG/$SOME_REPO.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master

When developers execute git commands, those git hooks in .husky are executed. If a team member updates the hooks in the .husky folder, the team member commits and pushes them. Anyone pulling those changes will also benefit because Husky updates the hooks and ensures the hooksPath config entry is updated properly.

Introduction to organizational git hooks

But this isn’t what we needed. We needed something to manage client side hooks at the organization level. Because of a Jira integration that required us to use conventional commits and add Jira issue IDs to each commit, we needed a way to lint commits across hundreds of repositories owned by the organization. There are tools, such as the Naming Convention GitHub App, which can enforce commit message standards in the remote repositories. It is installed at the GitHub organization level, but its validation happens server side, after the commits have already been pushed. Once badly formatted commits are pushed to GitHub, developers and testers must rebase to rewrite the commit history and change the message. This is a huge headache, and we needed a better solution that provided useful dashboards to the business and project management teams without sacrificing developer and tester workflows and productivity.

After spending time looking for organizational git hook management tools, I realized there may not be tools that would do what we needed. We essentially needed something similar to Husky, but instead of managing hooks for a single repository, we needed something that could deliver git commit hooks to contributors in all repositories. If something like this exists, I wasn’t able to find it.

Let me give you a little context on how the Naming Convention GitHub App works. It basically acts as a bot that comments on open pull requests. It validates commit messages using a regular expression set by our IT team. If it sees that a commit message doesn’t meet the required format, the bot comments on the pull request with a message telling developers and testers they need to rebase the commits and force push.

It’s annoying and a huge waste of time to find out your commit doesn’t meet the required format once it’s been pushed to GitHub. The solution is to validate the message client-side. But for it to be sustainable and maintainable, the IT team needs to be able to update the logic not just on the server side but also on the client side.

Let’s talk about how we solved this problem across most of the organization. A few of us took that regular expression from the Naming Convention bot and added it manually as a commit message hook in .git/hooks, and we tried it out to see if it could catch linting problems when they’re still easy to fix, without needing to rebase or force push. While we were largely satisfied with this approach, we knew we needed a way for IT to manage this process better. So I thought we could embed the commit message hook in a custom npm package and distribute it to the entire organization.

One problem we ran into is that the regular expression used in the Naming Convention bot wasn’t working in commit message hooks written in bash, and one of our engineers spent a considerable amount of time working on modifying it to make it catch the same problems the Naming Convention bot was catching.

I later came across an article, written by Dan Kelosky, titled How to Write a Git Hook with Node.js. From the article, I learned that we can write git commit hooks in any language. It makes sense. After all, git is just executing arbitrary scripts and doesn’t care what language they’re written in. I chose Node.js since it’s already a Node.js package and since it’s a language we’re familiar with. We did a quick experiment to see if the regular expression worked out of the box in Node.js, and it did. There are differences in how languages handle regular expressions, and they’re not always 100% portable. Since it worked in Node.js, this seemed easier to maintain, since the regular expression could be the same in both Node.js and in the Naming Convention bot’s configuration.

commit-msg:

#!/usr/bin/env node

const fs = require('fs');

// ANSI escape codes for colored terminal output
const fgRed = "\x1b[31m";
const fgGreen = "\x1b[32m";
const fgBlue = "\x1b[34m";
const reset = "\x1b[0m"; // restores the terminal's default colors

// Regex to validate the conventional commit message
// *** replace with your organization's linting regex ***
const regex = /.*/;

// Git passes the path of the file that contains the commit message as the first argument
const file = fs.readFileSync(process.argv[2], 'utf-8');

if (regex.test(file)) {
  console.log(`${fgGreen}Commit message is valid. Applying commit...${reset}`);
  process.exit(0);
} else {
  console.error(`${fgRed}Aborting the commit - not in the expected conventional commit message format\n`);
  console.error(`${fgBlue}Valid examples: `);
  console.error(`  git commit -m "test(backend): unit test case added for AP-545"`);
  console.error(`  git commit -m "feat: added the new time travel feature for AP-1"`);
  console.error(`\nSee https://github.com/$ORG/$REPO/pkgs/npm/org-git-hooks to report any issues.\n${reset}`);
  process.exit(1);
}
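
As an illustration of what might replace the /.*/ placeholder, here is a minimal sketch of a pattern that accepts the two valid examples printed by the hook. The list of commit types, the optional scope, and the shape of the Jira key (uppercase project key, a hyphen, then digits) are all assumptions made for this example, not our organization's actual regex. The helper name isValidCommitMessage is also hypothetical, and it only checks the first line of the message.

```javascript
// Hypothetical pattern: a conventional commit subject that mentions a Jira issue ID.
// The allowed types and the Jira key shape are assumptions; adjust them to your own rules.
const exampleRegex =
  /^(build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test)(\([a-z0-9-]+\))?!?: .*\b[A-Z][A-Z0-9]+-\d+\b/;

// Validate only the subject (first) line of the commit message file contents.
function isValidCommitMessage(message) {
  const subject = message.split('\n')[0];
  return exampleRegex.test(subject);
}

console.log(isValidCommitMessage('feat: added the new time travel feature for AP-1')); // true
console.log(isValidCommitMessage('badly formatted commit message')); // false
```

Checking only the subject line is a design choice for this sketch. It keeps the body of the commit message free-form while still requiring the issue ID where Jira integrations usually look for it.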

Next, with the Node.js commit-msg script working great in .git/hooks, we needed a way to deliver the commit-msg hook to any repo in the company that wanted to use it. We wrote a postinstall.sh script that, when the module is installed, copies commit-msg.js to the .git/hooks folder of the Node project that initiated the install. We started out testing this with npm link, and once it was passing our tests, we published it as a private GitHub npm package in our organization and tried it out on a couple of repositories. Here’s what the initial postinstall.sh script looks like:

hooks_dir="$INIT_CWD/.git/hooks"

echo "$hooks_dir"

echo "Installing @org/org-git-hooks into $hooks_dir"
cp commit-msg.js "$hooks_dir/commit-msg"
echo done

The commit-msg.js file is hard-coded into the package, but npm install @org/org-git-hooks -D is called by developers and testers who are installing it in their projects. The npm docs mentioned an environment variable called INIT_CWD which contains the path to the Node.js project where npm install is executed. We assume this is the root of the repository where .git/hooks is located. While this may not necessarily always be the case when developers and testers are hosting individual projects in monorepos, we don’t see this as a problem. Here’s why.

The first thing I noticed is that the postinstall script locates the project, and therefore its package.json file, using INIT_CWD. If we assume that the package.json is in the root of the repository, then we can assume the .git folder sits right next to it and .git/hooks is one level inside. If the package.json is a level or two below the repository root, then this won’t work. Since we had a couple of projects that were inside monorepos, where several subfolders in the repository each contained their own separate npm modules and package.json files, I thought we should instead create a root level package.json. Instead of installing @org/org-git-hooks at the application level, we installed it at the repository root level.

The only purpose of the root level package.json, if it didn’t already exist, was to simply manage the @org/org-git-hooks package. This isn’t a dependency related to the individual packages in subfolders. It’s not a dependency that’s needed in order to run the applications or to test them. It is more like a repository management tool, so we treated it as such. Installing in a root level package.json worked great, and it was not a problem like I originally thought it might be. Some engineers who only work in Java are a bit annoyed that they would need to install Node.js and npm when they don’t need it, but I have an idea to resolve this issue that I’ll mention later.
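
For comparison, an installer could avoid the root level package.json assumption by searching upward from INIT_CWD for the folder that contains .git. The following is a minimal sketch of that alternative, not what our package does. The function name findGitRoot is hypothetical, and the sketch assumes a normal clone where .git is a folder rather than the pointer file used by worktrees and submodules.

```javascript
const fs = require('fs');
const path = require('path');

// Walk upward from startDir until a directory containing ".git" is found.
// Returns the repository root, or null if no ".git" entry exists on the way up.
function findGitRoot(startDir) {
  let dir = path.resolve(startDir);
  while (true) {
    if (fs.existsSync(path.join(dir, '.git'))) return dir;
    const parent = path.dirname(dir);
    if (parent === dir) return null; // reached the filesystem root
    dir = parent;
  }
}

// INIT_CWD is set by npm; fall back to the current directory for manual runs.
const root = findGitRoot(process.env.INIT_CWD || process.cwd());
console.log(root ? path.join(root, '.git', 'hooks') : 'No git repository found');
```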

Backing up old commit-msg hooks

Now, when you look at postinstall.sh, you may notice that it’s just blindly copying commit-msg.js to the .git/hooks folder. If you’re the kind of hacker who already has an army of git commit hooks that you’ve curated and configured, and you know they’re not versioned, you might be both surprised and annoyed to learn that some package your team just installed blew away your script. I couldn’t think of a way to 100% avoid this, but I wanted to build in a little bit of breathing room.

So we implemented some code in the postinstall.sh script that keeps backups of the previous two commit-msg scripts whenever the package is installed or reinstalled. The existing commit-msg script is renamed to commit-msg.YYYY-MM-DD-HH:MM:SS as a guard against accidental deletion. If you notice your commit-msg hook suddenly behaves differently, you have some time to retrieve your previous one from the backups in .git/hooks. At the same time, I didn’t want an unbounded number of backups cluttering the hard drive and the hooks folder, so we implemented some cleanup logic.

As you can imagine, this took some tweaking to get right, and in order to iterate faster without having to create and delete commit messages, we made the script more testable by letting the INIT_CWD and HOOKS_DIR environment variables override its defaults. This allowed us to run the script manually in a test environment to observe whether it creates backups successfully and deletes old ones without any issues. Here’s the final version of postinstall.sh:

#!/usr/bin/env bash
# Defaults let the script run outside of npm for manual testing.
INIT_CWD=${INIT_CWD:-$PWD}
hooks_dir="$INIT_CWD/${HOOKS_DIR:-.git/hooks}"

echo "$hooks_dir"

if [[ -f "$hooks_dir/commit-msg" ]]; then
  echo "Backing up existing commit-msg hook..."
  mv "$hooks_dir/commit-msg" "$hooks_dir/commit-msg.$(date +%Y-%m-%d-%T)"
  # List the timestamped backups, most recently modified first.
  backups=$(ls -t "$hooks_dir"/commit-msg.* | grep -E 'commit-msg\.[0-9]{4}-[0-9]{2}-[0-9]{2}')
  len=$(echo "$backups" | wc -l)
  echo "len = $len"
  if [[ "$len" -gt 2 ]]; then
    echo "Deleting old backups, keeping the two most recent backups of commit-msg"
    echo "$backups" | tail -n +3 | while IFS= read -r old; do rm -- "$old"; done
  fi
fi

echo "Installing @org/org-git-hooks into $hooks_dir"
cp commit-msg.js "$hooks_dir/commit-msg"
chmod +x "$hooks_dir/commit-msg"  # git ignores hooks that are not executable
echo done
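
The cleanup rule in the script above ("keep the two most recent backups") is easier to reason about as a small pure function. Here is a minimal sketch in Node.js, assuming the commit-msg.YYYY-MM-DD-HH:MM:SS naming that the script produces. The function name backupsToDelete is hypothetical. Note one difference from the shell version: this sketch orders backups by the timestamp in the file name, while the shell script orders them by modification time.

```javascript
// Matches backup names such as "commit-msg.2024-05-07-18:13:00".
const BACKUP_PATTERN = /^commit-msg\.\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2}$/;

// Given the file names in the hooks folder, return the backups to delete,
// keeping the `keep` most recent. The timestamp format sorts lexicographically,
// so a plain string sort orders the backups from oldest to newest.
function backupsToDelete(fileNames, keep = 2) {
  const backups = fileNames.filter((name) => BACKUP_PATTERN.test(name)).sort();
  return backups.slice(0, Math.max(0, backups.length - keep));
}

// Three backups exist, so only the oldest one (05-01) is returned for deletion.
console.log(backupsToDelete([
  'commit-msg',
  'commit-msg.2024-05-01-09:00:00',
  'commit-msg.2024-05-03-09:00:00',
  'commit-msg.2024-05-02-09:00:00',
  'pre-commit.sample',
]));
```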

NOTE: Later, when I looked into how git hooks are executed in Husky, I realized Husky changes the location of the git hooks folder in the .git/config file. Instead of implementing the above backup logic, we could just simply execute the following script in postinstall.sh instead:

INIT_CWD=${INIT_CWD:-$PWD}
hooks_dir="$INIT_CWD/${HOOKS_DIR:-.git/org-hooks}"  # use org-hooks instead of hooks

echo "Installing @org/org-git-hooks into $hooks_dir"
mkdir -p "$hooks_dir"
# -C runs git in the project root rather than in the package's install folder
git -C "$INIT_CWD" config core.hooksPath "$hooks_dir"
cp commit-msg.js "$hooks_dir/commit-msg"

If we’re configuring git to use a different folder for executing hooks, and then we install our organizational hooks into that location instead, then there is nothing to overwrite. I chose to install into .git instead of some custom folder because .git is automatically gitignored, so I don’t need to manage anything in the repository .gitignore file. Simple wins.

Create your own @org/org-git-hooks package

Now that we’ve covered the problem we were trying to solve, and how we solved it, let’s look at how you can replicate this yourself. You can create your own organization’s org-git-hooks package by running npm init in a new folder, copying in postinstall.sh and commit-msg.js, adding a postinstall entry to the scripts section of package.json that runs postinstall.sh, and publishing the package privately on npm, GitHub npm packages, or some other npm registry.

Just replace @org/org-git-hooks with your own scope/package name, such as @abccorp/abccorp-git-hooks and publish it.

The way the package works is this. Anyone in the organization can run npm install @org/org-git-hooks -D in the root of their Node.js, Java, Python, Kotlin, Go, or whatever kind of project they’re running. Programming language really doesn’t matter since the tool is only managing git hooks. It isn’t interacting with any application code. Again, some Java engineers were annoyed that they needed to install Node.js, but I have some ideas on how to solve that.

Installing the package installs the commit-msg hook. Developers and testers can then use git as normal to commit code, and the commit-msg hook will catch any commit message linting issues client side, where they are far easier to fix than after the commit messages and repository code reach the GitHub server.

If an IT administrator needs to make updates to the Naming Convention bot, that person can also update the org-git-hooks package with the new regular expression, bump the version, and republish the package. CI really helps here, so those team members don’t have to deal with the complexities of publishing the package by hand, where mistakes are easy to make. We did explore the idea of writing some unit tests to make sure the new commit message regular expression doesn’t break anything, and if unit tests fail, we could theoretically stop the package from being published.
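
To make the unit test idea concrete, here is a minimal sketch of a check that a CI job could run before the publish step. We only explored this idea, so treat it as an illustration under assumptions: the function name checkRegex, the candidate regex, and the list of cases are all hypothetical.

```javascript
// Run a candidate regex against known-good and known-bad messages.
// Returns a list of human-readable failures; an empty list means every case passed.
function checkRegex(regex, cases) {
  const failures = [];
  for (const { message, valid } of cases) {
    if (regex.test(message) !== valid) {
      failures.push(`expected ${valid ? 'match' : 'no match'}: "${message}"`);
    }
  }
  return failures;
}

// Hypothetical regex and cases, for illustration only.
const candidate = /^(feat|fix|test)(\([a-z-]+\))?: .*[A-Z]+-\d+/;
const cases = [
  { message: 'test(backend): unit test case added for AP-545', valid: true },
  { message: 'feat: added the new time travel feature for AP-1', valid: true },
  { message: 'badly formatted commit message', valid: false },
];

const failures = checkRegex(candidate, cases);
failures.forEach((failure) => console.error(failure));
// A non-zero exit code lets a CI job stop before the publish step.
process.exitCode = failures.length === 0 ? 0 : 1;
```

Keeping the cases in a plain list means that whoever updates the regular expression can add a message that was wrongly accepted or rejected without touching the checking logic.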

When the package is published, team members throughout the organization obtain the latest git commit message hook by updating the package, which installs the new commit-msg hook in the .git/hooks folder.

Now, like all massive rollouts, there are a few constraints that will likely pop up. The first is that any developers managing their own git commit message hook will likely have problems with this solution, since it replaces their own hook. We solved the problem of potentially deleting someone’s custom commit-msg hook, but we haven’t yet solved the problem of how they can use their own custom commit-msg hook along with the organization’s commit-msg hook.

To give some context, let’s say we have a developer who has a custom hook that lints the commit message to check the commit message length, while the organization’s hook just checks conventional commit format. A configuration option could tell the package to install a commit-msg hook that delegates to two additional scripts, commit-msg.js and custom-commit-msg. When committing, git executes commit-msg, which then executes the other two scripts. The combined success or single failure of one of them will then determine if the commit succeeds or fails.

However, the first version of this does not take that use case into account.

The last point I want to make is this isn’t configurable through some dashboard or central admin panel. The regular expression is embedded within the package directly. To turn it into something that could be used, and configured, across other organizations would likely involve some kind of hosted service, where the npm package is configured with an auth token. The regular expression or commit-msg code itself could then be retrieved from a database or from another GitHub repository.

A published version of this solution, generalized to any organization, where a team can configure it to pull the hooks from another repository or central store, may be something I look into in the future. But for the time being, the simplest option is for each organization to build their own using the above instructions and templates.

Avoid depending on Node.js and npm

The last issue I want to address, which I promised I’d discuss, is the fact that this solution entirely depends on Node.js and npm, even in projects built in other programming languages. While I personally don’t think this is a big deal, I also believe we need more empathy in engineering and to really go out of our way to understand where others are coming from. Maybe there’s something I’m just missing. Maybe there’s something I’m just not understanding because I may not have all of the context to see how this can be a problem. So, let’s look at some solutions.

There are compilers which can create executable binaries based on Node.js scripts, and since there aren’t any dependencies to do battle with, it is possible to generate binaries for each targeted platform in CI and distribute those instead. If compiled into a binary, this means Node.js isn’t required to execute commit-msg.js. Vercel maintains, or at least they used to maintain, a package that compiles Node.js scripts into a binary targeting various platforms. You can install it with npm i pkg -g. I did compile the binary and try it on the Mac M2. It actually worked, but the tiny 1262 byte plain-text JavaScript file gets compiled into a whopping 44 MB binary. Maybe that’s an okay tradeoff if you don’t want to install Node.js and npm.

$ ls -lth .git/hooks/commit-msg
-rwxr-xr-x  1 james  staff    44M May  7 18:13 commit-msg

Now that we’ve eliminated the need to use a Node.js runtime to execute the commit hook, the remaining issues we need to tackle, at least to avoid Node.js, are related to distribution and installation of the commit-msg script. Instead of distributing the package via npm (or Maven, or Gradle), we can simply install it via curl directly from the command line. We’ll get into what that looks like shortly.

Now, since @org/org-git-hooks, at least in theory, is a private package, curl does require a GitHub personal access token in order to authenticate with GitHub. You can obtain a token from GitHub Settings - Tokens. Also, the $ORG and $REPO environment variables need to be set to the organization and repository where the source code is located. Additionally, the compiled binary would need to be downloaded, perhaps from GitHub releases so we don’t need to commit binary data to the repository. Git repositories don’t handle large binary files very well, so using GitHub releases is much cleaner.

Downloading files from the GitHub releases section is pretty straightforward if doing so through the browser, and it works great when scripting it as well, as long as the repository is public. But once we start dealing with private repositories, scripting the download of a release becomes more tedious and involves using the GitHub API. If we want to go the “no Node.js route”, then we’ll modify postinstall.sh to download the most recent commit-msg binary from GitHub Releases. To avoid confusion, let’s call this install-commit-msg.sh:

#!/usr/bin/env bash
INIT_CWD=${INIT_CWD:-$PWD}
hooks_dir="$INIT_CWD/${HOOKS_DIR:-.git/hooks}"

echo "$hooks_dir"

##############################################
# Get the asset_id from the latest release
##############################################

# The latest-release response lists each asset's API URL
# (.../releases/assets/<asset_id>), so a single request is enough.
# head -n 1 assumes the commit-msg binary is the first asset in the release.
asset_id=$(curl -sL \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  "https://api.github.com/repos/$ORG/$REPO/releases/latest" \
  | grep releases/assets | head -n 1 | sed 's/"//g' | sed 's/,//g' | sed 's/\// /g' | awk '{print $9}')

if [[ -z "$asset_id" ]]; then
  echo "Could not find a release asset; check GITHUB_TOKEN, ORG, and REPO" >&2
  exit 1
fi

######################################################
# Download the commit-msg binary using the GitHub API.
######################################################
curl -L \
  -H "Accept: application/octet-stream" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  -o commit-msg \
  "https://api.github.com/repos/$ORG/$REPO/releases/assets/$asset_id"

#############################################################
# Install commit-msg hook in .git/hooks and keep two backups.
#############################################################

if [[ -f "$hooks_dir/commit-msg" ]]; then
  echo "Backing up existing commit-msg hook..."
  mv "$hooks_dir/commit-msg" "$hooks_dir/commit-msg.$(date +%Y-%m-%d-%T)"
  # List the timestamped backups, most recently modified first.
  backups=$(ls -t "$hooks_dir"/commit-msg.* | grep -E 'commit-msg\.[0-9]{4}-[0-9]{2}-[0-9]{2}')
  len=$(echo "$backups" | wc -l)
  echo "len = $len"
  if [[ "$len" -gt 2 ]]; then
    echo "Deleting old backups, keeping the two most recent backups of commit-msg"
    echo "$backups" | tail -n +3 | while IFS= read -r old; do rm -- "$old"; done
  fi
fi

echo "Installing @org/org-git-hooks into $hooks_dir"
cp commit-msg "$hooks_dir/commit-msg"
chmod +x "$hooks_dir/commit-msg"  # curl does not mark downloaded files as executable
echo done
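
Since we compile one binary per targeted platform, a release may hold several assets, and picking the right one by name is more predictable than scraping URLs. Here is a minimal sketch of that lookup in Node.js, working on the parsed JSON of a release response. The id and name fields on each asset are part of the GitHub releases API, but the function name findAssetId, the asset names, and the sample data are made up for illustration.

```javascript
// Given the parsed JSON of a release response, return the ID of the asset
// with the given file name, or null when the release has no such asset.
function findAssetId(release, assetName) {
  const assets = Array.isArray(release.assets) ? release.assets : [];
  const match = assets.find((asset) => asset.name === assetName);
  return match ? match.id : null;
}

// Trimmed-down, made-up response; real responses carry many more fields.
const sampleRelease = {
  id: 1,
  tag_name: 'v1.0.0',
  assets: [
    { id: 10, name: 'commit-msg-linux' },
    { id: 11, name: 'commit-msg-macos' },
  ],
};

console.log(findAssetId(sampleRelease, 'commit-msg-macos')); // 11
```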

If we execute bash install-commit-msg.sh in the root of the repository where we want to install the commit-msg hook, it will use the GitHub API to look up the most recent release and download its commit-msg binary. The script relies on bash features such as [[ ]], so it should be run with bash rather than sh. If we check the .git/hooks folder, we’ll find that the file is installed, with proper backups made.

The install-commit-msg.sh script works great. If we take a page out of the playbook of tools such as nvm and git-bash-for-mac, we can just include a tiny one-liner in the README file that downloads install-commit-msg.sh and executes it immediately after the download is complete. What makes it different from the curl-based installer commands in nvm and git-bash-for-mac is that, since this is a private repository, we need to include some extra headers, as well as your GitHub personal access token, in order to download the script. Below is an example of the one-liner that you can include in the README instructions in your organization’s org-git-hooks repository:

$ curl -L -o- -H "Accept: application/vnd.github+json" -H "Authorization: Bearer $GITHUB_TOKEN" -H "X-GitHub-Api-Version: 2022-11-28" https://raw.githubusercontent.com/$ORG/$REPO/main/packages/org-git-hooks/install-commit-msg.sh | bash

It goes without saying that you must treat the GITHUB_TOKEN like a password. Essentially, it is a password that grants anyone who has it whatever access the token’s scopes allow, which can include your private repositories and the repositories of organizations you are a member of. So please be careful.

I hope this information helps give you some ideas on how you can manage git hooks on the client side but at the GitHub organizational level. If you have other ideas or feedback on this solution, please leave a comment below.