@mmodrow
Last active February 10, 2026 15:32
Git Plumbing to the Metal Workshop

Introduction

While most users see the directory that contains their code as their "repository", that's not technically true... That's what git calls the "working copy", and from git's perspective it is no more than a device that makes handling the "real" repository, which lives in the .git directory next to your code, more intuitive for the user.

Besides the working copy, users usually interact with their repository through the "porcelain" commands; read "user interface outside of the wall". These are your git commit, git checkout, git rebase, etc. But they are merely a façade! They in turn talk to git using the "plumbing" commands; read "technical foundation behind the wall that actually gets stuff done!" Here we enter the realm of commands like git hash-object, git mktree, etc.

Most "users" of a bathroom only ever interact with faucets, tubs, sinks & toilet bowls, yet all of those are worth nothing without the piping, valves, insulation, sink traps etc., which usually either work on their own or get serviced by a professional when necessary.

Today we take the pick-axe, knock all the tiles off the wall, cut open the pipes to look into them and perform a deep dive on git's plumbing, right next to the metal (aka the file system within the .git directory)!

Preparation

For this tutorial we'll assume Git Bash as the executing shell, because that should be available on any Windows machine that runs git - duh! Any other bash with git and perl on the PATH should do just as well.

It's useful to have this alias set:

alias deflate="perl -MCompress::Zlib -e 'undef $/; print uncompress(<>)'"

It lets you inspect files on a sub-git level (despite its name it inflates, i.e. decompresses, the zlib-compressed object files) via:

deflate .git/objects/eb/510f79df142120ae1df25f6fe135a9d7ac6310

Workshop

Prepare the Repo

# This can be any name. It's the directory your code *would* be stored in.
mkdir working_directory
cd working_directory

# This is where the magic happens!
# Create the crucial "cannot live without"-directories.
mkdir -p .git/objects
mkdir -p .git/refs/heads

# Attach a HEAD
echo 'ref: refs/heads/master' >> .git/HEAD

# Marvel at your empty, yet functional, git repository!
git status
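
If you want more proof than git status, you can ask git directly whether it accepts the hand-made skeleton. This is a small sketch in a temporary directory (the demo_dir variable is just for this example), using only git rev-parse and git symbolic-ref:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"

# The same three ingredients as above: an object store, a ref namespace, a HEAD.
mkdir -p .git/objects .git/refs/heads
echo 'ref: refs/heads/master' > .git/HEAD

# Ask git whether it considers this directory a work tree of a repository.
inside=$(git rev-parse --is-inside-work-tree)
# Ask git which ref HEAD symbolically points at.
head_ref=$(git symbolic-ref HEAD)

echo "inside work tree: $inside"
echo "HEAD points at:   $head_ref"
```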

Committing a File

# Set us some content
echo 'workshopping git' | git hash-object -w --stdin

# It should print the new object's hash, either use it directly or look up the present files for further use.

# Look at the created file with
ls -R .git/objects
# or
find .git/objects -type f

# You'll find the newly created file as '.git/objects/##/#####[..]####', where the first two '#'s are the first
# 2 characters of the hash string and the remaining '#'s are the rest of it - hash as in "sha1 hash of the
# type (blob), the content length and the content itself (plus a couple of semantic delimiters)"; so the
# content is baked into the file name and WILL be checked by git.

# Then use the hash in
git cat-file -p eb510f79df142120ae1df25f6fe135a9d7ac6310
# or read the file directly using the plain path and deflate.
deflate .git/objects/eb/510f79df142120ae1df25f6fe135a9d7ac6310
# It should look something like this (an invisible NUL byte separates the length from the content):
# blob 17workshopping git

# Hang the content into a tree - the dollar sign and the \t are important!
echo $'100644 blob eb510f79df142120ae1df25f6fe135a9d7ac6310\tREADME.txt' | git mktree
# Wanna peek? Look above!

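# Attach that tree to a commit.
# Note: git's commit format normally has an empty line between the header lines and the message.
# Changing any byte here, including adding that line, changes the resulting hash, so always
# continue with the hash your own command prints.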
echo 'tree 5f15c377cc97845b7742e3e78d44ce6ca2b9ef0b
author Tester <tester@company.com> 1693218726 -0200
committer Also Tester <tester@company.com> 1693218726 -0200
This is a commit' | git hash-object -t commit -w --stdin
# Wanna peek? Look above!

# Set the master branch to the recently created commit.
echo '64d26b2fbc2923ad1b686d7ee4212c56127dfe23' > .git/refs/heads/master

git status
# Should print something along the lines of:
# On branch master
# Changes to be committed:
#  (use "git restore --staged <file>..." to unstage)
#        deleted:    README.txt

git log --all --decorate --oneline --graph

git checkout
git status
# Should print something along the lines of:
# On branch master
# nothing to commit, working tree clean

ls -R
cat README.txt
## Marvel at the mysteriously appearing file!
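
The "sha1 hash of the type, the content length and the content itself" from the comments above can be reproduced without git doing the hashing. The following is a sketch under the assumption that sha1sum is available (Git Bash and most Linux systems ship it; on macOS shasum would be the substitute):

```shell
content='workshopping git'

# echo appends a newline, so the blob is the text plus "\n".
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Header "blob <size>", a NUL byte, then the content - hashed with plain sha1.
manual=$( { printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } | sha1sum | cut -d' ' -f1 )

# The same content hashed by git (no -w, so nothing gets written).
plumbing=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "size:     $size"
echo "manual:   $manual"
echo "plumbing: $plumbing"
```

Both lines should show the same hash; change a single character of the content and both change together.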

Branching out

# Creating a follow-up commit
echo $'040000 tree 5f15c377cc97845b7742e3e78d44ce6ca2b9ef0b\tdocumentation' | git mktree
echo 'tree dceb8bbfce73ecd846060500b8e07282a95aa933
parent 64d26b2fbc2923ad1b686d7ee4212c56127dfe23
author Tester <tester@company.com> 1693218726 -0200
committer Also Tester <tester@company.com> 1693218726 -0200
This the 2nd commit' | git hash-object -t commit -w --stdin

# Check that commit out as a new branch
echo '6c5718f9025bef6bf0b387715541b5c97ff2529b' > .git/refs/heads/develop
echo 'ref: refs/heads/develop' > .git/HEAD

# Build a second line of work on top of master: new blob, new tree, new commit
echo 'fresh content' | git hash-object -w --stdin

echo $'100644 blob e626596f6e70ed959407e4f20ac3a8cd040fd9e2\tREADME.md
040000 tree 5f15c377cc97845b7742e3e78d44ce6ca2b9ef0b\tbackup' | git mktree

echo 'tree facea251ceb30a8ed457fc4393941b3f4ac994a0
parent 64d26b2fbc2923ad1b686d7ee4212c56127dfe23
author Tester <tester@company.com> 1693218726 -0200
committer Also Tester <tester@company.com> 1693218726 -0200
This the first commit on another feature branch' | git hash-object -t commit -w --stdin
echo 'baebd21b39c90f1d960766f6cc112641e03227d3' > .git/refs/heads/feature1

git log --all --decorate --oneline --graph
git show --all --name-only

# Let's rebase
echo 'tree facea251ceb30a8ed457fc4393941b3f4ac994a0
parent 6c5718f9025bef6bf0b387715541b5c97ff2529b
author Tester <tester@company.com> 1693218726 -0200
committer Also Tester <tester@company.com> 1693218726 -0200
This the first commit on another feature branch - rebased onto develop' | git hash-object -t commit -w --stdin
echo 'bc8ea2cd1f739723d0866156ffabc381abb93093' > .git/refs/heads/feature1

git log --all --decorate --oneline --graph
git show --all --name-only
git diff baebd21 bc8ea2c
git diff baebd21 6c5718f
git log baebd21 --decorate --oneline --graph

# Corporate Merger!
echo 'tree facea251ceb30a8ed457fc4393941b3f4ac994a0
parent 64d26b2fbc2923ad1b686d7ee4212c56127dfe23
parent baebd21b39c90f1d960766f6cc112641e03227d3
author Tester <tester@company.com> 1693218726 -0200
committer Also Tester <tester@company.com> 1693218726 -0200
This a merge-commit' | git hash-object -t commit -w --stdin
echo 'a05d1851ea70d1b53bfa35ea6dcd6a99a4e5f874' > .git/refs/heads/feature1
git log --all --decorate --oneline --graph
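
Writing commit objects by hand is great for seeing the format, but git also ships plumbing that assembles them for you: git commit-tree builds the commit object and git update-ref writes the ref file after checking the hash. The sketch below replays the idea of the merge in a throwaway repository; the commit messages and the demo_dir variable are placeholders for this example, so the hashes will not match the ones above.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" && git init -q

# commit-tree needs an identity; these values mirror the ones used above.
export GIT_AUTHOR_NAME='Tester' GIT_AUTHOR_EMAIL='tester@company.com'
export GIT_COMMITTER_NAME='Also Tester' GIT_COMMITTER_EMAIL='tester@company.com'

blob=$(echo 'workshopping git' | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\tREADME.txt\n' "$blob" | git mktree)

# No -p: a root commit. One -p: a normal commit. Two -p: a merge commit.
root=$(git commit-tree "$tree" -m 'This is a commit')
side=$(git commit-tree "$tree" -p "$root" -m 'A commit on a side branch')
merge=$(git commit-tree "$tree" -p "$root" -p "$side" -m 'This is a merge commit')

# Point a branch at the merge commit.
git update-ref refs/heads/feature1 "$merge"

# Prints the commit followed by all of its parents.
parents=$(git rev-list --parents -n 1 "$merge")
echo "$parents"
```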

Tags

Basic tags are, for most practical concerns, no more than fixed branches - aka they don't "wander on" when a new commit gets... committed...

mkdir -p .git/refs/tags
echo "a05d1851ea70d1b53bfa35ea6dcd6a99a4e5f874" > .git/refs/tags/feature-start

git log --all --decorate --oneline --graph
# Will now show the tag alongside the feature1-branch on the same commit!

... but not all tags are created equal! Some tags carry meta-information in what is called a "tag object". Those are "annotated tags". Their tag file doesn't reference a commit object directly, but a tag object, which in turn points at the commit!

echo $'object 64d26b2fbc2923ad1b686d7ee4212c56127dfe23
type commit
tag root
tagger Tester <tester@company.com> 1693218726 -0200

Rooting for you!' | git hash-object -t tag -w --stdin
echo '60044976e41a167fe07e479f25fed73fe01f7ed1' > .git/refs/tags/root

git log --all --decorate --oneline --graph
git show root
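
To see the difference between the two kinds of tags, ask git for the type of the object each tag ref points at. This is a sketch in a throwaway repository (so the hashes differ from the ones above); it uses git cat-file -t for the type and the ^{commit} suffix to follow an annotated tag down to its commit.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" && git init -q
export GIT_AUTHOR_NAME='Tester' GIT_AUTHOR_EMAIL='tester@company.com'
export GIT_COMMITTER_NAME='Also Tester' GIT_COMMITTER_EMAIL='tester@company.com'

blob=$(echo 'workshopping git' | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\tREADME.txt\n' "$blob" | git mktree)
commit=$(git commit-tree "$tree" -m 'This is a commit')

# Lightweight tag: the ref holds the commit hash itself.
git update-ref refs/tags/feature-start "$commit"

# Annotated tag: write a tag object first, then point the ref at that object.
tag_object=$(printf 'object %s\ntype commit\ntag root\ntagger Tester <tester@company.com> 1693218726 -0200\n\nRooting for you!\n' "$commit" \
  | git hash-object -t tag -w --stdin)
git update-ref refs/tags/root "$tag_object"

light_type=$(git cat-file -t refs/tags/feature-start)
annotated_type=$(git cat-file -t refs/tags/root)
peeled=$(git rev-parse 'refs/tags/root^{commit}')

echo "feature-start points at a: $light_type"
echo "root points at a:          $annotated_type"
echo "root peels down to:        $peeled"
```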

Remotes

Want remote access? Let's fake one!

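mkdir -p .git/refs/remotes/origin
cp .git/refs/heads/master .git/refs/remotes/origin/master

git log --all --decorate --oneline --graph
# In a repository that holds only the first commit this prints something along the lines of:
# * fc26684 (HEAD -> master, origin/master) This is a commit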
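
That a remote-tracking branch really is nothing but a ref in its own namespace can be checked by letting git resolve the faked one. This is a sketch in a throwaway repository; it assumes refs are stored as plain files under .git/refs, which is how the repository built in this workshop stores them.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" && git init -q
export GIT_AUTHOR_NAME='Tester' GIT_AUTHOR_EMAIL='tester@company.com'
export GIT_COMMITTER_NAME='Also Tester' GIT_COMMITTER_EMAIL='tester@company.com'

blob=$(echo 'workshopping git' | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\tREADME.txt\n' "$blob" | git mktree)
commit=$(git commit-tree "$tree" -m 'This is a commit')

# A local branch, written by hand as a plain file ...
mkdir -p .git/refs/heads .git/refs/remotes/origin
echo "$commit" > .git/refs/heads/master
# ... and a faked remote-tracking branch next to it.
cp .git/refs/heads/master .git/refs/remotes/origin/master

remote_refs=$(git for-each-ref --format='%(refname)' refs/remotes)
resolved=$(git rev-parse origin/master)

echo "remote refs:    $remote_refs"
echo "origin/master:  $resolved"
```

No remote is configured anywhere, so a git fetch would still have nothing to talk to; only the ref exists.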

Learnings

So what have we learned?

Everything is a File

Everything stored by git is a file. Sure, there are fancy layers that make it more performant or more robust, but that's more like the sugar-coating of a cake that would be delicious on its own. Branches? Directories with plain-text files in them. Tags? Plain-text files within directories. Versioned files? Hashed files in a cleverly named file structure. Versioned directories? Same. Commits? See above!

Reuse what's possible, treeshake what is no longer needed

Because the versioned files get hashed by their content ONLY and get stored under their hash, identical files use the same git object file, regardless of age, location or the surrounding commits/branches. The same goes for tree objects. Commits are inherently less reusable, because they never get referred to by blobs or trees and only refer to each other as parents in the commit graph. The most reuse you'll get out of a commit is it having multiple children and being contained in multiple refs.

This has multiple benefits:

  • No need to store the same data twice.
    • If it's already there, just use that.
      • That even applies to whole directory trees!
    • If anything changes there are no side-effects; just create a new object!
    • Moving large files? Same blob!
  • Inherently tamper-evident!
    • Everything gets referred to and located by its content's sha1 hash.
    • That hash is cheap and easy to check.
      • Changing blob content? Hash mismatch.
      • Changing tree object? Hash mismatch!
      • Manipulating commits or the arrangement of the history? Hash mismatch!
  • Directories only exist as containers for tracked files, which means no dangling, unintended empty directories
  • Objects are easy to check for structural integrity to prevent broken commits/directories/files
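
The "no need to store the same data twice" part is easy to watch happening. The sketch below writes the same content twice and then references it under two file names, counting loose objects with git count-objects along the way; the file names are placeholders for this example.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" && git init -q

# Writing identical content twice yields the same hash and the same object file.
first=$(echo 'same bytes' | git hash-object -w --stdin)
second=$(echo 'same bytes' | git hash-object -w --stdin)
after_blobs=$(git count-objects | cut -d' ' -f1)

# One tree, two file names, one shared blob.
tree=$(printf '100644 blob %s\tone.txt\n100644 blob %s\ttwo.txt\n' "$first" "$first" | git mktree)
after_tree=$(git count-objects | cut -d' ' -f1)

echo "first write:  $first"
echo "second write: $second"
echo "loose objects after both writes:   $after_blobs"
echo "loose objects after adding a tree: $after_tree"
```

Two files in the tree, but only one blob plus the tree itself in the object store.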

What do you lose?

  • File meta information
    • Creation date
    • Edit date
    • Access date
    • But they should be of little concern
  • User & group access control
    • Every file & directory on a working copy is owned by the user who checked them out.
    • The only distinctions git records in the file mode are
      • "Normal file"
        • 100644
      • Executable
        • 100755
      • Symbolic link
        • 120000
      • Directory
        • 040000
      • Submodule - so no file system element at all
        • 160000
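
Most of these modes can be seen side by side in a single tree object. This is a sketch with made-up file names in a throwaway repository; the submodule mode is left out because it would need a commit from another repository.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" && git init -q

text=$(echo 'workshopping git' | git hash-object -w --stdin)
# For a symbolic link the blob content is the path the link points to.
target=$(printf 'README.txt' | git hash-object -w --stdin)
# A sub-directory is simply another tree object.
sub=$(printf '100644 blob %s\tREADME.txt\n' "$text" | git mktree)

tree=$(printf '100644 blob %s\tREADME.txt\n040000 tree %s\tdocs\n120000 blob %s\tlink\n100755 blob %s\trun.sh\n' \
  "$text" "$sub" "$target" "$text" | git mktree)

# Mode, type, hash and name of every entry.
git ls-tree "$tree"

# Just the mode column, joined into one line.
modes=$(git ls-tree "$tree" | cut -d' ' -f1 | paste -sd' ' -)
echo "$modes"
```

Note that README.txt and run.sh share the very same blob; only the mode in the tree entry makes one of them executable.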

Objects can become unreferenced. A branch gets deleted and its commits start dangling; trees and blobs that only those commits used are no longer needed... This is why git can prune/garbage-collect its objects, just like a runtime garbage-collects its memory: look at the refs (branches, tags & HEAD), follow their lead, and when you've traversed all the trees just burn what you didn't touch - or go dumpster diving to retrieve those precious commits you lost during the last rebase, before git gets fancy ideas...
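
Both halves of that paragraph, the dumpster diving and the burning, can be tried out safely in a throwaway repository. The sketch below creates a commit that no ref points at, finds it again with git fsck --lost-found and then removes it with git prune; the variable names are just for this example.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" && git init -q
export GIT_AUTHOR_NAME='Tester' GIT_AUTHOR_EMAIL='tester@company.com'
export GIT_COMMITTER_NAME='Also Tester' GIT_COMMITTER_EMAIL='tester@company.com'

blob=$(echo 'workshopping git' | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\tREADME.txt\n' "$blob" | git mktree)

# A commit that no branch, tag or HEAD points at.
orphan=$(git commit-tree "$tree" -m 'Nobody references me')

# Dumpster diving: fsck reports commits that nothing else refers to.
found=$(LC_ALL=C git fsck --lost-found 2>/dev/null | grep '^dangling commit' | cut -d' ' -f3)
echo "fsck found:  $found"

# Burn what was not touched; --expire=now makes explicit that even
# brand-new objects are fair game.
git prune --expire=now
if git cat-file -e "$orphan" 2>/dev/null; then state='still there'; else state='gone'; fi
echo "after prune: $state"
```

Had a ref been pointed at the found hash before pruning, the commit and everything it references would have survived.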

Commits are a STATE, not a CHANGESET

So commits are - in fact - not a change-set, but a snapshot of a file structure tree with blobs as leaves. Git is just very good at showing you the differences between trees! That's also the reason why you can perform a shallow clone only a couple of commits deep! In a way, this is also why you cannot version empty directories without a .gitkeep file... Trees only get written for directories that contain tracked files, so git add and git commit never record an empty one!
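
A quick way to convince yourself: look inside a commit object and search for the diff. This is a sketch in a throwaway repository with two made-up commits; git cat-file -p shows what is stored, and git diff-tree derives the change on demand by comparing the two trees.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" && git init -q
export GIT_AUTHOR_NAME='Tester' GIT_AUTHOR_EMAIL='tester@company.com'
export GIT_COMMITTER_NAME='Also Tester' GIT_COMMITTER_EMAIL='tester@company.com'

v1=$(echo 'first version' | git hash-object -w --stdin)
v2=$(echo 'second version' | git hash-object -w --stdin)
tree1=$(printf '100644 blob %s\tREADME.txt\n' "$v1" | git mktree)
tree2=$(printf '100644 blob %s\tREADME.txt\n' "$v2" | git mktree)

c1=$(git commit-tree "$tree1" -m 'State one')
c2=$(git commit-tree "$tree2" -p "$c1" -m 'State two')

# The stored commit: a tree, a parent, two identities, a message. No diff.
git cat-file -p "$c2"

# The "change" gets computed only when somebody asks for it.
changes=$(git diff-tree -r --name-status "$c1" "$c2")
echo "$changes"
```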

Postscript

As you have seen, git's plumbing is no more magical than the plumbing behind your bathroom tiles! It's all just clever file management in a directory, accompanied by rigorous hashing to prevent conflicts.

This knowledge is quite powerful! It shows you why it's bad to check a big file in and remove it "before anyone notices" - every clone has that data from now on! It shows that a rebase is mostly resetting the parent of a commit, checking what changed from the old parent to the new one, applying that and then... recreating everything anew! Merge? That's just a commit that is blessed with a full pair of parents! You botched a rebase and lost your precious branch? It is still there! You'll have to dig, but you know where all the corpses are buried! It shows how "branching off" is just adding another wandering pointer to something that's already there. Detached head? Can be easily sewn back on by referencing it ANYWHERE!

So now go out, stay true to the metal, but remember "with great power comes great responsibility"! Don't go blowing up your colleagues' repositories (or your own), as hopefully you are no longer "meddling with powers you can't possibly comprehend"!

And as always: Before doing anything stupid: Make a backup copy!
