An algorithm for git push¶

We saw how git stores its object in Curious git.

Now we know about how git stores its objects, we can work out how git knows which objects to copy when it does a push.

Paths through commits¶

Git commits form a graph. The commits are the nodes in the graph. Each commit (except the first) has one or more parents, stored as hash references in the commit object. The references to commit parents are the links between nodes that form the edges in the graph.

If we start at a particular commit, and then track back following only one parent for each commit, this is a path in the commit history.

An algorithm¶

Something like this algorithm might do the job:

Get the commit hash corresponding the branch we are going to push;
Follow every commit path (see above)back from this commit, until we hit a commit hash (filename) that the remote has. All the previous commits on the path, that the remote does not have, are missing commits;
For every missing commit get the corresponding tree (directory listing) object. If the tree object is not in the remote objects directory, add to the list of missing trees;
For every missing tree read the tree directory listing. Find any blob (file) objects in the directory listing that are not in the remote objects directory, add to the list of missing blob objects 1;
Copy all missing commit, missing tree and missing blob objects to the remote objects directory;
Update the remote branch to point to the same commit as the local branch;
Update the local record of the last known position of the remote branch to point to the same commit.

There’s a specific example of git push at git push – synchronizing repositories. Here is how that example would follow our algorithm:

We look up the hash for main, and we get 7919d37dda9044f00cf2dc0677eed18156f75404 (abbreviated as 7919d37);
We follow all commit history paths back from 7919d37 to check for missing commits. We start with 7919d37. The remote does not have a matching file in objects, so this is a missing commit. We only have one path to follow, because 7919d37 has only one parent – {{ remote-commit1-sha }} – and the remote does have a corresponding object, so we can stop looking for missing commits;
We only have one missing commit, 7919d37. We look in the contents of 7919d37 to find the tree object hash. This is 2ce031cc4219d31835831f064a6a2d4fb0497c53. We check for this object in the remote objects directory, and sure enough, it is missing. We add this tree to the list of missing trees;
We only have one missing tree – 2ce031cc4219d31835831f064a6a2d4fb0497c53. We look in the contents of this tree object and check in the remote object directory for each object in this listing. The only missing object is 3ef5df2f711c919685f2063f24d0a18bab17760a;
We copy the objects for the missing commits ({{ remote-commit1-sha }}), missing trees (2ce031cc4219d31835831f064a6a2d4fb0497c53) and missing blobs (3ef5df2f711c919685f2063f24d0a18bab17760a) to the remote objects directory;
We set remote refs/heads/main to contain the hash 7919d37;
Set the local refs/remotes/usb_backup/main to contain 7919d37.

Footnotes

1: You have probably worked out by now that git directory listings can have files (called “blobs”) and subdirectories (“trees”) (see Types of git objects). When doing the copy, we actually have to recurse into any sub-directories to get needed file (“blob”) and subdirectory (“tree”) objects. But, you get the idea.