git remotes - working with other people, making backups

This page follows on from Curious git.

It covers git remotes. Remotes are links to other git repositories.

Now that you are keeping the history of your data with git, you also want a backup in case your current computer dies.

You might want to work with a colleague on the same project. Perhaps your colleague Anne is working on the same files, and you want to merge her changes into yours.

We use git “remotes” to solve both of these problems. Commands for working with remotes are:

  • git remote – for adding and editing remotes;

  • git clone – make a new copy of a repository, and make a remote that points to the original repository;

  • git fetch – update stored information about a remote repository;

  • git push – upload information from this repository to a remote repository;

  • git pull – a command combining git fetch and git merge. The command first fetches information from the remote repository, then merges the current state of a remote branch with a local branch.

Keeping backups with remotes

Let’s say you have an external backup disk and you want to record all the history of your work on the backup disk.

To do this you need three steps:

  • Make an empty backup repository on the external backup disk;

  • Point your current git repository at the backup repository with git remote add;

  • Send the changes to the backup repository with git push.
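The three steps can be sketched as one short script. This is a minimal sketch that works in a scratch directory made with mktemp rather than on a real external disk; the file idea.txt, the author identity and the directory layout are placeholders, not part of the tutorial.

```shell
set -e
scratch=$(mktemp -d)                       # stand-in for both disks
mkdir -p "$scratch/my_usb_disk"
backup="$scratch/my_usb_disk/nobel_prize.git"

# A small repository to back up (placeholder content and identity).
git init -q "$scratch/nobel_prize"
cd "$scratch/nobel_prize"
git symbolic-ref HEAD refs/heads/main      # make sure the branch is called main
git config user.name "Example Author"
git config user.email "author@example.com"
echo "my amazing idea" > idea.txt
git add idea.txt
git commit -q -m "First backup of my amazing idea"

# Step 1: make an empty backup repository.
git init -q --bare "$backup"
# Step 2: point the current repository at the backup repository.
git remote add usb_backup "$backup"
# Step 3: send the history to the backup repository.
git push -q usb_backup main

# Both repositories should now report the same commit for main.
local_main=$(git rev-parse main)
backup_main=$(git --git-dir="$backup" rev-parse main)
echo "local:  $local_main"
echo "backup: $backup_main"
```

The sections below go through each of these steps in turn, on a real repository.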

Start with a git repository

To get started, we make a new repository with the same Nobel-prize-winning paper we saw in Curious git. To type along, download and unzip nobel_prize. You should have a nobel_prize directory:

nobel_prize
├── clever_analysis.py [618B]
├── expensive_data.csv [244K]
└── fancy_figure.png [183K]

Now make a new git repository:

[desktop]$ git init
Initialized empty Git repository in /Volumes/zorg/mb312/dev_trees/curious-git/working/nobel_prize/.git/

Add all the files and make the first commit:

[desktop]$ git add clever_analysis.py
[desktop]$ git add fancy_figure.png
[desktop]$ git add expensive_data.csv
[desktop]$ git commit -m "First backup of my amazing idea"
[main (root-commit) 75206bc] First backup of my amazing idea
 3 files changed, 5023 insertions(+)
 create mode 100755 clever_analysis.py
 create mode 100644 expensive_data.csv
 create mode 100644 fancy_figure.png

As we expected from our curious understanding, there are 5 objects in the .git/objects directory, one for each of the three files we git added, one for the directory listing, and one for the commit file:

objects
├── pack
├── info
├── ff
│   └── c871b48a6b9df8dc4a13e8e5da99ccf2ce458d [150B]
├── 7b
│   └── 37886351b3df2463fd29c87bc5184b637f0926 [119K]
├── 75
│   └── 206bcb33ff9ad4f15f89b52cdf95bf666d67a8 [148B]
├── 65
│   └── 60135a5943c0509608fee6d900b775e3041197 [335B]
└── 1e
    └── d447c15c125991b8a292bdb433aaf19998a3e9 [179K]

Make the empty backup repository

Let’s say your external disk is mounted at /Volumes/my_usb_disk.

We make a new empty repository:

[desktop]$ git init --bare /Volumes/my_usb_disk/nobel_prize.git
Initialized empty Git repository in /Volumes/my_usb_disk/nobel_prize.git/

Notice the --bare flag. This tells git to make a repository that does not have a working tree. The bare repository only has the stuff that we are used to seeing in the .git directory of a standard git repository:

nobel_prize.git
├── refs
│   ├── tags
│   └── heads
├── objects
│   ├── pack
│   └── info
├── info
│   └── exclude [240B]
├── hooks
│   (13 files)
├── HEAD [21B]
├── config [111B]
└── description [73B]

We do not want a working tree in our case, because we will never edit the files in the /Volumes/my_usb_disk backup repository. We will only edit files in our local nobel_prize directory, commit those changes locally (as we have done above), and then “push” these changes to the backup repository 1.
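If you are ever unsure whether a repository is bare, you can ask git. Here is a small sketch using two throwaway repositories in a scratch directory; the names standard and bare.git are placeholders.

```shell
set -e
scratch=$(mktemp -d)
git init -q "$scratch/standard"            # ordinary repository with a working tree
git init -q --bare "$scratch/bare.git"     # bare repository, no working tree

standard_is_bare=$(git -C "$scratch/standard" rev-parse --is-bare-repository)
bare_is_bare=$(git -C "$scratch/bare.git" rev-parse --is-bare-repository)
echo "standard: $standard_is_bare"
echo "bare.git: $bare_is_bare"
```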

Tell the current git repository about the backup repository

Check we’re in our local git repository:

[desktop]$ pwd
/Volumes/zorg/mb312/dev_trees/curious-git/working/nobel_prize

Add a remote. A remote is a link to another repository.

[desktop]$ git remote add usb_backup /Volumes/my_usb_disk/nobel_prize.git

List the remotes:

[desktop]$ git remote -v
usb_backup	/Volumes/my_usb_disk/nobel_prize.git (fetch)
usb_backup	/Volumes/my_usb_disk/nobel_prize.git (push)

The list shows that we can both fetch and push to this repository, of which more later.

Git has written the information about the remote URL to the repository config file – .git/config:

[desktop]$ cat .git/config
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[remote "usb_backup"]
	url = /Volumes/my_usb_disk/nobel_prize.git
	fetch = +refs/heads/*:refs/remotes/usb_backup/*
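You can also read these settings back with git config rather than opening the file. This sketch uses a throwaway repository; git remote add records the path without checking that it exists, so no disk needs to be mounted.

```shell
set -e
scratch=$(mktemp -d)
git init -q "$scratch/nobel_prize"
cd "$scratch/nobel_prize"
git remote add usb_backup /Volumes/my_usb_disk/nobel_prize.git

# Each line of the [remote "usb_backup"] section is one config key.
url=$(git config --get remote.usb_backup.url)
refspec=$(git config --get remote.usb_backup.fetch)
echo "url:   $url"
echo "fetch: $refspec"
```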

git push – push all data for a local branch to the remote

We now want to synchronize the data in our nobel_prize repository with the remote usb_backup. The command to do this is git push.

Before we do the push, there are no objects in the objects directory of the usb_backup backup repository (a bare repository has no .git directory, so objects sits at the top level):

objects
├── pack
└── info

Then we push:

[desktop]$ git push usb_backup main
To /Volumes/my_usb_disk/nobel_prize.git
 * [new branch]      main -> main

This command tells git to take all the information necessary to reconstruct the history of the main branch, and send it to the remote repository. Sure enough, we now have the new files in the objects directory of the backup repository:

objects
├── pack
├── info
├── ff
│   └── c871b48a6b9df8dc4a13e8e5da99ccf2ce458d [150B]
├── 7b
│   └── 37886351b3df2463fd29c87bc5184b637f0926 [119K]
├── 75
│   └── 206bcb33ff9ad4f15f89b52cdf95bf666d67a8 [148B]
├── 65
│   └── 60135a5943c0509608fee6d900b775e3041197 [335B]
└── 1e
    └── d447c15c125991b8a292bdb433aaf19998a3e9 [179K]

You’ll see that the ‘main’ branch in the backup repository now points to the same commit as the main branch in the local repository:

[desktop]$ cat .git/refs/heads/main
75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
[desktop]$ cat /Volumes/my_usb_disk/nobel_prize.git/refs/heads/main
75206bcb33ff9ad4f15f89b52cdf95bf666d67a8

The local repository has a copy of the last known position of the main branch in the remote repository.

[desktop]$ cat .git/refs/remotes/usb_backup/main
75206bcb33ff9ad4f15f89b52cdf95bf666d67a8

You can see the last known positions of the remote branches using the -r flag to git branch:

[desktop]$ git branch -r  -v
  usb_backup/main 75206bc First backup of my amazing idea

To see both local and remote branches, use the -a flag:

[desktop]$ git branch -a  -v
* main                    75206bc First backup of my amazing idea
  remotes/usb_backup/main 75206bc First backup of my amazing idea

git push – synchronizing repositories

git push is an excellent way to do backups, because it only transfers the information that the remote repository does not have.

Let’s see that in action.

First we make a new commit in the local repository. Let’s add the first draft of the Nobel prize paper. As before, you can download this from nobel_prize.md. If you are typing along, download nobel_prize.md to the nobel_prize directory.

[desktop]$ git status
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	nobel_prize.md

nothing added to commit but untracked files present (use "git add" to track)

We stage the file and make the commit:

[desktop]$ git add nobel_prize.md
[desktop]$ git commit -m "Add first draft of paper"
[main 7919d37] Add first draft of paper
 1 file changed, 29 insertions(+)
 create mode 100644 nobel_prize.md

Git updated the local main branch, but the remote does not know about this update yet:

[desktop]$ git branch -a -v
* main                    7919d37 Add first draft of paper
  remotes/usb_backup/main 75206bc First backup of my amazing idea

We already know there will be three new objects in .git/objects after this commit. These are:

  • a new blob (file) object for nobel_prize.md;

  • a new tree (directory listing) object associating the hash for the contents of nobel_prize.md with the nobel_prize.md filename;

  • the new commit object.

Usually we don’t need to worry about which objects these are, but here we will track the new objects down to show how git push works. You could probably work out how to find these objects starting with git log to get the commit hash (like this 2), but here I’m going to take a short-cut and use the obscure git rev-parse command to get the hashes of the objects we need:

[desktop]$ # The hash of the current commit on the "main" branch
[desktop]$ git rev-parse main
7919d37dda9044f00cf2dc0677eed18156f75404
[desktop]$ # The hash of the directory listing
[desktop]$ git rev-parse main:./
2ce031cc4219d31835831f064a6a2d4fb0497c53
[desktop]$ # The hash of the nobel_prize.md file
[desktop]$ git rev-parse main:nobel_prize.md
3ef5df2f711c919685f2063f24d0a18bab17760a

Remember that git uses the first two digits of the hash as the directory name in .git/objects, so the filenames for these objects will be:

.git/objects/79/19d37dda9044f00cf2dc0677eed18156f75404
.git/objects/2c/e031cc4219d31835831f064a6a2d4fb0497c53
.git/objects/3e/f5df2f711c919685f2063f24d0a18bab17760a
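The mapping from hash to filename can be sketched with standard shell tools. This assumes the object is still stored as a loose file, which is typical straight after a commit; git may later move objects into pack files under objects/pack, and then this path no longer exists. The repository and its content are placeholders.

```shell
set -e
scratch=$(mktemp -d)
git init -q "$scratch/repo"
cd "$scratch/repo"
git config user.name "Example Author"
git config user.email "author@example.com"
echo "first draft" > nobel_prize.md
git add nobel_prize.md
git commit -q -m "Add first draft of paper"

hash=$(git rev-parse HEAD)
dir_part=$(echo "$hash" | cut -c1-2)       # first two hex digits
file_part=$(echo "$hash" | cut -c3-)       # all the remaining digits
object_path=".git/objects/$dir_part/$file_part"
echo "$object_path"
```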

We do have these objects in the local repository:

objects
├── pack
├── info
├── ff
│   └── c871b48a6b9df8dc4a13e8e5da99ccf2ce458d [150B]
├── 7b
│   └── 37886351b3df2463fd29c87bc5184b637f0926 [119K]
├── 79
│   └── 19d37dda9044f00cf2dc0677eed18156f75404 [171B]
├── 75
│   └── 206bcb33ff9ad4f15f89b52cdf95bf666d67a8 [148B]
├── 65
│   └── 60135a5943c0509608fee6d900b775e3041197 [335B]
├── 3e
│   └── f5df2f711c919685f2063f24d0a18bab17760a [415B]
├── 2c
│   └── e031cc4219d31835831f064a6a2d4fb0497c53 [188B]
└── 1e
    └── d447c15c125991b8a292bdb433aaf19998a3e9 [179K]

– but we don’t have these objects in the remote repository yet (we haven’t done a push):

objects
├── pack
├── info
├── ff
│   └── c871b48a6b9df8dc4a13e8e5da99ccf2ce458d [150B]
├── 7b
│   └── 37886351b3df2463fd29c87bc5184b637f0926 [119K]
├── 75
│   └── 206bcb33ff9ad4f15f89b52cdf95bf666d67a8 [148B]
├── 65
│   └── 60135a5943c0509608fee6d900b775e3041197 [335B]
└── 1e
    └── d447c15c125991b8a292bdb433aaf19998a3e9 [179K]

Now we do a push:

[desktop]$ git push usb_backup main
To /Volumes/my_usb_disk/nobel_prize.git
   75206bc..7919d37  main -> main

The branches are synchronized again:

[desktop]$ git branch -a -v
* main                    7919d37 Add first draft of paper
  remotes/usb_backup/main 7919d37 Add first draft of paper

After the push, we do have the new objects in the remote repository:

objects
├── pack
├── info
├── ff
│   └── c871b48a6b9df8dc4a13e8e5da99ccf2ce458d [150B]
├── 7b
│   └── 37886351b3df2463fd29c87bc5184b637f0926 [119K]
├── 79
│   └── 19d37dda9044f00cf2dc0677eed18156f75404 [171B]
├── 75
│   └── 206bcb33ff9ad4f15f89b52cdf95bf666d67a8 [148B]
├── 65
│   └── 60135a5943c0509608fee6d900b775e3041197 [335B]
├── 3e
│   └── f5df2f711c919685f2063f24d0a18bab17760a [415B]
├── 2c
│   └── e031cc4219d31835831f064a6a2d4fb0497c53 [188B]
└── 1e
    └── d447c15c125991b8a292bdb433aaf19998a3e9 [179K]

You might also be able to see how git would work out what to transfer. See An algorithm for git push for how it could work in general, and for this case.
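One way to ask git which objects the remote is missing is git rev-list --objects, using the last known position of the remote branch as the starting point. This is a sketch of the idea, not a claim about how git push decides internally; the repository, file contents and identity are placeholders.

```shell
set -e
scratch=$(mktemp -d)
git init -q --bare "$scratch/backup.git"
git init -q "$scratch/repo"
cd "$scratch/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"
git remote add usb_backup "$scratch/backup.git"

echo "data" > expensive_data.csv
git add expensive_data.csv
git commit -q -m "First backup of my amazing idea"
git push -q usb_backup main                # remote now has the first commit

echo "first draft" > nobel_prize.md
git add nobel_prize.md
git commit -q -m "Add first draft of paper"

# Objects reachable from main but not from the last known remote position.
git rev-list --objects usb_backup/main..main
missing_count=$(git rev-list --objects usb_backup/main..main | wc -l | tr -d ' ')
echo "$missing_count objects to send"
```

In this sketch the list has three entries: the new commit, the new tree and the new blob, matching the three new objects described above.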

git clone – make a fresh new copy of the repository

Imagine we have so far been working on our trusty work desktop.

We unplug the external hard drive, put it in our trusty bag, and take the trusty bus back to our trusty house.

Now we want to start work on the paper.

We plug the hard drive into the laptop, and it gets mounted again at /Volumes/my_usb_disk.

This time we want a repository with a working tree.

The command we want is git clone:

[laptop]$ git clone /Volumes/my_usb_disk/nobel_prize.git
Cloning into 'nobel_prize'...
done.

Note

You’ll see that the shell prompt has changed from [desktop]$ to [laptop]$. I used these prompts to make it more obvious which machine we are working on.

We have a full backup of the repository, including all the history:

[laptop]$ cd nobel_prize
[laptop]$ git log --oneline --graph
* 7919d37 Add first draft of paper
* 75206bc First backup of my amazing idea

git made a remote automatically for us, because it recorded where we cloned from. The default name for a git remote is origin:

[laptop]$ git remote -v
origin	/Volumes/my_usb_disk/nobel_prize.git (fetch)
origin	/Volumes/my_usb_disk/nobel_prize.git (push)

The clone command generated a fresh copy of the repository, so the remote and the local copy are synchronized:

[laptop]$ git branch -a -v
* main                7919d37 Add first draft of paper
  remotes/origin/HEAD -> origin/main
  remotes/origin/main 7919d37 Add first draft of paper

Now we could make some edits:

[laptop]$ git diff
diff --git a/nobel_prize.md b/nobel_prize.md
index 3ef5df2..cdbb9fb 100644
--- a/nobel_prize.md
+++ b/nobel_prize.md
@@ -27,3 +27,4 @@ brain thinks in straight lines.
 
 That is my theory, it is mine and belongs to me, and I own it and what it is,
 too.
+The brain is a really big network.

Then we do an add and commit:

[laptop]$ git add nobel_prize.md
[laptop]$ git commit -m "More great ideas after some wine"
[main 159f3b0] More great ideas after some wine
 1 file changed, 1 insertion(+)

The local copy is now ahead of the remote:

[laptop]$ git branch -a -v
* main                159f3b0 [ahead 1] More great ideas after some wine
  remotes/origin/HEAD -> origin/main
  remotes/origin/main 7919d37 Add first draft of paper

At the end of the night’s work, we push back to the remote on the USB disk:

[laptop]$ git push origin main
To /Volumes/my_usb_disk/nobel_prize.git
   7919d37..159f3b0  main -> main

The local and remote are synchronized again:

[laptop]$ git branch -a -v
* main                159f3b0 More great ideas after some wine
  remotes/origin/HEAD -> origin/main
  remotes/origin/main 159f3b0 More great ideas after some wine

git fetch – get all data from a remote

git fetch fetches data from a remote repository into a local one.

Now we are back at the work desktop. We don’t have the great ideas from last night in the local repository. Here is the latest commit in the work desktop repository:

[desktop]$ git log -1
commit 7919d37dda9044f00cf2dc0677eed18156f75404
Author: Matthew Brett <matthew.brett@gmail.com>
Date:   Mon Apr 2 18:03:00 2012 +0100

    Add first draft of paper

Here are the branch positions in the work desktop repository:

[desktop]$ git branch -a -v
* main                    7919d37 Add first draft of paper
  remotes/usb_backup/main 7919d37 Add first draft of paper

As you can see, the last known positions of the remote branches have not changed from last night. This reminds us that the last known positions only get refreshed when we do an explicit git command to communicate with the remote copy. Git stores the “last known positions” in refs/remotes. For example, if the remote name is usb_backup and the branch is main, then the last known position (commit hash) is the contents of the file refs/remotes/usb_backup/main:

[desktop]$ cat .git/refs/remotes/usb_backup/main
7919d37dda9044f00cf2dc0677eed18156f75404

The commands that update the last known positions are:

  • git clone (a whole new copy, copying the remote branch positions with it);

  • git push (copies data and branch positions to the remote repository, and updates last known positions in the local repository);

  • git fetch (this section) (copies data and last known positions from remote repository into the local repository);

  • git pull (this is nothing but a git fetch followed by a git merge).

Now we have plugged in the USB drive, we can fetch the data and last known positions from the remote:

[desktop]$ git fetch usb_backup
From /Volumes/my_usb_disk/nobel_prize
   7919d37..159f3b0  main       -> usb_backup/main

The last known positions are now the same as those on the remote repository:

[desktop]$ git branch -a -v
* main                    7919d37 Add first draft of paper
  remotes/usb_backup/main 159f3b0 More great ideas after some wine

We can set our local main branch to be the same as the remote main branch by doing a merge:

[desktop]$ git merge usb_backup/main
Updating 7919d37..159f3b0
Fast-forward
 nobel_prize.md | 1 +
 1 file changed, 1 insertion(+)

This does a merge between usb_backup/main and local main. In this case, the “merge” is very straightforward, because we have made no new commits on local main since the point where the remote branch moved ahead. Therefore the “merge” only involves setting local main to point to the same commit as usb_backup/main. This is called a “fast-forward” merge, because it only involves advancing the branch pointer, rather than fusing two lines of development with a merge commit:

[desktop]$ git log --oneline --graph
* 159f3b0 More great ideas after some wine
* 7919d37 Add first draft of paper
* 75206bc First backup of my amazing idea
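If you want to check that a merge will be a fast-forward before you run it, one option is git merge-base --is-ancestor, followed by git merge --ff-only, which refuses to do anything other than a fast-forward. The sketch below rebuilds a small version of the desktop and laptop story in a scratch directory; the directory names and the author identity are placeholders.

```shell
set -e
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
scratch=$(mktemp -d)
backup="$scratch/backup.git"
git init -q --bare "$backup"
git --git-dir="$backup" symbolic-ref HEAD refs/heads/main

# "Desktop" repository with one commit, pushed to the backup.
git init -q "$scratch/desktop"
cd "$scratch/desktop"
git symbolic-ref HEAD refs/heads/main
echo "first draft" > nobel_prize.md
git add nobel_prize.md
git commit -q -m "Add first draft of paper"
git remote add usb_backup "$backup"
git push -q usb_backup main

# "Laptop" clone adds a commit and pushes it back.
git clone -q "$backup" "$scratch/laptop"
cd "$scratch/laptop"
echo "The brain is a really big network." >> nobel_prize.md
git commit -q -a -m "More great ideas after some wine"
git push -q origin main

# Back on the desktop: fetch, check, then merge.
cd "$scratch/desktop"
git fetch -q usb_backup
if git merge-base --is-ancestor main usb_backup/main; then
    merge_kind="fast-forward"              # local main is part of the remote history
else
    merge_kind="needs a real merge"
fi
echo "$merge_kind"
git merge -q --ff-only usb_backup/main
```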

git pull – git fetch followed by git merge

git pull is a shortcut for git fetch followed by git merge.

For example, instead of doing git fetch usb_backup and git merge usb_backup/main above, we could have done git pull usb_backup main. If we do that now, there is nothing to do, because we have already done the fetch and the merge:

[desktop]$ git pull usb_backup main
Already up to date.
From /Volumes/my_usb_disk/nobel_prize
 * branch            main       -> FETCH_HEAD

When you first start using git, I strongly recommend you always use an explicit git fetch followed by git merge instead of git pull. It is easy to run into problems using git pull that are made more confusing by the fusion of the “fetch” and “merge” step. For example, it is not uncommon that you have done more work on a local copy, before you do an innocent git pull from a repository with different new work on the same file. You may well get merge conflicts, which can be rather surprising and confusing, even for experienced users. If you do git fetch followed by git merge, the steps are clearer so the merge conflict is less confusing and it is more obvious what to do.
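One advantage of the explicit git fetch is that you can look at what arrived before you merge anything. The sketch below sets up the situation described above, with new work both locally and on the remote, and then counts the commits on each side. The repositories, contents and identity are placeholders made in a scratch directory.

```shell
set -e
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
scratch=$(mktemp -d)
backup="$scratch/backup.git"
git init -q --bare "$backup"
git --git-dir="$backup" symbolic-ref HEAD refs/heads/main

git init -q "$scratch/desktop"
cd "$scratch/desktop"
git symbolic-ref HEAD refs/heads/main
echo "first draft" > nobel_prize.md
git add nobel_prize.md
git commit -q -m "Add first draft of paper"
git remote add usb_backup "$backup"
git push -q usb_backup main

# New work on the laptop, pushed to the backup.
git clone -q "$backup" "$scratch/laptop"
cd "$scratch/laptop"
echo "The brain is a really big network." >> nobel_prize.md
git commit -q -a -m "More great ideas after some wine"
git push -q origin main

# Different new work on the desktop, not pushed.
cd "$scratch/desktop"
echo "Is the first draft too short?" >> nobel_prize.md
git commit -q -a -m "Second thoughts on the desktop"

# Fetch only, then inspect before deciding how to merge.
git fetch -q usb_backup
ahead=$(git rev-list --count usb_backup/main..main)
behind=$(git rev-list --count main..usb_backup/main)
echo "local main: $ahead ahead, $behind behind usb_backup/main"
git log --oneline main..usb_backup/main    # the commits a merge would bring in
```

When both numbers are above zero, the branches have diverged and git merge will need to make a merge commit, possibly with conflicts to resolve.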

Linking local and remote branches

It can get a bit boring typing all of:

git push usb_backup main

and, if you are using git pull:

git pull usb_backup main

It may well be that we nearly always want to git push the main branch to usb_backup main.

We can set this up using the --set-upstream flag to git push.

[desktop]$ git push usb_backup main --set-upstream
Branch 'main' set up to track remote branch 'main' from 'usb_backup'.
Everything up-to-date

Git then records this association in the .git/config file of the repository:

[desktop]$ cat .git/config
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[remote "usb_backup"]
	url = /Volumes/my_usb_disk/nobel_prize.git
	fetch = +refs/heads/*:refs/remotes/usb_backup/*
[branch "main"]
	remote = usb_backup
	merge = refs/heads/main
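You can also ask git directly which remote branch a local branch is linked to, using the main@{upstream} notation. Here is a sketch in a scratch directory; the repository contents and identity are placeholders.

```shell
set -e
scratch=$(mktemp -d)
git init -q --bare "$scratch/backup.git"
git init -q "$scratch/repo"
cd "$scratch/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"
echo "first draft" > nobel_prize.md
git add nobel_prize.md
git commit -q -m "Add first draft of paper"
git remote add usb_backup "$scratch/backup.git"
git push -q --set-upstream usb_backup main

# The upstream of main, written as <remote>/<branch>.
upstream=$(git rev-parse --abbrev-ref 'main@{upstream}')
echo "$upstream"
# The same information as stored in the [branch "main"] config section.
git config --get branch.main.remote
git config --get branch.main.merge
```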

We add some edits:

[desktop]$ git diff
diff --git a/nobel_prize.md b/nobel_prize.md
index cdbb9fb..cffcbd2 100644
--- a/nobel_prize.md
+++ b/nobel_prize.md
@@ -28,3 +28,4 @@ brain thinks in straight lines.
 That is my theory, it is mine and belongs to me, and I own it and what it is,
 too.
 The brain is a really big network.
+Is the network comment too obvious?
[desktop]$ git add nobel_prize.md
[desktop]$ git commit -m "Rethinking the drinking again"
[main ede0f86] Rethinking the drinking again
 1 file changed, 1 insertion(+)

Now instead of git push usb_backup main we can just do git push.

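Before we try this, we set the push.default configuration variable. Versions of git before 2.0 print a confusing warning if it is not set; from git 2.0 onwards, simple is already the default, so the setting does no harm. See git config --help for more detail: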

[desktop]$ git config push.default simple
[desktop]$ git push
To /Volumes/my_usb_disk/nobel_prize.git
   159f3b0..ede0f86  main -> main

Notice that git didn’t need to ask where to “push” to.

Remember that git pull usb_backup main is the same as git fetch usb_backup followed by git merge usb_backup/main. Now we have set up the association of this branch with usb_backup/main, a simple git pull will automatically fetch usb_backup and merge from usb_backup/main.

Remotes in the interwebs

So far we’ve only used remotes on the file system of the laptop and desktop.

Remotes can also refer to storage on – remote – machines, using communication protocols such as the “git” protocol, ssh, http or https.

For example, here is the remote list for the repository containing this tutorial:

$ git remote -v
origin	git@github.com:matthew-brett/curious-git.git (fetch)
origin	git@github.com:matthew-brett/curious-git.git (push)
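To make the URL forms concrete, here is a sketch that adds remotes using several of them. The host example.com, the someone/project paths and the remote names are placeholders; git remote add only records the URL, so nothing is contacted.

```shell
set -e
scratch=$(mktemp -d)
git init -q "$scratch/repo"
cd "$scratch/repo"

git remote add via_ssh   ssh://git@example.com/someone/project.git
git remote add via_scp   git@example.com:someone/project.git    # short ssh form
git remote add via_https https://example.com/someone/project.git
git remote add via_file  /Volumes/my_usb_disk/nobel_prize.git   # local path

git remote -v
remote_count=$(git remote | wc -l | tr -d ' ')
```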

Check out Bitbucket and GitHub for free hosting of your repositories. Both services offer free hosting of data that anyone can read (public repositories), and both have free plans that include private repositories.

Footnotes

1

The reason we need a bare repository for our backup goes deeper than the fact we do not need a working tree. We are soon going to do a push to this backup repository. The push has the effect of resetting the position of a branch (usually main) in the backup repo. Git is very reluctant to set a branch position in a remote repository with a working tree, because the new branch position will not match the existing content of the working tree. Git could either leave the remote working tree out of sync with the new branch position, or update the remote working tree by doing a checkout of the new branch position, but either thing would be very confusing for someone trying to use the working tree in that repository. So, by default git will refuse to push a new branch position to a remote repository with a working tree, giving you a long explanation as to why it is refusing, and listing things you can do about it. You can force git to go ahead and do the push, but it is much safer to use a bare repository.
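You can see this refusal for yourself with a sketch like the following. It assumes the receive.denyCurrentBranch setting has been left at its default in your git configuration; the repositories, contents and identity are placeholders in a scratch directory.

```shell
set -e
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
scratch=$(mktemp -d)

# A standard (non-bare) repository with main checked out.
git init -q "$scratch/with_tree"
cd "$scratch/with_tree"
git symbolic-ref HEAD refs/heads/main
echo "first draft" > nobel_prize.md
git add nobel_prize.md
git commit -q -m "Add first draft of paper"

# A clone makes a new commit and tries to push it back.
git clone -q "$scratch/with_tree" "$scratch/clone"
cd "$scratch/clone"
echo "more ideas" >> nobel_prize.md
git commit -q -a -m "More great ideas after some wine"

# Remove 2>/dev/null to read git's long explanation.
if git push -q origin main 2>/dev/null; then
    push_result="accepted"
else
    push_result="refused"
fi
echo "$push_result"
```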

2

Getting the object hash values starting with git log. Run git log to show the commit history. This will give you the hash for the current commit. Use git cat-file -p with the commit hash, to show the commit message file. This will give you the hash for the directory listing, in the line beginning tree. Use git cat-file -p again with the tree hash to show the directory listing. This will give you the hash for the nobel_prize.md file.
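These steps can be sketched as a script that follows the chain from commit to tree to blob. The sed and awk filters are just one way to pick the hashes out of the git cat-file -p output; the repository, contents and identity are placeholders in a scratch directory.

```shell
set -e
scratch=$(mktemp -d)
git init -q "$scratch/repo"
cd "$scratch/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"
echo "first draft" > nobel_prize.md
git add nobel_prize.md
git commit -q -m "Add first draft of paper"

# Hash of the current commit, as shown by git log.
commit=$(git log -1 --format=%H)
# The commit file has a line "tree <hash>" naming the directory listing.
tree=$(git cat-file -p "$commit" | sed -n 's/^tree //p')
# The directory listing has one line per entry: mode, type, hash, filename.
blob=$(git cat-file -p "$tree" | awk '$4 == "nobel_prize.md" {print $3}')

echo "commit: $commit"
echo "tree:   $tree"
echo "blob:   $blob"
```

The tree and blob hashes found this way should match what git rev-parse reports for the same objects.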