git remotes - working with other people, making backups¶
This page follows on from Curious git.
It covers git remotes. Remotes are links to other git repositories.
Now that you are keeping the history of your data with git, you also want to make sure you have a backup in case your current computer dies.
You might want to work with a colleague on the same project. Perhaps your colleague Anne is working on the same files, and you want to merge her changes into yours.
We use git “remotes” to solve both of these problems. Commands for working with remotes are:
git remote – for adding and editing remotes;
git clone – make a new copy of a repository, and make a remote that points to the original repository;
git fetch – update stored information about a remote repository;
git push – upload information from this repository to a remote repository;
git pull – a command combining git fetch and git merge. The command first fetches information from the remote repository, then merges the current state of a remote branch with a local branch.
Keeping backups with remotes¶
Let’s say you have an external backup disk and you want to record all the history of your work on the backup disk.
To do this you need three steps:
Make an empty backup repository on the external backup disk;
Point your current git repository at the backup repository with git remote add;
Send the changes to the backup repository with git push.
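The three steps can be tried end to end in a throwaway directory. This is a minimal sketch, assuming a temporary directory stands in for the external disk; the file name paper.md, the commit identity and the paths under $work are made up for illustration.

```shell
set -eu
# A temporary directory stands in for both the desktop and the USB disk.
work=$(mktemp -d)
# Illustrative identity so the commit works on an unconfigured machine.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# A small working repository with one commit on "main".
git init -q "$work/nobel_prize"
cd "$work/nobel_prize"
git symbolic-ref HEAD refs/heads/main
echo "first draft" > paper.md
git add paper.md
git commit -q -m "First backup"

# Step 1: make an empty (bare) backup repository.
git init -q --bare "$work/backup.git"
# Step 2: point the working repository at the backup.
git remote add usb_backup "$work/backup.git"
# Step 3: send the history to the backup.
git push -q usb_backup main

# Both repositories should now report the same commit for "main".
local_main=$(git rev-parse main)
backup_main=$(git --git-dir="$work/backup.git" rev-parse refs/heads/main)
echo "$local_main"
echo "$backup_main"
```

The sections below walk through the same steps one at a time, with a real external disk in place of the temporary directory.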
Start with a git repository¶
To get started, we make a new repository with the same Nobel-prize-winning paper we saw in Curious git. To type along, download and unzip nobel_prize. You should have a nobel_prize directory:
nobel_prize
├── clever_analysis.py [618B]
├── expensive_data.csv [244K]
└── fancy_figure.png [183K]
Now make a new git repository:
[desktop]$ git init
Initialized empty Git repository in /Volumes/zorg/mb312/dev_trees/curious-git/working/nobel_prize/.git/
Add all the files and make the first commit:
[desktop]$ git add clever_analysis.py
[desktop]$ git add fancy_figure.png
[desktop]$ git add expensive_data.csv
[desktop]$ git commit -m "First backup of my amazing idea"
[main (root-commit) 75206bc] First backup of my amazing idea
3 files changed, 5023 insertions(+)
create mode 100755 clever_analysis.py
create mode 100644 expensive_data.csv
create mode 100644 fancy_figure.png
As we expected from our curious understanding, there are 5 objects in the .git/objects directory, one for each of the three files we git added, one for the directory listing, and one for the commit:
objects
├── pack
├── info
├── ff
│ └── c871b48a6b9df8dc4a13e8e5da99ccf2ce458d [150B]
├── 7b
│ └── 37886351b3df2463fd29c87bc5184b637f0926 [119K]
├── 75
│ └── 206bcb33ff9ad4f15f89b52cdf95bf666d67a8 [148B]
├── 65
│ └── 60135a5943c0509608fee6d900b775e3041197 [335B]
└── 1e
└── d447c15c125991b8a292bdb433aaf19998a3e9 [179K]
Make the empty backup repository¶
Let’s say your external disk is mounted at /Volumes/my_usb_disk.
We make a new empty repository:
[desktop]$ git init --bare /Volumes/my_usb_disk/nobel_prize.git
Initialized empty Git repository in /Volumes/my_usb_disk/nobel_prize.git/
Notice the --bare flag. This tells git to make a repository that does not have a working tree. The bare repository only has the stuff that we are used to seeing in the .git directory of a standard git repository:
nobel_prize.git
├── refs
│ ├── tags
│ └── heads
├── objects
│ ├── pack
│ └── info
├── info
│ └── exclude [240B]
├── hooks
│ (13 files)
├── HEAD [21B]
├── config [111B]
└── description [73B]
We do not want a working tree in our case, because we will never want to edit the files in the /Volumes/my_usb_disk backup repository. We will only be editing files in our local nobel_prize directory, committing those changes locally (as we have done above), and then “pushing” these changes to the backup repository (see footnote 1).
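One way to confirm whether a repository is bare is to ask git directly. A minimal sketch, assuming temporary directories in place of the real disk; the names bare.git and normal are illustrative.

```shell
set -eu
work=$(mktemp -d)
git init -q --bare "$work/bare.git"
git init -q "$work/normal"

# Ask each repository whether it is bare.
bare_answer=$(git --git-dir="$work/bare.git" rev-parse --is-bare-repository)
normal_answer=$(git -C "$work/normal" rev-parse --is-bare-repository)
echo "bare.git: $bare_answer"
echo "normal:   $normal_answer"

# The bare repository keeps objects/ at its top level, with no .git inside.
ls "$work/bare.git"
```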
Tell the current git repository about the backup repository¶
Check we’re in our local git repository:
[desktop]$ pwd
/Volumes/zorg/mb312/dev_trees/curious-git/working/nobel_prize
Add a remote. A remote is a link to another repository.
[desktop]$ git remote add usb_backup /Volumes/my_usb_disk/nobel_prize.git
List the remotes:
[desktop]$ git remote -v
usb_backup /Volumes/my_usb_disk/nobel_prize.git (fetch)
usb_backup /Volumes/my_usb_disk/nobel_prize.git (push)
The list shows that we can both fetch and push to this repository, of which more later.
Git has written the information about the remote URL to the repository config file, .git/config:
[desktop]$ cat .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
ignorecase = true
precomposeunicode = true
[remote "usb_backup"]
url = /Volumes/my_usb_disk/nobel_prize.git
fetch = +refs/heads/*:refs/remotes/usb_backup/*
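Rather than reading the config file by eye, you can ask git config for individual values. A small sketch in a throwaway repository; the temporary path is an assumption, and git remote add does not check that the URL exists, so the disk does not need to be plugged in.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/nobel_prize"
cd "$work/nobel_prize"
git remote add usb_backup /Volumes/my_usb_disk/nobel_prize.git

# Read back the two values git wrote to .git/config.
url=$(git config --get remote.usb_backup.url)
refspec=$(git config --get remote.usb_backup.fetch)
echo "$url"
echo "$refspec"
```

The fetch value says that branches under refs/heads on the remote are recorded locally under refs/remotes/usb_backup.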
git push – push all data for a local branch to the remote¶
We now want to synchronize the data in our nobel_prize repository with the remote usb_backup. The command to do this is git push.
Before we do the push, there are no objects in the objects directory of the usb_backup backup repository:
objects
├── pack
└── info
Then we push:
[desktop]$ git push usb_backup main
To /Volumes/my_usb_disk/nobel_prize.git
* [new branch] main -> main
This command tells git to take all the information necessary to reconstruct the history of the main branch, and send it to the remote repository.
Sure enough, we now have the new files in the objects directory of the backup repository:
objects
├── pack
├── info
├── ff
│ └── c871b48a6b9df8dc4a13e8e5da99ccf2ce458d [150B]
├── 7b
│ └── 37886351b3df2463fd29c87bc5184b637f0926 [119K]
├── 75
│ └── 206bcb33ff9ad4f15f89b52cdf95bf666d67a8 [148B]
├── 65
│ └── 60135a5943c0509608fee6d900b775e3041197 [335B]
└── 1e
└── d447c15c125991b8a292bdb433aaf19998a3e9 [179K]
You’ll see that the ‘main’ branch in the backup repository now points to the same commit as the main branch in the local repository:
[desktop]$ cat .git/refs/heads/main
75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
[desktop]$ cat /Volumes/my_usb_disk/nobel_prize.git/refs/heads/main
75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
The local repository has a copy of the last known position of the main branch in the remote repository.
[desktop]$ cat .git/refs/remotes/usb_backup/main
75206bcb33ff9ad4f15f89b52cdf95bf666d67a8
You can see the last known positions of the remote branches using the -r flag to git branch:
[desktop]$ git branch -r -v
usb_backup/main 75206bc First backup of my amazing idea
To see both local and remote branches, use the -a flag:
[desktop]$ git branch -a -v
* main 75206bc First backup of my amazing idea
remotes/usb_backup/main 75206bc First backup of my amazing idea
git push – synchronizing repositories¶
git push is an excellent way to do backups, because it only transfers the information that the remote repository does not have.
Let’s see that in action.
First we make a new commit in the local repository. Let’s add the first draft of the Nobel prize paper. As before, you can download this from nobel_prize.md. If you are typing along, download nobel_prize.md to the nobel_prize directory.
[desktop]$ git status
On branch main
Untracked files:
(use "git add <file>..." to include in what will be committed)
nobel_prize.md
nothing added to commit but untracked files present (use "git add" to track)
We stage the file and make the commit:
[desktop]$ git add nobel_prize.md
[desktop]$ git commit -m "Add first draft of paper"
[main 7919d37] Add first draft of paper
1 file changed, 29 insertions(+)
create mode 100644 nobel_prize.md
Git updated the local main branch, but the remote does not know about this update yet:
[desktop]$ git branch -a -v
* main 7919d37 Add first draft of paper
remotes/usb_backup/main 75206bc First backup of my amazing idea
We already know there will be three new objects in .git/objects after this commit. These are:
a new blob (file) object for nobel_prize.md;
a new tree (directory listing) object associating the hash for the contents of nobel_prize.md with the nobel_prize.md filename;
the new commit object.
Usually we don’t need to worry about which objects these are, but here we will track the new objects down to show how git push works. You could probably work out how to find these objects starting with git log to get the commit hash (see footnote 2), but here I’m going to take a short-cut and use the obscure git rev-parse command to get the hashes of the objects we need:
[desktop]$ # The hash of the current commit on the "main" branch
[desktop]$ git rev-parse main
7919d37dda9044f00cf2dc0677eed18156f75404
[desktop]$ # The hash of the directory listing
[desktop]$ git rev-parse main:./
2ce031cc4219d31835831f064a6a2d4fb0497c53
[desktop]$ # The hash of the nobel_prize.md file
[desktop]$ git rev-parse main:nobel_prize.md
3ef5df2f711c919685f2063f24d0a18bab17760a
Remember that git uses the first two digits of the hash as the directory name in .git/objects, so the filenames for these objects will be:
.git/objects/79/19d37dda9044f00cf2dc0677eed18156f75404
.git/objects/2c/e031cc4219d31835831f064a6a2d4fb0497c53
.git/objects/3e/f5df2f711c919685f2063f24d0a18bab17760a
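The mapping from hash to filename is simple enough to reproduce in the shell. A small sketch using the commit hash from the git rev-parse output; the variable names are illustrative.

```shell
# The commit hash reported by "git rev-parse main" in the text.
hash=7919d37dda9044f00cf2dc0677eed18156f75404

# First two characters name the directory, the rest name the file.
dir=$(printf '%s' "$hash" | cut -c1-2)
file=$(printf '%s' "$hash" | cut -c3-)
path=".git/objects/$dir/$file"
echo "$path"
```

This only applies to "loose" objects like the ones in this tutorial; objects that git has packed live in the pack directory instead.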
We do have these objects in the local repository:
objects
├── pack
├── info
├── ff
│ └── c871b48a6b9df8dc4a13e8e5da99ccf2ce458d [150B]
├── 7b
│ └── 37886351b3df2463fd29c87bc5184b637f0926 [119K]
├── 79
│ └── 19d37dda9044f00cf2dc0677eed18156f75404 [171B]
├── 75
│ └── 206bcb33ff9ad4f15f89b52cdf95bf666d67a8 [148B]
├── 65
│ └── 60135a5943c0509608fee6d900b775e3041197 [335B]
├── 3e
│ └── f5df2f711c919685f2063f24d0a18bab17760a [415B]
├── 2c
│ └── e031cc4219d31835831f064a6a2d4fb0497c53 [188B]
└── 1e
└── d447c15c125991b8a292bdb433aaf19998a3e9 [179K]
But we don’t have these objects in the remote repository yet, because we haven’t done a push:
objects
├── pack
├── info
├── ff
│ └── c871b48a6b9df8dc4a13e8e5da99ccf2ce458d [150B]
├── 7b
│ └── 37886351b3df2463fd29c87bc5184b637f0926 [119K]
├── 75
│ └── 206bcb33ff9ad4f15f89b52cdf95bf666d67a8 [148B]
├── 65
│ └── 60135a5943c0509608fee6d900b775e3041197 [335B]
└── 1e
└── d447c15c125991b8a292bdb433aaf19998a3e9 [179K]
Now we do a push:
[desktop]$ git push usb_backup main
To /Volumes/my_usb_disk/nobel_prize.git
75206bc..7919d37 main -> main
The branches are synchronized again:
[desktop]$ git branch -a -v
* main 7919d37 Add first draft of paper
remotes/usb_backup/main 7919d37 Add first draft of paper
After the push, we do have the new objects in the remote repository:
objects
├── pack
├── info
├── ff
│ └── c871b48a6b9df8dc4a13e8e5da99ccf2ce458d [150B]
├── 7b
│ └── 37886351b3df2463fd29c87bc5184b637f0926 [119K]
├── 79
│ └── 19d37dda9044f00cf2dc0677eed18156f75404 [171B]
├── 75
│ └── 206bcb33ff9ad4f15f89b52cdf95bf666d67a8 [148B]
├── 65
│ └── 60135a5943c0509608fee6d900b775e3041197 [335B]
├── 3e
│ └── f5df2f711c919685f2063f24d0a18bab17760a [415B]
├── 2c
│ └── e031cc4219d31835831f064a6a2d4fb0497c53 [188B]
└── 1e
└── d447c15c125991b8a292bdb433aaf19998a3e9 [179K]
You might also be able to see how git would work out what to transfer. See An algorithm for git push for how it could work in general, and for this case.
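If you want to list the objects the remote is missing, one option is git rev-list with the --objects flag. This is only a sketch in a throwaway repository, and it is not a claim about how git push decides internally. The file names, commit identity and paths are made up, and the result relies on the last known position of the remote branch being up to date.

```shell
set -eu
work=$(mktemp -d)
# Illustrative identity so the commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# One commit, pushed to a bare backup repository.
git init -q "$work/nobel_prize"
cd "$work/nobel_prize"
git symbolic-ref HEAD refs/heads/main
echo "data" > data.csv
git add data.csv
git commit -q -m "First backup"
git init -q --bare "$work/backup.git"
git remote add usb_backup "$work/backup.git"
git push -q usb_backup main

# A second commit that adds one new file, not yet pushed.
echo "first draft" > paper.md
git add paper.md
git commit -q -m "Add first draft"

# Objects reachable from main but not from the last known remote position:
# here that is one commit, one tree and one blob.
git rev-list --objects usb_backup/main..main
missing=$(( $(git rev-list --objects usb_backup/main..main | wc -l) ))
echo "objects to transfer: $missing"
```

The three lines correspond to the new commit, the new directory listing and the new file, matching the three objects tracked down by hand above.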
git clone – make a fresh new copy of the repository¶
Imagine we have so far been working on our trusty work desktop.
We unplug the external hard drive, put it in our trusty bag, and take the trusty bus back to our trusty house.
Now we want to start work on the paper.
We plug the hard drive into the laptop, and it gets mounted again at /Volumes/my_usb_disk.
This time we want a repository with a working tree.
The command we want is git clone:
[laptop]$ git clone /Volumes/my_usb_disk/nobel_prize.git
Cloning into 'nobel_prize'...
done.
Note
You’ll see that the shell prompt has changed from [desktop]$ to [laptop]$. I used these prompts to make it more obvious which machine we are working on.
We have a full backup of the repository, including all the history:
[laptop]$ cd nobel_prize
[laptop]$ git log --oneline --graph
* 7919d37 Add first draft of paper
* 75206bc First backup of my amazing idea
git made a remote automatically for us, because it recorded where we cloned from. The default name that git clone gives this remote is origin:
[laptop]$ git remote -v
origin /Volumes/my_usb_disk/nobel_prize.git (fetch)
origin /Volumes/my_usb_disk/nobel_prize.git (push)
The clone command generated a fresh copy of the repository, so the remote and the local copy are synchronized:
[laptop]$ git branch -a -v
* main 7919d37 Add first draft of paper
remotes/origin/HEAD -> origin/main
remotes/origin/main 7919d37 Add first draft of paper
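To check what git clone set up without a USB disk to hand, here is a sketch that builds a small bare repository in a temporary directory and clones it. The directory names desktop, backup.git and laptop, and the commit identity, are assumptions for illustration.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Build a bare "USB disk" repository containing one commit.
git init -q "$work/desktop"
cd "$work/desktop"
git symbolic-ref HEAD refs/heads/main
echo "first draft" > paper.md
git add paper.md
git commit -q -m "First draft"
git init -q --bare "$work/backup.git"
# Make sure the bare repository's HEAD names the branch we push.
git --git-dir="$work/backup.git" symbolic-ref HEAD refs/heads/main
git push -q "$work/backup.git" main

# Clone it, as we would on the laptop.
git clone -q "$work/backup.git" "$work/laptop"
cd "$work/laptop"
remote_name=$(git remote)
local_main=$(git rev-parse main)
remote_main=$(git rev-parse origin/main)
echo "remote: $remote_name"
echo "main:        $local_main"
echo "origin/main: $remote_main"
```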
Now we could make some edits:
[laptop]$ git diff
diff --git a/nobel_prize.md b/nobel_prize.md
index 3ef5df2..cdbb9fb 100644
--- a/nobel_prize.md
+++ b/nobel_prize.md
@@ -27,3 +27,4 @@ brain thinks in straight lines.
That is my theory, it is mine and belongs to me, and I own it and what it is,
too.
+The brain is a really big network.
Then we do an add and commit:
[laptop]$ git add nobel_prize.md
[laptop]$ git commit -m "More great ideas after some wine"
[main 159f3b0] More great ideas after some wine
1 file changed, 1 insertion(+)
The local copy is now ahead of the remote:
[laptop]$ git branch -a -v
* main 159f3b0 [ahead 1] More great ideas after some wine
remotes/origin/HEAD -> origin/main
remotes/origin/main 7919d37 Add first draft of paper
At the end of the night’s work, we push back to the remote on the USB disk:
[laptop]$ git push origin main
To /Volumes/my_usb_disk/nobel_prize.git
7919d37..159f3b0 main -> main
The local and remote are synchronized again:
[laptop]$ git branch -a -v
* main 159f3b0 More great ideas after some wine
remotes/origin/HEAD -> origin/main
remotes/origin/main 159f3b0 More great ideas after some wine
git fetch – get all data from a remote¶
git fetch fetches data from a remote repository into a local one.
Now we are back at the work desktop. We don’t have the great ideas from last night in the local repository. Here is the latest commit in the work desktop repository:
[desktop]$ git log -1
commit 7919d37dda9044f00cf2dc0677eed18156f75404
Author: Matthew Brett <matthew.brett@gmail.com>
Date: Mon Apr 2 18:03:00 2012 +0100
Add first draft of paper
Here are the branch positions in the work desktop repository:
[desktop]$ git branch -a -v
* main 7919d37 Add first draft of paper
remotes/usb_backup/main 7919d37 Add first draft of paper
As you can see, the last known positions of the remote branches have not changed from last night. This reminds us that the last known positions only get refreshed when we do an explicit git command to communicate with the remote copy. Git stores the “last known positions” in refs/remotes. For example, if the remote name is usb_backup and the branch is main, then the last known position (commit hash) is the contents of the file refs/remotes/usb_backup/main:
[desktop]$ cat .git/refs/remotes/usb_backup/main
7919d37dda9044f00cf2dc0677eed18156f75404
The commands that update the last known positions are:
git clone (a whole new copy, copying the remote branch positions with it);
git push (copies data and branch positions to the remote repository, and updates last known positions in the local repository);
git fetch (this section) (copies data and last known positions from remote repository into the local repository);
git pull (this is nothing but a git fetch followed by a git merge).
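The following sketch replays the desktop, USB disk and laptop story in a temporary directory, to show that the last known position stays stale until a fetch. All directory names, file contents and the commit identity are assumptions for illustration.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Desktop repository, backed up to a bare repository.
git init -q "$work/desktop"
cd "$work/desktop"
git symbolic-ref HEAD refs/heads/main
echo "draft" > paper.md
git add paper.md
git commit -q -m "First draft"
git init -q --bare "$work/backup.git"
git --git-dir="$work/backup.git" symbolic-ref HEAD refs/heads/main
git remote add usb_backup "$work/backup.git"
git push -q usb_backup main

# Laptop clone adds a commit and pushes it to the backup.
git clone -q "$work/backup.git" "$work/laptop"
cd "$work/laptop"
echo "more ideas" >> paper.md
git commit -q -a -m "More ideas"
git push -q origin main
laptop_main=$(git rev-parse main)

# Back on the desktop: the last known position is stale until we fetch.
cd "$work/desktop"
before=$(git rev-parse usb_backup/main)
git fetch -q usb_backup
after=$(git rev-parse usb_backup/main)
desktop_main=$(git rev-parse main)
echo "before fetch: $before"
echo "after fetch:  $after"
echo "local main:   $desktop_main"
```

Note that the fetch moves usb_backup/main but leaves the local main branch where it was; moving main is the job of the merge step.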
Now that we have plugged in the USB drive, we can fetch the data and last known positions from the remote:
[desktop]$ git fetch usb_backup
From /Volumes/my_usb_disk/nobel_prize
7919d37..159f3b0 main -> usb_backup/main
The last known positions are now the same as those on the remote repository:
[desktop]$ git branch -a -v
* main 7919d37 Add first draft of paper
remotes/usb_backup/main 159f3b0 More great ideas after some wine
We can set our local main branch to be the same as the remote main branch by doing a merge:
[desktop]$ git merge usb_backup/main
Updating 7919d37..159f3b0
Fast-forward
nobel_prize.md | 1 +
1 file changed, 1 insertion(+)
This does a merge between usb_backup/main and local main. In this case, the “merge” is very straightforward, because there have been no new changes in local main since the new edits we have in the remote. Therefore the “merge” only involves setting local main to point to the same commit as usb_backup/main. This is called a “fast-forward” merge, because it only involves advancing the branch pointer, rather than fusing two lines of development with a merge commit:
[desktop]$ git log --oneline --graph
* 159f3b0 More great ideas after some wine
* 7919d37 Add first draft of paper
* 75206bc First backup of my amazing idea
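You can ask git in advance whether a merge will be a fast-forward: it will be when the current branch is an ancestor of the branch being merged. In this sketch a local branch called incoming (a made-up name) stands in for usb_backup/main, so that the example needs no remote. The paths and commit identity are also illustrative.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
echo "draft" > paper.md
git add paper.md
git commit -q -m "First draft"

# "incoming" stands in for usb_backup/main: one commit ahead of main.
git checkout -q -b incoming
echo "more ideas" >> paper.md
git commit -q -a -m "More ideas"
git checkout -q main

# main is an ancestor of incoming, so a merge can fast-forward.
if git merge-base --is-ancestor main incoming; then
    can_ff=yes
else
    can_ff=no
fi
echo "fast-forward possible: $can_ff"

# --ff-only refuses to do anything other than advance the branch pointer.
git merge -q --ff-only incoming
merged_main=$(git rev-parse main)
incoming_tip=$(git rev-parse incoming)
```

After the merge, main and incoming name the same commit, and no merge commit has been created.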
git pull – git fetch followed by git merge¶
git pull is a shortcut for git fetch followed by git merge.
For example, instead of doing git fetch usb_backup and git merge usb_backup/main above, we could have done git pull usb_backup main.
If we do that now, there is nothing to do, because we have already done the fetch and the merge:
[desktop]$ git pull usb_backup main
Already up to date.
From /Volumes/my_usb_disk/nobel_prize
* branch main -> FETCH_HEAD
When you first start using git, I strongly recommend you always use an explicit git fetch followed by git merge instead of git pull. It is easy to run into problems using git pull that are made more confusing by the fusion of the “fetch” and “merge” step. For example, it is not uncommon that you have done more work on a local copy, before you do an innocent git pull from a repository with different new work on the same file. You may well get merge conflicts, which can be rather surprising and confusing, even for experienced users. If you do git fetch followed by git merge, the steps are clearer so the merge conflict is less confusing and it is more obvious what to do.
Linking local and remote branches¶
It can get a bit boring typing all of:
git push usb_backup main
and, if you are using git pull:
git pull usb_backup main
It may well be that we nearly always want to git push the main branch to usb_backup main.
We can set this up using the --set-upstream flag to git push.
[desktop]$ git push usb_backup main --set-upstream
Branch 'main' set up to track remote branch 'main' from 'usb_backup'.
Everything up-to-date
Git then records this association in the .git/config file of the repository:
[desktop]$ cat .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
ignorecase = true
precomposeunicode = true
[remote "usb_backup"]
url = /Volumes/my_usb_disk/nobel_prize.git
fetch = +refs/heads/*:refs/remotes/usb_backup/*
[branch "main"]
remote = usb_backup
merge = refs/heads/main
We add some edits:
[desktop]$ git diff
diff --git a/nobel_prize.md b/nobel_prize.md
index cdbb9fb..cffcbd2 100644
--- a/nobel_prize.md
+++ b/nobel_prize.md
@@ -28,3 +28,4 @@ brain thinks in straight lines.
That is my theory, it is mine and belongs to me, and I own it and what it is,
too.
The brain is a really big network.
+Is the network comment too obvious?
[desktop]$ git add nobel_prize.md
[desktop]$ git commit -m "Rethinking the drinking again"
[main ede0f86] Rethinking the drinking again
1 file changed, 1 insertion(+)
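Now instead of git push usb_backup main we can just do git push.
Before we try this, on versions of git older than 2.0 we need to set a default configuration variable to avoid a confusing warning. From git 2.0 onwards, simple is already the default, so the setting is harmless but not needed. See git config --help for more detail: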
[desktop]$ git config push.default simple
[desktop]$ git push
To /Volumes/my_usb_disk/nobel_prize.git
159f3b0..ede0f86 main -> main
Notice that git didn’t need to ask where to “push” to.
Remember that git pull usb_backup main is the same as git fetch usb_backup followed by git merge usb_backup/main. Now that we have set up the association of this branch with usb_backup/main, a simple git pull will automatically fetch usb_backup and merge from usb_backup/main.
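To check which remote branch a local branch is linked to, you can ask git for the branch's upstream. A sketch in a throwaway repository, with made-up paths and commit identity:

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init -q "$work/nobel_prize"
cd "$work/nobel_prize"
git symbolic-ref HEAD refs/heads/main
echo "draft" > paper.md
git add paper.md
git commit -q -m "First draft"
git init -q --bare "$work/backup.git"
git remote add usb_backup "$work/backup.git"

# --set-upstream records the association while pushing.
git push -q --set-upstream usb_backup main

# Three views of the same association.
upstream=$(git rev-parse --abbrev-ref 'main@{upstream}')
remote=$(git config --get branch.main.remote)
merge_ref=$(git config --get branch.main.merge)
echo "upstream: $upstream"
echo "remote:   $remote"
echo "merge:    $merge_ref"
```

The last two values are the same remote and merge lines that appear under [branch "main"] in .git/config.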
Remotes in the interwebs¶
So far we’ve only used remotes on the file system of the laptop and desktop.
Remotes can also refer to storage on – remote – machines, using communication protocols such as the “git” protocol, ssh, http or https.
For example, here is the remote list for the repository containing this tutorial:
$ git remote -v
origin git@github.com:matthew-brett/curious-git.git (fetch)
origin git@github.com:matthew-brett/curious-git.git (push)
Check out Bitbucket and GitHub for free hosting of your repositories. Both services offer free hosting of data that anyone can read (public repositories), and both have free plans that include private repositories.
Footnotes
- 1
The reason we need a bare repository for our backup goes deeper than the fact we do not need a working tree. We are soon going to do a push to this backup repository. The push has the effect of resetting the position of a branch (usually main) in the backup repo. Git is very reluctant to set a branch position in a remote repository with a working tree, because the new branch position will not match the existing content of the working tree. Git could either leave the remote working tree out of sync with the new branch position, or update the remote working tree by doing a checkout of the new branch position, but either thing would be very confusing for someone trying to use the working tree in that repository. So, by default git will refuse to push a new position for the checked-out branch to a remote repository with a working tree, giving you a long explanation as to why it is refusing, and listing things you can do about it. You can force git to go ahead and do the push, but it is much safer to use a bare repository.
- 2
Getting the object hash values starting with git log. Run git log to show the commit history. This will give you the hash for the current commit. Use git cat-file -p with the commit hash, to show the contents of the commit object. This will give you the hash for the directory listing, in the line beginning tree. Use git cat-file -p again with the tree hash to show the directory listing. This will give you the hash for the nobel_prize.md file.