Søren Mortensen

Git Crash Course, Part 1

I have few frustrations with the way my computer science degree is taught, but my biggest by far has been the conspicuous lack of practical skills such as Git.

I understand this is partly by design, because it's a computer science degree and not a software engineering degree, but only 1 or 2 months in, you start being assigned group projects that pretty much require version control. I've also heard this complaint about computer science degrees before, so with that in mind, here's a crash course.

Goals & Assumptions

My goal with this crash course is to get you to the point where you can contribute to a pair or group project effectively. I'm not aiming for you to understand the inner workings of Git, but it's important that I explain enough for you to build a basic mental model.

  1. I'm going to assume that you understand how to navigate in the terminal (using commands such as cd and ls).

    At some point, I'll write a tutorial on this too and link to it here, but for now you'll have to fend for yourself (sorry).

  2. If you're on Windows, I'm going to use Unix terminology and assume that you can translate as needed; sorry about that, but I'm not well-versed enough with the Windows developer ecosystem to do that. As far as I understand, if you're using Git Bash or WSL, you don't need to worry about this.

    I'm very open to patches or emails that correct this paragraph if needed.

Version Control

What

Version control has two main purposes:

Git is the version control software that's most commonly used by software developers, and with good reason; when Git came around, it was able to replace a lot of other tools that are now widely regarded as having been much harder to use and more error-prone.

Lots of other software that you might have used has version control features to a greater or lesser extent (e.g. Google Docs and Microsoft Word). Google Docs is a particularly visible example: multiple people can edit the same document at the same time, all of their changes are kept, and it has a version history.

Why

At this point, you might be thinking that another piece of software to do this is unnecessary, or an inconvenience. You might be thinking that you've been getting by just fine until now. In that case, here are just a few of the reasons why I think learning Git is worth the effort:

Getting Started

With that out of the way, let's start by installing Git. Once you've done that, make sure you configure your authorship information:


$ git config --global user.name "Your Name"
$ git config --global user.email youremail@example.com

You might also want to configure which editor Git should use. The default (if you haven't set a default editor with the $VISUAL or $EDITOR environment variables) is probably vim, which can be frustrating if you don't know how to use it. For example, you could set the editor to Visual Studio Code:


$ git config --global core.editor "code --wait"

You can find more information about first-time setup here, and information about setting your text editor here.
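The --global flag stores these settings in a configuration file in your home directory, so they apply to every repository you work in. If you leave it off while inside a repository, the setting applies to that repository only and takes precedence over the global value. That's handy if, say, you want to use your university email address for coursework. Here's a minimal sketch in a throwaway folder (the address is a placeholder):

```shell
set -e
cd "$(mktemp -d)"     # a throwaway folder, so nothing real is touched
git init -q           # make it a repository (more on this in the next section)

# No --global: this is stored in the repository's own .git/config
git config user.email student@example.ac.uk

# Inside this repository, the local value wins over any global one
git config user.email
```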

Repositories

Git works by designating a particular folder as a "repository", which means that it will track changes to any files in that folder or its children.

I'll put commands throughout this article that you should run to follow along. To start, create a folder, move inside, and designate it as a Git repository:


$ mkdir git-crash-course
$ cd git-crash-course
$ git init .
Initialized empty Git repository in /Users/soren/src/localhost/git-crash-course/.git/

Git has created a folder called .git/ inside git-crash-course/. This is where it will store information about the history of changes you make to the contents of this repository.

Changes

In order to understand how Git tracks changes, we need to actually make some changes first. Create a file at the root of the repository with some text in it:


$ echo "Hello, world!" > foo.txt
$ ls -a
./       ../      .git/    foo.txt

When you make changes to files in a repository and then run git, it compares the current state of the files with their state as of the last commit (more on commits in a moment). You can ask what changes it sees:


$ git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        foo.txt

nothing added to commit but untracked files present (use "git add" to track)

There's a lot of information here, but what we care about is the section starting with "Untracked files". Git is telling us that it sees a file (foo.txt) that hasn't yet had any changes recorded. Because it hasn't seen this file before, it's labelled as untracked. To record it in the repository's history, we need to create a commit.
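For a more compact view, git status --short (or -s) prints one line per file, with ?? marking untracked files. Here's a sketch in a throwaway repository that also shows Git noticing a file in a subfolder (notes/todo.txt is a made-up name). The --untracked-files=all option asks Git to list the individual files inside untracked folders rather than just the folder name:

```shell
set -e
cd "$(mktemp -d)"     # a throwaway folder
git init -q

echo "Hello, world!" > foo.txt
mkdir notes
echo "Remember to commit" > notes/todo.txt

# One line per file; "??" means untracked
git status --short --untracked-files=all
```

This should print ?? foo.txt and ?? notes/todo.txt, one per line.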

Commits

A commit is simply a bundle of changes, packaged up together and given a message to describe what those changes represent. This is the basic unit of organisation in Git's model of changes.

Let's create a commit for the change we made. The first thing to do is designate which changes to include in the commit:


$ git add foo.txt
$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   foo.txt

Now, the changes we made to foo.txt have been designated as "to be committed", more commonly referred to as "staged". Only changes that are staged will be included when we create a commit.

To be precise, the change that's been staged is the creation of the file foo.txt with its current contents.
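Staging records the file as it is at the moment you run git add. If you edit the file again afterwards, that new edit isn't staged until you run git add again. Here's a sketch in a throwaway repository. git diff --cached (also spelled --staged) shows what is staged, plain git diff shows changes that aren't, and git show :foo.txt prints the staged copy of the file:

```shell
set -e
cd "$(mktemp -d)"     # a throwaway folder
git init -q

echo "Hello, world!" > foo.txt
git add foo.txt                      # stage the file as it is right now
echo "A later edit." >> foo.txt      # change it again after staging

git diff --cached    # staged: creation of foo.txt containing only "Hello, world!"
git diff             # not staged: the added line "A later edit."
git show :foo.txt    # the staged copy of the file
```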

Now, let's create a commit to package up those changes and describe them:


$ git commit -m "Add foo.txt"
[master (root-commit) ffbbc5a] Add foo.txt
 1 file changed, 1 insertion(+)
 create mode 100644 foo.txt

We've used the -m flag to provide a message describing the commit. Convention dictates that commit messages are phrased in the imperative mood. In other words, say "add foo.txt", not "added foo.txt" or "adds foo.txt".

This makes more sense if you think about the history of commits in a repository as describing the steps required to go from an empty folder to one containing all of the files in their current state. From that perspective, each commit is an instruction to Git to perform one of those steps in the change history: so we tell Git to add foo.txt, rather than telling it that we added foo.txt. If you're interested in reading more about this convention, take a look at this blog post from 2008 by Tim Pope.

If we run git status again, we see that, now that all the changes have been committed, we have what's called a "clean working tree": there are no changes in any files that haven't been committed.


$ git status
On branch master
nothing to commit, working tree clean
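If you ever want to check for a clean working tree from a script, git status --porcelain is handier. It prints one line per changed or untracked file, in a format meant to stay the same across Git versions, so empty output means the working tree is clean. Here's a minimal sketch:

```shell
set -e
cd "$(mktemp -d)"     # a throwaway folder
git init -q
git config user.name "Example User"       # repository-local identity,
git config user.email user@example.com    # so the commit works anywhere

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# Empty porcelain output means there is nothing left to commit
if [ -z "$(git status --porcelain)" ]; then
    echo "working tree clean"
else
    echo "uncommitted changes"
fi
```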

Let's make a couple more commits. First, add a second file:


$ echo "See foo.txt for a message!" > bar.txt
$ git add bar.txt
$ git commit -m "Add some instructions"
[master 18e31b4] Add some instructions
 1 file changed, 1 insertion(+)
 create mode 100644 bar.txt

Now, make some changes to an existing file, and notice the different output of git status after you make the change:


$ echo "Here's another line." >> foo.txt
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   foo.txt

no changes added to commit (use "git add" and/or "git commit -a")

You'll notice this time that the changes are marked as "not staged for commit", rather than having files listed as "untracked". That's because Git is comparing the current state of our files to the last commit, and since foo.txt already existed in the last commit, it recognises that all we're doing is changing some lines inside the file.

If you want to see exactly what those changes are, run git diff, and you'll see something like this:


diff --git a/foo.txt b/foo.txt
index af5626b..bdfc7b8 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1 +1,2 @@
 Hello, world!
+Here's another line.

Now stage and commit this.


$ git add .
$ git commit -m "Update foo.txt"
[master ed1fa47] Update foo.txt
 1 file changed, 1 insertion(+)

You can pass git add the path to a folder instead of individual filenames to add all changes in that folder. In this case, I've used ., which is the current directory; since we're at the root of the repository, that adds all changes in the repository.
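Passing a folder path only picks up changes under that folder. Here's a sketch with one new file inside a docs/ folder and one outside it (both names are made up). git diff --cached --name-only lists just the names of the staged files:

```shell
set -e
cd "$(mktemp -d)"     # a throwaway folder
git init -q

mkdir docs
echo "Usage notes" > docs/usage.txt
echo "Scratch work" > scratch.txt

git add docs/                       # stage everything under docs/ only
git diff --cached --name-only       # prints: docs/usage.txt
git status --short                  # scratch.txt is still untracked (??)
```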

Branches

Now that we have a few commits, we can look at the history of this repository:


$ git log
commit ed1fa470d5d6707161d88672cd6230f2faabac92 (HEAD -> master)
Author: Søren Mortensen <soren@neros.dev>
Date:   Tue Oct 20 14:05:21 2020 +0100

    Update foo.txt

commit 18e31b4ced4f7f9ec055748b7a45f562000d2217
Author: Søren Mortensen <soren@neros.dev>
Date:   Tue Oct 20 14:03:23 2020 +0100

    Add some instructions

commit ffbbc5a3d42476a706895c974a4144406410bbfa
Author: Søren Mortensen <soren@neros.dev>
Date:   Tue Oct 20 14:02:18 2020 +0100

    Add foo.txt

Each commit is given a SHA-1 hash, computed from its contents, that uniquely identifies it.

A full SHA-1 hash is 40 hexadecimal characters long, but almost all the time the first 7 characters are more than enough to identify a specific commit within one repository. That's why both git and its users often use just that portion of the hash to refer to commits.
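git rev-parse will give you either form of a commit's hash. Here it's asked about HEAD, which for now you can read as "the commit you currently have checked out" (more on HEAD below). Your hashes will differ from mine because the author, date and message all feed into the hash. Here's a sketch:

```shell
set -e
cd "$(mktemp -d)"     # a throwaway folder
git init -q
git config user.name "Example User"
git config user.email user@example.com
echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

full=$(git rev-parse HEAD)            # the full hash: 40 hex characters
short=$(git rev-parse --short HEAD)   # the abbreviated form Git prints in its output
echo "$full"
echo "$short"
```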

Unfortunately, git log's output isn't a particularly helpful way to picture the history. Try something more like this:


+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+

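I've chosen to illustrate the commits this way because Git represents a repository's commit history internally as a linked list: each commit is a node that points to its parent (the commit before it). Strictly speaking, since a commit can have more than one parent, the history is a directed acyclic graph, but a linked list is close enough for now.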
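You can see these parent pointers for yourself. git cat-file -p prints the raw contents of a commit, including a parent line holding the hash of the commit before it. The suffix ~1 (as in HEAD~1) follows that pointer one step back. Here's a sketch that makes two commits:

```shell
set -e
cd "$(mktemp -d)"     # a throwaway folder
git init -q
git config user.name "Example User"
git config user.email user@example.com

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"
echo "See foo.txt for a message!" > bar.txt
git add bar.txt
git commit -q -m "Add some instructions"

git cat-file -p HEAD               # raw commit: tree, parent, author, committer, message
git log --format='%h  parent: %p'  # each commit next to its parent (the root has none)
```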

In fact, Git goes one step further, with the concept of a branch. But once you know that the commit history is stored as a linked list, branches are much easier to understand: a branch is just a particular portion of that linked list, stored as a pointer to its most recent commit.


                                 master
                                   |
                                   v
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
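Here's a sketch of that idea. git rev-parse master prints the hash that the master branch points to, and making another commit simply moves that pointer forward to the new commit, whose parent is the old one. It assumes Git 2.28 or later for git init -b, which is used here so the branch is called master whatever your default is:

```shell
set -e
cd "$(mktemp -d)"     # a throwaway folder
git init -q -b master # -b (Git 2.28+) names the first branch
git config user.name "Example User"
git config user.email user@example.com

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"
before=$(git rev-parse master)     # the commit master points to now

echo "Here's another line." >> foo.txt
git add foo.txt
git commit -q -m "Update foo.txt"
after=$(git rev-parse master)      # master has moved to the new commit

echo "master moved from $before to $after"
```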

Git calls the default branch master, and it's treated as the authoritative version of the codebase. But when new code is being written, it's important to be able to make changes and not have to worry about messing up the nice, clean version on master, so Git allows us to create other branches.

Although master is still the default name Git uses for this branch, increasingly often main is being used instead (in fact, there's nothing stopping you from designating any branch the main branch of your repo). This article by Scott Hanselman covers both the process of changing from master to main and the context behind this shift in terminology.
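If you'd like new repositories to start out with main, Git 2.28 and later let you choose the name for one repository with git init -b, or for every new repository with the init.defaultBranch setting (git config --global init.defaultBranch main). The sketch below only uses the per-repository flag, so it leaves your global configuration alone:

```shell
set -e
cd "$(mktemp -d)"     # a throwaway folder
git init -q -b main   # this repository's first branch will be called main

# There are no commits yet, but Git already knows which branch the first one goes on
git symbolic-ref --short HEAD     # prints: main
```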

If we run the following, we create a new branch that points to exactly the same commit as master (this commit is called the branch's head commit, in the same sense as the head of a linked list):


$ git checkout -b new-feature
Switched to a new branch 'new-feature'

This command is a shortcut: it both creates the new branch and switches to it, which is why the base command is git checkout (the command used for switching branches). If you wanted to do those steps separately, you could run git branch new-feature to create the branch, and then git checkout new-feature to switch to it.

Now our commit history looks like this:


                                master
                                   |
                                   v
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
                                   ^     
                                   |     
                              new-feature
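Here's the two-step version of creating and switching to a branch in a throwaway repository. It sits alongside git switch, a newer command (added in Git 2.23) dedicated to changing branches; its -c flag creates the branch first, like checkout -b does. git branch --show-current (Git 2.22 and later) prints the branch you're on. other-feature is a made-up name:

```shell
set -e
cd "$(mktemp -d)"     # a throwaway folder
git init -q -b master
git config user.name "Example User"
git config user.email user@example.com
echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

git branch new-feature          # step 1: create the branch (doesn't switch)
git branch --show-current       # still master
git checkout -q new-feature     # step 2: switch to it
git branch --show-current       # new-feature

git switch -q -c other-feature  # create and switch in one go, like checkout -b
git branch --show-current       # other-feature
```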

We can also add into the mix the concept of HEAD, which refers to whatever branch the repository is currently on. This is what determines the state of the files on disk; if you switch branches, Git actually changes the contents of the files to reflect that.

Because the command we just ran both created and switched to the new branch, our repository is in this state:


                                master
                                   |
                                   v
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
                                   ^
                                   |
                              new-feature <--- HEAD
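Normally HEAD refers to a branch rather than directly to a commit. git symbolic-ref HEAD shows which branch that is, and git rev-parse HEAD follows it through to the commit's hash. Here's a sketch:

```shell
set -e
cd "$(mktemp -d)"     # a throwaway folder
git init -q -b master
git config user.name "Example User"
git config user.email user@example.com
echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

git checkout -q -b new-feature
git symbolic-ref HEAD      # prints: refs/heads/new-feature
git rev-parse HEAD         # the hash of the commit new-feature points to
```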

Now, let's add some commits to this new branch! Create a new file or modify an existing one and commit it, as we did above.


$ echo "New feature 1" > new-feature.txt
$ git add .
$ git commit -m "Add a new feature"
[new-feature 1b8de27] Add a new feature
 1 file changed, 1 insertion(+)
 create mode 100644 new-feature.txt

The repository will now look something more like this:


                                master
                                   |
                                   v
+---------+    +---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |
+---------+    +---------+    +---------+    +---------+
                                                  ^
                                                  |
                                             new-feature <--- HEAD

If you switch back to master, you'll notice that your changes from new-feature aren't there.


$ git checkout master
Switched to branch 'master'
$ cat new-feature.txt
cat: new-feature.txt: No such file or directory

Merging

Of course, at some point, we will want to integrate those changes back in! First, though, let's add some more changes on another branch. Switch to master, then create a new branch called new-feature-2 and commit something to it.


$ git checkout master
Already on 'master'
$ git checkout -b new-feature-2
Switched to a new branch 'new-feature-2'
$ echo "New feature 2" > new-feature-2.txt
$ git add .
$ git commit -m "Add another new feature"
[new-feature-2 8d5f335] Add another new feature
 1 file changed, 1 insertion(+)
 create mode 100644 new-feature-2.txt

The repository will now be in this state:


                                master       new-feature
                                   |              |
                                   v              v
+---------+    +---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |
+---------+    +---------+    +---------+    +---------+
                                   ^
                                   |         +---------+
                                   +---------| 8d5f335 |
                                             +---------+
                                                  ^
                                                  |
                                            new-feature-2 <--- HEAD

Now we're ready to integrate the changes from new-feature back into master. Switch to the branch we're going to merge into (in this case master), and run git merge new-feature to merge new-feature into it.


$ git checkout master
Switched to branch 'master'
$ git merge new-feature
Updating ed1fa47..1b8de27
Fast-forward
 new-feature.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 new-feature.txt

Git has performed what's called a "fast-forward merge", which means that the only action needed to merge the changes from new-feature into master was to move the master pointer forward through the commit history so it points at the same commit as new-feature.


                                         new-feature, master <--- HEAD
                                                  |
                                                  v
+---------+    +---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |
+---------+    +---------+    +---------+    +---------+
                                   ^
                                   |         +---------+
                                   +---------| 8d5f335 |
                                             +---------+
                                                  ^
                                                  |
                                            new-feature-2
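A fast-forward is only possible when the branch you're on is an ancestor of the branch you're merging in, i.e. when you can reach it by following parent pointers back from the other branch. git merge-base --is-ancestor checks exactly that through its exit status, and git merge --ff-only merges only if a fast-forward is possible (and refuses otherwise). Here's a sketch:

```shell
set -e
cd "$(mktemp -d)"     # a throwaway folder
git init -q -b master
git config user.name "Example User"
git config user.email user@example.com
echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

git checkout -q -b new-feature
echo "New feature 1" > new-feature.txt
git add new-feature.txt
git commit -q -m "Add a new feature"
git checkout -q master

# Exit status 0 means master can be reached from new-feature, so a fast-forward works
if git merge-base --is-ancestor master new-feature; then
    git merge -q --ff-only new-feature   # just moves master forward
fi
```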

We can now delete the new-feature branch, since we're finished with it:


$ git branch --delete new-feature
Deleted branch new-feature (was 1b8de27).
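git branch --delete (or -d) is safe to use like this because it refuses to delete a branch whose commits haven't been merged into the current branch (or into the branch's upstream, if it has one); you'd have to use -D to force it. Here's a sketch with a made-up branch called experiment that is never merged:

```shell
set -e
cd "$(mktemp -d)"     # a throwaway folder
git init -q -b master
git config user.name "Example User"
git config user.email user@example.com
echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

git checkout -q -b experiment
echo "Half-finished idea" > idea.txt
git add idea.txt
git commit -q -m "Try an idea"
git checkout -q master

# experiment's commit isn't on master, so -d refuses (and prints an error)
if git branch -d experiment; then
    echo "deleted"
else
    echo "refused: experiment is not fully merged"
fi
```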

Now let's merge new-feature-2 into master.


$ git merge new-feature-2

At this point, a text editor will open. If you told Git which editor you'd like it to use earlier, then it should have opened a text file in that editor. This is the same behaviour that you'll see if you run git commit without the -m flag and a message; it is asking you to enter a commit message for a commit it is creating.

Because Git is responsible for the creation of this new commit, it has helpfully autogenerated a message for you, like this:


Merge branch 'new-feature-2'
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.

That message should be fine, so all you need to do is save and close the file.

If you didn't set an editor, the editor that opened is probably vim. vi and its descendants are a whole other can of worms, so because this is a crash course, I'll tell you only what you need to know: type a colon (:), enter wq into the line at the bottom of the screen that opens, and press enter. This will save the file and quit.

If you pressed any other keys before trying this, you might have to press the escape key a few times before :wq.
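If you'd rather skip the editor entirely, git merge --no-edit creates the merge commit with the generated message straight away. Here's a sketch in a throwaway repository where master and a branch have both moved on since they split:

```shell
set -e
cd "$(mktemp -d)"     # a throwaway folder
git init -q -b master
git config user.name "Example User"
git config user.email user@example.com
echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# Commit on a new branch...
git checkout -q -b new-feature-2
echo "New feature 2" > new-feature-2.txt
git add new-feature-2.txt
git commit -q -m "Add another new feature"

# ...and on master too, so the two histories diverge
git checkout -q master
echo "New feature 1" > new-feature.txt
git add new-feature.txt
git commit -q -m "Add a new feature"

git merge -q --no-edit new-feature-2   # merge commit, generated message, no editor
git log -1 --format=%s                 # show the message Git wrote
```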

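Once you've saved and closed the file, Git will spit out a message that looks something like this (Git 2.34 and later name the 'ort' strategy here instead of 'recursive'):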


Merge made by the 'recursive' strategy.
 new-feature-2.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 new-feature-2.txt

So why did that merge play out in such a different way than the last one? Well, take a look at the state of the repository after the merge:


                                                       HEAD ---> master
                                                                    |
                                                                    v
+---------+    +---------+    +---------+    +---------+       +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |<--+---| 7d90c55 |
+---------+    +---------+    +---------+    +---------+   |   +---------+
                                   ^                       |
                                   |         +---------+   |
                                   +---------| 8d5f335 |<--+
                                             +---------+
                                                  ^
                                                  |
                                            new-feature-2

Because we merged in a branch whose history had previously diverged from that of the current branch, Git has been forced to create a new commit, called a "merge commit", to combine the two sets of changes. The merge commit, unlike the other commits we've seen so far, has two parents (the heads of the two diverged branches we merged).
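You can check the two parents yourself. HEAD^1 and HEAD^2 name the first and second parents of the current commit, and git log --format='%h %p' prints a commit's abbreviated hash followed by its parents. Here's a sketch that recreates the diverged history above in a throwaway repository; commit_file is a made-up helper that keeps it short:

```shell
set -e
cd "$(mktemp -d)"     # a throwaway folder
git init -q -b master
git config user.name "Example User"
git config user.email user@example.com

# Hypothetical helper: write a file, stage it and commit it
commit_file() {
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit_file foo.txt "Hello, world!" "Add foo.txt"
git checkout -q -b new-feature
commit_file new-feature.txt "New feature 1" "Add a new feature"
git checkout -q master
git checkout -q -b new-feature-2
commit_file new-feature-2.txt "New feature 2" "Add another new feature"

git checkout -q master
git merge -q new-feature               # fast-forward
git merge -q --no-edit new-feature-2   # histories diverged: merge commit

git rev-parse 'HEAD^1' 'HEAD^2'        # first parent, then second parent
git log -1 --format='%h %p'            # merge commit followed by both parents
```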

There are ways to avoid merge commits (look into git rebase), but you should become more familiar with Git before attempting to mess around with them, and the reasons for doing this are mostly aesthetic.

Now that we're finished with new-feature-2, delete it too.


$ git branch --delete new-feature-2
Deleted branch new-feature-2 (was 8d5f335).

Prettier Logging

When you don't have someone like me around to draw ASCII art of your branches all day, it's nice to have a way of visualising your commit history that reflects this structure a bit better. Feel free to run the following git config command:


$ git config --global alias.hist "log --graph --pretty=format:'%C(magenta)%h%Creset - %G?%C(red)%d%Creset %s %C(dim green)(%cr) %C(cyan)<%an>%Creset' --abbrev-commit"

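This will allow you to run git hist in place of git log. Commit histories will then include a visual representation of branches. The G after each hash comes from the %G? placeholder, which shows whether a commit has a valid signature; my commits are signed, but if yours aren't, you'll see N there instead: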


*   7d90c55 - G (HEAD -> master) Merge branch 'new-feature-2' (6 minutes ago) <Søren Mortensen>
|\
| * 8d5f335 - G Add another new feature (23 minutes ago) <Søren Mortensen>
* | 1b8de27 - G Add a new feature (27 minutes ago) <Søren Mortensen>
|/
* ed1fa47 - G Update foo.txt (2 days ago) <Søren Mortensen>
* 18e31b4 - G Add some instructions (2 days ago) <Søren Mortensen>
* ffbbc5a - G Add foo.txt (2 days ago) <Søren Mortensen>

Conclusion

Remember what I said at the beginning about the two main purposes of version control?

Well, congratulations! If you've gotten this far, you've learned how to use Git to do the first of those two things. Give yourself a pat on the back.

I'm going to cover the second point in part 2 of this post, coming soon.