Git Crash Course, Part 1
- 22 October 2020
I have few frustrations with the way my computer science degree is taught, but my biggest by far has been the conspicuous lack of practical skills such as Git.
I understand this is partly by design, because it's a computer science degree and not a software engineering degree, but only 1 or 2 months in, you start being assigned group projects that pretty much require version control. I've also heard this complaint about computer science degrees before, so with that in mind, here's a crash course.
Goals & Assumptions
My goal with this crash course is to get you to the point where you can contribute to a pair or group project effectively. I'm not aiming for you to understand the inner workings of Git, but it's important that I explain enough for you to build a basic mental model.
I'm going to assume that you understand how to navigate in the terminal. At some point, I'll write a tutorial on this too and link to it here, but for now you'll have to fend for yourself (sorry).
If you're on Windows, I'm going to use Unix terminology and assume that you can translate as needed; sorry about that, but I'm not well-versed enough with the Windows developer ecosystem to do that. As far as I understand, if you're using Git Bash or WSL, you don't need to worry about this.
I'm very open to patches or emails that correct this paragraph if needed.
Version control has two main purposes:
- Keeping track of changes that are made to files.
- Integrating together changes made by multiple people.
Git is the version control software that's most commonly used by software developers, and with good reason; when Git came around, it was able to replace a lot of other tools that are now widely regarded as having been much harder to use and more error-prone.
Lots of other software that you might have used has version control features to a greater or lesser extent (e.g. Google Docs and Microsoft Word). Google Docs is a particularly visible example: multiple people can edit the same document at the same time and all of their changes are kept, and it does have a version history.
At this point, you might be thinking that another piece of software to do this is unnecessary, or an inconvenience. You might be thinking that you've been getting by just fine until now. In that case, here are just a few of the reasons why I think learning Git is worth the effort:
- It provides a version history. If something breaks, you can search back through that history easily and find out exactly what change broke it.
- It can merge changes made by multiple people. Once you've learned how to use basic Git, there's almost no extra effort required to learn how to merge changes, which, in most cases, can be done without requiring you to do any manual editing.
- It stores authorship information. If you were taught to add comments to the top of files with your name in them, guess what? You won't have to do that anymore. Git can tell you who authored every single line of every file.
- It's an industry standard. Most software engineering jobs will require you to know how to use it, or will train you on it immediately, because it's considered essential.
Personally, if I applied for a job and they didn't consider Git essential and require me to know how to use it, I would consider that a red flag.
With that out of the way, let's start by installing git. Once you've done that, make sure you configure your authorship information:
$ git config --global user.name "Your Name"
$ git config --global user.email firstname.lastname@example.org
You might also want to configure which editor Git should use. The default (if you haven't set up a default editor with $EDITOR) is probably vim, which could be frustrating if you don't know how to use it. For example, you could set the editor to Visual Studio Code:
$ git config --global core.editor "code --wait"
You can find more information about first-time setup here, and information about setting your text editor here.
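If you want to double-check what Git has recorded, you can read a setting back with git config --get. Here's a minimal sketch of that. To avoid touching your real settings, it assumes a throwaway repository and sets the values for that repository only (no --global); the name and email are placeholders.

```shell
set -e
# Work in a throwaway repository so nothing here touches your real settings.
repo="$(mktemp -d)"
cd "$repo"
git init -q .

# Without --global, a setting applies to this repository only.
git config user.name "Example User"
git config user.email "user@example.org"

# Read the values back to confirm what Git will record as the author.
configured_name="$(git config --get user.name)"
configured_email="$(git config --get user.email)"
echo "$configured_name <$configured_email>"
```

Running the same two --get commands with --global added should show you the values you set above.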
Git works by designating a particular folder as a "repository", which means that it will track changes to any files in that folder or its children.
I'll put commands throughout this article that you should run to follow along. To start, create a folder, move inside, and designate it as a Git repository:
$ mkdir git-crash-course
$ cd git-crash-course
$ git init .
Initialized empty Git repository in /Users/soren/src/localhost/git-crash-course/.git/
git has created a folder called .git inside git-crash-course/. This is where it will store information about the history of changes you've made to the contents of this repository.
In order to understand how Git tracks changes, we need to actually make some changes first. Create a file at the root of the repository with some text in it:
$ echo "Hello, world!" > foo.txt
$ ls -a
./ ../ .git/ foo.txt
When you make changes to a file in a repository and then run git, it compares the state of the files with their state at a specific point in the past. You can ask about what changes it sees:
$ git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        foo.txt

nothing added to commit but untracked files present (use "git add" to track)
There's a lot of information here, but what we care about is the section starting with "Untracked files". Git is telling us that it sees a file (foo.txt) that hasn't yet had any changes recorded. Because it hasn't seen this file before, it's labelled as untracked. To start tracking it, we need to create a commit.
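If the full output feels noisy, git status also has a condensed form. This is a small sketch in a throwaway repository (the temporary folder is my assumption, not part of the walkthrough), showing how an untracked file is reported.

```shell
set -e
# Throwaway repository, separate from the one used in the walkthrough.
repo="$(mktemp -d)"
cd "$repo"
git init -q .

echo "Hello, world!" > foo.txt

# --short prints one line per file; "??" marks a file Git isn't tracking yet.
short_status="$(git status --short)"
echo "$short_status"   # prints: ?? foo.txt
```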
A commit is simply a bundle of changes, packaged up together and given a message to describe what those changes represent. This is the basic unit of organisation in Git's model of changes.
Let's create a commit for the change we made. The first thing to do is designate which changes to include in the commit:
$ git add foo.txt
$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   foo.txt
Now, the changes we made to foo.txt have been designated as "to be committed", more commonly referred to as "staged". Only changes that are staged will be included when we create a commit.
To be precise, the change that's been staged is the creation of the file foo.txt with its current contents.
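To make the "only staged changes are included" point concrete, here's a rough sketch in a throwaway repository. The second file, scratch.txt, is a hypothetical extra that I deliberately leave unstaged.

```shell
set -e
# Throwaway repository, separate from the one used in the walkthrough.
repo="$(mktemp -d)"
cd "$repo"
git init -q .

echo "Hello, world!" > foo.txt
echo "Not ready yet" > scratch.txt   # hypothetical file we choose not to stage

git add foo.txt

# List the files whose changes are staged for the next commit.
staged_files="$(git diff --cached --name-only)"
echo "$staged_files"

# scratch.txt is still untracked, so a commit made now would not include it.
scratch_status="$(git status --short -- scratch.txt)"
echo "$scratch_status"
```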
Now, let's create a commit to package up those changes and describe them:
$ git commit -m "Add foo.txt"
[master (root-commit) ffbbc5a] Add foo.txt
 1 file changed, 1 insertion(+)
 create mode 100644 foo.txt
We've used the -m flag to provide a message describing the commit. Convention dictates that commit messages are phrased in the imperative mood. In other words, say "add foo.txt", not "added foo.txt" or "adds foo.txt".
This makes more sense if you think about the history of commits in a repository as describing the steps required to go from an empty folder to one containing all of the files in their current state. From that perspective, each commit is an instruction to Git to perform one of those steps in the change history: so we tell Git to add foo.txt, rather than telling it that we added foo.txt. If you're interested in reading more about this convention, take a look at this blog post from 2008 by Tim Pope.
If we run git status again, we see that, now that all the changes have been committed, we have what's called a "clean working tree": there aren't any changes in any files that haven't been committed.
$ git status
On branch master
nothing to commit, working tree clean
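If you ever need to check for a clean working tree from a script, one common approach is to look for empty output from git status --porcelain. This is a sketch under my own assumptions: a throwaway repository, a placeholder identity, and a hypothetical notes.txt used to make the tree dirty again.

```shell
set -e
# Throwaway repository with a placeholder identity so the commit succeeds.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.org"

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# --porcelain prints one line per changed or untracked file,
# so empty output means the working tree is clean.
if [ -z "$(git status --porcelain)" ]; then
    state_after_commit="clean"
else
    state_after_commit="dirty"
fi
echo "$state_after_commit"

# Any new or modified file makes the tree dirty again.
echo "scratch" > notes.txt
if [ -z "$(git status --porcelain)" ]; then
    state_after_edit="clean"
else
    state_after_edit="dirty"
fi
echo "$state_after_edit"
```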
Let's make a couple more commits. First, add a second file:
$ echo "See foo.txt for a message!" > bar.txt
$ git add bar.txt
$ git commit -m "Add some instructions"
[master 18e31b4] Add some instructions
 1 file changed, 1 insertion(+)
 create mode 100644 bar.txt
Now, make some changes to an existing file, and notice the different output of git status after you make the change:
$ echo "Here's another line." >> foo.txt
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   foo.txt

no changes added to commit (use "git add" and/or "git commit -a")
You'll notice this time that the changes are marked as "not staged for commit", rather than having files listed as "untracked". That's because Git is comparing the current state of our files to the last commit, and since foo.txt already existed in the last commit, it recognises that all we're doing is changing some lines inside the file.
If you want to see exactly what those changes are, run git diff, and you'll see something like:
diff --git a/foo.txt b/foo.txt
index af5626b..bdfc7b8 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1 +1,2 @@
 Hello, world!
+Here's another line.
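One thing that can be surprising: plain git diff only shows changes that haven't been staged. Here's a rough sketch in a throwaway repository (placeholder identity, my assumption) showing where a change "moves" once it's staged.

```shell
set -e
# Throwaway repository with a placeholder identity so the commit succeeds.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.org"

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

echo "Here's another line." >> foo.txt

# Before staging: the change shows up in plain `git diff`.
unstaged_before="$(git diff --name-only)"

git add foo.txt

# After staging: plain `git diff` is empty, and the change
# appears under `git diff --staged` instead.
unstaged_after="$(git diff --name-only)"
staged_after="$(git diff --staged --name-only)"

echo "unstaged before add: $unstaged_before"
echo "unstaged after add:  $unstaged_after"
echo "staged after add:    $staged_after"
```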
Now stage and commit this.
$ git add .
$ git commit -m "Update foo.txt"
[master ed1fa47] Update foo.txt
 1 file changed, 1 insertion(+)
You can pass git add a path to a folder instead of individual filenames to add all changes in the folder at that path. In this case, I've used ., which is the current directory, to just add all changes in the repository.
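As a small illustration of passing a folder rather than ., here's a sketch in a throwaway repository. The docs/ folder and both filenames are hypothetical, made up for this example.

```shell
set -e
# Throwaway repository, separate from the one used in the walkthrough.
repo="$(mktemp -d)"
cd "$repo"
git init -q .

# Hypothetical layout: one file inside docs/, one at the top level.
mkdir docs
echo "How to use this project" > docs/guide.txt
echo "Unrelated notes" > notes.txt

# Stage only what lives under docs/.
git add docs/

staged_files="$(git diff --cached --name-only)"
notes_status="$(git status --short -- notes.txt)"

echo "$staged_files"   # only the file under docs/ is staged
echo "$notes_status"   # notes.txt is still untracked
```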
Now that we have a few commits, we can look at the history of this repository:
$ git log
commit ed1fa470d5d6707161d88672cd6230f2faabac92 (HEAD -> master)
Author: Søren Mortensen <email@example.com>
Date:   Tue Oct 20 14:05:21 2020 +0100

    Update foo.txt

commit 18e31b4ced4f7f9ec055748b7a45f562000d2217
Author: Søren Mortensen <firstname.lastname@example.org>
Date:   Tue Oct 20 14:03:23 2020 +0100

    Add some instructions

commit ffbbc5a3d42476a706895c974a4144406410bbfa
Author: Søren Mortensen <email@example.com>
Date:   Tue Oct 20 14:02:18 2020 +0100

    Add foo.txt
Each commit is given a SHA-1 hash, computed from its contents, that uniquely identifies it.
Almost all the time, though, the first 7 characters are more than enough to identify a specific commit within the context of one repository, which is why both git and its users often just use that portion of the hash to refer to commits.
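Here's a minimal sketch of the relationship between the full and abbreviated hashes, using git rev-parse in a throwaway repository with a placeholder identity (both my assumptions). The actual hash values will differ on your machine.

```shell
set -e
# Throwaway repository with a placeholder identity so the commit succeeds.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.org"

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# The full hash of the current commit, and an abbreviation of at least 7 characters.
full_hash="$(git rev-parse HEAD)"
short_hash="$(git rev-parse --short=7 HEAD)"

# The abbreviation can be used anywhere a commit is expected;
# here Git expands it back to the full hash.
resolved_hash="$(git rev-parse --verify "$short_hash")"

echo "full:     $full_hash"
echo "short:    $short_hash"
echo "resolved: $resolved_hash"
```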
Unfortunately, this isn't a particularly helpful representation. Try something more like this:
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
The reason I've chosen to illustrate the commits this way is because Git represents a repository's commit history internally as a linked list: each commit is a node that points to its parent (the commit before it).
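You can see those parent pointers for yourself. This is a rough sketch in a throwaway repository (placeholder identity and three small commits, all my assumptions) that asks Git for the parent of the newest commit and of the very first one.

```shell
set -e
# Throwaway repository with a placeholder identity so the commits succeed.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.org"

# Build a three-commit history.
echo "one" > file.txt
git add file.txt
git commit -q -m "Add file.txt"
echo "two" >> file.txt
git commit -q -a -m "Add a second line"
echo "three" >> file.txt
git commit -q -a -m "Add a third line"

third="$(git rev-parse HEAD)"
second="$(git rev-parse HEAD~1)"   # one step back along the parent pointers
first="$(git rev-parse HEAD~2)"    # two steps back

# %P in a log format prints the hash(es) of a commit's parent(s).
parent_of_third="$(git log -1 --pretty=format:%P "$third")"
parent_of_first="$(git log -1 --pretty=format:%P "$first")"

echo "parent of newest commit: $parent_of_third"
echo "parent of first commit:  '$parent_of_first' (empty: it has no parent)"
```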
In fact, Git goes one step further, with the concept of a branch. But once you know that the commit history is stored as a linked list, branches are much easier to understand: a branch is just a particular portion of that linked list, stored as a pointer to the most recent child node.
                                master
                                   |
                                   v
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
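To see that a branch really is just a movable pointer, here's a small sketch in a throwaway repository with a placeholder identity. Since the default branch might be called master or main depending on your setup, the sketch asks Git for its name instead of assuming one.

```shell
set -e
# Throwaway repository with a placeholder identity so the commits succeed.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.org"

# Ask Git what the default branch is called rather than assuming a name.
branch="$(git symbolic-ref --short HEAD)"

echo "one" > file.txt
git add file.txt
git commit -q -m "Add file.txt"

# The branch name resolves to the hash of the commit it points at.
tip_before="$(git rev-parse "$branch")"

echo "two" >> file.txt
git commit -q -a -m "Add a second line"

# After another commit, the same name points at the new commit,
# and the old tip is now that commit's parent.
tip_after="$(git rev-parse "$branch")"
parent_of_tip="$(git rev-parse "${branch}~1")"

echo "before: $tip_before"
echo "after:  $tip_after"
```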
Git calls the default branch master, and it's treated as the authoritative version of the codebase. But when new code is being written, it's important to be able to make changes and not have to worry about messing up the nice, clean version on master, so Git allows us to create other branches.
While master is still the default name Git uses for this branch, increasingly often main is being used instead (in fact, there's nothing stopping you from designating any branch the main branch of your repo). This article by Scott Hanselman covers both the process of changing from master to main and the context behind this shift in terminology.
If we run the following, we create a new branch, pointing to exactly the same commit (called its head commit, in exactly the same sense as the head of a linked list):
$ git checkout -b new-feature
Switched to a new branch 'new-feature'
This is a little bit of a shortcut way of doing this; that command both creates and switches to the new branch, which is why the base command is git checkout (the command used for switching branches). If you wanted to do those steps separately, you could run git branch new-feature to create it, and then git checkout new-feature to switch to it.
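Here's the two-step version as a minimal sketch, run in a throwaway repository with a placeholder identity (my assumptions), so you can see that creating a branch doesn't switch to it.

```shell
set -e
# Throwaway repository with a placeholder identity so the commit succeeds.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.org"

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# Remember which branch we start on (master or main, depending on your setup).
default_branch="$(git symbolic-ref --short HEAD)"

# Step 1: create the branch. We stay on the branch we were already on.
git branch new-feature
branch_after_create="$(git symbolic-ref --short HEAD)"

# Step 2: switch to it.
git checkout -q new-feature
branch_after_switch="$(git symbolic-ref --short HEAD)"

echo "after 'git branch':   $branch_after_create"
echo "after 'git checkout': $branch_after_switch"
```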
Now our commit history looks like this:
                                master
                                   |
                                   v
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
                                   ^
                                   |
                              new-feature
We can also add into the mix the concept of HEAD, which refers to whatever branch the repository is currently on. This is what determines the state of the files on disk; if you switch branches, Git actually changes the contents of the files to reflect that.
Because the command we just ran both created and switched to the new branch, our repository is in this state:
                                master
                                   |
                                   v
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
                                   ^
                                   |
                              new-feature <--- HEAD
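If you're curious where HEAD lives, you can ask Git directly. This is a rough sketch in a throwaway repository with a placeholder identity; the peek at the .git/HEAD file assumes Git's default on-disk storage format.

```shell
set -e
# Throwaway repository with a placeholder identity so the commit succeeds.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.org"

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

git checkout -q -b new-feature

# HEAD is a symbolic reference: it names the branch we are currently on.
head_target="$(git symbolic-ref HEAD)"
echo "$head_target"   # prints: refs/heads/new-feature

# With the default storage format, this is also visible as a plain text file.
cat .git/HEAD
```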
Now, let's add some commits to this new branch! Create a new file or modify an existing one and commit it, as we did above.
$ echo "New feature 1" > new-feature.txt
$ git add .
$ git commit -m "Add a new feature"
[new-feature 1b8de27] Add a new feature
 1 file changed, 1 insertion(+)
 create mode 100644 new-feature.txt
The repository will now look something more like this:
                                master
                                   |
                                   v
+---------+    +---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |
+---------+    +---------+    +---------+    +---------+
                                                  ^
                                                  |
                                             new-feature <--- HEAD
If you switch back to master, you'll notice that your changes from new-feature aren't there.
$ git checkout master
Switched to branch 'master'
$ cat new-feature.txt
cat: new-feature.txt: No such file or directory
Of course, at some point, we will want to integrate those changes back in! First, though, let's add some more changes on another branch. Switch to master, then create a new branch called new-feature-2 and commit something to it.
$ git checkout master
Already on 'master'
$ git checkout -b new-feature-2
Switched to a new branch 'new-feature-2'
$ echo "New feature 2" > new-feature-2.txt
$ git add .
$ git commit -m "Add another new feature"
[new-feature-2 8d5f335] Add another new feature
 1 file changed, 1 insertion(+)
 create mode 100644 new-feature-2.txt
The repository will now be in this state:
                                master       new-feature
                                   |              |
                                   v              v
+---------+    +---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |
+---------+    +---------+    +---------+    +---------+
                                   ^
                                   |         +---------+
                                   +---------| 8d5f335 |
                                             +---------+
                                                  ^
                                                  |
                                            new-feature-2 <--- HEAD
Now we're ready to integrate the changes from new-feature back into master. Switch to the branch we're going to merge into (in this case master), and run git merge new-feature to merge new-feature into it.
$ git checkout master
Switched to branch 'master'
$ git merge new-feature
Updating ed1fa47..1b8de27
Fast-forward
 new-feature.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 new-feature.txt
Git will perform what's called a "fast-forward merge", which means that the only action needed to merge the changes from new-feature into master was to move the master pointer forward through the commit history so it points at the same commit as new-feature:
                                        new-feature, master <--- HEAD
                                                  |
                                                  v
+---------+    +---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |
+---------+    +---------+    +---------+    +---------+
                                   ^
                                   |         +---------+
                                   +---------| 8d5f335 |
                                             +---------+
                                                  ^
                                                  |
                                            new-feature-2
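A fast-forward is only possible when the branch you're on is an ancestor of the branch you're merging in. Here's a minimal sketch of checking that and then insisting on a fast-forward with --ff-only, in a throwaway repository with a placeholder identity (my assumptions). The default branch name is looked up rather than assumed.

```shell
set -e
# Throwaway repository with a placeholder identity so the commits succeed.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.org"

default_branch="$(git symbolic-ref --short HEAD)"

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# Add one commit on a feature branch, then go back to the default branch.
git checkout -q -b new-feature
echo "New feature 1" > new-feature.txt
git add new-feature.txt
git commit -q -m "Add a new feature"
git checkout -q "$default_branch"

# The default branch is an ancestor of new-feature, so nothing has diverged.
if git merge-base --is-ancestor "$default_branch" new-feature; then
    can_fast_forward="yes"
else
    can_fast_forward="no"
fi
echo "fast-forward possible: $can_fast_forward"

# --ff-only makes the merge fail instead of creating a merge commit.
git merge -q --ff-only new-feature

# Count merge commits in the history: a fast-forward creates none.
merge_commits="$(git rev-list --count --merges HEAD)"
echo "merge commits: $merge_commits"
```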
We can now delete the new-feature branch, since we're finished with it:
$ git branch --delete new-feature
Deleted branch new-feature (was 1b8de27).
Now let's merge new-feature-2:
$ git merge new-feature-2
At this point, a text editor will open. If you told Git which editor you'd like it to use earlier, then it should have opened a text file in that editor. This is the same behaviour that you'll see if you run git commit without the -m flag and a message; it is asking you to enter a commit message for a commit it is creating.
Because Git is responsible for the creation of this new commit, it has helpfully autogenerated a message for you, like this:
Merge branch 'new-feature-2' into master

# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
That message should be fine, so all you need to do is save and close the file.
If you didn't set an editor, the editor that opened is probably vim. vi and its descendants are a whole other can of worms, so because this is a crash course, I'll tell you only what you need to know: type a colon (:), type wq into the line at the bottom of the screen that opens, and press enter. This will save the file and quit.
If you pressed any other keys before trying this, you might have to press the escape key a few times before typing the colon.
After that, Git will spit out a message that looks something like this:
Merge made by the 'recursive' strategy.
 new-feature-2.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 new-feature-2.txt
So why did that merge play out in such a different way than the last one? Well, take a look at the state of the repository after the merge:
                                                       HEAD ---> master
                                                                    |
                                                                    v
+---------+    +---------+    +---------+    +---------+       +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |<--+---| 7d90c55 |
+---------+    +---------+    +---------+    +---------+   |   +---------+
                                   ^                       |
                                   |         +---------+   |
                                   +---------| 8d5f335 |<--+
                                             +---------+
                                                  ^
                                                  |
                                            new-feature-2
Because we merged in a branch whose history had previously diverged from that of the current branch, Git has been forced to create a new commit, called a "merge commit", to integrate together the two sets of changes. The merge commit, unlike the other commits we've seen so far, has two parents (the heads of the two diverged branches we merged).
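You can confirm the "two parents" claim yourself. This sketch rebuilds a small diverged history in a throwaway repository (placeholder identity, default branch name looked up rather than assumed) and uses --no-edit so the merge accepts the autogenerated message without opening an editor.

```shell
set -e
# Throwaway repository with a placeholder identity so the commits succeed.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.org"

default_branch="$(git symbolic-ref --short HEAD)"

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# Two branches that both start from the same commit and then diverge.
git checkout -q -b new-feature
echo "New feature 1" > new-feature.txt
git add new-feature.txt
git commit -q -m "Add a new feature"

git checkout -q "$default_branch"
git checkout -q -b new-feature-2
echo "New feature 2" > new-feature-2.txt
git add new-feature-2.txt
git commit -q -m "Add another new feature"

git checkout -q "$default_branch"
git merge -q new-feature               # fast-forward, no new commit
git merge -q --no-edit new-feature-2   # histories diverged: creates a merge commit

# %P prints the parent hashes of the newest commit, separated by spaces.
parents="$(git log -1 --pretty=format:%P)"

# Split the list on whitespace to count the parents.
set -- $parents
parent_count=$#
first_parent="$1"
second_parent="$2"

echo "merge commit has $parent_count parents"
```

The first parent is the commit the current branch was on before the merge, and the second is the head of the branch that was merged in.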
There are ways to avoid merge commits (look into git rebase), but you should become more familiar with Git before attempting to mess around with them, and the reasons for doing this are mostly aesthetic.
Now that we're finished with new-feature-2, delete it too.
$ git branch --delete new-feature-2
Deleted branch new-feature-2 (was 8d5f335).
When you don't have someone like me around to draw ASCII art of your branches all day, it's nice to have a way of visualising your commit history that reflects this structure a bit better. Feel free to run the following git config command:
$ git config --global alias.hist "log --graph --pretty=format:'%C(magenta)%h%Creset - %G?%C(red)%d%Creset %s %C(dim green)(%cr) %C(cyan)<%an>%Creset' --abbrev-commit"
This will allow you to run git hist in place of git log. Commit histories will then include a visual representation of branches:
*   7d90c55 - G (HEAD -> master) Merge branch 'new-feature-2' (6 minutes ago) <Søren Mortensen>
|\
| * 8d5f335 - G Add another new feature (23 minutes ago) <Søren Mortensen>
* | 1b8de27 - G Add a new feature (27 minutes ago) <Søren Mortensen>
|/
* ed1fa47 - G Update foo.txt (2 days ago) <Søren Mortensen>
* 18e31b4 - G Add some instructions (2 days ago) <Søren Mortensen>
* ffbbc5a - G Add foo.txt (2 days ago) <Søren Mortensen>
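If you'd rather try the graph view before committing to an alias, the --graph flag works on its own too. Here's a stripped-down sketch with a much simpler format string than the one above (just the short hash and subject), run in a throwaway repository with a placeholder identity (my assumptions).

```shell
set -e
# Throwaway repository with a placeholder identity so the commits succeed.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.org"

default_branch="$(git symbolic-ref --short HEAD)"

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# Two diverging branches, merged back in one after the other.
git checkout -q -b new-feature
echo "New feature 1" > new-feature.txt
git add new-feature.txt
git commit -q -m "Add a new feature"

git checkout -q "$default_branch"
git checkout -q -b new-feature-2
echo "New feature 2" > new-feature-2.txt
git add new-feature-2.txt
git commit -q -m "Add another new feature"

git checkout -q "$default_branch"
git merge -q new-feature
git merge -q --no-edit new-feature-2

# --graph draws the branch structure; each commit is marked with an asterisk.
graph="$(git log --graph --pretty=format:'%h %s')"
echo "$graph"

# Count the commits drawn in the graph (one asterisk per commit).
commit_lines="$(printf '%s\n' "$graph" | grep -c '\*')"
```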
Remember what I said at the beginning about the two main purposes of version control?
- Keeping track of changes that are made to files.
- Integrating together changes made by multiple people.
Well, congratulations! If you've gotten this far, you've learned how to use Git to do the first of those two things. Give yourself a pat on the back.
I'm going to cover the second point in part 2 of this post, coming soon.