Git Crash Course, Part 1
- 22 October 2020
I have few frustrations with the way my computer science degree is taught, but my biggest by far has been the conspicuous lack of practical skills such as Git.
I understand this is partly by design, because it's a computer science degree and not a software engineering degree, but only 1 or 2 months in, you start being assigned group projects that pretty much require version control. I've also heard this complaint about computer science degrees before, so with that in mind, here's a crash course.
Goals & Assumptions
My goal with this crash course is to get you to the point where you can contribute to a pair or group project effectively. I'm not aiming for you to understand the inner workings of Git, but it's important that I explain enough for you to build a basic mental model.
- I'm going to assume that you understand how to navigate in the terminal (using commands such as cd and ls). At some point, I'll write a tutorial on this too and link to it here, but for now you'll have to fend for yourself (sorry).
- If you're on Windows, I'm going to use Unix terminology and assume that you can translate as needed; sorry about that, but I'm not well-versed enough with the Windows developer ecosystem to do that. As far as I understand, if you're using Git Bash or WSL, you don't need to worry about this.
I'm very open to patches or emails that correct this paragraph if needed.
Version Control
What
Version control has two main purposes:
- Keeping track of changes that are made to files.
- Integrating together changes made by multiple people.
Git is the version control software that's most commonly used by software developers, and with good reason; when Git came around, it was able to replace a lot of other tools that are now widely regarded as having been much harder to use and more error-prone.
Lots of other software that you might have used has version control features, to more or less of an extent (e.g. Google Docs and Microsoft Word). Google Docs is a particularly visible example; multiple people can edit the same document at the same time and all of their changes are kept, and it does have a version history.
Why
At this point, you might be thinking that another piece of software to do this is unnecessary, or an inconvenience. You might be thinking that you've been getting by just fine until now. In that case, here are just a few of the reasons why I think learning Git is worth the effort:
- It provides a version history. If something breaks, you can search back through that history easily and find out exactly what change broke it.
- It can merge changes made by multiple people. Once you've learned how to use basic Git, there's almost no extra effort required to learn how to merge changes, which, in most cases, can be done without requiring you to do any manual editing.
- It stores authorship information. If you were taught to add comments to the top of files with your name in them, guess what? You won't have to do that anymore. Git can tell you who authored every single line of every file.
- It's an industry standard. Most software engineering jobs will require you to know how to use it, or will train you on it immediately, because it's considered essential.
Personally, if I applied for a job and they didn't consider Git essential and require me to know how to use it, I would consider that a red flag.
Getting Started
With that out of the way, let's start by installing git. Once you've done that, make sure you configure your authorship information:
$ git config --global user.name "Your Name"
$ git config --global user.email youremail@example.com
You might also want to configure which editor Git should use. The default (if you haven't set up a default editor with $EDITOR) is probably vim, which could be frustrating if you don't know how to use it. For example, you could set the editor to Visual Studio Code:
$ git config --global core.editor "code --wait"
You can find more information about first-time setup here, and information about setting your text editor here.
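If you want to check what Git has recorded, git config --get prints the value stored for a single key. Here's a minimal sketch, assuming a throwaway repository and repository-local settings (the same commands without --global) so that it doesn't touch your real configuration; the name and email are placeholders.

```shell
# Work in a throwaway repository so your real settings are left alone.
repo="$(mktemp -d)"
cd "$repo"
git init -q .

# Without --global, a setting applies to this repository only.
git config user.name "Your Name"
git config user.email youremail@example.com

# --get prints the stored value for one key.
configured_name="$(git config --get user.name)"
configured_email="$(git config --get user.email)"
echo "$configured_name <$configured_email>"
```

To check your real settings, run the same --get commands with --global added.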
Repositories
Git works by designating a particular folder as a "repository", which means that it will track changes to any files in that folder or its children.
I'll put commands throughout this article that you should run to follow along. To start, create a folder, move inside, and designate it as a Git repository:
$ mkdir git-crash-course
$ cd git-crash-course
$ git init .
Initialized empty Git repository in /Users/soren/src/localhost/git-crash-course/.git/
Git has created a folder called .git/ inside git-crash-course/. This is where it will store information about the history of changes you've made to the contents of this repository.
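If you're ever unsure whether a folder is part of a repository, you can ask Git directly. This is a small sketch, assuming a fresh throwaway folder rather than git-crash-course/:

```shell
# Create a throwaway repository.
repo="$(mktemp -d)"
cd "$repo"
git init -q .

# Prints "true" when the current folder belongs to a repository.
inside="$(git rev-parse --is-inside-work-tree)"

# Prints where the repository's .git folder is, relative to here.
git_dir="$(git rev-parse --git-dir)"

echo "inside a repository: $inside (data stored in $git_dir)"
```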
Changes
In order to understand how Git tracks changes, we need to actually make some changes first. Create a file at the root of the repository with some text in it:
$ echo "Hello, world!" > foo.txt
$ ls -a
./ ../ .git/ foo.txt
When you make changes to a file in a repository and then run git, it compares the state of the files with their state at a specific point in the past. You can ask about what changes it sees:
$ git status
On branch master
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
foo.txt
nothing added to commit but untracked files present (use "git add" to track)
There's a lot of information here, but what we care about is the section starting with "Untracked files". Git is telling us that it sees a file (foo.txt) that hasn't yet had any changes recorded. Because it hasn't seen this file before, it's labelled as untracked. To start tracking it, we need to create a commit.
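Once you're used to reading that output, the --short flag gives a more compact view with one line per file. A minimal sketch in a throwaway repository (the scratch folder is my assumption, the file is the same foo.txt):

```shell
# Create a throwaway repository containing one new file.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
echo "Hello, world!" > foo.txt

# --short prints one line per changed path; "??" marks an untracked file.
short_status="$(git status --short)"
echo "$short_status"
```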
Commits
A commit is simply a bundle of changes, packaged up together and given a message to describe what those changes represent. This is the basic unit of organisation in Git's model of changes.
Let's create a commit for the change we made. The first thing to do is designate which changes to include in the commit:
$ git add foo.txt
$ git status
On branch master
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: foo.txt
Now, the changes we made to foo.txt have been designated as "to be committed", more commonly referred to as "staged". Only changes that are staged will be included when we create a commit.
To be precise, the change that's been staged is the creation of the file foo.txt with its current contents.
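"With its current contents" matters more than it might seem: staging takes a snapshot of the file at the moment you run git add. Here's a rough sketch of that, assuming a throwaway repository; the extra line appended after staging is just for illustration.

```shell
# Create a throwaway repository and stage a new file.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
echo "Hello, world!" > foo.txt
git add foo.txt

# Edit the file again *after* staging it.
echo "Here's another line." >> foo.txt

# The staged copy still has the contents from when "git add" was run.
staged_content="$(git show :foo.txt)"
# The file on disk has moved on, so Git also reports an unstaged change.
unstaged_files="$(git diff --name-only)"

echo "staged copy:    $staged_content"
echo "unstaged edits: $unstaged_files"
```

To include the later edit in the commit as well, you would run git add foo.txt again.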
Now, let's create a commit to package up those changes and describe them:
$ git commit -m "Add foo.txt"
[master (root-commit) ffbbc5a] Add foo.txt
1 file changed, 1 insertion(+)
create mode 100644 foo.txt
We've used the -m flag to provide a message describing the commit. Convention dictates that commit messages are phrased in the imperative mood. In other words, say "add foo.txt", not "added foo.txt" or "adds foo.txt".
This makes more sense if you think about the history of commits in a repository as describing the steps required to go from an empty folder to one containing all of the files in their current state. From that perspective, each commit is an instruction to Git to perform one of those steps in the change history: so we tell Git to add foo.txt, rather than telling it that we added foo.txt. If you're interested in reading more about this convention, take a look at this blog post from 2008 by Tim Pope.
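The message you pass with -m becomes the commit's subject line, and you can read it back later. A minimal sketch, assuming a throwaway repository with a placeholder identity:

```shell
# Create a throwaway repository with a placeholder identity.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# %s is the subject (first line) of the commit message.
subject="$(git log -1 --pretty=format:%s)"
echo "$subject"
```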
If we run git status again, we see that now that all the changes have been committed, we have what's called a "clean working tree": there aren't any changes in any files that haven't been committed.
$ git status
On branch master
nothing to commit, working tree clean
Let's make a couple more commits. First, add a second file:
$ echo "See foo.txt for a message!" > bar.txt
$ git add bar.txt
$ git commit -m "Add some instructions"
[master 18e31b4] Add some instructions
1 file changed, 1 insertion(+)
create mode 100644 bar.txt
Now, make some changes to an existing file, and notice the different output of git status after you make the change:
$ echo "Here's another line." >> foo.txt
$ git status
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: foo.txt
no changes added to commit (use "git add" and/or "git commit -a")
You'll notice this time that the changes are marked as "not staged for commit", rather than having files listed as "untracked". That's because Git is comparing the current state of our files to the last commit, and since foo.txt already existed in the last commit, it recognises that all we're doing is changing some lines inside the file.
If you want to see exactly what those changes are, run git diff, and you'll see something like this:
diff --git a/foo.txt b/foo.txt
index af5626b..bdfc7b8 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1 +1,2 @@
Hello, world!
+Here's another line.
Now stage and commit this.
$ git add .
$ git commit -m "Update foo.txt"
[master ed1fa47] Update foo.txt
1 file changed, 1 insertion(+)
You can pass git add the path to a folder instead of individual filenames to add all changes in the folder at that path. In this case, I've used a single dot (.), which is the current directory, to just add all changes in the repository.
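To see that a folder path picks up both modified and brand-new files, here's a small sketch. It assumes a throwaway repository with a placeholder identity, and the notes/ folder and todo.txt are made-up names for illustration.

```shell
# Create a throwaway repository with one committed file.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# Modify a tracked file and create a new file in a subfolder.
echo "Here's another line." >> foo.txt
mkdir notes
echo "A note" > notes/todo.txt

# "." means the current directory, so this stages both changes.
git add .
staged="$(git diff --cached --name-only)"
echo "$staged"
```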
Branches
Now that we have a few commits, we can look at the history of this repository:
$ git log
commit ed1fa470d5d6707161d88672cd6230f2faabac92 (HEAD -> master)
Author: Søren Mortensen <soren@neros.dev>
Date: Tue Oct 20 14:05:21 2020 +0100
Update foo.txt
commit 18e31b4ced4f7f9ec055748b7a45f562000d2217
Author: Søren Mortensen <soren@neros.dev>
Date: Tue Oct 20 14:03:23 2020 +0100
Add some instructions
commit ffbbc5a3d42476a706895c974a4144406410bbfa
Author: Søren Mortensen <soren@neros.dev>
Date: Tue Oct 20 14:02:18 2020 +0100
Add foo.txt
Each commit is given a SHA-1 hash, computed from its contents, that uniquely identifies it.
Almost all the time, though, the first 7 characters are more than enough to identify a specific commit within the context of one repository, which is why both git and its users often just use that portion of the hash to refer to commits.
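You can ask Git for either form of the hash, and it will accept the short form wherever it expects a commit. A minimal sketch, assuming a throwaway repository with a placeholder identity (your hashes will differ from mine):

```shell
# Create a throwaway repository with a single commit.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# The full hash of the most recent commit.
full_hash="$(git rev-parse HEAD)"
# An abbreviated form that is still unique within this repository.
short_hash="$(git rev-parse --short HEAD)"
# The short form resolves back to the same commit.
resolved="$(git rev-parse "$short_hash")"

echo "$short_hash is short for $full_hash"
```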
Unfortunately, this isn't a particularly helpful representation. Try something more like this:
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
The reason I've chosen to illustrate the commits this way is because Git represents a repository's commit history internally as a linked list: each commit is a node that points to its parent (the commit before it).
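You can follow those parent pointers yourself. In this sketch (a throwaway repository with a placeholder identity, both my assumptions), "HEAD^" means "the parent of the most recent commit":

```shell
# Create a throwaway repository with two commits.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"
echo "See foo.txt for a message!" > bar.txt
git add bar.txt
git commit -q -m "Add some instructions"

# The very first commit is the only one with no parent.
first_commit="$(git rev-list --max-parents=0 HEAD)"
# "HEAD^" follows the pointer from the newest commit to its parent.
parent_of_newest="$(git rev-parse "HEAD^")"

# %h is a commit's short hash and %p is its parent's short hash.
git log --pretty=format:'%h <- parent: %p'
```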
In fact, Git goes one step farther, with the concept of a branch. But once you know that the commit history is stored as a linked list, branches are much easier to understand: a branch is just a particular portion of that linked list, stored as a pointer to the most recent child node.
                                master
                                   |
                                   v
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
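Because a branch is only a pointer, committing moves it along to the new commit. Here's a rough sketch of that; it assumes a throwaway repository with a placeholder identity, and it sets the branch name to master explicitly because newer Git setups may pick a different default.

```shell
# Create a throwaway repository whose first branch is called master.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"
# The commit that master points to right now.
before="$(git rev-parse master)"

echo "See foo.txt for a message!" > bar.txt
git add bar.txt
git commit -q -m "Add some instructions"
# After another commit, master points to the new commit instead.
after="$(git rev-parse master)"

echo "master moved from $before to $after"
```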
Git calls the default branch master, and it's treated as the authoritative version of the codebase. But when new code is being written, it's important to be able to make changes and not have to worry about messing up the nice, clean version on master, so Git allows us to create other branches.
Although master is still the default name Git uses for this branch, increasingly often main is being used instead (in fact, there's nothing stopping you from designating any branch the main branch of your repo). This article by Scott Hanselman covers both the process of changing from master to main and the context behind this shift in terminology.
If we run the following, we create a new branch, pointing to exactly the same commit (called its head commit, in exactly the same sense as the head of a linked list):
$ git checkout -b new-feature
Switched to a new branch 'new-feature'
This is a little bit of a shortcut way of doing this; that command both creates and switches to the new branch, which is why the base command is git checkout (the command used for switching branches). If you wanted to do those steps separately, you could run git branch new-feature to create it, and then git checkout new-feature to switch to it.
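Here's the two-step version as a minimal sketch, assuming a throwaway repository with a placeholder identity and the branch name set to master explicitly:

```shell
# Create a throwaway repository with one commit on master.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# Step 1: create the branch. This does not switch to it.
git branch new-feature
branch_after_create="$(git rev-parse --abbrev-ref HEAD)"

# Step 2: switch to the new branch.
git checkout -q new-feature
branch_after_checkout="$(git rev-parse --abbrev-ref HEAD)"

echo "after create: $branch_after_create, after checkout: $branch_after_checkout"
```

Both branches point to the same commit at this stage, because nothing has been committed since the branch was created.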
Now our commit history looks like this:
                                master
                                   |
                                   v
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
                                   ^
                                   |
                              new-feature
We can also add into the mix the concept of HEAD, which refers to whatever branch the repository is currently on. This is what determines the state of the files on disk; if you switch branches, Git actually changes the contents of the files to reflect that.
Because the command we just ran both created and switched to the new branch, our repository is in this state:
                                master
                                   |
                                   v
+---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |
+---------+    +---------+    +---------+
                                   ^
                                   |
                              new-feature <--- HEAD
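If you'd like to see where HEAD is pointing without drawing a diagram, git symbolic-ref will tell you. A small sketch, assuming a throwaway repository with a placeholder identity and the branch name set to master explicitly:

```shell
# Create a throwaway repository with one commit on master.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# Create and switch to a new branch, then ask what HEAD refers to.
git checkout -q -b new-feature
head_on_feature="$(git symbolic-ref HEAD)"

# Switch back, and HEAD follows.
git checkout -q master
head_on_master="$(git symbolic-ref HEAD)"

echo "$head_on_feature, then $head_on_master"
```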
Now, let's add some commits to this new branch! Create a new file or modify an existing one and commit it, as we did above.
$ echo "New feature 1" > new-feature.txt
$ git add .
$ git commit -m "Add a new feature"
[new-feature 1b8de27] Add a new feature
1 file changed, 1 insertion(+)
create mode 100644 new-feature.txt
The repository will now look something more like this:
                                master
                                   |
                                   v
+---------+    +---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |
+---------+    +---------+    +---------+    +---------+
                                                  ^
                                                  |
                                             new-feature <--- HEAD
If you switch back to master, you'll notice that your changes from new-feature aren't there.
$ git checkout master
Switched to branch 'master'
$ cat new-feature.txt
cat: new-feature.txt: No such file or directory
Merging
Of course, at some point, we will want to integrate those changes back in! First, though, let's add some more changes on another branch. Switch to master, then create a new branch called new-feature-2 and commit something to it.
$ git checkout master
Already on 'master'
$ git checkout -b new-feature-2
Switched to a new branch 'new-feature-2'
$ echo "New feature 2" > new-feature-2.txt
$ git add .
$ git commit -m "Add another new feature"
[new-feature-2 8d5f335] Add another new feature
1 file changed, 1 insertion(+)
create mode 100644 new-feature-2.txt
The repository will now be in this state:
                                master       new-feature
                                   |              |
                                   v              v
+---------+    +---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |
+---------+    +---------+    +---------+    +---------+
                                   ^
                                   |         +---------+
                                   +---------| 8d5f335 |
                                             +---------+
                                                  ^
                                                  |
                                            new-feature-2 <--- HEAD
Now we're ready to integrate the changes from new-feature back into master. Switch to the branch we're going to merge into (in this case master), and run git merge new-feature to merge new-feature into it.
$ git checkout master
Switched to branch 'master'
$ git merge new-feature
Updating ed1fa47..1b8de27
Fast-forward
new-feature.txt | 1 +
1 file changed, 1 insertion(+)
create mode 100644 new-feature.txt
Git will perform what's called a "fast-forward merge", which means that the only action needed to merge the changes from new-feature into master was to move the master pointer forward through the commit history so it points at the same commit as new-feature.
                                         new-feature, master <--- HEAD
                                                  |
                                                  v
+---------+    +---------+    +---------+    +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |
+---------+    +---------+    +---------+    +---------+
                                   ^
                                   |         +---------+
                                   +---------| 8d5f335 |
                                             +---------+
                                                  ^
                                                  |
                                            new-feature-2
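A fast-forward is only possible when the branch you're on is an ancestor of the branch you're merging in, and you can check that before merging. This is a minimal sketch, assuming a throwaway repository with a placeholder identity and the branch name set to master explicitly; the --ff-only flag makes git merge refuse to do anything other than a fast-forward.

```shell
# Create a throwaway repository: one commit on master, one more on new-feature.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"
git checkout -q -b new-feature
echo "New feature 1" > new-feature.txt
git add .
git commit -q -m "Add a new feature"
git checkout -q master

# Succeeds when master is an ancestor of new-feature,
# in other words when master can simply be moved forward.
if git merge-base --is-ancestor master new-feature; then
    can_fast_forward=yes
else
    can_fast_forward=no
fi
echo "fast-forward possible: $can_fast_forward"

# Merge, but only if it can be done as a fast-forward.
git merge -q --ff-only new-feature
```

After the merge, master and new-feature point at the same commit, and no new commit has been created.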
We can now delete the new-feature branch, since we're finished with it:
$ git branch --delete new-feature
Deleted branch new-feature (was 1b8de27).
Now let's merge new-feature-2 into master.
$ git merge new-feature-2
At this point, a text editor will open. If you told Git which editor you'd like it to use earlier, then it should have opened a text file in that editor. This is the same behaviour that you'll see if you run git commit without the -m flag and a message; it is asking you to enter a commit message for a commit it is creating.
Because Git is responsible for the creation of this new commit, it has helpfully autogenerated a message for you, like this:
Merge branch 'new-feature-2' into master
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
That message should be fine, so all you need to do is save and close the file.
If you didn't set an editor, the editor that opened is probably vim. vi and its descendants are a whole other can of worms, so because this is a crash course, I'll tell you only what you need to know: type a colon (:), enter wq into the line at the bottom of the screen that opens, and press enter. This will save the file and quit.
If you pressed any other keys before trying this, you might have to press the escape key a few times before typing :wq.
After that, Git will spit out a message that looks something like this:
Merge made by the 'recursive' strategy.
new-feature-2.txt | 1 +
1 file changed, 1 insertion(+)
create mode 100644 new-feature-2.txt
So why did that merge play out in such a different way than the last one? Well, take a look at the state of the repository after the merge:
                                                       HEAD ---> master
                                                                    |
                                                                    v
+---------+    +---------+    +---------+    +---------+       +---------+
| ffbbc5a |<---| 18e31b4 |<---| ed1fa47 |<---| 1b8de27 |<--+---| 7d90c55 |
+---------+    +---------+    +---------+    +---------+   |   +---------+
                                   ^                       |
                                   |         +---------+   |
                                   +---------| 8d5f335 |<--+
                                             +---------+
                                                  ^
                                                  |
                                            new-feature-2
Because we merged in a branch whose history had previously diverged from that of the current branch, Git has been forced to create a new commit, called a "merge commit", to integrate together the two sets of changes. The merge commit, unlike the other commits we've seen so far, has two parents (the heads of the two diverged branches we merged).
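You can check the two parents for yourself. The sketch below is a simplified setup rather than an exact replay of the steps above: it assumes a throwaway repository with a placeholder identity, commits once directly on master to make the histories diverge, and uses --no-edit to accept the autogenerated message without opening an editor.

```shell
# Create a throwaway repository with a first commit on master.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# Add one commit on a new branch...
git checkout -q -b new-feature-2
echo "New feature 2" > new-feature-2.txt
git add .
git commit -q -m "Add another new feature"

# ...and a different commit on master, so the two histories diverge.
git checkout -q master
echo "New feature 1" > new-feature.txt
git add .
git commit -q -m "Add a new feature"
master_before_merge="$(git rev-parse master)"

# Merge, accepting the autogenerated message without opening an editor.
git merge -q --no-edit new-feature-2

# %P lists the full hashes of a commit's parents, separated by spaces.
parents="$(git log -1 --pretty=format:%P)"
set -- $parents
parent_count=$#
echo "the merge commit has $parent_count parents: $parents"
```

The first parent is the commit master pointed to before the merge, and the second is the head of the branch that was merged in.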
There are ways to avoid merge commits (look into git rebase), but you should become more familiar with Git before attempting to mess around with them, and the reasons for doing this are mostly aesthetic.
Now that we're finished with new-feature-2, delete it too.
$ git branch --delete new-feature-2
Deleted branch new-feature-2 (was 8d5f335).
Prettier Logging
When you don't have someone like me around to draw ASCII art of your branches all day, it's nice to have a way of visualising your commit history that reflects this structure a bit better. Feel free to run the following git config command:
$ git config --global alias.hist "log --graph --pretty=format:'%C(magenta)%h%Creset - %G?%C(red)%d%Creset %s %C(dim green)(%cr) %C(cyan)<%an>%Creset' --abbrev-commit"
This will allow you to run git hist in place of git log. Commit histories will then include a visual representation of branches:
* 7d90c55 - G (HEAD -> master) Merge branch 'new-feature-2' (6 minutes ago) <Søren Mortensen>
|\
| * 8d5f335 - G Add another new feature (23 minutes ago) <Søren Mortensen>
* | 1b8de27 - G Add a new feature (27 minutes ago) <Søren Mortensen>
|/
* ed1fa47 - G Update foo.txt (2 days ago) <Søren Mortensen>
* 18e31b4 - G Add some instructions (2 days ago) <Søren Mortensen>
* ffbbc5a - G Add foo.txt (2 days ago) <Søren Mortensen>
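If you'd rather not set up an alias yet, Git's built-in --graph and --oneline flags get you a plainer version of the same picture. Here's a rough sketch, assuming a throwaway repository with a placeholder identity and a small diverged history built just for the demonstration:

```shell
# Create a throwaway repository with two branches that diverge and then merge.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello, world!" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"
git checkout -q -b new-feature-2
echo "New feature 2" > new-feature-2.txt
git add .
git commit -q -m "Add another new feature"
git checkout -q master
echo "New feature 1" > new-feature.txt
git add .
git commit -q -m "Add a new feature"
git merge -q --no-edit new-feature-2

# --graph draws the branch structure; --oneline keeps each commit to one line.
graph="$(git log --graph --oneline)"
echo "$graph"
```

Each commit appears on a line marked with an asterisk, and the lines in between show where the history splits and joins.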
Conclusion
Remember what I said at the beginning about the two main purposes of version control?
- Keeping track of changes that are made to files.
- Integrating together changes made by multiple people.
Well, congratulations! If you've gotten this far, you've learned how to use Git to do the first of those two things. Give yourself a pat on the back.
I'm going to cover the second point in part 2 of this post, coming soon.