The GIT version control system

The GIT version control system - a quick overview

In 2011 I began working with git as part of my effort to use Ruby on Rails 3.0, and I was pleasantly surprised.

Getting set up

First you need to install git. On my Fedora Linux system, "yum install git" (run as root) does the job.

Then, on each machine on which you intend to use git (which could just be one), you should set up your identity and some preferences doing something like this:

git config --global user.name "Abe Lincoln"
git config --global user.email abe@whitehouse.gov
git config --global core.editor "vi -f"
git config --global alias.co checkout

The last step (making "co" an alias for "checkout") is optional, but it hints that you can set up a whole swarm of aliases for common git commands if you want to (br for branch, ci for commit, st for status, ...).
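A quick way to try the aliases mentioned above is a throwaway repository. This sketch runs git config without --global, which writes the settings to that repository's .git/config only, so your real ~/.gitconfig is left alone. The mktemp scratch directory and the name demo are just for illustration.

```shell
set -e
work=$(mktemp -d)          # scratch directory for the demo
git init -q "$work/demo"
cd "$work/demo"

# No --global: these land in demo/.git/config, not ~/.gitconfig.
git config alias.co checkout
git config alias.br branch
git config alias.ci commit
git config alias.st status

git config --get alias.st  # prints: status
git st --short             # same as "git status --short"
```

Add --global to each git config line once you are happy with the aliases.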

Starting a project from existing files

To place an existing project under Git, you first do this:

cd project
git init

This sets up an empty local repository (a hidden .git directory inside project). Before you put your project into it, you probably want to create a .gitignore file in the project's top directory, listing explicit filenames and patterns for files that git should leave alone.
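As an illustration, here is a sketch of a .gitignore for a Rails-style project. The patterns (*.swp, log/*.log, tmp/) are my own guesses at typical junk, not a recommended list, and the scratch directory comes from mktemp. git check-ignore -v reports which rule, if any, ignores a given path.

```shell
set -e
work=$(mktemp -d)
mkdir -p "$work/project/log" "$work/project/tmp"
cd "$work/project"
git init -q

# Hypothetical ignore rules; lines starting with # are comments.
cat > .gitignore <<'EOF'
# editor swap files
*.swp
# log files and scratch space
log/*.log
tmp/
EOF

touch log/development.log tmp/cache.dat notes.swp app.rb

# -v prints the .gitignore line that matched each ignored path.
git check-ignore -v log/development.log tmp/cache.dat notes.swp
git status --short    # lists only .gitignore and app.rb as untracked (??)
```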

To place the project under git, you do:

git add .
git commit -m "initial"
git status
git log

The last two commands are optional "status" commands.
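Put together, the steps above look like the following sketch. It runs in a scratch directory and sets a throwaway identity through the GIT_AUTHOR_* and GIT_COMMITTER_* environment variables, so it works even before you have done the git config step; hello.rb stands in for your real files, and init -b assumes git 2.28 or later.

```shell
set -e
# Throwaway identity for the demo (normally set with git config).
export GIT_AUTHOR_NAME="Abe Lincoln" GIT_AUTHOR_EMAIL=abe@whitehouse.gov
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
mkdir "$work/project"
cd "$work/project"
echo "puts 'hello'" > hello.rb   # stand-in for your existing files

git init -q -b master            # -b needs git 2.28+; older versions use master anyway
git add .
git commit -q -m "initial"
git status --short               # prints nothing: working copy matches the commit
git log --oneline                # one line: <hash> initial
```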

At this point your project is in your local repository, but has not yet been pushed to a central repository. You might never even want to push to a central repository, but most likely you do.

This is a key point with git as compared to svn. With svn there is no local repository, only a central repository. The fact that git first works with a local repository and only pushes to a remote repository when you tell it to is what makes git a distributed version control system, and this is very much a good thing.

You can push the changes in your local git repository to a remote server (which you of course must first set up) via:

git remote add origin URL
git push origin master

There are a variety of ways to specify the URL of the remote repository, and that is the subject of the next section.

How to specify a remote repository

Note that you add a definition for "origin" to git, then push to origin. This is easier than typing some weird, hard-to-type URL each time you push.

There are lots of ways to specify the URL of your remote repository. The following are the sort that I always use:

git remote add origin /somedir/my_project.git
git remote add origin user@myhost.com:/somedir/my_project.git

The first specifies a path on the local machine. It is what I use when the remote or "master" repository is on the same machine I am running git on or (more commonly) is network mounted from some NAS RAID box somewhere.

The second is what I use when I want to use ssh to contact a repository on a machine that really is remote. Note that there is no explicit mention of ssh: git uses ssh for the user@host:path form on its own.

The business of having a directory with a name ending in ".git" is a convention, but a handy one that makes it easy to recognize a git repository (as distinguished from a working copy).
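Here is a sketch of the local-path form from start to finish: a bare repository under a scratch directory plays the role of /somedir/my_project.git (or the NAS mount), and the project is pushed into it. The paths and identity are made up for the demo, and init -b assumes git 2.28 or later.

```shell
set -e
export GIT_AUTHOR_NAME="Abe Lincoln" GIT_AUTHOR_EMAIL=abe@whitehouse.gov
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)                                     # plays the part of /somedir
git init -q --bare -b master "$work/my_project.git"   # the "central" repository

mkdir "$work/my_project"
cd "$work/my_project"
git init -q -b master                      # -b needs git 2.28+
echo "hello" > README
git add .
git commit -q -m "initial"

# A plain filesystem path is a perfectly good remote URL.
git remote add origin "$work/my_project.git"
git push -q origin master
git remote -v                              # origin listed twice: (fetch) and (push)
```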

Starting a project from a git repository

These days, lots of folks put their projects on a site called GitHub, and to get a copy of what they are doing, you just do a "git clone". You can then use git to keep this copy up to date with whatever they are doing.

Also, if you have a project of your own and have pushed it to a remote repository, the way to get a working copy on another machine is to use "git clone".

This is the equivalent of SVN checkout for you old SVN hacks out there.

git clone https://github.com/wally/goose.git

This would create a local directory called goose containing a local repository and a working copy of the cloned project.
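To try cloning without a network, this sketch builds a small bare repository named goose.git from a scratch project (seed is a made-up name) and clones it by path. The result has the same shape as the GitHub example: a goose directory holding a working copy plus its .git repository. As before, the identity is a throwaway and init -b assumes git 2.28 or later.

```shell
set -e
export GIT_AUTHOR_NAME="Abe Lincoln" GIT_AUTHOR_EMAIL=abe@whitehouse.gov
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)
cd "$work"

# Build a small bare repository to stand in for github.com/wally/goose.git.
git init -q -b master seed                 # -b needs git 2.28+
echo "honk" > seed/goose.txt
git -C seed add goose.txt
git -C seed commit -q -m "initial"
git clone -q --bare seed goose.git

# Cloning it by path creates ./goose: a working copy plus its .git repository.
git clone -q "$work/goose.git"
ls goose                                   # goose.txt
git -C goose log --oneline                 # the one "initial" commit
```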

Day to day use

Once you have a project set up as described above, you typically find yourself editing one or more files and getting to a place where you feel satisfied with what you have done. Either that or you have done so much work you feel nervous about not putting it under version control. The following will put the changes into the local repository AND push them to the remote repository. You typically only do both when you reach a significant milestone and everything is tested and working.

git commit -a -m "fiddled with stuff"
git push origin master

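Note that the "-m message" is optional; without it, git opens your editor so you can type the message. The -a option tells git commit to first stage every file it already tracks that you have modified or deleted, so it is a shorthand for git add followed by git commit, except that it does not pick up new files. I don't usually use it, and I let git prompt me for the message.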

I usually do the following:

git add .
git status
git commit
git push origin master

Doing the status before the commit lets you see what the add has staged, and keeps you from committing a bunch of binary files or something else you will just have to remove immediately.

When you add new files, you must use git add to tell git to put them under control.
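The routine above can be sketched end to end with a local bare repository standing in for the central one. The file names, scratch paths, and identity are hypothetical; the point is what git status reports after git add ., including the new file.

```shell
set -e
export GIT_AUTHOR_NAME="Abe Lincoln" GIT_AUTHOR_EMAIL=abe@whitehouse.gov
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)
git init -q --bare -b master "$work/central.git"   # stands in for the remote
mkdir "$work/wc"
cd "$work/wc"
git init -q -b master                      # -b needs git 2.28+
git remote add origin "$work/central.git"
echo "v1" > notes.txt
git add .
git commit -q -m "first"
git push -q origin master

# A day's work: edit a tracked file and create a new one.
echo "v2" > notes.txt
echo "todo" > todo.txt          # new file: untracked until it is added

git add .
git status --short              # "M  notes.txt" and "A  todo.txt", both staged
git commit -q -m "fiddled with stuff"
git push -q origin master
```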

Another scenario is that you know there are changes in the remote repository that you need (i.e. your working copy is out of date). To bring those changes into your working copy, you do this:

git pull

Or this:

git fetch origin
git merge origin/master

There are fine points here that I do not fully understand. I read an essay somewhere that said mean and nasty things about git pull and said you should always fetch and merge. For most of what I do, that is just needless hassle. Fetching and merging separately might be the smart thing if you had a branch in your working copy with changes unrelated to the incoming ones, or some more complex scenario. In any event, fetch by itself does not merge anything into your work, so you get a chance to look at the incoming changes first, which gives you more control in complex situations.
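To see the fetch-then-merge route in practice, this sketch sets up a central bare repository and two clones (alice and bob are made-up names, and the identity is a throwaway). Alice pushes a change; Bob fetches, looks at what arrived with git log, and then merges. A plain git pull in bob would have done the same fetch and merge in one step.

```shell
set -e
export GIT_AUTHOR_NAME="Abe Lincoln" GIT_AUTHOR_EMAIL=abe@whitehouse.gov
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)
cd "$work"
git init -q -b master seed                 # -b needs git 2.28+
echo "one" > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "one"
git clone -q --bare seed central.git
git clone -q "$work/central.git" alice
git clone -q "$work/central.git" bob

# Alice pushes a change to the central repository.
echo "two" >> alice/file.txt
git -C alice commit -q -a -m "two"
git -C alice push -q origin master

# Bob fetches, looks at what arrived, then merges.
cd bob
git fetch -q origin
git log --oneline master..origin/master    # the commits about to be merged
git merge -q origin/master                 # fast-forward: no merge commit needed
```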

Staging area and index

You may not actually need to know this, but here goes anyway. Git considers each file to be in one of the following states:

Modified - you have changed the file, but not told Git to do anything with it.
Staged - you have marked a modified file so it will be included in the next commit.
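Committed - the changes are now safely stored in the local repository.
Untracked - git does not know about this file yet (you have never added it). Files that match .gitignore are untracked and are also left out of git status.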

Git uses a single file (called either the index or staging area) to keep track of which files will go into the next commit. Some people get worked up about the index in one way or another.
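The states show up directly in git status --porcelain (the same as --short, but meant for scripts), which prints a two-letter code per file: the first column is the staged state, the second the working copy state. This sketch creates one file in each state in a scratch repository; the file names and identity are hypothetical.

```shell
set -e
export GIT_AUTHOR_NAME="Abe Lincoln" GIT_AUTHOR_EMAIL=abe@whitehouse.gov
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)
cd "$work"
git init -q
echo "a" > tracked.txt
git add tracked.txt
git commit -q -m "initial"       # tracked.txt is now committed

echo "more" >> tracked.txt       # modified, not staged
echo "b" > staged.txt
git add staged.txt               # new file, staged
echo "c" > stray.txt             # untracked

git status --porcelain
# Codes to look for: " M" = modified but not staged,
# "A " = staged (added to the index), "??" = untracked.
```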

Branches with GIT

Take a look at this article:

A git branching model

Branches are where GIT puts subversion (SVN) to shame! SVN branches are booby traps. GIT branches are the suede tamale.

git checkout -b branch_for_test

This creates a new branch starting at your current commit and makes it the active branch.

git branch

The git branch command is just a status command that shows you all branches and which branch is current.

Merging changes from a branch back into master:

git checkout master
git merge branch_for_test

The checkout command just switches to the master branch, then the merge command pulls in the changes from branch_for_test.

When you are done with a branch, you can delete it via:

git branch -d branch_for_test
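The whole branch round trip, in a scratch repository with made-up file names and a throwaway identity, looks roughly like this sketch. Because master gets no new commits while the branch exists, the merge is a fast-forward and no merge commit is needed.

```shell
set -e
export GIT_AUTHOR_NAME="Abe Lincoln" GIT_AUTHOR_EMAIL=abe@whitehouse.gov
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)
cd "$work"
git init -q -b master                 # -b needs git 2.28+
echo "v1" > app.txt
git add app.txt
git commit -q -m "initial"

git checkout -q -b branch_for_test    # create the branch and switch to it
echo "experiment" > test.txt
git add test.txt
git commit -q -m "try something"
git branch                            # "* branch_for_test" and "  master"

git checkout -q master                # test.txt vanishes from the working copy
git merge -q branch_for_test          # master has no new commits: fast-forward
git branch -d branch_for_test         # -d only deletes fully merged branches
```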

Working with git

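To see what is staged and what is not, use the following. git status lists files by state, and git diff shows the changes you have not staged yet (git diff --staged shows the ones you have):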

git status
git diff

It is pretty common to do a bunch of edits and then want to commit all the changes. To avoid having to "git add" each file and then do a "git commit", you can do the following (it only covers files git already tracks; new files still need a git add):

git commit -a -m "fixed that nasty bug"

To get rid of a file, so that git stops tracking it and your working copy of it is deleted too, use git rm. Nothing actually happens in the repository until you commit, though.

git rm file

To rename a file, you should use git mv. (If you just rename it with mv, git sees a deleted file plus a new untracked one; you can work it out by hand afterwards with git rm on the old name and git add on the new one.)

git mv old new
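Here is a small sketch of both commands in a scratch repository (file names and identity are hypothetical). git status --short shows the staged rename as R and the staged removal as D before anything is committed.

```shell
set -e
export GIT_AUTHOR_NAME="Abe Lincoln" GIT_AUTHOR_EMAIL=abe@whitehouse.gov
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)
cd "$work"
git init -q
echo "old" > old.txt
echo "junk" > junk.txt
git add .
git commit -q -m "initial"

git mv old.txt new.txt     # renames the file and stages the rename
git rm -q junk.txt         # deletes the file and stages the removal
git status --short         # "R  old.txt -> new.txt" and "D  junk.txt"
git commit -q -m "rename old.txt, drop junk.txt"
```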

To find out a history of what has gone on, use "git log", which has thousands of options.

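To throw away your edits to a file and get back the version git has (the staged version if there is one, otherwise the one from the last commit), use:

git checkout -- file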
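A sketch of throwing away an edit in a scratch repository follows; config.txt and the identity are hypothetical. On git 2.23 and later, git restore config.txt does the same job as git checkout -- config.txt.

```shell
set -e
export GIT_AUTHOR_NAME="Abe Lincoln" GIT_AUTHOR_EMAIL=abe@whitehouse.gov
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)
cd "$work"
git init -q
echo "good" > config.txt
git add config.txt
git commit -q -m "initial"

echo "broken" > config.txt     # an edit you want to throw away
git checkout -- config.txt     # copy git's version back over it
cat config.txt                 # good
```

The "--" tells git that what follows is a file name, which matters if you ever have a branch with the same name as a file.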

Remote repositories

Rather than forcing you to deal with long, clumsy URLs for these, git sets up shortnames for your remote repositories. The command "git remote" lists them. The shortname "origin" refers to the repository you cloned from (if you cloned). To add another one, use something like this:

git remote add xyz git://yada.com/xyz/project.git
git fetch xyz

The fetch pulls in all data that you do not yet have in your local repository. The command "git fetch origin" is what you use to keep a cloned project up to date. It only updates the local repository, though, not your working files; you need to merge to do that. git pull does both the fetch and the merge. See this article:

Git: fetch and merge, don't pull
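This sketch shows that fetch leaves your working files alone. A local repository called upstream stands in for git://yada.com/xyz/project.git, and mine is a clone of it; both names, and the identity, are hypothetical. After git fetch xyz, only the remote-tracking branch xyz/master has moved, and the merge brings the change into the working copy.

```shell
set -e
export GIT_AUTHOR_NAME="Abe Lincoln" GIT_AUTHOR_EMAIL=abe@whitehouse.gov
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)
cd "$work"
# "upstream" is a hypothetical stand-in for git://yada.com/xyz/project.git
git init -q -b master upstream          # -b needs git 2.28+
echo "v1" > upstream/lib.txt
git -C upstream add lib.txt
git -C upstream commit -q -m "v1"
git clone -q "$work/upstream" mine

echo "v2" > upstream/lib.txt            # upstream moves on
git -C upstream commit -q -a -m "v2"

cd mine
git remote add xyz "$work/upstream"     # a second shortname next to origin
git remote                              # origin, xyz
git fetch -q xyz                        # updates xyz/master only
cat lib.txt                             # still v1: working files untouched
git merge -q xyz/master                 # fast-forward; lib.txt now says v2
```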

When you have everything in your local repository the way you like it, you use git push to distribute it to the central repository. If somebody else has pushed changes while you have been working, your push is rejected: you have to pull their changes, merge them into what you are doing, and then push.

git push origin master

The above command pushes your master branch.
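A rejected push is easy to reproduce locally. In this sketch, you and them are two made-up clones of one bare central repository; them pushes first, so the push from you is refused until it fetches and merges. I use fetch plus merge --no-edit rather than git pull because newer git versions want you to configure how pull reconciles diverged histories (the pull.rebase setting), which would stall a script.

```shell
set -e
export GIT_AUTHOR_NAME="Abe Lincoln" GIT_AUTHOR_EMAIL=abe@whitehouse.gov
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)
cd "$work"
git init -q -b master seed              # -b needs git 2.28+
echo "base" > seed/a.txt
git -C seed add a.txt
git -C seed commit -q -m "base"
git clone -q --bare seed central.git
git clone -q "$work/central.git" you
git clone -q "$work/central.git" them

# Somebody else pushes first...
echo "theirs" > them/b.txt
git -C them add b.txt
git -C them commit -q -m "their change"
git -C them push -q origin master

# ...while you commit something different.
cd you
echo "mine" > c.txt
git add c.txt
git commit -q -m "my change"
if git push -q origin master 2>/dev/null; then
  echo "unexpected: push accepted" >&2
  exit 1
fi
echo "push rejected; merging their work first"
git fetch -q origin
git merge -q --no-edit origin/master    # histories diverged: real merge commit
git push -q origin master               # now accepted
```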

A GIT remote server

First set up an empty project repository on a server:

ssh server
cd /shared/git_repos
mkdir project.git
cd project.git      (or, in bash, cd !$)
git --bare init
or, if a group of users will push to it: git init --bare --shared
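Here is a sketch of the same steps on one machine, with a mktemp directory standing in for /shared/git_repos on the server. git rev-parse --is-bare-repository confirms the result is bare, meaning it has no working copy, only the repository files.

```shell
set -e
work=$(mktemp -d)                    # stands in for /shared/git_repos on the server
mkdir "$work/project.git"
cd "$work/project.git"
git init -q --bare --shared          # --shared: group-writable, for a team
git rev-parse --is-bare-repository   # true
ls                                   # HEAD, config, objects, refs, ... but no working files
```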

Then you can push into this empty repository. How do you access it as a remote repository? You have many choices. The easy and lazy way, if your team shares an NFS mount, is to use git's local protocol, which boils down to simply using the path to the repository directory (the git remote rm line is only needed if origin is already defined):

git remote rm origin
git remote add origin /shared/git_repos/project.git
git push origin master

If you are going to access this remotely through ssh, you would want to change one line in the above to:

git remote add origin myname@myhost:/shared/git_repos/project.git

Notice in the above that there is no explicit mention of ssh.

Another way to do things is to make a bare clone of your project, copy it to your server, and then clone from the server wherever you need a working copy. From the directory that contains my_project:

git clone --bare my_project my_project.git
scp -r my_project.git gituser@server.com:/shared/git_repos
git clone gituser@server.com:/shared/git_repos/my_project.git
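The same recipe can be rehearsed on one machine. In this sketch cp -r stands in for scp, a directory under mktemp stands in for gituser@server.com:/shared/git_repos, and fresh_copy is a made-up name for the final clone; the identity is a throwaway.

```shell
set -e
export GIT_AUTHOR_NAME="Abe Lincoln" GIT_AUTHOR_EMAIL=abe@whitehouse.gov
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)
repos="$work/server/shared/git_repos"   # stands in for gituser@server.com:/shared/git_repos
mkdir -p "$repos"
cd "$work"
git init -q -b master my_project        # -b needs git 2.28+
echo "hi" > my_project/README
git -C my_project add README
git -C my_project commit -q -m "initial"

git clone -q --bare my_project my_project.git
cp -r my_project.git "$repos/"          # scp -r in real life
git clone -q "$repos/my_project.git" fresh_copy
```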

Feedback? Questions? Drop me a line!

Tom's Computer Info / tom@mmto.org