In this article I’ll be discussing version control, specifically how version control works. I’ll start with a 30,000-foot view and finish with a specific example using Git, a very popular version control system. You may already know that version control essentially tracks and catalogs the history of one or more files. However, if you haven’t been using version control, you may not be familiar with every benefit it offers, and certainly not with the nuances of every system available.
A 30,000-Foot View of Software Version Control
At its core, version control is a simple concept: it tracks one or more files as they evolve over the life of one or more projects. Specifically, version control tracks what changed, who changed it, and why. Version control systems provide a sense of accountability not found in traditional file management.
It should be noted that version control is not only useful to programmers. Anyone who “maintains” a directory of files can use version control, so if you’re not a programmer, keep reading! There are many version control systems to choose from; for the sake of this article, I will be discussing Git. Having said this, keep in mind that whichever version control system you choose, they all share the same goal: to help you organize and safeguard your source code and files.
Version Control Concepts
It’s important to grasp the basics before you and your team focus on a specific version control system. These are the specifics I will focus on in this article:
- Source code backup
- Historical perspective (versions) and commenting (accountability)
Some version control systems let you restore a single file, and most let you restore an entire directory. Get a glass of milk and your favorite cookies, because I’m here to tell you that the days of hesitating to change source code are gone. In the Drupal community, for example, there are “sandbox” versions of modules, which are really nothing more than “branched” versions of production code.
Without version control, if you’re working on a development team, it’s inevitable that someone will overwrite “working” production code, costing revenue and valuable time. Most programmers are paid well, and in a commercial environment this is unacceptable.
Git employs a database that enables it to track and report changes. Here’s an example. Let’s assume for a moment I’m a super duper Drupal developer who writes modules for a living. I decide I need to rewrite a function in my spectacular module. I commit my source code back to the repository, and Git keeps this committed version in its simple, database-like store. I no longer worry about tracking these changes; Git does it for me. And because Git requires a message with every commit, I can later refer back to that message to remember why I rewrote the function (after all, if my module is so spectacular, why did I need to rewrite it?).
I’m sure you’re familiar with software versions. Drupal, for example, is released with version numbers; at the time of writing, Drupal core 7.24 is the recommended release. If you are using version control, you might refer to Drupal 7.24 core as a tagged or labeled version. Milestones like this are marked by tagging, or labeling.
Before we get down to business, you should be familiar with some key terms. These terms are defined fairly consistently, so you can apply this knowledge across most, if not all, version control systems.
- Repository – A repository is one of the most common terms. The version control system manages a repository. In Git, the repository is essentially a database where files and their history are stored and maintained.
- Working Set – The working set consists of files and folders (Git calls it the working tree). It is common to see a repository that consists of multiple working sets organized into what are commonly called projects. When you edit a working set, you then normally update the repository. The great thing about version control is that every time you update your repository, you automatically create a backup of your source code; no more save-as. The key thing to remember about a working set is that its files contain “potential” changes not yet in the repository.
- Add & Check In – As you might have guessed, adding places new files under version control, and checking in (also called committing) records your changes to new or existing files in the repository.
- Revert/Rollback – If you need to go back to an earlier version of a file, you “revert” or “roll back” your working set to any version of the data stored in the repository.
- Checking Out – When you check a file out of the repository (usually from the main trunk), you are bringing it into your local working set. In some systems, checking a file out also places a lock on it to avoid duplicate work by other developers.
- Distributed Repository System – If you’re working in a distributed repository system, you normally have the entire repository on your local development box. Git is an example of a distributed version control system. To retrieve changes from a remote repository, you “pull” the data; to send your committed changes to a remote repository, you “push” the data. If the remote repository lives on a remote server, it is highly recommended, though not required, to communicate with it over SSH.
- Tagging & Labeling – Naming a specific state of the repository. You might tag or label the version of your Drupal module that you consider a “stable release”.
- Branching & Forking – Branching creates a separate line of development within a repository, while forking creates a copy of an entire repository. You will often hear the term “sandbox code”, meaning that a developer has either branched or forked a project repository.
- Merging – If you have branched or forked a repository, you will inevitably want to push your source code back into the trunk/main branch. The process of bringing branched or forked code back into the trunk is referred to as “merging”.
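To make push and pull a little more concrete, here’s a minimal sketch as a Bash script. It assumes Git is installed and uses a bare repository in a temporary directory as a stand-in for a remote server. The names remote.git, alice, and bob are hypothetical, as are the email addresses. With a real server you would use its SSH URL instead of a local path.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory so nothing else on the machine is touched.
workdir=$(mktemp -d)
cd "$workdir"

# A bare repository has no working set; here it stands in for a remote server.
git init --bare remote.git

# Alice clones the (empty) remote, commits a file, and pushes it.
git clone remote.git alice
git -C alice config user.name "alice"
git -C alice config user.email "alice@example.com"   # placeholder address
echo "body { color: black; }" > alice/style.css
git -C alice add style.css
git -C alice commit -m "Add style.css"
git -C alice push origin HEAD

# Bob clones the remote and gets his own complete copy, history included.
git clone remote.git bob

# Alice pushes a second change...
echo "h1 { color: navy; }" >> alice/style.css
git -C alice commit -a -m "Style h1 headings"
git -C alice push origin HEAD

# ...and Bob pulls it into his local repository and working set.
git -C bob pull --ff-only
git -C bob log --oneline
```

Each clone holds the complete history, which is what makes the system distributed. Push and pull only move commits between those copies.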
Centralized vs. Distributed Version Control Systems
In this article I will be talking in more depth about Git, which is a distributed version control system. I will not go into detail about centralized version control systems, which are beyond the scope of this article. In a nutshell, in a centralized system a single server holds the repository and developers check files in and out of it. Such systems are common in large development houses, and two you might encounter are Microsoft Team Foundation Server (TFS) and Subversion. However, I recommend taking the time to learn about centralized version control systems to determine whether they are right for your environment.
Distributed version control works differently. Typically, you initialize the repository on your local development machine, where you also keep your working set, and you add and update files in that local repository. Since I will be talking about Git, you should know that you also have the option of using a “remote repository” for your source code. Very often this remote repository will be hosted on GitHub. Without going into a lot of detail, GitHub is a cloud service to which you can back up your local repository.
Repository Actions with Git
Git is a distributed version control system. It is an open source project released under the GNU General Public License version 2, so it is free. You can download Git from http://git-scm.com.
Download Git and proceed through the installation. The default options are adequate in most cases. However, for the purposes of this article, I will choose Git Bash only for the shell type, since I’m comfortable in Linux command line shells. I will also choose “Checkout Windows-style and commit Unix-style line endings”. This setting converts line endings to Windows style (CRLF) when files are checked out and back to Unix style (LF) when they are committed, which allows me to share repositories with Macintosh and Linux development environments.
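If you’re curious what that installer choice does under the hood, my understanding is that it corresponds to Git’s core.autocrlf setting with the value true. The sketch below sets it in a throwaway repository so nothing global is changed, and the style.css contents are made up. It shows a file with Windows line endings being stored with Unix line endings.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the setting below stays local to it.
workdir=$(mktemp -d)
cd "$workdir"
git init

# Same behavior as "Checkout Windows-style, commit Unix-style line endings".
# Use `git config --global core.autocrlf true` to apply it to every repository.
git config core.autocrlf true

# Create a file with Windows (CRLF) line endings and stage it.
printf 'body { color: black; }\r\n' > style.css
git add style.css

# Dump the staged copy (what will be committed) byte by byte:
# the line ends in \n only, with no \r.
git show :style.css | od -c
```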
After the installation has finished, you can verify that Git was installed successfully by running the git --version command from the Git Bash shell, which was provided during installation:
git --version
If you see a version number, then you’ll know that Git installed properly.
To get started with version control, you need to create a repository, either for your entire codebase or for just the project you’re working on. With Git, you create the repository yourself on your local development machine. It’s important to remember that a Git repository resides in the same directory as your files.
For the purposes of this article I will be working in the “dev” directory.
From the dev directory, I run the git init command to initialize an empty repository:
git init
Next, I verify the empty repository by using the git status command:
git status
It is common to see a message that says “nothing to commit”, especially if the directory is empty.
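The git init and git status steps above can be sketched as a Bash script like this one. To avoid touching anything real, it assumes the dev directory lives inside a temporary folder. On your own machine you would simply cd into your project directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create the "dev" directory inside a throwaway location and move into it.
workdir=$(mktemp -d)
mkdir "$workdir/dev"
cd "$workdir/dev"

# Initialize an empty repository; this creates the hidden .git directory.
git init

# Ask Git about the working set; an empty directory has nothing to commit.
git status
```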
You should also be aware of a hidden directory called .git, where all of the repository metadata is stored. If you ever want to back up a Git repository with all of its revision history intact, make sure that you include this .git folder.
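One simple way to follow that advice is to copy the whole project directory, .git included. In this sketch the repository, its one-line style.css, and the dev-backup name are all hypothetical, and the email address is a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
mkdir "$workdir/dev"
cd "$workdir/dev"

# Build a tiny repository with one commit so there is history to preserve.
git init
git config user.name "web3"
git config user.email "web3@example.com"   # placeholder address
echo "body { color: black; }" > style.css
git add style.css
git commit -m "Add style.css"

# The hidden .git directory holds the repository and all of its history.
ls -a

# Copying the whole directory, .git included, preserves every revision.
cp -R "$workdir/dev" "$workdir/dev-backup"
git -C "$workdir/dev-backup" log --oneline
```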
At this point you might be tempted to start committing files to your repository, but before you do anything else, you should set two Git configuration items. Remember that version control systems are intended to impose accountability. With that said, I want to ensure that all of my source code edits are marked with my name and email, so I set these two configuration items (without the --global option, they apply only to the current repository):
git config user.name "web3"
git config user.email "firstname.lastname@example.org"
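As a quick check, git config --get reads a value back. This sketch sets both items in a throwaway repository. The address web3@example.com is a placeholder, not a real one.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
git init

# Repository-level settings, stored in .git/config.
git config user.name "web3"
git config user.email "web3@example.com"   # placeholder address

# Read the values back to confirm they were saved.
git config --get user.name
git config --get user.email

# To apply the same identity to every repository on this machine, add --global:
#   git config --global user.name "web3"
#   git config --global user.email "web3@example.com"
```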
Now I can commit some files to my repository. For the purposes of this article, I will be illustrating version control on a Drupal 7 website repository. First, I have to tell Git to start tracking one of my style sheets, called style.css.
git add style.css
If I ask Git for the status of my working set, it will tell me there are changes to be committed, followed by the name or names of the new files. Since I know there is a file called style.css that needs to be committed, I issue the git commit command:
git commit
In the example above, I did not supply any arguments to the commit command. When you’re using the command line tools, the -m option is convenient for adding a short message, which is essentially a comment. If you omit the -m option, Git opens your default text editor so you can write a longer comment. My default text editor is Vim. After typing my message, I press the Escape key and then type :wq to save and quit. Upon saving the comment and exiting Vim, I see a status message that says 1 file changed, 1 insertion(+).
There are other ways to add a message to a commit. I could have provided -a -m as part of the commit command. The -a option tells Git to automatically stage every tracked file that has been modified or deleted before committing; new, untracked files still need git add. The -m option provides a message describing what was changed.
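Here’s a sketch of the non-interactive forms described above, which never open an editor. The repository, file names, and CSS rules are made up for illustration, and the email is a placeholder. Notice that -a picks up the change to the tracked style.css but ignores the brand-new extra.css.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
git init
git config user.name "web3"
git config user.email "web3@example.com"   # placeholder address

# New files must be added explicitly before the first commit.
echo "body { color: black; }" > style.css
git add style.css
git commit -m "Add style.css"

# Modify the tracked file; -a stages the change and -m supplies the
# message, so no editor opens.
echo "h1 { color: navy; }" >> style.css
git commit -a -m "Style h1 headings"

# A new, untracked file is NOT picked up by -a.
echo "p { margin: 0; }" > extra.css
git status --short
```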
Now that I have committed the style.css file into the repository, if I issue the git log command I’ll see my initial check-in:
git log
The default output of git log can be overwhelming, so a better option is git log --oneline --all. This variation of the log command displays each commit on a single line, showing just its abbreviated changeset ID and the first line of its comment. The --all option includes commits from every branch, not just the current one.
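To see what --all adds, this sketch builds a tiny history with one commit on a second branch and then prints both forms of the compact log. The branch name experiment and the file contents are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
git init
git config user.name "web3"
git config user.email "web3@example.com"   # placeholder address

echo "body { color: black; }" > style.css
git add style.css
git commit -m "Add style.css"

# Make one commit on a separate branch, then return to the original branch.
git checkout -b experiment
echo "h1 { color: navy; }" >> style.css
git commit -a -m "Try navy headings"
git checkout -

# Only the current branch's history: one line per commit.
git log --oneline

# Commits from every branch, including "experiment".
git log --oneline --all
```

The first log prints one line and the second prints two, because the experiment commit lives on another branch.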
There are times when you’ll need to compare two versions of one file. This often happens when there are conflicts and Git cannot determine which version of the file to keep. If I want to compare different versions of my style.css file, I use the git diff command. Without other arguments, git diff style.css compares my working copy with the version staged for commit:
git diff style.css
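To compare two committed versions of the file rather than the working copy, you can name two commits before the file name. In this sketch HEAD~1 means “the commit before the latest one”. The file contents are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
git init
git config user.name "web3"
git config user.email "web3@example.com"   # placeholder address

echo "body { color: black; }" > style.css
git add style.css
git commit -m "Add style.css"

echo "body { color: navy; }" > style.css
git commit -a -m "Switch to navy"

# An uncommitted edit in the working set.
echo "h1 { font-weight: bold; }" >> style.css

# Working copy vs. staged version: shows only the new h1 rule.
git diff style.css

# Two committed versions: the previous commit (HEAD~1) vs. the latest (HEAD).
git diff HEAD~1 HEAD -- style.css
```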
Reverting to a prior version
It’s inevitable that you’re going to revert or roll back at some point when working with source code. This process is simple with version control systems, and with Git in particular. At this point I’ve checked in my style.css file, so I know that I have a version in the repository that I can rely on. If I realize that I need to revert my style.css file, I simply check the committed version back out of the repository, make my edits as intended, and commit the changes.
There are a few options for checking out code into a working set. I can restore a file to its state in the most recent commit, or I can roll back to a specific earlier revision. To discard my uncommitted changes and restore style.css as it was in the most recent commit, I issue the git checkout HEAD style.css command:
git checkout HEAD style.css
Alternatively, I can roll back to any particular revision by passing the checkout command the ID of a commit in the repository. The git log --oneline --all command displays these IDs alongside each commit’s comment, so I’m going to revert to a specific revision, in this case 98c12:
git checkout 98c12 style.css
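Commit IDs such as 98c12 are different in every repository, so this sketch looks one up with git rev-parse instead of hard-coding it. The file contents are hypothetical. Checking out a file from an older commit also stages it, so the rollback still has to be committed.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
git init
git config user.name "web3"
git config user.email "web3@example.com"   # placeholder address

echo "body { color: black; }" > style.css
git add style.css
git commit -m "Add style.css"
first=$(git rev-parse --short HEAD)   # plays the role of 98c12 in the article

echo "body { color: red; }" > style.css
git commit -a -m "Try red text"

# Discard uncommitted edits: restore style.css as of the latest commit.
echo "oops" > style.css
git checkout HEAD style.css

# Roll style.css back to the first revision (this also stages it), then commit.
git checkout "$first" style.css
git commit -m "Revert style.css to $first"
cat style.css
```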
Creating tags and labels
Up to this point I have only identified versions by a changeset identifier, such as 98c12. However, you’ll often need to identify the entire state of the repository at a particular point in the development lifecycle, such as a production release of a product. Version control systems allow you to give that state of the entire repository a human-readable name. In the case of Drupal 7, it might be drupal-7.24.
Tagging and labeling are often interchangeable terms, depending on the version control system you are using. Since we are using Git, I’m going to refer to this process as tagging. In Git, I create a tag with the git tag command:
git tag drupal-7.24
If I want to see the commit that the tag points to, along with its changes, I issue the command:
git show drupal-7.24
Just as you can check out a specific file into your working set, you can also check out a specific tag by issuing git checkout drupal-7.24:
git checkout drupal-7.24
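Here’s a sketch that puts the tag commands together. Plain git tag creates a lightweight tag, essentially a name for a commit. Git also offers annotated tags (git tag -a), which store their own message, tagger, and date. The drupal-7.25 tag and the file contents are hypothetical. Checking out a tag leaves you in a “detached HEAD” state, so the script returns to the previous branch afterward.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
git init
git config user.name "web3"
git config user.email "web3@example.com"   # placeholder address

echo "body { color: black; }" > style.css
git add style.css
git commit -m "Add style.css"

# Lightweight tag: just a name pointing at the current commit.
git tag drupal-7.24

# Keep working after the tag.
echo "h1 { color: navy; }" >> style.css
git commit -a -m "Style h1 headings"

# Annotated tag (hypothetical name): also records a message, tagger, and date.
git tag -a drupal-7.25 -m "Stable release 7.25"

# List tags and show what the older one points to.
git tag
git show --stat drupal-7.24

# Checking out a tag detaches HEAD; `git checkout -` returns to the branch.
git checkout drupal-7.24
cat style.css
git checkout -
```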
Tagging is an extremely powerful concept, especially when combined with branching and merging. Let’s suppose you need to write specific code for version 2.0 of your product. One common strategy is to check out the tagged version (1.0) and create a 2.0 branch from it. On that branch you can edit and write code freely. Later, when you feel confident that version 2.0 is ready for production, you merge the 2.0 branch back into your main trunk/branch. Keep in mind this is only one strategy of many. You may decide never to branch or fork code; this is entirely up to you.
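The branch-from-a-tag strategy above might look like this in Git. It’s a sketch only: the v1.0 tag, the release-2.0 branch, and the file contents are hypothetical. The main branch may be called master or main depending on your Git version and settings, so the script looks its name up.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
git init
git config user.name "web3"
git config user.email "web3@example.com"   # placeholder address

echo "body { color: black; }" > style.css
git add style.css
git commit -m "Release 1.0"
main=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on setup

# Mark the release (hypothetical tag name).
git tag v1.0

# Check out the tagged version and create a 2.0 branch from it in one step.
git checkout -b release-2.0 v1.0
echo "h1 { color: navy; }" >> style.css
git commit -a -m "New heading styles for 2.0"

# When 2.0 is ready for production, merge it back into the main branch.
git checkout "$main"
git merge release-2.0
git log --oneline --decorate
```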
Branching and merging
Perhaps the most used and useful of all features in a version control system is branching. In Git, branching and merging are easy and extremely lightweight. I can create a new branch by issuing the git branch command followed by the name of the new branch:
git branch drupal-8.0.0
If I issue the git branch command without any options, I see a list of the existing branches, with my currently selected branch marked by an asterisk (and, in a color-enabled terminal, shown in green).
Having confirmed that I am currently on the main branch, I issue the checkout command on the new branch, effectively making it the active branch:
git checkout drupal-8.0.0
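Here’s the branching sequence as a script, plus git checkout -b, which creates a branch and switches to it in one step. The drupal-8.0.0-experiment branch name and the file contents are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
git init
git config user.name "web3"
git config user.email "web3@example.com"   # placeholder address

echo "body { color: black; }" > style.css
git add style.css
git commit -m "Add style.css"

# Create the branch, then list branches; the current one is marked with *.
git branch drupal-8.0.0
git branch

# Switch to the new branch and confirm it is now active.
git checkout drupal-8.0.0
git branch

# Shortcut: create and switch in one step (hypothetical branch name).
git checkout -b drupal-8.0.0-experiment
git rev-parse --abbrev-ref HEAD
```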
As you might imagine, this is a perfect “sandbox” for experimental code. You have full freedom to do anything you want without worrying about your production code.
Hand in hand with branching is the reverse process, called merging. Merging allows you to take changes that you've made in a branch and add them back into the main trunk or another branch.
Merging branches is very simple with Git. Keep in mind that merging changes from the main branch/trunk into your private branch is sometimes referred to as forward integration. The reverse of this process, merging your branch’s changes back into the main branch/trunk, is called reverse integration. Only you can determine when to forward or reverse integrate code. It depends largely on how many developers are working on your product and how active your private and main branches are.
If I want to merge my private branch back into the main branch/trunk, I first check out the main branch and then issue git merge followed by the name of the branch I’m merging:
git merge drupal-8.0.0
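The following sketch merges a private branch into the main branch after both have changed. The file names are hypothetical. The --no-edit option accepts Git’s default merge commit message instead of opening an editor, and git branch -d removes the branch once it has been merged.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
git init
git config user.name "web3"
git config user.email "web3@example.com"   # placeholder address

echo "body { color: black; }" > style.css
git add style.css
git commit -m "Add style.css"
main=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on setup

# Work on the private branch.
git checkout -b drupal-8.0.0
echo "h1 { color: navy; }" > headings.css
git add headings.css
git commit -m "Add heading styles"

# Meanwhile, the main branch gets its own change.
git checkout "$main"
echo "p { margin: 0; }" > paragraphs.css
git add paragraphs.css
git commit -m "Add paragraph styles"

# Merge while on the main branch; this creates a merge commit.
git merge --no-edit drupal-8.0.0

# The branch has been merged, so it can be deleted safely.
git branch -d drupal-8.0.0
ls
```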
As I stated earlier, branching and merging are the two most powerful features in any version control system. Use branches whenever possible to ensure that the main branch/trunk, or any other branch, stays intact.
Examining shell integration (again focusing on Git)
In most cases you’ll use the command line interface of your version control system. However, many version control systems also hook into the operating system’s graphical shell, a feature known as shell integration. Simply put, shell integration lets you interact with your files through the graphical file manager of your operating system: in Microsoft Windows that is Windows Explorer, in Mac OS X it is the Finder, and in Linux distributions it is your desktop environment’s file manager. With shell integration you can add files, check files in and out, and revert, all from inside the shell you’re currently using. Shell integration options normally appear in a context menu.
In most cases the version control system will also provide visual status indicators for the file icon in question. Usually shell integration is an optional component during the installation of a version control system.
Graphical User Interface (GUI) tools
Aside from the command line and shell integration, there is yet another popular method for accessing the repositories of a version control system: the graphical user interface, or GUI, tool.
Most if not all GUI tools let you take exactly the same actions you would from the command line, for example adding, checking in and checking out, reverting, branching, and merging. Perhaps the greatest advantage of a GUI tool is the ability to visually inspect the repository and open a file in the native file browser.
Git supports many GUI tools. I recommend looking at the GUI Clients page on the Git project website (http://git-scm.com/downloads/guis), which lists many options to choose from. I personally like GitHub for Windows (http://windows.github.com/), as it works very well on the Windows platform and integrates directly with the GitHub cloud service.
GUI tools are really a matter of personal preference, and they can allow you and your team to work fast. If you’re not comfortable working on the command line, GUI tools provide a great alternative.
Selecting Version Control
In this article I’ve discussed one specific version control system. While Git is a great option, it may not be right for your development environment. There are many version control systems to choose from; some other popular options include Subversion, Microsoft’s Team Foundation Server, and Mercurial. You might be asking: how do you know which version control system is right for your development environment? Keep these considerations in mind.
- If you’re working in a large team environment and all your developers are working together in the same office, then you might want to try a centralized version control system such as Subversion or TFS.
- If you're starting up your own team, I would highly recommend using Git or Mercurial.
- Git is supported by many integrated development environments, and the GitHub cloud service is built around it. As you may know, GitHub is free for open-source projects, and if you need to host commercial code it is also very affordable. Git makes it very easy to work with a repository on your local device and periodically push your changes to a server in the cloud, such as GitHub.
- If your software development team is scattered around the world, Git and Mercurial can be great options, because developers can export, import, and email changes to one another without having to set up server infrastructure to host a repository.
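For the email workflow mentioned in the last point, git format-patch and git am are one way to do it in Git. The first writes a commit to a plain-text patch file you can attach to an email. The second applies that file in another repository, keeping the original author and message. This sketch uses two local repositories (devA and devB, both hypothetical) in place of two developers and skips the actual email. The addresses are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Developer A: a repository with one commit.
git init devA
git -C devA config user.name "devA"
git -C devA config user.email "deva@example.com"   # placeholder address
echo "body { color: black; }" > devA/style.css
git -C devA add style.css
git -C devA commit -m "Add style.css"

# Developer B starts from a copy of A's repository (e.g. an earlier clone).
git clone devA devB
git -C devB config user.name "devB"
git -C devB config user.email "devb@example.com"   # placeholder address

# A makes a new commit and exports it as a patch file, ready to email.
echo "h1 { color: navy; }" >> devA/style.css
git -C devA commit -a -m "Style h1 headings"
git -C devA format-patch -o "$workdir/patches" -1 HEAD

# B applies the emailed patch; the commit message and author are preserved.
git -C devB am "$workdir"/patches/*.patch
git -C devB log --oneline
```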
If you need proof of how effective Git can be in the real world, you might be interested to know that the kernel in your favorite Linux distribution is developed in Git repositories, and it goes without saying that this development team is large and scattered throughout the world.
Pick the version control system that you feel works best. If you’re just starting out and have flexibility in your choices, I’d recommend Git.
So there you have it. Hopefully you’ll adopt a version control system that does exactly what you need it to do.