I love Task Scheduler and Cron. I think they are wonderful tools for automating away repetitive tasks. I’m sure there are many manual tasks that you do, and that I still do, which could be shifted onto those tools and would make our lives immensely better. However, I think it’s important to choose the right tool for the job. If you are using these tools to manage your build system or build servers, you have chosen the wrong tool. In this post I’ll explain why that is and what type of tools are the right ones for you.
Your current setup using Task Scheduler or Cron
If you’re reading this and know about continuous integration, then you are either convinced and using it or are on the fence. I want to help push you off that proverbial fence, or put you on it if you are on the “wrong side”. Before I continue, let me congratulate you on making the leap to automating your build system. Let’s not even talk about those people who still do that manually! If you have Task Scheduler or Cron doing your build work then you probably have a command line build system. You’re on the right track!
My first question, then, is: does the following look familiar?
- Machine 1:
  - Nightly task
    - Kick off my software build (make, rake, grunt)
    - Maybe deploy the software somewhere…?
    - Get a textual report
    - Email reports somewhere
  - Nightly task
- Machine (N):
  - Nightly task
    - Kick off my software build
    - Maybe deploy the software somewhere…?
    - Get a textual report
    - Email reports somewhere
  - Nightly task
Odds are you have to set up those machines manually? Odds are you have to configure the tasks manually? Odds are you are only doing a nightly build at best? Odds are those machines are sitting idle for long periods of time? Odds are you are not able to monitor the processes on those machines as they run?
When things go wrong, does it look like this?
Let me ask you a few questions about your setup.
- What happens when a build fails?
- When do you rebuild after a failure?
- How do you change the frequency of builds if you find daily is not enough or too much? (There is no such thing as too much.)
- Where do you have to go to do that?
- Do you have situations where a build takes so long that two builds end up running on the same machine at once, against the same software, causing massive corruption?
- What happens when your source control system goes down?
- If you are running at a defined interval, when will the rebuild happen after you fix your source control server? How will you kick that off? Will you leave everybody sitting waiting for it?
If you are using Cron and you have a large product you are building, I can probably guess the answers to these. If you’ve made your build scripts graceful enough, they’ll send you an email. In fact, if you have a few machines running, you get a lot of emails. You’ll probably send them to a mailing list that lots of people are signed up to. Most of those people will have an email filter set up to stash them away or even delete them. It turns out that if you spam your developers with build emails, they will ignore them.
On a failure you might frantically log into all your build servers to read the logs, and then you might manually kick off your builds.
To change the interval you might need to log into a heap of machines and change crontabs or jump into Task Scheduler. Odds are, even if you know you need to have more frequent builds you’ll never bother because it’s a hassle.
If your builds take a long time, you wouldn’t even dream of setting up more than one per build server… If you have a lot of branches you probably have a server per branch. If you don’t have that many machines you’ll probably have some builds happen a few times a week and others on alternating days. That results in a long lead time between development changes and testing, acceptance and releasing.
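To make the overlapping-builds problem concrete, here is a minimal sketch of the kind of guard a cron-driven build script has to hand-roll for itself. The names `build_lock` and `BuildAlreadyRunning` and the lock file path are hypothetical, and a crashed build would leave a stale lock behind, which is exactly the sort of housekeeping you end up owning.

```python
import os
from contextlib import contextmanager


class BuildAlreadyRunning(Exception):
    """Raised when another build already holds the lock file."""


@contextmanager
def build_lock(lock_path):
    """Hold an exclusive lock file for the duration of one build."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # process at a time can create the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise BuildAlreadyRunning(lock_path)
    try:
        # Record who holds the lock, to help with manual clean-up.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A nightly script would wrap its whole build in `with build_lock("/tmp/mybuild.lock"):` (path assumed) and skip the run if `BuildAlreadyRunning` is raised.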
Goals of your new improved build system
Here is a set of goals you should shoot for:
- Make the most of the server you have dedicated to the task of building software.
  - You’re spending money on it, so you may as well make it work. Why not have multiple builds happen on it daily? Your build can’t take that long, can it?
- Be able to build all your software with a single command.
- Have a build system that can capture and easily process the data produced by your build scripts.
- Have a way to see all the data captured by your build scripts in a historical and easily navigable form.
- Have a system that understands when the source control changes and can respond appropriately:
  - By running a full build.
  - Or a subset of the build (because you have a really long one).
  - By running tests.
  - By testing and integrating changes.
- Have a build system that is smart enough to only email the right people at the right time.
- Have a build system that can understand and integrate with your release process.
  - If it can talk to your source control, it can understand your branches.
- Have a build system that can integrate with your bug tracking software/process.
- Have a dashboard where you can see the outcomes of the builds/processes you kicked off.
  - Bonus points for a matrix display.
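As an illustration of the “right people at the right time” goal, here is a rough sketch of one possible notification policy: email only when the build status changes. The function name, the status strings and the policy itself are assumptions made for this example, not how any particular tool decides.

```python
def recipients_for(previous_status, current_status, commit_authors, team_list):
    """Return who should be emailed about a build result.

    Statuses are the hypothetical strings "success" and "failure".
    """
    if current_status == "failure" and previous_status != "failure":
        # The build just broke: tell the people whose changes are in it.
        return sorted(set(commit_authors))
    if current_status == "success" and previous_status == "failure":
        # The build was fixed: tell the whole team it is healthy again.
        return sorted(set(team_list))
    # Nothing changed (still green or still red): stay quiet.
    return []
```

With a rule like this, a week of green builds produces no email at all, so the one message that does arrive is worth reading.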
You cannot easily achieve all of these goals with Cron or Task Scheduler. You shouldn’t try, because those are not the right tools to use to accomplish this epic set of requirements. So the question is…
What are the right tools to use?
What I described in the previous section was a continuous integration (CI) server: something like Hudson, Jenkins, CruiseControl, etc. To quote Jenkins, a continuous integration server is an “application that monitors executions of repeated jobs, such as building a software project”. That quote does not fully convey the power of using a CI server. All of the goals I described previously can be accomplished by using one of these systems and by having granular build script steps.
How CI Servers match your goals
Let’s take the first point, “making the most of the server”. Because these CI servers understand source control, as you make changes to the code base they can pull that code down and do things with it. CI servers can react; they are dynamic. Surely that’s all I need to say to sell this to you, but let me continue.
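The “react to changes” idea can be sketched in a few lines. This is a simplified illustration, not how any specific CI server is implemented; `poll_once` and its callback parameters are hypothetical names.

```python
def poll_once(get_head_revision, last_built, run_build):
    """Build only when the repository head differs from the last built revision.

    Returns the revision to remember as "last built" for the next poll.
    """
    head = get_head_revision()
    if head == last_built:
        # Nothing new has been checked in, so leave the machine alone.
        return last_built
    run_build(head)
    return head
```

Here `get_head_revision` could wrap a source control query and `run_build` could call your make, rake or grunt command. If source control is down, the query fails, nothing is recorded, and the next poll simply tries again, which is the behaviour a fixed nightly schedule cannot give you.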
What about building your software with a single command? Well, you should have that sorted with your build scripts, the makes, rakes and grunts of the world. All a CI server will do is kick those off and monitor them. But here is the interesting part: CI servers monitor the output of your system. They grab the STDOUT of the process they are running and can reason about it. They provide a web front-end for your build logs. No more do you have to jump from one machine to another to find out what’s been going on with your builds. The more data you spit out of your build process, the better: tests that produce an output, code coverage that produces an output, static analysers or linters that produce an output. All of that is captured and processed by these tools.
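To give a feel for what “kick off and monitor” means, here is a minimal sketch using Python’s standard `subprocess` module. The `run_step` function and the shape of the record it returns are assumptions for illustration; a real CI server stores far more than this.

```python
import subprocess
import sys
import time


def run_step(name, command):
    """Run one build step and return a record of what happened."""
    started = time.time()
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold error output into the same log
        text=True,
    )
    return {
        "step": name,
        "passed": completed.returncode == 0,
        "exit_code": completed.returncode,
        "seconds": time.time() - started,
        "log": completed.stdout,
    }


if __name__ == "__main__":
    # Stand-in for a real command such as ["make", "test"].
    result = run_step("smoke", [sys.executable, "-c", "print('build ok')"])
    print(result["step"], result["passed"], result["log"].strip())
```

Once every step produces a record like this, keeping history and showing logs in a web page is a storage and display problem rather than a matter of logging into machines.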
You really don’t know how life changing that is until you experience it. I know I sound like I’m over-egging it, but it really does change everything. You have a dashboard for everything that happens with your software. These systems integrate with intranet software, and they have APIs to grab data so you can stick it on TVs or dashboards if you wish! CI servers integrate with your bug tracking software, so you can find out exactly when a piece of checked-in code is built and ready for QA, or whatever your process is.
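As a rough sketch of feeding a wall dashboard: Jenkins, for example, exposes job data as JSON under `/api/json`, where each job carries a `color` field. The payload below is a hand-written sample in that shape, not real data, and the colour meanings are an assumption you should check against your own server’s version.

```python
import json

# Hand-written sample shaped like a Jenkins /api/json response (not real data).
SAMPLE = """
{"jobs": [
  {"name": "trunk-build", "color": "blue"},
  {"name": "release-build", "color": "red"},
  {"name": "nightly-tests", "color": "blue_anime"}
]}
"""


def summarise(payload):
    """Map each job name to a (status, currently_building) pair."""
    meanings = {"blue": "passing", "red": "failing", "yellow": "unstable"}
    summary = {}
    for job in json.loads(payload).get("jobs", []):
        color = job.get("color", "")
        # An "_anime" suffix is assumed to mean a build is in progress.
        building = color.endswith("_anime")
        base = color[: -len("_anime")] if building else color
        summary[job["name"]] = (meanings.get(base, "unknown"), building)
    return summary
```

A dashboard script would fetch the real payload over HTTP on a timer and render the result of `summarise` however it likes.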
Save yourself time, money and headaches
The CI software handles things like checking out code, emailing, managing processes, etc., so you no longer need to put that stuff in your make, rake or grunt build scripts. You’ve reduced the amount of code you have to deal with there. Less code means less complexity, which means fewer bugs and problems. It’s a win-win.
CI software integrates with the vast majority of bug tracking software, and has API and plugin infrastructures that allow you to write whatever you want. More than likely you only need to search for a plugin and you have an incredibly complex feature straight away, with no work. You get to focus on the things you actually care about.
Multiple build servers
While talking about these CI servers at my current workplace I brought up one of the most important benefits of this technology. Imagine, if you will, that you have multiple build servers. They all have their own independent setup and need to build different branches/versions of your code. How does CI software help you manage all the data that will come out of each? Do I have to go to multiple dashboards to see what’s going on?
CI Server Master/Slave Architecture
Tools like Jenkins provide a sophisticated master/slave architecture. In this system you have a single Jenkins instance at the top which manages all build related work and captures and displays all the data for the work carried out. On each slave server you run a lightweight Jenkins agent which advertises its willingness and ability to build software. The master then coordinates all of the slaves, gets them to do their building and then captures all of their output.
You get all of your data for every build in one place, and you get to see the build progress of these different environments. You have visibility into your whole multi-server build set-up. It’s incredible. What is better is that, because the CI server coordinates them, you can set them up to queue builds. You can get the maximum benefit out of every machine, having it run the maximum number of builds possible in any 24 hour period without the possibility of corruption.
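The queueing idea can be illustrated with a toy scheduler. This is a deliberately simplified sketch: `schedule` is a hypothetical function, and it pretends every build takes the same amount of time, which real builds do not.

```python
from collections import deque


def schedule(builds, agents):
    """Assign queued builds to agents, one build per agent at a time.

    Returns a list of rounds; each round maps an agent to the build it runs.
    """
    if not agents:
        raise ValueError("at least one build agent is required")
    queue = deque(builds)
    rounds = []
    while queue:
        current = {}
        for agent in agents:
            if not queue:
                break
            # Each agent takes exactly one build, so two builds never
            # share a machine at the same time.
            current[agent] = queue.popleft()
        rounds.append(current)
    return rounds
```

For example, three queued builds and two agents give two rounds: both agents busy in the first, and one agent picking up the remaining build in the second.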
Conclusion
I have listed some compelling reasons why I think a continuous integration server is a much better option than Cron or Task Scheduler for your software products. Now it’s up to you whether you want to use one. These kinds of build engineering tools and practices are becoming standard in our industry and I think it’s time to take notice. Someone smarter than I once said:
the build server is critical-- it's the heart monitor of your software project.
I think this is more true now than it was in 2006. So go forth, get rid of those scheduled jobs and get some newfound peace of mind while you are at it!
References
Below is a list of links that I used to research this topic. I’ve placed them here for your convenience, and so you can double check the information I’ve given you if you would like.
- Blog post title image, “head scratching” by James arboghast.
- Jenkins Dashboard image from wiki.cloudbee.com
CI Servers Options
There are loads of CI servers out there, and you don’t need me to tell you which one to use. That’s up to you. I’ve mentioned a few options throughout this article; here is a neat list of them for your reference.
- Jenkins CI Server
- Hudson CI Server
- CruiseControl.NET CI Server
- Travis CI Server
- Wikipedia List of CI Servers
Useful Articles/Videos
The following are articles I used as reference points in this post. I’d highly recommend watching some of the Etsy videos. They are very inspirational, because once you go past just using CI servers you find out what it takes to ship your product multiple times a day.
- The Build Server Is Your Projects Heart Monitor by Jeff Atwood
- How Facebook Pushes New Code Live by Chuck Rossi
- Mobile CI at Etsy by Daniel Schauenberg
- Continuous Deployment at Etsy: A Tale of Two Approaches by Ross Snyder
- Distributed builds using JenkinsCI
Other
Here are a few miscellaneous links, articles and software. The most useful of these is probably Shopify’s Dashing framework.
- Dashing Dashboard Framework
- Wikipedia Article on the Do One Thing Well Principle
- How not to use Cron by Tom Limoncelli
- SO question - “Is it reasonable to run processes with CI tools?” asked by smp7d