My most recent project was helping a major online retailer mature its build process, part of a wider effort to improve its IT effectiveness by injecting development best practices.
When we came onboard, manual intervention was needed for any build or deployment to work, so it was rare for more than a couple of builds or deployments to complete successfully in a day. Now we often have up to 1,000 builds running every day – and what’s more, the majority of them pass!
This article looks at a few of the techniques we’ve had to put in place to enable this transformation and what we’ve learnt along the way.
Dividing up builds – separation of responsibilities
Initially we had two versions of the build – one that would run on commit and a second that would only run at night. The two versions did pretty much the same things, except the nightly builds would deploy to a shared testing environment. Deployments only happened at night to avoid disruption to the multiple teams working in this shared environment (another smell to be dealt with later). The on-commit build did compilation, unit testing and packaging – the nightly build did the same and then handled environment preparation and deployment.
There were a few problems with this:
- The on-commit builds ran slowly because they were doing a lot of work
- The nightly builds were very brittle since responsibility for different parts of the build (compilation, packaging, deployment etc) was split among multiple teams – a failure at any step would mean QA didn’t have a new build to test that day
- It wasn’t DRY
Our solution was to divide the monolithic build into several smaller steps and tie them together into a “build pipeline”, with each step triggered by a successfully completed “upstream” build. Our pipeline was divided into quick, assemble, package, deployment and regression builds, with each “downstream” build picking up the artifacts prepared by the previous step.
This led to various benefits:
- Each build ran more quickly
- We didn’t repeat steps
- It was easier to divide up responsibilities for keeping the different builds green:
- Developers :: quick build (compilation + unit tests)
- Build team :: assemble build (jar, war, ear creation etc)
- Deployment team :: package + deployment (preparing RPMs / configuration / deploying)
- QA team :: regression suites
- Because we used the “last known good” upstream artifact, deployments weren’t blocked by a last minute problem upstream in the pipeline.
- We were now in a position to split the builds up over more boxes to achieve greater throughput through parallelism
Dividing up the responsibilities in this way (at least for first- and second-line build support) had a major impact on success rates and on response times to failures, largely through an increased sense of ownership of the process.
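To make the shape of this concrete, here is a minimal sketch of an ordered chain of steps, each triggered by a successful upstream step and consuming its last known good artifact. It is purely illustrative (the real pipeline was wired up as Cruise projects, not Python); the step names match ours, everything else is invented:

```python
# Illustrative sketch only: the real pipeline was a set of Cruise projects,
# not Python. It shows the shape of the idea: an ordered chain of steps,
# each triggered by a successful upstream step and consuming its
# "last known good" artifact.

PIPELINE = ["quick", "assemble", "package", "deploy", "regression"]

# Hypothetical record of the last artifact successfully produced at each step.
last_known_good = {}

def next_step(step):
    index = PIPELINE.index(step)
    return PIPELINE[index + 1] if index + 1 < len(PIPELINE) else None

def run_step(step, upstream_artifact):
    # In reality this was a Cruise project doing the work; here we just log it.
    print(f"running {step} against {upstream_artifact}")

def on_step_success(step, artifact):
    """Record the good artifact and kick off the next step in the chain."""
    last_known_good[step] = artifact
    downstream = next_step(step)
    if downstream is not None:
        run_step(downstream, last_known_good[step])

# Example: a passing quick build triggers assemble with its artifact.
on_step_success("quick", "bigapp-quick.123")
```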
Tying the builds together – HTTP publishing
In an ideal world the input for a downstream build should be the output from an upstream build – the appearance of a new artifact from one build would trigger the next build down the chain. In our situation that wasn’t quite possible (the server topology and optimal dividing lines between builds, from a speed point of view, didn’t favor communicating via artifacts) so we took a different approach.
Our solution was for builds to communicate with each other via small properties files, published locally and accessed over HTTP. On successful completion each build publishes a properties file (to the local Cruise web-server) containing the build number and SVN revision of the build. We built a custom Cruise publisher to handle this. We also built a custom Cruise HTTP modification-set to watch for upstream builds and an HTTP label-incrementer to keep all the build numbers in the pipeline in sync. The same build number is used through each of the steps of the build, making it easier to check whether changes have flowed through to the QA environment – more on this later.
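The publisher itself was a custom Cruise plugin, so I won’t reproduce it here, but the mechanism is easy to sketch. In the illustration below the file name, URL and property keys are invented; only the idea (a tiny properties file with the build label and SVN revision, published by the upstream build and polled over HTTP by the downstream one) is the real one:

```python
# Conceptual sketch of the "properties file over HTTP" handshake between
# builds. The real implementation was a custom Cruise publisher and
# modification-set; the file name, URL and keys here are invented.
import urllib.request

def publish_build_info(path, label, svn_revision):
    """Upstream build: drop a tiny properties file into the Cruise web root."""
    with open(path, "w") as f:
        f.write(f"build.label={label}\n")
        f.write(f"build.svn.revision={svn_revision}\n")

def read_upstream_build(url):
    """Downstream build: fetch the upstream properties file over HTTP."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)

def upstream_has_new_build(url, last_seen_label):
    """Trigger check: has the upstream published a label we haven't seen?"""
    return read_upstream_build(url).get("build.label") != last_seen_label
```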
Watching the river flow – visualizing the build pipeline
Once we had divided up the builds and added further applications, we ended up with close to 100 builds that needed to be managed and monitored. Since they were spread across multiple boxes to increase throughput, this wasn’t as easy as just watching one large Cruise page. So we built a centralized page for aggregating the build results.
The dashboard page comprised a few separate components:
- Current build status grid
- Graph of passing vs. failing builds over time
- Graph of build times per type / step in the pipeline
- Quick list of which builds are currently broken
- Some basic code metrics
We also provided similar drill-down pages, filtered just to show data on a single application or step in the pipeline. These then link through to the actual Cruise pages for log files, failure reasons and test results.
Current build status grid
The current build status grid was probably the greatest breakthrough. The basic idea was to show the current status of all our builds in a simply structured manner. Over a few iterations we evolved a grid structure (a bit like battleships!). The columns in the grid represented steps in the build pipeline and the rows the different applications we were building; the columns were repeated for each active branch. Cells were then green if the build was good, red if it was bad and yellow if the last build was good but more than 24 hours old (stale). Mouseovers gave more details about the selected build, such as when it ran and what its build number was.
From this view, informative patterns were clearly distinguishable at a glance. It became obvious whether problems were isolated or spread across a particular application or step in the pipeline.
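The colour logic behind each cell is simple enough to sketch. The 24-hour staleness threshold is the one we actually used; the function and parameter names below are just for illustration:

```python
# Sketch of the colour logic behind each cell of the status grid.
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=24)

def cell_colour(last_build_passed, last_build_time, now=None):
    """Green = good, red = broken, yellow = good but more than a day old."""
    now = now or datetime.now()
    if not last_build_passed:
        return "red"
    if now - last_build_time > STALE_AFTER:
        return "yellow"
    return "green"

# Example: a build that last passed 30 hours ago shows up as stale.
print(cell_colour(True, datetime.now() - timedelta(hours=30)))  # yellow
```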
Initially we emailed a snapshot of the grid first thing in the morning and at the end of the day to management and the teams. This had the wonderful effect of halting all the annoying questions and sparking the interesting ones. Instead of people asking whether a build was successful or what build number was last deployed to QA we got questions like:
- Why are all the assembly builds broken?
- Why is all of application X broken?
- Why hasn’t application Y built for 2 days?
- Why is there a build on the prod support branch?
- Wow – everything’s green – shall we go to the pub?
This definitely shone the spotlight on a few uncomfortable places, but the visible improvements it led to made it worthwhile. It became an invaluable tool for educating people about how the pipeline works. Once people understood and relied on the data, we replaced the emails with a dynamic, self-service application hosting the same information.
Graphing build metrics over time
We also had great success with some of our time-based visualizations. Simply graphing the times of each step in the build pipeline over the last month quickly highlighted problems as they developed. In one case we realized that all the packaging builds were getting geometrically slower and slower – a quick bit of research showed that each build was adding new artifacts to SVN and the ensuing checkouts were suffering. We fixed this problematic practice and had a massive positive impact on the total throughput of the pipeline.
The historic view of successful vs. failing builds over time was also a particularly useful tool while we were focusing on improving the stability of our deployments to QA / UAT. This gave us a simple report on how we were doing – 50% success last week, 75% this week – woo hoo!
Sparklines
We found a simple yet effective way of communicating a large amount of information about how a build has been performing by using “sparklines”.
For a given build these graphs communicate the number, duration and success of a large quantity of recent runs, so you can quickly get an idea of how things are trending. Again, mouseovers provide more details, context and drill-down.
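Our sparklines were rendered by the dashboard itself, but as a rough idea of the kind of data behind them, here is a toy sketch that turns a series of build durations into a text sparkline (the numbers are invented):

```python
# Toy example: turn the durations of the last few builds into a text
# sparkline. Our real sparklines were drawn by the dashboard; this only
# illustrates the per-build data behind them. The numbers are invented.
BARS = "▁▂▃▄▅▆▇█"

def sparkline(durations):
    """Map each duration onto one of eight bar heights."""
    lo, hi = min(durations), max(durations)
    span = (hi - lo) or 1
    return "".join(BARS[int((d - lo) / span * (len(BARS) - 1))] for d in durations)

# Build durations (minutes) for the last ten runs of one pipeline step.
print(sparkline([4, 5, 5, 6, 9, 12, 11, 14, 13, 15]))
```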
Under the hood
Initially the grid was generated and emailed by a scheduled round-robin script hitting all the build servers. But this was slow, required us to configure all the build server addresses in one place, wasn’t fault tolerant and stored no build history. So we moved the build telemetry dashboard to a simple Rails webapp employing CSS wizardry for the visualizations. The data was populated via a basic RESTful API: each build had a custom HTTP publisher (again, we built this) that fired off a simple set of information about the build in a POST request (see the sketch after this list):
- Project name
- Build label
- Date / time
- Build duration
- Build successful
- Hostname of the Cruise server
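The exact endpoint and parameter names aren’t worth documenting here (and the ones below are hypothetical), but the shape of the request was roughly this:

```python
# Sketch of the kind of POST our publisher fired at the dashboard. The URL
# and parameter names are hypothetical; only the set of fields matches what
# we actually sent.
import urllib.parse
import urllib.request

def report_build(dashboard_url, project, label, started_at, duration_secs,
                 successful, cruise_host):
    payload = urllib.parse.urlencode({
        "project": project,        # e.g. "trunk-bigapp-quick"
        "label": label,            # e.g. "trunk-bigapp-quick.123"
        "started_at": started_at,  # date/time the build started
        "duration": duration_secs,
        "successful": "true" if successful else "false",
        "host": cruise_host,       # hostname of the Cruise server
    }).encode("utf-8")
    urllib.request.urlopen(urllib.request.Request(dashboard_url, data=payload))

# Hypothetical usage:
# report_build("http://dashboard/builds", "trunk-bigapp-quick",
#              "trunk-bigapp-quick.123", "2008-06-01T10:15:00", 312,
#              True, "build01")
```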
Build naming conventions
The one extra key to making this all work was standardizing the format of the Cruise project name so we could parse out the application, branch and pipeline step without adding any extra complicated meta-data. We chose “<branch>-<application>-<step>” as our format, e.g. “trunk-bigapp-quick” or “3.20-littleapp-assemble”.
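Parsing that convention back out is then trivial; a small sketch (assuming branch and step names never contain hyphens, though application names may):

```python
# With the "<branch>-<application>-<step>" convention, the dashboard can
# recover everything it needs straight from the Cruise project name.
# Assumes branch and step names contain no hyphens (application names may).
def parse_project_name(name):
    branch, rest = name.split("-", 1)
    application, step = rest.rsplit("-", 1)
    return {"branch": branch, "application": application, "step": step}

print(parse_project_name("trunk-bigapp-quick"))
# {'branch': 'trunk', 'application': 'bigapp', 'step': 'quick'}
print(parse_project_name("3.20-littleapp-assemble"))
# {'branch': '3.20', 'application': 'littleapp', 'step': 'assemble'}
```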
Where’s my build?
Using our custom label incrementer meant that each build in the pipeline for a given SVN revision would have the same build number (e.g. “trunk-bigapp-quick.123”, “trunk-bigapp-assemble.123”, “trunk-bigapp-deploy.123”). This made it much easier for developers and QA to work out which builds and features had made it through to the QA environments. We built on this by providing a customized version of the current status grid to answer exactly that question – where’s my build?
The grid displays the latest build number at each step – clearly showing how far changes have progressed and also highlighting how a broken step breaks the pipe. This has become a vital self-service tool for the dev, QA and deployment teams, and has saved the build team from a lot of very dull questions.
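Conceptually the view just asks, for each step, whether the latest label seen at that step has reached a given build number. A sketch with invented data:

```python
# Sketch of the "where's my build?" question: given the latest label seen at
# each pipeline step, how far has a particular build number progressed?
# The numbers below are invented.
PIPELINE = ["quick", "assemble", "package", "deploy", "regression"]

latest_label = {
    "quick": 125,
    "assemble": 125,
    "package": 123,   # packaging is lagging, so 124 and 125 are stuck here
    "deploy": 123,
    "regression": 123,
}

def progress_of(build_number):
    """Return the steps this build number has already flowed through."""
    return [step for step in PIPELINE if latest_label.get(step, 0) >= build_number]

print(progress_of(125))  # ['quick', 'assemble']
print(progress_of(123))  # all five steps
```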
A growing awareness of the importance of the pipeline has allowed us to devote efforts towards optimizing the total throughput. All the good lean principles definitely apply at this point – looking at optimizing the whole system and working out where we’re queuing or waiting too much nearly always deliver the best results.
And the rest …
These tools freed us up to use the build team as a beach-head for spreading general coding, development and automation best practices throughout the organization. Some of these included:
- Untangling spaghetti dependencies with Ivy
- Spawning multiple virtual machines for regression testing
- Automating Cruise configuration and version updates
- A whole host of black-belt Ant Fu
- Correlating multiple code metrics to drive code quality improvements
But those are stories for another occasion …
[This article was originally written for an internal ThoughtWorks innovation newsletter – I’ve made some minor edits for this blog]