Focusing development for delivery
Originally posted on IntelligentHack.
Twenty years ago, 17 software engineers and self-described ‘organisational anarchists’ met at a Utah ski resort. Frustrated with what they saw as the ponderous, bureaucratic processes guiding software development at the time, they strove for a simpler, leaner alternative. The result was the agile manifesto, which laid out the following principles:
- Individuals and interactions over processes and tools
- Working software over comprehensive documentation
- Customer collaboration over contract negotiation
- Responding to change over following a plan
While they saw value in the items on the right, they considered the ones on the left more critical. They hoped adopting these principles would help teams move away from monolithic, unchanging plans and ‘big bang’ releases, and towards delivering continuous incremental change. This approach would (they hoped) allow teams to benefit from early feedback and change direction when necessary, before they had wasted too much time.
Far more teams claim to practice agile than actually do so effectively, and an entire industry of expensive agile consultants has sprung up to milk the enterprise cash-cow. However, much of this industry dedicates itself to applying ‘agile’ branding to existing waterfall practices, changing very little (save the respective bank balances of said consultants).
Agile development is a vast topic involving collaboration between the development team, product owner, and business stakeholders. However, this blog focuses specifically on the development part of the cycle.
Scrum - a brief outline
Many teams practice (or claim to practice) Scrum, an agile methodology that organizes planning cycles into discrete sprints, typically two weeks long. At the beginning of the sprint, a team estimates how much work it believes it can get done. At the end of the sprint, the team evaluates how well it did and adjusts its estimates for the next sprint accordingly. This way, at least theoretically, the team can deliver at a sustainable pace indefinitely while allowing stakeholders to track its progress.
Source: IntelligentHack
Scrum embodies the tension between responding to change and following a plan. Because the sprints are relatively short, a team is free to change direction with relatively short notice. However, the existence of sprints and the prioritised product backlog provides teams and stakeholders with an overall plan (albeit one subject to change) and mitigates against the disruption caused by too many changes of direction in a short time.
Continuous integration and delivery
A common mistake is confusing planning cycles with delivery cycles. In Scrum, a sprint represents a discrete unit of planned work for the team. It does not denote a single delivery - the best-performing teams deliver, respond to failure, and successfully recover multiple times a day, as the following table shows:
Source: State of DevOps Report 2016
Why are these metrics important to agile delivery? If a team brings changes to production multiple times per day (deployment frequency and lead time for changes), it is by definition delivering frequent, incremental change, which enables it to garner rapid feedback from stakeholders (addressing the responding to change and customer collaboration parts of the manifesto). To do this safely, it needs to trigger as few incidents as possible (change failure rate) and close them quickly (mean time to recovery, or MTTR).
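To make the four metrics concrete, here is a minimal sketch of how a team might compute them from its own deployment history. The `Deployment` record and its field names are assumptions made for illustration, not something taken from the State of DevOps Report, and a real team would pull this data from its CI pipeline and incident tracker.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    restored_at: Optional[datetime] = None  # set only if the deployment caused an incident


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def delivery_metrics(deployments: list, days: int) -> dict:
    """Summarise the four delivery metrics over a period of `days` days."""
    failures = [d for d in deployments if d.restored_at is not None]
    lead_times = [hours_between(d.committed_at, d.deployed_at) for d in deployments]
    recoveries = [hours_between(d.deployed_at, d.restored_at) for d in failures]
    return {
        "deployments_per_day": len(deployments) / days,
        "mean_lead_time_hours": mean(lead_times) if lead_times else 0.0,
        "change_failure_rate": len(failures) / len(deployments) if deployments else 0.0,
        "mttr_hours": mean(recoveries) if recoveries else 0.0,
    }
```

Treating every deployment with a `restored_at` timestamp as a failed change is a simplification; teams that track incidents separately would join the two data sets instead.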
The simplest way to minimise the lead time for changes, MTTR, and change failure rate is to make changes as small as possible and then test and deploy them as quickly as possible. This approach requires the team to concentrate on a few things.
Make pull requests as small as possible
Each PR should encapsulate a single deliverable. Huge PRs spanning dozens of files are hard to review and test, take longer to be ready for deployment, and are more likely to contain errors. A few common mistakes include:
- Building a whole feature in a single PR, which is rarely necessary. Think - could you deliver the UI first? Can you release the ‘happy path’ before handling every ‘special’ case? Feature flags are a big help here - they enable partial features to be delivered and tested in QA while being hidden from the end-users in production.
- Bundling prerequisite refactoring with the feature. If you find that technical debt blocks you from delivering something, address the technical debt first in a separate PR, regression-test it, deploy it, and only then move on to the feature itself. That way, you’re not mixing up regression and feature testing.
- Random ‘drive-by refactoring’ inside a feature PR. I always encourage my teams to refactor as they go, as it helps keep on top of technical debt. However, if you want to rename that method used in 50 files, do it in another PR.
- Refactoring too much at once. This mistake can be tricky to identify, as you can often start out trying to refactor a class, then quickly find yourself disappearing down a rabbit-hole as you fix more and more crud in its dependent classes. There’s no single approach to this, but I often find a ‘vertical’ strategy works for me - fix the class, and make sure the interfaces it touches are correct but don’t worry about the implementations of said interfaces. For example, you might refactor a controller in an MVC website, moving the business logic into some business logic classes. The interfaces on those classes should be correct (i.e. the method signatures and public properties should be the final ones), but cleaning up the actual implementation should be left to another PR.
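As an illustration of the feature-flag idea mentioned in the list above, here is a minimal sketch. It assumes flags are read from environment variables named `FEATURE_<NAME>`; that naming scheme, `flag_enabled`, and the gift-card checkout example are all hypothetical, and many teams would use a dedicated flag service instead.

```python
import os
from typing import Mapping, Optional


def flag_enabled(name: str, environment: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the (hypothetical) FEATURE_<NAME> variable is switched on."""
    env = os.environ if environment is None else environment
    value = env.get(f"FEATURE_{name.upper()}", "off")
    return value.lower() in {"1", "on", "true"}


def checkout_options(environment: Optional[Mapping[str, str]] = None) -> list:
    """List payment options, hiding the unfinished one unless its flag is on."""
    options = ["card"]  # existing, fully released behaviour
    if flag_enabled("gift_cards", environment):
        # Partial feature: merged and deployed, but only visible where flagged on.
        options.append("gift_card")
    return options
```

With the flag switched on in QA and left off in production, the half-built feature can ship in several small PRs without end-users ever seeing it.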
Prioritise PR reviews
Creating regular, small PRs is pointless if they’re lying around waiting for reviews for days. Moving PRs along should be a priority for a team - no PR should be stuck for more than an hour waiting for feedback. Teams can speed up PR reviews in a few ways:
- Specify a reviewer and notify them directly. The team could automate this by integrating their source control system and messaging platform, or they could write a bot to pick a random team-member to perform a review.
- Discuss the PR synchronously face-to-face, via instant messaging or video call, rather than asynchronously via the PR itself. Integration between the team’s source control system and messaging platform can also inform people of comments on their PRs.
- Don’t protect the master/main branch. PRs only help larger or more complex changes that require collaboration - an experienced team should just push small changes directly.
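The reviewer-picking bot suggested above could start out as something like the following sketch. The function names and the team list are made up for illustration, and the part that actually notifies the reviewer through the team’s messaging platform is deliberately left out, since it depends on which tools the team uses.

```python
import random
from datetime import datetime, timedelta
from typing import Optional, Sequence


def pick_reviewer(team: Sequence[str], author: str,
                  rng: Optional[random.Random] = None) -> str:
    """Pick a random team-member, other than the author, to review a PR."""
    candidates = [member for member in team if member != author]
    if not candidates:
        raise ValueError("no eligible reviewers")
    chooser = rng if rng is not None else random
    return chooser.choice(candidates)


def is_stale(opened_at: datetime, now: datetime,
             limit: timedelta = timedelta(hours=1)) -> bool:
    """True if a PR has waited longer than `limit` (one hour by default) for feedback."""
    return now - opened_at > limit
```

A scheduled job could call `is_stale` on each open PR and nudge the chosen reviewer once the hour is up.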
Perform a full regression test before each release
No matter how much effort you put into writing clean, well-tested, loosely-coupled code, the real world always gets in the way. Perhaps you’re integrating with a particularly unreliable third-party API. Maybe your service is a small part of a complex ecosystem. Whatever the cause, changes to one part of the system frequently have unexpected effects on other parts. To keep your change failure rate under control, it becomes essential to test the critical components of the system - to perform a regression test:
- Verify that your main features’ ‘happy’ paths still work, ensuring you cover every integration your application has. Don’t test every edge case; just cover the main flows.
- Make a regression testing plan, and stick to it. The first step is to write down all the test steps, then execute them manually against a QA environment.
- Make your development team do the testing. Handing regression testing to a dedicated QA team is the definition of pipelining (excreting functionality and ‘throwing it over the fence’ without taking responsibility for its quality) and encourages general developer laziness. When developers feel the pain of doing manual testing, they will be motivated to make the application easier to test.
- Automate, automate, automate! As Blake Norrish points out in his blog, The Regression Death Spiral, relying on manual regression testing will eventually choke the team’s velocity as the list of functionality requiring testing grows. Turn your manual test cases into automated tests, which should run in a realistic environment.
- Make your tests fast and reliable. System tests that fail randomly will eventually be ignored by the team, making them useless. Tests that take a long time to run impact the delivery cycle in an obvious way - if your system tests take two hours, you won’t be doing many deployments a day.
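Once the manual steps are written down, turning them into an automated suite can begin with something as simple as the sketch below. It assumes each happy-path check is a plain function that raises an exception on failure; `run_checks`, `release_ok`, and the time budget are hypothetical names chosen for this example, and in practice the checks would exercise a realistic QA environment.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    seconds: float


def run_checks(checks: Dict[str, Callable[[], None]]) -> List[CheckResult]:
    """Run each happy-path check once, recording its outcome and duration."""
    results = []
    for name, check in checks.items():
        started = time.perf_counter()
        try:
            check()  # a check signals failure by raising an exception
            passed = True
        except Exception:
            passed = False
        results.append(CheckResult(name, passed, time.perf_counter() - started))
    return results


def release_ok(results: List[CheckResult], budget_seconds: float) -> bool:
    """Allow a release only if every check passed within the overall time budget."""
    total = sum(result.seconds for result in results)
    return all(result.passed for result in results) and total <= budget_seconds
```

Making the time budget part of the pass criteria keeps slow tests as visible as failing ones.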
Measure, measure, measure!
Each environment, test suite, and deployment process has its pain-points. The only way to know what they are is to measure them. Keep track of how much development time each release takes (along with the principal pain-points), which bugs crop up in production regularly, how long the tests take to run and how often they fail randomly. If possible, automate this data-collection via integrations with your production monitoring tools, CI pipeline, and source control system, and turn it into a set of KPIs for your team.
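As a rough sketch of what automated data-collection might produce, the snippet below turns a list of CI test results into two simple KPIs: which tests are flaky, and which are slowest. The `RunRecord` shape is an assumption about what a CI pipeline could export, and defining ‘flaky’ as ‘both passed and failed on the same commit’ is just one possible definition.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of one test execution exported from CI."""
    test: str
    commit: str
    passed: bool
    seconds: float


def flaky_tests(runs: Iterable[RunRecord]) -> Set[str]:
    """Tests that both passed and failed against the same commit."""
    outcomes = defaultdict(set)
    for run in runs:
        outcomes[(run.test, run.commit)].add(run.passed)
    return {test for (test, _commit), seen in outcomes.items() if len(seen) == 2}


def slowest_tests(runs: Iterable[RunRecord], top: int = 3) -> List[Tuple[str, float]]:
    """Return (test, mean duration in seconds) pairs, slowest first."""
    durations = defaultdict(list)
    for run in runs:
        durations[run.test].append(run.seconds)
    averages = [(test, mean(times)) for test, times in durations.items()]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)[:top]
```

A failure that coincides with a new commit is not counted as flaky here, since it may be a genuine regression.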
Summary
The agile manifesto encourages teams to deliver continuous, incremental change, gather rapid stakeholder feedback, and course-correct accordingly. Teams can optimize their development processes to this end by focusing on DevOps metrics around deployment, recovery, and reliability. Above all, teams must keep measuring their performance and use what they learn to improve.