How to tackle technical debt
Mining for debt
Recently on Slack one of my colleagues shared this comic from Monkey User.
I thought it was a great metaphor.
The world of software moves extremely fast. Inside a given company the codebase is constantly changing with the addition of new features. Outside the company is an entire world of open source software development, shipping updates to all of the libraries, frameworks and databases that are being used.
With time, piling on ever more code creates moments where the team needs to stop and take a step back. They will need to think of a different way of moving forward that is more maintainable, controlled and less prone to bugs.
Even if the internal codebase changes extremely slowly, external dependencies are always releasing new versions, requiring the team to upgrade them before they reach end of life. This can create further technical debt as APIs are deprecated or breaking changes are introduced.
Quite often engineers struggle to make their case for prioritizing tech debt work. Why?
Lack of empowerment: They might not think it is their place to speak up about it; instead they expect that more senior people will dictate when to take stock, refactor, upgrade libraries or storage, and so on.
Inability to persuade: They might not be able to construct an argument for spending time on it in a way that the non-technical people who dictate the work streams will understand.
Apathy: They may have already lost hope that Product or any higher-ups will listen to them, and therefore silently let the codebase or system degrade. "Features are more important," they say. "They'll never listen to us."
All of these situations are a shame. They're also not acceptable. But they're fixable. Let's have a look at them in turn.
It's someone else's problem
As an engineer, if you think that it is someone else's problem to point out that there is a technical debt issue beginning to get out of hand, then - and I'm sorry to say it - you're wrong. There are a number of traits that an excellent engineer will have, and pride in their work and a keen interest in the future state of the code are two of them.
Those committing code will know best about how the codebase is currently written and organized. They will be the first to begin to notice the bad smells. As they realize that continual dirty hacks are the only way of moving forward, it's their duty to raise the flag.
The creation of technical debt is inevitable; as inevitable as the slow erosion of a chalk and lime coast by lapping waves, or the weathering of an old building. We should be comfortable with the fact that it is going to happen, and is likely happening right now, and we should be especially comfortable with alerting others when it starts to feel bad. We should fix the broken roof tiles before they become a leak.
Talk to your team about it. Talk to the other engineers that work on that codebase. Build consensus that there is a problem and that something should be done about it. Don't wait for someone else to point it out. It is as much your responsibility as it is everyone else's.
Shout.
Constructing the argument succinctly
Now that a technical debt problem has been identified, we'll need to think about how best to argue for getting the time and space to fix it.
Many engineering departments are building a product that makes the company money by selling it to external users. Some service internal users. I work in SaaS, and I would say that the expectations of our users are:
That our applications are available no matter the time of day or day of week.
That we'll be continually adding new and innovative features to our products.
These expectations are pretty well understood by everyone in the business, regardless of whether they work in commercial, engineering, product, marketing, or wherever. That's a good thing, because if you use one or both of them to construct your arguments about tackling particular pieces of technical debt, then it's hard to be ignored.
Rephrasing the above two bullet points with a focus on thinking about technical debt:
The platform should be acceptably fast, correct (enough) and should have a very low likelihood of going catastrophically wrong with no prior warning. It is a very bad thing for business when this happens.
The codebase should be easy and efficient to work in as we continually add more stuff to it. If we can't maintain a reasonable speed of adding new stuff, we begin to lose out to competitors, and the rest of the business wonders why we are getting slower, inviting lots of fruitless arguments about developer productivity.
We need to tie our arguments to these reasons. If engineers argue for doing technical debt work in a way that doesn't make sense to the non-technical layperson, then it's very hard for them to win hearts and minds in the business. Colleagues will wonder what the engineers are up to instead of shipping features.
Technical debt shouldn't be fixed because it's "obvious" or "the code could be better" or "it's annoying" or a particular framework is now "the latest thing". Those reasons may be entirely true, but the argument needs work.
Let's have a look at some different scenarios.
"We need to upgrade Postgres." OK, I totally understand. But we need to think of a better way of phrasing this to the non-technical person. What does the upgrade bring us? Is it some critical security patches? Does it have a positive effect on the speed at which the application is going to work? Does it have new features in the query language that will allow us to query the data in a new or better way?
"We need to refactor AnalysisPipeline.scala!" Nobody has any idea what AnalysisPipeline.scala does. Probably only a few in the department even know. Does it lack tests, and is it therefore causing a lot of bugs in written documents that are challenging to fix once they're committed to storage? Is the class such a big monolithic mess that it is too hard to add new features at the rate that the business expects? Is it taking five times as long to work on as it would if it were split out into multiple classes, methods, modules or services?
"This service needs a rewrite." Sure, it probably does. But what's the real reason? Is it stuck on a framework that is now years beyond end of life and nobody knows how it works? Is it an area of the code that is going to have a lot of changes in the coming year, but the risk of it breaking is too high to keep adding to it quickly? Will the speed or stability of this particular service be much better if, instead of working with it, we just start again, taking advantage of the knowledge and technology that we have now?
Getting better at justifying why technical debt needs to be fixed isn't just a skill that helps you get the clearance of your team lead or product owner to start working on it: it can also help you make up your own mind as to whether something is a real long term issue for the coming year or just a short term frustration for the current sprint.
Nobody will listen
If nobody will listen to your arguments about addressing technical debt, then first check that you're constructing those arguments properly, as mentioned in the sections above. You are? Ace.
If a common pushback is that there are too many features queued up to build, then there may be an underlying worry from your product manager or line manager that fixing the technical debt will be a slippery slope that goes on forever and destroys productivity.
One answer to this is to try your best to estimate the effort that it will take to fix it, and, better still, break that down into phases or milestones that can be incrementally worked on.
A tactic that works well to please both Product and Engineering is to balance periods of feature delivery with periods of tidy up and refactoring. In It Doesn't Have To Be Crazy At Work, the creators of Basecamp pitch for periods of 6 weeks building followed by 2 weeks paying down technical debt.
At Brandwatch we have employed similar tactics with a period of a team delivering a big ticket feature being followed by a fallow period where the team prioritizes and executes their most pressing technical debt concerns, such as refactoring, improving monitoring and writing documentation. The bonus to this way of doing things is it gives your product managers and designers time to ruminate on the next big thing.
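To make the trade-off concrete when pitching a cadence like this, it can help to show how much time it actually sets aside. Here is a minimal sketch using the 6-weeks-building, 2-weeks-tidying split mentioned above; the function names and the 52-week working year are my own assumptions for illustration, not anything prescribed by the book.

```python
# Hypothetical helpers for illustration only.
# The 6/2 split comes from the text; the 52-week year is an assumption.

def debt_share(build_weeks: int, debt_weeks: int) -> float:
    """Fraction of each cycle spent paying down technical debt."""
    return debt_weeks / (build_weeks + debt_weeks)


def debt_weeks_per_year(build_weeks: int, debt_weeks: int,
                        weeks_per_year: int = 52) -> float:
    """Weeks per year spent on technical debt at this cadence."""
    return weeks_per_year * debt_share(build_weeks, debt_weeks)


share = debt_share(6, 2)
print(f"{share:.0%} of each cycle goes to technical debt")  # 25%
print(f"{debt_weeks_per_year(6, 2)} weeks per year")        # 13.0
```

Put that way, the ask is a quarter of the team's time in predictable, bounded blocks, which is usually an easier conversation than an open-ended refactoring request.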
Sometimes, however, there is a massive elephant in the room: a technical debt project so big that nobody wanted to talk about it, yet the swell has grown to the point where the wave is going to break - either with the codebase continuing to become a complete mess, or the platform becoming increasingly slow and unstable.
In this situation, honesty and transparency is the best policy. It is the job of the leaders in Engineering to elevate a large technical debt problem into a separate work stream in order to give it the recognition, space, and resources that it needs; typically a dedicated team over a longer period of time.
In doing so, the principles above are just as valid: raise the flag, gain consensus, plot an approach, and make the problem understandable to the layperson. Make it clear that the future is brighter by doing this work.
Convince them that it would be silly not to do it because the future of the business depends on it. Then sort it out.
In summary
Remember that if you are an engineer, it's your job to raise technical debt issues as early as possible, and to make sure that you are able to explain their impact in succinct and meaningful ways. Managers: it's your job to listen and to create the space for the issues to get worked on.
Building a successful SaaS business requires a stable application and the ability to work quickly and efficiently: both of these things are impacted severely by technical debt, so don't let it build up. Pay it down.