Hi @jenwilkinson! I think you have a very important job there, perhaps more so than most engineers realise. IMHO most companies are slowly (or, um, quickly) declining because they do not do the things that you’re responsible for. I have… questions. So, so many questions.
Firstly, how are systems documented? How is it all linked together? Say you’re in the guts of some service or other and it points at another service: what’s the directory that lets you go from one to the other in terms of the documentation?
Is the documentation versioned in something like git?
Is the documentation able to answer questions from both a developer and a business process standpoint? Can both developers and business people use the same documentation?
Are datatypes at the edges of services a significant part of the documentation effort?
How do you document the “why” of each service? Is there a way to get a global overview of the “why this service exists” for each and every service? (I know that there are a lot of microservices in Monzo’s microservice architecture.)
When someone decides to leave, how do you make sure that the contents of their skull are recorded in your knowledge management system? Staff attrition is the company-killer through this precise mechanism.
Is this whole documentation system linked back to the RFCs that originate most of the things that Monzo builds (or at least I believe that used to be the case), so that a developer can see the entire history from inception to current state?
While I can imagine that each of the X thousand microservices has its own nice individual docs, how do you structure the higher-level architecture of the documentation around that? Are there some “getting started” / intro docs and then a vast chasm all the way down to the coal-face of some service that, say, gets a list of transactions, or are there intermediary levels?
In a related way, is there documentation around the flows through the application, down through each microservice? If so, is this just prose that gets updated ad hoc by developers, or is some of it auto-generated from the code itself? Do you auto-generate diagrams along with those code flows?
Do all developers get mandatory training on how to properly adhere to the conventions that you lay out?
How do developers do discovery? If a developer wants to find the service for X, or all the services involved in some process Y, how do they find them?
How do you prevent nomenclature drift? I’ve worked in a number of places where different people invented different, often deeply unintuitive terms for things, which all ended up co-existing. How do you make sure that there’s one glossary?
How do you maintain quality? Are documentation pull requests vetted deeply and thoroughly by other developers to make sure that they make sense? It’s often very hard to get outside of your own head to explain complex objects to others as you forget how familiar you are with the objects that you yourself created and all the underlying assumptions that are invisible to you but nonetheless guided the thinking.
Is documentation done primarily before, during, or after development? If all three, which parts are done in which phase of the development cycle?
This sounds ace. Understandably, you can’t stop people from sometimes not being “up to speed”, but how are learnings fed back into the system? Can people add to it themselves (as with a wiki), or is it locked down?
Also, how does this work for COps? Do they have a similar system they can use to help answer customer enquiries?
Great question. I think you’re referring specifically to our customer operations documentation, which is handled by a separate team at Monzo. However, I have worked with that team previously so I’ll try to answer as best I can!
In the past it’s been difficult for our customer operations staff to find accurate internal docs to help them answer customer queries. Sometimes this has meant we’ve given inconsistent messages to customers, which we absolutely do not want.
In the last year, we’ve invested heavily in improving this. That’s included things like moving all customer support docs to their own dedicated system that is much easier to search, understanding how customer operations staff are interacting with it, and creating new processes to keep that documentation up to date.
Every piece of content in that system has an owner who regularly reviews the content to keep it fresh. As well as proactively monitoring that content, we also have a team of Quality Assurers (QAs) who investigate if things have gone wrong. They will look at things like whether the support person checked the docs while helping the customer.
All of this is a work in progress and more improvements are on the way. We’re now starting to make better use of the metadata and analytics we have and restructuring what’s there so we make it much quicker for people to find the right answer.
We have a few different knowledge systems. We try to be as transparent with our internal docs as possible. Unless there are strong reasons why someone shouldn’t be able to read a doc (usually something operationally sensitive), everyone has the ability to see most information on those systems.
For engineers, information largely lives in one of two places: our main company-wide knowledge system, which is the closest thing we have to a wiki, or our internal developer portal, where the source files live alongside our code.
We have tighter permissions in those systems for content that folks may be allowed to read, but not to edit. And in the case of our internal developer portal, those permissions are often tied to the same permissions we have around the code. So in the same way that an engineer cannot make a rogue change to some code without the owners seeing and approving it, someone cannot make rogue changes to the associated documentation without it also going under the same checks.
A few years ago, we made the decision to move COps-specific documentation to its own system, better designed for their needs. It was a gnarly migration, but has been well worth it to help COps more easily find the information they need to help customers.
Interesting to hear about your time at the GDS, and the work you do. I suppose my questions are:
What skills/knowledge/experience translated best in the move from the civil service to the private sector?
Did you find the culture massively different, or more samey than you expected?
And in relation to this response:
In the last year, we’ve invested heavily in improving this. That’s included things like moving all customer support docs to its own dedicated system that is much easier to search
I’m not sure what system you used in the GDS (I myself have some experience in Ocelot), but do you find that the private sector lends itself to a more agile, rather than process-driven, approach when it comes to updating COps policies?
I’m ruling out the animal (unwisely?), so my shortlist is a Microsoft API framework, an accountancy package for content producers, and AI-driven higher education software (no, me neither on that last one).
Can you point us in the direction of what this is? (If it’s a secret government code name then I have screenshots and know where the MI5 building is).
Absolutely, it’s a web-based portal for COps. I’m not sure who developed it, but it’s used by three departments that I know of, from a 2021 Office of Tax Simplification release. I just made a link between simplifying that and streamlining COps within banking.
I’ve not used it myself in quite some years, but it seemed quite useful. Of course, my experience is that the private sector is more agile and so might not need such a system. I probably didn’t word my question that well, so I’ve done a wee rewording to help.
As I think you mentioned in another comment, GDS feels a lot like Monzo and vice versa. When I first met Jonas, it was actually one of the first things we chatted about.
GDS feels very different from other departments. Everyone is there because they really believe in making digital services better for the public. The same is true for Monzo. Folks here feel really passionately about making it easier for people to handle their finances. No one looks forward to having to interact with the government or their bank. Usually when you do, you just want to get something done so you can get on with your day. Working with people who understand and empathise with that sounds basic, but contributes to everything both organisations do.
Where it feels different is the speed at which Monzo can move. In gov land, everything needs a lengthy business case, there are mounds of red tape, and there’s always a committee for a committee for a committee involved somewhere.
At Monzo, anyone has the genuine autonomy to pitch an idea, get feedback, and run experiments to find out very quickly whether something is worth investing in. And if we find out something hasn’t worked, that’s a great result for us because we know it’s not worth our time and effort. That’s not to say there aren’t checks and balances in place, but the difference is that you’re working with the system, with the regulations, not against red tape that exists for the sake of it.
Oh my goodness. What amazing questions! Ok here goes…
We have a central catalogue of all the components that power the bank. Think of it like a big long list of all our microservices, web apps, and cron jobs etc. We have an internal developer portal that helps people search that list and find information associated with each component. That could be docs about what the component is, or the microservices upstream or downstream from it.
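To make the idea of a searchable catalogue concrete, here is a minimal sketch of how such a list might answer “what is this component?” and “what sits upstream or downstream of it?”. It is not Monzo’s implementation: the `Component` and `Catalogue` types, the field names, and the component names are all hypothetical, and a real portal would load entries from the code repository rather than hard-coding them.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One entry in a hypothetical component catalogue."""
    name: str
    kind: str  # e.g. "service", "web app", "cron job"
    owner_team: str
    description: str  # the short "why" taken from the README
    depends_on: list[str] = field(default_factory=list)  # components this one calls


class Catalogue:
    def __init__(self, components: list[Component]):
        self._by_name = {c.name: c for c in components}

    def search(self, term: str) -> list[str]:
        """Case-insensitive match on component name or description."""
        term = term.lower()
        return sorted(
            c.name
            for c in self._by_name.values()
            if term in c.name.lower() or term in c.description.lower()
        )

    def downstream(self, name: str) -> list[str]:
        """Components that `name` calls directly."""
        return sorted(self._by_name[name].depends_on)

    def upstream(self, name: str) -> list[str]:
        """Components that call `name` directly."""
        return sorted(c.name for c in self._by_name.values() if name in c.depends_on)


# Illustrative entries only; every name here is made up.
catalogue = Catalogue([
    Component("service.transactions", "service", "payments",
              "Lists a customer's transactions", ["service.ledger"]),
    Component("service.ledger", "service", "core-ledger",
              "Records balances and postings"),
    Component("cron.statement-export", "cron job", "payments",
              "Exports monthly statements", ["service.transactions"]),
])
```

Storing only the outbound `depends_on` edges and deriving the upstream view by scanning keeps a single source of truth for each relationship, at the cost of a linear scan per lookup.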
The source files for that component documentation live in GitHub alongside the code and we use the same processes for reviewing documentation changes as we do for reviewing code changes. I’m a big big fan of ‘docs as code’ for many reasons, but the review flow is a big part of that.
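One way to picture “docs get the same review as code” is an ownership map that assigns a reviewing team by file path, without caring whether the file is a `.md` or a `.go`. The sketch below is loosely modelled on the idea behind GitHub’s CODEOWNERS file (later rules win), but it uses Python’s `fnmatch` rather than GitHub’s exact matching rules, and the rules and team names are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical ownership rules as (path pattern, owning team).
# Later rules override earlier ones; "*" is a catch-all.
OWNERSHIP_RULES = [
    ("*", "platform"),
    ("service.ledger/*", "core-ledger"),
    ("service.transactions/*", "payments"),
]


def owner_for(path: str) -> str:
    """Return the team owning `path`, letting the last matching rule win."""
    owner = "unowned"
    for pattern, team in OWNERSHIP_RULES:
        # fnmatch's "*" also matches "/", so nested docs are covered too.
        if fnmatch(path, pattern):
            owner = team
    return owner


def required_reviewers(changed_files: list[str]) -> set[str]:
    """Teams that must approve a change; docs and code are treated identically."""
    return {owner_for(path) for path in changed_files}
```

With this shape, a pull request that edits only `service.ledger/README.md` needs the same approval as one that edits the ledger’s code.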
Everyone at the company has access to the internal developer portal. We also have a central knowledge base (kind of like a company wiki) that the rest of the business also uses.
If you mean service RPC definitions where those services interact with other things, then yep! We have strongly defined types. In some places and for the most critical services, we have more detailed API docs, but we need to improve how we handle these and make sure we always provide more contextual information beyond a name and data type.
So with so many components this gets a little tricky and we have to take a pragmatic approach to documentation. You’re never going to be able to comprehensively document every single little thing and keep it all up to date, and that’s ok. Our time is often spent better elsewhere. However, each component will have at minimum a very short description of the ‘why’ in its README and some links to related documentation. I’d consider that part of our ‘minimum viable docs’ criteria.
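A “minimum viable docs” bar like this lends itself to an automated check. The sketch below assumes the criteria are exactly the two mentioned here (a short “why” near the top of the README, plus at least one link to related docs); the five-word threshold and the Markdown link pattern are assumptions for illustration, not Monzo’s actual rules.

```python
import re

# Matches Markdown links such as [design proposal](https://example.com/x).
LINK_PATTERN = re.compile(r"\[[^\]]+\]\([^)\s]+\)")
MIN_WHY_WORDS = 5  # assumed threshold for a "very short description"


def first_paragraph(readme: str) -> str:
    """Return the first block of non-heading prose in a Markdown README."""
    lines = []
    for line in readme.splitlines():
        stripped = line.strip()
        if stripped.startswith("#") or not stripped:
            if lines:  # a heading or blank line ends the first paragraph
                break
            continue
        lines.append(stripped)
    return " ".join(lines)


def minimum_viable_docs_problems(readme: str) -> list[str]:
    """Return a list of problems; an empty list means the README passes."""
    problems = []
    if len(first_paragraph(readme).split()) < MIN_WHY_WORDS:
        problems.append("missing a short description of why the component exists")
    if not LINK_PATTERN.search(readme):
        problems.append("no links to related documentation")
    return problems
```

Returning a list of problems rather than a boolean makes it easy to show the team exactly what is missing.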
Each component is owned by a team rather than an individual. They are responsible for the documentation as well as the code and everything else around the running of that component or collection of components. If someone leaves, ownership of the docs remains with the team.
Minimising the risk of people leaving with critical information and institutional knowledge is really hard and sort of why my job exists in the first place! We encourage teams to document throughout the software development cycle, contribute regularly to talks and knowledge-sharing sessions etc. If someone does leave with some important knowledge, we try to make as much time as possible for them to document before they go. And if they’re replaced, the new Monzonaut is the perfect litmus test for those docs and to find any knowledge gaps we still need to fill.
Yep! We usually link to the original RFCs (what we call ‘proposals’ instead) as part of the docs, so folks can see the original justification or design behind the system or component. That’s really great, but it doesn’t tell you to what extent the original proposal was implemented, or where and why the implementation differs. That additional knowledge is sometimes missing, and it’s one of the gaps we want to focus on improving this year.
Solid question. I am loving these! So you’re absolutely right that you can go very deep on certain topics, particularly on a service by service level. But often, our engineers aren’t looking for service-specific information. They typically have some kind of task-based question they want the answer to. In those cases, we need more horizontal documentation that cuts across different parts of our architecture like you suggest. We tackle these in a few ways, by doing things like:
Creating system-level documentation that covers multiple components working together on one thing, like a particular feature
Providing entry guides on overarching topics that cross whole systems, like our coding conventions, or how to deploy a change
Those docs are kind of like ‘landing pages’ for the topic. They give an overview of the concepts needed to dive in, then act as signposts to the most relevant content. They tend to live in our central company-wide knowledge system, as they’re often useful for everyone in the business.
We do auto-generate some of our docs, though not as much as I would like. We’ll be exploring more automation in future. In some cases we do auto-generate diagrams too, which is always good to see. There’ll always be a need for human-generated docs though, often to provide the missing ‘why’ behind an implementation or flow.
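As an illustration of the kind of diagram automation described here, the sketch below turns a caller-to-callee map into Graphviz DOT text, which tools such as `dot` can render as an image. Where the map comes from is assumed: in practice it might be extracted from RPC definitions or the component catalogue, and the component names are hypothetical.

```python
def to_dot(dependencies: dict[str, list[str]]) -> str:
    """Render a caller -> callee map as a Graphviz DOT digraph.

    Callers and callees are sorted so the output is stable between runs,
    which keeps diffs of the generated file small when it is checked in.
    """
    lines = ["digraph components {"]
    for caller in sorted(dependencies):
        for callee in sorted(dependencies[caller]):
            lines.append(f'    "{caller}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical dependency map.
deps = {
    "service.transactions": ["service.ledger"],
    "cron.statement-export": ["service.transactions"],
    "service.ledger": [],
}
print(to_dot(deps))
```

A generated diagram like this shows *what* calls *what*; the human-written docs are still needed for the *why*.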
Not as much as we should! New joiners are given a tour through our different knowledge systems and given guidance, but I ran some recent user research that showed this isn’t enough. One of the projects I’m involved in at the moment is overhauling our entire engineering onboarding experience. A big part of that will include introducing our conventions and being given the chance to contribute to our docs much earlier. I also run ad hoc training with individual teams or individuals.
From our research, usually by searching our internal developer portal or company-wide knowledge system, or by grepping a bunch of files. Findability is really gnarly, so we’re always looking for ways to make it easier. Sign-posting to relevant docs and deleting or archiving old information is sometimes the most effective way to keep on top of this. I’m sure our engineers are very tired of hearing me witter on about the power of the delete button.
We have a marvellous writing team who maintain our ‘Tone of Voice’, which sets expectations for any kind of writing at Monzo. A big part of that is making sure we use plain language and avoid unintuitive terms. You’ll often see feedback on pull requests or in proposals suggesting name changes or checking whether the users of the thing will be comfortable with the terms.
We don’t maintain a central glossary. I’m sure there will be knowledge management people reading this and flinching at that. While the motivation behind glossaries is good, they put the onus on the reader to go hunting for info. In our view, it should be down to the writer to make themselves easy to understand. If a team relies on a glossary, it’s a sign its terms maybe aren’t common enough to be understood. And if lots of people are struggling to understand your terms, it’s a sign you might want to redesign or rename something, or provide better contextual information to help them understand. Also, maintaining glossaries is a nightmare and not how anyone wants to spend a Tuesday.
This might be my favourite question of them all. For our git-backed docs, documentation pull requests get the same scrutiny as code changes. We treat them exactly the same.
You’re absolutely right that half the battle is getting out of your own head to think about what your reader needs. That’s the old ‘curse of knowledge’ that’s so hard to shake. Documentation reviews help with that, but we also do internal user research to check what people need to know about a thing. During our documentation training, we always challenge writers to think about what skills, experience, knowledge, or access someone needs in order to understand and work with the thing we’re documenting. Taking just a few minutes to think about that before putting pen to paper (or hands to keyboard) is really impactful.
We’ve also introduced the start of some automated quality assessments. We have a system called ‘software excellence’ that looks at our components and assesses them against certain criteria. It generates a rudimentary ‘grade’ to let teams know how their components are doing and help them figure out where to spend their time improving existing things. Part of that ‘grade’ is calculated using some very basic documentation measures: for example, how recently the docs were updated, and whether our prose linter has flagged any readability issues. It’s by no means perfect, but it’s extremely useful in prompting conversations about what makes good documentation and where documentation improvements are needed. I’ve got some big plans to make this more useful for engineers, so I’ll pop onto the blog when we’ve improved it.
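To show how two signals like freshness and linter findings could be folded into a rudimentary grade, here is a small sketch. The 90- and 365-day cut-offs, the 0.1 penalty per linter issue, the equal weighting, and the letter bands are all invented for illustration; they are not how ‘software excellence’ actually calculates anything.

```python
from datetime import date


def docs_score(last_updated: date, linter_issues: int, today: date) -> float:
    """Return a documentation score between 0 and 1 (higher is better).

    Assumed rules: freshness is full within 90 days of the last update and
    decays linearly to 0 at 365 days; readability loses 0.1 per prose-linter
    issue, floored at 0. The two parts are weighted equally.
    """
    age_days = (today - last_updated).days
    if age_days <= 90:
        freshness = 1.0
    elif age_days >= 365:
        freshness = 0.0
    else:
        freshness = 1.0 - (age_days - 90) / (365 - 90)
    readability = max(0.0, 1.0 - 0.1 * linter_issues)
    return round((freshness + readability) / 2, 2)


def grade(score: float) -> str:
    """Map a score to an assumed letter band."""
    for threshold, letter in ((0.8, "A"), (0.6, "B"), (0.4, "C")):
        if score >= threshold:
            return letter
    return "D"
```

A linear decay avoids a cliff edge where docs suddenly “fail” on a particular day, which tends to prompt better conversations than a pass/fail flag.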
I was really trying to avoid the answer of ‘it depends’ but it does depend! Broadly speaking it’s likely we’ll use a mix of documentation artefacts or knowledge-sharing practices for a piece of work. That could be:
A proposal outlining the problem we’re trying to solve, a suggested implementation with the measurements we’ll use to figure out if it’s worked, and an assessment of the risks and possible mitigations we need to consider
An approval or discussion in an architecture review meeting
A series of decision records as the project progresses
Skills = writing good business cases/proposals. Being able to articulate a problem and a proposed solution in a way that is meaningful to someone who may be removed from the detail is so important.
Knowledge = the inner workings of weird legacy software. Being able to interrogate the trade-offs engineers have to make on a daily basis.
Experience = Probably project managing big content projects, like migrating to new tools, and learning how to facilitate good training.
In the civil service it sometimes felt like running a series of sprints wearing shoes that were too small while people threw projectiles at you. At Monzo, we’ve got to run the same distance, but the shoes fit and your supporters are actually cheering you on. Mentally that makes such a difference.
I explained some of this in a different answer, but the biggest change for me is not having to constantly defend my profession and value. At Monzo, it’s a given that good knowledge-sharing is beneficial for the organisation. A lot of the improvements good knowledge practices bring are intangible. They don’t always fit neatly on a graph trending up or down. It takes senior leaders who believe in the value of the marathon and understand that it’s going to take time. We have significant documentation and knowledge debt to pay down while introducing new practices to keep up with the sheer pace at which the rest of the company is moving. I fully credit our engineering leadership for always championing that work. It’s so refreshing to move beyond the ‘Why should we bother?’ question and get straight to ‘Ok, so how are we going to improve this?’
I never came across Ocelot in my time in gov, but I’m sure the support teams’ needs are very similar.
My time was mostly spent helping other engineers across the civil service understand how to use the tools and platforms that GDS built. And in a meta way, also how to handle their own documentation. We ended up building our own tech stack for docs using a ‘docs as code’ approach. Here’s an old blog post I wrote about it a few years ago.
At Monzo we have an off-the-shelf product for COps knowledge that only contains information relevant for COps. It makes it a lot easier for them to find the information they need as it’s designed with them in mind from the very start.
I was going to ask how do you make Monzonauts (with emphasis on those that don’t work in technical operations) actually read the documentation, but this is answered - in part at least - in a reply further down.
In my own experience where I am at the moment, I’ve built pages in Confluence covering specific topics that come up repeatedly (in Slack, tickets, whatever). I spend ages building the pages. I include screenshots. I keep the language simple, with as little technical jargon as possible. I cover different situations and how to deal with them. But then, usually on Slack, I’m bombarded with the same questions over and over again, despite everything that’s been done to promote the documentation.