Regression Testing

Version 1.8.1 (28 Oct 2016):

“Oops. We accidentally stopped new users from receiving notifications with transactions… Sorry about that - it’s now fixed.”

I’d heard that legacy companies use a boring thing called regression testing. What is the equivalent in Monzo’s world please? And how will this type of development system failure be prevented from happening again?

Hey Colin!

First up - let me apologise for the inconvenience this bug caused. It’s not acceptable from us, and every incident prompts our engineering team to look back and put additional measures in place to prevent it happening again.

Something we don’t mention too often is how we deal with regressions and testing around releases. We actually have a few different testing and QA phases on the iOS product team, which is where this defect originated.

:hammer_and_wrench: Unit tests

During development, our continuous integration platform runs a suite of unit tests designed to test isolated components of our iOS app individually - for example, that our model objects deserialise correctly from the JSON our backend service provides, or that our mobile number formatter formats mobile numbers correctly. We enforce good testing through our peer review process, so that all new functional code that can be tested is tested in a useful way. We currently have a few hundred of these tests.
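As a rough illustration of the two examples mentioned above, here is a minimal Swift sketch. The `Transaction` model, its field names, and the `formatMobileNumber` helper are hypothetical and not taken from Monzo's codebase; a real suite would normally wrap these checks in XCTest cases rather than top-level assertions.

```swift
import Foundation

// Hypothetical model; the field names are illustrative, not a real API contract.
struct Transaction: Decodable, Equatable {
    let id: String
    let amount: Int       // minor units, e.g. pence
    let currency: String
}

// Hypothetical formatter: groups an 11-digit UK mobile number as "07700 900123".
// Anything that is not 11 digits long is returned unchanged.
func formatMobileNumber(_ raw: String) -> String {
    let digits = raw.filter { $0.isNumber }
    guard digits.count == 11 else { return raw }
    return "\(digits.prefix(5)) \(digits.suffix(6))"
}

// Check 1: the model deserialises from a JSON payload.
let json = Data(#"{"id": "tx_001", "amount": -350, "currency": "GBP"}"#.utf8)
let transaction = try JSONDecoder().decode(Transaction.self, from: json)
assert(transaction == Transaction(id: "tx_001", amount: -350, currency: "GBP"))

// Check 2: the formatter produces the expected grouping.
assert(formatMobileNumber("07700900123") == "07700 900123")
```

Each check exercises one component in isolation, with no network or UI involved, which is what keeps tests like these fast enough to run on every commit.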

:satellite: API tests

We test all our different API service integrations against pre-canned or “stubbed” data. This lets us quickly verify that the app is handling responses (or errors) and structuring its various network requests correctly. This has saved our bacon a few times, especially during development against new services (this request should be a PUT not a POST, etc.).
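A stubbed API test of this kind could look roughly like the Swift sketch below. The endpoint URL, the `makeUpdateProfileRequest` builder, and the `handleResponse` function are all assumptions made for illustration. The point is only that the request's shape and the handling of a canned response can be checked without touching a live service.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

// Hypothetical request builder for an imaginary "update profile" endpoint.
func makeUpdateProfileRequest(name: String) throws -> URLRequest {
    var request = URLRequest(url: URL(string: "https://api.example.com/profile")!)
    request.httpMethod = "PUT"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(["name": name])
    return request
}

enum APIResult: Equatable {
    case success(name: String)
    case failure(status: Int)
}

// Hypothetical response handling: anything other than a 2xx status with a
// decodable body is treated as a failure.
func handleResponse(status: Int, body: Data) -> APIResult {
    guard (200..<300).contains(status),
          let decoded = try? JSONDecoder().decode([String: String].self, from: body),
          let name = decoded["name"] else {
        return .failure(status: status)
    }
    return .success(name: name)
}

// The request must be a PUT, not a POST.
let request = try makeUpdateProfileRequest(name: "Alex")
assert(request.httpMethod == "PUT")

// Stubbed responses stand in for the backend.
let stubbedBody = Data(#"{"name": "Alex"}"#.utf8)
assert(handleResponse(status: 200, body: stubbedBody) == .success(name: "Alex"))
assert(handleResponse(status: 500, body: Data()) == .failure(status: 500))
```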

:camera_flash: Snapshot tests

As we build our user interfaces, we record snapshots and write tests that compare against them during continuous integration and deployment. Any screen that doesn’t match its pre-canned image causes a failure, so that build isn’t eligible to be deployed.
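The record-then-compare idea can be sketched in a few lines of Swift. This is a simplified stand-in, not how any particular snapshot library works: `SnapshotStore` is a hypothetical in-memory store, and `renderBalanceScreen` fakes rendering by producing text bytes where a real test would capture an image of the screen.

```swift
import Foundation

// Hypothetical in-memory snapshot store; a real suite would keep
// reference images on disk alongside the tests.
struct SnapshotStore {
    enum Outcome: Equatable { case recorded, matched, mismatched }

    private var references: [String: Data] = [:]

    // Records the snapshot the first time a name is seen,
    // otherwise compares the new rendering against the stored reference.
    mutating func verify(_ name: String, rendered: Data) -> Outcome {
        guard let reference = references[name] else {
            references[name] = rendered
            return .recorded
        }
        return reference == rendered ? .matched : .mismatched
    }
}

// Stand-in for rendering a screen to pixels.
func renderBalanceScreen(balance: String) -> Data {
    Data("Balance: \(balance)".utf8)
}

var store = SnapshotStore()
let first = store.verify("balance", rendered: renderBalanceScreen(balance: "£10.00"))
let second = store.verify("balance", rendered: renderBalanceScreen(balance: "£10.00"))
let third = store.verify("balance", rendered: renderBalanceScreen(balance: "£1.00"))

assert(first == .recorded)     // no reference yet, so one is stored
assert(second == .matched)     // identical rendering passes
assert(third == .mismatched)   // any difference fails the build
```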

:iphone: Integration tests

We currently have a suite of tests which run a simulated version of the app, tap some buttons, and check the result of each action. Sadly this area is a little flaky at the moment, so the tests aren’t running automatically and we have to verify manually that they pass before a deployment, but we’re working hard to get them into our automatic test pipeline. @jgarnham wrote an excellent blog post on this here: https://monzo.com/blog/2016/04/26/automated-testing/

:fire: Smoke / regression tests

As a ‘last line of defence’, before a release candidate build goes out to our TestFlight beta testing channel, we perform a smoke test on the build. A human runs through a pre-written script of scenarios to ensure they all work as expected. It’s time-consuming, but ensures we don’t (usually) miss anything obvious that might have slipped through the nets of the previous tests.

After this, builds go out to our TestFlight beta test superstars, who kick the tyres and report back any oddities. We very rarely make changes between TestFlight and full releases, but if a beta build does ship with something “not quite working as expected”, we’ll continue to iterate with our beta testers before going to the App Store.

Sadly, in this case the bug was a little more difficult to spot, and it slipped through all of these nets. In certain scenarios, our app was not correctly reporting the APNs (Apple Push Notification service) token back to our backend service.

This meant that certain users had approved push notification permissions, but we weren’t able to send messages to them.
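One way to express that failure mode as an automatically checkable rule is sketched below in Swift. This is purely illustrative: the `PushState` type and `tokenNeedingUpload` function are hypothetical and do not describe the actual fix or Monzo's code. The rule it encodes is that a user who has granted permission and holds a device token the backend has not acknowledged still has an upload outstanding.

```swift
import Foundation

// Hypothetical view of push state; none of these names come from the real app.
struct PushState {
    var permissionGranted: Bool
    var deviceToken: String?          // token handed to the app by APNs
    var tokenKnownToBackend: String?  // last token the backend acknowledged
}

// Returns the token that still needs reporting, or nil when nothing is outstanding.
func tokenNeedingUpload(_ state: PushState) -> String? {
    guard state.permissionGranted, let token = state.deviceToken else { return nil }
    return token == state.tokenKnownToBackend ? nil : token
}

// The situation described above: permission granted, token never reported.
let affected = PushState(permissionGranted: true,
                         deviceToken: "abc123",
                         tokenKnownToBackend: nil)
assert(tokenNeedingUpload(affected) == "abc123")

// A healthy state: the backend already holds the current token.
let healthy = PushState(permissionGranted: true,
                        deviceToken: "abc123",
                        tokenKnownToBackend: "abc123")
assert(tokenNeedingUpload(healthy) == nil)
```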

We were able to reproduce the issue reliably on Friday morning, and worked on a hotfix. This was then independently reviewed by two other engineers, and tested on a number of internal accounts after reproducing the original issue. After running a smoke test to ensure we hadn’t introduced further regressions, and making sure our full unit test suite passed, we submitted the fix to the App Store and requested an expedited review in order to minimise the number of users affected. We were lucky enough to receive approval for this, and on Friday evening around 8:00pm the 1.8.1 patch release went live on the App Store.

As a result of this issue, we’ve added further steps to our smoke testing script to cover this explicit case, and we’re investigating what more we can do on the automated testing side to beef this up too.

Sadly, no amount of testing can stop everything falling through the net, so we’re making sure our ability to react to live issues, and the speed at which we do it, is second to none. That way we can minimise the impact of any issue before it becomes a real problem :slight_smile:

As always - happy to answer questions about our testing practices! If you read this whole thing and still want more, we’re currently hiring for product and testing roles. Get in touch via https://monzo.com/careers/ for more info.

Fantastic writeup @andys. Very interesting to see what is happening behind the scenes. And from my point of view you have testing very well covered.

Obviously testing can never cover 100% of cases, and that is where bug fixing comes in. If there were 100% coverage there would be no bugs :wink:

Thank you for the great support and information.

That’s a really helpful and extensive reply @andys - thank you.

It’s good that Monzo isn’t all about flash and show, and that the heavy engineering to build a sustainable platform is going on too :slight_smile:

As a slight aside, I wonder if you also develop test cases for automated threats? As co-author of this free OWASP handbook ( https://www.owasp.org/index.php/File:Automated-threat-handbook.pdf ) , I wonder if it might help with some ideas. We’re releasing v1.1 of the handbook later this week (same URL) adding countermeasure suggestions for each automated threat. The threats are web application related and thus mainly for the web APIs, the main website, the community forum, the crowdfunding platform, etc, but some might be relevant for the mobile app too.

Yes, that was a very good reply.

If they’re not already in place, are there any plans in the near future to run regular penetration tests and DDoS prevention tests, and to test against other security threats out there? Understandably, you’re not able to give us the details of such security testing, as that would defeat the purpose. It would just be good to know that defences against such attacks are being considered.
