Say hello to the iOS Health team at Monzo 🍎 🩺

Hey folks! :wave:t2:

Here’s an update on what iOS Health has been up to since our last update.

Crashes

We’ve been on a bit of a rollercoaster with crashes, so I’m going to split this part of the update in two. The first part covers 30th March (the date of our last update) to 16th April (the day before version 5.20.0 was released on the App Store). The second part covers 17th April until today.

30th March → 16th April

We managed to get fixes in for three of our biggest crashes (one of which was either our top crash or our second-biggest crash, depending on various factors). Two of the fixes were to do with how we handle the references we hold to “live” objects in Realm (the primary technology we use to store data on disk on your iPhone).

The third fix was to do with navigation. It was one of the first crashes we started digging into shortly after the team was formed, so we’d been hunting this one down for months. The stack trace didn’t give us much to go on, and after lots of trial and error we were only able to replicate the crash in a demo project: if you push three screens onto a navigation stack simultaneously, with only some of those pushes animated, it puts the navigation stack into a strange state, and if you then tap the back button twice, the app crashes. So we knew how to replicate it in a demo app, but not yet in our own app. We eventually got there, and it turned out that four separate factors, when combined, led to the crash in the app:

  • Either a bug in Apple’s UINavigationController, or potentially API misuse on our part, when pushing multiple screens simultaneously.
  • A specific kind of inbox message (the wee messages you sometimes see at the top of your feed) being in the user’s inbox.
  • A bug in our code which, in some circumstances :point_down:, caused us to push two transaction details screens when tapping on a transaction push notification.
  • Opening a transaction details push notification after the phone has been restarted (we suspect that killing the app for a decent chunk of time might have worked too; we were also suspicious of the new app “prewarming” feature in iOS 15, but didn’t investigate these avenues any further).
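
For the technically curious, here’s a rough Swift sketch of the kind of demo-project reproduction described above. It’s a minimal sketch under stated assumptions, not our actual code: the ReproViewController name, the screen titles, and which of the three pushes are animated are all hypothetical, and it isn’t guaranteed to crash on every iOS version.

```swift
import UIKit

// Hypothetical demo screen: the class name, titles, and the exact mix of
// animated and non-animated pushes are illustrative assumptions.
final class ReproViewController: UIViewController {

    // Guard so the pushes only happen once, even if this screen reappears.
    private var didPush = false

    override func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)
        guard !didPush, let nav = navigationController else { return }
        didPush = true

        // Push three screens back to back in the same pass,
        // animating only some of the transitions.
        nav.pushViewController(makeScreen(title: "First"), animated: true)
        nav.pushViewController(makeScreen(title: "Second"), animated: false)
        nav.pushViewController(makeScreen(title: "Third"), animated: true)
        // Now tap the Back button twice by hand to try to trigger the crash.
    }

    // Builds a plain, empty screen with a title so it's visible in the nav bar.
    private func makeScreen(title: String) -> UIViewController {
        let screen = UIViewController()
        screen.title = title
        screen.view.backgroundColor = .systemBackground
        return screen
    }
}
```

To try it, set `UINavigationController(rootViewController: ReproViewController())` as the window’s root view controller in a throwaway iOS project, launch it, then tap Back twice.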

These three fixes meant that in version 5.19.0 of the app, our crash numbers “fell off a cliff” and we dropped below our target by a significant amount :chart_with_downwards_trend:

We were delighted, although that turned out to be short-lived.

17th April → Today

In version 5.20.0 of the iOS app, we saw a regression in one of our biggest crashes, as well as a new (related) crash. Combined, the two undid the last few months of our effort.

The crashes appear to be concurrency issues stemming from the way we access Realm on background threads. We believe the culprit is the newer version of Realm we upgraded to in 5.20.0. Downgrading Realm wasn’t an option, so we got to work on a fix. Thanks to some excellent investigative work by members of the iOS team, as well as our very own Lee Watkins, we’ve got a potential fix we’re fairly confident in. We’re testing it this week and plan to ship it on 22nd May (version 5.25.0, if I have my maths right).

New target

Although we’ve recently seen a jump that took us back above our initial target, we’re confident that once we’ve fixed the two new crashes introduced in 5.20.0, we’ll be comfortably below our current target of 0.5%. With that in mind, we looked into what we could achieve by the end of July, and our new target is 0.2%. Bear in mind that this is just a milestone, not our end goal for crash rates.

Logouts

We’ve been hard at work building out the self-recovery mechanism we described in the last update, and we should have it implemented (hopefully with some downward-trending graphs :chart_with_downwards_trend: to share) by our next update.

The work we’ve done so far has given us insight into the impact the self-recovery mechanism will have, and it’s looking like it should address at least 2,500 logouts per day.

Performance

We’ve been breaking down our app launch times to understand where the slowdowns are. We’ve pinpointed a promising part of our startup process that should give us the biggest improvement, and we’re doing some digging to find out how much work it would involve. We should have more information by our next update.

We made some progress on excessive disk usage by using the compaction features in the newer version of Realm we upgraded to, although there’s more work to be done in this area.

The new version of Realm also came with some other unexpected but very welcome side effects: we’ve seen an improvement in performance across the board (in hang rates, scroll hitch rates, and battery usage).

We’d like to thank everyone who helped with our initial investigation into excessive disk usage. It wasn’t an easy thing to track whether or not we’d made an impact, so having folk give us feedback on changes with each app release was incredibly useful. Thank you!

Summary

Over the last month we’ve made some great progress on crashes, hampered slightly by some new issues that came from upgrading Realm in 5.20.0. We’re confident that once we fix those issues, we’ll be well on our way to our new crash rate target.

We’ve also made great progress on implementing a self-recovery mechanism, which should significantly decrease the number of users who are logged out of the app.
