Using topic modelling to understand customer saving goals

Hi, I’m Emma, and I’m a Product Data Scientist here at Monzo :wave:

This blog walks you through one of the ways we use data at Monzo to learn about our customers so we can best serve their needs. We know that our customers love Pots and use them for all sorts of reasons. In this blog I walk through how we used a method called topic modelling to categorise Pots into different themes so we can understand which Pots are used for holidays, life events, or just general savings.

I hope you find it interesting!


Interesting blog, feels a little big brother (what kind of pots people have) but I guess that’s standard for most banks in terms of understanding the customer and I can see how this data is useful to understand product direction and how best to suit customer needs.

Thanks for writing it, love to see it and really enjoy reading things like this about how different companies use data to model and understand development direction.


“And our longest Pot name is 574 words”

I’m curious :thinking:

and also, I want to beat that record.


Interesting piece, but I was left feeling like it was only half an article. It would be interesting to see how those Pot names then drive the products you create. If they don't, it is just machine learning for the sake of machine learning; the data has little value unless it is used. Would love to see a Part 2.


Really interesting post - thank you for making it available.

I love reading this kind of stuff and getting an insight into the inner workings of the operation.

I’d agree with the other comment posted above - it would be great to see a Part 2.

Just how do this data and these analytics drive product change? Do they feed gradual, incremental implementation of new features, are they potentially a springboard for something new, or is it a combination of both?

Again, many thanks,

Thanks @lpoolrob @thehamshackman :pray:

This blog post was just a taster into how we used the topic modelling method to gain new insights from different data sources. Now we have this new categorisation we can begin to use it across the business.

We could use this data to help us when making any product changes to Pots, or for helping tailor marketing for different customer needs. At the moment it’s just for insights to build a better picture of our customers, but maybe I’ll come back with Part 2 if we decide to use it for specific product changes or campaigns.


Really interesting!

I wonder if you can spot any spending behaviour from this that you can use to improve budgeting as a whole?

Are there any users who are using pots as de facto categories?

Or rather, what types of those categories exist, how often do they appear, and can that help infer useful features to build into the planning side of new budgeting tools?

Finally, I’ve got a pot now that is >600 words long.


I came across this today, this is the kind of product I’d hope Monzo would build using the data they have available. Obviously the sky’s the limit and this is just one example.

At the end of the product term it drops down to an instant access account.

Thanks @BritishLibrary

We definitely saw people using Pots for specific types of budgeting behaviours. For example, we had categories coming through for household bills and food, which I can imagine customers are using to portion their income for specific use cases.

Another interesting behaviour we saw was a “date/time” category where Pot names were dedicated to months or weeks of the year. When looking into money movement between different Pots, we saw customers who moved money between two “date/time” Pots, e.g. from their January Pot into their February Pot. It was an interesting behaviour to observe and definitely evidence that Pots are a great tool for budgeting.
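As a rough illustration of the "date/time" behaviour described above, here is a minimal sketch that flags transfers where both Pots are named after a month. It uses simple keyword matching rather than the topic model from the blog post, and the function names and the `(source, destination)` transfer format are assumptions made up for this example.

```python
# Month names used for matching; keyword matching is a stand-in here,
# not the topic-modelling approach from the blog post.
MONTHS = (
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
)


def is_date_pot(name):
    """Return True if any word in the Pot name is a month name."""
    return any(token in MONTHS for token in name.lower().split())


def date_to_date_transfers(transfers):
    """Keep only (source, destination) pairs where both Pots look date-named."""
    return [
        (source, destination)
        for source, destination in transfers
        if is_date_pot(source) and is_date_pot(destination)
    ]


# Hypothetical transfers between Pot names, for illustration only.
example = [("January", "February"), ("January", "Holiday fund")]
print(date_to_date_transfers(example))
```

A real version would also need to handle week-based names and abbreviations, which this sketch ignores.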

Also, 600 words! :exploding_head: My blog post is already out of date :joy:


Thank you Emma for the very interesting blog post and for sharing your methodology. As a fellow data scientist, and a big Monzo fan, I found the insights of your project very interesting. Out of curiosity, which ML framework did you use and do you have any tips for other scientists working on similar projects?


@mpogias I’m glad you found it interesting! :pray:

I used the biterm package for the implementation of the topic modelling, and the majority of the data cleaning was done through Pandas functions.
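For anyone unfamiliar with the idea behind biterm topic models: they work on "biterms", which are unordered pairs of words that appear together in the same short text, and that suits something as short as a Pot name. The sketch below only illustrates what a biterm is using the standard library. It does not use the biterm package's API, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair from one short text.

    Each pair is sorted so that ("fund", "holiday") and
    ("holiday", "fund") count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(names):
    """Count biterms across a collection of already-cleaned Pot names."""
    counts = Counter()
    for name in names:
        counts.update(extract_biterms(name.split()))
    return counts


# Hypothetical cleaned Pot names, for illustration only.
print(count_biterms(["summer holiday fund", "holiday fund"]).most_common(1))
```

A topic model then learns topics from these co-occurrence pairs across the whole collection, rather than from each short name on its own.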

In terms of tips, I think it’s worth spending time making sure your data is in a good place. You can automate a lot of the cleaning, such as removing spaces and punctuation, converting to lower case, or replacing emojis with their text form. But the other part was really understanding the data and doing some manual work to clean it; in this use case, we found common typos or abbreviations that customers would use. Knowing this we were able to write manual functions to help with the data cleaning.
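The automated steps above can be sketched roughly like this. It is a minimal example using only the standard library, not the actual cleaning code: the `ABBREVIATIONS` mapping is a made-up placeholder for the typos and abbreviations found by hand, emojis are converted using their Unicode character names, and extra whitespace is collapsed rather than removed entirely.

```python
import string
import unicodedata

# Hypothetical typo/abbreviation map; the real list is not given in this thread.
ABBREVIATIONS = {"xmas": "christmas", "hols": "holiday", "bday": "birthday"}

# Translation table that turns every ASCII punctuation character into a space.
PUNCTUATION_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def emoji_to_text(char):
    """Replace a symbol character with its lower-cased Unicode name."""
    if char == "\ufe0f":  # drop the emoji variation selector
        return ""
    if unicodedata.category(char) == "So":  # "Symbol, other" covers most emojis
        name = unicodedata.name(char, "")
        return f" {name.lower()} " if name else " "
    return char


def clean_pot_name(name):
    """Apply the automated cleaning steps to a single Pot name."""
    text = "".join(emoji_to_text(char) for char in name)
    text = text.lower().translate(PUNCTUATION_TO_SPACE)
    tokens = text.split()  # also collapses repeated whitespace
    tokens = [ABBREVIATIONS.get(token, token) for token in tokens]
    return " ".join(tokens)


print(clean_pot_name("  Xmas   Pressies!! "))
```

With the names in a Pandas Series, the same function could be applied with `Series.map(clean_pot_name)`.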

