Laying the Foundation for a Data Team

I have done some work on a similar thing for a big startup in London :slight_smile:
Streaming events from backend services and the mobile app to a Kafka endpoint, but involving Redshift and the creation of datasets on S3. We did not do streaming inserts, but there are projects that add them for Redshift. Have you tried that?

@flagZ sorry, we didn’t try to do anything with Redshift. We settled on BigQuery pretty much from the beginning. It supports streaming inserts out of the box.
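For readers wondering what "streaming inserts out of the box" looks like in practice, here is a minimal sketch, not Monzo's actual pipeline. It assumes the `google-cloud-bigquery` Python client, whose `Client.insert_rows_json(table, json_rows, row_ids=...)` method sends rows through the streaming API. The `event_id` field and the 500-row batch size are illustrative assumptions (check current BigQuery quotas), and the client is passed in so the function can be exercised without credentials.

```python
from typing import Any, Iterator, List, Mapping, Sequence


def chunked(rows: Sequence[Mapping[str, Any]], size: int) -> Iterator[Sequence[Mapping[str, Any]]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def stream_events(
    client: Any,
    table_id: str,
    events: Sequence[Mapping[str, Any]],
    batch_size: int = 500,  # assumed per-request batch size
) -> List[Mapping[str, Any]]:
    """Stream events into a BigQuery table in batches.

    `client` is expected to be a google.cloud.bigquery.Client, or anything
    exposing the same insert_rows_json method. Each event must carry a
    hypothetical "event_id" field, used as the row's insertId so BigQuery
    can best-effort de-duplicate rows that are retried.
    """
    all_errors: List[Mapping[str, Any]] = []
    for batch in chunked(events, batch_size):
        row_ids = [str(event["event_id"]) for event in batch]
        # insert_rows_json returns a (possibly empty) list of per-row errors.
        errors = client.insert_rows_json(table_id, list(batch), row_ids=row_ids)
        all_errors.extend(errors)
    return all_errors
```

With the real library this would be called as something like `stream_events(bigquery.Client(), "my-project.analytics.events", events)`, where the table name is hypothetical. An empty result means every batch was accepted; a non-empty one lists the rows BigQuery rejected.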

I love how open :monzo: is with everyone. Even though I don't understand half of what actually goes on, I love reading through these regardless.

P.S. 10/10 for the drawings. :stuck_out_tongue_closed_eyes:

Thanks Kieran :smiley:

Same here, even though I'm far, far away in West Africa :joy:

Nice post!

Interesting to see Google BigQuery being used in the fintech world. Is there a reason you decided not to expand your usage of Cassandra further to cover the BigQuery use cases? Is there a worry over consistency with Cassandra due to its eventually consistent model? Great to see Kafka being used too!

Hi munkee, BigQuery fundamentally serves a different purpose. It's good for storing large amounts of structured data that can be easily analysed, and performing complicated transformations and joins is easy and fast. Cassandra is ultimately a transactional database, and even though there are some attempts to pair it with e.g. Spark for in-memory analytics, it's not built for complicated queries. Hope that makes sense.

Hi dmitri,

Thank you for confirming my doubts about BigQuery. I knew it was good, but not as good or as easy as it is advertised at conferences.

Yes, Exasol can definitely help you speed up small queries. We have a lot of experience with it, including many topics not covered in the manuals.

Maybe you can even fit within the limitations of the free small-business version and skip any payments and licensing fees until Monzo grows really big. 200 GB of compressed memory buffer holds approximately 1 TB of hot raw data.

You might even consider switching from BigQuery to Exasol altogether. Yes, it does not have the seamless integration with other Google services, and you'll need to store ExaStorage compressed data separately. But it provides real convenience to end users.

For example: no problems with non-standard SQL. No need to store all events in one huge table; one table per event type is fine. No need to worry about really big joins, projections, partitioning, etc. You also get real ACID transactions (commit/rollback).

Hi, I really love reading these technology blog posts. Is there any real reason for keeping two separate databases, Cassandra and BigQuery, for reporting? If you are creating one single source of truth, won't that break all the principles of microservices? How do you manage testing changes to all the models built on this huge monolithic single source of truth? I hope my question makes sense.

Just seen this article. What's your AWS bill? Probably more each week than I make in a year.