Laying the Foundation for a Data Team


(Bailey Kursar) #1

#2

I have done some work on a similar thing for a big startup in London :slight_smile:
We streamed events from backend services and the mobile app to a Kafka endpoint, but with Redshift and datasets created on S3. We did not do streaming inserts, though there are projects that add them for Redshift. Have you tried that?


(Dimitri Masin) #3

@flagZ sorry, we didn’t try to do anything with Redshift. We settled on BigQuery pretty much from the beginning. It supports streaming inserts out of the box.


(Kieran McCann ) #4

I love how open :monzo: is with everyone. Even though I don't understand half of what actually goes on, I love reading through these regardless.

P.S. 10/10 for the drawings. :stuck_out_tongue_closed_eyes:


(Dimitri Masin) #5

thanks Kieran :smiley:


(Olumide) #6

Same here even though I’m far far away in West Africa :joy:


#7

Nice post!

Interesting to see Google BigQuery being used in the fintech world. Is there a reason you decided not to expand your usage of Cassandra to cover the use cases for BigQuery? Is there a worry over consistency with Cassandra due to its eventually consistent model? Great to see Kafka being used also!


(Dimitri Masin) #8

Hi munkee, BigQuery fundamentally serves a different purpose. It’s good for storing large amounts of structured data that can be easily analysed. Performing complicated transformations and joins is easy and fast. Cassandra is ultimately a transactional database, and even though there are some attempts to pair it with e.g. Spark for in-memory analytics, it’s not built for complicated queries. Hope that makes sense.


#9

Hi Dimitri,

Thank you for confirming my doubts about BigQuery. I knew it was good, but not as good and easy as advertised at conferences.

Yes, Exasol can definitely help you to speed up small queries. We have a lot of experience with it, including many topics not covered in the manuals.

Maybe you can even fit within the limits of the free small business version and skip any payments and licensing fees until Monzo grows really big. 200 GB of compressed memory buffer holds approximately 1 TB of hot raw data.

You may even consider switching from BigQuery to Exasol altogether. Yes, it does not have this seamless integration with other Google services, and you’ll need to store ExaStorage compressed data separately. But it provides real convenience to end users.

E.g., no problems with non-standard SQL. No need to store all events in one huge table; one table for each event is OK. No need to worry about really big joins, projections, partitioning, etc. You also have real ACID (commit/rollback).