Laying the Foundation for a Data Team

bailey · 30 November 2016 14:25

anon87313011 · 30 November 2016 15:23

I have done some work on a similar thing for a big startup in London: streaming events from backend services and the mobile app to a Kafka endpoint, but with Redshift and the creation of datasets on S3. We did not do streaming inserts, but there are projects for doing that with Redshift. Have you tried that?

anon77843727 · 30 November 2016 16:19

@anon87313011 Sorry, we didn’t try anything with Redshift. We settled on BigQuery pretty much from the beginning; it supports streaming inserts out of the box.
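For readers wondering what a streaming insert looks like on the wire, here is a minimal sketch that builds request bodies for BigQuery's `tabledata.insertAll` REST method, where each row has an `insertId` (used by BigQuery for best-effort de-duplication of retried rows) and a `json` payload. It does not send anything. The `event_id` field, the example event types, and the 500-rows-per-batch default are assumptions for illustration, not details from this thread.

```python
import json
import uuid
from typing import Any, Dict, List


def build_insert_all_body(
    events: List[Dict[str, Any]],
    id_field: str = "event_id",  # hypothetical name of an event's unique ID field
) -> Dict[str, Any]:
    """Build a JSON body for BigQuery's tabledata.insertAll streaming method."""
    rows = []
    for event in events:
        # Reuse the event's own ID as insertId so a retried send carries the
        # same insertId; fall back to a random UUID when the event has no ID.
        insert_id = str(event[id_field]) if id_field in event else str(uuid.uuid4())
        rows.append({"insertId": insert_id, "json": event})
    return {"rows": rows}


def batched(events: List[Dict[str, Any]], size: int = 500) -> List[List[Dict[str, Any]]]:
    """Split events into fixed-size batches (500 per request is an assumed default)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [events[i : i + size] for i in range(0, len(events), size)]


if __name__ == "__main__":
    # Hypothetical events, shaped like a backend might emit them.
    events = [
        {"event_id": "e1", "type": "card_payment", "amount": 1250},
        {"event_id": "e2", "type": "topup", "amount": 5000},
    ]
    for batch in batched(events):
        print(json.dumps(build_insert_all_body(batch), indent=2))
```

Deriving `insertId` from the event's own ID, rather than generating a fresh one per attempt, is what makes retries safe to de-duplicate.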

NamesKieran · 30 November 2016 20:41

I love how open is with everyone. Even though I don't understand half of what actually goes on, I love reading through these regardless.

P.S. 10/10 for the drawings.

anon77843727 · 1 December 2016 09:38

thanks Kieran

anon31150033 · 3 December 2016 22:47

Same here, even though I’m far, far away in West Africa.

anon94057127 · 4 December 2016 12:29

Nice post!

Interesting to see Google BigQuery being used in the fintech world. Is there a reason you decided not to expand your usage of Cassandra to cover the use cases for BigQuery? Is there a worry about consistency with Cassandra due to its eventually consistent model? Great to see Kafka being used too!

anon77843727 · 6 December 2016 15:04

Hi munkee, BigQuery fundamentally serves a different purpose. It’s good for storing large amounts of structured data that can be easily analysed, and performing complicated transformations and joins on it is easy and fast. Cassandra is ultimately a transactional database, and even though there are some attempts to pair it with, e.g., Spark for in-memory analytics, it’s not built for complicated queries. Hope that makes sense.

anon40782166 · 22 January 2017 11:46

Hi dmitri,

Thank you for confirming my doubts about BigQuery. I knew it was good, but not as good and easy as advertised at conferences.

Yes, Exasol can definitely help you speed up small queries. We have a lot of experience with it, including many topics not covered in the manuals.

Maybe you can even fit within the limits of the free small-business edition and skip any payments and licensing fees until Monzo grows really big. A 200 GB compressed in-memory buffer holds approximately 1 TB of hot raw data.
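As a rough check, reading the figures as gigabytes and terabytes in decimal units, the sizing claim implies a compression ratio of about 5:1:

```latex
\text{compression ratio} \approx \frac{1\,\text{TB raw}}{200\,\text{GB compressed}}
  = \frac{1000\,\text{GB}}{200\,\text{GB}} = 5
```

If the hot data compresses less well than that, the same 200 GB buffer would hold proportionally less raw data.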

You may even consider switching from BigQuery to Exasol altogether. Yes, it does not have this seamless integration with other Google services, and you’ll need to store ExaStorage compressed data separately. But it provides real convenience to end users.

E.g., no problems with non-standard SQL. No need to store all events in one huge table; one table for each event type is OK. No need to worry about really big joins, projections, partitioning, etc. You also have real ACID (commit/rollback).
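To make the "one table per event type" and commit/rollback points concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is only a stand-in to show transactional behaviour; it is not Exasol, and the `card_payment`/`topup` tables and their columns are hypothetical. A batch touching two event tables is loaded atomically: if any row fails, none of the batch is kept.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[str, int]
EVENT_TABLES = ("card_payment", "topup")  # hypothetical event types, one table each


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one table per event type."""
    conn = sqlite3.connect(":memory:")
    for table in EVENT_TABLES:
        conn.execute(f"CREATE TABLE {table} (event_id TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
    conn.commit()
    return conn


def load_batch(conn: sqlite3.Connection, payments: List[Row], topups: List[Row]) -> bool:
    """Insert both event types atomically: every row lands, or none do."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany("INSERT INTO card_payment VALUES (?, ?)", payments)
            conn.executemany("INSERT INTO topup VALUES (?, ?)", topups)
        return True
    except sqlite3.IntegrityError:
        # e.g. a duplicate event_id; the whole batch has been rolled back
        return False


def count(conn: sqlite3.Connection, table: str) -> int:
    """Count rows in one of the known event tables."""
    if table not in EVENT_TABLES:  # only interpolate trusted table names
        raise ValueError(f"unknown table: {table}")
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    conn = make_db()
    print(load_batch(conn, [("p1", 1250)], [("t1", 5000)]))  # first batch commits
    print(load_batch(conn, [("p2", 300)], [("t1", 100)]))    # duplicate t1: rolled back
    print(count(conn, "card_payment"), count(conn, "topup"))  # p2 was not kept
```

In the second batch the `p2` payment insert succeeds on its own, but the duplicate `t1` top-up raises, so the rollback also discards `p2`.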

Swaroop · 11 September 2019 22:05

Hi, I really love reading these technology blogs. Is there a particular reason for keeping two separate databases, Cassandra and BigQuery, for reporting? If you are creating a single source of truth, won't that break the principles of microservices? How do you manage testing changes to all the models built on this huge, monolithic single source of truth? I hope my question makes sense.

kolok · 12 September 2019 00:31

Just seen this article. What’s your AWS bill? Probably more each week than I make in a year.
