Recent Blog Posts

Distributed log analytics using Apache Kafka, Kafka Connect and Fluentd

As always, you can find the full code discussed in this post on the Cloudbox Labs GitHub. … At Cloudbox Labs, we think logs are an incredibly interesting dataset. They are the heartbeats of our tech stack. They give us insight into how users interact with us. They provide real ...

Tracking NYC Citi Bike real time utilization using Kafka Streams

As always, you can find the full code discussed in this post on the Cloudbox Labs GitHub. … We are big fans of Apache Kafka when it comes to building distributed real-time stream processing systems. It’s massively scalable, has simple pub-sub semantics, and offers a fault-tolerant persistent data store. It’s ...

Building a real time quant trading engine on Google Cloud Dataflow and Apache Beam

As always, you can find the full code discussed in this post on the Cloudbox Labs GitHub. … Google Cloud has fully managed services that allow end users to build big data pipelines for their analytical needs. One of them is called Dataflow. It allows developers to build data pipelines based ...