The Fifth Elephant Pune Meetup

A workshop and meetup in Pune about data science, analytics and machine learning.

Swapnil Dubey


Time Processing and Watermarks using Google Pub/Sub and Google Dataflow.

Submitted Jun 29, 2017

In this talk, we will discuss the watermark ingestion pattern for real-time analytics on Google Cloud Platform (GCP). The talk will primarily cover the use case of intermittent data ingestion (for example, a data-ingesting system that frequently goes offline) and how to perform real-time analytics using GCP as the platform.


  • What is Google Pub/Sub?
  • What is Dataflow (Apache Beam)?
  • What is Watermarking?
  • How to implement the watermark pattern using Google Pub/Sub and Dataflow?
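
To make the watermarking item in the outline above concrete, here is a minimal conceptual sketch in plain Python. It is not the speaker's implementation and does not use the Pub/Sub or Dataflow APIs: the names `WINDOW_SIZE`, `MAX_DELAY`, `window_start` and `process` are hypothetical, and the watermark is a simple assumed heuristic (newest event time seen minus a fixed allowed delay). It shows how events that arrive out of order, as they might after a source comes back online, can still be grouped into event-time windows, and how an event is flagged as late once its window has already been emitted.

```python
from collections import defaultdict

WINDOW_SIZE = 60  # assumed: fixed event-time windows of 60 seconds
MAX_DELAY = 30    # assumed: how far behind the newest event the watermark trails


def window_start(event_time):
    """Return the start of the fixed window that contains event_time."""
    return event_time - (event_time % WINDOW_SIZE)


def process(events):
    """Consume (event_time, value) pairs in arrival order.

    Returns (fired, late):
      fired maps window start -> sum of values, emitted once the
            watermark has passed the end of that window;
      late  lists events that arrived after their window had fired.
    """
    open_windows = defaultdict(int)
    fired = {}
    late = []
    watermark = float("-inf")

    for event_time, value in events:
        start = window_start(event_time)
        if start + WINDOW_SIZE <= watermark:
            # The window was already emitted, so this event is late data.
            late.append((event_time, value))
        else:
            open_windows[start] += value

        # Heuristic watermark: never moves backwards.
        watermark = max(watermark, event_time - MAX_DELAY)

        # Emit every open window whose end the watermark has passed.
        for s in sorted(open_windows):
            if s + WINDOW_SIZE <= watermark:
                fired[s] = open_windows.pop(s)

    return fired, late


if __name__ == "__main__":
    # Arrival order differs from event-time order: 20 arrives after 70
    # but is still on time; 10 arrives after window [0, 60) has fired.
    arrivals = [(5, 1), (50, 1), (70, 1), (20, 1), (95, 1), (10, 1), (130, 1)]
    fired, late = process(arrivals)
    print("fired:", fired)
    print("late:", late)
```

In a real pipeline the runner tracks the watermark and the windowing, trigger and allowed-lateness settings decide what happens to late data; the sketch only illustrates the idea, and windows still open when the input ends are simply not emitted here.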

Speaker bio

Swapnil Dubey has close to 8.5 years of work experience and currently works as a Data Engineer at Schlumberger. Prior to Schlumberger, he was Lead Engineer, Big Data, at Snapdeal on the real-time data analytics team.

He is a Certified Developer for Spark (DataStax), a Cloudera Certified Hadoop Developer, a Cloudera Certified Hadoop Administrator and a Cloudera Certified HBase Specialist. In the past, he has worked as a Certified Cloudera Trainer in India.

He has worked on several projects in the BFSI, ad serving and e-commerce domains. He has worked extensively in the data quality domain using Hadoop, Spark and Kafka as the primary technologies. He has an inclination toward open source technologies and likes to explore new frameworks.



Hosted by

All about data science and machine learning