Time Processing and Watermarks using Google Pub/Sub and Google Dataflow
Submitted by Swapnil Dubey (@swapnildubey) on Thursday, 29 June 2017
Crisp talk for data engineering track
In this talk, we will discuss the watermark ingestion pattern for real-time analytics on Google Cloud Platform (GCP). The talk will primarily cover the use case of intermittent data ingestion (for example, a data-ingesting system that frequently goes offline) and how to perform real-time analytics using GCP as the platform.
- What is Google Pub/Sub?
- What is Dataflow (Apache Beam)?
- What is Watermarking?
- How to implement the watermark pattern using Google Pub/Sub and Dataflow?
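As a rough illustration of the watermarking idea in the outline above, here is a minimal, framework-free sketch in plain Python. It is not the Pub/Sub or Dataflow API and not necessarily the approach shown in the talk. The names `process`, `window_start`, `WINDOW_SECONDS`, and `ALLOWED_LATENESS` are hypothetical, and the watermark is assumed to be simply the largest event time seen so far, which is a simplification of how a real streaming engine estimates it.

```python
from collections import defaultdict

# Assumed values, chosen only for illustration.
WINDOW_SECONDS = 60      # size of each fixed event-time window
ALLOWED_LATENESS = 120   # grace period after the watermark passes a window's end


def window_start(event_time):
    """Map an event timestamp (in seconds) to the start of its fixed window."""
    return event_time - (event_time % WINDOW_SECONDS)


def process(events):
    """Count events per event-time window, dropping data that is too late.

    `events` is an iterable of (event_time, payload) pairs in ARRIVAL order,
    which may differ from event-time order when a source goes offline and
    later replays buffered records.
    """
    counts = defaultdict(int)
    dropped = []
    watermark = 0  # simplified heuristic: largest event time observed so far
    for event_time, payload in events:
        watermark = max(watermark, event_time)
        window_end = window_start(event_time) + WINDOW_SECONDS
        if window_end + ALLOWED_LATENESS <= watermark:
            # The watermark is already past this window plus the grace period.
            dropped.append((event_time, payload))
            continue
        counts[window_start(event_time)] += 1
    return dict(counts), dropped


# A source emits events at t=5 and t=70, then replays an older record (t=10),
# goes offline until t=400, and finally delivers a very old record (t=20).
arrivals = [(5, "a"), (70, "b"), (10, "c"), (400, "d"), (20, "e")]
counts, dropped = process(arrivals)
print(counts)   # {0: 2, 60: 1, 360: 1}
print(dropped)  # [(20, 'e')]
```

The record at t=10 arrives out of order but is still counted in its original window because the watermark has not yet moved past that window's grace period, while the record at t=20 arrives after the watermark has jumped to 400 and is dropped. Engines such as Apache Beam expose the same concepts (event-time windows, watermarks, allowed lateness) through their own APIs.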
Swapnil Dubey has close to 8.5 years of work experience and is currently working as a Data Engineer at Schlumberger. Prior to Schlumberger, he worked as Lead Engineer, Big Data at Snapdeal in the real-time data analytics team.
He is a Certified Developer for Spark (DataStax), a Cloudera Certified Hadoop Developer, a Cloudera Certified Hadoop Administrator, and a Cloudera Certified HBase Specialist. In the past, he has worked as a Certified Cloudera Trainer in India.
He has worked on several projects in the BFSI, ad serving, and e-commerce domains. He has worked extensively in the data quality domain using Hadoop, Spark, and Kafka as the primary technologies. He has an inclination toward open source technologies and likes to explore new frameworks.