The Fifth Elephant Pune Meetup

A workshop and meetup in Pune about data science, analytics and machine learning.

Time Processing and Watermarks using Google Pub/Sub and Google Dataflow

Submitted by Swapnil Dubey (@swapnildubey) on Thursday, 29 June 2017


Technical level

Advanced

Section

Crisp talk for data engineering track

Status

Confirmed

Total votes:  +5

Abstract

In this talk, we will discuss the watermark ingestion pattern for real-time analytics on Google Cloud Platform (GCP). The talk will primarily cover the use case of intermittent data ingestion (for example, a data-ingesting system that frequently goes offline) and how to perform real-time analytics using GCP as the platform.

Outline

  • What is Google Pub/Sub?
  • What is Dataflow (Apache Beam)?
  • What is Watermarking?
  • How to implement the watermark pattern using Google Pub/Sub and Dataflow?
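
To make the watermark idea in the outline concrete, here is a minimal sketch in plain Python. It is a toy simulation and not the Apache Beam or Dataflow API: the window size, the fixed delay, and the names `process`, `WINDOW_SECONDS` and `ALLOWED_DELAY` are assumptions made for illustration. In a real pipeline the runner estimates the watermark itself; here it is simply assumed to trail the newest event time seen by a fixed delay.

```python
from collections import defaultdict

WINDOW_SECONDS = 60   # fixed event-time window size (assumed for illustration)
ALLOWED_DELAY = 30    # how far the watermark trails the newest event time (assumed)


def window_start(event_time):
    """Return the start of the fixed window that contains event_time."""
    return event_time - (event_time % WINDOW_SECONDS)


def process(events):
    """Sum values per event-time window, closing windows as the watermark advances.

    events: iterable of (event_time_seconds, value) in ARRIVAL order,
            which may differ from event-time order (e.g. after an offline period).
    Returns (results, dropped): closed windows as (window_start, total),
    and events that arrived after their window was already closed.
    """
    open_windows = defaultdict(int)
    results, dropped = [], []
    watermark = float("-inf")

    for event_time, value in events:
        start = window_start(event_time)
        # The window for this event has already been emitted: the event is late.
        if start + WINDOW_SECONDS <= watermark:
            dropped.append((event_time, value))
            continue

        open_windows[start] += value
        # The watermark only moves forward.
        watermark = max(watermark, event_time - ALLOWED_DELAY)

        # Emit every window whose end is at or before the watermark.
        for s in sorted(open_windows):
            if s + WINDOW_SECONDS <= watermark:
                results.append((s, open_windows.pop(s)))

    # End of input: flush whatever is still open.
    for s in sorted(open_windows):
        results.append((s, open_windows.pop(s)))
    return results, dropped


if __name__ == "__main__":
    # The event at t=20 arrives out of order but before its window closes;
    # the event at t=10 arrives after the watermark has passed its window.
    arrivals = [(5, 1), (50, 1), (70, 1), (20, 1), (130, 1), (10, 1), (140, 1)]
    print(process(arrivals))
```

The design choice to notice is the trade-off in `ALLOWED_DELAY`: a larger delay tolerates longer offline periods before data is treated as late, at the cost of results being emitted later.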

Speaker bio

Swapnil Dubey has close to 8.5 years of work experience and is currently working as a Data Engineer at Schlumberger. Prior to Schlumberger, he worked as Lead Engineer, Big Data at Snapdeal in the real-time data analytics team.

He is a Certified Developer for Spark (DataStax), a Cloudera Certified Hadoop Developer, a Cloudera Certified Hadoop Administrator and a Cloudera Certified HBase Specialist. In the past, he has worked as a Certified Cloudera Trainer in India.

He has worked on several projects in the BFSI, ad serving and e-commerce domains. He has worked extensively in the data quality domain using Hadoop, Spark and Kafka as the primary technologies. He has an inclination toward open source technologies and likes to explore new frameworks.
