IN/Clojure 2020

India's annual Clojure and ClojureScript conference. 14th-15th Feb, 2020. Pune, MH, IN.

Building data platforms from business stores using Clojure

Submitted by Mayur Jadhav (@mj13) on Thursday, 14 November 2019

Submission Type: Full (40 minutes)
Status: Confirmed & Scheduled

Abstract

You have just bootstrapped and are now serving thousands (or maybe millions) of happy users. Like most good applications, your tech stack starts with CRUD operations on a battle-tested RDBMS or NoSQL store. Then you start looking into user behavior and users' interactions with the system, to provide a customized experience or to deliver the next set of cool features. The ideal way to do such analysis is to send custom events from which the required metrics can be derived. Another, hassle-free approach is to capture the change data of existing databases as an event stream. In this session, I will discuss the latter approach, its benefits, and the kinds of use cases we solved with the power of Clojure. Clojure is a core part of this architecture: it handles spawning and destroying on-demand EMR clusters independent of cloud provider, DAG execution of EMR jobs, and more. The Clojure REPL sped up our development by reducing the time required to write and validate ad hoc EMR queries.
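To make the change-data-capture idea concrete, here is a minimal sketch, not the implementation presented in the talk. It assumes a hypothetical event shape (`:op`, `:table`, `:pk`, `:row`, `:ts`) of the kind a log reader might emit, and shows how replaying such events rebuilds table state as of any timestamp.

```clojure
;; Hypothetical change events, in log order. The keys are illustrative
;; assumptions, not the format of any particular CDC tool.
(def events
  [{:op :insert :table "users" :pk 1 :row {:id 1 :name "Asha"}   :ts 100}
   {:op :insert :table "users" :pk 2 :row {:id 2 :name "Ravi"}   :ts 110}
   {:op :update :table "users" :pk 1 :row {:id 1 :name "Asha K"} :ts 120}
   {:op :delete :table "users" :pk 2                             :ts 130}])

(defn apply-event
  "Applies one change event to state, a map of table -> primary key -> row."
  [state {:keys [op table pk row]}]
  (case op
    (:insert :update) (assoc-in state [table pk] row)
    :delete           (update state table dissoc pk)))

(defn state-at
  "Replays every event with :ts <= ts, in log order, starting from empty state."
  [events ts]
  (reduce apply-event {} (filter #(<= (:ts %) ts) events)))
```

Calling `(state-at events 110)` yields both users as first inserted, while `(state-at events 130)` yields only the updated row for user 1. The same replay is what makes the stream usable for analytics and for point-in-time reconstruction.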

Outline

  • Introduction
  • Use cases
    • Building OLAP on top of RDBMS
    • Hyper-scalable function-triggering platform (a scalable alternative to RDBMS triggers)
    • Cheaper point-in-time backups
    • Data replication across different kinds of databases or across multiple versions
  • Architecture
  • Independent building blocks which include:
    • Log Reader
    • Distributed Queue System (e.g. Apache Kafka)
    • Cloud Storage
    • Query services
    • Visualization tools
  • Benefits
  • Components Synchronisation
  • Power of Clojure
    • Writing EMR jobs
    • Spawning and destroying Hadoop clusters on demand
    • Support for AWS, GCP
    • Auto-scaling supported by default
    • DAG execution
    • Development with the REPL
    • Integration with Spark, Hive, Pig
  • Obstacles and Learnings
  • Low-cost, scalable platform
  • Conclusion
  • Demo showing, for any operation in the DB:
    • Kafka being populated
    • S3 dumps
    • Lambda function being triggered
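
The "DAG execution" item above can be illustrated with a small sketch. It assumes jobs are described as a map from job name to the set of jobs it depends on. The job names and the `execution-order` function are hypothetical, not taken from the talk's codebase.

```clojure
;; Hypothetical job graph: each job maps to the set of jobs it depends on.
(def jobs
  {:extract   #{}
   :clean     #{:extract}
   :aggregate #{:clean}
   :report    #{:aggregate :clean}})

(defn execution-order
  "Returns the jobs in an order where each job comes after its dependencies.
   Throws if there is a cycle or a dependency that is not in the map."
  [deps]
  (loop [remaining deps
         done      []]
    (if (empty? remaining)
      done
      (let [done-set (set done)
            ;; Jobs whose dependencies have all completed. Sorted only to
            ;; make the output deterministic.
            ready    (sort (for [[job ds] remaining
                                 :when (every? done-set ds)]
                             job))]
        (when (empty? ready)
          (throw (ex-info "Cycle or missing dependency"
                          {:remaining (keys remaining)})))
        (recur (apply dissoc remaining ready)
               (into done ready))))))
```

Jobs that become ready in the same round do not depend on each other, so a real scheduler could submit them to the cluster in parallel instead of one after another.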

NOTE: Slides WIP

Requirements

Basic knowledge of Clojure and data analytics

Speaker bio

Hi, I am a co-founder of Dataorc, a data-oriented startup based out of Pune. I started my professional journey with Clojure at my previous company, Helpshift, and have been coding in it for the last 6 years. Even at Dataorc, almost every project has some part developed in Clojure. I have used Clojure for architecting automation frameworks, building super-scalable backends and distributed crawlers, and munching TBs of data.
