The Fifth Elephant 2020 edition

The Fifth Elephant 2020 edition

On data governance, engineering for data privacy and data science

The ninth edition of The Fifth Elephant will be held in Bangalore on 16 and 17 July 2020.

The Fifth Elephant brings together over one thousand data scientists, ML engineers, data engineers and analysts to discuss:

  1. Data governance
  2. Data privacy and engineering for privacy including engineering for Personal Data Protection (PDP) bill.
  3. Data cleaning, annotation, instrumentation and productionizing data science.
  4. Identifying and handling fraud + data security at scale
  5. Feature engineering and ML platforms.
  6. What it takes to create data-driven cultures in organizations of different scales.

**Event details:

Dates: 16-17 July 2020
Venue: NIMHANS Convention Centre, Dairy Circle, Bangalore

Why you should attend:

  1. Network with peers and practitioners from the data ecosystem.
  2. Share approaches to solving expensive problems such as cleanliness of training data, annotation, model management and versioning data.
  3. Demo your ideas in the demo sessions.
  4. Join Birds of Feather (BOF) sessions to have productive discussions on focussed topics. Or, start your own Birds of Feather (BOF) session.

Contact details:
For more information about The Fifth Elephant, call +91-7676332020 or email sales@hasgeek.com


Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conference in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices. more

Amrit Sarkar

@sarkaramrit2

Scale Search Infrastructure with Apache Solr and Kubernetes

Submitted May 28, 2020

Kubernetes is fast becoming the operating system for the Cloud and brings a ubiquity that has the potential for massive benefits for technology organizations. Applications/Microservices are moved to orchestration tools like Kubernetes to leverage features like horizontal autoscaling, fault tolerance, CICD and more.

Apache Solr is an open-source search engine platform built on an Apache Lucene library. It offers Apache Lucene’s search capabilities in a user-friendly way. Lucidworks Inc runs over a thousand distributed-mode Apache Solr Clusters spread across several machines for a plethora of use-cases around Search and Analytics. The traffic demands a massive scale which creates scenarios of in-depth micro-management like operating systems upgrade, scaling cluster dynamically, etc, affecting the overall search experience.

This talk is focussed on the journey taken by Lucidworks on addressing scaling clusters horizontally and vertically, on the basis of query traffic load, data ingestion throughput or any other relevant metrics by extending capabilities of Kubernetes and Apache Solr to achieve true physical and logical autoscaling, satisfying modern era SLAs and infrastructure cost. The talk concludes with how the solution adopted opens up the future scope of fine-grained scaling of search clusters.

Outline

  • Motivation.
    • Why we need smart self-scale system for any e-commerce search.
  • Limitations in open source search engines, like Solr and Elasticsearch at scaling.
  • Apache Lucnee and Apache Solr.
    • Short introduction to Apache Lucene & Solr.
    • Which tools of Solr are used for a typical e-commerce search.
    • Discuss Autoscaling feature in Apache Solr.
  • Short Introduction to Kubernetes.
    • Discuss Horizontal Pod Autoscaling feature (HPA) of Kubernetes
  • Discuss the integration of Autoscaling feature of Solr and HPA of Kubernetes to achieve traffic driven search cluster.
    • Showcase an example configuration.
    • Live Demonstration - simulate live traffic, and see a live search cluster scaling up and down.
  • Discuss future scope
    • Scaling search cluster on fine-grained attributes, instead of overall traffic load
  • Referneces

Speaker bio

Amrit Sarkar is Cloud Search Reliability Engineer at Lucidworks Inc, California-based enterprise search technology company, with 5+ years experience in search domain and big data, e-commerce and product. He is working primarily on running search-based applications on Kubernetes, and developing and improving core components of Apache Solr.

Comments

{{ gettext('Login to leave a comment') }}

{{ gettext('Post a comment…') }}
{{ gettext('New comment') }}
{{ formTitle }}

{{ errorMsg }}

{{ gettext('No comments posted yet') }}

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conference in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices. more