Scaling SolrCloud to a large number of collections
The objective of this talk is to share the challenges and lessons from setting up a large SolrCloud installation running on hundreds of nodes with thousands of collections and millions of users. This talk will also help people understand the guts of SolrCloud’s architecture.
The typical search use case is one large search collection distributed among many nodes and shared by all users. However, there is a class of applications that need a large number of small or medium collections which can be used, managed and scaled separately. This talk will cover our effort in helping a client set up a large-scale SolrCloud cluster with thousands of collections running on hundreds of nodes. I will describe the bottlenecks that we found in SolrCloud when running a large number of collections. I will also take you through the features and optimisations that we contributed to Apache Solr to reduce or remove the choke points in the system. Finally, I will talk about the benchmarking process and the lessons learned from the exercise.
Familiarity with Solr and SolrCloud is a must. I will explain certain concepts that might not be well known, but this is by no means an introductory talk.
I have been a committer on Apache Lucene/Solr since 2008 and am a member of the Lucene/Solr project management committee. I worked at AOL for five years on vertical search, content management systems, social/community platforms and anti-spam systems, as well as AOL WebMail’s Inbox Search system, which uses a highly customized version of Apache Solr to service tens of millions of users and more than a billion index/search operations a day. I currently work at LucidWorks Inc. on Apache Solr and LucidWorks Search, mostly on the SolrCloud side of things. I also help organize the Bangalore Apache Solr/Lucene Meetup Group, which has 350+ members and holds regular meetings of people interested in Lucene, Solr and search in general.