The Fifth Elephant 2014

A conference on big data and analytics

Building distributed search applications using Apache SOLR

Submitted by Saumitra Srivastav (@saumitra) on Monday, 28 April 2014

Technical level

Beginner

Section

Workshops

Status

Confirmed & Scheduled

Objective

The objective of this workshop is to introduce attendees to the most common features of a search application and show how to implement them using Apache Solr. The workshop will also cover how to scale the application by leveraging SolrCloud.

We will look at the performance factors to keep in mind while designing your application with Solr. If time permits, I will also share the differences between Solr and ElasticSearch, without going into ElasticSearch details.

Description

For the workshop, we will index and search data from 'StackExchange' sites using the dumps available here, and build the backend for the following demo application: saumitra.me/solrdemo/ (tested only on Chrome and Firefox).
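
As a rough illustration of the indexing step, the sketch below turns rows from a StackExchange-style Posts.xml dump into flat documents that could be sent to Solr as JSON. It assumes the dump stores each post as a `<row>` element with attributes such as `Id`, `PostTypeId`, `Title`, `Body`, `Tags` and `Score`. The target field names (`id`, `type`, `title`, `body`, `tags`, `score`) are hypothetical and would have to match whatever schema the workshop actually uses.

```python
import json
import xml.etree.ElementTree as ET

# A tiny stand-in for a Posts.xml dump; real dump files are far larger.
SAMPLE_DUMP = """<posts>
  <row Id="1" PostTypeId="1" Title="How do I index XML in Solr?"
       Body="&lt;p&gt;Question body&lt;/p&gt;" Tags="&lt;solr&gt;&lt;xml&gt;" Score="5" />
  <row Id="2" PostTypeId="2" ParentId="1"
       Body="&lt;p&gt;Answer body&lt;/p&gt;" Score="3" />
</posts>
"""

# Only the two most common post types are named here; anything else becomes "other".
POST_TYPES = {"1": "question", "2": "answer"}


def row_to_doc(row):
    """Map one <row> element to a flat dict (field names are hypothetical)."""
    raw_tags = row.get("Tags", "")  # e.g. "<solr><xml>" after XML unescaping
    tags = raw_tags.replace("<", " ").replace(">", " ").split()
    return {
        "id": row.get("Id"),
        "type": POST_TYPES.get(row.get("PostTypeId"), "other"),
        "title": row.get("Title", ""),
        "body": row.get("Body", ""),
        "tags": tags,
        "score": int(row.get("Score", "0")),
    }


def dump_to_docs(xml_text):
    """Parse the dump text and return one document per <row> element."""
    root = ET.fromstring(xml_text)
    return [row_to_doc(row) for row in root.iter("row")]


if __name__ == "__main__":
    # A JSON array of documents is one payload format Solr's update handler accepts.
    print(json.dumps(dump_to_docs(SAMPLE_DUMP), indent=2))
```

For a multi-gigabyte dump, `ET.iterparse` would be a better fit than loading the whole file into memory. The workshop itself indexes with SolrJ and the DataImportHandler, so this is only a language-neutral view of the same mapping.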

Agenda:

  1. What is Solr? Use cases and architecture
  2. Solr schema, config, tokenizers and filters
  3. Indexing data:
    • From disk using SolrJ
    • Importing from a database (MySQL) with the DataImportHandler
  4. Querying Solr (filters, faceting, highlighting, sorting, grouping, boosting, and range, function and fuzzy queries)
  5. Adding the 'Auto Suggest' component to autocomplete user queries
  6. Using the 'Clustering' component to cluster similar results
  7. SolrCloud
    • Architecture
    • Setting up a multinode cluster with Zookeeper
    • Creating a distributed index
    • Collections API
  8. Solr Admin UI
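
To make the querying and SolrCloud items above a little more concrete, here is a minimal sketch that only builds the request URLs, without contacting a server. It uses standard Solr request parameters (`q`, `fq`, `facet`, `facet.field`, `hl`, `hl.fl`, `sort`, `rows`, `wt`) and the Collections API `CREATE` action. The base URL, the collection name `stackexchange`, and the field names `title`, `body` and `tags` are assumptions made for illustration, not details taken from the workshop material.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr instance
COLLECTION = "stackexchange"              # hypothetical collection name


def build_select_url(text, tag=None, rows=10):
    """Build a /select URL with faceting, highlighting and an optional tag filter."""
    params = [
        ("q", text),
        ("defType", "edismax"),      # query parser that supports per-field boosts
        ("qf", "title^2 body"),      # boost title matches over body matches
        ("facet", "true"),
        ("facet.field", "tags"),     # count matching documents per tag
        ("hl", "true"),
        ("hl.fl", "title,body"),     # fields to return highlighted snippets for
        ("sort", "score desc"),
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    if tag is not None:
        # A filter query narrows the result set without affecting relevance scores.
        # Note: this sketch does not escape quotes inside the tag value.
        params.append(("fq", 'tags:"%s"' % tag))
    return "%s/%s/select?%s" % (SOLR_BASE, COLLECTION, urlencode(params))


def build_create_collection_url(name, num_shards, replication_factor):
    """Build a Collections API URL that asks SolrCloud to create a collection."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("numShards", str(num_shards)),
        ("replicationFactor", str(replication_factor)),
    ]
    return "%s/admin/collections?%s" % (SOLR_BASE, urlencode(params))


if __name__ == "__main__":
    print(build_select_url("solr cloud", tag="solr"))
    print(build_create_collection_url(COLLECTION, 2, 2))
```

Fetching either URL against a running SolrCloud node would perform the search or create the collection. Depending on the Solr version and setup, creating a collection may also need a config set to be named or uploaded to ZooKeeper first.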

Requirements

The workshop will be delivered using a virtual machine image. You need to have VirtualBox installed to import the image. Download the version of VirtualBox for your operating system from https://www.virtualbox.org/wiki/Downloads.

Try to bring a laptop with at least 4 GB of RAM. The operating system MUST be 64-bit. Participants should have basic Java knowledge.

Links

Slides

http://www.slideshare.net/saumitra121/apache-solr-workshop

Preview video

https://www.youtube.com/watch?v=iVaVC-uD6JY

Comments

  • 1
    Vinayak Hegde (@vin) 4 years ago

    Sounds good. Have you presented this workshop before? Also, it would be more interesting if you could compare the approaches that Solr takes with those of ElasticSearch.

  • 1
    Saumitra Srivastav (@saumitra) Proposer 4 years ago

    Thanks Vinayak. Yes, I have presented this workshop a couple of times before.

    Comparing Solr with ElasticSearch is a very good suggestion. This is the first choice people have to make before building a search solution on top of Lucene. We recently did a comparative study of the two, focused on features and scalability. I will put up a few slides to share the results. I might have to remove “Search Engines 101” to accommodate it.

  • 1
    Rohit Chatterjee (@rohitchatterjee) 4 years ago

    Hello Saumitra, are you going to make the VirtualBox image available before the workshop?

    • 1
      Saumitra Srivastav (@saumitra) Proposer 4 years ago

      Yes, Rohit. We will provide the image during the workshop. The image will be 64-bit, so make sure that your OS is also 64-bit.

  • 1
    Sreenath Ramanna (@sreenath-ramanna) 4 years ago

    Hi Saumitra,

    It may be better if you can make the image available sooner (maybe today). If all the attendees try to download the image simultaneously, then considering the typical size of such images (I guess a few GB), it may overload the network at the workshop. Some people may not even be able to get it ready before the workshop starts.

    Sreenath

    • 1
      Saumitra Srivastav (@saumitra) Proposer 4 years ago

      Hi Sreenath, thanks for sharing the concern. We will distribute the image on pen drives on the morning of the workshop. It's a 5 GB zipped file (14 GB unzipped), and it's not practical to share it over the network.
