The art and science of exploiting near-similar text and images

Jul 2013

8 Mon

9 Tue

10 Wed

11 Thu 09:30 AM – 04:30 PM IST

12 Fri 10:15 AM – 05:30 PM IST

13 Sat 10:15 AM – 05:30 PM IST

14 Sun

Nimhans Convention Centre


Submitted Jun 5, 2013

Section: Analytics and Visualization
Technical level: Intermediate

Big data, by its nature, contains many near-similar items. Identifying these repetitions and, better still, exploiting them to get your job done is both an art and a science. The goal of this talk is to share some experiences in doing so and to get you excited about the idea.

Outline

I will first motivate how data repetitions provide an opportunity in several tasks: image recognition, spam detection, string matching, and so on. I will then cover specific techniques for identifying such near-duplicates at scale: signature-based near-duplicate image detection, sequence mining, and a new string similarity measure. By then, hopefully, you will be excited enough to take a fresh look at your own data.
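To make the idea of a signature concrete, here is a minimal Python sketch. The outline does not say which signature scheme the talk uses, so this example assumes a simple average hash: each pixel of a small grayscale grid contributes one bit, set when the pixel is brighter than the image mean. Two images count as near-duplicates when their signatures differ in only a few bits. The function names, the `max_bits` threshold, and the sample grids are all hypothetical and chosen only for illustration.

```python
from typing import Sequence

Grid = Sequence[Sequence[float]]


def average_hash(pixels: Grid) -> int:
    """Build a bit signature: one bit per pixel, set if the pixel exceeds the mean."""
    flat = [float(value) for row in pixels for value in row]
    if not flat:
        raise ValueError("image must contain at least one pixel")
    mean = sum(flat) / len(flat)
    signature = 0
    for value in flat:
        signature = (signature << 1) | (1 if value > mean else 0)
    return signature


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_bits: int = 2) -> bool:
    """Assumes both grids have the same size; a real pipeline would resize first."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_bits


# Hypothetical 4x4 grayscale images (0 = black, 255 = white).
original = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
]
# Same picture, uniformly brightened: every comparison against the mean is unchanged.
brightened = [[value + 15 for value in row] for row in original]
# A different picture: vertical stripes.
stripes = [[10, 200, 10, 200] for _ in range(4)]

print(is_near_duplicate(original, brightened))  # True
print(is_near_duplicate(original, stripes))     # False
```

Because a signature is a single integer, candidates can be compared with one XOR instead of a pixel-by-pixel comparison, which is what makes signature-based approaches attractive at scale. The threshold trades recall for precision and would need tuning on real data.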


Hosted by

The Fifth Elephant

The Fifth Elephant 2013
