Nov 2019
18 Mon
19 Tue
20 Wed
21 Thu
22 Fri
23 Sat 08:30 AM – 05:30 PM IST
24 Sun
Accepting submissions till 01 Nov 2019, 04:20 PM
##About the 2019 edition:
The schedule for the 2019 edition is published here: https://hasgeek.com/anthillinside/2019/schedule
The conference has three tracks:
#Who should attend Anthill Inside:
Anthill Inside is a platform for:
For inquiries about tickets and sponsorships, call Anthill Inside on 7676332020 or write to sales@hasgeek.com.
#Sponsors:
Sponsorship slots for Anthill Inside 2019 are open.
#Bronze Sponsor
#Community Sponsor
Hosted by
Govind Chandrasekhar
This presentation looks at how to generate a product catalog from an ecommerce website given only the website's homepage URL. Techniques explored include URL clustering, regex generation, reinforcement learning and supervised classification.
Presentation structure:
Intro: What the problem is, why it’s useful and its roots in the Semantic Web movement.
Identifying Product URLs: The need to identify product pages from just their URLs. Using URL signatures + clustering + regex generation + supervised classification to solve this problem.
Spidering Strategy: Optimal strategy for spidering through the website to find product URLs, using reinforcement learning techniques.
Content Extraction: Techniques for extracting structured data from HTML + rendered webpages, notably through the use of bounding boxes. Then, we look at identifying and extracting product variations through the use of headless browsers.
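To make the "URL signatures + clustering + regex generation" step concrete, here is a minimal sketch in Python. It is not the speaker's method: the signature rule (keep the first path segment literal, generalise the rest), the function names and the `shop.example` URLs are all assumptions made for illustration, and the supervised classification step that would label each cluster as product or non-product is left out.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def url_signature(url: str) -> str:
    """Turn a URL path into a regex-style signature (illustrative rule, assumed)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    parts = []
    for i, seg in enumerate(segments):
        if seg.isdigit():
            parts.append(r"\d+")          # numeric IDs generalise to digits
        elif i == 0:
            parts.append(re.escape(seg))  # keep the first segment as a literal
        else:
            parts.append(r"[^/]+")        # any other segment becomes a wildcard
    return "^/" + "/".join(parts) + "/?$"


def cluster_urls(urls):
    """Group URLs that share a signature; each signature doubles as a regex."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[url_signature(url)].append(url)
    return dict(clusters)


def matches(pattern: str, url: str) -> bool:
    """Check whether a URL's path matches a generated signature regex."""
    return re.match(pattern, urlparse(url).path) is not None


# Hypothetical URLs from an imaginary store.
urls = [
    "https://shop.example/product/blue-shirt-123",
    "https://shop.example/product/red-shoes-77",
    "https://shop.example/category/shoes",
    "https://shop.example/about",
]
clusters = cluster_urls(urls)
for signature, members in clusters.items():
    print(signature, len(members))
```

In this sketch, a classifier would only need to decide once per cluster whether a signature describes product pages. The resulting regex can then be applied to unseen URLs without fetching them.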
Govind is a co-founder of Semantics3, which offers data- and AI-based enterprise solutions for ecommerce marketplaces (catalog generation and enrichment, seller onboarding) and logistics companies (HTS/tariff classification, attribute enrichment). Semantics3 is a 7+ year old, Y Combinator-backed startup based in Bengaluru, San Francisco and Singapore.
Its data-science team works on problems like product categorization, product matching, named entity recognition and unsupervised content extraction.