Nandan Thakur

@nthakur20

FlashText – A Python Library 28x faster than Regular Expressions for NLP tasks

Submitted Jun 15, 2019

Data science starts with data cleaning. When developers work with text, they often clean it up first, sometimes by replacing keywords (“Javascript” with “JavaScript”) and other times by checking whether a keyword (“JavaScript”) is mentioned in a document. Datasets keep growing, from tens of thousands to millions of documents, and cleaning datasets of this size with regular expressions can take days (about 5 days for 20K keywords and 3 million documents). FlashText reduces that computation time to a few minutes (about 15 minutes for the same 20K keywords and 3 million documents). FlashText is efficient at both extracting keywords and replacing them in sentences, and is implemented using a trie data structure and an approach based on the Aho-Corasick algorithm.
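To make the trie idea concrete, here is a minimal sketch in plain Python. It is not FlashText's source code: the names `build_trie`, `find_matches`, `extract_keywords`, `replace_keywords`, and the `END` marker are hypothetical and chosen only for illustration. The sketch assumes case-insensitive matching, keywords that start with a word character, and matches that must fall on word boundaries. The point it illustrates is that the text is scanned once against the trie, so the work depends mainly on the length of the text and not on how many keywords are loaded.

```python
# Illustrative sketch only (assumption): a tiny trie-based keyword matcher,
# not the FlashText library's implementation.

END = "_end_"  # hypothetical marker key; stores the clean name of a complete keyword


def build_trie(keywords):
    """Build a character trie from a {keyword: clean_name} mapping (lowercased)."""
    root = {}
    for keyword, clean_name in keywords.items():
        node = root
        for ch in keyword.lower():
            node = node.setdefault(ch, {})
        node[END] = clean_name
    return root


def _is_word_char(ch):
    return ch.isalnum() or ch == "_"


def find_matches(text, trie):
    """Yield (start, end, clean_name) for the longest keyword at each word start."""
    i, n = 0, len(text)
    while i < n:
        at_word_start = _is_word_char(text[i]) and (i == 0 or not _is_word_char(text[i - 1]))
        if at_word_start:
            node, j, best = trie, i, None
            # Walk the trie character by character for as long as the text matches.
            while j < n and text[j].lower() in node:
                node = node[text[j].lower()]
                j += 1
                # Accept a keyword only if it ends on a word boundary.
                if END in node and (j == n or not _is_word_char(text[j])):
                    best = (i, j, node[END])
            if best is not None:
                yield best
                i = best[1]  # resume scanning after the matched keyword
                continue
        i += 1


def extract_keywords(text, trie):
    """Return the clean names of all keywords found in the text, in order."""
    return [clean_name for _, _, clean_name in find_matches(text, trie)]


def replace_keywords(text, trie):
    """Return the text with every matched keyword replaced by its clean name."""
    pieces, last = [], 0
    for start, end, clean_name in find_matches(text, trie):
        pieces.append(text[last:start])
        pieces.append(clean_name)
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


trie = build_trie({"javascript": "JavaScript", "big apple": "New York"})
sentence = "I write Javascript in the Big Apple."
found = extract_keywords(sentence, trie)
cleaned = replace_keywords(sentence, trie)
```

In this sketch both use cases share the same single pass over the text: extraction collects the clean names, and replacement stitches them back into the surrounding text. A word such as "javascripts" is not matched, because the keyword would end in the middle of a word.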

Outline

[0-3mins]: Brief introduction about myself. Introduction to FlashText and a performance comparison of FlashText vs. regular expressions.

[3-8mins]: How is FlashText so blazingly fast?

[8-10mins]: When to Use FlashText?

[10-12mins]: Installing FlashText.

[12-15mins]: Use Case 1: Code – Searching for words in a text document

[15-18mins]: Use Case 2: Code – Replacing words in a text document

[18-20mins]: End Notes and Feedback for Future Talks.

Requirements

Not a workshop

Speaker bio

I am a perpetual, quick learner, keen to explore the realm of data analytics and data science. I am deeply excited about the times we live in and the rate at which data is being generated and transformed into an asset. I am well versed in domains such as Natural Language Processing, Machine Learning, and Signal Processing, and have a keen interest in learning interdisciplinary concepts involving Machine Learning.

Links

Slides

https://drive.google.com/open?id=1WZ6MU80Qoz5znd89p9aSzTKxAor4Mo6zMvF2qPKqRyA
