
The Fifth Elephant 2023 Monsoon

On AI, industrial applications of ML, and MLOps


Accepting submissions till 04 Jul 2023, 12:30 PM

Bangalore International Centre (BIC), Bengaluru


The Fifth Elephant 2023 Monsoon edition event recap is now available. The event was attended by 192 participants, of whom one-fourth were women. Videos from The Fifth Elephant are also available to watch.

Editors

The 2023 Monsoon edition is curated by:

  1. Nischal HP, Vice President of Data Engineering and Data Science at Scoutbee. Nischal curated the MLOps conference, which was held online between 23 and 27 July 2021.
  2. Sumod Mohan, Founder and CEO at AutoInfer. Sumod curated the 2019 edition of Anthill Inside, held in Bangalore on 23 November.

Tracks and themes

  1. AI and Research - covers research, findings, and solutions for challenges on building models in various areas such as fraud detection, forecasting, and analytics. This track delves into the latest methodologies for handling challenges such as large-scale data processing, distributed computing, and optimizing model performance.
  2. Industrial applications of ML - covers the implementation of AI in industry, with a focus on the AI models, the issues in training them, gathering data, and so forth. ML is being used at scale in industries such as automotive, mechanical, manufacturing, and agriculture. This track focuses on the challenges in this space, as innovation emerges from these industries in the pursuit of using ML on a second-to-second basis.
  3. AI and Product - covers strategies for building AI products to scale and mitigating challenges. This track provides insights on incorporating AI tools and forecasting techniques to improve model training, developing a working model architecture, and using data in the business context.

There are three phases in the lifecycle of an application - research, application and aftermath of the application.

  1. Assess capabilities, determining the new frontiers for AI.
  2. Find a use for the application.
  3. Learn how to run it, monitor it and update it with time.

The three tracks at the 2023 Monsoon edition of The Fifth Elephant will cover this lifecycle.

Members-only conference

The Fifth Elephant 2023 Monsoon edition will be held in-person. Attendance is open to The Fifth Elephant members only. Pick a membership to attend the in-person conference. If you have questions about participation, post a comment here.

Who will benefit from participating in The Fifth Elephant community:

  1. Data/MLOps engineers who want to learn about state-of-the-art tools and techniques, especially from domains such as automobile, agri-tech and mechanical industries.
  2. Data scientists who want a deeper understanding of model deployment/governance.
  3. Architects who are building ML workflows that scale.
  4. Tech founders who are building products that require AI or ML.
  5. Product managers who want to learn about the process of building AI/ML products.
  6. Directors, VPs and senior tech leadership who are building AI/ML teams.

Sponsorship

Sponsorship slots are open for:

  1. Infrastructure (GPU, CPU and cloud providers) and developer productivity tool makers who want to evangelise their offering to developers and decision-makers.
  2. Companies seeking tech branding among AI and ML developers.
  3. Venture Capital (VC) firms and investors who want to scan the landscape of innovations and innovators in AI and who want to source leads for investment in the AI and ML space.

Contact information

Join the @fifthel Telegram group or follow @fifthel on Twitter. For any inquiries, call Hasgeek at +91 7676 33 2020.

Hosted by

All about data science and machine learning

Supported by

E2E Cloud is India's first AI hyper scaler, a cloud computing platform providing accelerated cloud-based solutions at maximum optimization and lowest pricing

Nishant Singh

@nrohlable

Efficient AI pipeline for Entity Extraction from Government Records

Submitted Jun 30, 2023

Abstract:

In today's digital world, efficient extraction of entities from government records such as PAN cards, Aadhaar cards and driving licences has become a priority for use cases such as authentication, KYC compliance, partner/customer onboarding and age validation across a wide range of sectors. Solving this problem comes with a variety of challenges: variations in image quality, uneven orientation, unnecessary background, compressed images, and detecting the gaps between pieces of text correctly. We could not find an open source solution that provides this entity information efficiently while handling all of the challenges above, and such a solution could help firms in logistics, manufacturing and service provision, as well as partner-based start-ups. Motivated by these observations from our initial analysis, we introduce an entity extraction pipeline that can be adapted to different government records by changing only what is specific to the record type: the placement of entities and the entity-specific regex. Using this pipeline, we extracted the name, date of birth and PAN ID from PAN cards with 97.27%, 98.10% and 97.87% accuracy respectively.
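To illustrate what entity-specific regex can look like, here is a minimal sketch in Python. It is not the authors' implementation. It relies on the publicly documented PAN layout (five uppercase letters, four digits, one uppercase letter) and assumes the date of birth is printed as DD/MM/YYYY. The function name `extract_entities` and the dictionary keys are hypothetical, chosen only for this example.

```python
import re

# PAN layout: five uppercase letters, four digits, one uppercase letter.
PAN_PATTERN = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")

# Assumption for this sketch: the date of birth is printed as DD/MM/YYYY.
DOB_PATTERN = re.compile(
    r"\b(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])/((?:19|20)[0-9]{2})\b"
)


def extract_entities(ocr_lines):
    """Return the first PAN ID and date of birth found in OCR text lines.

    `ocr_lines` is a list of strings, one per recognised text line.
    Entities that are not found stay as None.
    """
    result = {"pan_id": None, "date_of_birth": None}
    for line in ocr_lines:
        text = line.strip().upper()
        if result["pan_id"] is None:
            match = PAN_PATTERN.search(text)
            if match:
                result["pan_id"] = match.group(0)
        if result["date_of_birth"] is None:
            match = DOB_PATTERN.search(text)
            if match:
                result["date_of_birth"] = match.group(0)
    return result
```

Adapting a sketch like this to another record type would mean swapping in patterns for that record's identifiers. A name has no fixed pattern, so it would probably need positional rules instead.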

Pipeline Flow:

The following modules are involved in the pipeline, in order:

a) Card Segmentation: binary segmentation to detect the region of interest (ROI) in government record images
b) Segmentation Post-processor: creates contours from the mask produced by the segmentation block and crops the image based on conditions
c) Image Preprocessor: adjusts image properties for angle detection
d) Angle Detection: detects the correct orientation of text in the image using thresholding with a mask and the Hough lines algorithm
e) OCR Block: entity extraction using LanyOCR and creation of an information DataFrame
f) Regex Block: entity-specific regex and entity positioning conditions, along with re-iteration of OCR based on bbox ratio and entity conditions
g) Optional Recognition Block: gap detection using vertical histogram logic and text recognition on the split images using docTR
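The gap detection in step (g) can be sketched as a vertical projection profile: count the ink pixels in each column of a binarised text crop, then treat sufficiently wide runs of empty columns as gaps at which to split the crop. The Python below is a minimal illustration under stated assumptions, not the authors' code. The image is assumed to be a list of rows with 1 for ink and 0 for background, and the function names and the `min_gap_width` threshold are hypothetical.

```python
def column_histogram(binary_image):
    """Count ink pixels (value 1) in each column of a binary image."""
    return [sum(column) for column in zip(*binary_image)]


def find_gaps(histogram, min_gap_width=3):
    """Return (start, end) column ranges, end exclusive, of empty runs
    that are at least `min_gap_width` columns wide."""
    gaps = []
    run_start = None
    for x, count in enumerate(histogram):
        if count == 0:
            if run_start is None:
                run_start = x
        elif run_start is not None:
            if x - run_start >= min_gap_width:
                gaps.append((run_start, x))
            run_start = None
    # Handle an empty run that extends to the right edge.
    if run_start is not None and len(histogram) - run_start >= min_gap_width:
        gaps.append((run_start, len(histogram)))
    return gaps


def split_at_gaps(histogram, min_gap_width=3):
    """Return (start, end) column ranges of the text segments that remain
    after cutting the image at every wide gap."""
    segments = []
    cursor = 0
    for gap_start, gap_end in find_gaps(histogram, min_gap_width):
        if gap_start > cursor:
            segments.append((cursor, gap_start))
        cursor = gap_end
    if cursor < len(histogram):
        segments.append((cursor, len(histogram)))
    return segments
```

Each returned column range could then be cropped and passed to a text recogniser separately. The threshold would need tuning so that the spacing between characters within a word is not mistaken for a gap.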

Talk Outline:

We look forward to discussing our implementation in the following order:
1.) Importance of entity extraction from government records
2.) Problems associated with real-time images used for entity extraction
3.) How we designed a pipeline of multiple modules to minimise the impact of these problems
4.) Overview of the pipeline and its modules, along with how they work
5.) Why we chose LanyOCR for entity extraction over other OCR algorithms
6.) How this pipeline can be used for various records by introducing minor changes
7.) The steps necessary to overcome the problems that arise when doing so
8.) Possible drawbacks of the pipeline due to image quality and human errors
9.) Optimisation of the pipeline in terms of inference speed
10.) Various use cases and further scope for improving the pipeline

