The Catalog as a Catalyst - Bringing benefits of Big Data to MSMEs
While large enterprises have the resources to acquire and process Big Data, micro, small, and medium enterprises (MSMEs) in emerging economies like India are far from being ‘data-driven’. This is a huge untapped opportunity, considering that MSMEs account for more than 99% of businesses and form the backbone of our economy. For the opportunity to be leveraged, a crucial prerequisite is that MSMEs’ transactions be anchored onto common ‘reference’ data. The ‘Product Catalog’ is one such reference dataset, containing rich semantics about all products transacted by MSMEs.
However, building and maintaining such catalogs, especially for MSMEs, is a herculean task in itself, owing to several complex challenges. First, the product universe for MSMEs is very diverse. Even at the top-level classification of finished goods alone, there are nearly a hundred industry segments. Moreover, while B2C companies transact in finished goods, B2B transactions happen in raw materials, intermediate artifacts, and parts, all of which combine to make a consumer good. Hence, the product catalog in which MSMEs operate is perhaps hundreds of times larger than the catalog operated by, say, the e-commerce sector. Second, product representation is highly disorganized. A manufacturer may represent a pencil as ‘Natraj Pencil hardness HB, shape=Octagone’, whereas a retailer may denote the same pencil simply as ‘Pencils’. In addition, there is the issue of multilingual representations, given that India has more than 20 regional languages and even more local dialects, and MSME owners and data operators are often not familiar with English. Third, the catalog needs to cover the product universe transacted by a huge number of businesses of varying scale. By government census, there are around 6 million registered businesses in India, and 99% of them are MSMEs. Because no standardization has been enforced, each business records and structures its data in its own way to suit its individual needs.
Keeping in mind the nature and scale of the problem, this talk will present innovative approaches to tackling a few of the challenges in building a product catalog for MSMEs. These solutions rely on techniques ranging from heuristics and string matching to conditional random fields, evidence theory, and semantic graph mining.
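To make the simplest of these techniques concrete, here is a minimal sketch of string matching used to anchor free-text product descriptions onto a catalog entry. It is an illustration only, not the approach presented in the talk: the tiny `CATALOG` list, the helper names `tokens`, `similarity`, and `best_match`, and the 0.8 threshold are all assumptions made for this example, and the similarity measure is the generic one from Python's standard `difflib` module.

```python
import re
from difflib import SequenceMatcher

# Hypothetical reference catalog of canonical product names (illustrative only).
CATALOG = ["pencil", "eraser", "notebook", "ballpoint pen"]


def tokens(text):
    """Lowercase the text, keep alphabetic words, and strip a trailing plural 's'."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]


def similarity(a, b):
    """Character-level similarity in [0, 1], as computed by difflib."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(description, catalog=CATALOG, threshold=0.8):
    """Return the catalog entry that best matches the description, or None."""
    desc_tokens = tokens(description)
    if not desc_tokens:
        return None
    best_entry, best_score = None, 0.0
    for entry in catalog:
        entry_tokens = tokens(entry)
        # Each word of the catalog entry is scored against its closest word
        # in the description; the entry score is the average over its words.
        score = sum(
            max(similarity(e, d) for d in desc_tokens) for e in entry_tokens
        ) / len(entry_tokens)
        if score > best_score:
            best_entry, best_score = entry, score
    # Assumed cut-off: below it, the description is left unmatched.
    return best_entry if best_score >= threshold else None


print(best_match("Natraj Pencil hardness HB, shape=Octagone"))
print(best_match("Pencils"))
print(best_match("Steel rod 12mm"))
```

With this toy catalog, both the manufacturer-style and the retailer-style descriptions of the pencil resolve to the same entry, while an unrelated description falls below the threshold and stays unmatched. A heuristic like this would not by itself handle multilingual names or extract attributes such as hardness and shape, which is presumably where the richer techniques listed above come in.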
For MSMEs to leverage big data analytics, a crucial prerequisite is that their transactions be anchored onto common ‘reference’ data. The ‘Product Catalog’ is one such reference dataset, containing rich semantics about all products transacted by MSMEs. This talk will present innovative approaches to tackling a few of the challenges in building a product catalog for MSMEs.
Kalpit V. Desai is the Director of Data Science at Clustr. Prior to Clustr, Kalpit gained over 14 years of experience building the core algorithms for data products in a variety of settings, ranging from an academic lab (CISMM) to a multinational conglomerate (GE) to a start-up (Bidgely). His core expertise is in building intelligent software systems based on statistical inference, pattern recognition, and machine learning. He is passionate about making use of data and algorithms to make our world a better place. Kalpit holds a PhD from The University of North Carolina at Chapel Hill, USA, and has numerous patents and peer-reviewed publications in international journals in the field of data science. He led a prize-winning team in the IEEE data mining contest ICMD 2011. When the clock is ticking a bit slower, Kalpit enjoys family time, chess, non-fiction, and often advising budding businesses on their data strategy.