Migrating a traditional data warehouse and its applications to a big data platform
- Understanding the capabilities and limitations of the Hadoop platform for an efficient migration
- Identifying the pitfalls and anti-patterns to avoid
There is no single solution when you start thinking about migrating a traditional warehouse and its applications to a big data platform. Every problem is different, and so is its solution. In spite of this, there are common mistakes that developers make because they lack understanding of the new space. Knowing these pitfalls and common patterns will help both existing big data developers and those who are looking for an opportunity to work in this space.
A prevalent trend in the industry is that, as data volumes grow and processing costs rise steeply with them, enterprises find it hard to scale their traditional data warehouses. Approaches such as building specialised data marts, each projecting a view of a subset of the entire data, act as short-term tactical fixes but cause other problems. A more strategic option is to migrate to a big data platform like Hadoop. However, such a migration should be done with the capabilities and limitations of the big data system in mind, in order to build an efficient solution.
I see three aspects of a data analytics solution: data ingestion and preparation, data aggregation, and statistical computation (e.g. various forecasting algorithms). In this talk, I will share my experiences in migrating an application to the Hadoop ecosystem. I will describe the options available for each of these aspects. I will also talk about some common pitfalls and anti-patterns that we should identify and avoid.
I am a developer at Thoughtworks with around eleven years of experience in various technologies. A few years back I started working on the Hadoop platform, helping our clients migrate traditional data warehouses and analytical applications to the Hadoop ecosystem. I believe that analytical and transactional systems are going to converge on big data at some point, and hence that distributed computing is going to be the solution in the future. I have also presented a similar topic at the Great Indian Developers Summit.