Jul 2014
23 Wed 09:30 AM – 05:00 PM IST
24 Thu 09:45 AM – 05:00 PM IST
25 Fri 08:30 AM – 07:15 PM IST
26 Sat 08:30 AM – 07:15 PM IST
In 2014, infrastructure components such as Hadoop, Berkeley Data Stack and other commercial tools have stabilized and are thriving. The challenges have moved higher up the stack, from data collection and storage to data analysis and its presentation to users. The focus of this year’s conference is analytics: the infrastructure that powers analytics and how analytics is done.
Talks will cover various forms of analytics including real-time and opportunity analytics, and technologies and models used for analyzing data.
Proposals will be reviewed using the following 6 criteria:
Domain diversity – proposals will be selected from different domains – medical, insurance, banking, online transactions, retail. If there is more than one proposal from a domain, the one which meets the editorial criteria will be chosen.
Novelty – what has been done beyond the obvious.
Insights – what insights does the proposal share with the audience that they did not know earlier.
Practical versus theoretical – we are looking for applied knowledge. If the proposal covers material that can be looked up online, it will not be considered.
Conceptual versus tools-centric – tell us why, not how. Tell the audience what was the philosophy underlying your use of an application, not how an application was used.
Presentation skills – proposer’s presentation skills will be reviewed carefully and assistance provided to ensure that the material is communicated in the most precise and effective manner to the audience.
Tickets: http://fifthel.doattend.com
Website: https://fifthelephant.in/2014
For queries about proposals / submissions, write to info@hasgeek.com
Data Collection and Transport – e.g., Opendatatoolkit, Scribe, Kafka, RabbitMQ, etc.
Data Storage, Caching and Management – distributed storage (such as Gluster, HDFS), hardware-specific storage (such as SSD or memory), databases (PostgreSQL, MySQL, Infobright) or caching/storage (Memcache, Cassandra, Redis, etc).
Data Processing, Querying and Analysis – Oozie, Azkaban, scikit-learn, Mahout, Impala, Hive, Tez, etc.
Real-time analytics
Opportunity analytics
Big data and security
Big data and internet of things
Data Usage and BI (Business Intelligence) in different sectors.
Please note: the technology stacks mentioned above indicate the latest technologies that will be of interest to the community. Talks should not be on the technologies per se, but on how these have been used and implemented in various sectors, enterprises and contexts.
Hosted by
Ashok Banerjee
@ashokbanerjee
Submitted Apr 30, 2014
This talk is intended to help businesses avoid expensive incorrect decisions based on poor understanding of the underlying models. In this talk I shall discuss ways to understand a phenomenon by triangulating across visualizations, underlying model understanding and experimentation.
As the volume of data increases, we as humans need abstractions. At the first level we resort to aggregate measures. However, not all aggregate measures are meaningful for all phenomena. Picking the wrong aggregate measure and then fine-tuning parameters may over-fit the observed distribution, yet perform poorly on predictions and lead to fatally flawed conclusions. To understand the models we resort to visualizations and segmentation, then pick mathematical models and parametrize them.
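As a minimal illustration of the point about aggregate measures, the Python sketch below compares two made-up data sets that share the same mean but describe very different phenomena. The data values and the `summarize` helper are hypothetical and are not taken from the talk; they only show how a single aggregate can hide the shape of the underlying distribution.

```python
import statistics


def summarize(values):
    """Return the mean, the median and the share of the total held by the top 10% of values."""
    ordered = sorted(values)
    top_count = max(1, len(ordered) // 10)
    top_share = sum(ordered[-top_count:]) / sum(ordered)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "top_10pct_share": top_share,
    }


# Hypothetical, roughly symmetric data (for example, response times in ms).
symmetric = [48, 49, 50, 50, 50, 51, 51, 52, 49, 50]

# Hypothetical, heavy-tailed data (for example, order values with one very large order).
heavy_tailed = [5, 6, 7, 8, 9, 10, 11, 12, 13, 419]

sym = summarize(symmetric)
heavy = summarize(heavy_tailed)

for name, stats in (("symmetric", sym), ("heavy-tailed", heavy)):
    print(name, stats)
```

Both data sets have a mean of 50, but in the heavy-tailed set the median is 9.5 and a single value accounts for most of the total. Reporting only the mean would suggest the two phenomena are alike, which is the kind of error that segmentation and visualization are meant to catch.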
In this talk I shall discuss the most common models seen repeatedly in nature, and some techniques I use to help me spot the underlying model.
We shall start in this discussion with:
Starting with the right model in mind often allows the system to converge rapidly to the required models.
Ashok Banerjee is the CTO of EBusiness at Symantec. Ashok has 23 patents approved to date and counting. Prior to Symantec, Ashok led engineering teams at Google, Twitter, Flipkart etc.
Ashok takes interest in Large Data Systems (databases and alternative databases such as NoSQL, message systems), Parallel Computing, Distributed Systems, Fault Tolerant Computing, Recommendation Systems, Supply Chain, Mathematical Models and Investments.
On the non-work side Ashok enjoys sailing, windsurfing, horse riding, German Shepherd dogs and soccer.
Experience Summary (reverse chronologically)
Ashok today leads the EBusiness team at Symantec.
- At Flipkart he led the technology team for Data Platform and Analytics, and has also led the largest online Supply Chain infrastructure in India (Flipkart).
- At Google he led a large-scale data warehouse infrastructure which converts SQL (approximately) into execution on a platform built on MapReduce, GFS and columnar compressed data, using block-oriented computing. This was at the scale of many billions of rows added per day (cannot disclose how many billions).
- At Google Ashok also led the payment processing infrastructure, which processes payments for AdWords, AdSense, Checkout and Google Apps.
- Prior to that Ashok led engineering efforts at BEA WebLogic.