The Fifth Elephant 2018

The seventh edition of India's best data conference


Bad Data is No Better Than No Data! - Impact of Automation in Data Stewardship Workflows in the Plant Agriculture Industry

Submitted by Karnam Vasudeva Rao (@vasukarnam) on Friday, 30 March 2018

Preview video

Section: Crisp talk
Technical level: Intermediate


Data stewardship is the management and oversight of an organization’s data assets to provide high-quality data that is easily accessible in a consistent manner for business and research decisions. It includes data acquisition, data standardization, data integration and data analytics. Data generated at different phases of the pipeline often ends up in different databases and is described with colloquial vocabulary, resulting in scattered and disconnected data. Data standardization enables integration through a controlled vocabulary, ensuring that data can be tracked across the pipeline. Reliable, clean data that comes from proper governance enables tracking, automation, integration and faster decisions. I will discuss the challenges and solutions in data accessibility, analysis and decision making in plant biotechnology. I will also give a glimpse of how we automated our data stewardship workflows using R programming to save resources and time, and I will present one or two examples of how we adopted predictive analytics in plant agriculture to support research decisions.
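To make the standardization step concrete, here is a minimal Python sketch of mapping colloquial labels onto a controlled vocabulary and then joining two sources on the standardized term. It is an illustration under stated assumptions, not dataCuratoR (which is not open source and whose internals are not described here), and the talk's own tooling is written in R. The vocabulary, field names and records below are all hypothetical.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: canonical term -> known colloquial synonyms.
CONTROLLED_VOCAB = {
    "Zea mays": ["corn", "maize", "zea mays"],
    "Glycine max": ["soy", "soybean", "soya", "glycine max"],
}


def normalize(label):
    """Lower-case and collapse whitespace so 'Corn ' and 'corn' compare equal."""
    return " ".join(label.strip().lower().split())


def build_lookup(vocab):
    """Invert the vocabulary into a normalized-synonym -> canonical-term map."""
    lookup = {}
    for canonical, synonyms in vocab.items():
        for synonym in synonyms:
            lookup[normalize(synonym)] = canonical
    return lookup


def standardize(records, field, lookup):
    """Replace free-text labels with controlled terms.

    Returns (standardized records, labels that could not be mapped), so
    unmapped labels can be routed to a data steward instead of silently dropped.
    """
    clean, unmapped = [], []
    for rec in records:
        term = lookup.get(normalize(rec[field]))
        if term is None:
            unmapped.append(rec[field])
        else:
            clean.append({**rec, field: term})
    return clean, unmapped


def integrate(left, right, field):
    """Inner-join two standardized sources on the controlled term."""
    by_term = defaultdict(list)
    for rec in right:
        by_term[rec[field]].append(rec)
    return [{**l, **r} for l in left for r in by_term.get(l[field], [])]


# Hypothetical records from two phases of a research pipeline.
lookup = build_lookup(CONTROLLED_VOCAB)
field_trials = [
    {"crop": "Corn ", "plot": "P1"},
    {"crop": "soya", "plot": "P2"},
    {"crop": "sorghum", "plot": "P3"},
]
lab_assays = [
    {"crop": "MAIZE", "assay": "moisture"},
    {"crop": "Soybean", "assay": "protein"},
]

trials, unmapped_trials = standardize(field_trials, "crop", lookup)
assays, unmapped_assays = standardize(lab_assays, "crop", lookup)
merged = integrate(trials, assays, "crop")

print(merged)
print(unmapped_trials)  # → ['sorghum']
```

The key design choice is that unmapped labels are returned rather than discarded: "sorghum" has no entry in this toy vocabulary, so it is flagged for a steward to either add to the vocabulary or correct at the source, which is the point where standardization and governance meet.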


1. Brief introduction to Monsanto
2. Why data science and digitization in agriculture? - Challenges with data and decision making in agriculture research.
3. Automating data stewardship workflows to overcome these challenges (example: ‘dataCuratoR’, an R engine we built).
4. How do we influence decisions? - Data analytics: how we influence business decisions by adopting machine learning (one or two examples from our work).
5. Summary: Impact of data science in the agriculture industry - improved data quality and accessibility, and reduced resource and time requirements.

Speaker bio

Dr. Karnam Vasudeva Rao is a Senior Scientist in the Data Science team at Monsanto, a modern agriculture company with a rich history. Vasu is a domain expert in plant biology with 15 years of experience and a data scientist with more than 5 years of experience. He has substantial knowledge of data standardization and wrangling, data mining, R programming, data visualization and predictive analytics. With this proficiency, he can give the audience a unique experience by taking them through the application of data science to challenges in the agriculture industry. Before joining Monsanto, he earned his doctorate at the Max-Planck Institute in Munich, Germany.





  • Hari C M (@haricm) Reviewer a year ago

    Is dataCuratoR an open source project?

  • Karnam Vasudeva Rao (@vasukarnam) Proposer a year ago

    Hi, no, it is not open source. I would like to focus more on the concepts and on automating workflows in the agriculture industry, rather than on highlighting dataCuratoR. If you think that is a problem, I can remove the term. Please let me know.

  • Venkata Pingali (@pingali) a year ago

    Interesting talk from an off-beat domain. A few thoughts:

    1. Can you elaborate on why the data was not accessible to begin with? Is it the volume of data, trust, lack of standardization, or process? Automation was clearly helpful, but you may have uncovered more issues with the current system along the way.

    2. Can you summarize which features of dataCuratoR helped address which problem?

    3. Can you articulate lessons learnt that others can apply in their contexts? For example, (a) what drives the costs and complexity? (b) where should one invest to ensure high quality? (c) what are the limits of such systems? (d) what do you see as the next set of needs for such systems?
