The Fifth Elephant 2019

Gathering of 1000+ practitioners from the data ecosystem


How We Built an ML Model to Predict Insecticidal Activity in Proteins

Submitted by Karnam Vasudeva Rao (@vasukarnam) on Wednesday, 26 June 2019

Session type: Short talk of 20 mins / Full talk of 40 mins


Abstract

To improve crop yields, agriculture companies have successfully developed insect-resistant crops by expressing insecticidal (insect-killing) proteins in plants. As a leader in the agricultural biotechnology industry, Bayer tests hundreds of genes every year for insecticidal activity in its proprietary pipeline to develop the next generation of insect control solutions. Identifying and nominating insecticidal proteins with traditional methods such as BLAST and structural similarity has drawbacks: more than 90% of the nominated proteins end up displaying little or no activity against insects. Testing these proteins consumes an enormous amount of time and resources, so we adopted a machine learning (ML) approach to identify them. We generated numerous features for more than 5000 amino acid sequences using iFeature, a Python toolkit developed by Chen et al. (2018), and built ML models to identify proteins with insecticidal activity. Proteins identified using this method are tested in the pipeline to check their efficacy against insect pests. This presentation discusses the challenges we faced while building the model and the methods we used to overcome them. The information in this presentation can be helpful for building models in biomedical research (for example, cancer-related proteins or proteins involved in age-related diseases), agriculture and other domains.
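The abstract does not say which iFeature descriptors were used. As a hypothetical illustration of what sequence-derived features look like, the sketch below computes two composition-style descriptors, amino acid composition (AAC) and dipeptide composition (DPC), both among the descriptor types iFeature offers, in plain Python. It is not iFeature's own code, and the names `aac`, `dpc` and `featurize` are assumptions made for this example; it only shows how a variable-length protein sequence becomes a fixed-length numeric vector that an ML model can consume.

```python
from itertools import product
from typing import List

# The 20 standard amino acids in one-letter code (illustrative ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of standard residues: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(sequence: str) -> List[float]:
    """Hypothetical AAC sketch: fraction of each standard residue in the sequence."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in sequence.upper():
        if residue in counts:  # skip non-standard symbols such as X or B
            counts[residue] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def dpc(sequence: str) -> List[float]:
    """Hypothetical DPC sketch: fraction of each adjacent standard residue pair."""
    seq = sequence.upper()
    counts = {dp: 0 for dp in DIPEPTIDES}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard symbols are skipped
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


def featurize(sequence: str) -> List[float]:
    """Concatenate AAC (20 values) and DPC (400 values) into one 420-value vector."""
    return aac(sequence) + dpc(sequence)
```

Every protein maps to a 420-value vector regardless of its length, so a collection of 5000+ sequences becomes a table with one row per protein and one column per feature, which is the shape most ML libraries expect.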

Outline

  1. What are insecticidal proteins?
  2. Why machine learning for protein activity identification?
  3. Different approaches used by researchers
  4. Why not traditional methods?
  5. iFeature - a Python tool kit
    5a. Why did we choose iFeature?
    5b. What features does iFeature provide?
    5c. How did we adapt it to our needs?
  6. Model performance
  7. Model management
  8. What were the challenges?
  9. How did we overcome those?
  10. Where else can this study be applied?
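Because more than 90% of proteins nominated by traditional methods turn out to be inactive, plain accuracy can be a misleading measure of model performance for this kind of task. The sketch below is an illustration with made-up labels, not the evaluation used in the talk: it computes accuracy, precision, recall and F1 for binary predictions (1 = insecticidal, 0 = inactive) in plain Python. The helper names `confusion_counts` and `metrics` are hypothetical.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1; returns 0.0 where a ratio is undefined."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical labelled set: 10 active proteins out of 100.
    y_true = [1] * 10 + [0] * 90
    # A "model" that calls every protein inactive.
    always_inactive = [0] * 100
    # A model that catches 8 of the 10 actives but raises 12 false positives.
    model_preds = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 78
    print(metrics(y_true, always_inactive))
    print(metrics(y_true, model_preds))
```

With these made-up labels, the always-inactive predictor scores 0.90 accuracy but 0 recall, so it would never nominate a useful protein. The second predictor has lower accuracy (0.86) yet finds 8 of the 10 active proteins (precision 0.40, recall 0.80), which is what matters when deciding which proteins to send for testing.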

Speaker bio

Dr. Karnam Vasudeva Rao has been working as Senior Scientist-Data Science at Bayer, Bengaluru, India, since 2009. Before this, Vasu completed his PhD at the Max Planck Institute of Biochemistry, Munich, Germany. He has extensive experience working in research organizations in India and abroad. He is involved in developing data science products in his organization and in mentoring budding data scientists within and outside the organization.

Links

Slides

https://docs.google.com/presentation/d/1nmOVHJdoyO-xR3gWGhY3FTF62LdUPKDC0QYr_D4t1lM/edit#slide=id.g5ed249e491_0_0

Comments

  • Zainab Bawa (@zainabbawa) Reviewer 4 months ago

    Thanks for the review call, Vasu. As discussed, here is the feedback to work on:

    1. Providing the context to insecticidal activity succinctly enough for the audience to get into the flow.
    2. Focussing on the “why” of the technical choice(s), besides explaining how. What approaches for model management did you consider, before choosing the approach that you did?
    3. Showing technical details, including the code and the architecture details, for the segment of the audience that wants to know the implementation nitty-gritties.
    4. Did you experience any failures or setbacks in the model management process? Or were there any clear improvements as a result of the implementation? It will be useful to share a point or two here, before you move into the takeaways.
    5. Finally, the visual layout of the slides has to dramatically improve. The improvements being:
      - Font size has to be increased and made more readable
      - No more than three bullet points per slide
      - Where a lot of data has to be shown, divide it into screenshots that can be shown one after the other
      - Code samples have to be magnified to auditorium scale.

    Upload the revised slides before 18 July so that these are ready for rehearsal and review.

    • Karnam Vasudeva Rao (@vasukarnam) Proposer 4 months ago

      Thank you, Zainab, for the valuable feedback. I will work on this.
