How We Built an ML Model to Predict Insecticidal Activity in Proteins
Karnam Vasudeva Rao
To improve crop yield, agriculture companies have successfully developed insect-resistant crops that express insecticidal (insect-killing) proteins. As a leader in the agricultural biotechnology industry, Bayer tests hundreds of genes every year for insecticidal activity in its proprietary pipeline to develop the next generation of insect control solutions. Traditional methods for identifying and nominating insecticidal proteins, such as BLAST and structural similarity, have drawbacks. More than 90% of the proteins they nominate end up showing little or no activity against insects, and testing these proteins consumes an enormous amount of time and resources. We therefore adopted a machine learning (ML) approach to identify these proteins. Using iFeature, a Python toolkit developed by Chen et al. (2018), we generated numerous features for more than 5000 amino acid sequences and built ML models to identify proteins with insecticidal activity. Proteins identified this way are tested in the pipeline to check their efficacy against insect pests. This presentation discusses the challenges we faced while building the model and the methods we used to overcome them. The information presented can help with building models for biomedical research (for example, cancer-related proteins or proteins involved in age-related diseases), agriculture, and other domains.
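iFeature computes many types of descriptors from a protein sequence. Two of the simplest are amino acid composition (AAC) and dipeptide composition (DPC). As a rough illustration of what such features look like, the sketch below re-implements these two descriptors in plain Python. It is not iFeature's own code or API, and the function names `aac`, `dpc` and `featurize` are hypothetical.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes), in a fixed order so that
# every sequence maps to the same feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(sequence: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue (20 values).

    Non-standard symbols such as X or B are ignored.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def dpc(sequence: str) -> list[float]:
    """Dipeptide composition: fraction of each adjacent residue pair (400 values).

    Only pairs in which both residues are standard amino acids are counted.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


def featurize(sequence: str) -> list[float]:
    """Concatenate AAC (20) and DPC (400) into one 420-dimensional vector."""
    return aac(sequence) + dpc(sequence)
```

For example, `featurize("MKTAYIAKQR")` returns a 420-element list. Each element is a relative frequency, so sequences of different lengths become directly comparable inputs for a classifier.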
1. What are insecticidal proteins?
2. Why machine learning for protein activity identification?
3. Different approaches used by researchers
4. Why not traditional methods?
5. iFeature: a Python toolkit
   5a. Why did we choose iFeature?
   5b. What features does iFeature offer?
   5c. How did we adapt it for our needs?
6. Model performance
7. Model management
8. What were the challenges?
9. How did we overcome them?
10. Where else can this study be applied?
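The abstract does not say which models were built or how their performance was measured. Purely as an illustration, the sketch below trains a plain logistic-regression classifier by batch gradient descent on numeric feature vectors, such as the composition descriptors iFeature produces, and then reports precision and recall. Logistic regression, the learning rate, the epoch count, the 0.5 threshold and all function names are assumptions, not the models used in this work. Precision deserves attention in this setting. It equals the fraction of nominated proteins that turn out to be active, and the abstract puts that fraction below 10% for traditional nomination methods.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one example: p(active) minus the true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X, threshold=0.5):
    """Label each example 1 (predicted active) if p(active) >= threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
            for xi in X]


def precision_recall(y_true, y_pred):
    """Precision = hit rate among nominations; recall = share of actives found."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In practice a library classifier such as scikit-learn's `LogisticRegression` would replace the hand-written training loop. The loop is written out here only to keep the example dependency-free and to make the gradient step visible.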
Dr. Karnam Vasudeva Rao has been working as Senior Scientist, Data Science, at Bayer in Bengaluru, India, since 2009. Before that, Vasu earned his PhD at the Max Planck Institute of Biochemistry in Munich, Germany. He has extensive experience working in research organizations in India and abroad. He develops data science products within his organization and mentors budding data scientists both inside and outside it.