Machine Learning in Molecular Biology
Why do we need new machine learning algorithms to solve problems in molecular biology? Most “plug and play” packages cannot be applied directly, because often it is not even clear how to pose the problem as one of machine learning. Also, high-throughput biotechnologies keep evolving, producing different “types” of data, so the methods have to keep up. I will show how probabilistic models based on Bayesian principles can come to the rescue. I will also talk about the importance of “feature selection” in both paradigms of learning: supervised as well as unsupervised.
- A brush-up on high-school biology. (3 mins)
- An introduction to some of the new biotechnologies that produce data. (2 mins)
- The biological problems we are trying to solve. (5 mins)
- Mixture models, and why feature selection matters in unsupervised learning, with an example. (10 mins)
- An example of a biological problem that can be formulated as supervised learning. (10 mins)
  - Some pictures of genetically modified creatures from our collaborators (that show machine learning works!).
I am part of a group of scientists at the National Chemical Laboratory, Pune, who use mathematics and computation to understand diverse aspects of biology. I am a computer scientist by training and work primarily on designing probabilistic models, and algorithms to learn them, in the hope of solving fundamental problems in genomics.