The Fifth Elephant 2019

Gathering of 1000+ practitioners from the data ecosystem

Data Science Best Practices for R and Python

Submitted by Pushker Ravindra (@rpushker) on Monday, 15 April 2019

Session type: Workshop

Abstract

How often have you struggled to understand someone else’s code, or even your own? The usual causes are poor or missing documentation and a failure to follow best practices. In this workshop I will demonstrate some best practices in data science for R and Python, two of the most widely used programming languages for data science, that help in building sustainable data products.

  • Integrated Development Environment (RStudio, PyCharm)

  • Coding best practices (Google’s R Style Guide and Hadley’s Style Guide, PEP 8)

  • Linter (lintr, Pylint)

  • Documentation – Code (Roxygen2, reStructuredText), README/Instruction Manual (RMarkdown, Jupyter Notebook)

  • Unit testing (testthat, unittest)

  • Packaging

  • Version control (Git)

These best practices significantly reduce technical debt in the long term, foster collaboration, and promote the building of more sustainable data products in any organization.
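As a small illustration of the documentation and unit-testing items above, here is a minimal Python sketch. It is not taken from the workshop material: the function `min_max_scale` and its behaviour are hypothetical, chosen only to show a reStructuredText-style docstring alongside a `unittest` test case.

```python
import unittest


def min_max_scale(values):
    """Rescale numbers linearly to the range [0, 1].

    Hypothetical example function, used only for illustration.

    :param values: non-empty sequence of numbers
    :type values: list[float]
    :returns: scaled values, in the original order
    :rtype: list[float]
    :raises ValueError: if ``values`` is empty or all items are equal
    """
    if not values:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("values must not all be equal")
    return [(v - low) / (high - low) for v in values]


class MinMaxScaleTest(unittest.TestCase):
    """Unit tests that document the expected behaviour of min_max_scale."""

    def test_scales_to_unit_range(self):
        self.assertEqual(min_max_scale([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_rejects_empty_input(self):
        with self.assertRaises(ValueError):
            min_max_scale([])
```

Saved in a file, the tests could be run with `python -m unittest <file>`, and a linter such as Pylint could be run on the same file to check it against PEP 8 conventions.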

Outline

  • Why Data Science Best Practices?

  • Why R & Python

  • Data Science Best Practices

  • Integrated Development Environment (RStudio, PyCharm)

  • Coding best practices (Google’s R Style Guide and Hadley’s Style Guide, PEP 8)

  • Linter (lintr, Pylint)

  • Documentation – Code (Roxygen2, reStructuredText), README/Instruction Manual (RMarkdown, Jupyter Notebook)

  • Unit testing (testthat, unittest)

  • Packaging

  • Version control (Git)

  • Conclusion

Requirements

None

Speaker bio

I hold a BTech in Electrical Engineering from IIT Kanpur, Executive General Management from IIM Bangalore, and a PhD in Bioinformatics / Computational Biology from UCD, Ireland. After my PhD, I worked as Head of Software Development at Genome Life Sciences, Chennai. I have more than 10 years of experience in genomic data science at international research organizations, including UMH, Alicante (Spain); IGIB, Delhi; and Monsanto, Bangalore. I currently lead the Data Analytics platform at Monsanto (a subsidiary of Bayer), Bangalore.

Comments

  • Zainab Bawa (@zainabbawa) Reviewer 6 months ago

    How are you arriving at these best practices? Will this be an experience talk?

  • Pushker Ravindra (@rpushker) Proposer 6 months ago

    It is based on experience, and some of these best practices are already followed in the software industry. Since most of data science involves coding (software development), it is good to follow these practices.
