The Fifth Elephant 2019

Gathering of 1000+ practitioners from the data ecosystem

Data Science Best Practices for R and Python

Submitted by Pushker Ravindra (@rpushker) on Apr 15, 2019

Session type: Workshop

Status: Rejected

Abstract

How often have you struggled to understand someone else’s code, or even your own? The cause is usually poor or missing documentation and a failure to follow best practices. In this workshop I will demonstrate some Data Science best practices for R and Python, two of the most widely used programming languages for Data Science, which help in building sustainable data products.

  • Integrated Development Environment (RStudio, PyCharm)

  • Coding best practices (Google’s R Style Guide and Hadley’s Style Guide, PEP 8)

  • Linter (lintr, Pylint)

  • Documentation – Code (Roxygen2, reStructuredText), README/Instruction Manual (RMarkdown, Jupyter Notebook)

  • Unit testing (testthat, unittest)

  • Packaging

  • Version control (Git)

These best practices significantly reduce technical debt in the long term, foster collaboration, and promote the building of sustainable data products in any organization.
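As a minimal sketch of how two of the listed practices fit together in Python, the example below documents a function with reStructuredText field lists and checks it with `unittest`. The function `normalize` and the test class `NormalizeTest` are hypothetical names chosen for illustration; they are not taken from the workshop material.

```python
import unittest


def normalize(values):
    """Scale numbers linearly to the range [0, 1].

    :param values: non-empty sequence of numbers
    :type values: list[float]
    :returns: scaled values, in the original order
    :rtype: list[float]
    :raises ValueError: if ``values`` is empty
    """
    if not values:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)
    if high == low:
        # A constant input has no spread; map every element to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


class NormalizeTest(unittest.TestCase):
    """Hypothetical unit tests for the illustrative function above."""

    def test_scales_to_unit_range(self):
        self.assertEqual(normalize([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        self.assertEqual(normalize([3, 3]), [0.0, 0.0])

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            normalize([])
```

Saved in a module, the tests can be run with `python -m unittest`, and the same file can be passed to Pylint to check it against PEP 8 conventions.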

Outline

  • Why Data Science Best Practices?

  • Why R & Python?

  • Data Science Best Practices

  • Integrated Development Environment (RStudio, PyCharm)

  • Coding best practices (Google’s R Style Guide and Hadley’s Style Guide, PEP 8)

  • Linter (lintr, Pylint)

  • Documentation – Code (Roxygen2, reStructuredText), README/Instruction Manual (RMarkdown, Jupyter Notebook)

  • Unit testing (testthat, unittest)

  • Packaging

  • Version control (Git)

  • Conclusion

Requirements

None

Speaker bio

I have a BTech in Electrical Engineering from IIT Kanpur, an Executive General Management qualification from IIM, Bangalore, and a PhD in Bioinformatics / Computational Biology from UCD, Ireland. After my PhD, I worked as the Head of Software Development at Genome Life Sciences, Chennai. I have more than 10 years of experience in Genomic Data Science at international research organizations including UMH, Alicante (Spain), IGIB, Delhi, and Monsanto, Bangalore. Currently, I lead the Data Analytics platform at Monsanto (a subsidiary of Bayer), Bangalore.
