Courses

Scalable Bioinformatics Bootcamp

In the Big Data era, scalability is becoming a prerequisite for bioinformatics applications that need to process large-scale datasets efficiently. This boot camp explains how to turn your bioinformatics applications into scalable workflows by surveying the available options, techniques, and tools.

Learning objectives:

  1. Learn about distributed platforms and systems
  2. Learn about Cloud and Big Data
  3. Learn about scalable workflow tools
  4. Learn how to make your science reproducible
  5. Gain hands-on experience with bioKepler tools to build scalable bioinformatics workflows

About the boot camp:

This two-day accelerated training session will start with a crash course on workflow technology and a hands-on session using the locally developed, open-source Kepler workflow system. We will then explore common computing platforms, including Sun Grid Engine, NSF XSEDE high-performance computing resources, the Amazon Cloud, and Hadoop, and explain how workflow systems can help with rapid development of distributed and parallel applications on top of any of these platforms. We will then discuss how to track data flow and process executions within these workflows (i.e., provenance tracking), including intermediate results, as a way to make workflow results reproducible. We will end with a session on using bioKepler to learn how to build and share scalable bioinformatics workflows in Kepler. Lab sessions at the end of each section will apply the concepts covered to real application case studies.
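To give a flavor of the kind of scaling pattern the boot camp addresses, the sketch below shows a generic split-process-merge workflow in plain Python: a large FASTA input is read record by record and a toy per-sequence task is run across local worker processes. This is only an illustrative assumption, not course material; the file name "sequences.fasta" and the GC-content task are hypothetical, and the hands-on sessions build the equivalent pattern graphically in Kepler/bioKepler and run it on the platforms listed above.

```python
# Minimal sketch of the data-parallel pattern discussed in the boot camp:
# split a large input, process the pieces concurrently, collect the results.
# Generic Python illustration only; not taken from the Kepler/bioKepler exercises.
from multiprocessing import Pool

def count_gc(record):
    """Toy per-record task: GC content of one sequence (header, sequence)."""
    header, seq = record
    gc = sum(base in "GCgc" for base in seq)
    return header, gc / max(len(seq), 1)

def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file (hypothetical input)."""
    header, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            else:
                seq.append(line)
        if header is not None:
            yield header, "".join(seq)

if __name__ == "__main__":
    # "sequences.fasta" is a placeholder path. Scaling out replaces this local
    # process pool with cluster, cloud, or Hadoop workers managed by the workflow system.
    with Pool() as pool:
        for header, gc in pool.imap_unordered(count_gc, read_fasta("sequences.fasta")):
            print(f"{header}\t{gc:.3f}")
```

In a workflow system such as Kepler, each step of this pattern (reading, per-chunk processing, merging) becomes a reusable component, and the system also records which inputs and intermediate results produced each output, which is the provenance tracking mentioned above.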