Austin
April 25-29, 2016

Event Details

Please note: All times listed below are in the Central Time Zone


OpenStack for High-Performance Bioinformatics

This talk will describe the bioinformatics use cases, challenges and experiences of two leading research institutions: the Francis Crick Institute and Cambridge University.

Adam Huffman will describe how the Francis Crick Institute creates HPC clusters on OpenStack for genomics and scales them to 5,000 cores. He will report on the experience of setting up virtual clusters with batch schedulers on OpenStack to provide an HPC environment for life sciences users. These users build complex pipelines composed of many tools, operating on multi-terabyte datasets, and have historically run them on centrally provided bare-metal clusters. Adam will describe how the problems of reliably constructing such clusters were overcome with OpenStack, and the challenges in achieving high performance on OpenStack with clusters of this size.

Paul Calleja and Wojciech Turek will describe how Cambridge University is building an HPC bioinformatics platform upon OpenStack infrastructure. The performance of this software stack depends on an IO subsystem optimised for the data access patterns characteristic of HPC and current bioinformatics workloads in genomics. Current high-throughput technologies such as Next-Generation Sequencing (NGS) produce unprecedented scales of data in genomics and clinical projects, with many projects producing petabytes of data. Most existing bioinformatics solutions have problems scaling and dealing efficiently with current data volumes, making it hard to store, analyze, share and visualize the data. Cambridge’s approach focuses on solutions that deliver low-latency and high-throughput access to storage.


What can I expect to learn?

Attendees of this session will learn how to create a functioning virtual HPC cluster and how to scale that cluster up to 5,000 cores while maintaining good performance. Attendees will also learn about:

  • strategies to avoid cattle turning into pets
  • complications with restricted-access datasets
  • rescuing instances affected by unreliable underlying storage
  • apparent differences in reliability between filesystems used in compute node instances
  • the need to engage actively with upstream software projects in order to address bugs and missing functionality
  • cultural issues for users
  • user expectations
  • provisioning of complex genomics software pipelines

Wednesday, April 27, 4:30pm-5:10pm (9:30pm - 10:10pm UTC)
Difficulty Level: Intermediate
Head of Research Computing Services, Cambridge Uni
Dr Paul Calleja, Director HPC Cambridge. Paul has over 20 years' experience in HPC, starting as an HPC user developing and using HPC molecular modelling codes, then as an HPC vendor designing and implementing over 200 commercial HPC systems. Paul is now centre director for one of the largest University HPC centres in the UK. He was founder and inaugural chair of the UK HPC-SIG and currently drives...
Head of Research Computing Platforms
Wojciech Turek, Head of Research Computing Platforms at the University of Cambridge, has a master's degree in Computer Engineering and over 8 years of experience in designing and building large-scale HPC clusters and storage platforms. Wojciech played a key role in designing and setting up a number of Top500 supercomputers. His domains of expertise are high performance networks and...