Event Details

Please note: All times listed below are in the Central Time Zone

OpenStack for High-Performance Bioinformatics

HPC / Research

This talk will describe the bioinformatics use cases, challenges and experiences of two leading research institutions: the Francis Crick Institute and Cambridge University.

Adam Huffman will describe how the Francis Crick Institute creates HPC clusters on OpenStack for genomics and scales them to 5,000 cores. He will report on the experience of setting up virtual clusters with batch schedulers on OpenStack to provide an HPC environment for life sciences users. These users build complex pipelines composed of many tools, operating on multi-terabyte datasets, historically on centrally provided bare-metal clusters. Adam will describe how the problems of reliably constructing such clusters were overcome with OpenStack, and the challenges in achieving high performance on OpenStack with clusters of this size.

Paul Calleja and Wojciech Turek will describe how Cambridge University is building an HPC bioinformatics platform upon OpenStack infrastructure. The performance of this software stack depends on an IO subsystem optimised for the data access patterns characteristic of HPC and current bioinformatics workloads in genomics. Current high-throughput technologies such as Next-Generation Sequencing (NGS) produce unprecedented scales of data in genomics and clinical projects, with many projects producing petabytes of data. Most existing bioinformatics solutions have problems scaling and dealing efficiently with current data volumes, making it hard to store, analyze, share and visualize the data. Cambridge’s approach focuses on solutions that deliver low-latency and high-throughput access to storage.

What can I expect to learn?

Attendees of this session will learn how to create a functioning virtual HPC cluster and how to scale that cluster up to 5,000 cores while maintaining good performance. Attendees will also learn about:

strategies to avoid cattle turning into pets
complications with restricted-access datasets
rescuing instances affected by unreliable underlying storage
apparent differences in reliability between filesystems used in compute node instances
the need to engage actively with upstream software projects in order to address bugs and missing functionality
cultural issues for users
user expectations
provisioning of complex genomics software pipelines

Wednesday, April 27, 4:30pm - 5:10pm (9:30pm - 10:10pm UTC)

Austin Convention Center - Level 4 - MR 16 A/B

Difficulty Level: Intermediate

Tags: Ops Upstream Community User Talk Cinder Neutron Nova Public Cloud

Dr. Paul Calleja

Head of Research Computing Services, Cambridge Uni

Dr Paul Calleja, Director HPC Cambridge. Paul has over 20 years' experience in HPC, starting as an HPC user developing and using HPC molecular modelling codes, then as an HPC vendor designing and implementing over 200 commercial HPC systems. Paul is now centre director for one of the largest university HPC centres in the UK. He was founder and inaugural chair of the UK HPC-SIG and currently drives...

Wojciech Turek

Head of Research Computing Platforms

Wojciech Turek, Head of Research Computing Platforms at the University of Cambridge, has a master's degree in Computer Engineering and over 8 years of experience in designing and building large-scale HPC clusters and storage platforms. Wojciech played a key role in designing and setting up a number of Top500 supercomputers. His domains of expertise are high-performance networks and...
