The Garvan Institute of Medical Research has one of the world’s five largest genome sequencing centres. Garvan’s Data Intensive Computer Engineering (DICE) group was set up to provide innovative solutions for analysing genomic data to both its factory-scale genome sequencing centre and its 80+ data scientists and bioinformaticians. We started using OpenStack because we saw the value of segregating hardware resources and the opportunity to isolate and simplify the infrastructure configuration to fit the needs of each individual project. As a result, we can now deploy multiple environments, from Hadoop/Spark clusters to projects with niche requirements, without the burden of infrastructure configuration, with everything running on commodity hardware and managed by a small team.
Nowadays, every organization is trying to get more out of its data, and genomics is no different. With every new release of our instruments, the amount of information they generate increases exponentially, and the number of tools needed to process that information grows as well. That raises the following questions: Which new tools can we use to improve the efficiency of our processes? How can we provide an environment where our user community can define its own rules? How can we do this with minimal resources while maximizing the results?
To answer these questions we turned to OpenStack. Projects like kolla-ansible make this easy by deploying hyper-converged environments based on Docker containers, reducing the time and the number of servers needed. Compute and storage can run together in a distributed way or be separated across your whole infrastructure, depending on your needs.
In this presentation, Manuel Sopena Ballesteros will focus on the challenges that any data-driven organization may face when deploying environments for data scientists or developers using OpenStack as a PaaS, and will provide criteria for selected use cases in data analytics and software deployment based on users’ needs rather than infrastructure availability.