"Big Data on OpenStack: A Rackspace Use Case"
By: Natasha Gajic
Rackspace's Enterprise Business Intelligence (EBI) group was seeking a way to move away from its current data warehouse solution. The group was looking for a cost-effective way to scale out new infrastructure in order to meet the increasing business demands of users, house increasing amounts of data, and customize the collection of data. To do this, they used Hadoop, Cassandra and PostgreSQL with an OpenStack cloud and built the Analytical Compute Grid (ACG).
The Analytical Compute Grid (ACG) is a solution that enables Rackspace to:
- House an ever-growing set of data collected from multiple business units.
- Allow for quick collection of data.
- Rapidly scale up and down to meet fluctuating demands.
- Provision a wide variety of open source virtual machines.
- Utilize open source technology to move away from enterprise license fees and avoid vendor lock-in to any one particular product.
The team selected OpenStack to be the heart of the Analytical Compute Grid for the following reasons:
- OpenStack offers a rich and robust API that allows the ACG engine to interface with OpenStack and perform all of the necessary dynamic scaling functions.
- ACG needs to rapidly create and destroy virtual machines. OpenStack provides the necessary speed of provisioning and scale to accomplish these tasks.
- ACG utilizes OpenStack images to create system VMs. An OpenStack image contains all the components necessary for a VM to join the ACG system.
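The dynamic scaling described above can be illustrated with a minimal sketch. It is not the ACG engine itself, whose internals are not described here. It only shows one plausible way to decide how many VMs to create or destroy. The function name `plan_scaling`, the `jobs_per_vm` capacity figure, and the `min_vms` and `max_vms` limits are all assumptions made for illustration. In a real deployment, the resulting number would be turned into OpenStack API requests that boot VMs from an ACG image or delete idle ones.

```python
import math


def plan_scaling(pending_jobs: int, running_vms: int,
                 jobs_per_vm: int = 4, min_vms: int = 1,
                 max_vms: int = 50) -> int:
    """Return how many VMs to add (positive) or remove (negative).

    All parameters are hypothetical tuning values, not figures from ACG.
    """
    if jobs_per_vm <= 0:
        raise ValueError("jobs_per_vm must be positive")
    # VMs needed to absorb the current backlog, rounded up.
    needed = math.ceil(pending_jobs / jobs_per_vm)
    # Clamp the target to the allowed pool size.
    target = max(min_vms, min(max_vms, needed))
    return target - running_vms
```

A positive result means the grid should grow, a negative result means it can shrink, and zero means the pool already matches demand.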
OpenStack allows us to configure images with different data stores:
- Cassandra database for columnar data structures
- PostgreSQL for relational data structures
- Hadoop Distributed File System (HDFS) for large unstructured and noisy data
As a result, ACG enables users to select the optimal data store for the information they collect. ACG provides SQL-like syntax for data retrieval via a standard JDBC interface, regardless of the underlying data store type.
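The idea of one query interface over several data stores can be sketched roughly as follows. This is an illustrative sketch only, not ACG's implementation: the mapping of data shapes to stores follows the list above, while the catalog, the dataset names, and the functions `choose_store` and `route_query` are hypothetical. The actual system exposes this through JDBC; the sketch only shows the routing decision that such a layer would need to make.

```python
from enum import Enum
from typing import Tuple


class DataShape(Enum):
    COLUMNAR = "columnar"
    RELATIONAL = "relational"
    UNSTRUCTURED = "unstructured"


# One backing store per data shape, as listed in the text.
STORE_FOR_SHAPE = {
    DataShape.COLUMNAR: "cassandra",
    DataShape.RELATIONAL: "postgresql",
    DataShape.UNSTRUCTURED: "hdfs",
}

# Hypothetical catalog: these dataset names are invented for illustration.
CATALOG = {
    "usage_metrics": DataShape.COLUMNAR,
    "customer_accounts": DataShape.RELATIONAL,
    "raw_support_logs": DataShape.UNSTRUCTURED,
}


def choose_store(shape: DataShape) -> str:
    """Pick the backing store for a given data shape."""
    return STORE_FOR_SHAPE[shape]


def route_query(dataset: str, sql: str) -> Tuple[str, str]:
    """Return (store, sql): the same SQL-like text goes to whichever
    store holds the dataset, so callers never name the store."""
    if dataset not in CATALOG:
        raise KeyError(f"unknown dataset: {dataset}")
    return choose_store(CATALOG[dataset]), sql
```

The caller writes the same SQL-like statement whatever the dataset, and the lookup decides which store receives it.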
Come hear how Rackspace is using OpenStack to help manage its data.