This blog post was first featured on the SwiftStack Blog, and you can find the original post here.
Today I’m happy to announce the release of OpenStack Swift 2.0.0. This release includes storage policies – the culmination of a year of work from many members of the Swift contributor community. Storage policies are the biggest thing to happen in Swift since it was open-sourced four years ago. Storage policies allow you to tailor your storage infrastructure to exactly match your use case. This release marks a significant milestone in the life of the project that will lead to further adoption and community growth.
You can get Swift 2.0 from http://tarballs.openstack.org/swift/swift-2.0.0.tar.gz. As always, you can upgrade to this version without any client downtime.
What are storage policies, and why are they so important? Storage policies allow deployers to specifically configure their Swift cluster to support the different needs of data stored in the cluster.
Use case examples
Once the storage policies are configured, users can create a container with a particular policy, and all objects stored in that container will be stored according to that container’s storage policy.
Let’s explore two use cases enabled by storage policies: a reduced redundancy storage policy and a geographically-specific storage policy.
We normally recommend 3x replication in Swift clusters. It provides a good balance between durability and overhead for most data. However, some data is trivially re-creatable and doesn’t require the same durability. A very good example of this is image thumbnails. If the original-resolution image is stored with 3x replication, then a resampled image can be stored with 2x replication. This cuts the raw storage used for the resampled images by a third (about 33%), and any data loss is mitigated by the ability to recreate the resized image from the original.
When used at scale to store and serve on-demand user-generated content, as Swift is used today, a “reduced redundancy” storage policy can save significant hard drive space, thus lowering costs. Storage policies can be created to enable different replication factors to be used in the same cluster, depending on the type of data that needs to be stored.
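The one-third saving quoted above is simple arithmetic, and it can be sketched in a few lines of Python. The data set size and the helper names below are hypothetical, chosen only for illustration.

```python
def raw_bytes(logical_bytes: int, replicas: int) -> int:
    """Raw capacity consumed when every object is stored `replicas` times."""
    return logical_bytes * replicas


def savings_fraction(baseline_replicas: int, reduced_replicas: int) -> float:
    """Fraction of raw capacity saved by lowering the replica count."""
    return 1 - reduced_replicas / baseline_replicas


# Hypothetical data set: 10 TB of thumbnails (logical size, before replication).
thumbnail_bytes = 10 * 10**12

full = raw_bytes(thumbnail_bytes, 3)      # stored under a 3x policy
reduced = raw_bytes(thumbnail_bytes, 2)   # stored under a 2x policy
saving = savings_fraction(3, 2)

print(f"3x policy: {full / 10**12:.0f} TB raw")
print(f"2x policy: {reduced / 10**12:.0f} TB raw")
print(f"saving:    {saving:.1%}")
```

The saving applies only to the data placed in the reduced redundancy policy; originals kept under 3x replication cost the same as before.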
Another example is using different storage policies to geographically distinguish data sets. Suppose your company has a central office in Dallas, a branch office in New York, and a branch office in San Francisco. The data stored and used in one branch office doesn’t need to be shared with the other branch office, but the central office should have a copy of everything. With Swift 2.0, you can create a policy that references the storage capacity in Dallas and New York and another policy that references the storage capacity in San Francisco and Dallas. Now, anything stored in the “New York” policy will be stored in New York and locally available for fast lookup. The same applies to the “San Francisco” policy. In both cases, the central Dallas office also has a copy of everything stored in the branch offices.
The central office can easily manage offsite archives and has very good visibility into each branch’s data consumption. Storage policies in Swift 2.0 augment Swift’s existing global cluster capabilities and allow finer-grained control over where the data resides.
Deployer Impact of Storage Policies
Conceptually, storage policies are pretty simple: where a Swift cluster used to support only one object ring, now it can take advantage of many object rings. Each ring in Swift describes a set of storage volumes (i.e. drives), and it includes information necessary for data placement and failure handling. With storage policies, deployers can configure their Swift cluster to support the different needs of data stored in the cluster.
It is safe for deployers to upgrade their existing clusters to use storage policies. And clusters can still be downgraded, at least until you define a second storage policy. If you have multiple policies configured and you revert to pre-storage policy code, any data in the new storage policies will be inaccessible, since older Swift versions do not know how to access it.
Storage policies are defined in the swift.conf configuration file. Existing clusters are treated as having a default “policy zero”. This means existing clusters can take advantage of the new code without needing to immediately begin supporting additional policies. New policies can be configured in that same config file and are then made available for clients. Each storage policy has a new ring.
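As a rough illustration of what such definitions can look like, the sketch below embeds a sample swift.conf fragment and reads it with Python’s standard configparser module. The policy names and the hash suffix are made up, and the ring file naming in the code follows the convention described in the Swift storage policy docs (policy 0 keeps object.ring.gz, policy N uses object-N.ring.gz). This is not how Swift itself loads the file; it only shows the shape of the configuration.

```python
import configparser

# Sample swift.conf fragment. The policy names and hash suffix are
# hypothetical; check the Swift docs for the options your version supports.
SWIFT_CONF = """
[swift-hash]
swift_hash_path_suffix = changeme

[storage-policy:0]
name = standard
default = yes

[storage-policy:1]
name = reduced-redundancy
"""


def load_policies(conf_text):
    """Return the storage policies found in swift.conf-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    policies = []
    for section in parser.sections():
        if not section.startswith("storage-policy:"):
            continue
        index = int(section.split(":", 1)[1])
        # Assumed naming convention: one object ring per policy index.
        ring = "object.ring.gz" if index == 0 else f"object-{index}.ring.gz"
        policies.append({
            "index": index,
            "name": parser.get(section, "name"),
            "default": parser.getboolean(section, "default", fallback=False),
            "ring_file": ring,
        })
    return sorted(policies, key=lambda p: p["index"])


for policy in load_policies(SWIFT_CONF):
    print(policy)
```

An existing cluster with no policy sections behaves as if only “policy zero” were defined, which is why upgrading does not require touching this file right away.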
The developer docs for storage policies include quite a bit more information, including details about the on-disk data layout, deprecating policies, and changes to background consistency processes.
Client Impact of Storage Policies
Storage policies expand the Swift API in just one small way. When creating a container, a client can now send the X-Storage-Policy header to set the policy for that container. The value of the header is the name of the storage policy. The names of the available storage policies are listed in the response from the cluster’s /info endpoint.
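From a client’s point of view, the two pieces fit together roughly as in the following sketch, which uses only the Python standard library. It builds the container create request without sending it. The storage URL, account, token, and container name are placeholders, and the sample /info body is an assumption about the response shape rather than output captured from a real cluster.

```python
import json
import urllib.request


def policy_names(info_body):
    """Extract policy names from the JSON body of a /info response."""
    info = json.loads(info_body)
    return [policy["name"] for policy in info["swift"]["policies"]]


def container_create_request(storage_url, container, policy_name, token):
    """Build (but do not send) a PUT that creates a container with a policy."""
    return urllib.request.Request(
        url=f"{storage_url}/{container}",
        method="PUT",
        headers={
            "X-Auth-Token": token,
            "X-Storage-Policy": policy_name,
        },
    )


# Assumed example of what GET /info might return on a two-policy cluster.
SAMPLE_INFO = (
    '{"swift": {"policies": ['
    '{"name": "standard", "default": true}, '
    '{"name": "reduced-redundancy"}]}}'
)

names = policy_names(SAMPLE_INFO)
req = container_create_request(
    "https://swift.example.com/v1/AUTH_demo",  # placeholder storage URL
    "thumbnails",                              # placeholder container name
    names[1],
    "hypothetical-token",
)

print(req.get_method(), req.full_url)
# urllib normalizes header capitalization; HTTP header names are
# case-insensitive, so the server treats this as X-Storage-Policy.
print(req.header_items())
```

To actually create the container, pass the request to urllib.request.urlopen against a real cluster with a valid token.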
Existing Swift clients will continue to work with this new version of Swift. If a client sends a container create request without an explicit X-Storage-Policy value, the new container will be created with the cluster’s default policy. Existing client applications therefore keep working unchanged and, apart from choosing a policy for a container, can still take advantage of all Swift has to offer.
Storage policies can only be set on a container at the time of container creation. If you need to change a policy, you must first delete all data in the container, delete the container, and then recreate the container with the new storage policy. However, since Swift places no limits on how many containers you can have, it’s normally easier to simply create a new container.
Storage policies in Swift would not be possible without the participation of the entire contributing community. In particular, Paul Luse (Intel), Clay Gerrard (SwiftStack), and Sam Merritt (SwiftStack) have been instrumental by providing tremendous focus, dedication, awesome ideas, and leadership to getting this feature designed, written, and merged.
Together with Paul Luse, I gave a talk on storage policies at the OpenStack Juno summit in Atlanta. You can watch it here.
Looking Forward to Erasure Codes
We began working on storage policies in Swift almost exactly one year ago. Last July, we wrote about adding erasure code support to Swift. Erasure codes are great because for some data sets they can offer tremendous savings in storage media while still providing very high durability. But to add erasure code support into Swift, we first needed to add storage policies.
Now that storage policies are available in Swift 2.0, the developer community is refocusing on building the necessary pieces to support an erasure code storage policy in Swift. Policies are the foundation upon which we are building erasure code support into Swift, and this will be a major focus of the Swift contributor community for the remainder of this year.