Amazon EBS – Elastic Block Store has launched

Today marks the launch of Amazon EBS (Elastic Block Store), the long-awaited persistent storage service for EC2. Details can be found on the EC2 detail page and in the press release.

With the launch of the Elastic Block Store we reach an important milestone in offering a complete suite of storage solutions as part of the Amazon Infrastructure Services. Back when we made the architectural decision to virtualize the internal Amazon infrastructure, one of the first steps we took was a deep analysis of the way that storage was used by the internal Amazon services. We had to make sure that the infrastructure storage solutions we were going to develop would be highly effective for developers by addressing the most common patterns first. That analysis led us to three top patterns:

1. Key-Value storage. The majority of the Amazon storage patterns were based on primary key access leading to a single value or object. This pattern led to the development of Amazon S3.
2. Simple Structured Data storage. A second large category of storage patterns was satisfied by access to a simple query interface into structured datasets. Fast indexing allows high-speed lookups over large datasets. This pattern led to the development of Amazon SimpleDB. A common pattern we see is that secondary keys to objects stored in Amazon S3 are stored in SimpleDB, where lookups result in sets of S3 (primary) keys.
3. Block storage. The remaining bucket holds a variety of storage patterns, ranging from special file systems such as ZFS to applications managing their own block storage (e.g. cache servers) to relational databases. This category is served by Amazon EBS, which provides the fundamental building block for implementing a variety of storage patterns.
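
The secondary-key pattern from item 2 can be sketched in a few lines. This is only an illustration with in-memory dictionaries standing in for the two services; the names `object_store`, `attribute_index`, `put_object`, `find_keys` and `fetch` are hypothetical and are not part of the S3 or SimpleDB APIs.

```python
from collections import defaultdict

# In-memory stand-ins (assumptions for illustration, not real service clients).
object_store = {}                    # plays the role of S3: primary key -> object
attribute_index = defaultdict(set)   # plays the role of SimpleDB: (attribute, value) -> primary keys


def put_object(key, data, attributes):
    """Store the object under its primary key and index its attributes."""
    object_store[key] = data
    for name, value in attributes.items():
        attribute_index[(name, value)].add(key)


def find_keys(name, value):
    """Secondary-key lookup: return the matching primary keys as a sorted list."""
    return sorted(attribute_index.get((name, value), set()))


def fetch(name, value):
    """Resolve a secondary-key query into the stored objects."""
    return [object_store[key] for key in find_keys(name, value)]


put_object("photos/1.jpg", b"first image", {"owner": "alice", "year": "2008"})
put_object("photos/2.jpg", b"second image", {"owner": "bob", "year": "2008"})
put_object("photos/3.jpg", b"third image", {"owner": "alice", "year": "2007"})

print(find_keys("owner", "alice"))
```

The query returns only primary keys; the objects themselves are then read by key, which keeps the index small and the bulk data in the key-value store.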

I have written before about the basic features of Amazon EBS:

  • Amazon EBS will be offered in the form of storage volumes which you can mount into your EC2 instance as a raw block storage device. It basically looks like an unformatted hard disk. Once you have the volume mounted for the first time, you can format it with any file system you want or, if you have advanced applications such as high-end database engines, use it directly.
  • Developers can create multiple volumes, ranging in size from 1 GB to 1 TB. Each volume will be created within a specified Availability Zone and will be accessible by your EC2 instances running in that Availability Zone. As is to be expected with a volume abstraction, only one instance can have the volume mounted at any given time. Volumes can migrate and be reattached to other instances if necessary for failure handling or application migration reasons.
  • The consistency of data written to this device is similar to that of other local and network-attached devices; it is under the control of the developer when and how to force a flush of data to disk if you want to bypass the traditional lazy-writer functionality in the operating system’s file cache. Because of the session-oriented model for access to the volume, you do not need to worry about eventual consistency issues.
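
As a minimal sketch of what forcing a flush looks like from application code, the snippet below writes a record and asks the operating system to push it through its file cache. It assumes an ordinary file on a mounted file system; the helper name `durable_write` and the temporary path are made up for the example, and nothing here is specific to EBS.

```python
import os
import tempfile


def durable_write(path, payload):
    """Write payload and force it through the OS file cache to the device."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()              # push Python's user-space buffer to the OS
        os.fsync(f.fileno())   # ask the OS to flush its cached pages to the device
    return len(payload)


# Hypothetical usage: on EC2 the path would sit on a file system created on the volume.
demo_path = os.path.join(tempfile.mkdtemp(), "journal.log")
written = durable_write(demo_path, b"commit 42\n")
print(written)
```

Without the `os.fsync` call the data may sit in the file cache until the lazy writer gets to it, which is the behavior the last bullet describes.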

However, Amazon EBS isn’t just a massive volume storage array within an Availability Zone. It provides a unique feature that allows for the creation of novel storage management scenarios: the ability to create snapshots and store those snapshots in Amazon S3. These snapshots can then be used as the starting point for creating new volumes within any Availability Zone.

We see developers use this feature for long-term backup purposes, for rollback strategies, and for (world-wide) volume re-creation. Snapshots also play an important role in building fault-tolerance scenarios when combined with managing applications using Elastic IP addresses and Availability Zones.
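
The volume and snapshot rules described above can be modeled with a small toy. This is a sketch of the semantics only (one attachment at a time, same Availability Zone for instance and volume, snapshots restorable into any zone); the `Volume` and `Snapshot` classes, the zone names and the instance IDs are illustrative assumptions and do not reflect the actual EC2 API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Snapshot:
    size_gb: int
    blocks: dict

    def create_volume(self, zone):
        """A snapshot can seed a new volume in any Availability Zone."""
        return Volume(zone=zone, size_gb=self.size_gb, blocks=dict(self.blocks))


@dataclass
class Volume:
    zone: str
    size_gb: int
    blocks: dict = field(default_factory=dict)
    attached_to: Optional[str] = None

    def attach(self, instance_id, instance_zone):
        if instance_zone != self.zone:
            raise ValueError("instance and volume must share an Availability Zone")
        if self.attached_to is not None:
            raise RuntimeError("volume is already attached to " + self.attached_to)
        self.attached_to = instance_id

    def detach(self):
        self.attached_to = None

    def snapshot(self):
        """Point-in-time copy; later writes to the volume do not affect it."""
        return Snapshot(self.size_gb, dict(self.blocks))


vol = Volume(zone="us-east-1a", size_gb=100)
vol.attach("i-primary", "us-east-1a")
vol.blocks[0] = b"data"
snap = vol.snapshot()
vol.blocks[0] = b"changed"

# Recreate the volume in another zone and attach it to a standby instance.
restored = snap.create_volume("us-east-1b")
restored.attach("i-standby", "us-east-1b")
print(restored.blocks[0])
```

In a rollback or failover scenario the restored volume carries the data as it was when the snapshot was taken, not the later changes.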

Congratulations to the EBS team for delivering a great service that will help a lot of EC2 customers manage their storage efficiently.


Posted in Internet, Software

Facebook – Keeping Up

Recently posted on the Facebook blog:

Almost two million new users from around the world sign up for Facebook each week—and we couldn’t be happier. It’s tremendously rewarding to see so many people find what we work on useful and fun. As we continue to add new users and features, however, the load on our thousands of servers continues to increase at a pretty astounding rate. A few weeks ago we reached full capacity in our California datacenters. In the past we handled this problem by purchasing a few dozen servers, hooking them up, and getting on with our lives, but this time we didn’t have it so easy. We’d actually run out of space in our datacenters for new machines.

Fortunately we saw this problem coming a long time ago and started work on a new datacenter in Virginia. Now, we identify whether a user would be better off talking to the east coast datacenter or the west coast datacenter. For people in Europe and the eastern half of the US, it’s noticeably faster to talk to a server in Virginia than in California. We direct these users to Virginia whenever they’re browsing the site and not making any changes.

Whenever that person goes to change some data—uploading a photo album, or changing profile info for example—we send them off to California so that all our modifying operations happen in the same location. This decision was made to prevent two or more modifications from conflicting with each other and messing up our data. It might sound like we’re forcing our users to go to California a lot but only about 10% of our traffic causes a modifying operation. MySQL has a great replication feature that allows us to, in real time, stream all the modifications happening on a California MySQL server to another one in Virginia. Replication happens so fast, even across the country, that the Virginia servers are almost never more than one or two seconds behind the California servers.
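
The routing rule described in the last two paragraphs can be sketched as a small function. The region names, the `NEAREST` table and the `route` function are assumptions made for this example; the post does not say how Facebook actually classifies users or requests.

```python
WRITE_DATACENTER = "california"

# Hypothetical mapping from a coarse user region to the closest datacenter.
NEAREST = {
    "europe": "virginia",
    "us-east": "virginia",
    "us-west": "california",
}


def route(user_region, modifies_data):
    """Send every modifying request to California; send reads to the nearest site."""
    if modifies_data:
        return WRITE_DATACENTER
    # Unknown regions fall back to California in this sketch.
    return NEAREST.get(user_region, WRITE_DATACENTER)


print(route("europe", modifies_data=False))
print(route("europe", modifies_data=True))
```

Keeping all writes in one location means there is a single order of modifications to replicate, at the cost of a longer round trip for the roughly 10% of requests that change data.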

Even though all of the modifications happen in California and stream almost instantly to Virginia, we were faced with another problem. Although Facebook’s data is stored in MySQL database servers, we use a large number of memcached servers to store copies of the data. Memcached is much faster than the databases and can keep up with far more requests. We had to figure out a way for the memcached servers to be updated in step with the MySQL databases. Because of various technical limitations of our architecture there was no easy way to do so.

Fortunately MySQL is open source software, meaning we can actually change the way it works by modifying the code. We did just that, embedding extra information into the MySQL replication stream that allows us to properly update memcached in Virginia. This ensures that the cache and the database are always in sync. Over the last seven months a great team of Facebook employees has been building new software and setting up new servers like I described above. Over Thanksgiving we finally flipped the switch and since then almost 30% of our traffic has been served from Virginia.
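
To make the idea concrete, here is a toy model of a replication event that carries cache information alongside the row change. The post does not say what the extra information is; this sketch assumes it is a list of cache keys made stale by the write, and the `Datacenter` class, the event layout and the function names are all hypothetical. Plain dictionaries stand in for MySQL and memcached.

```python
class Datacenter:
    def __init__(self, name):
        self.name = name
        self.db = {}      # stands in for a MySQL server
        self.cache = {}   # stands in for memcached

    def read(self, key):
        """Cache-aside read: serve from the cache, fall back to the database."""
        if key not in self.cache:
            self.cache[key] = self.db.get(key)
        return self.cache[key]


def write_on_primary(primary, key, value):
    """Apply a write in California and build the event that will be replicated."""
    primary.db[key] = value
    primary.cache.pop(key, None)
    # Assumption: the event names the cache keys that the write made stale.
    return {"key": key, "value": value, "dirty_cache_keys": [key]}


def apply_on_replica(replica, event):
    """Apply the row change in Virginia, then drop the stale cache entries."""
    replica.db[event["key"]] = event["value"]
    for cache_key in event["dirty_cache_keys"]:
        replica.cache.pop(cache_key, None)


california = Datacenter("california")
virginia = Datacenter("virginia")

apply_on_replica(virginia, write_on_primary(california, "user:1:name", "Ann"))
first = virginia.read("user:1:name")     # now cached in Virginia

event = write_on_primary(california, "user:1:name", "Anna")
stale = virginia.read("user:1:name")     # event not yet applied: old value
apply_on_replica(virginia, event)
fresh = virginia.read("user:1:name")     # cache entry was dropped, so re-read
print(first, stale, fresh)
```

Because the cache keys travel with the database change, the replica drops its cached copy at the moment the new row arrives, so the Virginia cache lags only as long as replication itself does.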

The east coast datacenter is a great first step towards keeping Facebook fast and reliable as the site grows. Going forward we have lots of exciting plans to expand our infrastructure and improve performance so no user ever has to sit around waiting for a page to load.


Posted in Facebook, Internet


Copyright © 2009 Red Canyon Ltd. All rights reserved.

Company Registration No. 6688868


