ebs clappy award 6

Posted by peter on August 09, 2011

Amazon’s status page for their recent outage in Dublin has an alarming little snippet buried in the wall of text (most of which deals with the failure caused by a lightning strike) that could easily be missed.

3:11 PM PDT Separately, and independent from the power issue in the affected availability zone, we’ve discovered an error in the EBS software that cleans up unused snapshots. During a recent run of this EBS software in the EU-West Region, one or more blocks in a number of EBS snapshots were incorrectly deleted. The root cause was a software error that caused the snapshot references to a subset of blocks to be missed during the reference counting process. This process compares the blocks scheduled for deletion to the blocks referenced in customer snapshots. As a result of the software error, the EBS snapshot management system in the EU-West Region incorrectly thought some of the blocks were no longer being used and deleted them. We’ve addressed the error in the EBS snapshot system to prevent it from recurring. We have now also disabled all of the snapshots that contain these missing blocks.

We are in the process of creating a copy of the affected snapshots where we’ve replaced the missing blocks with empty block(s). Customers can then create a volume from that copy and run a recovery tool on it (e.g. a file system recovery tool like fsck); in some cases this may restore normal volume operation. We will email affected customers as soon as we have the copy of their snapshot available. You can tell if you have a snapshot that has been affected via the DescribeSnapshots API or via the AWS Management Console. The status for the snapshot will be shown as “error.” Alternately, if you have any older or more recent snapshots that were unaffected, you will be able to create a volume from those snapshots without error. We apologize for any potential impact it might have on customers applications.
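To make the failure mode concrete, here is a toy sketch of the kind of reference-counting cleanup the status update describes: count which blocks the snapshots still reference, then delete only the candidates nobody references. This is not Amazon's code. The function names, the block and snapshot IDs, and the `skip_blocks` parameter (which simulates references being missed during counting) are all made up for illustration.

```python
from collections import Counter


def count_references(snapshots, skip_blocks=frozenset()):
    """Count how many snapshots reference each block ID.

    `skip_blocks` is a hypothetical knob that simulates the bug: references
    to these blocks are silently dropped during counting.
    """
    refs = Counter()
    for block_ids in snapshots.values():
        refs.update(b for b in set(block_ids) if b not in skip_blocks)
    return refs


def blocks_to_delete(candidates, snapshots, skip_blocks=frozenset()):
    """Return the candidate blocks that appear to have no remaining references."""
    refs = count_references(snapshots, skip_blocks)
    return {b for b in candidates if refs[b] == 0}


# Hypothetical data: two snapshots sharing block b2.
snapshots = {
    "snap-a": ["b1", "b2"],
    "snap-b": ["b2", "b3"],
}
candidates = {"b2", "b3", "b4"}

# Correct run: only the genuinely unreferenced block is deleted.
correct = blocks_to_delete(candidates, snapshots)

# Buggy run: references to b3 are missed, so b3 looks unused.
buggy = blocks_to_delete(candidates, snapshots, skip_blocks={"b3"})

# Blocks that were deleted even though a snapshot still needs them.
lost = buggy - correct
print(sorted(lost))
```

In this toy run, snap-b loses block b3 even though nothing about snap-b changed, which is the same shape of problem as a snapshot going to “error” through no action of the customer.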

Another clappy for the EBS team, and another reason not to use EBS for anything you can’t lose.
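For anyone who wants to check their own account, the status update says affected snapshots show a status of “error” via the DescribeSnapshots API. Below is a rough sketch of that check in Python. It assumes the present-day boto3 SDK, which is not mentioned in the status update, and it assumes `eu-west-1` is the region name for EU-West. The helper names and the sample snapshot IDs are hypothetical.

```python
def errored_snapshot_ids(response):
    """Return the IDs of snapshots whose state is "error".

    `response` is assumed to have the shape boto3 returns from
    describe_snapshots: {"Snapshots": [{"SnapshotId": ..., "State": ...}]}.
    """
    return [
        snap["SnapshotId"]
        for snap in response.get("Snapshots", [])
        if snap.get("State") == "error"
    ]


def fetch_own_snapshots(region="eu-west-1"):
    """Fetch all snapshots owned by this account (needs boto3 and credentials)."""
    import boto3  # imported lazily so the filter above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    snapshots = []
    # DescribeSnapshots is paginated, so walk every page.
    for page in ec2.get_paginator("describe_snapshots").paginate(OwnerIds=["self"]):
        snapshots.extend(page["Snapshots"])
    return {"Snapshots": snapshots}


# Hand-written sample response, so the filter can be tried without AWS access.
sample = {
    "Snapshots": [
        {"SnapshotId": "snap-1", "State": "completed"},
        {"SnapshotId": "snap-2", "State": "error"},
    ]
}
print(errored_snapshot_ids(sample))
```

With real credentials, `errored_snapshot_ids(fetch_own_snapshots())` would list whatever snapshots in that region are currently in the error state.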