News
Amazon S3 service fails
posted on 18 February 2008 09:34
Amazon's S3 web service has suffered a failure. It was unavailable for around 2.5 hours and then started coming back up. Users were unhappy with Amazon's apparently slow response and its failure to keep them up to date via the S3 forum.
One user wrote: "Amazon's response was substandard in this case. I should, minimally, see a message on the front page at http://aws.amazon.com when there's a complete outage. Instead, I had to come into the forums to make sure it's not just my stuff."
"Like others here, I have a massive number of files (probably about 125,000 audio files, around 1TB of storage) that are for various music libraries. So I have customers with sites that are only partially functional, and nothing to tell them. That's unacceptable. "
"And I know you can do better. I'm not looking for details of the outage, just an acknowledgment (again, front page of aws) and ETA. "
"Thanks."
Amazon's explanation was this:
"Early this morning, at 3:30am PST, we started seeing elevated levels of authenticated requests from multiple users in one of our locations. While we carefully monitor our overall request volumes and these remained within normal ranges, we had not been monitoring the proportion of authenticated requests. Importantly, these cryptographic requests consume more resources per call than other request types."
"Shortly before 4:00am PST, we began to see several other users significantly increase their volume of authenticated calls. The last of these pushed the authentication service over its maximum capacity before we could complete putting new capacity in place. In addition to processing authenticated requests, the authentication service also performs account validation on every request Amazon S3 handles. This caused Amazon S3 to be unable to process any requests in that location, beginning at 4:31am PST. By 6:48am PST, we had moved enough capacity online to resolve the issue."
This is what Amazon said it would do:
"We are taking immediate action on the following: (a) improving our monitoring of the proportion of authenticated requests; (b) further increasing our authentication service capacity; and (c) adding additional defensive measures around the authenticated calls. Additionally, we’ve begun work on a service health dashboard, and expect to release that shortly."
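Item (a) is the gap Amazon itself identified: total request volume looked normal while the share of costly authenticated requests climbed. A minimal sketch of that kind of check follows, in Python. It is purely illustrative and not Amazon's implementation; the class name, the window size and the alert threshold are all assumptions made for this example.

```python
from collections import deque


class AuthRatioMonitor:
    """Track the share of authenticated requests over a sliding window.

    Hypothetical illustration only: names and thresholds are assumptions,
    not anything Amazon has described.
    """

    def __init__(self, window_size=1000, alert_ratio=0.5):
        self.window = deque(maxlen=window_size)  # 1 = authenticated, 0 = not
        self.alert_ratio = alert_ratio
        self.authenticated = 0  # running count of 1s currently in the window

    def record(self, is_authenticated):
        # When the window is full, appending evicts the oldest entry,
        # so remove it from the running count first.
        if len(self.window) == self.window.maxlen:
            self.authenticated -= self.window[0]
        flag = 1 if is_authenticated else 0
        self.window.append(flag)
        self.authenticated += flag

    def ratio(self):
        # Proportion of authenticated requests among those in the window.
        return self.authenticated / len(self.window) if self.window else 0.0

    def should_alert(self):
        # Fire when the proportion, not the total volume, crosses the threshold.
        return self.ratio() > self.alert_ratio
```

The point of the sketch is that the alert depends on the mix of requests rather than on how many there are, which is the measurement Amazon said it had been missing.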
So its authentication service had inadequate capacity. Tsk, tsk. Start-up snafus are to be expected, but service providers need plans in place to be upfront and completely open about them, so that users are kept in the loop rather than left to post irritated messages in a public forum.
Life in the IT service cloud should not involve being kept in the murk.
The Amazon forum thread is here.
tags: Amazon Cloud S3


