Recently, both Amazon Web Services and Microsoft Azure suffered severe outages that affected customers across the world; Amazon’s was traced to human error and Azure’s to a brownout. System downtime is one of the biggest issues a company can face, and its very real costs range from lost income through to the impact of a negative experience for customers. Financial loss and loss of reputation are both difficult to recover from, and for Amazon and Azure the impact was significant enough to require a public statement and damage control to mitigate the effects.
Amazon’s Outage Down to Human Error
What happened with Amazon? According to a company update posted to explain the outage, the problem was caused by a simple debugging procedure that inadvertently took down several servers. With these servers down, other subsystems were taken down too, and this cascading effect disrupted many cloud-based businesses. Xero, Expensify, Slack, Trello and Medium all experienced some disruption due to the partial failure at Amazon’s data centres. The issue affected businesses that used Simple Storage Service (S3) and was a critical concern for them. Affected websites could still function, but they couldn’t access their backend storage and had trouble displaying stored images or sharing files.
What was the fallout? Amazon was quick to say that it would learn from the incident and make changes to ensure that human error couldn’t have such a significant impact in future.
Azure Suffered Under a Brownout
status page saying that the brownout affected a range of Azure services and features. The outage meant users couldn’t perform basic service management operations, and other services that relied on storage were also affected.
What was the fallout? Azure worked to redirect traffic away from the affected regions and isolate the affected facilities to lessen the impact of the outage, but not before thousands of customers reported issues. It was a big wake-up call for Azure, which needed to implement systems to protect against brownouts and other power-related issues.