March 23, 2017

AWS and Azure Outages that Crippled the Internet

Recently, both Amazon Web Services and Microsoft Azure suffered severe outages that affected customers around the world. Amazon’s was traced to human error and Azure’s to a brownout. System downtime is one of the biggest problems a company can face, and its costs are very real: they range from the direct cost of lost income to the damage done by a poor customer experience. Financial loss and loss of reputation are both difficult to recover from. In the case of Amazon and Azure the impact was significant, and it took public statements and damage control to mitigate the effects.

Amazon’s Outage Down to Human Error

What happened with Amazon? According to an update the company posted to explain the outage, the problem was caused by a simple debugging procedure that inadvertently took down several servers. With those servers down, other subsystems went down as well, and this cascading effect disrupted many cloud-based businesses. Xero, Expensify, Slack, Trello and Medium all experienced some disruption because of the partial failure at Amazon’s data centres. The issue affected businesses that used Simple Storage Service (S3), and it was a critical concern for them: their websites could still function, but they couldn’t access their backend storage and had trouble displaying stored images or sharing files.

What was the fallout? Amazon was quick to say that it would learn from the incident and make changes to ensure that human error couldn’t have such a significant impact in future.

Azure Suffered Under a Brownout

What happened with Azure? Users of Microsoft Azure experienced severe and widening outages for more than eight hours. The storage service downtime affected all of Europe and India, leaving users unable to access existing resources or set up new ones, and knocked big names such as Skype, Office 365, Outlook and Xbox Live offline in the process. The reason? A brownout that hit the service’s East US region. Of Microsoft’s 28 data centres, 26 were reported to be experiencing storage issues. Microsoft was quick to respond, and its status page said the brownout affected a range of Azure services and features. Users couldn’t perform basic service management operations, and other services that relied on storage were also affected.

What was the fallout? Azure worked to redirect traffic away from the affected regions and isolate the affected facilities to lessen the impact of the outage, but not before thousands of customers had reported issues. It was a big wake-up call for Azure, which needed to put systems in place to protect against brownouts and other power-related issues.

The Bottom Line

Unfortunately, most people these days assume that the cloud provides 100% uptime. But as these outages show, no organisation can (or does) guarantee 100% uptime, and even the ATO, the banks and the biggest and best cloud providers may have an outage from time to time. What we can do is strive to keep the impact of these disruptions on your business to a minimum, and we continually refine our processes and systems to keep outage time as short as possible for clients.

At Cymax, we can give you the peace of mind that only reliable network provision brings. You can always expect to enjoy significantly better uptime from a cloud server than from a traditional onsite server, and we have a proven track record of uptime (99.99% for 17 out of the last 18 months), reliability and consistency that is up there with the best of them. No matter what level your business is at, whether you’re a global corporation or a local consultancy firm, you need reliable online performance. We can provide your business with high uptime, a solution that delivers dependable results, and an agile, committed team that is passionate about providing you with the best possible solution. Contact us today on 1300 790 690 for worry-free, reliable online performance.

Let’s Talk About Your IT Future

If your IT is reactive, disjointed, or slowing you down, now’s the time to stabilise, secure and modernise.
