expert reaction to Amazon internet services outage


Scientists comment on an Amazon internet services outage affecting many websites and apps.

Prof Jon Crowcroft FRS FREng, Marconi Professor of Communications Systems, University of Cambridge, said:

"One interesting challenge is that the back channels a lot of tech people use to communicate information/tech details about ongoing outages are also taken down by this outage - hence our usual ways of learning (e.g. via Signal or Slack) are both currently stymied by the AWS outage."

Dr Saqib Kakvi, from the Department of Information Security at Royal Holloway, University of London, said:

"At 6.56am UTC (0756 BST) AWS started to receive issue reports from users and within a few hours more than 500 companies were reporting errors.

"The issue is rooted in the DynamoDB service in Amazon's US-EAST-1 region. The exact nature of the fault is currently not publicly available, but AWS reports they are working to repair it.

"The most likely mitigations are distributing the load to the three remaining US Regions or even further to the two Canadian Regions and the eight European Regions.

"Another option is to start up backup hardware in the US-EAST-1 region with a known working configuration as the faulty versions are repaired. It would be likely that full service will resume by EOD."

Rimesh Patel, IET member and Independent Cyber Specialist, said:

"This major online outage underscores a stark reality: Business operations associated with one critical vendor in a region can cascade into global instability. What began as a service interruption has rippled outward, potentially compromising key systems at the very start of the business week - an illustration of how supply chain and infrastructure resiliency must be front of mind for every organisation. Amazon has reportedly committed full resources to restoring affected services, but in the interim the burden falls to other organisations to mobilise rapid responses, isolate impacts, and limit service degradation wherever possible."

Prof Alan Woodward, Visiting Professor of Computing, University of Surrey, said:

"Although we don't yet know the exact cause of the outage at AWS, history suggests it will be something relatively simple like a misconfiguration in DNS or BGP. Once these errors propagate across the Internet it takes a while for the update to reach the far corners of the Internet so the outage can appear longer than you might expect for such minor errors.

"What this episode has highlighted is just how interdependent our infrastructure is. So many online services rely upon third parties for their physical infrastructure, and this shows that problems can occur in even the largest of those third-party providers. Small errors, often human made, can have widespread and significant impact."

Patrick Burgess of BCS, The Chartered Institute for IT said:

"It appears the issue has emerged from one of Amazon's US East data centres, resulting in errors across a range of services. At this stage there's no indication of any cyber-related cause; it looks to be a technical fault. Given the scale of Amazon Web Services, which supports much of the world's digital infrastructure, it's not unusual for incidents like this to have a broad impact.

"Amazon tends to be transparent and proactive when resolving outages, so we can expect further updates and a swift resolution. This does, however, highlight how interconnected and reliant our everyday digital services have become on a small number of global cloud providers. Building resilience and ensuring diversity across these systems is essential to maintaining trust and continuity in our digital economy.

"That resilience ultimately depends on skilled, ethical IT professionals who design, maintain and protect the systems we rely on every day."

Prof Nishanth Sastry, Director of Research, Department of Computer Science, University of Surrey, said:

"The issue seems to be a failure of DynamoDB. More specifically, increased error rates are causing things to spiral out, so that nothing works on US-EAST-1. The recovery should be soonish, as the severity has been upgraded from 'Disrupted' to 'Degraded' about a minute ago, with the message: 'Oct 20 2:27 AM PDT We are seeing significant signs of recovery. Most requests should now be succeeding. We continue to work through a backlog of queued requests. We will continue to provide additional information.'

"This is a significant outage that has affected many huge companies, well known brands that most of us know about and rely on. It also includes Amazon's own services such as Alexa and home lighting systems, Ring Doorbell, etc.

"The main reason for this issue is that all these big companies have relied on just one service -- AWS -- without planning for redundancy, e.g., having a backup with Microsoft Azure cloud, rather than just AWS. Even within AWS, it appears that the errors are mostly concentrated in the US-East location in Virginia, so multi-region setups which include backups in other AWS locations might be more resilient.

"We have all come to rely on Amazon cloud and it mostly operates without any errors (that are visible to the consumer), so when something extraordinary like this happens, we are all wrong-footed. A lesson for the future!"

Prof James Davenport, Hebron and Medlock Professor of Information Technology, University of Bath, said:

"Initially worrying to see that UK banks are affected by an outage which Amazon say is in the US-EAST-1 region.

"UK Banks should be confining their usage to UK, or at least European regions, but it might be that they rely on some service that actually runs out of US-EAST-1.

"Obviously this is causing an impact now, but it might mean that some customer data is being handled in the U.S. or possibly that customer usage patterns, even if not actual banking data, can be inferred. We don't know.

"This would seem to indicate at least some unexpected dependency (easy enough to happen, but proper cloud auditing should have detected it if Lloyds itself is responsible - quite possibly a third-party dependency which Lloyds has not guarded against). In any case, worrying."

Konstantinos Mersinas PhD, Associate Professor, Information Security Group at Royal Holloway, University of London, said:

"At the moment, we do not have much more to report as we are waiting for evidence.

"Indeed, we are aware of the outage currently reported by AWS in the US-EAST-1 region, involving increased error rates and latency across multiple services. At this stage, the root cause remains under investigation.

"For now, there is no indication or evidence to suggest an attack or any particular motivation behind the incident. At the time of writing, no timeframe for resolution has been announced.

"Historically, such cloud outages can last a few hours for initial recovery. However, we can never exclude 'black swan' incidents; in this context, rare, unpredictable, and high-impact events.

"This indicates the cybersecurity mindset: we have to always expect the unexpected. Due to the fact that many major services and business applications rely on AWS infrastructure (particularly the US-EAST-1 region), the disruption is broadly significant, affecting consumer apps, enterprise services, and wider parts of the ecosystem.

"The incident underscores the critical importance of focusing on both organisational and infrastructural cyber resilience."

Dr Junade Ali, Software Engineer, Cyber expert and Fellow at the Institution of Engineering and Technology, said:

"The large-scale outages of web services appear to have been caused by a major incident affecting one system in one Amazon Web Services data centre location. Amazon Web Services provide computing resources to other companies to use to develop their own projects, housed in various locations around the world.

"So far, Amazon is reporting that the root cause appears to be an issue with one of the networking systems used to control a database product. As this issue can usually be resolved centrally, with multiple different options - unless there are further issues identified - the issue should be able to be mitigated over the coming hours.

"Single points of failure are a growing concern when it comes to the resilience of technical systems. This issue highlights the challenges with depending on single cloud computing regions from single cloud computing vendors and highlights the need for resilience to be built-in to essential services which people are expected to rely upon."

For all other experts, no reply to our request for DOIs was received.
