20 October 2025

Standing Strong: Surviving the AWS Outage

In the early hours of 20 October 2025, a major Amazon Web Services (AWS) outage rippled across the internet. Popular platforms, from WhatsApp, Snapchat and Fortnite to Amazon’s own Alexa and Ring, went dark. For a few tense hours, parts of the digital world simply stopped working.

Incidents like this remind us how deeply our lives and businesses depend on cloud infrastructure. When one of the world’s largest providers stumbles, the effects are felt everywhere.

Yet amid the disruption, we’re happy to say that our systems stood strong.

We were not entirely unscathed: social media functionality, such as our WhatsApp integration, was affected because it relies on a third-party provider. However, Click4Assistance operates primarily outside of AWS (https://www.click4assistance.co.uk/servicestatus) and continued to provide live chat services uninterrupted.

But this is in no way a victory lap; it’s a moment for reflection. Here’s what today’s outage revealed, not just about AWS, but about modern cloud reliance and what it really means to build systems that can stand strong.

What Happened: A Quick Recap

Below is a summary of what is known so far (as of the time of writing):

  • The first public signs of trouble emerged around 3:11 a.m. Eastern Time (ET), when users and status dashboards reported increased error rates, higher latency, and degraded performance in the US-EAST-1 (Northern Virginia) AWS region. (Sources: TechRadar, AP News, Reuters)
  • The outage cascaded quickly. Many services that depend on AWS infrastructure, whether for compute, storage, databases, identity, or DNS, started failing entirely or intermittently. (Sources: DataCenterKnowledge, The Verge, AP News)
  • AWS engineers identified that the root of the issue was linked to DNS resolution problems for the DynamoDB API endpoint in US-EAST-1. (Sources: Tom's Guide, AP News, The Verge)
  • As time passed, mitigations were applied. By about 5:27 a.m. ET, AWS reported “significant signs of recovery” and said that most requests were succeeding again. (Sources: DataCenterKnowledge, AP News, Reuters)
  • Later updates declared that the underlying issue had been fully mitigated, though residual throttling and backlogs of queued requests were expected. (Sources: Techloy, Newsweek, AP News)
  • AWS plans to publish a Post-Event Summary (PES) detailing the scope, root causes, contributing factors, and remediation steps. (Source: Amazon Web Services)

The outage spanned roughly 2 to 3 hours from peak disruption to substantial recovery. For some services and users, however, lingering effects may persist beyond that.

Who Was Affected (and How Badly)

Because AWS underpins such a wide portion of the Internet’s infrastructure stack, the fallout was vast and diverse:

  • Social, messaging and entertainment: Snapchat, Roblox, Fortnite, Signal, Discord and others. Many users were unable to log in, load content, or interact. (Sources: Cybernews, AP News, The Verge)
  • Amazon’s own services: Amazon’s retail site, Alexa smart home, Ring, Kindle, and more experienced failures or degraded performance. (Sources: Newsweek, AP News, The Verge)
  • Banks, finance, and payment services: In the UK, users reported issues with Lloyds, Bank of Scotland, and Halifax, and with accessing HMRC services. (Sources: Reuters, The Guardian, AP News)
  • Enterprise and SaaS applications: Tools like Canva, Airtable, Slack, Duolingo, and more faced outages or degraded behaviour. (Sources: The Guardian, The Verge, AP News)
  • Cascading dependencies: Many services rely on global features or shared AWS endpoints (e.g. IAM, global tables, DNS). Disruptions in US-EAST-1 propagated across regions and to seemingly unrelated services. (Sources: AP News, TechRadar, DataCenterKnowledge)

The failure was less about a single service going down and more about domain resolution and routing failure: the components responsible for directing traffic to the correct services were broken, not necessarily the data itself. As a result, many systems simply couldn’t ‘find’ their backends even though the storage or compute was intact.
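To make the distinction between "cannot find the backend" and "the backend is down" concrete, here is a minimal sketch of a health check that reports the two cases separately. It is an illustration only, not a description of how AWS or Click4Assistance monitor their systems. The function name `diagnose`, its return labels, and the host `db.example` in the usage line are hypothetical; only Python's standard `socket` module is used.

```python
import socket


def diagnose(host, port, timeout=3.0,
             resolve=socket.getaddrinfo, connect=socket.create_connection):
    """Classify a backend as 'dns_failure', 'connect_failure' or 'reachable'.

    `resolve` and `connect` are injectable so the check can be exercised
    without real network access (hypothetical design choice for this sketch).
    """
    try:
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name could not be resolved, so the backend cannot even be found.
        return "dns_failure"

    for info in infos:
        address = info[4][:2]  # (ip, port) for both IPv4 and IPv6 entries
        try:
            connect(address, timeout=timeout).close()
            return "reachable"
        except OSError:
            continue  # try the next resolved address, if any

    # The name resolved, but nothing accepted a connection.
    return "connect_failure"
```

A call such as `diagnose("db.example", 443)` would return `"dns_failure"` when the name cannot be resolved, which is the situation described above: the backend may be perfectly healthy, but clients have no way to reach it.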

Why This Matters (and What It Reveals)

1. Cloud Monoculture is Fallible

So much digital infrastructure is provided by a few major cloud providers (Amazon, Microsoft, Google). When one of them fails, a huge swathe of the Internet feels it immediately.

2. DNS is Still Fragile

DNS resolution is at the heart of nearly all web routing. When DNS misbehaves, very little downstream works. This outage underscores that foundational services like DNS are critical dependencies: when they fail, otherwise healthy backends become unreachable.

3. Recovery is Complex & Non-Instant

Even after the root issue is mitigated, systems still have to recover: draining queued requests, working through retries, and lifting throttling all take time.
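One reason recovery takes time is that clients retrying all at once can overwhelm a service that has only just come back. A common way to soften this is exponential backoff with random jitter. The sketch below is a hedged illustration of that general technique, not something taken from AWS's remediation; the names `backoff_delay` and `call_with_retries` and the default values are assumptions chosen for the example.

```python
import random
import time


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Return a 'full jitter' delay in seconds for a zero-based attempt number."""
    ceiling = min(cap, base * (2 ** attempt))  # exponential growth, capped
    return rng() * ceiling                     # random point in [0, ceiling)


def call_with_retries(operation, max_attempts=5, base=0.5, cap=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(), retrying on ConnectionError with jittered backoff.

    Re-raises the last ConnectionError once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(backoff_delay(attempt, base, cap, rng))
```

The random jitter spreads retries from many clients over time, so a recovering service sees a gradual return of traffic rather than a synchronised spike. The cap keeps any single wait bounded.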

4. Trust & Reputation Are Fragile

For AWS, trust is everything. Every outage chips away at confidence, especially for mission-critical customers (banks, health systems, government). The post-outage communication, transparency, and remediation matter as much as the fix.

Lessons & Best Practices for the Future

Whether you're an SME or a big enterprise, here's what to take away:

| Recommendation | Why It Helps | How to Do It |
| --- | --- | --- |
| Test disaster recovery often | A plan is only as good as your ability to execute it in real life | Run trials regularly and engineer chaos experiments |
| Graceful ‘Out of Service’ modes | Let the user do something rather than failing completely | Software should provide read-only modes or caching, rather than a full outage |
| Keep communication providers separate | Spread the risk of all communication channels being unavailable simultaneously | Use separate providers for telephone and text-based comms |
| Transparency & communication | Users are less upset by an outage if they know what’s going on | Maintain status pages, push updates, issue postmortems |
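As an illustration of the graceful ‘Out of Service’ recommendation, the following minimal sketch serves the last known good value, flagged as stale, when the live backend cannot be reached. It is one possible approach under stated assumptions, not a description of any particular product. The class name `CachedReader`, the `max_stale_seconds` parameter, and the choice of `ConnectionError` as the failure signal are all hypothetical.

```python
import time


class CachedReader:
    """Serve the last known good value when the live backend is unreachable."""

    def __init__(self, fetch, max_stale_seconds=3600, clock=time.monotonic):
        self._fetch = fetch                  # callable that reads from the backend
        self._max_stale = max_stale_seconds  # how old a cached value may be
        self._clock = clock
        self._value = None
        self._fetched_at = None              # None until the first successful fetch

    def read(self):
        """Return (value, is_stale); raise if nothing usable can be served."""
        try:
            self._value = self._fetch()
            self._fetched_at = self._clock()
            return self._value, False
        except ConnectionError:
            if self._fetched_at is None:
                raise  # never fetched successfully: nothing to fall back on
            if self._clock() - self._fetched_at > self._max_stale:
                raise  # cached copy is too old to show safely
            return self._value, True
```

Returning the `is_stale` flag lets the interface tell users they are seeing read-only, possibly outdated information, which ties in with the transparency recommendation in the table.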

Final Thoughts

Today’s AWS outage is a lesson in systemic risk, architectural humility, and the fragility of a globally connected digital ecosystem.

As computing becomes more central to every facet of life, our infrastructure choices carry outsized consequences. Outages like this one remind us that no matter how advanced our systems are, the fundamental dependencies (DNS, routing, configuration) remain our Achilles’ heel.
