An Amazon Web Services outage took down numerous websites and applications

According to reports, AWS services on the east coast of the United States are recovering after problems that affected media outlets and third-party systems such as McDonald’s, Delta and Crunchyroll.

Amazon Web Services is the backbone of much of the internet. For this reason, various services have reported considerable drops in the last few minutes, according to the DownDetector notification system. Since two in the afternoon on Tuesday (Peru time), AWS reports have warned of a worrying degradation in the performance of its network serving the East Coast of the United States, affecting platforms hosted on its systems: OpenAI, McDonald’s, Delta and other high-traffic companies.

“We continue to experience higher error rates and latencies for various AWS services in the US-EAST-1 region,” the AWS portal states. “We have identified the root cause as an issue with AWS Lambda and we are actively working to resolve it. We are actively working on full mitigation and will continue to provide regular updates.”

User reports indicate that 40% of users have problems accessing the AWS Console, a web-based application for accessing and managing Amazon Web Services (AWS) resources. It provides a graphical user interface (GUI) for managing AWS services, including Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3) and Amazon Relational Database Service (RDS). In short, it is the AWS control room on the web.

“We are continuing to work to resolve the error rates invoking Lambda functions. We are also seeing elevated errors getting temporary credentials from AWS Security Token Service and are working in parallel to resolve these errors,” AWS says in an update. Lambda is a service that lets you run code without provisioning or managing servers. It runs your code on a highly available computing infrastructure and handles all management of computing resources, including operating system and server maintenance, capacity provisioning, autoscaling and logging.
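To make the idea of running code without managing servers concrete, here is a minimal sketch of what a function deployed to Lambda can look like in Python. The handler name, the `name` input field and the response shape are illustrative assumptions, not details from AWS's statements about the outage.

```python
import json


def lambda_handler(event, context):
    """Entry point that Lambda would call for each invocation."""
    # `event` carries the request payload; `context` carries runtime metadata.
    name = event.get("name", "world")  # hypothetical input field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local call for illustration only; in Lambda the service makes this call.
    print(lambda_handler({"name": "reader"}, None))
```

The developer writes only the function. Deciding where and when it runs is left to the service, which is why a fault in Lambda's own management layer can surface as errors in applications that otherwise work correctly.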

When a hosting service goes down, systems that use it to host content or run automated services often fail as well. That is, if a company has its system “mounted” on AWS, Google, Huawei or any other cloud, and that cloud goes down, the system becomes inaccessible. The result is an immediate loss of traffic, unanswered user issues and a host of operations (banking, logistics, financial and others) left on standby.

This is the case of Webflow, a platform built on AWS that offers web hosting and design services. “The current AWS outage is causing degraded performance on Webflow and some outages on hosted websites. Our team is actively working to restore performance, we apologize for any inconvenience this may have caused,” the company said.

On December 7, 2021, OOKLA recorded a series of outages in services hosted on AWS starting at 11 a.m. Internal Amazon reports traced the problem to the eastern part of the United States, but it caused a cascade of disconnections globally.

“We are experiencing API and console issues in the US-EAST-1 region,” Amazon said in a report on its Service Health Dashboard. “We have identified the root cause and are actively working towards recovery. This issue is affecting the global console home page, which is also hosted on US-EAST-1.”

The outage affected a wide range of AWS services, including EC2, S3, RDS, DynamoDB, and Route 53. It also affected several third-party services that rely on AWS, including Slack, Okta, and Duo Security. The outage caused significant disruption to many businesses and organizations.

Through its reporting site, AWS indicated that it has identified the problem and initiated a contingency plan to stabilize the system: “As of 11:49 am PDT, customers began experiencing errors and latencies with various AWS services in the US-EAST-1 Region. Our engineering teams immediately got involved and began investigating. We quickly narrowed down the root cause to be an issue with a subsystem responsible for AWS Lambda capacity management, causing errors directly to clients (including through API Gateway) and indirectly through usage by other AWS services. We are seeing a sustained recovery in Lambda invocation failure rates and the recovery of other affected AWS services. We continue to monitor closely as we work toward full recovery across all services.”

UPDATE: AWS later indicated that its systems had returned to normal and that services would gradually restore access. The event lasted almost two hours, from detection to recovery of operations.