Service disruption due to AWS outage

Incident Report for Fluid Attacks

Postmortem

Impact

Our monitoring tools detected abnormal behavior across all core services, including the platform, agent, and API-dependent components. Although AWS services began experiencing issues at 2025-10-20 00:11 (UTC-5), our platform continued handling requests until approximately 09:15 (UTC-5) the same day. The failure was proactively discovered 15 minutes later (TTD) by one of our monitoring systems, which alerted our team to a service outage affecting several core components. The problem was resolved in 8.4 hours (TTF), resulting in a total window of exposure (WOE) of 8.6 hours [1].
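As a rough cross-check of the figures above, the timeline can be reconstructed with a few lines of Python. This is only a sketch under stated assumptions: it assumes the window of exposure is the sum of TTD and TTF, that the impact started at exactly 09:15 (the report says "approximately"), and that the date is October 20, 2025. The variable names are illustrative and do not come from any internal tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: all report timestamps are in UTC-5.
UTC_MINUS_5 = timezone(timedelta(hours=-5))

# Assumption: impact started at exactly 09:15 (the report says "approximately").
impact_start = datetime(2025, 10, 20, 9, 15, tzinfo=UTC_MINUS_5)

ttd = timedelta(minutes=15)   # time to detect, as reported
ttf = timedelta(hours=8.4)    # time to fix, as reported

detected_at = impact_start + ttd
fixed_at = detected_at + ttf

# Assumption: WOE = TTD + TTF, measured from the start of impact.
woe_hours = (fixed_at - impact_start).total_seconds() / 3600

print("Detected at:", detected_at.strftime("%H:%M"))
print("Fixed at:   ", fixed_at.strftime("%H:%M"))
print("WOE (hours):", woe_hours)
```

Under these assumptions, detection falls at 09:30 and the fix at 17:54, shortly before the "Resolved" update was posted at 18:02. The computed window is about 8.65 hours, which agrees with the reported 8.6 hours up to rounding of the published figures.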

Cause

Multiple AWS services experienced a major outage, as reported in the AWS Health Dashboard. This incident caused disruptions across several AWS components our infrastructure depends on, leading to widespread service failures within Fluid Attacks.

Solution

While AWS services were gradually recovering, our team worked in parallel to implement internal adjustments that allowed our infrastructure to deploy successfully once service stability was restored. These changes focused on ensuring that our components could reconnect, synchronize, and resume normal operation. Most of the downtime corresponded to AWS’s recovery period, followed by additional time during which we completed the internal redeployment process.

Conclusion

We are actively working on migrating to more self-contained infrastructure stacks to reduce dependency-related issues and improve the overall reliability and reproducibility of our services. THIRD_PARTY_ERROR

Posted Oct 21, 2025 - 16:06 GMT-05:00

Resolved

The incident has been resolved and all our services are now fully operational. AWS services have stabilized, and we are monitoring performance to ensure continued reliability.

Posted Oct 20, 2025 - 18:02 GMT-05:00

Update

We are continuing to monitor the incident closely and are actively working to mitigate the impact on our platform and related services.

Posted Oct 20, 2025 - 15:45 GMT-05:00

Identified

An ongoing outage in multiple AWS services is disrupting our platform, the Agent, and all API-dependent services. Our engineering team is actively working to mitigate the impact.

Posted Oct 20, 2025 - 09:48 GMT-05:00

This incident affected: Platform and Agent.