Impact
An unknown number of internal users found the Fluid Attacks Platform unavailable. The issue started on 25-10-21 at 12:51 (UTC-5) and was immediately detected by one of our monitoring tools, which flagged the platform outage. The problem was resolved in 14.4 minutes (TTF), resulting in a total window of exposure of 14.4 minutes (WOE).
Cause
A large volume of simultaneous internal processes triggered an unexpected overload in one of our core services, which temporarily degraded its performance and caused a platform outage. The root cause was that the number of concurrent requests exceeded the system's ability to scale up quickly enough.
Solution
We optimized the internal scripts responsible for launching these processes to limit the number of concurrent requests and reduce system load. These changes have already been applied, and further improvements are planned to prevent similar issues in the future.
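As a rough illustration of the kind of change described above, the sketch below bounds how many jobs are launched at once by routing them through a fixed-size worker pool. All names (launch_job, launch_all, MAX_CONCURRENT) and the concurrency cap are hypothetical assumptions for this example and are not taken from the actual internal scripts.

```python
# Minimal sketch: cap concurrent job launches with a bounded worker pool.
# Names and limits here are illustrative assumptions, not the real scripts.
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 10  # assumed cap; tuned to what the downstream service can absorb


def launch_job(job_id: str) -> str:
    """Placeholder for the call that triggers one internal process."""
    # In the real scripts this would call the core service that was overloaded.
    return f"launched {job_id}"


def launch_all(job_ids: list[str]) -> list[str]:
    # The pool never runs more than MAX_CONCURRENT jobs at a time, so the
    # downstream service sees a bounded request rate instead of an unbounded burst.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        return list(pool.map(launch_job, job_ids))


if __name__ == "__main__":
    print(launch_all([f"job-{i}" for i in range(100)]))
```

The design choice is simply to throttle at the client side: even if many processes are queued, the service only ever receives a fixed number of simultaneous requests, giving it time to scale.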
Conclusion
This incident revealed a scalability limitation in one of our core services. We are working on improvements to make the system more resilient and capable of handling high-demand scenarios without affecting availability.

INFRASTRUCTURE_ERROR < PERFORMANCE_DEGRADATION