Partial platform service disruption

Incident Report for Fluid Attacks

Postmortem

Impact

An unknown number of users experienced various issues while using the platform. The issue started on 2025-06-03 at 16:56 (UTC-5) and was proactively discovered 57.6 minutes later (TTD) by a staff member, who reported through our help desk [1] that vulnerabilities could not be resubmitted. Additional reports related to vulnerability management, repository access, and event handling followed. The problem was resolved in 20.1 hours (TTF), resulting in a total window of exposure (WOE) of 21.1 hours [2].
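The reported durations can be cross-checked with a short calculation. The sketch below assumes that the WOE is the sum of the TTD and the TTF, which matches the reported figures after rounding, and that the start date refers to June 3, 2025. The derived detection and fix timestamps are approximations computed from those durations, not values taken from the incident records.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the start time is 2025-06-03 16:56 in UTC-5.
UTC_MINUS_5 = timezone(timedelta(hours=-5))
start = datetime(2025, 6, 3, 16, 56, tzinfo=UTC_MINUS_5)

ttd = timedelta(minutes=57.6)  # time to detect, as reported
ttf = timedelta(hours=20.1)    # time to fix, as reported

# Assumption: the window of exposure is detection time plus fix time.
woe = ttd + ttf
woe_hours = woe.total_seconds() / 3600

detected = start + ttd  # approximate moment the issue was detected
fixed = detected + ttf  # approximate moment the fix landed

print(f"Detected: {detected:%Y-%m-%d %H:%M} (UTC-5)")
print(f"Fixed:    {fixed:%Y-%m-%d %H:%M} (UTC-5)")
print(f"WOE:      {woe_hours:.1f} hours")
```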

Cause

While updating our build system to a new version, we also updated the tool that manages dependencies for one of our platform components. The newer version of that tool removed a key command we used to provide these dependencies. Because this step ran inside a part of our infrastructure setup that, due to technical debt, was not designed to catch such errors early, the problem was not visible until it affected the platform directly [3].

Solution

We improved how dependencies are handled, using an updated, recommended method. Additionally, we moved the dependency-building process out of the infrastructure setup and into a simpler script. This way, if something goes wrong in the future, it will fail earlier and more visibly, preventing broken updates from reaching the platform [4].
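As an illustration of the fail-early idea, the following is a minimal sketch and not the script we deployed. The function name `build_dependencies` and the command passed to it are hypothetical. It assumes the dependency build can be invoked as an external command, and it stops with a visible error when that command is missing or exits with a non-zero status.

```python
import subprocess
import sys


def build_dependencies(command: list[str]) -> None:
    """Run a dependency build command and fail loudly if it does not succeed.

    `command` is a hypothetical placeholder for whatever builds the
    dependencies; it is not the actual command used on the platform.
    """
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except FileNotFoundError as error:
        # The executable itself no longer exists.
        raise RuntimeError(f"build command not found: {command[0]}") from error

    if result.returncode != 0:
        # The command exists but rejected the invocation or failed,
        # for example because a subcommand was removed in a new version.
        raise RuntimeError(
            f"dependency build failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )


def main(command: list[str]) -> int:
    """Return a non-zero exit code so a pipeline stops before deploying."""
    try:
        build_dependencies(command)
    except RuntimeError as error:
        print(error, file=sys.stderr)
        return 1
    return 0
```

Calling `main` from a pipeline step and passing its return value to `sys.exit` would make a broken dependency build stop the pipeline at that step instead of surfacing later on the platform.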

Conclusion

By adjusting the process to detect errors earlier, we’ve made the system more reliable and easier to maintain moving forward.

INFRASTRUCTURE_ERROR < INCOMPLETE_PERSPECTIVE

Posted Jun 05, 2025 - 09:33 GMT-05:00

Resolved

The incident has been resolved, and platform operations have returned to normal. All impacted components are now working as expected.
Posted Jun 04, 2025 - 17:35 GMT-05:00

Identified

Issues across multiple platform areas that potentially affect internal workflows and external users have been identified. These include unexpected errors, missing data in some views, and inconsistent behavior in automated processes.
Posted Jun 04, 2025 - 10:45 GMT-05:00
This incident affected: Platform.