Degraded service availability

Incident Report for Fluid Attacks

Postmortem

Impact

At least one user experienced service interruptions affecting the agent and other features that depend on the API. The issue started on 2025-12-29 at 22:25 (UTC-5) and was proactively detected 3.7 days later (TTD) by one of our monitoring tools, which reported failures in several system components. While investigating these alerts, the team found that background processes required for the service to operate were not running correctly. The problem was resolved in 1.6 hours (TTF), resulting in a total window of exposure (WOE) of 3.7 days [1].

Cause

A required access key had been removed from the production configuration by mistake. Without it, the background processes failed, and every feature that depended on them stopped working [2].

Solution

The missing access key was restored in the production configuration file, allowing the system to authenticate properly and resume normal operations [3].
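
One way to guard against this class of failure is a check that fails fast when a required configuration value is absent, so that a missing key is caught at deploy or startup time and not days later. The following is only a minimal sketch under stated assumptions: the key names (`SERVICE_ACCESS_KEY`, `SERVICE_API_URL`) and the helper functions are hypothetical and do not reflect the actual production configuration or tooling.

```python
from typing import Iterable, List, Mapping

# Hypothetical key names, used for illustration only.
REQUIRED_KEYS = ("SERVICE_ACCESS_KEY", "SERVICE_API_URL")


def find_missing_keys(
    config: Mapping[str, str],
    required: Iterable[str] = REQUIRED_KEYS,
) -> List[str]:
    """Return the required keys that are absent or blank in `config`."""
    return [key for key in required if not config.get(key, "").strip()]


def ensure_config(
    config: Mapping[str, str],
    required: Iterable[str] = REQUIRED_KEYS,
) -> None:
    """Raise RuntimeError if any required key is missing from `config`."""
    missing = find_missing_keys(config, required)
    if missing:
        raise RuntimeError(
            "Missing required configuration keys: " + ", ".join(missing)
        )
```

A check of this kind could run, for example, as `ensure_config(os.environ)` before any background process starts, or as a CI step against the production configuration file, so that an accidental removal blocks the change from being applied.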

Conclusion

Once the key was restored, all affected components recovered, and the service returned to normal. This incident emphasizes the importance of safeguarding critical configuration values to prevent widespread service disruptions. INFRASTRUCTURE_ERROR

Posted Jan 15, 2026 - 09:17 GMT-05:00

Resolved

The incident has been resolved, and the affected services are now operating normally.

Posted Jan 02, 2026 - 08:53 GMT-05:00

Identified

The CI Gate and the services that depend on the API are currently failing.

Posted Dec 31, 2025 - 00:02 GMT-05:00

This incident affected: Platform and Agent.