Instability with CSPM vulnerability reports
Incident Report for Fluid Attacks
Postmortem

Impact

At least three groups experienced problems with CSPM vulnerability reporting in our machine services. The issue started on UTC-5 24-11-28 23:27 and was proactively discovered 13.3 days (TTD) later by a staff member, who reported to our product team that a group had no CSPM reports for a recently added cloud environment. The problem was resolved in 2.8 hours (TTF), resulting in a total impact of 13.4 days (TTR).
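
As a quick check on how the figures above relate, the total impact (TTR) is the time to detect (TTD) plus the time to fix (TTF). The variable names below are illustrative only; the values are the ones reported in this section.

```python
# Figures taken from the Impact section of this report.
ttd_days = 13.3   # time to detect, in days
ttf_hours = 2.8   # time to fix, in hours

# Convert the fix time to days and add it to the detection time.
ttr_days = ttd_days + ttf_hours / 24

print(f"TTR = {ttr_days:.1f} days")  # prints "TTR = 13.4 days"
```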

Cause

The CSPM analysis runs daily on a scheduler. Due to an error introduced during a refactor, the scheduler failed to execute the analysis for groups with that configuration [1].

Solution

The scheduler was fixed, and an execution was immediately queued for all groups [2].

Conclusion

The lack of adequate testing and monitoring for our machine services allowed this error to go unnoticed by the team. Tests were added to the code, and the team will implement additional monitoring tools for our machine services scanner. MISSING_TEST
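
One kind of monitoring that could surface this class of failure sooner is a staleness check on a job that is expected to run daily. The following is a minimal sketch under stated assumptions, not the monitoring the team implemented: the function name `find_stale_groups`, the group names, and the 36-hour threshold are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def find_stale_groups(last_runs, now, max_age=timedelta(hours=36)):
    """Return groups whose last execution is missing or older than max_age.

    last_runs maps a group name to the datetime of its last successful
    execution, or None if it has never run. The 36-hour default is an
    assumed tolerance for a job that should run once a day.
    """
    stale = []
    for group, last_run in last_runs.items():
        if last_run is None or now - last_run > max_age:
            stale.append(group)
    return sorted(stale)


# Hypothetical data: one healthy group, one that stopped running,
# and one that has never produced a report.
now = datetime(2024, 12, 12, 7, 0, tzinfo=timezone.utc)
last_runs = {
    "group-a": now - timedelta(hours=5),
    "group-b": now - timedelta(days=13),
    "group-c": None,
}

stale = find_stale_groups(last_runs, now)
print(stale)  # ['group-b', 'group-c']
```

A check like this alerts on the absence of an expected result rather than on an explicit error, which matters when a scheduler silently skips work instead of failing.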

Posted Dec 13, 2024 - 10:58 GMT-05:00

Resolved
New CSPM reports are not available in the platform
Posted Dec 12, 2024 - 08:00 GMT-05:00