Impact
An unknown number of groups experienced issues with cloning repositories and with some scanner tasks. The issue started on 2024-08-02 at 08:30 (UTC-5) and was proactively discovered 57.6 minutes later (TTD) by a staff member, who reported through our internal chat that the repositories in certain groups were stuck in the Queued state and never left it. The problem was resolved in 6.9 hours (TTF), for a total impact of 7.9 hours (TTR) [1].
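The three figures can be cross-checked with a short calculation. The sketch below assumes that the total impact (TTR) is the sum of the detection time (TTD) and the fix time (TTF); the report implies this but does not state it, and the function name is illustrative only.

```python
# Figures taken from the Impact section.
TTD_MINUTES = 57.6  # time to detect, in minutes
TTF_HOURS = 6.9     # time to fix, in hours


def total_impact_hours(ttd_minutes: float, ttf_hours: float) -> float:
    """Return TTR in hours, assuming TTR = TTD + TTF."""
    return ttd_minutes / 60 + ttf_hours


ttr_hours = total_impact_hours(TTD_MINUTES, TTF_HOURS)
print(f"TTR = {ttr_hours:.2f} hours")  # prints "TTR = 7.86 hours"
```

Rounded to one decimal place, 7.86 hours matches the 7.9 hours reported.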
Cause
An initialization error produced an invalid hash for the image that the platform's task workers reference. As a result, the workers could not find the image they need to run and were unable to function correctly.
Solution
The team updated the reference so that it points to the correct image, which enabled the task workers to locate and use it [2].
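A malformed image hash of this kind can in principle be caught before workers are deployed. The following is a minimal sketch, not the team's actual tooling: it assumes image references use the common `name@sha256:<64 hex characters>` digest form, which the report does not confirm, and the registry and image names are hypothetical.

```python
import re

# Assumed digest form: "sha256:" followed by exactly 64 lowercase hex characters.
DIGEST_PATTERN = re.compile(r"sha256:[0-9a-f]{64}")


def has_valid_digest(image_ref: str) -> bool:
    """Return True if image_ref ends in a well-formed '@sha256:<digest>' suffix."""
    # rpartition yields an empty separator when no "@" is present.
    _, separator, digest = image_ref.rpartition("@")
    return bool(separator) and DIGEST_PATTERN.fullmatch(digest) is not None


# Hypothetical references, for illustration only.
good_ref = "registry.example.com/task-worker@sha256:" + "a" * 64
bad_ref = "registry.example.com/task-worker@sha256:abc123"

print(has_valid_digest(good_ref))  # True
print(has_valid_digest(bad_ref))   # False
```

A check like this only confirms that the hash is well formed. It does not confirm that the image exists in the registry, so it would complement rather than replace a lookup against the registry itself.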
Conclusion
The issue could not be detected until the task workers' deployment schedule was triggered. To prevent similar problems in the future, the configuration files that reference the makes images and tags should be mapped and monitored for changes.
INFRASTRUCTURE_ERROR