Impact
An unknown number of users had difficulties with the Automated Software integration because of machine scaling issues in the CI. The issue started on 24-02-06 at 08:00 (UTC-5) and was proactively discovered 1.1 days later (TTD) by the product team during their regular workflow. It was resolved in 2.8 hours (TTF), for a total window of exposure (WOE) of 1.2 days [1].
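The reported figures are consistent if the window of exposure is read as time to detect plus time to fix. The following is a minimal sketch of that arithmetic, assuming WOE = TTD + TTF; the function and variable names are illustrative and not part of any tooling referenced in this report.

```python
from datetime import timedelta


def window_of_exposure(ttd: timedelta, ttf: timedelta) -> timedelta:
    """Assumed definition: exposure spans detection time plus fix time."""
    return ttd + ttf


# Values taken from the Impact section above.
ttd = timedelta(days=1.1)   # time to detect
ttf = timedelta(hours=2.8)  # time to fix

woe = window_of_exposure(ttd, ttf)
woe_days = woe / timedelta(days=1)  # convert to fractional days

print(round(woe_days, 1))  # prints 1.2
```

Since 2.8 hours is roughly 0.12 days, the sum rounds to the 1.2 days stated above.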
Cause
The workers' disks were nearly full, so even a slight increase in the size of the workers' operating systems caused issues.
Solution
The size of the workers' disks was increased [2].
Conclusion
The workers' operating systems gradually grew beyond the disk capacity, leading to the issue. Increasing disk size by 50% mitigates future occurrences. IMPOSSIBLE_TO_TEST
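To illustrate how much headroom a 50% larger disk provides, the sketch below computes the used fraction of a disk after a resize. The starting utilization of 95% is a hypothetical figure chosen for illustration (the report does not state the actual value), and the helper names are likewise assumptions. `shutil.disk_usage` is the standard-library call for reading the current usage of a mounted path.

```python
import shutil


def used_fraction(path: str = "/") -> float:
    """Current used fraction of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def used_fraction_after_resize(current_fraction: float, increase: float) -> float:
    """Used fraction after growing capacity by `increase` (0.5 means +50%),
    assuming the amount of stored data stays the same."""
    return current_fraction / (1.0 + increase)


# Hypothetical example: a disk that is 95% full before the resize.
before = 0.95
after = used_fraction_after_resize(before, 0.5)
print(f"{before:.0%} used before, {after:.0%} used after a 50% increase")
```

Under that assumed starting point, the same data would occupy roughly 63% of the enlarged disk, which leaves room for gradual operating system growth but does not stop the growth itself.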