Impact
At least 16 groups experienced duplicate automatic event reports related to cloning issues. The issue started on UTC-5 24-02-16 15:36 and was proactively discovered 2.8 days (TTD) later by one of our engagement managers, who reported through our help desk that previously reported events related to cloning issues had been duplicated. The problem was resolved in 3.8 hours (TTF), resulting in a total impact of 3 days (TTR) [1][2].
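The reported total follows from adding the time to detect and the time to fix once both are in the same unit. A minimal check of that arithmetic, assuming TTR is simply TTD plus TTF (the variable names are illustrative):

```python
HOURS_PER_DAY = 24

ttd_days = 2.8   # time to detect, as reported
ttf_hours = 3.8  # time to fix, as reported

# Convert the fix time to days and add it to the detection time.
ttr_days = ttd_days + ttf_hours / HOURS_PER_DAY

print(round(ttr_days, 2))  # 2.96, reported as 3 days
```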
Cause
Duplicate events were generated during cloning operations on roots because of a validation error introduced by a recent change to the email used by the scheduler. The validation mechanism did not recognize the scheduler’s new email, so it no longer prevented duplicates. As a result, events were created even for roots with unresolved issues, which caused confusion in event reporting [3].
Solution
The email in the BATCH_CLONE_EMAIL constant was updated to match the one used by the scheduler, which fixed the validation issue [4]. Additionally, a migration was performed to remove all events generated automatically during the scheduler’s last execution [5].
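The failure mode and the fix can be illustrated with a minimal sketch. It is not the actual implementation: the `Event` class, the `report_clone_failures` function, the email addresses, and the assumption that the validation matches unsolved events by reporter email are all hypothetical. Only the idea of a constant that must match the scheduler's email comes from the report.

```python
from dataclasses import dataclass

# Hypothetical address the scheduler uses after the change.
SCHEDULER_EMAIL = "scheduler@new.example.com"


@dataclass
class Event:
    root_id: str
    reporter_email: str
    solved: bool = False


def report_clone_failures(failed_roots, events, batch_clone_email):
    """Create one event per failed root, unless the root already has an
    unsolved event reported by the address in batch_clone_email."""
    already_reported = {
        e.root_id
        for e in events
        if not e.solved and e.reporter_email == batch_clone_email
    }
    for root_id in failed_roots:
        if root_id not in already_reported:
            # New events are always attributed to the scheduler's real email.
            events.append(Event(root_id, SCHEDULER_EMAIL))
            already_reported.add(root_id)
    return events


def run_twice(batch_clone_email):
    """Simulate two scheduler executions over the same failing root."""
    events = []
    for _ in range(2):
        report_clone_failures(["root-1"], events, batch_clone_email)
    return len(events)


# Stale constant: earlier events are not recognized, so a duplicate appears.
stale = run_twice("scheduler@old.example.com")
# Updated constant: the second execution finds the existing event and skips it.
fixed = run_twice(SCHEDULER_EMAIL)
print(stale, fixed)  # 2 1
```

In this sketch the duplicate arises only because the comparison address and the reporting address drift apart. Deriving both from a single source would remove that class of mismatch.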
Conclusion
The issue stemmed from a functional test that malfunctioned because of a cache-related problem. The test has been adjusted to accurately reflect real-time data [6], which should allow duplicate scenarios to be detected before deployment. INCOMPLETE_PERSPECTIVE