Impact
At least one user experienced service interruptions across the platform, the agent, and all API-dependent functionality. The issue started on 2025-11-18 at 08:58 (UTC-5) and was detected proactively 14.4 minutes later (TTD) by one of our monitoring tools, which reported a generalized degradation in service availability. The degradation coincided with a major Cloudflare incident affecting global traffic routing. As a result, several core operations across our ecosystem temporarily did not function as expected. The problem was resolved 43.2 minutes after detection (TTF), giving a total window of exposure (WOE) of 57.6 minutes.
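The timeline figures can be cross-checked with a short calculation. The sketch below is illustrative only: it assumes the start date reads as 18 November 2025, that TTF is measured from the moment of detection (which is what makes TTD + TTF equal the WOE), and the variable names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the report; names are illustrative.
UTC_MINUS_5 = timezone(timedelta(hours=-5))
started_at = datetime(2025, 11, 18, 8, 58, tzinfo=UTC_MINUS_5)  # assumed year 2025
ttd = timedelta(minutes=14.4)  # time to detect
ttf = timedelta(minutes=43.2)  # time to fix, assumed to start at detection

detected_at = started_at + ttd
resolved_at = detected_at + ttf
woe = ttd + ttf  # window of exposure

print("detected:", detected_at.isoformat())
print("resolved:", resolved_at.isoformat())
print("WOE (minutes):", woe.total_seconds() / 60)
```

Under these assumptions, detection falls at 09:12:24 and resolution at 09:55:36 (UTC-5).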
Cause
Cloudflare experienced a global network performance issue that affected its edge infrastructure. Because Cloudflare is part of the delivery path for our services, the degradation prevented requests from being routed and processed correctly, resulting in timeouts and temporary unavailability.
Solution
No internal action was required. Once Cloudflare resolved the underlying network issue and service performance returned to normal, all affected components on our side recovered automatically.
Conclusion
This event underscores the operational impact that third-party infrastructure outages can have on our services. We will continue to monitor external provider incidents closely and to evaluate ways to improve resilience and early detection for future occurrences.
THIRD_PARTY_ERROR
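One possible direction for earlier detection is to check a provider's published status feed alongside our own monitoring. The following is a minimal sketch, not a description of any existing tooling: it assumes a Statuspage-style JSON payload with a `status.indicator` field, and the function name, constant, and sample payload are hypothetical.

```python
import json

# Indicator values assumed for a Statuspage-style feed; "none" means healthy.
DEGRADED_INDICATORS = {"minor", "major", "critical"}

def provider_is_degraded(status_payload: str) -> bool:
    """Return True when the status payload reports an active problem."""
    data = json.loads(status_payload)
    indicator = data.get("status", {}).get("indicator", "none")
    return indicator in DEGRADED_INDICATORS

# Hypothetical sample payload, used here instead of a live request.
sample = '{"status": {"indicator": "major", "description": "Partial System Outage"}}'
print(provider_is_degraded(sample))
```

A check like this could help tag an alert as provider-related sooner, but it would complement internal availability monitoring, not replace it.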