Impact
An unknown number of users experienced increased response times when accessing the Platform. The issue started on UTC-5 23-11-08 12:54 and was proactively discovered 1.2 hours (TTD) later by one of our monitoring tools and by staff members, who observed web response times above 5 seconds in this component. The problem was resolved in 28.8 minutes (TTF), resulting in a total impact of 1.6 hours (TTR).
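The timeline arithmetic can be reconstructed with a short sketch. It assumes that "23-11-08" means 2023-11-08 (YY-MM-DD) and that TTR is simply TTD plus TTF; the variable names are illustrative only. Adding the rounded figures gives about 1.68 hours, close to the reported 1.6 hours; the small gap presumably comes from rounding of the underlying measurements.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "23-11-08" is read as 2023-11-08 (YY-MM-DD), in UTC-5.
UTC_MINUS_5 = timezone(timedelta(hours=-5))
started = datetime(2023, 11, 8, 12, 54, tzinfo=UTC_MINUS_5)

ttd = timedelta(minutes=72)              # time to detect: 1.2 hours
ttf = timedelta(minutes=28, seconds=48)  # time to fix: 28.8 minutes

detected = started + ttd   # moment the degradation was noticed
resolved = detected + ttf  # moment the rollback took effect
ttr = ttd + ttf            # total impact, assuming TTR = TTD + TTF

print(detected.strftime("%H:%M:%S"))            # 14:06:00
print(resolved.strftime("%H:%M:%S"))            # 14:34:48
print(round(ttr.total_seconds() / 3600, 2))     # 1.68
```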
Cause
As part of an effort to adopt technologies that offer simplification and added value, a new experimental library was being tested in the authorization module [1]. When a new version of the Platform was deployed, this change degraded performance, leading to increased response times.
Solution
The engineering team reverted the Platform to the previous version, which did not include the experimental library changes [2].
Conclusion
A satisfactory and accurate way to measure performance degradation before changes reach production has not yet been implemented. The engineering team is still investigating ways to put such measurements in place to keep situations like this under control. IMPOSSIBLE_TO_TEST
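As a rough illustration of what a pre-production measurement could compare, the sketch below checks the 95th-percentile latency of a candidate build against a baseline. It is not the team's method: the function names and the 20% tolerance are hypothetical, and only the 5-second threshold comes from this report. It also leaves open the harder problem of collecting latency samples that are representative of production traffic.

```python
from statistics import quantiles

ALERT_THRESHOLD_S = 5.0  # from the report: responses above 5 seconds were flagged
MAX_SLOWDOWN = 1.2       # hypothetical tolerance: candidate p95 at most 20% slower


def p95(samples):
    """Return the 95th-percentile latency (seconds) of a list of samples."""
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20)[-1]


def regression_detected(baseline, candidate):
    """Flag the candidate if its p95 breaches the threshold or the tolerance."""
    base, cand = p95(baseline), p95(candidate)
    return cand > ALERT_THRESHOLD_S or cand > base * MAX_SLOWDOWN


# Illustrative (made-up) latency samples in seconds.
baseline_samples = [0.2] * 100
slow_samples = [6.0] * 100
print(regression_detected(baseline_samples, baseline_samples))  # False
print(regression_detected(baseline_samples, slow_samples))      # True
```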