Summary
Between Tuesday, July 16, 2024, 06:35 UTC to 09:00 UTC, all users of the Infinity Portal (EU and US regions) experienced degraded performance of the login and using cloud applications hosted in the portal.
The event was triggered by an extremely high load on our internal portal’s services. Our internal alerting and client reports were clear to point on a major issue.
The high load caused database latency issues. The incident was mitigated eventually by restarting some services, identifying the origin of the load which made it possible for the services to become healthy again and cope with the load. The system then became stable again.
Incident Timeline
Tuesday, July 16, 2024, 06:35 UTC – An alert is triggered and delivered to the oncall team.
Root Cause Analysis
An internal integration caused increased load on our services and DB. The load was partly caused due to an unefficient DB query that needs to be optimized. On top of that, rate limiting should have been utilized to mitigate the effect.
Next Steps
We sincerely apologize for the recent outage of our system. We take our availability very seriously and we understand that this outage has caused you inconvenience. We appreciate your patience and understanding during this time. Further steps we are planning to take: