Users are experiencing session timeouts in the EU and US regions of Infinity Portal, we are investigating
Incident Report for Check Point Services Status
Postmortem

Summary
Between Tuesday, July 16, 2024, 06:35 UTC to 09:00 UTC, all users of the Infinity Portal (EU and US regions) experienced degraded performance of the login and using cloud applications hosted in the portal.

 

The event was triggered by an extremely high load on our internal portal’s services. Our internal alerting and client reports were clear to point on a major issue.

 

The high load caused database latency issues. The incident was mitigated eventually by restarting some services, identifying the origin of the load which made it possible for the services to become healthy again and cope with the load. The system then became stable again.

 

Incident Timeline

Tuesday, July 16, 2024, 06:35 UTC – An alert is triggered and delivered to the oncall team. 

  • Tuesday, July 16, 2024, 06:45 UTC – The team started investigating the alert cause
  •  Tuesday, July 16, 2024, 07:00 UTC – The team identified faulty services due to Startup failure.
  •  Tuesday, July 16, 2024, 08:00 UTC – A high load on the DB is noticed.
  •  Tuesday, July 16, 2024, 08:30 UTC – Identified the cause of the load. An app is triggering a retry mechanism heavily
  •  Tuesday, July 16, 2024, 08:45 UTC – Added additional DB resources which helped the load
  •  Tuesday, July 16, 2024, 09:00 UTC - Load reduced, system back to normal

 

Root Cause Analysis
An internal integration caused increased load on our services and DB. The load was partly caused due to an unefficient DB query that needs to be optimized. On top of that, rate limiting should have been utilized to mitigate the effect.

 

Next Steps
We sincerely apologize for the recent outage of our system. We take our availability very seriously and we understand that this outage has caused you inconvenience. We appreciate your patience and understanding during this time. Further steps we are planning to take:

  •  Refactor the problematic 'heavy' DB query 
  • Implement rate limit strategy
Posted Jul 17, 2024 - 21:04 UTC

Resolved
This incident has been resolved.
Posted Jul 16, 2024 - 11:01 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jul 16, 2024 - 08:58 UTC
Investigating
We are currently investigating this issue.
Posted Jul 16, 2024 - 08:05 UTC
This incident affected: Infinity Portal (Infinity Portal EU Region, Infinity Portal US Region).