Between Monday, May 6, 2024, 12:39 UTC and 13:55 UTC, all users of CloudGuard (US region) experienced degraded performance and login failures.
The event was triggered by extreme load on our internal services, caused by a combination of internal activity and external API calls.
Our internal alerting and client reports clearly pointed to a major issue.
The high load caused the CloudGuard database to stop functioning.
The incident was eventually mitigated by recovering the database, after which the system became stable again.
Monday, May 6, 2024, 12:39 UTC – An alert is triggered. A war room is created to diagnose the issue.
Monday, May 6, 2024, 12:54 UTC – It’s clear an incident has started. Database recovery is initiated.
Monday, May 6, 2024, 13:00 UTC – The status page is updated.
Monday, May 6, 2024, 13:45 UTC – The system shows signs of recovery.
Monday, May 6, 2024, 13:55 UTC – The system is up and running and is being monitored.
Monday, May 6, 2024, 14:45 UTC – It’s clear the system is back to fully operational, with no client-reported issues for a meaningful period. The incident is closed.
A rare combination of calls to the CloudGuard database resulted in extreme load that pushed the database beyond its limits.
The database became degraded, which caused the major outage.
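The exact mix of calls involved is internal to CloudGuard, so as an illustration only, the minimal sketch below shows one common application-side safeguard against this class of overload: capping the number of database calls allowed to run concurrently so that a sudden burst is shed instead of saturating the database. The `run_query` helper and the `MAX_CONCURRENT_QUERIES` value are hypothetical and not part of CloudGuard's implementation.

```python
import threading
import time

# Illustrative cap on concurrent database calls (value is an assumption).
MAX_CONCURRENT_QUERIES = 50
_query_slots = threading.BoundedSemaphore(MAX_CONCURRENT_QUERIES)


class DatabaseOverloadedError(RuntimeError):
    """Raised when no query slot frees up within the wait budget."""


def run_query(sql: str, timeout_s: float = 0.5) -> str:
    """Run a query only if a slot is free; otherwise shed the request.

    Shedding excess calls keeps a burst of traffic from piling onto the
    database and degrading it for every user.
    """
    if not _query_slots.acquire(timeout=timeout_s):
        raise DatabaseOverloadedError("too many concurrent queries; shedding load")
    try:
        # Placeholder for the real database call.
        time.sleep(0.01)
        return f"result of: {sql}"
    finally:
        _query_slots.release()


if __name__ == "__main__":
    print(run_query("SELECT 1"))
```

Under this kind of guard, calls beyond the cap fail fast and can be retried, rather than queueing up inside the database until it stops responding.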
We sincerely apologize for the recent outage of our system. We take availability very seriously, and we understand the inconvenience this outage has caused. We appreciate your patience and understanding during this time. Further steps we are planning to take: