Between Friday, Sep 09, 2022, 15:31 UTC to 17:40 UTC, most users of the Infinity Portal in the US region have experienced a partial outage of the login and usage of cloud applications hosted in the portal.
The event was triggered by an AWS Systems Manager incident in the N. Virginia region affecting a critical component in our system that consumes its configuration and credentials from the AWS SSM. Our internal alerting and client reports were clear to point on a major issue.
We have managed to quickly work around that dependency and recovered our system.
Friday, Sep 09, 2022, 15:30 UTC – Reports start to gather on issues with using the portal.checkpoint.com in the US region. Also an alert is triggered. A war room is created to diagnose the issues.
Friday, Sep 09, 2022, 15:36 UTC – AWS posts for the first time about an SSM incident.
Friday, Sep 09, 2022, 16:55 UTC – Root cause and scope of the incident are clear. The status page is updated.
Friday, Sep 09, 2022, 17:25 UTC – We are applying a work around to avoid the dependency with SSM and recover the system. The system shows signs of recovery.
Friday, Sep 09, 2022, 18:01 UTC – It’s clear the system is back to being fully operational. No reported issues by clients for meaningful time. Closing the incident.
The Infinity Portal uses several managed services by AWS. One of them is AWS Systems Manager which is used in one critical component of the system to load configuration and secrets upon startup. Once it became degraded that critical component malfunctioned.