Incident Duration: June 29, 08:00 UTC – June 30, 09:30 UTC (~25.5 hours)
Affected Services: Datatube EU Production - West Europe region
Affected Products: CloudGuard WAF, Playblocks, XDR
────────────────────────────
Overview
On June 29–30, 2026, Datatube EU production services experienced a partial degradation of data ingestion in the West Europe region. A small portion of events experienced delayed processing during this window.
────────────────────────────
Customer Impact
During the incident window, approximately 0.2% of total EU ingestion events were affected across products.
Customers using CloudGuard WAF, Playblocks, or XDR in the West Europe region may have experienced:
- Delayed visibility of security events in dashboards and reports
- Delayed data in queries covering the June 29–30 window
────────────────────────────
Incident Timeline (UTC)
Jun 29 – 08:00 First customer reports received. Investigation initiated immediately, with a cross-team effort mobilized across infrastructure, networking, and product teams.
Jun 29 – 10:00 First mitigation applied; it provided temporary partial relief.
Jun 29 – 13:30 Second mitigation applied; it provided further temporary partial relief and narrowed the root cause scope.
Jun 29 – 18:00 Issue confirmed as affecting multiple products; joint investigation continued.
Jun 29 – 21:15 Capacity expanded to stabilize the environment and protect against further degradation.
Jun 30 – 05:00 Final diagnostic steps executed to isolate root cause precisely.
Jun 30 – 09:30 Root cause confirmed and resolved; all services fully restored.
The investigation required coordinated effort across multiple teams because early symptoms were consistent with several possible root causes. Each mitigation step both provided partial relief and helped narrow the diagnosis. Applying changes incrementally in this way is standard practice to avoid introducing new risk while a production system is under stress.
────────────────────────────
Root Cause
A misconfiguration in the EU gateway's communication protocols, which was dormant under normal operating conditions, was triggered by load introduced during routine maintenance. Once identified, the fix was straightforward and immediately effective.
────────────────────────────
Next Steps & Action Items
1. Audit and align protocol configurations across all regional gateways - to prevent recurrence in other regions.
2. Add capacity alerting on all regional gateways - to detect such anomalies before they cause customer impact.
3. Improve client-side monitoring to surface connection-level failures and latencies earlier.
4. Introduce stress testing for sensitive network components as part of the change process - validating configurations in an isolated, production-like environment before and after any change to catch hidden issues before they reach production.
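
As an illustration of the first action item (auditing protocol configurations across regional gateways), the check could be sketched roughly as follows. This is a minimal sketch, not the tooling actually used: the function name `find_config_drift`, the setting names, the region names, and the idea of a single shared baseline are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# One drift record: (region, setting, expected value, actual value).
Drift = Tuple[str, str, str, str]


def find_config_drift(
    baseline: Dict[str, str],
    gateways: Dict[str, Dict[str, str]],
) -> List[Drift]:
    """Compare each regional gateway's settings against a shared baseline.

    Both the baseline and the per-region settings are hypothetical
    key/value maps; a real audit would load them from the gateways'
    actual configuration source.
    """
    drift: List[Drift] = []
    for region in sorted(gateways):
        config = gateways[region]
        for key in sorted(baseline):
            expected = baseline[key]
            # A setting absent from the gateway is reported as drift too.
            actual = config.get(key, "<missing>")
            if actual != expected:
                drift.append((region, key, expected, actual))
    return drift


if __name__ == "__main__":
    # Hypothetical setting names and values, for illustration only.
    baseline = {"protocol_version": "2", "keepalive_seconds": "60"}
    gateways = {
        "west-europe": {"protocol_version": "1", "keepalive_seconds": "60"},
        "east-us": {"protocol_version": "2", "keepalive_seconds": "60"},
    }
    for region, key, expected, actual in find_config_drift(baseline, gateways):
        print(f"{region}: {key} expected {expected!r}, found {actual!r}")
```

Run on a schedule or as part of the change process, a check of this kind would flag a gateway whose settings differ from the others even while the difference is still dormant under normal load.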