Numerous Google Cloud services, including Google Vertex AI Online Prediction, Dialogflow CX, Agent Assist and Contact Center AI, were down for three hours and 53 minutes because of an issue that affected users in the U.S., Southeast Asia and Europe.
“To our customers who were impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability,” Google said in a statement.
The tech giant said the outage traced back to June 8, when users were sending a new type of request that caused occasional crashes (known as “segmentation faults”) on some servers handling AI responses.
When a server crashed, it would automatically restart, and incoming requests would be redirected to other working servers.
Initially, the requests were not a problem, as there were enough healthy servers to handle the load, Google said. However, as the number of crash-triggering requests increased, too many servers were crashing at once, leaving too few available to keep up with demand. As a result, users started to see outages.
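The capacity squeeze Google describes can be illustrated with a rough back-of-the-envelope model. The sketch below is not Google's system and uses no real figures: the function names, fleet size, request rates and restart time are all hypothetical, and it assumes each crash-triggering request takes exactly one server out of service for a fixed restart period.

```python
def healthy_servers(total_servers, crash_requests_per_min, restart_minutes):
    """Estimate how many servers stay in service (hypothetical model).

    Assumes every crash-triggering request knocks out one server for
    restart_minutes. By Little's law, the average number of servers
    restarting equals the crash arrival rate times the restart duration.
    """
    restarting = min(total_servers, crash_requests_per_min * restart_minutes)
    return total_servers - restarting


def can_meet_demand(total_servers, crash_requests_per_min,
                    restart_minutes, servers_needed):
    """True if enough healthy servers remain to carry normal traffic."""
    remaining = healthy_servers(total_servers, crash_requests_per_min,
                                restart_minutes)
    return remaining >= servers_needed


# Illustrative numbers only: a fleet of 100 servers that needs 80 to serve demand.
print(can_meet_demand(100, 2, 5, 80))   # few crash requests: 90 healthy servers
print(can_meet_demand(100, 10, 5, 80))  # more crash requests: 50 healthy servers
```

Under these made-up numbers, a low rate of crash-triggering requests leaves plenty of headroom, while a fivefold increase drops the fleet below what demand requires, which matches the pattern of an issue that stays invisible to customers until it crosses a threshold.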
Google engineers were alerted to the issue on June 10, when no customer impact was yet visible. They identified the cause and began to roll out a fix.
However, today, numerous services using Vertex Prediction began experiencing user-facing issues linked to the same root cause. Google engineers identified a connection to the earlier problem and accelerated the rollout of the fix, resolving the issue.
Google stated that to prevent the issue from recurring, it will enhance production monitoring to detect early signs of server crashes and strengthen its validation process for feature changes and updates before releasing them to production.
THE LARGER TREND
Vertex AI is used by hospitals, digital health startups, research institutions and pharmaceutical companies for diagnostic support, personalized treatment recommendations based on patient data, risk scoring and operational support.
Dialogflow CX and Agent Assist are also increasingly being used within healthcare as clinical support tools and to assist with administrative workflows.
Contact Center AI is actively used in healthcare for patient scheduling, triage, billing support and virtual front-door services.