A major outage that crashed websites including Spotify and Amazon was triggered by a single customer choosing to update their settings, the software company behind the meltdown revealed on Wednesday.
Infrastructure provider Fastly said it had introduced new code in mid-May that contained a hidden bug. The bug lay dormant until one customer updated their settings, exposing a flaw that ultimately took out 85 per cent of Fastly’s network.
Fastly is a San Francisco-based company that lets customers such as The New York Times, The Guardian, Spotify, Amazon, Twitter and Reddit store data such as images and videos across mirror servers.
“On May 12, we began a software deployment that introduced a bug that could be triggered by a specific customer configuration under specific circumstances,” Fastly head of engineering and infrastructure Nick Rockwell said.
“Early June 8, a customer pushed a valid configuration change that included the specific circumstances that triggered the bug, which caused 85 per cent of our network to return errors.
“We detected the disruption within one minute, then identified and isolated the cause, and disabled the configuration. Within 49 minutes, 95 per cent of our network was operating as normal.”
Rockwell apologised and said the company should have anticipated the outage.
“We provide mission-critical services, and we treat any action that can cause service issues with the utmost sensitivity and priority.”
Fastly should have identified the bug before the catastrophe, Rockwell added, and the company is now looking into why it escaped detection.
The outage also affected the UK government’s entire website, as well as CNN, the Financial Times and the Australian Financial Review.
Users who tried to access those websites late on Monday were instead met with error messages including “Error 503 service unavailable” and “connection failure”.