Telstra has suffered several network outages in the past two months, leaving angry customers asking for answers.
The company suffered a major network outage on March 17 with around half of its 16 million mobile customers unable to make calls or go online, similar to a country-wide outage on February 9.
A third incident, which left around 500,000 of its pre-paid customers without service, occurred at the start of March.
While the telco offered free data days on two occasions by way of apology (the latest of which, on Sunday, did not go to plan), it did little to explain why the outages occurred.
But speaking at a telecom industry event yesterday, Telstra chief operations officer Kate McKenzie took the opportunity to address the reasons behind the recent mobile network challenges.
She explained that an initial review of the incidents confirmed they were not related, although she admitted that “two of the disruptions were due to delays in processing the registration of mobile devices”.
“While at no time did it suffer a system-wide failure, each of these events did impact varying numbers of our customers and we are working to ensure this does not happen again,” McKenzie said.
“It is important to note that our network is now stable and operating as it should.”
Outages on 9 February
McKenzie went on to explain that on the morning of 9 February, technical staff were investigating a fault with one of the signaling nodes used to manage 3G and 4G wireless data sessions and voice calls in the mobile network.
With evidence of increasing degradation in the health of the node and a potential risk to service, the issue was escalated and a decision was taken to isolate the node from the network, a standard operational procedure in such an event.
This occurred at around 12:30pm, but because procedures were not followed properly, the subsequent node restart was initiated incorrectly.
“This meant that around 15 per cent of all mobile devices connected through this node needed to re-register when establishing a new voice call or data session,” McKenzie said.
“The mass re-registration of these mobile devices then overloaded the other mobile signaling nodes, impacting approximately 15 per cent of our customers directly and some more at times during the event where they were unable to establish new voice calls or data sessions.”
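The overload McKenzie describes can be illustrated with some back-of-the-envelope arithmetic. The sketch below is not Telstra's model: only the 16 million customers and the 15 per cent share come from the figures above, while the re-registration window, the number of surviving nodes, the baseline load and the node capacity are hypothetical values chosen purely for illustration.

```python
def surviving_node_load(total_devices, affected_percent, window_seconds,
                        surviving_nodes, baseline_per_node):
    """Approximate signaling requests per second on each surviving node
    while the displaced devices re-register."""
    # Devices forced to re-register, spread evenly over the window
    storm_rate = total_devices * affected_percent / 100 / window_seconds
    # The storm is shared by the remaining nodes on top of their normal load
    return baseline_per_node + storm_rate / surviving_nodes


def is_overloaded(load_per_node, capacity_per_node):
    """True when the offered load exceeds what a node can process."""
    return load_per_node > capacity_per_node


# 16 million customers and 15 per cent are from the article;
# every other number here is a hypothetical assumption.
load = surviving_node_load(
    total_devices=16_000_000,
    affected_percent=15,
    window_seconds=60,       # assumed: devices retry within one minute
    surviving_nodes=5,       # assumed
    baseline_per_node=2000,  # assumed normal requests per second
)
print(load, is_overloaded(load, capacity_per_node=5000))  # capacity is assumed
```

Under these assumed numbers each surviving node is asked to handle several times its normal load, which is the mechanism McKenzie points to: the failure of one node turns into a burst of traffic on the others.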
Once the telco identified what had happened, it worked to bring customers back online as quickly as possible, prioritising voice services over data services. The network was stabilised and all services were restored at around 2:30pm.
Outages on 17 March
Later, at around 6pm on 17 March, some customers nationally were sporadically unable to make 2G, 3G and 4G voice calls or establish a mobile data session.
Calls between Telstra mobiles were failing intermittently, with voice call volumes dropping by approximately 50 per cent. Calls to fixed lines and SMS services were largely unaffected, while some data services were affected, with customers unable to establish a connection.
McKenzie explained that service restoration commenced from 7pm by limiting the volume of 4G signaling for devices reconnecting with the network, and that configuration changes were made in the mobile network to speed up recovery.
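Why limiting reconnection signaling helps can be shown with a toy simulation. This is a minimal sketch, not a description of Telstra's actual controls: the function name, the tick-based model and the assumption that an overloaded registration system completes only half of its nominal capacity are all hypothetical.

```python
def recover(pending_devices, db_capacity, admit_limit=None):
    """Simulate devices re-registering, one tick at a time.

    Returns (ticks_to_recover, overloaded_ticks).
    admit_limit=None means every pending device retries on every tick.
    """
    ticks = 0
    overloaded_ticks = 0
    while pending_devices > 0:
        # Without a limit, all pending devices hit the system at once
        if admit_limit is None:
            attempts = pending_devices
        else:
            attempts = min(pending_devices, admit_limit)

        if attempts > db_capacity:
            overloaded_ticks += 1
            # Assumption: when overloaded, throughput falls to half of capacity
            served = max(1, db_capacity // 2)
        else:
            served = attempts

        pending_devices -= served
        ticks += 1
    return ticks, overloaded_ticks


# Hypothetical figures: 10,000 devices waiting, 1,000 registrations per tick
print(recover(10_000, 1_000))                     # unthrottled
print(recover(10_000, 1_000, admit_limit=1_000))  # signaling limited to capacity
```

In this simplified model the throttled case recovers in fewer ticks and never overloads, because the registration system is only ever offered as much work as it can complete.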
These changes reinstated network stability.
“Although the user experience was similar, the issue is different to the disruption that occurred in early February,” McKenzie said.
“The problem was caused when a significant number of customers – initially international roaming customers, and then domestic customers as well – were unexpectedly disconnected from the network.”
“When they all attempted to reconnect at the same time, which happens automatically, we saw a period of overload in the database used to register devices,” she said.
She added that the difficulty mobile networks have in dealing with mass re-registration events is not unique to Telstra, and that industry experts have already told the telco that this is a global challenge faced by many in the industry.
Outages on 22 March
Finally, McKenzie addressed the service interruption of 22 March.
“Some Telstra mobile, IP Telephony (TIPT) and NBN voice customers may have been unable to make or receive calls intermittently between 11:30am and 12:50pm, primarily in Victoria and Tasmania,” McKenzie said.
This incident affected around 3 per cent of customers and services were restored by around 5:30pm.
McKenzie confirmed Telstra is taking all necessary steps to minimise the risk of another network outage, including a thorough review of the network.
“Changes have been implemented to increase the capacity and path diversity of critical signaling channels and a temporary layer of traffic management protection has been added to minimise the impact of events like the ones we saw on 9 February and 17 March,” she said.
Within a few days the company expects to augment capacity in a key platform (the Home Location Register – Front End, or HLR-FE) that manages its customers’ subscription data.
“In conjunction with our global partners Ericsson, Cisco and Juniper we have assembled a team of internal and external engineering experts to do an end-to-end review of our network,” McKenzie said.
“While this work is underway, Telstra Operations has a heightened awareness plan including Executive-level review of any changes planned for the mobile and core IP networks.”