Network outages can haunt mobile telecom operators, but there are ways to avoid the long-term embarrassment.
Too many users, not enough bandwidth, “busy hour congestion” and a plethora of other network issues – it’s all run of the mill for cellphone service operators these days, according to numerous industry reports, such as a new one from Spirent Communications that polled service providers about their service issues. And for cellphone companies whose names lend themselves to the tongue-in-cheek takeoffs that frustrated customers use to vent, the pain of frequent outages and slowdowns is even greater.
As it turns out, outages and slowdowns on the Vodafone network in Australia were common enough that pundits coined a nickname for the company – “Vodafail” – that gave voice to their complaints after a string of network outages in late 2010 and 2011. That was a long time ago in cellphone industry terms, but the name has stuck, with websites applying the term every time there is a Vodafone problem (most recently during a seven-hour outage on Sep. 25). A related term, “total inability to support usual performance,” has been used to describe outages by the company as well as its competitor Telstra (the term is also popular in the U.K. to slam service outages).
That’s perhaps a bit unfair; to the companies’ credit, both tried hard to resolve the issue and the more recent uses of the term could be chalked up to “media me-too” carpetbagging. Nevertheless, the lesson of “Vodafail” should give anyone responsible for a network pause; it’s hard to build up a reputation for service, but very easy to lose it and extremely difficult to regain it once lost.
And many will lose it – often due to circumstances that can’t be avoided. According to the Spirent study, “network congestion and overload” was named by 81% of respondents as the most common cause of network outages and degradation, far more than any other factor. After all, if everybody on a cell network decides to make a call at the same time, the company can’t really be held to blame, can it? And considering how much companies like Vodafone are spending to remediate service problems – the astronomical sum of $20 billion per year – those companies are certainly doing what they can to sort out the problems, aren’t they?
Maybe – or maybe not. The answer depends on how they are spending that $20 billion. The reasons for these outages run the gamut – and often it is very difficult to isolate a reason. In fact, according to a University of Chicago study, the most common reason for service outages is “unknown.” In a study of 516 outages, the examining team could not determine the root cause of 294 (48%) of them. That makes sense; if companies like Vodafone could figure out exactly what the causes of their network outages were, they’d probably dedicate their share of the $20 billion industry spend to fixing them, if only to avoid pejoratives like “Vodafail.”
Since, as the studies indicate, getting that clear picture of cause and effect in outages and slowdowns is difficult, if not impossible, Vodafone and its cell service competitors can do the next best thing: develop a big data, infrastructure-wide approach to IT stability.
With most telecommunications today IP-based, what happens in the network can also be an important factor in outages and slowdowns. Configuration changes – software upgrades and adjustments, patches, etc. – can take place in different parts of the network on a daily basis. While most changes are performed without a hitch, there is typically no visibility into the implications and risks that such modifications introduce for overall stability, service availability and disaster recovery readiness.
There are many monitoring tools that indicate when something isn’t working properly or when remediation is needed, but what companies really need is a system that will tell them in advance that things are likely to go south. Such systems can perform automated analysis, alert system managers in advance where a problem is likely to show up and guide them to make changes and fixes to prevent problems from happening.
With an advance warning system in place, managers could reroute resources where needed in order to ensure quality of service. If the culprit is a software glitch – a configuration issue that is slowing things down – the system will point that out as well.
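To make the idea of an advance warning concrete, here is a minimal sketch in Python. It assumes hourly utilization readings for a single link and fits a straight-line trend to estimate how many hours remain before a chosen threshold is reached. The function name, the hourly sampling and the 85% threshold are illustrative assumptions rather than anything drawn from the studies cited above, and a production system would use far richer models.

```python
from typing import Optional, Sequence


def hours_until_threshold(samples: Sequence[float], threshold: float) -> Optional[float]:
    """Estimate hours until utilization reaches `threshold`.

    `samples` are hourly utilization readings (0.0 to 1.0), oldest first.
    Returns None when there is no upward trend to extrapolate.
    (Hypothetical helper: the name and units are assumptions.)
    """
    n = len(samples)
    if n < 2:
        return None

    # Ordinary least-squares slope over the sample indices 0..n-1.
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    slope = cov / var
    if slope <= 0:
        return None  # flat or falling load: nothing to warn about

    # Fitted utilization at the most recent sample.
    intercept = mean_y - slope * mean_x
    current = intercept + slope * (n - 1)
    if current >= threshold:
        return 0.0  # already at or past the threshold
    return (threshold - current) / slope


if __name__ == "__main__":
    readings = [0.50, 0.55, 0.60, 0.65]  # made-up sample data
    eta = hours_until_threshold(readings, threshold=0.85)
    if eta is not None:
        print(f"Estimated {eta:.1f} hours until 85% utilization")
```

A manager who sees that estimate shrinking has time to reroute resources before customers notice, which is the whole point of a warning that arrives ahead of the outage rather than after it.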
Systems that can automatically detect changes in the network and inform administrators of potential issues could go a long way to preventing service outages – making customers and managers much happier, and making terms like “Vodafail” passé.
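As a rough illustration of automatic change detection, the Python sketch below compares two snapshots of a device configuration and flags the changes that touch settings an operator has chosen to watch. The key names, the snapshot format (a flat dictionary of settings) and the watched prefixes are all hypothetical; real network equipment exposes configuration in many different forms.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Config = Dict[str, str]
Change = Tuple[str, Optional[str], Optional[str]]  # (key, old value, new value)


def diff_config(before: Config, after: Config) -> List[Change]:
    """List every setting that was added, removed or modified.

    A value of None means the setting was absent in that snapshot.
    """
    changes: List[Change] = []
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes.append((key, old, new))
    return changes


def flag_risky(changes: Iterable[Change], watched_prefixes: Iterable[str]) -> List[Change]:
    """Keep only changes whose key starts with a watched prefix."""
    prefixes = tuple(watched_prefixes)
    return [change for change in changes if change[0].startswith(prefixes)]


if __name__ == "__main__":
    # Made-up snapshots taken before and after a maintenance window.
    before = {"routing.bgp.timeout": "30", "firmware.version": "4.1", "ntp.server": "10.0.0.1"}
    after = {"routing.bgp.timeout": "5", "firmware.version": "4.2", "snmp.community": "ops"}

    for key, old, new in flag_risky(diff_config(before, after), ["routing.", "firmware."]):
        print(f"REVIEW {key}: {old} -> {new}")
```

Even a simple comparison like this gives administrators a list of what changed and where, which is the starting point for judging whether a change puts stability at risk.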
Editor’s Note: In an attempt to broaden our interaction with our readers we have created this Reader Forum for those with something meaningful to say to the wireless industry. We want to keep this as open as possible, but we maintain some editorial control to keep it free of commercials or attacks. Please send along submissions for this section to our editors at: dmeyer@rcrwireless.com.