Scaling up to the cloud is about being smarter, not working harder
5G Standalone (SA) requires a fundamentally cloud-native approach. The 5G core is built on cloud-native principles, designed to provide flexibility and scale. This increases carrier agility, but going cloud-native also demands a fundamentally different way of managing the operational support systems (OSS) and business support systems (BSS) that help carriers efficiently and reliably manage subscriber services. Today, OSS and BSS sit at the center of the telco network in silos. That needs to change with 5G SA.
There is some irony in the fact that CSPs are undergoing the same digital transformation efforts as many other industries and public sectors as entire economies move to the cloud. Cloud-native operations promise CSPs cost efficiency, scalability, agility and – hopefully – new monetization strategies. But there is both a business culture shock to absorb and technical debt to pay down when it comes to upgrading OSS and BSS services to handle 5G SA. Going cloud-native requires a different set of skills. In some respects, it’s an evolutionary step forward. But as with many things associated with 5G, it’s also a fundamental rethink of how the business operates – not just today, but how it needs to be built to operate in the future.
What is cloud-native service assurance?
Ericsson identifies a set of principles that should guide CSPs as they journey toward cloud-native service assurance operations. Paraphrasing:
- Choices – cloud-native apps should be infrastructure-independent, aligning with new cloud technologies as necessary;
- Decomposition – comprising modular and reusable software components;
- Resiliency – responding to problems without service interruption;
- State optimization – separating application logic from data (a minimal sketch of this idea follows the list);
- Orchestration and automation – emphasizing zero-touch networking principles;
- Openness – software apps and components should be modular and easily replaceable as necessary.
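To make the state-optimization and decomposition ideas concrete, here is a minimal Python sketch. Everything in it is hypothetical – the in-memory dict merely stands in for an external store such as Redis or a cloud database – but it shows why separating logic from state matters: because the handler keeps nothing locally, any replica can serve the next request or replace a failed instance without interrupting the session.

```python
# Minimal sketch of "state optimization": the function's logic is stateless,
# and session state lives in an external store, so any replica (or a
# replacement after a failure) can pick up the next request.
# All names are hypothetical; the dict stands in for an external key-value store.

from dataclasses import dataclass


class StateStore:
    """Stand-in for an external key-value store shared by all replicas."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


@dataclass
class SessionUpdate:
    subscriber_id: str
    bytes_used: int


def handle_update(store: StateStore, update: SessionUpdate) -> int:
    """Stateless handler: read state, apply logic, write state back."""
    total = store.get(update.subscriber_id, 0) + update.bytes_used
    store.put(update.subscriber_id, total)
    return total


if __name__ == "__main__":
    store = StateStore()
    # Two "replicas" of the function can serve the same subscriber because
    # neither holds the session state itself.
    handle_update(store, SessionUpdate("imsi-001", 500))
    print(handle_update(store, SessionUpdate("imsi-001", 250)))  # 750
```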
The challenges abound: CSPs need to get the balance right by creating environments where Virtual Network Functions (VNFs) and Cloud-Native Functions (CNFs) can coexist and operate reliably, even as they run on an increasingly distributed hybrid cloud network.
“It’s a journey when you think about it. Most of the vendors in this space are traditional bare-metal vendors,” said Mark Hiseman, director of service assurance strategy and platforms at EXFO. The move to VNFs and now to the cloud has required an evolution in mindset, he added.
“There’s a massive amount of inertia in the telco industry, which in general has been the last to step into the cloud, from an operational perspective. They’re the last man standing, right? They are really concerned about losing control,” he said.
BSS and OSS functions have historically been tightly integrated, proprietary efforts. Virtualization may have pushed those efforts forward, but it isn’t sufficient to meet the growing need for automation, the desire for optimal performance and the ever-pressing need to lower operating costs.
“Pre-cloud-native, even VNFs took a long time to deploy. It’s a manual process. Typically, to deploy a function could take months or weeks,” said Volt Active Data’s Senior Director of Product Management Andrew Keene. Automation tools for cloud-native functions can do the same in minutes or moments, he added.
“IT and the telco world are now merging together. Now we’ve been talking about this for twenty years, right? About the fact that eventually IT and network worlds would come together. The cloud is the catalyst for that,” he said.
Scaling OSS and BSS to the cloud
“What does it mean on the OSS side? You have to think about how we’ve always done this,” Hiseman said. From his perspective, it’s about using the data that’s already there in smarter and more efficient ways. As an example, he pointed to the traditional need to physically probe networks to assess service levels.
“In theory, we’ve had that technology from the RAN back in the 4G days. As soon as they put traceport on, you just needed someone to process the data. You no longer needed that probe tapped in the network,” he said.
“When that first came out, the reaction was ‘A RAN element can’t possibly offer the data that we get from a probe,’ and lo and behold, it did. Same noise we’re getting at the 5G SA Core level is, ‘What you’re going to get is events, and events aren’t rich enough to troubleshoot problems.’”
It turns out event data isn’t the problem, he said. Instead, it’s an issue of scale – how to separate the data you do need from the data you don’t. The standard mode in OSS has been to capture as much data as possible. That simply won’t scale as operators pivot to cloud-native.
“The problem is you have many layers: this physical machine level layer – what’s the machine doing, how much CPU and RAM is being used, disk I/O. Then you have a Kubernetes layer running. Then you’ve got network functions running, and then you have users on top of all of it,” he said. “So how on earth do you assure the entire stack?”
OSS and BSS vendors can no longer just focus on their own data, he said.
“Now we’re saying that’s not good enough. You need to be able to take data from elements, Kubernetes, your own active network tests, your own data, maybe your competitor’s data, and correlate all of this stuff. And be able to do it in real time, and do it quickly,” he added.
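What cross-layer correlation could look like in practice is sketched below in Python. The samples, field names and thresholds are invented for illustration, not drawn from any vendor’s product: records from the machine, Kubernetes and network-function layers are joined by node and time window, and a window is flagged when a service KPI degrades alongside infrastructure symptoms.

```python
# Illustrative sketch of cross-layer correlation: join samples from the
# machine, Kubernetes and network-function layers by node and time bucket,
# then flag windows where a service KPI dips while infrastructure metrics
# are also unhealthy. Field names and thresholds are hypothetical.

from collections import defaultdict

machine_samples = [
    {"node": "edge-1", "ts": 1700000000, "cpu_pct": 97, "disk_io_wait_ms": 40},
]
k8s_samples = [
    {"node": "edge-1", "ts": 1700000010, "pod": "upf-0", "restarts": 3},
]
nf_samples = [
    {"node": "edge-1", "ts": 1700000020, "nf": "UPF", "session_setup_success_pct": 91.0},
]


def bucket(ts, width=60):
    """Collapse timestamps into fixed windows so the layers can be joined."""
    return ts - (ts % width)


def correlate(*layers):
    joined = defaultdict(dict)
    for layer in layers:
        for sample in layer:
            key = (sample["node"], bucket(sample["ts"]))
            joined[key].update(sample)
    return joined


for (node, window), merged in correlate(machine_samples, k8s_samples, nf_samples).items():
    kpi_bad = merged.get("session_setup_success_pct", 100.0) < 95.0
    infra_bad = merged.get("cpu_pct", 0) > 90 or merged.get("restarts", 0) > 0
    if kpi_bad and infra_bad:
        print(f"{node} @ {window}: service KPI degraded alongside infrastructure symptoms")
```

A real assurance system would stream these samples continuously and correlate far richer context, but the join-by-node-and-window idea is the core of turning many layers of telemetry into one answer.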
“We’re noticing that operators need to go on that data strategy journey,” he said. Moving data, and the processing of that data, to the edge is the strategy some are now employing to mitigate the worst of big-data processing loads and cloud bills.
In the cloud, “data has a cost,” said Hiseman, and this requires vendors and operators to be mindful about what’s important to collect versus what isn’t.
“Imagine a RAN detail record has 400 fields. Do you need all 400 to build a KPI? No. You need five, maybe 10. Only when the KPI is a problem do you need more fields.”
So it’s not about collecting all the data and then sorting through it reactively to find anomalies. The model best suited to going cloud-native is to look only for the data you need to solve the problem and focus there.
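A rough sketch of that model follows, using entirely invented field names and thresholds rather than any real RAN record schema: the KPI is computed from a handful of projected fields at the edge, and the remaining fields of the detail record are worth fetching only when the KPI actually breaches.

```python
# Sketch of the "collect only what you need" model with invented field names:
# compute the KPI from a few projected fields at the edge, and fetch the full
# detail record only when the KPI breaches.

KPI_FIELDS = ("cell_id", "setup_attempts", "setup_failures", "record_id")
FAILURE_RATE_THRESHOLD = 0.02  # escalate when more than 2% of setups fail


def project(record: dict) -> dict:
    """Keep only the fields needed for the KPI; the rest stays at the edge."""
    return {k: record[k] for k in KPI_FIELDS if k in record}


def setup_failure_rate(slim: dict) -> float:
    attempts = slim.get("setup_attempts", 0)
    return slim.get("setup_failures", 0) / attempts if attempts else 0.0


def process(record: dict) -> float:
    slim = project(record)  # a handful of fields instead of hundreds
    rate = setup_failure_rate(slim)
    if rate > FAILURE_RATE_THRESHOLD:
        # Only now is the full record worth the transport and storage cost;
        # a real pipeline would pull it from the edge buffer via record_id.
        print(f"cell {slim['cell_id']}: setup failure rate {rate:.1%}, pulling full record")
    return rate


process({"cell_id": "c42", "setup_attempts": 1000, "setup_failures": 35, "record_id": "r1"})
```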
Finding the single version of the network truth
Adaptive assurance is the solution EXFO emphasizes in this environment. Service assurance systems based on big data collection will become increasingly unwieldy, or simply won’t scale to the cloud. Big data can fill data lakes much faster than meaningful, actionable small data can be extracted from them.
“Process at the edge, store what you need,” Hiseman said. But more importantly, as carriers face increasingly distributed and complex network topologies, they need what Hiseman calls “a single version of the network truth…being able to pinpoint problems is something we’re seeing telcos focus more on. Because what they’re finding is that they’re wasting too much time looking for the actual problem.”
“One of the problems the telcos have is that the business model hasn’t been as favorable as they thought, because hyperscalers charge them for everything – data storage, transmission, regional replication. So if you can make sense of the data before you send it off, you can save yourself a lot of costs,” said Volt Active’s Keene.
Standalone 5G use cases and monetization strategies heavily leverage technology that fundamentally changes carriers’ relationships with their customers. Legacy OSS and BSS systems are optimized for a model where the carrier simply provides connectivity. That model is turned on its head with 5G SA, as private networks, network slicing, massive IoT deployments and other innovations create myriad Service-Level Agreements (SLAs) that vendors have to abide by. CSPs need to design service assurance systems from the ground up to manage the scale and flexibility of the cloud.