Use cases for analyzing data
Ron Bodkin, founder and president of Think Big, hosted a panel titled “The key to unlocking value in the internet of things? Managing data!” at the Teradata PARTNERS 2016 conference in Atlanta, Georgia. He emphasized that a variety of data is a key element of any IoT system, and that figuring out which data is valuable is essential.
He gave a number of use cases associated with analyzing data that comes from internet of things devices:
- Predictive maintenance
- Search and view product detail
- Identify critical alerts
- Root cause analysis
- Understand usage
Bodkin said that companies have long been gathering and organizing data that is accessible and useful, but they haven’t modeled it for business intelligence (BI) or analyst consumption.
A changing landscape of data management
He spoke about the changing landscape of data analytics brought on by IoT:
- JSON-like structures: complex collections of relations, arrays, and maps of items (a rough sketch follows this list)
- Graphs: storing complex, dynamically changing rather than static relationships
- Binary/CLOB/specialized data: ability to execute specialized programs to interpret and process
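To make the JSON-like item concrete, below is a minimal, hypothetical sketch in Python of one device reading that nests relations, arrays, and maps in a single record; the device name, fields, and values are invented for illustration.

```python
import json

# Hypothetical JSON-like IoT record: a single device reading that mixes
# relations (device -> components), arrays (sensor samples), and maps
# (key/value attributes) in one nested document rather than flat rows.
reading = {
    "device_id": "pump-0042",           # invented identifier
    "firmware": "2.1.7",
    "components": [                      # array of related sub-components
        {"name": "motor", "temps_c": [71.2, 72.0, 73.5]},
        {"name": "valve", "temps_c": [40.1, 40.3]},
    ],
    "attributes": {                      # map of arbitrary metadata
        "site": "plant-east",
        "line": 3,
    },
}

print(json.dumps(reading, indent=2))
```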
He also described new patterns seen in the IoT space when looking at big data systems, and went into more detail on four of them:
- Denormalized facts
- Profile
- Event history
- Timeline
- Network
- Distributed sources
- Late data
- Deep aggregates
- Recovery
- Multiple active clusters
Event history:
- Fact table about common events that allows analytics in context, e.g., wearable devices, telematics
- Stored in columnar format (a brief sketch follows this list)
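As a rough illustration of the event-history pattern (not anything shown in the talk), the snippet below builds a small event fact table and writes it in a columnar format, assuming pandas with a Parquet engine such as pyarrow is available; all identifiers and readings are made up.

```python
import pandas as pd

# Each row is one event, keyed by device and event time, so analytics can be
# run "in context" (e.g., everything recorded by one wearable device).
events = pd.DataFrame({
    "device_id": ["w-001", "w-001", "w-002"],
    "event_time": pd.to_datetime(
        ["2016-09-12 08:00", "2016-09-12 08:05", "2016-09-12 08:01"]),
    "event_type": ["heart_rate", "heart_rate", "gps_fix"],
    "value": [72.0, 75.0, None],
})

# Columnar storage (Parquet here) keeps each column together on disk,
# which speeds up scans that touch only a few columns.
events.to_parquet("events.parquet", index=False)
```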
Timeline pattern:
- Table of actors with events over time
- Device history, usage in consumer journey
- Enable support/analytics on specific items, long-lived analysis
- May have a hierarchy of actors (clusters, devices, components) or an array of events (a rough sketch follows this list)
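Here is a small, made-up sketch of the timeline pattern: one entry per actor (a device in this case), with its events gathered into an ordered array. The device names and events are invented for illustration.

```python
from collections import defaultdict

# Raw events arrive as (actor, timestamp, event) tuples; the timeline pattern
# regroups them so each actor carries its own ordered event history.
raw_events = [
    ("device-A", "2016-01-05", "installed"),
    ("device-A", "2016-03-20", "firmware_update"),
    ("device-B", "2016-02-11", "installed"),
    ("device-A", "2016-06-02", "fault_cleared"),
]

timelines = defaultdict(list)
for device_id, ts, event in raw_events:
    timelines[device_id].append((ts, event))

# Sort each actor's events chronologically so long-lived, per-item analysis
# (a device's full history) becomes a single lookup.
for device_id in timelines:
    timelines[device_id].sort()

print(timelines["device-A"])
```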
Network: Ongoing status of configuration
- Parts in assembly
- Related items
- Peer groups
“Most IoT is not about a single device but a complex assemblage of devices and how they work together,” Bodkin said.
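A toy example of that idea, using invented part names, might keep the ongoing configuration as a graph and walk it to list every part under an assembly:

```python
# The "network" pattern as an adjacency mapping: which parts currently belong
# to which assembly. All names are made up for illustration.
assembly_graph = {
    "truck-17": ["engine-9", "telematics-unit-4"],
    "engine-9": ["turbo-2", "ecu-11"],
    "telematics-unit-4": [],
    "turbo-2": [],
    "ecu-11": [],
}

def parts_in_assembly(root, graph):
    """Walk the configuration graph and return every part under `root`."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        for child in graph.get(node, []):
            found.append(child)
            stack.append(child)
    return found

print(parts_in_assembly("truck-17", assembly_graph))
```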
Late data:
- Delays from intermittent connectivity, upstream failures
- Lineage tracking is critical
- Watermarks to identify when sufficient data has arrived
But there are challenges that come with collecting and analyzing data. Delay is endemic, so it is important to have a system that can account for it by recognizing when enough data is present to extract insights. Watermarking is one way to do so: it answers the question ‘when do I have enough data in the system to reliably work with it?’, signals when there is enough data to process, and supports different kinds of triggering.
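A simplified sketch of watermarking, under the assumption that “enough data” means nothing is expected to arrive more than a fixed lateness behind the newest event time seen so far (the window boundaries and field names here are invented), could look like this:

```python
from datetime import datetime, timedelta

# Allowed lateness: how far behind the newest event we still expect stragglers.
allowed_lateness = timedelta(minutes=10)
max_event_time = datetime(1970, 1, 1)  # sentinel: no events seen yet

def watermark():
    """Point in event time before which we consider the data complete."""
    return max_event_time - allowed_lateness

def on_event(event_time, window_end):
    """Ingest one event; return True once the window ending at `window_end`
    can be processed because the watermark has passed it."""
    global max_event_time
    max_event_time = max(max_event_time, event_time)
    return watermark() >= window_end

# An event arriving at 08:25 advances the watermark to 08:15, which closes
# the 08:00-08:10 window even though a few late records may still trickle in.
print(on_event(datetime(2016, 9, 12, 8, 25), datetime(2016, 9, 12, 8, 10)))
```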
Case Studies
Bodkin introduced two case studies illustrating how Teradata helped companies that required big data analytics.
The first case study featured an unnamed “global manufacturer” of storage devices: hard drives, SSDs, and object storage. It produces hundreds of millions of products each year, built with complex components in geographically dispersed manufacturing sites. Making things even more difficult, the company had compiled five years of data for warranty reasons. Its goal was to expose the entire DNA of a device, from development, manufacturing, and reliability testing to the “living” behavior of the device.
The company’s problem was that its engineers were having a hard time finding the correct data, and needed to speed the cycle time for new product development.
Some of the technical challenges included:
- Data silos across manufacturing facilities
- Difficulty storing and exposing binary and other data types
- Current data warehouses unable to keep pace with volume
- No platform for large-scale analytics
Solution
Teradata helped bring the data together and solved new problems, which allowed the company to scan 380 billion test points across 8 million products. Several irregular distributions were found, which allowed the team to identify a software bug that was causing failures, saving millions of dollars.
The second case study included an unnamed “global healthcare device manufacturer.”
The company’s goal was to improve patient outcomes. To do so, it expanded its data collection, with storage growing from 50TB to 20PB after beginning the initiative.
The solution included:
- Microservices architecture
- Operational real-time analytics
- Data lake to feed the warehouse (a rough sketch follows this list)
- Public cloud with a security-first approach
- Agile production releases
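As a loose illustration of the “data lake feeds the warehouse” idea (the paths, fields, and file layout here are assumptions, not details from the talk), a batch job might read raw JSON events from the lake and load a summarized table for the warehouse:

```python
import glob
import json
import pandas as pd

# Raw JSON device events land in a lake directory, one event per line.
raw_records = []
for path in glob.glob("lake/raw_events/*.json"):
    with open(path) as f:
        for line in f:
            raw_records.append(json.loads(line))

events = pd.DataFrame(raw_records)
events["event_time"] = pd.to_datetime(events["event_time"])

# A daily per-device summary is what the warehouse-facing table carries.
daily = (events
         .groupby(["device_id", events["event_time"].dt.date])
         .size()
         .rename("event_count")
         .reset_index())

daily.to_parquet("warehouse/device_daily_summary.parquet", index=False)
```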