

6 reasons to clean data at the Edge

John Harrington
John Harrington is the Chief Product Officer of HighByte, focused on defining the company’s business and product strategy. His areas of responsibility include product management, customer success, partner success, and go-to-market strategy. John is passionate about delivering technology that improves productivity and safety in manufacturing and industrial environments. John received a Bachelor of Science in Mechanical Engineering from Worcester Polytechnic Institute and a Master of Business Administration from Babson College.
How much time do you spend cleaning data?
 
If your factory is like most connected operations, you probably have tons of raw data streaming from connected devices to existing enterprise systems, bespoke databases, and a cloud data lake. This architecture often leads to inconsistent or even unusable data for several reasons.
 
We know the Cloud is a key tool for digital transformation. It provides the scalability and storage capacity you need to collect and interpret vast amounts of data coming from the operations level.
 
However, by nature, cloud platforms are IT-focused tools. They structure data differently than operational systems, which means IT must spend a lot of time cleaning the data before it can be used. And if the data moves directly to different enterprise systems, multiple teams across the organization will clean the data independently, leading to different versions of the truth.

At HighByte, we developed our Intelligence Hub for OT users. It’s purpose-built to model and manage plant floor data at the Edge without the need to write or maintain code. With that in mind, we’ve identified six ways OT modeling at the Edge can take your digital transformation to the next level.  
 

1. Give the OT Team Control

Working with the industrial data coming from PLCs, machine controllers, RTUs, and smart sensors is an integration challenge. This data was put in place for process control, not for typical Cloud use cases like predictive asset maintenance or traceability. Even when the data is accessed over OPC, it is structured by the underlying device protocols. As a result, it is not standardized across similar assets or processes, it lacks consistent units of measure, and it is missing context. All of these challenges are fixable by someone who knows the production machinery and the automation devices controlling it. The OT team is the only team that can effectively decode the data, and their systems are the Edge. Cleaning data at the Edge puts control and responsibility into the hands of the team most capable of accomplishing this task efficiently.
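To make this concrete, here is a minimal sketch of what that decoding might look like at the Edge. The tag names, unit conversion, and model fields are hypothetical, and the Python below is a generic illustration rather than actual Intelligence Hub configuration; it simply shows two similar machines, each exposing different raw tags, being mapped into one standardized asset model with consistent units and added context.

```python
from datetime import datetime, timezone

# Hypothetical raw reads from two similar machines. Tag names, units, and
# structure differ because each reflects its underlying device protocol.
raw_machine_a = {"N7:0": 1482, "N7:1": 72.4}        # count, temperature in deg F
raw_machine_b = {"prodCount": 1507, "tempC": 22.1}  # count, temperature in deg C

def to_asset_model(machine_id: str, count: int, temp_c: float) -> dict:
    """Map device-specific tags to one standardized asset model with
    consistent names, units, and context added at the Edge."""
    return {
        "assetId": machine_id,
        "site": "Plant-01",  # context that does not exist in the PLC itself
        "goodPartCount": count,
        "temperature": {"value": round(temp_c, 1), "unit": "degC"},
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

standardized = [
    to_asset_model("machine-a", raw_machine_a["N7:0"], (raw_machine_a["N7:1"] - 32) / 1.8),
    to_asset_model("machine-b", raw_machine_b["prodCount"], raw_machine_b["tempC"]),
]
print(standardized)
```

Every consumer downstream now sees the same field names, the same units, and the same context, regardless of which controller produced the raw values.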
 

2. Optimize System Maintenance

Over time, the factory changes: you introduce new products, optimize processes, and replace machinery. Each change ripples through the integrated data systems across the organization, and you then need to update every individual system that collects, communicates, and processes data related to what changed on the plant floor. This is where a single edge-based system dramatically increases efficiency: you only need to maintain one system. With a cloud-based approach, by contrast, the IT team maintains the applications and integrations, which typically means reactive maintenance when data goes missing or integrations break.
 

3. Provide a Single Version of the Truth

Remember that meeting where everyone had a different interpretation of the same data? Maybe the operations leader based her OEE report on a single eight-hour shift, while the executive team used a cloud analytic that measured efficiency on a 24-hour day. Or maybe a supply chain executive calculated a different figure for scrap costs than the operations team after pulling the information from a cloud data lake instead of the MES platform.
 
Contextualizing data and defining metrics as close to the machinery as possible means that all downstream systems work from a single source of truth instead of performing their own custom transformations at the point of ingest.
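A worked example helps show why the shared definition matters. The numbers and the simplified OEE formula below are hypothetical; the point is that when the metric is computed once at the Edge with an agreed reporting window, every consumer receives the same figure instead of recomputing it against a different window.

```python
def oee(good_count: int, total_count: int, runtime_min: float,
        planned_time_min: float, ideal_rate_per_min: float) -> float:
    """One shared OEE definition: availability x performance x quality."""
    availability = runtime_min / planned_time_min
    performance = (total_count / runtime_min) / ideal_rate_per_min
    quality = good_count / total_count
    return availability * performance * quality

# Same raw production data, two different reporting windows -> two "truths".
shift_oee = oee(good_count=412, total_count=430, runtime_min=420,
                planned_time_min=480, ideal_rate_per_min=1.1)    # 8-hour shift
day_oee = oee(good_count=412, total_count=430, runtime_min=420,
              planned_time_min=1440, ideal_rate_per_min=1.1)     # 24-hour day

print(f"Shift OEE: {shift_oee:.0%}  Calendar-day OEE: {day_oee:.0%}")
```

Defining the window and the formula once at the Edge removes exactly the kind of discrepancy described in the meeting above.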
 

4. Reduce Latency Issues

Cleaning data is expensive and slow. You certainly don’t want to clean it multiple times for different applications. If you’re sending data to the Cloud and then restructuring it before sending it to other systems, you’re going to encounter latency issues. Streaming data at the Edge reduces latency that is common with batch-processed data in the Cloud.

Cloud and enterprise systems don’t interpret operational data very well. They typically require data to be presented in a different format, such as name-value pairs, rather than the standard operational model of ID, value, quality, and timestamp. Edge-based modeling tools designed specifically for OT provide a standardized way to present data to multiple systems across the organization in the format and at the frequency each wants to consume it, reducing the latency and expense of batch cleaning in each system.
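As a sketch of that reshaping (with hypothetical tag IDs and a simple field-naming choice), the snippet below takes OT-style records carrying ID, value, quality, and timestamp and flattens them into the name-value document many cloud and enterprise systems expect:

```python
# Hypothetical OT-style records: each point carries ID, value, quality, timestamp.
ot_records = [
    {"id": "Line1/Press/Temp", "value": 72.4, "quality": "GOOD", "ts": "2024-05-01T12:00:00Z"},
    {"id": "Line1/Press/Count", "value": 1482, "quality": "GOOD", "ts": "2024-05-01T12:00:00Z"},
]

def to_name_value_payload(records: list[dict]) -> dict:
    """Reshape OT records into the flat name-value document many cloud
    and enterprise systems expect, dropping bad-quality points."""
    payload = {"timestamp": records[0]["ts"]} if records else {}
    for rec in records:
        if rec["quality"] == "GOOD":
            # Use the last path segment as the field name (illustrative choice).
            payload[rec["id"].split("/")[-1]] = rec["value"]
    return payload

print(to_name_value_payload(ot_records))
# {'timestamp': '2024-05-01T12:00:00Z', 'Temp': 72.4, 'Count': 1482}
```

Doing this conversion once at the Edge means each receiving system gets data already shaped the way it consumes it, rather than batch-cleaning raw points on arrival.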
 

5. Minimize Costs

Transforming data in the Cloud is not free. When you push “all the OT data” to the Cloud without specific use cases or plans, you’re typically dealing with far more data points than you actually need, which increases data ingestion, storage, and the amount of bandwidth consumed. Cleanup in the Cloud also requires processing and secondary storage, adding further cost. When you transform data at the Edge for a specific use, you reduce the burden on your cloud system by sending only the data you need at the frequency required.
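The sketch below illustrates the idea with a hypothetical tag list and publish interval: the Edge reads everything locally but forwards only the points a specific use case needs, at the frequency that use case requires, instead of streaming every raw value to the Cloud. The read and publish functions are placeholders, not a real device or cloud API.

```python
import time

SELECTED_TAGS = {"Line1/Press/Temp", "Line1/Press/Count"}  # only what the use case needs
PUBLISH_INTERVAL_S = 60                                    # frequency the consumer requires

def read_all_tags() -> dict:
    """Placeholder for a local device read; a real edge hub would poll PLCs/OPC here."""
    return {"Line1/Press/Temp": 72.4, "Line1/Press/Count": 1482,
            "Line1/Press/Vibration": 0.02, "Line1/Press/MotorAmps": 11.3}

def publish(payload: dict) -> None:
    """Placeholder for sending to the cloud (e.g., over MQTT or HTTPS in practice)."""
    print("publishing", payload)

while True:
    snapshot = read_all_tags()
    # Forward only the selected tags, at the agreed interval, instead of
    # streaming every raw point as fast as the devices produce it.
    publish({tag: snapshot[tag] for tag in SELECTED_TAGS if tag in snapshot})
    time.sleep(PUBLISH_INTERVAL_S)
```

Fewer points at a lower frequency translates directly into lower ingestion, storage, and bandwidth charges.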
 

6. Ensure Your Data Is Secure

Data security is about leveraging secure protocols, knowing where data is going, and minimizing both the distance the data travels and the number of systems it passes through on the way to its final destination. Integrations are established over time to different systems, and knowing which systems access which data from which sources is important information that is rarely well documented. A proactive approach, where a single system acts as the clearing house for all industrial data integrations and administrators can see and manage which data is being sent to which system, is critical.
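One lightweight way to picture that clearing-house approach (purely illustrative; the system names and routes below are hypothetical) is a single routing table the administrator owns, which both drives the integrations and documents which data goes where:

```python
# Hypothetical routing table owned by the Edge hub administrator.
ROUTES = {
    "cloud-data-lake": {"models": ["Press", "Oven"], "fields": "all",              "interval_s": 300},
    "mes":             {"models": ["Press"],         "fields": ["goodPartCount"],  "interval_s": 60},
    "maintenance-app": {"models": ["Press", "Oven"], "fields": ["temperature"],    "interval_s": 30},
}

def audit(routes: dict) -> None:
    """Print which data is being sent to which system, for review by administrators."""
    for target, rule in routes.items():
        print(f"{target}: models={rule['models']} fields={rule['fields']} every {rule['interval_s']}s")

audit(ROUTES)
```

Because every integration is declared in one place, answering "who receives what" becomes a lookup rather than an archaeology project.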

Industrial data starts at the device. Minimizing the physical distance it travels and the hardware and software applications it passes through is important for security. Pushing all data to a cloud system to be cleaned and then pulled back down to an on-premises application is not only expensive and slow but also less secure than routing it through the internal network.

Successful digital transformation depends on Edge-to-Cloud technologies working together to deliver meaningful data quickly and securely. As Tony Antoun, senior vice president of edge and digital at GE Digital, told Automation World:

“To enable digital transformation, you have to build out the edge computing side and connect it with the cloud—it’s a journey from the edge to the cloud and back, and the cycle keeps continuing. You need both to enrich and boost the business and take advantage of different points within this virtual lifecycle.”

Get started today!

Join the free trial program to get hands-on access to all the features and functionality within HighByte Intelligence Hub and start testing the software in your unique environment.
