As bandwidth and latency continue to hinder big data transport, you can expect to see edge computing take a more central role in corporate big data architecture.

Edge computing refers to the generation, collection, and analysis of data at the actual site where data generation occurs, and not necessarily in a centralized computing area like a data center. The technology covers sensor-generated input, robotics, automated machines on manufacturing floors, smartphones and tablets, and distributed analytics servers that are used for “on the spot” computing and analytics.

For instance, if a manufacturer with production facilities around the world wants to automate manufacturing floors and machine-to-machine communications, real-time control and monitoring of these communications could be handled within the remote facilities themselves. This eliminates the need to pass continuous streams of machine data over the internet to a central data repository that could be thousands of miles away.
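To make the idea concrete, here is a minimal sketch of monitoring done on site, where only alerts and a small summary would ever need to leave the facility. The function name `monitor_locally`, the reading fields (`machine`, `temp_c`), and the temperature threshold are all hypothetical and chosen only for illustration.

```python
def monitor_locally(readings, max_temp_c):
    """Check machine readings on site and build a compact summary.

    `readings` is a list of dicts with hypothetical keys "machine" and
    "temp_c". Only the returned alerts and summary would be sent off site,
    not the raw stream.
    """
    # Readings that exceed the limit are flagged immediately at the edge.
    alerts = [r for r in readings if r["temp_c"] > max_temp_c]

    temps = [r["temp_c"] for r in readings]
    summary = {
        "count": len(temps),
        "min_temp_c": min(temps) if temps else None,
        "max_temp_c": max(temps) if temps else None,
    }
    return alerts, summary
```

In a setup like this, plant staff could act on the alerts right away, while the central repository receives a handful of numbers instead of every raw reading.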


There is no reason why local data marts at the remote sites couldn’t also maintain this data for historical and analytical purposes. IT could schedule periodic transfers of the data to a central data repository, where analytics run on the total population of machine-generated data from all plants could provide insights into overall manufacturing. This strategy depends heavily on an efficient batch processing workflow.
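One way such a periodic transfer might look is sketched below, using in-memory SQLite databases as stand-ins for a local data mart and the central repository. The table layout, the function names, and the "last synced id" bookkeeping are assumptions made for illustration; a real deployment would use whatever databases and scheduler the organization already has.

```python
import sqlite3


def make_local_mart():
    """Create a stand-in local data mart (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (id INTEGER PRIMARY KEY, machine TEXT, value REAL)"
    )
    return conn


def make_central_repo():
    """Create a stand-in central repository that records the source site."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (site TEXT, local_id INTEGER, machine TEXT, "
        "value REAL, PRIMARY KEY (site, local_id))"
    )
    return conn


def sync_batch(local, central, site, last_synced_id):
    """Copy rows newer than last_synced_id to the central repository.

    Returns the id of the newest row copied, to be stored and passed in
    on the next scheduled run.
    """
    rows = local.execute(
        "SELECT id, machine, value FROM readings WHERE id > ? ORDER BY id",
        (last_synced_id,),
    ).fetchall()
    if not rows:
        return last_synced_id

    # One transaction per batch; INSERT OR IGNORE makes a rerun harmless.
    with central:
        central.executemany(
            "INSERT OR IGNORE INTO readings (site, local_id, machine, value) "
            "VALUES (?, ?, ?, ?)",
            [(site, r[0], r[1], r[2]) for r in rows],
        )
    return rows[-1][0]
```

A weekly or monthly job at each plant could call `sync_batch` with the id returned by the previous run, so each batch carries only the rows added since the last transfer.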

This is a proven strategy, and not a new one. How many times have we heard about batching up data locally, establishing local data repositories to house this data, and then enacting a weekly or a monthly process that collects all of this data and unifies it in a single data repository? This is the stuff of old IT playbooks. And yet, strategies centered around a well-orchestrated and continuous execution of batch processing at multiple sites are not yet written into big data playbooks as often as they could be.


The advantages of edge computing

  • By distributing and storing your data in “bite-size” data repositories, you can enhance data aggregation and agility by segmenting manufacturing analytics into specific types of manufacturing and/or geographic regions, without necessarily having to pull data extracts from a centralized corporate database to do the same thing.
  • You can provide real-time analytics directly to managers at your local manufacturing sites, and you can use the strength of your own internal networks to stream the data.
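As a rough illustration of the first advantage, the sketch below segments records held in a local repository by region and manufacturing type without touching a central database. The field names (`region`, `line_type`, `cycle_time_s`) and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_segment(readings):
    """Group local readings by (region, line_type) and summarize each group.

    Each reading is a dict with hypothetical keys "region", "line_type",
    and "cycle_time_s".
    """
    groups = defaultdict(list)
    for r in readings:
        groups[(r["region"], r["line_type"])].append(r["cycle_time_s"])

    # One small summary per segment, ready for a local dashboard or report.
    return {
        segment: {"count": len(values), "mean_cycle_time_s": mean(values)}
        for segment, values in groups.items()
    }
```

Because the grouping runs against the local repository, a site manager could get this view over the internal network alone.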

What you need to make all of this come together

  • You must have well-defined information management policies that govern not only the cycles at which real-time and batch data will be processed, but also the retention period for each type of data and the data backup schedules for disaster recovery and business continuity. Equally important is a list of people who are authorized to access the local real-time and batch data and the corporate-wide data.
  • An analytics team, with clearly assigned accountabilities, needs to be established so everyone understands who is working on which data repositories and/or data marts.
  • A set of analytics tools and capabilities should be put in place for working with data at headquarters and at the edges of the enterprise, as well as for preparing and moving data between data repositories.
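The information management policies in the first item could be captured in a simple, machine-readable form so that tools at headquarters and at the edge apply the same rules. The sketch below is one possible shape; the class name `DataPolicy`, its fields, and the example values are assumptions, not a standard.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass(frozen=True)
class DataPolicy:
    """Hypothetical policy record for one type of data."""
    name: str
    processing_cycle: str      # e.g. "real-time" or "weekly batch"
    retention_days: int        # how long records of this type are kept
    backup_schedule: str       # e.g. "daily"
    authorized_roles: frozenset = field(default_factory=frozenset)

    def can_access(self, role):
        """Return True if the given role is on the authorized list."""
        return role in self.authorized_roles

    def is_expired(self, created, today):
        """Return True if a record created on `created` is past retention."""
        return today - created > timedelta(days=self.retention_days)
```

For example, `DataPolicy("edge_realtime", "real-time", 30, "daily", frozenset({"plant_manager"}))` would describe local real-time data kept for 30 days and readable only by plant managers.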

Initially, a combined approach of centralized and edge computing for big data may seem complicated in terms of data management and staff management. Once both are in motion, however, the agility that IT staff and business managers gain will improve operational efficiency and enable the right people to respond to alerts and resolve them quickly. Just as importantly, combining local and centralized analytics helps a global organization manage the internet bandwidth and latency issues that have yet to be solved.