Robotic process automation (RPA) doesn't work for big data, but it can help you get your data ready for analysis. Here's why.
Most IT professionals have seen business processes in which users had to key documents like invoices into one system and then rekey the same information into another because there was no easy way to automate integration between the two. In analytics projects, we've also seen users manually comb through data during the cleansing and preparation stage to find address fields, ZIP codes or names that are incomplete or duplicated by other entries, and then correct them by hand.
This is painstaking manual work for users, and it dramatically slows down business processes. Yet, if this work is left undone, business processes don't flow, and the accuracy of analytics outcomes is at risk because of poor data quality.
This raises the question: Can a technology like robotic process automation (RPA) do some of this painstaking data cleansing work?
What is RPA?
RPA is software that partly or fully automates human activities that are manual, rule-based, and repetitive. RPA does this by replicating the actions of humans in rote tasks such as data entry.
Here's how RPA would work in the invoice example above. The user keys in new invoice data once. After this, the automation software takes over: It scrapes the data the user entered off the screens and then moves it into the other systems that also require it.
This ensures uniformity of data between systems. Business and data edit rules can also be coded into RPA to normalize or correct data to the standards that the business or its systems set.
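As a rough illustration of that idea, here is a minimal Python sketch of edit rules that normalize a record once and then copy the same cleaned record into several systems. It is not the API of any RPA product: the field names, the `RULES` table, the sample invoice, and the plain dictionaries standing in for target systems are all assumptions made for the example.

```python
import re


def normalize_zip(value):
    """Keep digits only; return a 5-digit or ZIP+4 code, or None if invalid."""
    digits = re.sub(r"\D", "", value or "")
    if len(digits) == 5:
        return digits
    if len(digits) == 9:
        return f"{digits[:5]}-{digits[5:]}"
    return None


def normalize_name(value):
    """Collapse repeated whitespace and apply title case."""
    return " ".join((value or "").split()).title()


# Hypothetical rule table: field name -> normalization function.
RULES = {"customer_name": normalize_name, "zip": normalize_zip}


def apply_rules(record, rules=RULES):
    """Return a copy of the record with each rule applied to its field."""
    cleaned = dict(record)
    for field, rule in rules.items():
        if field in cleaned:
            cleaned[field] = rule(cleaned[field])
    return cleaned


def propagate(record, target_systems):
    """Clean the record once, then store a copy in every target system."""
    cleaned = apply_rules(record)
    for system in target_systems:
        system[cleaned["invoice_id"]] = dict(cleaned)
    return cleaned


# Plain dicts stand in for the downstream systems (an assumption).
billing, erp = {}, {}
invoice = {
    "invoice_id": "INV-1001",
    "customer_name": "  acme   supply co ",
    "zip": "94105 1234",
}
result = propagate(invoice, [billing, erp])
```

Because every system receives the output of the same rule set, the records cannot drift apart through rekeying, which is the uniformity benefit described above.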
Where RPA fits (and doesn't fit) in data cleansing
Because you can program your own data edit and normalization rules into an RPA routine, you can automate the manual work that users must sometimes do to ensure high-quality data for analytics.
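The kind of manual check this could replace can be sketched in a few lines of Python. The example below sorts records into clean, incomplete, and duplicate groups. The required fields, the matching rule (ignore case and extra spaces), and the sample records are hypothetical; a real routine would follow whatever standards the business sets.

```python
# Hypothetical set of fields that every record must contain.
REQUIRED_FIELDS = ("name", "address", "zip")


def record_key(record):
    """Build a comparison key that ignores case and extra whitespace."""
    return tuple(
        " ".join(str(record.get(field) or "").lower().split())
        for field in REQUIRED_FIELDS
    )


def triage(records):
    """Split records into clean, incomplete, and duplicate lists."""
    clean, incomplete, duplicates = [], [], []
    seen = set()
    for record in records:
        # A record is incomplete if any required field is missing or blank.
        if any(not str(record.get(f) or "").strip() for f in REQUIRED_FIELDS):
            incomplete.append(record)
            continue
        key = record_key(record)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            clean.append(record)
    return clean, incomplete, duplicates


# Made-up sample data for illustration only.
records = [
    {"name": "Ana Diaz", "address": "12 Elm St", "zip": "60601"},
    {"name": "ana  diaz", "address": "12 elm st", "zip": "60601"},
    {"name": "Bo Chen", "address": "", "zip": "73301"},
]
clean, incomplete, duplicates = triage(records)
```

Incomplete and duplicate records are set aside rather than silently dropped, so a person can still review the cases the rules cannot resolve.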
There are also some limitations. For instance, RPA can only operate on standard, structured, transactional data. It does not work on big data.
But RPA is a tool that can be added to an analytics tool set that uses big data.
Most analytics draw on a combination of structured and unstructured data. For instance, if you want to model the incidence of COVID-19 among residents in your city and map the hot spots, you have to combine the transactional data from medical systems with mapping tools on the big data side.
It's essential to cleanse all of this data to ensure accurate results. While your data staff will use specialized tools to cleanse unstructured big data, they can also plug in RPA to cleanse the transactional, structured data that is a part of the analytics.
Over time, new business rules for RPA can be developed that improve performance. In some cases, organizations have even used machine learning to train the RPA logic for continuous process improvement and higher-quality transactional data.
Consider RPA an option in your analytics
While RPA's primary purpose is automation of transactional data entry that saves end users time, it can also be used as a tool to assist in the upfront cleansing of transactional data that is subsequently used in analytics.
In this respect, IT can leverage tools like RPA that at first glance seem unrelated to the analytics data cleansing process but that save valuable time for IT staff and data scientists, just as they save time for end users.