Migration of data from on-premises to cloud systems or between multiple cloud systems is a common and complex event across companies of all sizes and industries. The types of data being migrated can range from email messages to Office documents and PDF files to databases, website data and code repositories.

SEE: Data migration testing checklist: Through pre- and post-migration (TechRepublic Premium)

Regardless of how complex your data migration is, it’s important to test during the pre-migration, migration and post-migration stages.

This can be a tedious process: It’s very easy to miss a key step and hurt the overall security, performance and/or accuracy of your migration. However, if you take the time to automate your data migration testing processes, you can save yourself time in the long run while establishing a clear and controlled testing plan.

Types of data migration testing

It’s tricky to define the “types” of data migration testing that exist, because data migration testing can be categorized in a variety of ways. For starters, testing methodologies may look different depending on the type of systems you’re migrating to and from.

SEE: Best practices to follow for data migration (TechRepublic)

For each of the following types of data migration testing, it’s important to consider how much data is stored in the system, how the data is formatted and how it might need to be transformed going forward. Consider also any security or compliance features that are built into the system and how crucial that data is to daily business operations.

With that framing in mind, these are the different types of data migration testing, based on source system format:

  • Database migration testing
  • Operating system migration testing
  • Server migration testing
  • Application migration testing
  • Data center migration testing
  • Cloud migration testing

The type of data migration testing you choose may also depend on other factors, such as your timeline, your budget, and the in-house resources and teams you have on hand to support the process.

Factors to consider when testing migrated data

The following ten data migration factors should be tested and confirmed functional to ensure the success of the migration cutover. While many of these factors should be tested pre-migration, several others need to be reviewed throughout the migration process — even post-migration.

  • Accessibility: The data can be accessed on the target system(s).
  • Accuracy: The data is intact and usable.
  • Reliability of transfer: Whether all of the data is transferred over to achieve a 100% transfer rate. Testing this will likely involve comparing dataset sizes on the source versus the target.
  • Reliability of automation: Whether the automated transfers can be counted on to kick off and complete their tasks as expected.
  • Speed: The rate at which data is transferred so as to establish a predictable baseline.
  • Repeatability: Whether the test can be run numerous times with the same results.
  • Error checking: Whether any errors occur in reading, transferring or writing the data elsewhere, and how these errors can be corrected.
  • Security: Making sure only the appropriate individuals and groups have access to the data on the target system(s).
  • Enrichment: Whether the data and access can be optimized on the target system(s).
  • Protection: The data is backed up and can be restored on the target system(s).
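
Several of these factors, particularly accuracy and reliability of transfer, can be checked by a script rather than by eye. The following is a minimal Python sketch, not a definitive implementation: it assumes both the source and the target are reachable as local or mounted directories, and the function names inventory and compare are hypothetical. It compares file names, sizes and SHA-256 checksums on both sides.

```python
import hashlib
from pathlib import Path


def inventory(root):
    """Map each file's path (relative to root) to (size_in_bytes, sha256_hex)."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reading whole files is fine for a sketch; stream large files in practice.
            data = path.read_bytes()
            key = path.relative_to(root).as_posix()
            result[key] = (len(data), hashlib.sha256(data).hexdigest())
    return result


def compare(source_root, target_root):
    """Report files missing on the target and files whose size or hash differs."""
    src = inventory(source_root)
    dst = inventory(target_root)
    return {
        "missing": sorted(set(src) - set(dst)),
        "mismatched": sorted(n for n in src.keys() & dst.keys() if src[n] != dst[n]),
        "source_bytes": sum(size for size, _ in src.values()),
        "target_bytes": sum(size for size, _ in dst.values()),
    }
```

An empty "missing" list, an empty "mismatched" list and matching byte totals would correspond to the 100% transfer rate described above.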

Data migration tools

While there are plenty of consumer-focused tools that can move relatively small sets of data from a single system to another, the focus of this article is on business-level migration tools, intended for larger datasets:

  • Apex Data Loader: An open source Salesforce data migrator.
  • AWS Data Pipeline: A solution that migrates data between AWS data stores.
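  • Azure Cosmos DB Data Migration Tool: An open source command line tool that imports data from various sources into Azure Cosmos DB.
  • Azure DocumentDB Data Migration Tool: An open source data migration tool by Microsoft. DocumentDB is the former name of Azure Cosmos DB.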
  • Configero Data Loader: A web-based data loader application for Salesforce.
  • Dell EMC Rainfinity: A data migration tool that works across heterogeneous environments.
  • IBM Informix: An SQL-based database whose data migration utilities work across multiple operating systems.
  • Informatica Cloud Data Wizard: A Salesforce data loader application that works with common and custom objects.
  • SnapLogic: An integration platform as a service tool.
  • Stitch Data: A cloud-based ETL platform.

Even the plain old rsync command is a quality data migration tool, and one I consider a go-to option. When vetting a potential data migration vendor, focus on compatibility with your environment, reliability, speed, security and scalability.
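
If you want to drive rsync from a script, one cautious approach is to build the command in code and default to a dry run. The sketch below is illustrative only: the helper name build_rsync_command and the example paths are hypothetical, while --archive, --itemize-changes, --dry-run and --delete are standard rsync options. It only builds the argument list and does not run anything.

```python
import shlex


def build_rsync_command(source, target, dry_run=True, delete=False):
    """Build an rsync argument list; defaults to a dry run that changes nothing."""
    cmd = ["rsync", "--archive", "--itemize-changes"]
    if dry_run:
        cmd.append("--dry-run")  # report what would be transferred without copying
    if delete:
        cmd.append("--delete")  # removes target files absent from the source; use with care
    # A trailing slash on the source tells rsync to copy the directory's contents.
    cmd += [source.rstrip("/") + "/", target]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, printed for review rather than executed.
    print(shlex.join(build_rsync_command("/data/source", "/mnt/target")))
```

Once the dry-run output looks right, the same list could be passed to subprocess.run(cmd, check=True) with dry_run=False.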

Strategies for automating data migration tests

Testing, done with plenty of time to spare before the official cutover deadline, usually makes up the bulk of the hard work in a data migration. The testing might be brief or extended, but it should be thoroughly conducted and confirmed before the process moves into the “live” phase.

An automated data migration approach is a key element here. You want this process to work seamlessly while also operating in the background with minimal human intervention. This is why I favor continuous or frequent replication to keep things in sync.

SEE: A guide to effective data migration testing (TechRepublic)

One common strategy is to run automated data synchronizations in the background via a scheduler or cron job, with each run syncing only new or changed data. After the initial full copy, each run typically transfers far less data than the first.

This is known as trickle data migration, and it works well because most companies use and update a small set of their data on a daily basis. An initial migration of 10TB of data on day one of testing might lead to a migration of merely 30GB of recently changed or updated data during the moments before the actual cutover.
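
To make the trickle idea concrete, here is a minimal Python sketch of an incremental sync. It is an illustration under stated assumptions, not how any particular tool works: both sides are assumed to be mounted directories, the function name sync_changed is hypothetical, and a file counts as "changed" when its size differs or the source copy has a newer modification time.

```python
import shutil
from pathlib import Path


def sync_changed(source_root, target_root):
    """Copy only files that are new or changed; return the relative paths copied."""
    source_root, target_root = Path(source_root), Path(target_root)
    copied = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = target_root / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Skip files whose size matches and whose source copy is not newer.
            if s.st_size == d.st_size and s.st_mtime <= d.st_mtime:
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel.as_posix())
    return copied
```

The first call copies everything; later calls return only what changed in between, which is the shrinking transfer size described above.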

Steps for automating data migration testing

Back up your data

Always make sure to back up your data before proceeding, even if your migration involves merely copying data from source to target. System and human errors can be a fearsome combination; I’ve seen rsync operations go horribly awry when the target was mistakenly synced back over the source and data was accidentally removed.
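
As one possible way to script this step, the sketch below writes a timestamped tar.gz archive of a source directory and then checks that the archive holds as many files as the source. It assumes the backup directory sits outside the source tree and that the tree contains regular files only (no symlinks); the function name back_up is hypothetical.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def back_up(source_root, backup_dir):
    """Archive source_root into backup_dir and verify the archived file count."""
    source_root, backup_dir = Path(source_root), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = backup_dir / f"{source_root.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source_root, arcname=source_root.name)
    # Re-open the archive and count regular files as a basic sanity check.
    with tarfile.open(archive, "r:gz") as tar:
        archived = sum(1 for member in tar.getmembers() if member.isfile())
    expected = sum(1 for path in source_root.rglob("*") if path.is_file())
    if archived != expected:
        raise RuntimeError(f"backup has {archived} files, expected {expected}")
    return archive
```

A file count is only a first check; a backup is not proven until a restore from it has been tested.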

Identify datasets, source systems and target systems for migration

Identify the data to be migrated and where it is to be migrated. There may be multiple sources and multiple targets involved and different priority levels for different datasets. Ensure you’re only going to migrate data you actually need — consider running a data deduplication solution to streamline your dataset at this point — but be cognizant of any requirements involving data retention policies so you adhere to them.
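
A dedicated deduplication product will do far more, but a simple script can give you a first look at how much duplicate content exists. This minimal sketch groups files by SHA-256 checksum; the function name find_duplicates is hypothetical, and it only reports duplicates rather than deleting anything, so retention requirements stay in your hands.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Return groups of relative paths whose file contents are byte-identical."""
    root = Path(root)
    by_hash = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Whole-file reads keep the sketch short; stream large files in practice.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path.relative_to(root).as_posix())
    return [paths for paths in by_hash.values() if len(paths) > 1]
```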

You should have a full understanding of what is located where. Most crucially, you should know the total amount of data to be migrated. You must ensure you have sufficient resources on the target end, particularly for data storage.
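
The capacity check can be scripted as well. The sketch below totals the source dataset and compares it against free space on the target, using Python's standard shutil.disk_usage. The function names and the 20% headroom figure are assumptions for illustration, not a sizing recommendation.

```python
import shutil
from pathlib import Path


def source_size_bytes(root):
    """Total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def has_capacity(source_root, target_root, headroom=0.2):
    """True if the target has room for the source data plus a safety margin."""
    needed = source_size_bytes(source_root) * (1 + headroom)
    free = shutil.disk_usage(target_root).free
    return needed <= free
```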

Use a trickle data migration strategy to test and migrate existing data

Whenever possible, plan to implement a trickle data migration strategy, where your source is synced to the target periodically and only new or changed files need to be transferred in subsequent runs. This means your first migration operation will be the longest and most complex. Enlist vendor support as needed.

Identify your automation technique and spot-check its accuracy

Identify the automatic techniques and principles that will ensure the data migration runs on its own. These should be applied across the board, regardless of the data sources and/or criticality, for consistency and simplicity’s sake.

Monitoring and alerts that notify your team of data migration progress are key elements to consider now. Because you simply can’t check hundreds or thousands of files one by one, manual data verification on the target end is best conducted as a “spot check” of a sample.
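
A spot check can itself be automated by sampling. The following sketch picks a random sample of source files and compares their SHA-256 checksums against the target. The function names, the default sample size of 25 and the assumption that both sides are mounted directories are all illustrative.

```python
import hashlib
import random
from pathlib import Path


def _sha256(path):
    """Checksum of a file's full contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def spot_check(source_root, target_root, sample_size=25, seed=None):
    """Verify a random sample of files; return relative paths that fail."""
    source_root, target_root = Path(source_root), Path(target_root)
    files = sorted(p.relative_to(source_root) for p in source_root.rglob("*") if p.is_file())
    # A fixed seed makes the same sample reproducible across runs.
    sample = random.Random(seed).sample(files, min(sample_size, len(files)))
    failures = []
    for rel in sample:
        dst = target_root / rel
        if not dst.is_file() or _sha256(dst) != _sha256(source_root / rel):
            failures.append(rel.as_posix())
    return sorted(failures)
```

A clean sample does not prove the whole dataset is intact, so treat this as a complement to full comparisons, not a replacement.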

Apply necessary security measures

Ensure security is properly applied in both the source and target environments, not only for data protection but also so that migration tools can function properly. Especially for certain industries and operating regions, it’s also important to consider what data governance and regulatory protocols need to be added or maintained.
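
One narrow piece of this can be checked with a script: whether file permission bits survived the move. The sketch below assumes POSIX-style permissions on both sides and mounted directories; the function name permission_mismatches is hypothetical. It does not cover ACLs, ownership, share-level permissions or cloud IAM policies, which need their own checks.

```python
import stat
from pathlib import Path


def permission_mismatches(source_root, target_root):
    """List (path, source_mode, target_mode) where permission bits differ."""
    source_root, target_root = Path(source_root), Path(target_root)
    mismatches = []
    for src in sorted(source_root.rglob("*")):
        rel = src.relative_to(source_root)
        dst = target_root / rel
        if not dst.exists():
            continue  # missing files are a transfer problem, reported elsewhere
        src_mode = stat.S_IMODE(src.stat().st_mode)
        dst_mode = stat.S_IMODE(dst.stat().st_mode)
        if src_mode != dst_mode:
            mismatches.append((rel.as_posix(), oct(src_mode), oct(dst_mode)))
    return mismatches
```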

Go into live testing with test data

Implement the solution and conduct a live test with non-critical data. This often involves using dummy files, but you should avoid using empty files; they won’t be useful, as you want to confirm the contents are the same on the target and the source system.
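
Dummy files with real content are easy to generate. This minimal sketch writes a handful of files filled with random bytes and returns a manifest of their SHA-256 checksums, so the contents can be confirmed on the target after the test run. The function name, file naming pattern and default sizes are hypothetical choices.

```python
import hashlib
import os
from pathlib import Path


def make_test_files(root, count=5, size_bytes=1024):
    """Create non-empty dummy files; return {file_name: sha256_hex}."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for i in range(count):
        data = os.urandom(size_bytes)  # random content, so no two files match
        name = f"dummy_{i:04d}.bin"
        (root / name).write_bytes(data)
        manifest[name] = hashlib.sha256(data).hexdigest()
    return manifest
```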

Configure automation and monitor results

Configure and run the automated data migration process and monitor the results. Ensure every factor listed under “Factors to consider when testing migrated data” in this article is satisfactorily met.

This task, as well as the rest of these steps, can be handled by an internal data migration team, but it may also be necessary to bring in vendor support to implement this level of automation and testing.
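
For the monitoring side, even a thin wrapper around each automated step gives you timing and error data to alert on. The sketch below is one hypothetical shape for that wrapper: run_with_report and the report keys are invented for illustration, and a real setup would forward the log output to whatever monitoring system your team already uses.

```python
import logging
import time

logger = logging.getLogger("migration")


def run_with_report(step, *args, **kwargs):
    """Run one migration step; record duration, result and any error."""
    report = {"ok": True, "error": None, "result": None}
    started = time.monotonic()
    try:
        report["result"] = step(*args, **kwargs)
    except Exception as exc:
        # Capture the failure instead of crashing the scheduler job.
        report["ok"] = False
        report["error"] = f"{type(exc).__name__}: {exc}"
        logger.error("migration step failed: %s", report["error"])
    report["seconds"] = time.monotonic() - started
    return report
```

Recording the duration of every run also builds the speed baseline mentioned earlier, which makes a suddenly slow or failing run easier to spot.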

Read next: Top cloud and application migration tools (TechRepublic)
