By Jeff Tickner | 19 Nov 2020
Every IT organization does some kind of testing, whether it is unit testing by developers, QA testing by QA staff or full-on automated regression testing. In each of these techniques, one area that is largely underestimated is the role of test data. Yet how effectively this data is managed has a direct impact on the time spent in testing and the quality of the results. We will look at why test data can be hard to manage, which best practices we recommend based on our experience, and which tangible benefits you stand to gain from a consistent set of test data.
1. Why does test data matter?
Let’s talk first about the foundation of testing: the test data set. Everyone understands that having data to test against is a requirement, but how should that data be managed? Too often I see staff testing against data that becomes more and more corrupted as testing progresses, causing friction in the testing process. When a test fails or does not produce the expected results, users have to determine whether those erroneous results were caused by the program changes they are testing or by data that is bad or shifting while they are testing. Most commonly they report a defect back to a developer, who then has the challenge of trying to reproduce the error, which is impossible if the developer is using a ‘better’ set of data. You can easily see how bad data can have a snowball effect on time wasted in testing. So how can we manage data to keep testing efficient?
Before we dig deeper into data, let’s look more closely at testing itself. We should distinguish between two main types: unit testing and functional testing (commonly used for regression testing). Arcad has experience with both types (since we have tools for both), but we aren’t here to talk about our tools. We are here to talk about one of the most critical aspects of good testing: good data. Our tools have the same dependency as any other testing: bad data = bad testing.
2. Test data matters, even for unit tests!
Unit testing is normally very granular and involves calling a program or procedure with a set of parameter values and checking the return parameters for the correct values. However, some programs may chain out to files to validate or read additional data before returning values. So while unit testing is not normally considered data dependent, in certain cases (and especially with legacy applications developed on the IBM i platform) we come across monolithic programs that have many business functions behind the input and output parameters. If you don’t have consistent data for your programs to consume during unit testing, you may not get consistent results. Unit testing tools often have a provision for updating the data that the program will consume before the unit test is executed. If not, they can be scripted to set up this data. However, getting this provisioning right is not only time consuming but may also require technical skills in scripting a file update for the unit test.
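To make the idea of provisioning data before a unit test concrete, here is a minimal sketch in Python. It is not how any particular unit testing tool works: the `customers` table, the `credit_available` function and the seed rows are all hypothetical, and an in-memory SQLite database stands in for the files a real program would read.

```python
import sqlite3
import unittest

# Hypothetical fixture rows: the known state every test starts from.
SEED_CUSTOMERS = [(1, "ACME", 5000), (2, "GLOBEX", 0)]


def credit_available(conn, customer_id, order_total):
    """Hypothetical program under test: it reads a table before answering."""
    row = conn.execute(
        "SELECT credit_limit FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return row is not None and order_total <= row[0]


class CreditCheckTest(unittest.TestCase):
    def setUp(self):
        # Rebuild the data before every test so results never depend on
        # what an earlier test (or another tester) left behind.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE customers "
            "(id INTEGER PRIMARY KEY, name TEXT, credit_limit INTEGER)"
        )
        self.conn.executemany(
            "INSERT INTO customers VALUES (?, ?, ?)", SEED_CUSTOMERS
        )

    def tearDown(self):
        self.conn.close()

    def test_order_within_limit(self):
        self.assertTrue(credit_available(self.conn, 1, 1200))

    def test_order_over_limit(self):
        self.assertFalse(credit_available(self.conn, 2, 10))
```

Because `setUp` runs before each test method, the two tests can run in any order and still see the same data.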
3. Consistent data is the foundation of functional regression testing
Functional testing is based on executing user-level business functions, just like the test cases that are developed to test and verify that a specific function is operating correctly. Often functional tests are scripted or executed and recorded in a tool (like Arcad Verifier) and the results are checked for consistent values. Once these functional tests are developed they can be used again and again for regression testing, in which one or more functional tests are executed to validate normal application operation after an update to QA. This technique is very dependent on having consistent data so that the functional tests return consistent results and pass.
If the data changes because other processes are running while the scripted tests execute, a test can fail even though the application itself is working correctly: a false positive. So here again consistent data = consistent results, which enables a much higher level of automation. This makes the importance of having a dedicated set of test data even more apparent.
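The effect of shifting data on a regression run can be illustrated with a small Python sketch. Everything here is hypothetical (the `order_totals` business function, the order rows and the `regression_diff` helper); it only shows the principle of comparing a run against a recorded baseline, not how a tool such as Arcad Verifier does it.

```python
def order_totals(orders):
    """Hypothetical business function: total order value per customer."""
    totals = {}
    for customer, amount in orders:
        totals[customer] = totals.get(customer, 0) + amount
    return totals


def regression_diff(actual, baseline):
    """Return the keys whose values differ between a run and the baseline."""
    keys = set(actual) | set(baseline)
    return sorted(k for k in keys if actual.get(k) != baseline.get(k))


# Known test data and the baseline recorded against it.
test_orders = [("ACME", 100), ("GLOBEX", 250), ("ACME", 50)]
baseline = order_totals(test_orders)

# Same code, same data: the regression run reports no differences.
clean_run = regression_diff(order_totals(test_orders), baseline)

# Same code, but another process added a row while the test was running.
drifted_orders = test_orders + [("GLOBEX", 75)]
drifted_run = regression_diff(order_totals(drifted_orders), baseline)

print(clean_run)    # []
print(drifted_run)  # ['GLOBEX']
```

The program did not change between the two runs, yet the second one reports a difference. Without a dedicated, stable data set, someone has to investigate that difference by hand.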
4. How to ensure the privacy of test data – and refresh it when needed?
The most common reason why businesses do not have multiple sets of test data is that the starting point is production data, which can have millions of records and/or sensitive data. Arcad has a tool to handle that case as well (DOT Anonymizer), but that is another topic; we are here to talk about managing that data to your advantage. Even if you build a good set of test data that is big enough to support all of your test cases but no bigger, it may still contain sensitive data that is of high value to your organization and that requires masking and otherwise careful handling. A recommended best practice is to keep a copy of that set aside for refreshing QA environments, ideally in a SAVF so that it cannot be inadvertently updated. Alternatively, building a function to refresh that data from production (or using an Arcad tool) is an option, and it also provides a means of refreshing test data with more recent production data. Either method means that when a set of test data gets corrupted, it is easy to refresh it with good data and, in the case of automation, with consistent values.
This ability to automatically refresh data is valuable for all the testing you do, even developer-driven unit testing, which may be dependent on data read from files. Certainly, QA staff doing functional testing need the ability to refresh their data easily, since they are trying to uncover bugs that might corrupt their test data as they are testing. Even testing tools that manage data updates through a rollback process can leave behind bad data if the QA staff hit a significant error and the testing script process is interrupted.
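The refresh-from-a-protected-copy practice can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a read-only SQLite file stands in for the SAVF, and the `accounts` table, file names and helper functions are hypothetical.

```python
import os
import shutil
import sqlite3
import stat
import tempfile


def build_golden(path):
    """Create the reference data set once, then mark the file read-only."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 200)])
    conn.commit()
    conn.close()
    os.chmod(path, stat.S_IREAD)  # guard against inadvertent updates


def refresh(golden_path, working_path):
    """Overwrite the working copy with the golden copy's contents."""
    # copyfile copies data only, not permissions, so the working copy
    # remains writable for the tests that use it.
    shutil.copyfile(golden_path, working_path)


def balances(path):
    """Read back the account rows so the state can be inspected."""
    conn = sqlite3.connect(path)
    rows = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
    conn.close()
    return rows


workdir = tempfile.mkdtemp()
golden = os.path.join(workdir, "golden.db")
working = os.path.join(workdir, "qa.db")
build_golden(golden)
refresh(golden, working)

# A test run corrupts the QA data...
conn = sqlite3.connect(working)
conn.execute("UPDATE accounts SET balance = -1")
conn.commit()
conn.close()

# ...and a refresh restores the known state.
refresh(golden, working)
print(balances(working))  # [(1, 100), (2, 200)]
```

Keeping the refresh as a single function call is what makes it practical to run before every automated test cycle rather than only when someone notices the data has gone bad.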
5. To sum up: continuous test (CT) needs consistent data!
We’ve seen that good, consistent data is key to all forms of testing and that we should be able to restore it easily when it’s corrupted. It is a bonus if that process can also be used to refresh existing data with more current or realistic values. We need multiple sets of data to support concurrent testing efforts without conflict, so each set should be a subset of the complete production data: big enough to cover the test cases, but small enough to copy and refresh easily. This provides the foundation for our testing methodology and it is the very basis of automating testing.
Consistent data providing consistent results means that not only can the test cases be scripted, but also that defects can be detected automatically during an automated test run. That translates to far fewer manual actions, faster results and lower overall cost. It is an often-ignored prerequisite for true continuous test and an optimized DevTestOps cycle.