Mitigate Golang Flaky Testing Pain with Rerun Support

Background

● In API automation testing, tests that run in a heterogeneous environment often encounter issues such as:

○ intermittent unstable network

○ 3rd party API temporary outage

○ temporary OS environment issues

● Test case failures in such situations do not truly represent product code failures. Triaging these kinds of test failures is often manual and time consuming. To mitigate the risk of false alarms and improve testing efficiency, we need a solution that automatically reruns a test case upon failure.

● Similar RERUN features exist in test frameworks for other languages, such as pytest and JUnit. To our knowledge, however, this is the first time RERUN functionality has been introduced for the Golang testing framework.

● In typical automation testing, users simply deploy all services under test in the same environment or network, which is less complicated. In our team, however, we aim to simulate a heterogeneous environment as closely as we can when conducting API automation testing. This inevitably increases both the complexity and the likelihood of intermittent failures. To achieve higher automation ROI and testing efficiency in heterogeneous environments, a RERUN feature is necessary.

● We use the Golang testing framework to conduct API functional testing. Unfortunately, it supports only three statuses for a test case: PASS, FAIL, and SKIP. There is no existing way to represent a “RERUN” case. To overcome this, we represent “RERUN” as follows: under “RERUN” mode, if a case runs more than once, the earlier result is marked as “SKIPPED” by our report cleanup utility, which is part of the patent, with the detailed error message attached.

● The goal is to allow test reruns in the Golang testing framework when test cases fail due to interim issues, since the framework itself has no rerun functionality.

Solution

● The solution has two components: one performs the “RERUN”, the other performs the report cleanup.

○ RERUN:

■ FIG. 1 illustrates: Upon test case failure, a test case rerun will be triggered to tackle interim failure issues.

■ FIG. 3 illustrates: The core code implementation of RERUN

○ Report Cleanup:

■ FIG. 2 illustrates: Upon completion of the automation test suites, the Report Cleanup tidies up the report so that the earlier runs of rerun cases are marked as skipped, while the final runs keep their pass or fail status.

■ FIG. 4 illustrates: The core code implementation of Report Cleanup

■ FIG. 5 illustrates: An example of a test case being skipped, where the next test case has the same name.

● FIG. 1 Golang Rerun

[102] The Golang testing will start as per normal.

[103] The test suite will initialise with the necessary setups or data preparation.

[104] Iteratively, each test case will start to run beginning from the first one.

[105] If the test case passes, no rerun is triggered and the flow continues to the next test case (if any). If the test case fails, proceed to [106].

[106] Upon failure, Golang checks whether the “RERUN” environment variable is set. If it is, the same case is triggered again [104]. Otherwise, the current test case is recorded as “FAIL” and the flow proceeds to the next test case [103].

[107] If the test suite has not ended, proceed to the next test case [103]. Otherwise, the process ends [108].

Example (OS environment RERUN flag set to true):

Test Case	Initial Run	Rerun
Test Case A	Pass	nil
Test Case B	Fail	Pass
Test Case C	Fail	Fail
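
The flow in FIG. 1 can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the implementation shown in FIG. 3: the names `TestCase`, `Result`, `rerunEnabled`, and `runSuite` are hypothetical, test cases are modelled as plain functions returning an error rather than `testing.T` tests, and treating only the literal value `true` as enabling the flag is our assumption.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Result is one recorded execution of a test case (hypothetical type).
type Result struct {
	Name   string
	Status string // "PASS" or "FAIL"
}

// TestCase models a test as a named function; a non-nil error means failure.
type TestCase struct {
	Name string
	Run  func() error
}

// errTransient stands in for an interim failure such as a network glitch.
var errTransient = errors.New("temporary network error")

// rerunEnabled reports whether the RERUN environment variable is "true".
func rerunEnabled() bool {
	return os.Getenv("RERUN") == "true"
}

func status(err error) string {
	if err != nil {
		return "FAIL"
	}
	return "PASS"
}

// runSuite runs every case once and, when rerun is true, runs a failed case
// one more time. Each execution is recorded, so a rerun case appears twice
// in a row under the same name.
func runSuite(cases []TestCase, rerun bool) []Result {
	var results []Result
	for _, tc := range cases {
		err := tc.Run()
		results = append(results, Result{tc.Name, status(err)})
		if err != nil && rerun {
			results = append(results, Result{tc.Name, status(tc.Run())})
		}
	}
	return results
}

func main() {
	attempts := 0
	cases := []TestCase{
		{"TestCaseA", func() error { return nil }},
		{"TestCaseB", func() error { // fails on the first attempt only
			attempts++
			if attempts == 1 {
				return errTransient
			}
			return nil
		}},
		{"TestCaseC", func() error { return errors.New("product bug") }},
	}
	for _, r := range runSuite(cases, rerunEnabled()) {
		fmt.Println(r.Name, r.Status)
	}
}
```

Recording both executions under the same name is what lets a later cleanup step recognise a rerun simply by comparing adjacent entries.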

● FIG. 2 Report Cleanup

[201] When testing completes, a .xml file is generated.

[202] The report cleanup will be triggered to clean up the .xml file.

[204] For each test case, the tool checks whether it has the same name as the next test case. If yes, proceed to [205]. Otherwise, the status (Pass/Fail) remains unchanged and the flow proceeds to [206].

[205] The current test case is marked as “Skipped”.

[206] The tool checks whether there are any more test cases to process. If yes, proceed to [203]; otherwise, the process ends [207].

Example (illustration of a sample .xml file):

Test Execution Report	Status (before cleanup)	Status (after cleanup)
Test Case A	Pass	Pass
Test Case B (initial run)	Fail	Skipped
Test Case B (rerun)	Pass	Pass
Test Case C (initial run)	Fail	Skipped
Test Case C (rerun)	Fail	Fail
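
The cleanup pass in FIG. 2 can be sketched along these lines. This is an illustration only, not the code in FIG. 4: the source does not specify the schema of the .xml file, so the JUnit-style element and attribute names (`testsuite`, `testcase`, `failure`, `skipped`, `message`) are assumptions, as are the type names and the `cleanup` function.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// The element and attribute names below assume a JUnit-style report.
type Failure struct {
	Message string `xml:"message,attr"`
	Text    string `xml:",chardata"`
}

type Skipped struct {
	Message string `xml:"message,attr"`
}

type TestCase struct {
	Name    string   `xml:"name,attr"`
	Failure *Failure `xml:"failure,omitempty"`
	Skipped *Skipped `xml:"skipped,omitempty"`
}

type TestSuite struct {
	XMLName xml.Name   `xml:"testsuite"`
	Cases   []TestCase `xml:"testcase"`
}

// cleanup marks a test case as skipped when the next case has the same name,
// i.e. when it was rerun. The original failure message is kept on the
// skipped element so the first error is still visible in the report.
func cleanup(s *TestSuite) {
	for i := 0; i+1 < len(s.Cases); i++ {
		cur, next := &s.Cases[i], s.Cases[i+1]
		if cur.Name != next.Name {
			continue // final run of this case: status stays as it is
		}
		msg := "rerun"
		if cur.Failure != nil {
			msg = "rerun after failure: " + cur.Failure.Message
		}
		cur.Failure = nil
		cur.Skipped = &Skipped{Message: msg}
	}
}

// report is a made-up sample mirroring the table above.
const report = `<testsuite>
  <testcase name="TestCaseA"></testcase>
  <testcase name="TestCaseB"><failure message="timeout">dial tcp: i/o timeout</failure></testcase>
  <testcase name="TestCaseB"></testcase>
  <testcase name="TestCaseC"><failure message="want 200, got 500"></failure></testcase>
  <testcase name="TestCaseC"><failure message="want 200, got 500"></failure></testcase>
</testsuite>`

func main() {
	var suite TestSuite
	if err := xml.Unmarshal([]byte(report), &suite); err != nil {
		panic(err)
	}
	cleanup(&suite)
	out, err := xml.MarshalIndent(suite, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A real report typically also carries summary counters (total, failed, skipped) that would need to be recomputed after the cleanup; the sketch leaves that out.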

● OS Environment Rerun flag must be set to true before testing begins.

○ E.g. export RERUN=true

● Without this solution, an automated RERUN upon test failure is not possible, because the current Golang testing framework does not support RERUN.

● Possible extension: multiple reruns based on configurable parameters. Currently our solution reruns a failed case only once; it can be extended to support multiple reruns through configuration.
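
One way the multiple-rerun extension might look is sketched below. It is not part of the current solution: the `RERUN_COUNT` variable and the functions `maxReruns` and `runWithReruns` are hypothetical names chosen for illustration, and a test case is again modelled as a plain function returning an error.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// maxReruns reads the hypothetical RERUN_COUNT variable through getenv.
// It returns 0 when RERUN is not "true", and falls back to a single rerun
// (the behaviour of the current solution) when no valid count is given.
func maxReruns(getenv func(string) string) int {
	if getenv("RERUN") != "true" {
		return 0
	}
	n, err := strconv.Atoi(getenv("RERUN_COUNT"))
	if err != nil || n < 1 {
		return 1
	}
	return n
}

// runWithReruns executes run up to 1+reruns times and stops at the first
// success. It returns the outcome of every attempt in order.
func runWithReruns(run func() error, reruns int) []error {
	var attempts []error
	for i := 0; i <= reruns; i++ {
		err := run()
		attempts = append(attempts, err)
		if err == nil {
			break
		}
	}
	return attempts
}

// errTransient stands in for an interim failure.
var errTransient = errors.New("temporary outage")

func main() {
	calls := 0
	flaky := func() error { // succeeds on the third attempt
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	attempts := runWithReruns(flaky, maxReruns(os.Getenv))
	fmt.Printf("attempts=%d, final error=%v\n",
		len(attempts), attempts[len(attempts)-1])
}
```

Passing `os.Getenv` as a parameter keeps the configuration lookup easy to exercise in isolation. With several attempts recorded under one name, the cleanup rule of marking every run except the last as skipped would still apply unchanged.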

● This solution can be applied in any field as long as the Golang testing framework is used. It is not limited to API testing or microservice scenarios.

Outcome

● We invest resources in API automation testing to improve engineering productivity and engineering ROI. However, triaging test failures is truly time consuming. In a heterogeneous environment, it is not rare to see a test case fail due to an unstable network, OS issues, or an interim 3rd party outage. A tool that reruns a test case when its previous run failed can increase testing stability and improve engineer productivity.

● For example, suppose 10K cases run in a complicated heterogeneous environment with a total failure rate of 1% (100 cases): 0.2% (20) due to product code issues and 0.8% (80) due to interim failures. If one case takes 5 min to triage, the 100 failures require 500 min (about 8 hrs, roughly one person-day). With the rerun function, we can reduce interim failures to 0.1% (10), saving about 70 × 5 = 350 min (~6 hrs) per day, even on weekends and holidays.
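
The arithmetic in this example can be checked with a few lines of Go. The function name `triageMinutes` is ours, and the figures are the ones quoted above (20 product failures, 80 interim failures before rerun, 10 after, 5 minutes per case).

```go
package main

import "fmt"

// triageMinutes returns the time needed to triage the given number of
// failures at a fixed cost per failure.
func triageMinutes(failures, minutesPerCase int) int {
	return failures * minutesPerCase
}

func main() {
	const (
		productFailures = 20 // 0.2% of 10,000 cases
		interimBefore   = 80 // 0.8% of 10,000 cases
		interimAfter    = 10 // interim failures remaining with rerun
		minutesPerCase  = 5
	)
	before := triageMinutes(productFailures+interimBefore, minutesPerCase)
	after := triageMinutes(productFailures+interimAfter, minutesPerCase)
	// prints: before: 500 min, after: 150 min, saved: 350 min (5.8 h)
	fmt.Printf("before: %d min, after: %d min, saved: %d min (%.1f h)\n",
		before, after, before-after, float64(before-after)/60)
}
```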

● This solution can be used in any scenario where Golang testing is used for automation testing, especially API testing. If a company runs many Golang services, this solution should help the organization achieve better engineering ROI.

Published by Huang Luohua Locke