Regex handling in deqp-runner has high memory overhead for dEQP
We are hitting memory limits in deqp-runner because of how dots (.) in dEQP test case expectation files are handled. Dots normally separate test subgroups in a test name, but they are being interpreted as regex wildcard characters, so the underlying regex library compiles them as wildcards unnecessarily. As expectation files grow in size and number, this degrades performance, and in some cases the compiled regex set exceeds the memory limit.
Proposed Solutions:
- Primary Fix (Implemented by !72 (merged)): Adjust the deqp-runner to treat lines that are not explicitly intended as regexes as fixed strings. This includes requiring all inputs to explicitly escape dots when they are meant to be interpreted as regex wildcards.
- Secondary Option: Increase the memory limit using the `RegexSetBuilder`. However, as highlighted in the regex documentation, this approach might not be scalable due to potential exponential growth in the automata size.
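
To illustrate the idea behind the primary fix, here is a minimal sketch that splits expectation lines into fixed strings and regex candidates before anything reaches the regex library. It is not the code from !72. The names `looks_like_regex` and `Expectations` are hypothetical, and the rule "a line is a regex only if it contains a metacharacter other than a dot" is an assumption made for this example.

```rust
use std::collections::HashSet;

/// Hypothetical heuristic: a line counts as a regex only if it contains a
/// metacharacter other than '.'. This rule is an assumption for illustration,
/// not the rule deqp-runner actually applies.
fn looks_like_regex(line: &str) -> bool {
    line.chars().any(|c| {
        matches!(
            c,
            '*' | '+' | '?' | '(' | ')' | '[' | ']' | '{' | '}' | '|' | '^' | '$' | '\\'
        )
    })
}

/// Expectation lines split into exact test names and regex candidates.
#[derive(Debug, Default)]
struct Expectations {
    /// Plain test names, compared by string equality (no regex compilation).
    literals: HashSet<String>,
    /// Lines that still need to be handed to the regex engine.
    patterns: Vec<String>,
}

impl Expectations {
    /// Classify each non-empty line as a fixed string or a regex candidate.
    fn parse(text: &str) -> Self {
        let mut out = Expectations::default();
        for line in text.lines().map(str::trim) {
            if line.is_empty() {
                continue;
            }
            if looks_like_regex(line) {
                out.patterns.push(line.to_string());
            } else {
                out.literals.insert(line.to_string());
            }
        }
        out
    }

    /// Exact-name lookup; dots in `test_name` are ordinary characters here.
    fn matches_literal(&self, test_name: &str) -> bool {
        self.literals.contains(test_name)
    }
}
```

With this split, a line such as `dEQP-VK.api.info.instance` lands in `literals` and is checked with a hash lookup, while a line such as `dEQP-VK.glsl.*` lands in `patterns`. Only the second group would contribute to the size of a compiled regex set.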
These changes aim to ensure that our naming conventions for readability do not inadvertently affect performance or cause legitimate jobs to fail due to large expectation files.
Additional Context and References:
- This issue was first identified during a job execution, where the regex set exceeded the allowable memory size. The problematic interpretation of dots as regex characters has led to significant stability concerns.
- Reference to the problematic expectation file and job log:
Please review the adjustments proposed in this merge request and provide feedback or suggest further optimizations. This resolution is essential for maintaining the efficiency and reliability of our testing infrastructure.