Currently the Meson bits of the Weston test suite encode almost everything about how each test should be run: compositor command line arguments, environment variables, related files. That is inconvenient to maintain because the test setup is split between the test .c file and meson.build. The Meson code as it is written right now is also painful to deal with.
In the autotools era there was another reason why the test setup sucked: it was very hard to run a test manually, for example under GDB. The autotools test harness had a hideous bash script to set up everything. Meson helps a lot there with meson test --gdb, --wrapper and so on, so that part should no longer be an issue. We don't have the bash script any more, but today's meson.build does pretty much the same thing.
With autotools, we also had a trick to get compositor command line arguments from the test .c file: run the test executable with --params and it prints what it wants in the compositor command line. That feature is no longer used with Meson due to difficulties in running a program before it has been compiled.
The main idea here is to change the tests such that the test program is the entry point, not the compositor. !230 (merged) turns Weston frontend into a library we can link into a test program and then just call wet_main() to run the compositor with any arguments and environment we want. The arguments would be kept in the test .c file in their proper context. This would also make it easier to run certain tests multiple times with different compositor options (e.g. pixman renderer and GL renderer).
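Roughly along these lines (a minimal sketch; only wet_main() comes from !230, while the backend/renderer options and the environment variable shown are just illustrative choices):

```c
#include <stdlib.h>

/* Provided by the frontend library from !230. */
int wet_main(int argc, char *argv[]);

int main(void)
{
	/* The test program, not meson.build, decides how the compositor runs. */
	char *argv[] = {
		"weston",
		"--backend=headless-backend.so",
		"--use-pixman",		/* could be swapped for the GL renderer */
		NULL,
	};

	/* Environment could also be set up here instead of in meson.build. */
	setenv("WESTON_TEST_CLIENT_PATH", "weston-test-client", 1);

	return wet_main(3, argv);
}
```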
In the end, all tests should be launched the same way, regardless of whether they are stand-alone test programs, compositor plugins, Wayland clients, or a combination of a plugin and a client.
Not forking means the first sub-test that fails will stop the whole test (that is, one Meson test()). The only way around that would be for Meson to have one test() for each sub-test, which means each sub-test gets, for example, the compositor started from scratch. It would improve parallelism and test reporting, but also add a lot more total work to running the tests. I have an idea of how to make that happen neatly, but should I?
I'm already making each sub-test runnable in isolation for debugging purposes.
Currently in master we run each Meson test() in parallel, and each sub-test serially within (with forking).
Running the tests in a thread instead of a forked process means we must stop using assert() if we want to continue running the current executable after a failing test.
That can be left for later though, if necessary. We have no tests we'd expect to fail like that. It does mean the fail counts will not be accurate... unless I fake a skip count.
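For example, something like this hypothetical replacement for assert() (purely a sketch, not existing code) would record the failure and let the remaining sub-tests keep running:

```c
#include <stdio.h>

static int failed_checks;

/* Record the failure and bail out of the current sub-test only;
 * assumes sub-tests are void functions invoked by the harness. */
#define check(cond)							\
	do {								\
		if (!(cond)) {						\
			fprintf(stderr, "FAIL %s:%d: %s\n",		\
				__FILE__, __LINE__, #cond);		\
			failed_checks++;				\
			return;						\
		}							\
	} while (0)
```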
> That can be left for later though, if necessary. We have no tests we'd expect to fail like that. It does mean the fail counts will not be accurate... unless I fake a skip count.
Right. Given that we don't have expected-fail/expected-skip subtests, any subtest failing is a bug which needs to be fixed immediately. So it's slightly annoying to the developer that they have to fix the subtests one by one rather than being able to see up front a complete list of what's passing and failing, but not catastrophic I don't think.
Improving that at some point in the future with a real test framework (I don't really care which one it is - ZUC, GTest, check, whatever) would be great, but I don't really see a reason to tie 'rewrite all the tests away from assert()' in with 'move more harness setup into the tests themselves'.
Ok. I assume that I should just make multiple runs (a.k.a. fixture setups) work and move the test configuration completely into the test .c files, but not yet consider moving on to a proper test harness like Check.
This hierarchy also implies that test setups are completely defined inside the executables, and not partially in meson.build like they used to be.
The fixture setup level is new. It is needed for running the same test executable in different configurations, e.g. different Weston configurations. This will be used for running screenshot-based tests with both the Pixman and GL renderers.
Meson would be able to create test() targets separately, even down to the sub-case level if we want, by recording in meson.build just the number of { fixture setups, named tests } and, for each named test, the number of sub-cases, and iterating over them. The test executable would be passed an argument defining the exact test to run, fixturenumber/testnumber.subnumber, and an additional argument, fixturemax/testmax.submax, so that the test executable can verify that meson.build is up to date. Hence the only bits we need to maintain in meson.build are the test counts for each executable.
This same notation should allow manually running a specific test, without the verification argument.
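As a rough sketch of what the executable side of that could look like (all names, counts, and exit codes here are hypothetical):

```c
#include <stdio.h>

/* Counts this executable was built with; meson.build would carry copies.
 * (Simplified: in reality the sub-case count varies per named test.) */
#define FIXTURE_COUNT 2
#define TEST_COUNT 5
#define SUB_COUNT 3

int main(int argc, char *argv[])
{
	int fix, test, sub;
	int fmax, tmax, smax;

	if (argc < 2 || sscanf(argv[1], "%d/%d.%d", &fix, &test, &sub) != 3) {
		fprintf(stderr, "usage: %s fixture/test.sub [fixturemax/testmax.submax]\n",
			argv[0]);
		return 1;
	}

	/* Optional verification argument: catches a stale meson.build. */
	if (argc >= 3 &&
	    (sscanf(argv[2], "%d/%d.%d", &fmax, &tmax, &smax) != 3 ||
	     fmax != FIXTURE_COUNT || tmax != TEST_COUNT || smax != SUB_COUNT)) {
		fprintf(stderr, "meson.build is out of date\n");
		return 99;
	}

	printf("would run fixture %d, test %d, sub-test %d\n", fix, test, sub);
	return 0;
}
```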
Creating separate Meson test() down to the sub-test level probably becomes a little inconvenient to maintain in meson.build, so maybe it is best to stop at the named-test level or the fixture level.
> The test executable would be passed an argument defining the exact test to run, fixturenumber/testnumber.subnumber, and an additional argument, fixturemax/testmax.submax, so that the test executable can verify that meson.build is up to date. Hence the only bits we need to maintain in meson.build are the test counts for each executable.
Mm, this would be a good safety check if we want to expose every sub-test directly into the build definition. But I think it points us in the direction of not doing that.
> Creating separate Meson test() down to the sub-test level probably becomes a little inconvenient to maintain in meson.build, so maybe it is best to stop at the named-test level or the fixture level.
Right. Even if Meson itself only outputs a single test pass/fail, developers can easily read the raw test log, and GitLab CI can read the JUnit XML which can be exported by our test framework.
Given that, I'd be happiest where we are now, where the build system is only ever aware of separate test source files, and anything to do with subtests is only ever defined within those tests themselves.
Btw. Meson supports TAP from the test executables starting from version 0.50.
What JUnit XML? I don't think we create anything like that. The ZUC framework might, but we're not using that for most of the tests, and we replaced the program forking harness with Meson.
Since redacted exists, it seems reasonable to get JUnit from Meson's JSON. We use Meson to run each test program, the programs might be able to use TAP to report subtests, and then we convert the JSON to JUnit if JUnit is the format we need.
Hmm, so not even fixture setup count in meson.build. I'll have to see about that.
I don't think Meson produces TAP, it only consumes it. Meson produces a JSON file of test results and I'm trying to figure out if that format is specified or stable.
Right, Meson produces only a custom almost-JSON format (it does not validate as JSON: https://github.com/mesonbuild/meson/issues/5458), but jpakkane says it is meant for IDEs to consume, so it sounds like the format is stable.
The idea of making Meson write out proper JUnit came up in IRC with jpakkane, and sounded like it could be acceptable to upstream. Just needs someone to work on it, and that is not me. Given GNOME seems to have interest in bridging Meson tests to Gitlab, maybe someone from there?
Traditionally Weston test suite program source files do not contain main() at all. It comes from a static library instead. How surprising or confusing is that?
I'm weighing keeping that style against requiring a trivial boilerplate main() that would need to be copied into all programs.
gtest does exactly what you're doing (main() in a static lib), so it's not an alien idea for unit testing. Piglit does something kind of in the middle: there are macros to generate all of the boilerplate at the top and bottom of the main() function, but you can put your own code in the middle, a la:
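(Roughly like this, from memory; the exact macro and field names may differ.)

```c
#include "piglit-util-gl.h"

PIGLIT_GL_TEST_CONFIG_BEGIN

	/* Your own configuration goes in the middle of the generated code. */
	config.supports_gl_compat_version = 10;
	config.window_visual = PIGLIT_GL_VISUAL_RGBA;

PIGLIT_GL_TEST_CONFIG_END

void
piglit_init(int argc, char **argv)
{
	/* one-time setup */
}

enum piglit_result
piglit_display(void)
{
	return PIGLIT_PASS;
}
```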
Thanks. At the moment I'm thinking of an approach where main() lives in a static lib, and one uses macros to hook up optional setup functions. It'll be clear what I mean once I have some code to show.
weston_test_runner_execute_as_{client,plugin}() will start the compositor and run the tests from this program once (from a thread if client, from a compositor idle callback if plugin).
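To give a rough idea of the direction (everything below is a sketch: only the weston_test_runner_execute_as_client() name comes from the paragraph above, while the macros, types and signatures are illustrative and not final):

```c
#include <assert.h>

#include "weston-test-runner.h"	/* would provide main() via the static lib */

/* Hypothetical fixture setup hooked up by a macro; the runner would call it
 * once per fixture, letting it choose the compositor configuration before
 * handing control back. */
static int
fixture_setup(struct weston_test_harness *harness)
{
	const char *argv[] = { "weston", "--use-pixman" };

	return weston_test_runner_execute_as_client(harness, 2, argv);
}
DECLARE_FIXTURE_SETUP(fixture_setup);

TEST(smoke)
{
	assert(1 + 1 == 2);
}
```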
I should probably separate standalone programs from compositor-using programs and give each a different Meson dependency() object to link with. Then standalone programs do not pull in compositor libs for no reason. The test program definition in meson.build could also make use of this.
Yeah, I definitely like the idea of keeping the environment out of the build. One thing I would like to ultimately see is the tests being able to (relatively easily) inject their own configuration directly, rather than having separate text files. Giving more control to the test itself seems like the right thing to do there.
I have no real opinion on where main() should live, but I think the salient point is how easy each option makes it to fix what you described above, where an assert() failure prevents other subtests from running. (The answer there is probably recovering the useful bits of ZUC and actually using it consistently everywhere.)
Where main() lives has no relevance to whether the first assert() failure kills the whole test set. When we decided to never fork(), we also implicitly decided that a failing assert() is fatal to the whole set. So if we want the test set to continue beyond a failure, we must rewrite everything that uses assert() to clean up and return instead.
Right. What I meant is that ZUC already has the infrastructure to handle assertions in a way which doesn't necessarily involve killing the process. If that's something we can take and modify to ultimately avoid forking, I think that would be helpful.
('One thing I would like to ultimately see' -> 'in the future it would be awesome if we could'.)
One more complication with re-running a set of tests is that we have tests, especially screenshot tests, that write out result files. The fixture iteration would need to be part of the file name, or each iteration will overwrite the files of the previous one.
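For example, a tiny sketch of folding the fixture index into the output name (the function and naming scheme are made up):

```c
#include <stdio.h>

static void
screenshot_output_filename(char *buf, size_t len,
			   const char *test_name, int fixture_index)
{
	/* e.g. "alpha-blending-f01.png" instead of "alpha-blending.png",
	 * so a second fixture iteration does not overwrite the first. */
	snprintf(buf, len, "%s-f%02d.png", test_name, fixture_index);
}
```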