Self test breakage on armhf and ppc64el in spice 14.2
Hi, I was taking a look at spice 0.14.2 and realized that the build is breaking on armhf and ppc64el. I'm on Ubuntu Eoan and the issue is reproducible in a local build on ppc64el. On the other hand amd64, arm64 and i386 build fine.
Build logs can be found in my experimental PPA. Here you can see and compare working and failing ones.
On y ppc64el system I fetched all the usual build dependencies and ran a build in a container. The issue reproduced and at first looks like this:
../../test-driver: line 107: 35292 Aborted (core dumped) "$@" > $log_file 2>&1
FAIL: test-qxl-parsing
[...]
FAIL: test-qxl-parsing
======================
/server/memslot-invalid-addresses: **
Spice:ERROR:test-qxl-parsing.c:110:test_memslot_invalid_addresses: stderr of child process (/server/memslot-invalid-addresses/subprocess/slot_id [35295]) failed to match: *slot_id 1 too big*
stderr was:
FAIL test-qxl-parsing (exit status: 134)
It seems after the build I can run the test as-is and trigger it (without the build system):
./server/tests/test-qxl-parsing
/server/memslot-invalid-addresses: **
Spice:ERROR:test-qxl-parsing.c:110:test_memslot_invalid_addresses: stderr of child process (/server/memslot-invalid-addresses/subprocess/slot_id [35440]) failed to match: *slot_id 1 too big*
That file is not stripped so maybe debug info shows anything. Both functions are in the binary.
- memslot_get_virt (supposed to reply)
- test_memslot_invalid_addresses (the test)
As expected there isn't much to see on the side of test_memslot_invalid_addresses
.
The abstraction by g_test_* hides most of it so I only know what I knew before that the stderr isn't containing the expected content.
With set follow-fork-mode child
I get to the backend that is spawned for the test.
Unfortunately slot_id is optimized, so I inserted -O0
into the build options and ran the same again.
To make sure I don't hit (by accident) the first check for groupid I commented this section in server/tests/test-qxl-parsing.c
// g_test_trap_subprocess("/server/memslot-invalid-addresses/subprocess/group_id", 0, 0);
// g_test_trap_assert_stderr("*group_id too big*");
Still it fails on ppc64el
and works on amd64
.
Entering gdb with the same settings again now shows that on ppc64 memslot_get_id(info, addr)
really is returning 0 while on x86 it is 1 and due to that triggering the bug on just ppc64.
ppc64el:
Thread 2.1 "test-qxl-parsin" hit Breakpoint 1, memslot_get_virt (info=0x7fffffffe860, addr=0, add_size=16, group_id=0) at memslot.c:100
100 if (group_id >= info->num_memslots_groups) {
(gdb) p *info
$1 = {mem_slots = 0x1002365a0, num_memslots_groups = 1, num_memslots = 1, mem_slot_bits = 1 '\001', generation_bits = 1 '\001', memslot_id_shift = 63 '?', memslot_gen_shift = 62 '>', internal_groupslot_id = 0 '\000', memslot_gen_mask = 1,
memslot_clean_virt_mask = 4611686018427387903}
(gdb) p group_id
$2 = 0
(gdb) p info->num_memslots_groups
$3 = 1
(gdb) n
105 slot_id = memslot_get_id(info, addr);
(gdb) n
106 if (slot_id >= info->num_memslots) {
(gdb) p slot_id
$4 = 0
amd64:
Thread 2.1 "test-qxl-parsin" hit Breakpoint 1, memslot_get_virt (info=0x7fffffffe0a0, addr=18446744071562067968, add_size=16, group_id=0) at memslot.c:100
100 if (group_id >= info->num_memslots_groups) {
(gdb) p *info
$1 = {mem_slots = 0x5555556e75b0, num_memslots_groups = 1, num_memslots = 1, mem_slot_bits = 1 '\001', generation_bits = 1 '\001', memslot_id_shift = 63 '?', memslot_gen_shift = 62 '>', internal_groupslot_id = 0 '\000', memslot_gen_mask = 1,
memslot_clean_virt_mask = 4611686018427387903}
(gdb) p group_id
$2 = 0
(gdb) p info->num_memslots_groups
$3 = 1
(gdb) n
105 slot_id = memslot_get_id(info, addr);
(gdb) n
106 if (slot_id >= info->num_memslots) {
(gdb) p slot_id
$4 = 1
It might be worth to note that amd64 actually has addr
being set when called, while ppc64 doesn't (=0).
I think (not entirely sure of the plumbing/infrastructure here) that comes from the call in test_memslot_invalid_slot_id
.
The backtrace is the same on x86/ppc64
(gdb) bt
#0 0x000000010000e868 in memslot_get_virt (info=0x7fffffffe860, addr=0, add_size=16, group_id=0) at memslot.c:106
#1 0x000000010000d354 in test_memslot_invalid_slot_id () at test-qxl-parsing.c:101
#2 0x00007ffff70b05dc in () at /lib/powerpc64le-linux-gnu/libglib-2.0.so.0
#3 0x00007ffff70b038c in () at /lib/powerpc64le-linux-gnu/libglib-2.0.so.0
#4 0x00007ffff70b038c in () at /lib/powerpc64le-linux-gnu/libglib-2.0.so.0
#5 0x00007ffff70b038c in () at /lib/powerpc64le-linux-gnu/libglib-2.0.so.0
#6 0x00007ffff70b0c6c in g_test_run_suite () at /lib/powerpc64le-linux-gnu/libglib-2.0.so.0
#7 0x00007ffff70b0d20 in g_test_run () at /lib/powerpc64le-linux-gnu/libglib-2.0.so.0
#8 0x000000010000e2f0 in main (argc=1, argv=0x7ffffffff528) at test-qxl-parsing.c:329
So all the args should be from that test function.
The addr
arg is defined as 1 << mem_info.memslot_id_shift
The shift is defined as uint8_t memslot_id_shift;
and in both cases set to 63.
ppc64el
(gdb) p mem_info
$7 = {mem_slots = 0x1002365a0, num_memslots_groups = 1, num_memslots = 1, mem_slot_bits = 1 '\001', generation_bits = 1 '\001', memslot_id_shift = 63 '?', memslot_gen_shift = 62 '>', internal_groupslot_id = 0 '\000', memslot_gen_mask = 1,
memslot_clean_virt_mask = 4611686018427387903}
amd64
(gdb) p mem_info
$7 = {mem_slots = 0x5555556e75b0, num_memslots_groups = 1, num_memslots = 1, mem_slot_bits = 1 '\001', generation_bits = 1 '\001', memslot_id_shift = 63 '?', memslot_gen_shift = 62 '>', internal_groupslot_id = 0 '\000', memslot_gen_mask = 1,
memslot_clean_virt_mask = 4611686018427387903}
The target variable is of type QXLPHYSICAL
Which is from spice-protocol at (0.14) and defined as typedef uint64_t QXLPHYSICAL;
So we have 1 << (uint8_t=63) => uint64_t
delivering 0 in the amd
case and a value on ppc64el
.
1<<63 is actually exactly a full word overflow isn't it? Architectures might handle it this overflow differently and that is what seems to break ppc64el/armhf.
So we'd need to fix that code to more reliably generate an address in the second mem slot right? My assumption was that on armhf/ppc64el the 1 will be shifted, then part of the content lost and then converted. So I thought making it a 64bit value right away might help and in fact it did.
diff --git a/server/tests/test-qxl-parsing.c b/server/tests/test-qxl-parsing.c
index edccfee4..d5559f1c 100644
--- a/server/tests/test-qxl-parsing.c
+++ b/server/tests/test-qxl-parsing.c
@@ -99,6 +99,7 @@ static void test_memslot_invalid_slot_id(void)
init_meminfo(&mem_info);
memslot_get_virt(&mem_info, 1 << mem_info.memslot_id_shift, 16, 0);
+ memslot_get_virt(&mem_info, 1ULL << mem_info.memslot_id_shift, 16, 0);
}
static void test_memslot_invalid_addresses(void)
This would help IMHO. I'm reaching out to you with this bug report hoping for some guidance until we have that fixed, especially as I got into quite some assumptions eventually. Lets see if gitlab allows me to provide a PR, but even if not this one line change should be easy to merge.