A CI Bug Log filter associated with this bug has been updated by rveesamx.
Description: BMG LNL: igt@xe_drm_fdinfo@drm-busy-exec-queue-destroy-idle - fail - Test assertion failure function check_results, Failed assertion: 95.0 < percent
Equivalent query: runconfig_tag IS IN ["xe"] AND machine_tag IS IN ["LNL", "BMG"] AND ((testsuite_name = "IGT" AND test_name IS IN ["igt@xe_drm_fdinfo@utilization-single-full-load-destroy-queue", "igt@xe_drm_fdinfo@drm-busy-exec-queue-destroy-idle"])) AND ((testsuite_name = "IGT" AND status_name IS IN ["fail"])) AND stderr ~= 'Test assertion failure function check_results.*\n.*Failed assertion: 95.0 < percent'
Ravi V changed title from igt@xe_drm_fdinfo@drm-busy-exec-queue-destroy-idle - fail - Test assertion failure function check_results, Failed assertion: 95.0 < percent to igt@xe_drm_fdinfo@subtests - fail - Test assertion failure function check_results, Failed assertion: 95.0 < percent
The CI Bug Log issue associated with this bug has been updated by Vinay.
New filters associated
ADL_P BMG: igt@xe_drm_fdinfo@drm* - fail - Test assertion failure function check_results, Failed assertion: percent < 105.0
(No new failures associated)
@rveesam please fix the filter and keep this bug about the exec-queue-destroy failure only. Other possible bugs are not the same thing.
A CI Bug Log filter associated with this bug has been updated by Vinay.
Description:DG2 BMG LNL: igt@xe_drm_fdinfo@subtests - fail - Test assertion failure function check_results, Failed assertion: 95.0 < percent
Equivalent query: runconfig_tag IS IN ["xe"] AND machine_tag IS IN ["DG2", "BMG", "LNL"] AND ((testsuite_name = "IGT" AND test_name IS IN ["igt@xe_drm_fdinfo@utilization-others-idle", "igt@xe_drm_fdinfo@utilization-single-full-load-destroy-queue", "igt@xe_drm_fdinfo@utilization-single-full-load-isolation", "igt@xe_drm_fdinfo@utilization-others-full-load", "igt@xe_drm_fdinfo@drm-busy-exec-queue-destroy-idle"])) AND ((testsuite_name = "IGT" AND status_name IS IN ["fail"])) AND stderr ~= 'Test assertion failure function check_results.*\n.*Failed assertion: 95.0 < percent'
It looks like the exec queue destroy ioctl erases the exec queue from the xef's xarray. Later, when we try to dump the run ticks, the exec queue is no longer in the array, so only the GPU timestamp is updated. The correct run-ticks value is written at a later point, when the job is freed, but that is too late: IGT has already sampled the ticks.
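For illustration, here is a minimal sketch of the sampling path being described. The names are modeled loosely on the xe driver (show_run_ticks() in xe_drm_client.c and xe_exec_queue_update_run_ticks()), and the per-engine/class iteration is elided, so read it as pseudocode for the flow rather than the actual source:

static void show_run_ticks(struct drm_printer *p, struct drm_file *file)
{
	struct xe_file *xef = file->driver_priv;
	struct xe_exec_queue *q;
	unsigned long i;
	u64 gpu_timestamp;

	/*
	 * Fold the pending run ticks of every *live* exec queue into
	 * xef->run_ticks[]. A queue that the destroy ioctl has already
	 * erased from the xarray is invisible here, so its final delta
	 * never makes it into the totals ...
	 */
	mutex_lock(&xef->exec_queue.lock);
	xa_for_each(&xef->exec_queue.xa, i, q)
		xe_exec_queue_update_run_ticks(q);
	mutex_unlock(&xef->exec_queue.lock);

	/*
	 * ... while the GPU timestamp below still advances, so the
	 * utilization IGT computes (delta run ticks / delta GPU
	 * timestamp) drops below the expected 95%.
	 */
	gpu_timestamp = xe_hw_engine_read_timestamp(hwe);

	drm_printf(p, "drm-cycles-%s:\t%llu\n", class_name,
		   xef->run_ticks[class]);
	drm_printf(p, "drm-total-cycles-%s:\t%llu\n", class_name,
		   gpu_timestamp);
}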
I think we may need to add one more point in the kernel where we update the run ticks. Failing that, we should add a retry policy to this test in IGT. I would look into the former first.
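A hedged sketch of what such an extra update point might look like: fold the queue's final run ticks into the xef totals in the destroy ioctl itself, before the queue disappears from the xarray. Names again only loosely mirror the driver, and error paths are trimmed; this is the idea, not the actual fix:

int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
				struct drm_file *file)
{
	struct drm_xe_exec_queue_destroy *args = data;
	struct xe_file *xef = file->driver_priv;
	struct xe_exec_queue *q;

	mutex_lock(&xef->exec_queue.lock);
	q = xa_erase(&xef->exec_queue.xa, args->exec_queue_id);
	mutex_unlock(&xef->exec_queue.lock);
	if (!q)
		return -ENOENT;

	/*
	 * Hypothetical extra update point: account the ticks accumulated
	 * so far into xef->run_ticks[] so that a later fdinfo read still
	 * sees them. The late update on job free would then have to skip
	 * what is already accounted here to avoid double counting.
	 */
	xe_exec_queue_update_run_ticks(q);

	xe_exec_queue_kill(q);
	xe_exec_queue_put(q);

	return 0;
}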
Additional notes: when this test does pass, it just means the job was freed as soon as it completed (or when the queue was destroyed) and the stats were updated on free. In that case, even though the fdinfo dump itself still only captures the GPU timestamp, the test works because the run-ticks update (into the xef object) and the GPU timestamp query happen close together and in the right order.
https://patchwork.freedesktop.org/series/140538/ should fix these issues and also reduce the number of times we need to update the timestamp. The latter should make another race much less probable: updating the delta on the xef is not protected by any lock, so the update from an fdinfo query could race with the one from the workqueue. Since the update now only happens when the exec queue is going away or when someone is querying it, I think it should be safe.
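To spell out that race (an illustration, not driver code): the per-class accumulation is a plain read-modify-write, so two unsynchronized updaters can lose a delta:

	/* fdinfo query (CPU A) */           /* job-free workqueue (CPU B) */
	old = xef->run_ticks[class];
	                                     old = xef->run_ticks[class];
	xef->run_ticks[class] = old + d1;
	                                     xef->run_ticks[class] = old + d2;  /* d1 is lost */

Restricting the update to queue teardown and to the query path does not add locking, but it shrinks the window in which two writers can overlap.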