Commits · drm-xe-next · Matthew Brost / xe kernel driver

Nov 01, 2024

drm/xe: Restore system memory GGTT mappings · a19d1db9

Matthew Brost authored 3 months ago


GGTT mappings reside on the device and this state is lost during suspend
/ d3cold thus this state must be restored resume regardless if the BO is
in system memory or VRAM.

v2:
 - Unnecessary parentheses around bo->placements[0] (Checkpatch)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241031182257.2949579-1-matthew.brost@intel.com

a19d1db9

drm/xe: Don't unnecessarily invoke the OOM killer on multiple binds · 1a7b7180

Thomas Hellström authored 3 months ago

Multiple single-ioctl binds can be split up into multiple bind
ioctls, reducing the memory required to hold the bind array.

So rather than allowing the OOM killer to be invoked, return
-ENOMEM or -ENOBUFS to user-space to take corrective action.

v2:
- Add __GFP_NOWARN to __GFP_RETRY_MAYFAIL allocations to avoid
  spamming the kernel log if a recoverable allocation fails.

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: drm/xe/kernel#2701


Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241031153732.164995-3-thomas.hellstrom@linux.intel.com

1a7b7180

drm/xe: Avoid the OOM killer on buffer object memory allocation · 6bd49cc1

Thomas Hellström authored 3 months ago

Rather than invoking the OOM killer on buffer object memory
allocations and validations, have the allocations fail and
pass the error to user-space if applicable.

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: drm/xe/kernel#2701


Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241031153732.164995-2-thomas.hellstrom@linux.intel.com

6bd49cc1

drm/xe/guc/tlb: Flush g2h worker in case of tlb timeout · e1f6fa55

Nirmoy Das authored 3 months ago and

Lucas De Marchi committed 3 months ago

Flush the g2h worker explicitly if TLB timeout happens which is
observed on LNL and that points to the recent scheduling issue with
E-cores on LNL.

This is similar to the recent fix:
commit e5152723 ("drm/xe/guc/ct: Flush g2h worker in case of g2h
response timeout") and should be removed once there is E core
scheduling fix.

v2: Add platform check(Himal)
v3: Remove gfx platform check as the issue related to cpu
    platform(John)
    Use the common WA macro(John) and print when the flush
    resolves timeout(Matt B)
v4: Remove the resolves log and do the flush before taking
    pending_lock(Matt A)

Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: stable@vger.kernel.org # v6.11+
Link: drm/xe/kernel#2687


Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241029120117.449694-3-nirmoy.das@intel.com


Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

e1f6fa55

drm/xe/ufence: Flush xe ordered_wq in case of ufence timeout · 38c4c872

Nirmoy Das authored 3 months ago and

Lucas De Marchi committed 3 months ago

Flush xe ordered_wq in case of ufence timeout which is observed
on LNL and that points to recent scheduling issue with E-cores.

This is similar to the recent fix:
commit e5152723 ("drm/xe/guc/ct: Flush g2h worker in case of g2h
response timeout") and should be removed once there is a E-core
scheduling fix for LNL.

v2: Add platform check(Himal)
    s/__flush_workqueue/flush_workqueue(Jani)
v3: Remove gfx platform check as the issue related to cpu
    platform(John)
v4: Use the Common macro(John) and print when the flush resolves
    timeout(Matt B)

Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: stable@vger.kernel.org # v6.11+
Link: drm/xe/kernel#2754


Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241029120117.449694-2-nirmoy.das@intel.com


Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

38c4c872

drm/xe: Move LNL scheduling WA to xe_device.h · cbe006a6

Nirmoy Das authored 3 months ago and

Lucas De Marchi committed 3 months ago

Move LNL scheduling WA to xe_device.h so this can be used in other
places without needing keep the same comment about removal of this WA
in the future. The WA, which flushes work or workqueues, is now wrapped
in macros and can be reused wherever needed.

Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
cc: stable@vger.kernel.org # v6.11+
Suggested-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241029120117.449694-1-nirmoy.das@intel.com

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

cbe006a6

drm/xe: Use the filelist from drm for ccs_mode change · 1c35f1ed

Balasubramani Vivekanandan authored 4 months ago and

Lucas De Marchi committed 3 months ago

Drop the exclusive client count tracking and use the filelist from the
drm to track the active clients. This also ensures the clients created
internally by the driver won't block changing the ccs mode.

Fixes: ce8c161c ("drm/xe: Add ref counting for xe_file")
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008073628.377433-3-balasubramani.vivekanandan@intel.com

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

1c35f1ed

drm/xe: Set mask bits for CCS_MODE register · 23ea2c75

Balasubramani Vivekanandan authored 4 months ago and

Lucas De Marchi committed 3 months ago


CCS_MODE register requires setting mask bits from Xe2+ platforms. Set
the mask bits unconditionally, as those bits are unused for older
platforms.

Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Cc: stable@vger.kernel.org # v6.11+
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008073628.377433-2-balasubramani.vivekanandan@intel.com


Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

23ea2c75

drm/xe: Move Wa 1607983814 to oob · 8262db9e

Lucas De Marchi authored 3 months ago


needs_wa_1607983814() predates wa_oob, so it was not being printed
in /sys/kernel/debug/dri/0/*/workarounds. Port it to OOB rules.
This makes the WA show up in debugfs. For TGL:

	OOB Workarounds
		1607983814
		22012773006
		1409600907

Eventually the RTP infra may add support for writing registers in a
loop, which would allow to keep track of the registers as well. But for
now, just listing it as OOB workaround is already an improvement.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241029193258.749882-1-lucas.demarchi@intel.com


Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

8262db9e

Oct 31, 2024

drm/xe/guc: Fix dereference before NULL check · 2aff81e0

Everest K.C. authored 3 months ago and

John Harrison committed 3 months ago

The pointer list->list is dereferenced before the NULL check.
Fix this by moving the NULL check outside the for loop, so that
the check is performed before the dereferencing.
The list->list pointer cannot be NULL so this has no effect on runtime.
It's just a correctness issue.

This issue was reported by Coverity Scan.
https://scan7.scan.coverity.com/#/project-view/51525/11354?selectedIssue=1600335

Fixes: 0f1fdf55 ("drm/xe/guc: Save manual engine capture into capture list")
Signed-off-by: Everest K.C. <everestkc@everestkc.com.np>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241023233356.5479-1-everestkc@everestkc.com.np

2aff81e0

drm/xe: Don't short circuit TDR on jobs not started · 35d25a4a

Matthew Brost authored 3 months ago and

Lucas De Marchi committed 3 months ago


Short circuiting TDR on jobs not started is an optimization which is not
required. On LNL we are facing an issue where jobs do not get scheduled
by the GuC if it misses a GGTT page update. When this occurs let the TDR
fire, toggle the scheduling which may get the job unstuck, and print a
warning message. If the TDR fires twice on job that hasn't started,
timeout the job.

v2:
 - Add warning message (Paulo)
 - Add fixes tag (Paulo)
 - Timeout job which hasn't started after TDR firing twice
v3:
 - Include local change
v4:
 - Short circuit check_timeout on job not started
 - use warn level rather than notice (Paulo)

Fixes: 7ddb9403 ("drm/xe: Sample ctx timestamp to determine if jobs have timed out")
Cc: stable@vger.kernel.org
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241025214330.2010521-2-matthew.brost@intel.com


Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

35d25a4a

drm/xe: Add mmio read before GGTT invalidate · 5a710196

Matthew Brost authored 3 months ago and

Lucas De Marchi committed 3 months ago

On LNL without a mmio read before a GGTT invalidate the GuC can
incorrectly read the GGTT scratch page upon next access leading to jobs
not getting scheduled. A mmio read before a GGTT invalidate seems to fix
this. Since a GGTT invalidate is not a hot code path, blindly do a mmio
read before each GGTT invalidate.

Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: stable@vger.kernel.org
Fixes: dd08ebf6 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Closes: drm/xe/kernel#3164

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241023221200.1797832-1-matthew.brost@intel.com

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

5a710196

Oct 30, 2024

Revert "drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs" · 0191fddf

Ashutosh Dixit authored 3 months ago and

Matt Roper committed 3 months ago

This reverts commit 55858fa7.

'55858fa7 ("drm/xe/xe_guc_ads: save/restore OA registers and allowlist
regs")' was not properly reviewed and also causes dmesg asserts in
CI. Revert it.

Closes: drm/xe/kernel#3295


Fixes: 55858fa7 ("drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs")
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241029200147.1476513-1-ashutosh.dixit@intel.com

0191fddf

Oct 29, 2024

drm/xe/guc: Separate full CTB content from guc_info debugfs · db38fdb7

John Harrison authored 3 months ago

The guc_info debugfs file is meant to be a quick view of the current
software state of the GuC interface. Including the full CTB contents
makes the file as a whole much less human readable and is not
partiular useful in the general case. So don't pollute the info dump
with the full buffers. Instead, move those into a separate debugfs
entry that can be read when that information is actually required.

Also, improve the human readability by adding a few extra blank lines
to delimt the sections.

v2: Hide the internal capture/print params from external callers that
don't need to know (review feedback from Matthew Brost).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241024002554.1983101-3-John.C.Harrison@Intel.com

db38fdb7

drm/xe/guc: Capture all available bits of GuC timestamp · 833b2ec3

John Harrison authored 3 months ago

The extra bits are not hugely useful because the GuC log only uses
32bit time stamps. But they exist so might as well provide them.

833b2ec3

Oct 28, 2024

drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs · 55858fa7

Jonathan Cavitt authored 3 months ago and

Ashutosh Dixit committed 3 months ago

Several OA registers and allowlist registers were missing from the
save/restore list for GuC and could be lost during an engine reset.  Add
them to the list.

v2:
- Fix commit message (Umesh)
- Add missing closes (Ashutosh)

v3:
- Add missing fixes (Ashutosh)

Closes: drm/xe/kernel#2249


Fixes: dd08ebf6 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Suggested-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
CC: stable@vger.kernel.org # v6.11+
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241023200716.82624-1-jonathan.cavitt@intel.com

55858fa7

Oct 23, 2024

drm/xe/oa: Allow only certain property changes from config · 85d3f9e8

Ashutosh Dixit authored 3 months ago

Whereas all properties can be specified during OA stream open, when the OA
stream is reconfigured only the config_id and syncs can be specified.

v2: Use separate function table for reconfig case (Jonathan)
Change bool function args to enum (Matt B)
v3: s/xe_oa_set_property_funcs/xe_oa_set_property_funcs_open/ (Jonathan)

Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Suggested-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241022200352.1192560-8-ashutosh.dixit@intel.com

85d3f9e8

drm/xe/oa: Add syncs support to OA config ioctl · 9920c8b8

Ashutosh Dixit authored 3 months ago

In addition to stream open, add xe_sync support to the OA config ioctl,
where it is even more useful. This allows e.g. Mesa to replay a workload
repeatedly on the GPU, each time with a different OA configuration, while
precisely controlling (at batch buffer granularity) the workload segment
for which a particular OA configuration is active, without introducing
stalls in the userspace pipeline.

v2: Emit OA config even when config id is same as previous, to ensure
consistent sync behavior (Jose)

Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241022200352.1192560-7-ashutosh.dixit@intel.com

9920c8b8

drm/xe/oa: Move functions up so they can be reused for config ioctl · cc4e6994

Ashutosh Dixit authored 3 months ago

No code changes, only code movement so that functions used during stream
open can be reused for the stream reconfiguration
ioctl (DRM_XE_OBSERVATION_IOCTL_CONFIG).

cc4e6994

drm/xe/oa: Signal output fences · 343dd246

Ashutosh Dixit authored 3 months ago


Introduce 'struct xe_oa_fence' which includes the dma_fence used to signal
output fences in the xe_sync array. The fences are signaled
asynchronously. When there are no output fences to signal, the OA
configuration wait is synchronously re-introduced into the ioctl.

v2: Don't wait in the work, use callback + delayed work (Matt B)
    Use a single, not a per-fence spinlock (Matt Brost)
v3: Move ofence alloc before job submission (Matt)
    Assert, don't fail, from dma_fence_add_callback (Matt)
    Additional dma_fence_get for dma_fence_wait (Matt)
    Change dma_fence_wait to non-interruptible (Matt)
v4: Introduce last_fence to prevent uaf if stream is closed with
    pending OA config jobs
v5: Remove oa_fence_lock, move spinlock back into xe_oa_fence to
    prevent uaf

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241022200352.1192560-5-ashutosh.dixit@intel.com

343dd246

drm/xe/oa: Add input fence dependencies · 2fb4350a

Ashutosh Dixit authored 3 months ago

Add input fence dependencies which will make OA configuration wait till
these dependencies are met (till input fences signal).

v2: Change add_deps arg to xe_oa_submit_bb from bool to enum (Matt Brost)

2fb4350a

drm/xe/oa/uapi: Define and parse OA sync properties · c8507a25

Ashutosh Dixit authored 3 months ago

Now that we have laid the groundwork, introduce OA sync properties in the
uapi and parse the input xe_sync array as is done elsewhere in the
driver. Also add DRM_XE_OA_CAPS_SYNCS bit in OA capabilities for userspace.

v2: Fix and document DRM_XE_SYNC_TYPE_USER_FENCE for OA (Matt B)
Add DRM_XE_OA_CAPS_SYNCS bit to OA capabilities (Jose)

Acked-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241022200352.1192560-3-ashutosh.dixit@intel.com

c8507a25

drm/xe/oa: Separate batch submission from waiting for completion · dddcb19a

Ashutosh Dixit authored 3 months ago

When we introduce xe_syncs, we don't wait for internal OA programming
batches to complete. That is, xe_syncs are signaled asynchronously. In
anticipation for this, separate out batch submission from waiting for
completion of those batches.

v2: Change return type of xe_oa_submit_bb to "struct dma_fence *" (Matt B)
v3: Retain init "int err = 0;" in xe_oa_submit_bb (Jose)

dddcb19a

drm/xe: Mark GT work queue with WQ_MEM_RECLAIM · e2d84e5b

Matthew Brost authored 3 months ago

GT ordered work queue can be used to free memory via resets and fence
signaling thus we should allow this work queue to run during reclaim.
Mark with GT ordered work queue with WQ_MEM_RECLAIM appropriately.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241021175705.1584521-5-matthew.brost@intel.com

e2d84e5b

drm/xe: Mark G2H work queue with WQ_MEM_RECLAIM · 179e0179

Matthew Brost authored 3 months ago

G2H work queue can be used to free memory thus we should allow this work
queue to run during reclaim. Mark with G2H work queue with
WQ_MEM_RECLAIM appropriately.

179e0179

drm/xe: Mark GGTT work queue with WQ_MEM_RECLAIM · 60df57e4

Matthew Brost authored 3 months ago

GGTT work queue is used to free memory thus we should allow this work
queue to run during reclaim. Mark with GGTT work queue with
WQ_MEM_RECLAIM appropriately.

60df57e4

drm/xe: Take ref to job's fence in arm · 059c2a79

Matthew Brost authored 3 months ago

Take ref to job's fence in arm rather than run job. This ref is owned by
the drm scheduler so it makes sense to take the ref before handing over
the job to the scheduler. Also removes an atomic from the run job path.

Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241021173512.1584248-1-matthew.brost@intel.com

059c2a79

drm/xe: Don't restart parallel queues multiple times on GT reset · c8b0acd6

Nirmoy Das authored 3 months ago

In case of parallel submissions multiple GuC id will point to the
same exec queue and on GT reset such exec queues will get restarted
multiple times which is not desirable.

v2: don't use exec_queue_enabled() which could race,
    do the same for xe_guc_submit_stop (Matt B)

Link: drm/xe/kernel#2295


Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241022103555.731557-1-nirmoy.das@intel.com


Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>

c8b0acd6

Oct 22, 2024

drm/xe/pf: Show VFs LMEM provisioning summary over debugfs · b982cba5

mwajdecz authored 3 months ago

While we can already view individual VF LMEM provisioning using
lmem_quota debugfs attribute, we want to have unified way to show
summary across all VFs, like we do for GGTT or GuC doorbells/IDs.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241021201506.1771-1-michal.wajdeczko@intel.com

b982cba5

drm/xe/guc: Prevent GuC register capture running on VF · dd1ba621

Zhanjun Dong authored 3 months ago


GuC based register capture is not supported by VF, thus prevent it
running on VF.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241022010116.342240-2-zhanjun.dong@intel.com


Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

dd1ba621

Oct 21, 2024

drm/xe: enable lite restore · 6ef3bb60

Fei Yang authored 4 months ago and

John Harrison committed 3 months ago


The lite restore is a performance improvement feature which avoids
unnecessary context switch (flush, save and restore) if the incoming
context has a ContextID matching that of the outgoing context. The
scheduling is done by the GuC firmware, so on the driver side it's
just a matter of setting corresponding GUC_CTL_FEATURE flag.
This is supposed to be enabled by default, thus the flag is set
unconditionally.

Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017162710.942553-2-fei.yang@intel.com

6ef3bb60

Oct 20, 2024

drm/xe: Use __counted_by for flexible arrays · 26775201

Matthew Brost authored 4 months ago

Good practice to use __counted_by in kernel coding for flexible arrays.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241018030039.1077842-1-matthew.brost@intel.com

26775201

Oct 18, 2024

drm/xe/ufence: Warn if mmget_not_zero() fails · 66426bf9

Nirmoy Das authored 4 months ago

This shouldn't happen but seen this while debugging ufence timeout
issue time to time so log it to isolate this particular case.

v2: s/XE_WARN_ON/drm_dbg(Maarten)

Link: drm/xe/kernel#1630


Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241016082304.66009-3-nirmoy.das@intel.com


Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>

66426bf9

drm/xe/ufence: Prefetch ufence addr to catch bogus address · 9408c450

Nirmoy Das authored 4 months ago

access_ok() only checks for addr overflow so also try to read the addr
to catch invalid addr sent from userspace.

Link: drm/xe/kernel#1630


Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241016082304.66009-2-nirmoy.das@intel.com


Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>

9408c450

drm/xe: Handle unreliable MMIO reads during forcewake · a9fbeabe

Shuicheng Lin authored 4 months ago


In some cases, when the driver attempts to read an MMIO register,
the hardware may return 0xFFFFFFFF. The current force wake path
code treats this as a valid response, as it only checks the BIT.
However, 0xFFFFFFFF should be considered an invalid value, indicating
a potential issue. To address this, we should add a log entry to
highlight this condition and return failure.
The force wake failure log level is changed from notice to err
to match the failure return value.

v2 (Matt Brost):
  - set ret value (-EIO) to kick the error to upper layers
v3 (Rodrigo):
  - add commit message for the log level promotion from notice to err
v4:
  - update reviewed info

Suggested-by: Alex Zuo <alex.zuo@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Acked-by: Badal Nilawar <badal.nilawar@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017221547.1564029-1-shuicheng.lin@intel.com


Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

a9fbeabe

Oct 17, 2024

drm/xe/ptl: Apply Wa_14022866841 · 61ef737d

Vinay Belgaumkar authored 4 months ago and

John Harrison committed 4 months ago


As part of this WA, GuC will hold a forcewake for certain
MMIO accesses outside the GT/media domains.

Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241015234428.2004825-1-vinay.belgaumkar@intel.com

61ef737d

drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout · e5152723

bnilawar authored 4 months ago and

Matthew Brost committed 4 months ago

In case if g2h worker doesn't get opportunity to within specified
timeout delay then flush the g2h worker explicitly.

v2:
  - Describe change in the comment and add TODO (Matt B/John H)
  - Add xe_gt_warn on fence done after G2H flush (John H)
v3:
  - Updated the comment with root cause
  - Clean up xe_gt_warn message (John H)

Closes: drm/xe/kernel#1620
Closes: drm/xe/kernel#2902


Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017111410.2553784-2-badal.nilawar@intel.com

e5152723

drm/xe: Change return type to void for xe_force_wake_put · 76eb09c8

Himal Ghimiray authored 4 months ago


There is no need to return an error from xe_force_wake_put(), as a
failure implicitly indicates that the domain failed to sleep.

v3
- Move kernel-doc to this patch (Badal)

v5
- change parameter to unsigned int in xe_force_wake_put()

v6
- Remove unneccsary wrapping (Michal)
- Remove non required header (Michal)
- Mention timeout(Michal)

v8
- Fix kernel-doc

Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-27-himal.prasad.ghimiray@intel.com


Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

76eb09c8

drm/xe: Ensure __must_check for xe_force_wake_get() return · 9ee17807

Himal Ghimiray authored 4 months ago


Add __must_check attribute for xe_force_wake_get().

Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-26-himal.prasad.ghimiray@intel.com


Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

9ee17807

drm/xe: forcewake debugfs open fails on xe_forcewake_get failure · 6c0a15e7

Himal Ghimiray authored 4 months ago


A failure in xe_force_wake_get() no longer increments the domain's
refcount. Therefore, if xe_force_wake_get() fails during forcewake
debugfs open, return an error. This ensures there are no valid file
descriptors to close via forcewake debugfs, preventing refcount
mismanagement.

v3
- return xe_wakeref_t instead of int in xe_force_wake_get()

v5
- return unsigned int from xe_force_wake_get()

v6
- Use helper xe_force_wake_ref_has_domain()
to determine the status of the call.

Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-25-himal.prasad.ghimiray@intel.com


Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

6c0a15e7

Admin message

Admin message