[Crash Report] kernel - shmem_get_pages: bad page non-NULL mapping
/cc @bgeffon@google.com /cc @ShawnC /cc @vsyrjala /cc @jani.saarinen /cc @ulisses.furquim@intel.com /cc @rong.wang@intel.com
ChromeOS (Google) reported this crash. After debugging, Google engineers believe the cause is commit 43e2b37e "drm/i915/dpt: Make DPT object unshrinkable".
They explained that this patch exacerbates an existing problem: the change prevents a lot of objects from being shrunk, so we run out of memory more easily, and running out of memory exposes another bug.
Commit 43e2b37e "drm/i915/dpt: Make DPT object unshrinkable" prevents the free, but then memory cannot be released, which causes the remapping failure (given below). Because the free happens during the failure path, the problem now manifests in this new way.
They are seeking advice on an alternative to 43e2b37e "drm/i915/dpt: Make DPT object unshrinkable". For the function i915_gem_object_is_shrinkable, would it be better to use:
return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE) &&
(!obj->is_dpt || !obj->mm.mapping);
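To make the difference between the two checks concrete, here is a minimal userspace sketch, not kernel code. It assumes the check introduced by 43e2b37e is `!obj->is_dpt`, as the question in this report implies. `struct mock_obj`, `shrinkable_current`, `shrinkable_proposed` and `compare_predicates` are hypothetical names that only model the three inputs of the predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the parts of struct drm_i915_gem_object that
 * the check reads. This is NOT the real kernel layout.
 */
struct mock_obj {
	bool type_shrinkable; /* models i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE) */
	bool is_dpt;          /* models obj->is_dpt */
	void *mapping;        /* models obj->mm.mapping */
};

/* Check assumed to be in place after 43e2b37e: a DPT object is never shrinkable. */
static bool shrinkable_current(const struct mock_obj *obj)
{
	return obj->type_shrinkable && !obj->is_dpt;
}

/* Suggested check: a DPT object is shrinkable only while it has no mapping. */
static bool shrinkable_proposed(const struct mock_obj *obj)
{
	return obj->type_shrinkable && (!obj->is_dpt || !obj->mapping);
}

/* Walks the interesting cases and asserts how the two checks differ. */
void compare_predicates(void)
{
	int backing; /* any non-NULL address works as a fake mapping */
	struct mock_obj dpt_mapped   = { true, true,  &backing };
	struct mock_obj dpt_unmapped = { true, true,  NULL };
	struct mock_obj plain_mapped = { true, false, &backing };

	/* A mapped DPT object stays unshrinkable under both checks. */
	assert(!shrinkable_current(&dpt_mapped));
	assert(!shrinkable_proposed(&dpt_mapped));

	/* The only behavioral change: an unmapped DPT object becomes shrinkable. */
	assert(!shrinkable_current(&dpt_unmapped));
	assert(shrinkable_proposed(&dpt_unmapped));

	/* Non-DPT objects are unaffected by the change. */
	assert(shrinkable_current(&plain_mapped));
	assert(shrinkable_proposed(&plain_mapped));
}
```

In this model the two checks disagree in exactly one case: a DPT object with a NULL mapping. Whether that is the right condition for the real driver to consider a DPT object safe to shrink is the open question in this report.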
From the Intel side, we are unable to reproduce the DMA remap failure messages with the upstream kernel or older kernels. We also tested their suggestion against the older bug that this patch was originally merged to fix, and we are not hitting that issue.
We have pushed their suggested patch to trybot, and BAT seems to pass: https://patchwork.freedesktop.org/series/141608/
@vsyrjala could you kindly help or guide us on this? Do you think using (!obj->is_dpt || !obj->mm.mapping) instead of !obj->is_dpt alone can save memory and avoid hitting the scenario below?
Attached: document with the crash message.