Skip to content
  • Qiang Yu's avatar
    drm/lima: driver for ARM Mali4xx GPUs · 4cac0ddf
    Qiang Yu authored
    
    
    - Mali 4xx GPUs have two kinds of processors GP and PP. GP is for
      OpenGL vertex shader processing and PP is for fragment shader
      processing. Each processor has its own MMU so prcessors work in
      virtual address space.
    - There's only one GP but multiple PP (max 4 for mali 400 and 8
      for mali 450) in the same mali 4xx GPU. All PPs are grouped
      togather to handle a single fragment shader task divided by
      FB output tiled pixels. Mali 400 user space driver is
      responsible for assign target tiled pixels to each PP, but mali
      450 has a HW module called DLBU to dynamically balance each
      PP's load.
    - User space driver allocate buffer object and map into GPU
      virtual address space, upload command stream and draw data with
      CPU mmap of the buffer object, then submit task to GP/PP with
      a register frame indicating where is the command stream and misc
      settings.
    - There's no command stream validation/relocation due to each user
      process has its own GPU virtual address space. GP/PP's MMU switch
      virtual address space before running two tasks from different
      user process. Error or evil user space code just get MMU fault
      or GP/PP error IRQ, then the HW/SW will be recovered.
    - Use GEM+shmem for MM. Currently just alloc and pin memory when
      gem object creation. GPU vm map of the buffer is also done in
      the alloc stage in kernel space. We may delay the memory
      allocation and real GPU vm map to command submition stage in the
      furture as improvement.
    - Use drm_sched for GPU task schedule. Each OpenGL context should
      have a lima context object in the kernel to distinguish tasks
      from different user. drm_sched gets task from each lima context
      in a fair way.
    
    v2:
    - fix syscall argument check
    - fix job finish fence leak since kernel 5.0
    - use drm syncobj to replace native fence
    - move buffer object GPU va map into kernel
    - reserve syscall argument space for future info
    - remove kernel gem modifier
    - switch TTM back to GEM+shmem MM
    - use time based io poll
    - use whole register name
    - adopt gem reservation obj integration
    - use drm_timeout_abs_to_jiffies
    
    Cc: Eric Anholt <eric@anholt.net>
    Cc: Rob Herring <robh@kernel.org>
    Cc: Christian König <ckoenig.leichtzumerken@gmail.com>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: Alex Deucher <alexdeucher@gmail.com>
    Signed-off-by: default avatarAndreas Baierl <ichgeh@imkreisrum.de>
    Signed-off-by: default avatarErico Nunes <nunes.erico@gmail.com>
    Signed-off-by: default avatarHeiko Stuebner <heiko@sntech.de>
    Signed-off-by: default avatarMarek Vasut <marex@denx.de>
    Signed-off-by: default avatarNeil Armstrong <narmstrong@baylibre.com>
    Signed-off-by: default avatarSimon Shields <simon@lineageos.org>
    Signed-off-by: default avatarVasily Khoruzhick <anarsoul@gmail.com>
    Signed-off-by: default avatarQiang Yu <yuq825@gmail.com>
    4cac0ddf