kernel driver for amdgpu hangs when executing h264encode
System information
- OS:
NAME="Ubuntu"
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
- GPU:
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon PRO WX 9100] [1002:6861]
- Kernel version:
Linux n130-127-016 4.15.0-65-bd-arm64 #3 SMP Debian 4.15.0-65-bd-4-gf0ea3d485 Thu Dec 12 16:09:06 CST aarch64 aarch64 aarch64 GNU/Linux
- Mesa version:
Mesa Gallium driver 19.0.8 for Radeon Pro WX 9100 (VEGA10, DRM 3.23.0, 4.15.0-65-bd-arm64, LLVM 7.0.0)
- Xserver version (if applicable):
X.Org X Server 1.19.6
Release Date: 2017-12-20
X Protocol Version 11, Revision 0
Build Operating System: Linux 4.4.0-148-generic aarch64 Ubuntu
Current Operating System: Linux n130-127-016 4.15.0-65-bd-arm64 #3 SMP Debian 4.15.0-65-bd-4-gf0ea3d485 Thu Dec 12 16:09:06 CST aarch64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.15.0-65-bd-arm64 root=UUID=0b3c5129-8f94-4fdf-bd64-169ee4d3ad7a ro quiet splash
Build Date: 03 June 2019 08:11:53AM
xorg-server 2:1.19.6-1ubuntu4.3 (For technical support please see http://www.ubuntu.com/support)
Current version of pixman: 0.34.0
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
- Desktop manager and compositor: no X window started.
If applicable
- DXVK version:
- Wine/Proton version:
Describe the issue
I was trying to use libva-utils to do H.264 encoding with hardware acceleration. I ran:
git clone https://github.com/intel/libva-utils.git
cd libva-utils
git checkout c1535bb7e7e7c0670f118de9752e25674bac8002
I then applied a patch that replaces vaDeriveImage with vaCreateImage and vaPutImage. The patch is:
diff --git a/common/loadsurface.h b/common/loadsurface.h
index c5cb9d4..c6a0880 100755
--- a/common/loadsurface.h
+++ b/common/loadsurface.h
@@ -37,6 +37,27 @@ static int scale_2dimage(unsigned char *src_img, int src_imgw, int src_imgh,
return 0;
}
+static const VAImageFormat formats[] =
+{
+ {VA_FOURCC('N','V','1','2')},
+ {VA_FOURCC('P','0','1','0')},
+ {VA_FOURCC('P','0','1','6')},
+ {VA_FOURCC('I','4','2','0')},
+ {VA_FOURCC('Y','V','1','2')},
+ {VA_FOURCC('Y','U','Y','V')},
+ {VA_FOURCC('U','Y','V','Y')},
+ {.fourcc = VA_FOURCC('B','G','R','A'), .byte_order = VA_LSB_FIRST, 32, 32,
+ 0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000},
+ {.fourcc = VA_FOURCC('R','G','B','A'), .byte_order = VA_LSB_FIRST, 32, 32,
+ 0x000000ff, 0x0000ff00, 0x00ff0000, 0xff000000},
+ {.fourcc = VA_FOURCC('B','G','R','X'), .byte_order = VA_LSB_FIRST, 32, 24,
+ 0x00ff0000, 0x0000ff00, 0x000000ff, 0x00000000},
+ {.fourcc = VA_FOURCC('R','G','B','X'), .byte_order = VA_LSB_FIRST, 32, 24,
+ 0x000000ff, 0x0000ff00, 0x00ff0000, 0x00000000}
+};
+
+
+
static int YUV_blend_with_pic(int width, int height,
unsigned char *Y_start, int Y_pitch,
@@ -255,8 +276,14 @@ static int upload_surface(VADisplay va_dpy, VASurfaceID surface_id,
VAStatus va_status;
unsigned int pitches[3]={0,0,0};
+ /*
va_status = vaDeriveImage(va_dpy,surface_id,&surface_image);
CHECK_VASTATUS(va_status,"vaDeriveImage");
+ */
+
+ VAImageFormat format = formats[0];
+ va_status = vaCreateImage(va_dpy, &format, 1440, 720, &surface_image);
+ CHECK_VASTATUS(va_status,"vaCreateImage");
vaMapBuffer(va_dpy,surface_image.buf,&surface_p);
assert(VA_STATUS_SUCCESS == va_status);
@@ -300,6 +327,9 @@ static int upload_surface(VADisplay va_dpy, VASurfaceID surface_id,
box_width, row_shift, field);
vaUnmapBuffer(va_dpy,surface_image.buf);
+ vaPutImage(va_dpy, surface_id , surface_image.image_id, 0,0 , surface_image.width,
+ surface_image.height, 0, 0, surface_image.width, surface_image.height);
+
vaDestroyImage(va_dpy,surface_image.image_id);
I then compiled libva-utils (./autogen.sh; make) and ran ./encode/h264encode.
It outputs:
INPUT:Try to encode H264...
INPUT: Resolution : 176x144, 60 frames
INPUT: FrameRate : 30
INPUT: Bitrate : 182476
INPUT: Slieces : 1
INPUT: IntraPeriod : 30
INPUT: IDRPeriod : 60
INPUT: IpPeriod : 1
INPUT: Initial QP : 26
INPUT: Min QP : 0
INPUT: Source YUV : AUTO generated
INPUT: Coded Clip : /tmp/test.264
INPUT: Rec Clip : Not save reconstructed frame
error: can't connect to X server!
libva info: VA-API version 1.9.0
libva info: Trying to open /usr/local/lib/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_1
libva info: va_openDriver() returns 0
Use profile VAProfileH264High
Support rate control mode (0x16):CBR VBR CQP
RateControl mode: VBR
Support VAConfigAttribEncPackedHeaders
Support 1 RefPicList0 and 0 RefPicList1
Loading data into surface 15.....Complete surface loading
The program then hangs, and Ctrl+C cannot stop it. In another terminal, ps -ef | grep h264encode shows:
root@n130-127-016:~# ps -ef | grep h264encode
root 7376 7264 0 16:45 pts/0 00:00:00 [h264encode] <defunct>
After that, the driver no longer works.
dmesg says:
[185102.584500] amdgpu 0000:03:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vm_id:2 pas_id:0)
[185102.598598] amdgpu 0000:03:00.0: at page 0x0000000204fff000 from 18
[185102.605777] amdgpu 0000:03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0020413D
[185102.612896] amdgpu 0000:03:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vm_id:2 pas_id:0)
[185102.626987] amdgpu 0000:03:00.0: at page 0x0000000204fff000 from 18
[185102.634171] amdgpu 0000:03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0020413D
[185102.641337] amdgpu 0000:03:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vm_id:2 pas_id:0)
[185102.655442] amdgpu 0000:03:00.0: at page 0x0000000204fff000 from 18
[185102.662608] amdgpu 0000:03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0020413D
[185102.669717] amdgpu 0000:03:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vm_id:2 pas_id:0)
[185102.683785] amdgpu 0000:03:00.0: at page 0x0000000204fff000 from 18
[185102.690961] amdgpu 0000:03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0020413D
[185102.698067] amdgpu 0000:03:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vm_id:2 pas_id:0)
[185102.712139] amdgpu 0000:03:00.0: at page 0x0000000204fff000 from 18
[185102.719314] amdgpu 0000:03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0020413D
[185102.726410] amdgpu 0000:03:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vm_id:2 pas_id:0)
[185102.740479] amdgpu 0000:03:00.0: at page 0x0000000204fff000 from 18
[185102.747641] amdgpu 0000:03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0020413D
[185102.754733] amdgpu 0000:03:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vm_id:2 pas_id:0)
[185102.768812] amdgpu 0000:03:00.0: at page 0x0000000204fff000 from 18
[185102.775975] amdgpu 0000:03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0020413D
[185102.783082] amdgpu 0000:03:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vm_id:2 pas_id:0)
[185102.797150] amdgpu 0000:03:00.0: at page 0x0000000204fff000 from 18
[185102.804308] amdgpu 0000:03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0020413D
[185102.811452] amdgpu 0000:03:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vm_id:2 pas_id:0)
[185102.825507] amdgpu 0000:03:00.0: at page 0x0000000204fff000 from 18
[185102.832672] amdgpu 0000:03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0020413D
[185102.839770] amdgpu 0000:03:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vm_id:2 pas_id:0)
[185102.853902] amdgpu 0000:03:00.0: at page 0x0000000204fff000 from 18
[185102.861058] amdgpu 0000:03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0020413D
Regression
Older versions do not work either.
Log files as attachment
- Output of dmesg: identical to the excerpt quoted under "Describe the issue" above.
- Backtrace
Before I tried to kill h264encode, its stack looked like this:
#0 0x0000ffff8aa135f0 in __lll_lock_wait (futex=futex@entry=0xaaaada9eec50, private=0) at lowlevellock.c:43
#1 0x0000ffff8aa0c7d4 in __GI___pthread_mutex_lock (mutex=0xaaaada9eec50) at pthread_mutex_lock.c:78
#2 0x0000ffff8a293248 in vlVaCreateBuffer () from /usr/local/lib/dri/radeonsi_drv_video.so
#3 0x0000ffff8aa51a1c in vaCreateBuffer (dpy=0xaaaada9eec50, context=3667844560, type=648, size=3094383816, num_elements=3308179996, data=0x17, buf_id=0xffffc52ed21c)
at va.c:1367
#4 0x0000aaaab8682050 in render_picture () at h264encode.c:1589
#5 0x0000aaaab8682560 in encode_frames () at h264encode.c:2235
#6 0x0000aaaab867f35c in main (argc=<optimized out>, argv=<optimized out>) at h264encode.c:2431
- Gpu hang details
Before I tried to kill h264encode, ps -ef -o wchan showed the process blocked in the kernel at dma_fence_wait, though sometimes it showed futex_wait_queue_me instead.
Screenshots/video files (if applicable)
Any extra information would be greatly appreciated
Edited by zhangfuwen