    tracing: Initialize iter->seq after zeroing in tracing_read_pipe() · d303de1f
    Petr Mladek authored
    A customer reported the following softlockup:
    
    [899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
    [899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
    [899688.160002] RIP: 0010:up_write+0x1a/0x30
    [899688.160002] Kernel panic - not syncing: softlockup: hung tasks
    [899688.160002] RIP: 0010:up_write+0x1a/0x30
    [899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
    [899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
    [899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
    [899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
    [899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
    [899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
    [899688.160002]  tracing_read_pipe+0x336/0x3c0
    [899688.160002]  __vfs_read+0x26/0x140
    [899688.160002]  vfs_read+0x87/0x130
    [899688.160002]  SyS_read+0x42/0x90
    [899688.160002]  do_syscall_64+0x74/0x160
    
    The watchdog caught the process in the middle of trace_access_unlock().
    That function contains no loop, so the process must be looping in the
    caller, tracing_read_pipe(), via the "waitagain" label.
    
    Crash dump analysis revealed that iter->seq was completely zeroed
    at this point, including iter->seq.seq.size. This means that
    print_trace_line() was never able to print anything and
    there was no forward progress.
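
To illustrate why a zeroed size prevents any forward progress, here is a minimal user-space sketch. It is not the kernel code: the model_* names, the field layout, and MODEL_BUF_SIZE are hypothetical stand-ins chosen only to mirror the idea of a sequence buffer that tracks its own capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BUF_SIZE 4096	/* hypothetical stand-in for PAGE_SIZE */

/* Hypothetical model of a buffer descriptor that tracks its capacity. */
struct model_seq_buf {
	char *buffer;
	size_t size;	/* capacity; 0 after a blanket memset() */
	size_t len;	/* bytes used so far */
};

/* Hypothetical model of a trace sequence wrapping that descriptor. */
struct model_trace_seq {
	char buffer[MODEL_BUF_SIZE];
	struct model_seq_buf seq;
};

/* Point the descriptor at the embedded storage and set its capacity. */
void model_trace_seq_init(struct model_trace_seq *s)
{
	assert(s != NULL);
	s->seq.buffer = s->buffer;
	s->seq.size = MODEL_BUF_SIZE;
	s->seq.len = 0;
}

/* Append str if it fits; return the bytes written, or 0 if there is no room. */
size_t model_seq_puts(struct model_trace_seq *s, const char *str)
{
	size_t n = strlen(str);

	assert(s != NULL);
	if (s->seq.len + n > s->seq.size)
		return 0;	/* a zeroed descriptor always lands here */
	memcpy(s->seq.buffer + s->seq.len, str, n);
	s->seq.len += n;
	return n;
}
```

In this model, a structure cleared with memset() reports zero capacity, so every non-empty write returns 0. A caller that retries until something is printed would spin forever.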
    
    The culprit seems to be in the code:
    
    	/* reset all but tr, trace, and overruns */
    	memset(&iter->seq, 0,
    	       sizeof(struct trace_iterator) -
    	       offsetof(struct trace_iterator, seq));
    
    It was added by commit 53d0aa77 ("ftrace: add logic to record
    overruns") in v2.6.27-rc1. At that time, iter->seq looked like:
    
         struct trace_seq {
    	unsigned char		buffer[PAGE_SIZE];
    	unsigned int		len;
         };
    
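    There was no "size" variable, so zeroing was perfectly fine. Now that
    the structure carries a size, zeroing it leaves a buffer that appears
    to have no room at all.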
    
    The solution is to reinitialize the structure after or without
    zeroing.
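
The "zero, then reinitialize" approach can be sketched in user space as follows. This is an illustration under stated assumptions, not the actual patch: struct demo_iter, struct demo_seq, and the demo_* helpers are hypothetical, and the tail of the structure is cleared through a byte pointer rather than through the member address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUF_SIZE 4096	/* hypothetical stand-in for PAGE_SIZE */

/* Hypothetical sequence buffer that must keep a non-zero size. */
struct demo_seq {
	char buffer[DEMO_BUF_SIZE];
	size_t size;
	size_t len;
};

/* Hypothetical iterator: fields before "seq" survive a reset. */
struct demo_iter {
	void *tr;		/* preserved */
	unsigned long overrun;	/* preserved */
	struct demo_seq seq;	/* reset starts here */
	long pos;		/* reset */
};

/* Restore the invariants that a blanket memset() destroys. */
void demo_seq_init(struct demo_seq *s)
{
	assert(s != NULL);
	s->size = DEMO_BUF_SIZE;
	s->len = 0;
}

/* Zero everything from "seq" to the end, then reinitialize "seq". */
void demo_iter_reset(struct demo_iter *iter)
{
	size_t off = offsetof(struct demo_iter, seq);

	assert(iter != NULL);
	memset((char *)iter + off, 0, sizeof(*iter) - off);
	demo_seq_init(&iter->seq);
}
```

After demo_iter_reset(), the fields ahead of seq keep their values, the trailing fields are cleared, and seq again reports a usable capacity.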
    
    Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com
    
    
    
    Signed-off-by: Petr Mladek <pmladek@suse.com>
    Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>