public inbox for systemtap@sourceware.org
* [Bug runtime/20742] New: kernel pointer dereference BUG on RHEL6 s390x
@ 2016-10-27 19:47 dsmith at redhat dot com
  2016-11-01 20:15 ` [Bug runtime/20742] " dsmith at redhat dot com
  2016-11-10 16:00 ` dsmith at redhat dot com
  0 siblings, 2 replies; 3+ messages in thread
From: dsmith at redhat dot com @ 2016-10-27 19:47 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=20742

            Bug ID: 20742
           Summary: kernel pointer dereference BUG on RHEL6 s390x
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: runtime
          Assignee: systemtap at sourceware dot org
          Reporter: dsmith at redhat dot com
  Target Milestone: ---
              Host: s390x

When running the testsuite (in standard mode, not in parallel mode) on RHEL6
(2.6.32-642.el6.s390x.debug), I'm seeing the following crash:

====
Unable to handle kernel pointer dereference at virtual kernel address 000000007c264000
Oops: 0011 [#1] SMP DEBUG_PAGEALLOC                                             
Modules linked in: modloop(U) uprobes(U) ipv6 qeth_l2 vmur qeth qdio lcs ctcm fsm ccwgroup ext4 jbd2 mbcache dasd_fba_mod dasd_eckd_mod dasd_mod dm_mirror dm_region_hash dm_log dm_mod [last unloaded: modloop]
CPU: 0 Not tainted 2.6.32-642.el6.s390x.debug #1                                
Process loop (pid: 44856, task: 0000000062e24340, ksp: 000000007b91f978)        
Krnl PSW : 0704000180000000 0000000000513600 (down_write+0x74/0x114)            
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3              
Krnl GPRS: 0000000000000000 ffffffff00000001 0000000000000000 0000000062e24340  
           00000000005135fa 0000000000000000 0000000000000002 0000000036726188  
           000000007c264bf0 000000007a88e288 000000007c264bf0 000000007c264c48  
           000003e001fab084 000000000051f8d0 00000000005135fa 000000007b91fc08  
Krnl Code: 00000000005135ee: e310f0a00024       stg     %r1,160(%r15)           
           00000000005135f4: c0e5ffe3fb38       brasl   %r14,192c64             
           00000000005135fa: e310d0000004       lg      %r1,0(%r13)             
          >0000000000513600: e320a0000004       lg      %r2,0(%r10)             
           0000000000513606: b9020022           ltgr    %r2,%r2                 
           000000000051360a: a7740007           brc     7,513618                
           000000000051360e: eb21a0000030       csg     %r2,%r1,0(%r10)         
           0000000000513614: a744fff9           brc     4,513606                
Call Trace:                                                                     
([<00000000005135fa>] down_write+0x6e/0x114)                                    
 [<000003e001fab084>] uprobe_report_clone+0xb0/0x71c [uprobes]                  
 [<00000000001bde74>] utrace_report_clone+0xbc/0x170                            
 [<000000000014c754>] do_fork+0x3d8/0x47c                                       
 [<000000000010b2a2>] SyS_clone+0x62/0x70                                       
 [<000000000011a272>] sysc_tracego+0xe/0x14                                     
 [<0000004f48ec35d4>] 0x4f48ec35d4                                              
INFO: lockdep is turned off.                                                    
Last Breaking-Event-Address:                                                    
 [<0000000000192d46>] lock_acquire+0xe2/0x138                                   

Kernel panic - not syncing: Fatal exception: panic_on_oops                      
CPU: 0 Tainted: G      D    -- ------------    2.6.32-642.el6.s390x.debug #1    
Process loop (pid: 44856, task: 0000000062e24340, ksp: 000000007b91f978)        
000000007b91f8a8 000000007b91f828 0000000000000002 0000000000000000             
       000000007b91f8c8 000000007b91f840 000000007b91f840 000000000050ed16      
       0000000000000000 0000000000000001 0000000000060011 0000000000000088      
       000000000000000d 000000000000000c 000000007b91f898 0000000000000000      
       0000000000000000 0000000000105fbc 000000007b91f828 000000007b91f868      
Call Trace:                                                                     
([<0000000000105eb4>] show_trace+0xf0/0x148)                                    
 [<000000000050eb40>] panic+0xcc/0x240                                          
 [<0000000000106502>] die+0x162/0x164                                           
 [<00000000001014f6>] do_no_context+0xae/0xec                                   
 [<00000000005158f8>] do_dat_exception+0x218/0x318                              
 [<000000000011a43a>] pgm_exit+0x0/0x14                                         
 [<0000000000513600>] down_write+0x74/0x114                                     
([<00000000005135fa>] down_write+0x6e/0x114)                                    
 [<000003e001fab084>] uprobe_report_clone+0xb0/0x71c [uprobes]                  
 [<00000000001bde74>] utrace_report_clone+0xbc/0x170                            
 [<000000000014c754>] do_fork+0x3d8/0x47c                                       
 [<000000000010b2a2>] SyS_clone+0x62/0x70                                       
 [<000000000011a272>] sysc_tracego+0xe/0x14                                     
 [<0000004f48ec35d4>] 0x4f48ec35d4                                              
INFO: lockdep is turned off.
01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 01.
====

The last test run in systemtap.log is systemtap.unprivileged/pr16806.exp.

-- 
You are receiving this mail because:
You are the assignee for the bug.

* [Bug runtime/20742] kernel pointer dereference BUG on RHEL6 s390x
  2016-10-27 19:47 [Bug runtime/20742] New: kernel pointer dereference BUG on RHEL6 s390x dsmith at redhat dot com
@ 2016-11-01 20:15 ` dsmith at redhat dot com
  2016-11-10 16:00 ` dsmith at redhat dot com
  1 sibling, 0 replies; 3+ messages in thread
From: dsmith at redhat dot com @ 2016-11-01 20:15 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=20742

--- Comment #1 from David Smith <dsmith at redhat dot com> ---
Created attachment 9606
  --> https://sourceware.org/bugzilla/attachment.cgi?id=9606&action=edit
proposed patch

Here's a patch that attempts to fix this problem. There are actually 3 fixes in
this patch.

Fix #1: uprobe_report_clone() did the following:

====
        rcu_read_lock();
        ptask = (struct uprobe_task *)rcu_dereference(engine->data);
        uproc = ptask->uproc;
        rcu_read_unlock();

        /*
         * Lock uproc so no new uprobes can be installed 'til all
         * report_clone activities are completed.  Lock uproc_table
         * in case we have to run uprobe_fork_uproc().
         */
        lock_uproc_table();
        down_write(&uproc->rwsem);
====

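This isn't correct. While we're under rcu_read_lock(), the ptask and uproc
pointers should be valid. However, once we leave the RCU read-side critical
section, the ptask/uproc memory can be freed, so the later
down_write(&uproc->rwsem) can operate on freed memory. That is consistent with
the oops above, which hit in down_write() called from uprobe_report_clone().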

To fix this, I changed the code to the following:

====
        rcu_read_lock();
        ptask = (struct uprobe_task *)rcu_dereference(engine->data);
        BUG_ON(!ptask);
        /* Keep uproc intact until just before we return. */
        uproc = uprobe_get_process(ptask->uproc);
        rcu_read_unlock();

        if (!uproc)
                /* uprobe_free_process() has probably clobbered utask->proc. */
                return UTRACE_DETACH;

        /*
         * Lock uproc so no new uprobes can be installed 'til all
         * report_clone activities are completed.  Lock uproc_table
         * in case we have to run uprobe_fork_uproc().
         */
        lock_uproc_table();
        down_write(&uproc->rwsem);
====

This change takes a reference on uproc (via uprobe_get_process()) while still
under rcu_read_lock(), so uproc won't get freed after the RCU read-side section
ends. Similar code is present in uprobe_report_signal(), uprobe_report_exit(),
etc.

I also added code to uprobe_report_clone() to decrement the reference count on
uproc before returning.
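The "pin it before leaving the protected section, drop it on the way out" pattern from Fix #1 can be sketched in userspace C. This is only an illustration under stated assumptions: struct proc, proc_get(), proc_put(), proc_unpublish(), report_clone(), lookup_lock and shared are all hypothetical names, not the uprobes API. A pthread mutex stands in for the RCU read-side critical section (real RCU readers never block and freeing is deferred past a grace period), and proc_get() behaves like a "get unless zero", mirroring how uprobe_get_process() can return NULL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct uprobe_process: only the refcount. */
struct proc {
	atomic_int refcount;
};

/* Hypothetical analogue of uprobe_get_process(): take a reference only if
 * the object is not already being torn down (refcount > 0). */
static struct proc *proc_get(struct proc *p)
{
	int old;

	if (!p)
		return NULL;
	old = atomic_load(&p->refcount);
	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return p;
	}
	return NULL;	/* already dying: caller must bail out */
}

/* Drop a reference; the last holder frees the object. Returns 1 if freed. */
static int proc_put(struct proc *p)
{
	if (atomic_fetch_sub(&p->refcount, 1) == 1) {
		free(p);
		return 1;
	}
	return 0;
}

/* Hypothetical: a mutex stands in for the RCU read-side section, and
 * "shared" plays the role of the published engine->data pointer. */
static pthread_mutex_t lookup_lock = PTHREAD_MUTEX_INITIALIZER;
static struct proc *shared;

/* Owner side: unpublish under the lock, then drop the owner's reference.
 * Readers that already pinned the object keep it alive. */
static void proc_unpublish(void)
{
	struct proc *p;

	pthread_mutex_lock(&lookup_lock);
	p = shared;
	shared = NULL;
	pthread_mutex_unlock(&lookup_lock);
	if (p)
		proc_put(p);
}

/* Sketch of the fixed flow: pin the object before leaving the protected
 * section, use it, then drop the reference before returning. */
static int report_clone(void)
{
	struct proc *p;

	pthread_mutex_lock(&lookup_lock);
	p = proc_get(shared);
	pthread_mutex_unlock(&lookup_lock);

	if (!p)
		return -1;	/* analogous to returning UTRACE_DETACH */

	/* ... p cannot be freed here, e.g. down_write(&uproc->rwsem) ... */

	proc_put(p);	/* the matching decrement before returning */
	return 0;
}
```

The buggy version corresponds to reading shared under the lock and using it after unlocking without calling proc_get(); nothing would then stop proc_unpublish() from freeing it in between.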

When systemtap.unprivileged/pr16806.exp is run with the uprobe_report_clone()
changes, the test no longer crashes the system, but two new problems show up:

====
[ BUG: lock held when returning to user space! ]
------------------------------------------------
loop/4866 is leaving the kernel with locks still held!
2 locks held by loop/4866:
 #0:  (&uproc->rwsem){+++++.}, at: [<000003e00108ad32>] uprobe_report_signal+0x296/0xe80 [uprobes]
 #1:  (&slot->rwsem){+.+...}, at: [<000003e0010898e0>] uprobe_find_insn_slot+0x160/0x33c [uprobes]
====


Fix #2: To fix the first held-lock problem above (uproc->rwsem), I took a look
at uprobe_report_signal(). That code is quite complicated, and has 'goto'
statements that jump from one switch statement case to another. I could see
several paths through that code that could lead to uproc->rwsem still being
held on exit.

The changes to uprobe_report_signal() attempt to fix those issues.
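One common C idiom for making sure a lock is released on every path, even when cases jump to each other with goto, is to funnel every exit through a single unlock label. The sketch below illustrates that idiom only; the bug doesn't say whether the patch restructured uprobe_report_signal() this way. struct fake_rwsem, report_signal(), enum event and the "fail" flag are hypothetical, and the fake lock just counts acquisitions so that a leak on any path shows up as a nonzero count (roughly what lockdep reported above).

```c
#include <assert.h>

/* Hypothetical stand-in for uproc->rwsem: counts how many times it is held. */
struct fake_rwsem {
	int held;
};

static void fake_down_write(struct fake_rwsem *s) { s->held++; }
static void fake_up_write(struct fake_rwsem *s)   { s->held--; }

/* Hypothetical event kinds standing in for the cases the real handler
 * distinguishes (e.g. breakpoint vs. single-step traps). */
enum event { EV_BREAKPOINT, EV_SSTEP, EV_OTHER };

/* Every path that takes the lock leaves through "out_unlock", so the lock
 * cannot leak no matter which case jumps where. */
static int report_signal(struct fake_rwsem *rwsem, enum event ev, int fail)
{
	int ret = 0;

	fake_down_write(rwsem);
	switch (ev) {
	case EV_BREAKPOINT:
		if (fail) {
			ret = -1;
			goto out_unlock;	/* early error exit still unlocks */
		}
		/* ... prepare single-stepping, then continue in the step case ... */
		goto do_sstep;
	case EV_SSTEP:
do_sstep:
		/* ... finish the single-step ... */
		ret = 1;
		break;
	default:
		ret = 2;
		break;
	}
out_unlock:
	fake_up_write(rwsem);
	return ret;
}
```

Because no case returns directly, adding a new goto between cases can't introduce a path that skips the unlock.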


Fix #3: uprobe_find_insn_slot() always returns the slot read-locked. It is only
called by uprobe_get_insn_slot(), which returns that read-locked slot.
uprobe_get_insn_slot() is only called by uprobe_pre_ssout(), which wasn't
unlocking the slot. I changed uprobe_pre_ssout() to unlock the slot before
returning.
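The ownership rule behind Fix #3 is that a function returning a read-locked object hands that lock to its caller, and the final caller must release it. A minimal sketch, assuming hypothetical names (struct slot, get_insn_slot(), pre_ssout(), slot_down_read(), slot_up_read()) that only loosely mirror uprobe_get_insn_slot() and uprobe_pre_ssout(); the "lock" is a reader counter so a missing unlock is easy to detect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a slot and its rwsem: counts active readers. */
struct slot {
	int readers;
	unsigned long insn_addr;
};

static void slot_down_read(struct slot *s) { s->readers++; }
static void slot_up_read(struct slot *s)   { s->readers--; }

/* Analogue of uprobe_find_insn_slot()/uprobe_get_insn_slot(): on success
 * the slot is returned READ-LOCKED and the caller owns that lock. */
static struct slot *get_insn_slot(struct slot *s)
{
	if (!s)
		return NULL;
	slot_down_read(s);
	return s;
}

/* Analogue of the fixed uprobe_pre_ssout(): use the slot while it is
 * locked, then release the read lock before returning. */
static int pre_ssout(struct slot *candidate, unsigned long *out_addr)
{
	struct slot *s = get_insn_slot(candidate);

	if (!s)
		return -1;		/* nothing was locked, nothing to release */
	*out_addr = s->insn_addr;	/* e.g. record where the copied insn lives */
	slot_up_read(s);		/* the unlock the old code was missing */
	return 0;
}
```

Without the slot_up_read() call, readers would stay at 1 after pre_ssout() returns, which is the analogue of the &slot->rwsem entry in the lockdep report.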


With this patch the test passes. However, I'll need to do more testing to
ensure I haven't broken anything else.

* [Bug runtime/20742] kernel pointer dereference BUG on RHEL6 s390x
  2016-10-27 19:47 [Bug runtime/20742] New: kernel pointer dereference BUG on RHEL6 s390x dsmith at redhat dot com
  2016-11-01 20:15 ` [Bug runtime/20742] " dsmith at redhat dot com
@ 2016-11-10 16:00 ` dsmith at redhat dot com
  1 sibling, 0 replies; 3+ messages in thread
From: dsmith at redhat dot com @ 2016-11-10 16:00 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=20742

--- Comment #2 from David Smith <dsmith at redhat dot com> ---
Unfortunately, this patch causes the at_var.exp test to hang, so I'll need to
do more work here.
