public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug inline-asm/95692] New: PPC64, suspicious store in front of inline assembly section
@ 2020-06-15 21:53 markalle at us dot ibm.com
  2020-06-15 22:05 ` [Bug inline-asm/95692] " pinskia at gcc dot gnu.org
  2020-06-17  5:43 ` markalle at us dot ibm.com
  0 siblings, 2 replies; 3+ messages in thread
From: markalle at us dot ibm.com @ 2020-06-15 21:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95692

            Bug ID: 95692
           Summary: PPC64, suspicious store in front of inline assembly
                    section
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: inline-asm
          Assignee: unassigned at gcc dot gnu.org
          Reporter: markalle at us dot ibm.com
  Target Milestone: ---

Created attachment 48735
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48735&action=edit
Reproducer, includes Makefile (which downloads and configures openmpi-4.0.3)
and a couple .c filles that use the patcher out of Open MPI

*** System/version info:

The system on which the suspicious behavior was observed is PPC64 running
Redhat 8.1.
% uname -a
Linux f8n02 4.18.0-147.13.2.el8_1.ppc64le #1 SMP Wed May 13 15:23:36 UTC 2020
ppc64le ppc64le ppc64le GNU/Linux
% cat /etc/redhat-release
Red Hat Enterprise Linux release 8.1 (Ootpa)
% gcc --version
gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I also downloaded and built gcc-10.1.0 and observed the same behavior

*** Problem:

Open MPI has a symbol patcher it uses to intercept calls like munmap to
know when virtual to physical memory mappings can change.

On PPC64 the patcher code replaces "munmap" with a branch to a new
function intercept_munmap().  The target function intercept_munmap()
wants its own r2 table of contents, so it has a BEGIN/END macro where
it saves, overwrites, then restores r2.

We can edit the code in small ways to produces passing/failing versions of
this.  Starting with one of the passing versions, the instructions for the
pre-patched version of the target function "intercept_munmap()" are
   0x0000000010000ff4 <+28>:    std     r3,128(r31)
   0x0000000010000ff8 <+32>:    std     r4,136(r31)
   0x0000000010000ffc <+36>:    std     r2,104(r31)
   0x0000000010001000 <+40>:    nop
   0x0000000010001004 <+44>:    nop
   0x0000000010001008 <+48>:    nop
   0x000000001000100c <+52>:    nop
   0x0000000010001010 <+56>:    nop
where the table of contents r2 is being saved in a local variable, and
those place holder NOP operations will be modified to overwrite r2.

The failing versions of the program are the ones where intercept_munmap()
is generated as
   0x0000000010000ff4 <+28>:    std     r2,24(r1)
   0x0000000010000ff8 <+32>:    std     r3,128(r31)
   0x0000000010000ffc <+36>:    std     r4,136(r31)
   0x0000000010001000 <+40>:    std     r2,104(r31)
   0x0000000010001004 <+44>:    nop
   0x0000000010001008 <+48>:    nop
   0x000000001000100c <+52>:    nop
   0x0000000010001010 <+56>:    nop
   0x0000000010001014 <+60>:    nop
Here that first save of r2 to $r1+24 is picking up the pre-patched value,
and if that gets used later it will fail.

When it comes to tracing through the Open MPI code, the central function in
the patcher is mca_patcher_overwrite_apply_patch() at
openmpi-4.0.3_mod/opal/mca/patcher/overwrite/patcher_overwrite_module.c:203

The main values there are
  sys_addr = munmap from glibc
  hook_addr = intercept_munmap, the function to replace munmap

At the bottom of mca_patcher_overwrite_apply_patch() the instructions at
sys_addr are replaced by instructions that change r11 to hook_addr and
branch there.  Above that the target code in intercept_munmap() is also
modified to save, overwrite, and later restore the table of contents r2.

If it matters, the particulars of how this failing case accesses bad data
relative to r2 is that when intercept_munmap() calls fn() via a function
pointer, the return from that only restores r2 to $r1+24, and then the
syscall() that disassembles as
[plt_call.syscall@@GLIBC_2.17]
   0x0000000010000bc0 <+0>:     std     r2,24(r1)
   0x0000000010000bc4 <+4>:     ld      r12,-32304(r2)
   0x0000000010000bc8 <+8>:     mtctr   r12
   0x0000000010000bcc <+12>:    bctr
   0x0000000010000bd0 <+16>:    .long 0x0
   0x0000000010000bd4 <+20>:    .long 0x0
   0x0000000010000bd8 <+24>:    .long 0x0
   0x0000000010000bdc <+28>:    .long 0x0
is pulling from the wrong table of contents.  If r2 had been restoredd
to $r31+128 then the syscall branch would have gone to the right place.

I think the root of the problem is the early store of r2, and it should
be considered a bug that gcc put this in front of the inline assembly
that's put in place by OPAL_PATCHER_BEGIN.

For what it's worth, editing openmpi-4.0.3_mod/opal/mca/patcher/patcher.h
to add "ld r2,24(r1)" or similar (to make it contain instructions that
write to r2) doesn't keep gcc from putting the suspicious early save of r2
in front of OPAL_PATCHER_BEGIN.

*** Explanation of reproducer package:

The Makefile wgets openmpi-4.0.3.tar.gz and configures it which makes
all the necessary headers exist and take the PPC64 path.  Also the main
entry function mca_patcher_overwrite_patch_symbol() is static in Open MPI
so we edit out the static so we can call it directly.

After that the compilation is pretty simple.  The code defines an
intercept_munmap() that can be disassembled to see the suspicious early
store of r2.  Running ./x will segv in the syscall to SYS_munmap when
it tries to branch to $r2-32304 when r2 is loaded with the wrong table
of contents.

*** Example:

% make
% gdb ./x
(gdb) break intercept_munmap
(gdb) run
(gdb) disassemble $pc
=> 0x0000000010000ff4 <+28>:    std     r2,24(r1)
   0x0000000010000ff8 <+32>:    std     r3,128(r31)
   0x0000000010000ffc <+36>:    std     r4,136(r31)
   0x0000000010001000 <+40>:    std     r2,104(r31)
   0x0000000010001004 <+44>:    lis     r2,0
   0x0000000010001008 <+48>:    ori     r2,r2,0
   0x000000001000100c <+52>:    rldicr  r2,r2,32,31
   0x0000000010001010 <+56>:    oris    r2,r2,4098
   0x0000000010001014 <+60>:    ori     r2,r2,32512
   0x0000000010001018 <+64>:    addis   r9,r2,-2
   0x000000001000101c <+68>:    addi    r9,r9,-28612
   0x0000000010001020 <+72>:    std     r9,112(r31)
   0x0000000010001024 <+76>:    ld      r9,112(r31)
   0x0000000010001028 <+80>:    mr      r12,r9
   0x000000001000102c <+84>:    mtctr   r12
   0x0000000010001030 <+88>:    bctrl
   0x0000000010001034 <+92>:    ld      r2,24(r1)
   0x0000000010001038 <+96>:    ld      r5,136(r31)
   0x000000001000103c <+100>:   ld      r4,128(r31)
   0x0000000010001040 <+104>:   li      r3,91
   0x0000000010001044 <+108>:   bl      0x10000bc0
<00000022.plt_call.syscall@@GLIBC_2.17>
   0x0000000010001048 <+112>:   ld      r2,24(r1)
   0x000000001000104c <+116>:   mr      r9,r3
   0x0000000010001050 <+120>:   stw     r9,96(r31)
   0x0000000010001054 <+124>:   ld      r2,104(r31)
   0x0000000010001058 <+128>:   nop
   0x000000001000105c <+132>:   mr      r3,r9
   0x0000000010001060 <+136>:   addi    r1,r31,160
   0x0000000010001064 <+140>:   ld      r0,16(r1)
   0x0000000010001068 <+144>:   mtlr    r0
   0x000000001000106c <+148>:   ld      r31,-8(r1)
   0x0000000010001070 <+152>:   blr
   0x0000000010001074 <+156>:   .long 0x0
   0x0000000010001078 <+160>:   .long 0x1000000
   0x000000001000107c <+164>:   .long 0x1000180
(gdb) p /x $r2
   0x200000307300  (old toc saved at $r1+24)
(gdb) stepi 9
(gdb) p /x $r2
   0x10027f00  (after the instructions that replace $r2)
(gdb) stepi 7      (branches into foo() via function pointer)
(gdb) fin          (come back from foo())
(gdb) p /x $r2
   0x10027f00  (this is the new value and would be fine if it stayed)
(gdb) p /x *(long long*)((char*)$r2 - 32304)
   0x10003ef0
(gdb) info symbol 0x10003ef0
   syscall@plt (just confirming that the current $r2 has good data at
$r2-32304)
(gdb) stepi    (executing the instruction at <+92>)
(gdb) p /x $r2
   0x200000307300  (old toc saved at $r1+24)
(gdb) p /x *(long long*)((char*)$r2 - 32304)
   0x200000304fb0
(gdb) info symbol 0x200000304fb0
   __exit_funcs_lock (I think this is leftover, not the function we want to
branch to)
(gdb) stepi 4
(gdb) disassemble $pc
   0x0000000010000bc0 <+0>:     std     r2,24(r1)
   0x0000000010000bc4 <+4>:     ld      r12,-32304(r2)
   0x0000000010000bc8 <+8>:     mtctr   r12
   0x0000000010000bcc <+12>:    bctr
   0x0000000010000bd0 <+16>:    .long 0x0
   0x0000000010000bd4 <+20>:    .long 0x0
   0x0000000010000bd8 <+24>:    .long 0x0
   0x0000000010000bdc <+28>:    .long 0x0
(gdb) stepi 3
(gdb) p /x $r12
   0x200000304fb0
Now it's about to branch into __exit_funcs_lock and segv when it wanted to
branch to syscall@plt which is what it would have got if $r2 had the right
table of contents.

*** section of preprocessed test.i

If -save-temps is added to the Makefile, here's a clip of the output for test.i

int
intercept_munmap(void *start, size_t length)
{
    unsigned long toc_save; asm volatile ("std 2, %0" : "=m" (toc_save)); asm
volatile ("nop; nop; nop
; nop; nop");;
    volatile MyFunction_t fn;
    fn = foo;
    fn();
    int result = syscall(
# 26 "test.c" 3 4
                        91
# 26 "test.c"
                                  , start, length);
    asm volatile ("ld  2, %0" : : "m" (toc_save));;
}

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Bug inline-asm/95692] PPC64, suspicious store in front of inline assembly section
  2020-06-15 21:53 [Bug inline-asm/95692] New: PPC64, suspicious store in front of inline assembly section markalle at us dot ibm.com
@ 2020-06-15 22:05 ` pinskia at gcc dot gnu.org
  2020-06-17  5:43 ` markalle at us dot ibm.com
  1 sibling, 0 replies; 3+ messages in thread
From: pinskia at gcc dot gnu.org @ 2020-06-15 22:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95692

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-06-15
             Status|UNCONFIRMED                 |WAITING
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Why do you think this is a GCC bug?  saving/restoring the TOC register behind
the back of GCC is incorrect and causes other issues.  Why not use pure
assembly code here instead of inline-asm and C?

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Bug inline-asm/95692] PPC64, suspicious store in front of inline assembly section
  2020-06-15 21:53 [Bug inline-asm/95692] New: PPC64, suspicious store in front of inline assembly section markalle at us dot ibm.com
  2020-06-15 22:05 ` [Bug inline-asm/95692] " pinskia at gcc dot gnu.org
@ 2020-06-17  5:43 ` markalle at us dot ibm.com
  1 sibling, 0 replies; 3+ messages in thread
From: markalle at us dot ibm.com @ 2020-06-17  5:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95692

--- Comment #2 from Mark Allen <markalle at us dot ibm.com> ---
Hi, thanks for the input.

I was figuring gcc should see the inline assembly modifying r2
and any function call out of intercept_munmap() would be doing the
logical equivalent of saving the current r2, calling something
where a different r2 might be used, then upon return restoring
r2 to whatever it was when it left intercept_munmap().

But it sounds like you're saying gcc should be the exclusive
manager of r2.  And I get that the activity of making a function
call isn't always going to be as linear as "save current r2,
set r2 for whatever new context it's calling into, restore r2".

I'm open to other solutions.  For most alternatives I can think
of, I can at least imagine potential gcc behavior that would
conflict with us hacking r2, which I expect is why you're
suggesting doing nothing but our own assembly from
intercept_munmap() downward.

That would be rough though, we'd definitely like to be able to
call other functions from intercept_munmap().

The problem starts when we turn munmap into a branch into
intercept_munmap.  At that point r2 would have been set up as
valid for munmap() but then we branched into intercept_munmap()
which presumably needs its own r2.

So far the best hack I can see is to put it all in the
initial branch.  Eg overwrite munmap not just with a branch to
intercept_munmap, but replace r2 at that level (rather than at
the start of intercept_munmap).  That essentially puts both of the
things that hack the state into the same location, after which
the state is consistent for gcc to continue doing whatever it
wants.

I'm getting the impression this isn't going to receive a
blessing as a "good" solution though.  Any other ideas for
a "least bad" solution if we want to be able to make function
calls into other gcc-compiled functions from inside
intercept_munmap()?

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-06-17  5:43 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-15 21:53 [Bug inline-asm/95692] New: PPC64, suspicious store in front of inline assembly section markalle at us dot ibm.com
2020-06-15 22:05 ` [Bug inline-asm/95692] " pinskia at gcc dot gnu.org
2020-06-17  5:43 ` markalle at us dot ibm.com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).