public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/97497] New: gcse wrong code generation with partial register clobbers
@ 2020-10-19 16:39 krebbel at gcc dot gnu.org
  2020-10-19 16:41 ` [Bug rtl-optimization/97497] " krebbel at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: krebbel at gcc dot gnu.org @ 2020-10-19 16:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

            Bug ID: 97497
           Summary: gcse wrong code generation with partial register
                    clobbers
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

Compiling the attached testcase produces wrong code on IBM Z:

cc1 t.c -m31 -mzarch -march=z900 -O2 -fpic -o t.s

foo:
        stm     %r11,%r15,44(%r15)
        larl    %r12,_GLOBAL_OFFSET_TABLE_
        lr      %r11,%r2
        l       %r1,t@GOT(%r12)
        ahi     %r15,-96
        lhi     %r2,1
        l       %r3,0(%r1)
        brasl   %r14,bar@PLT
        ltr     %r11,%r11
        jne     .L8
        lhi     %r2,1
        l       %r3,0                 <--- dereference address 0
        brasl   %r14,bar@PLT
        l       %r4,152(%r15)
        lm      %r11,%r15,140(%r15)
        br      %r4
.L8:
        lhi     %r3,1
        l       %r2,0                 <--- dereference address 0
        brasl   %r14,baz@PLT
        lhi     %r2,1
        l       %r3,0
        brasl   %r14,bar@PLT
        l       %r4,152(%r15)
        lm      %r11,%r15,140(%r15)
        br      %r4


gcse decides to remove the load from t in the subsequent bbs but does not
generate the load into a temp reg in the first bb leaving the bbs loading from
an uninitialized pseudo.

With -mzarch -m31 we have the GOT pointer marked as partially clobbered. The
loads from t use the GOT pointer explicitly in the RTX. Since this patch r12 is
considered to be fully clobbered by call insns:

commit a4dfaad2e5594d871fe00a1116005e28f95d644e (refs/bisect/bad)
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Mon Sep 30 16:20:44 2019 +0000

    Remove global call sets: gcse.c

    This is another case in which we can conservatively treat partial
    kills as full kills.  Again this is in principle a bug fix for
    TARGET_HARD_REGNO_CALL_PART_CLOBBERED targets, but in practice
    it probably doesn't make a difference.

    2019-09-30  Richard Sandiford  <richard.sandiford@arm.com>

    gcc/
            * gcse.c: Include function-abi.h.
            (compute_hash_table_work): Use insn_callee_abi to get the ABI of
            the call insn target.  Invalidate partially call-clobbered
            registers as well as fully call-clobbered ones.

    From-SVN: r276323

Now the RTX for t which references r12 is considered to be not available
anymore in the later bbs due to r12 being clobbered by the calls. Hence no load
of the original expression is being emitted.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers
  2020-10-19 16:39 [Bug rtl-optimization/97497] New: gcse wrong code generation with partial register clobbers krebbel at gcc dot gnu.org
@ 2020-10-19 16:41 ` krebbel at gcc dot gnu.org
  2020-10-19 22:00 ` rsandifo at gcc dot gnu.org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: krebbel at gcc dot gnu.org @ 2020-10-19 16:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

--- Comment #1 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
Created attachment 49402
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49402&action=edit
Proposed fix

With the patch only regs are considered which aren't "fixed" assuming that for
fixed_regs the backend takes care of only actually using the well-defined part
of the hard regs.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers
  2020-10-19 16:39 [Bug rtl-optimization/97497] New: gcse wrong code generation with partial register clobbers krebbel at gcc dot gnu.org
  2020-10-19 16:41 ` [Bug rtl-optimization/97497] " krebbel at gcc dot gnu.org
@ 2020-10-19 22:00 ` rsandifo at gcc dot gnu.org
  2020-10-20  5:31 ` krebbel at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2020-10-19 22:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #2 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Andreas Krebbel from comment #1)
> Created attachment 49402 [details]
> Proposed fix
> 
> With the patch only regs are considered which aren't "fixed" assuming that
> for fixed_regs the backend takes care of only actually using the
> well-defined part of the hard regs.
I don't think this is right.  The principle is the same for
all types of register.

The idea is that a partial clobber can conservatively be treated
as a read of the old value and a set of the new value.  That might
be suboptimal, but it should never lead to wrong code.  It sounds
like there's a deeper issue here.

(BTW, the testcase attachment seems to be missing.)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers
  2020-10-19 16:39 [Bug rtl-optimization/97497] New: gcse wrong code generation with partial register clobbers krebbel at gcc dot gnu.org
  2020-10-19 16:41 ` [Bug rtl-optimization/97497] " krebbel at gcc dot gnu.org
  2020-10-19 22:00 ` rsandifo at gcc dot gnu.org
@ 2020-10-20  5:31 ` krebbel at gcc dot gnu.org
  2020-10-20  6:17 ` krebbel at gcc dot gnu.org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: krebbel at gcc dot gnu.org @ 2020-10-20  5:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

--- Comment #3 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
Created attachment 49405
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49405&action=edit
testcase

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers
  2020-10-19 16:39 [Bug rtl-optimization/97497] New: gcse wrong code generation with partial register clobbers krebbel at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2020-10-20  5:31 ` krebbel at gcc dot gnu.org
@ 2020-10-20  6:17 ` krebbel at gcc dot gnu.org
  2020-10-20 10:30 ` rsandifo at gcc dot gnu.org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: krebbel at gcc dot gnu.org @ 2020-10-20  6:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

--- Comment #4 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
Reading from symbol t uses the GOT pointer in r12. The call then partially
clobbers r12 but does not affect the lower 32 bits where the GOT pointer
resides. So the GOT pointer stays in fact live across the call.

The way this is currently handled in gcse the second access of t reads a
different value than the first and this is wrong I think. This leads to a
disagreement between the pre_delete_map and the insert locations. The later
read of t is removed because it is an anticipated use but no copy of the value
is inserted after the first read of t because the expression is not considered
to be available anymore at the second location. The availability of the entire
expression is broken by the set of r12 at the call insns.

I didn't know how to solve this without being able to keep track of what parts
of hard regs are clobbered.

The idea behind the fixed_reg check is to trust the backend that it does not
emit uninitialized uses of hard regs in the first place.

t.c.250r.cprop1:

(insn 6 3 7 2 (set (reg/f:SI 65)
        (mem/u/c:SI (plus:SI (reg:SI 12 %r12)
                (const:SI (unspec:SI [
                            (symbol_ref:SI ("t") [flags 0x6c0]  <var_decl
0x3fffdff42d0 t>)
                        ] UNSPEC_GOT))) [0  S4 A8])) "t.c":8:3 1387
{*movsi_zarch}
     (nil))
(insn 7 6 8 2 (set (reg:SI 3 %r3)
        (mem/f/c:SI (reg/f:SI 65) [1 t+0 S4 A32])) "t.c":8:3 1387
{*movsi_zarch}
     (expr_list:REG_DEAD (reg/f:SI 65)
        (nil)))
(insn 8 7 9 2 (set (reg:SI 2 %r2)
        (const_int 1 [0x1])) "t.c":8:3 1387 {*movsi_zarch}
     (nil))
(call_insn 9 8 10 2 (parallel [
            (call (mem:QI (const:SI (unspec:SI [
                                (symbol_ref:SI ("bar") [flags 0x41] 
<function_decl 0x3fff0555200 bar>)
                            ] UNSPEC_PLT)) [0 bar S1 A8])
                (const_int 0 [0]))
            (clobber (reg:SI 14 %r14))
        ]) "t.c":8:3 2053 {*brasl}
     (expr_list:REG_DEAD (reg:SI 3 %r3)
        (expr_list:REG_DEAD (reg:SI 2 %r2)
            (expr_list:REG_CALL_DECL (symbol_ref:SI ("bar") [flags 0x41] 
<function_decl 0x3fff0555200 bar>)
                (nil))))
    (expr_list (use (reg:SI 12 %r12))
        (expr_list:SI (use (reg:SI 2 %r2))
            (expr_list:SI (use (reg:SI 3 %r3))
                (nil)))))


...

(insn 13 12 14 4 (set (reg/f:SI 66)
        (mem/u/c:SI (plus:SI (reg:SI 12 %r12)
                (const:SI (unspec:SI [
                            (symbol_ref:SI ("t") [flags 0x6c0]  <var_decl
0x3fffdff42d0 t>)
                        ] UNSPEC_GOT))) [0  S4 A8])) "t.c":10:5 1387
{*movsi_zarch}
     (nil))

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers
  2020-10-19 16:39 [Bug rtl-optimization/97497] New: gcse wrong code generation with partial register clobbers krebbel at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2020-10-20  6:17 ` krebbel at gcc dot gnu.org
@ 2020-10-20 10:30 ` rsandifo at gcc dot gnu.org
  2020-10-20 12:54 ` krebbel at gcc dot gnu.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2020-10-20 10:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

--- Comment #5 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
I think the problem is a disconnect between compute_transp
and the code in gcse.c itself.  compute_transp considers %r12
to be transparent in all blocks despite the partial clobbers.
But whether that's true is context-dependent.

I think the fix is to make transp also handle partial clobbers
in a conservative way.  I don't have a specific suggestion how
to do that yet.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers
  2020-10-19 16:39 [Bug rtl-optimization/97497] New: gcse wrong code generation with partial register clobbers krebbel at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2020-10-20 10:30 ` rsandifo at gcc dot gnu.org
@ 2020-10-20 12:54 ` krebbel at gcc dot gnu.org
  2020-10-26 13:31 ` rsandifo at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: krebbel at gcc dot gnu.org @ 2020-10-20 12:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

--- Comment #6 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
Alternatively I could also mark r12 as preserved across function calls for
-fpic in the backend. In fact all the bits we care about are preserved. Since
the register is fixed all the accesses do come from the backend itself.

That's similar to what I was trying with the fixed_regs hack. But I agree that
this might not be correct in general.

The full fix is probably to track the exact parts of partially clobbered regs
which stay live but this would be a major change.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers
  2020-10-19 16:39 [Bug rtl-optimization/97497] New: gcse wrong code generation with partial register clobbers krebbel at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2020-10-20 12:54 ` krebbel at gcc dot gnu.org
@ 2020-10-26 13:31 ` rsandifo at gcc dot gnu.org
  2020-10-27 20:04 ` cvs-commit at gcc dot gnu.org
  2020-10-28 11:06 ` rsandifo at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2020-10-26 13:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

--- Comment #7 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Andreas Krebbel from comment #6)
> Alternatively I could also mark r12 as preserved across function calls for
> -fpic in the backend. In fact all the bits we care about are preserved.
> Since the register is fixed all the accesses do come from the backend itself.

Ah, yeah, that sounds good to me FWIW.

> That's similar to what I was trying with the fixed_regs hack. But I agree
> that this might not be correct in general.
> 
> The full fix is probably to track the exact parts of partially clobbered
> regs which stay live but this would be a major change.

Yeah.  If you do the above, I'll open a new PR for the underlying
gcse.c issue, since I think the problem is latent on targets that
have “real” partially-clobbered registers.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers
  2020-10-19 16:39 [Bug rtl-optimization/97497] New: gcse wrong code generation with partial register clobbers krebbel at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2020-10-26 13:31 ` rsandifo at gcc dot gnu.org
@ 2020-10-27 20:04 ` cvs-commit at gcc dot gnu.org
  2020-10-28 11:06 ` rsandifo at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-10-27 20:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Andreas Krebbel <krebbel@gcc.gnu.org>:

https://gcc.gnu.org/g:2b3e722a3ca1b9dcfff1c016e651d0d681de1af0

commit r11-4460-g2b3e722a3ca1b9dcfff1c016e651d0d681de1af0
Author: Andreas Krebbel <krebbel@linux.ibm.com>
Date:   Tue Oct 27 20:57:39 2020 +0100

    Fix PR97497

    This works around a limitation of gcse with handling of partially
    clobbered registers.  With this patch our GOT pointer register r12 is
    not marked as partially clobbered anymore for the -m31 -mzarch -fpic
    combination. This is correct since all the bits in r12 we actually
    care about are in fact preserved.

    gcc/ChangeLog:

            PR rtl-optimization/97497
            * config/s390/s390.c (s390_hard_regno_call_part_clobbered): Do not
            return true for r12 when -fpic is used.

    gcc/testsuite/ChangeLog:

            * gcc.target/s390/pr97497.c: New test.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers
  2020-10-19 16:39 [Bug rtl-optimization/97497] New: gcse wrong code generation with partial register clobbers krebbel at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2020-10-27 20:04 ` cvs-commit at gcc dot gnu.org
@ 2020-10-28 11:06 ` rsandifo at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2020-10-28 11:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-10-28
           Assignee|unassigned at gcc dot gnu.org      |rsandifo at gcc dot gnu.org

--- Comment #9 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Thanks for the s390 fix.  On second thoughts, I guess it would
be easier to keep this bug open for the general problem, since
it has all the info.

Hope to get to this early in stage 3.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-10-28 11:06 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-19 16:39 [Bug rtl-optimization/97497] New: gcse wrong code generation with partial register clobbers krebbel at gcc dot gnu.org
2020-10-19 16:41 ` [Bug rtl-optimization/97497] " krebbel at gcc dot gnu.org
2020-10-19 22:00 ` rsandifo at gcc dot gnu.org
2020-10-20  5:31 ` krebbel at gcc dot gnu.org
2020-10-20  6:17 ` krebbel at gcc dot gnu.org
2020-10-20 10:30 ` rsandifo at gcc dot gnu.org
2020-10-20 12:54 ` krebbel at gcc dot gnu.org
2020-10-26 13:31 ` rsandifo at gcc dot gnu.org
2020-10-27 20:04 ` cvs-commit at gcc dot gnu.org
2020-10-28 11:06 ` rsandifo at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).