public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/113255] New: wrong code with -O2 -mtune=k8
@ 2024-01-06 23:24 mednafen at sent dot com
  2024-01-06 23:53 ` [Bug target/113255] [11/12/13/14 Regression] " pinskia at gcc dot gnu.org
                   ` (16 more replies)
  0 siblings, 17 replies; 18+ messages in thread
From: mednafen at sent dot com @ 2024-01-06 23:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

            Bug ID: 113255
           Summary: wrong code with -O2 -mtune=k8
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mednafen at sent dot com
  Target Milestone: ---
            Target: x86_64-pc-linux-gnu

Created attachment 57000
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57000&action=edit
test case

Linux x86_64

$ /usr/local/gcc-13.2.0/bin/gcc -Wall -O2 -o sprite sprite.c && ./sprite
1

$ /usr/local/gcc-13.2.0/bin/gcc -Wall -O2 -mtune=k8 -o sprite sprite.c &&
./sprite
0
Aborted


The problem appears to have started around version 10.2.0.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
@ 2024-01-06 23:53 ` pinskia at gcc dot gnu.org
  2024-01-07 12:13 ` jakub at gcc dot gnu.org
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-01-06 23:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.5
      Known to fail|                            |10.2.0
   Last reconfirmed|                            |2024-01-06
      Known to work|                            |10.1.0
          Component|c                           |target
     Ever confirmed|0                           |1
            Summary|wrong code with -O2         |[11/12/13/14 Regression]
                   |-mtune=k8                   |wrong code with -O2
                   |                            |-mtune=k8
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.

This looks to be a target issue.
`-O2 -mtune=k8  -mstringop-strategy=libcall` works but
`-O2 -mtune=k8  -mstringop-strategy=rep_8byte` does not.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
  2024-01-06 23:53 ` [Bug target/113255] [11/12/13/14 Regression] " pinskia at gcc dot gnu.org
@ 2024-01-07 12:13 ` jakub at gcc dot gnu.org
  2024-01-07 19:48 ` ubizjak at gmail dot com
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-01-07 12:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
           Keywords|needs-bisection             |
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Started with r11-963-g80d6f89e78fc3b772701988cc73aa8e8006283be

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
  2024-01-06 23:53 ` [Bug target/113255] [11/12/13/14 Regression] " pinskia at gcc dot gnu.org
  2024-01-07 12:13 ` jakub at gcc dot gnu.org
@ 2024-01-07 19:48 ` ubizjak at gmail dot com
  2024-01-09  7:52 ` rguenth at gcc dot gnu.org
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: ubizjak at gmail dot com @ 2024-01-07 19:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
_.dse1 pass is removing the store for some reason, -fno-dse "fixes" the
testcase.

Before _.dse1 pass, we have:

(insn 41 40 46 4 (set (mem/c:SI (plus:DI (reg/f:DI 19 frame)
                (const_int -36 [0xffffffffffffffdc])) [2 e[1].y+0 S4 A32])
        (reg:SI 98 [ e$1$y ])) "pr113255.c":21:9 85 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 98 [ e$1$y ])
        (nil)))

But _.dse1 pass decides that:

**scanning insn=41
  mem: (plus:DI (reg/f:DI 19 frame)
    (const_int -36 [0xffffffffffffffdc]))

   after canon_rtx address: (plus:DI (reg/f:DI 19 frame)
    (const_int -36 [0xffffffffffffffdc]))
  gid=1 offset=-36
 processing const base store gid=1[-36..-32)
mems_found = 1, cannot_delete = false

...

Locally deleting insn 41
deferring deletion of insn with uid = 41.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
                   ` (2 preceding siblings ...)
  2024-01-07 19:48 ` ubizjak at gmail dot com
@ 2024-01-09  7:52 ` rguenth at gcc dot gnu.org
  2024-01-09  7:58 ` jakub at gcc dot gnu.org
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-09  7:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
DSE thinks the store is dead because it falls off the function.

(insn 41 40 46 4 (set (mem/c:SI (plus:DI (reg/f:DI 19 frame)
                (const_int -36 [0xffffffffffffffdc])) [2 e[1].y+0 S4 A32])
        (reg:SI 98 [ e$1$y ])) "t.c":21:9 85 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 98 [ e$1$y ])
        (nil)))

...

(insn 64 63 68 4 (parallel [
            (set (reg:DI 131)
                (const_int 0 [0]))
            (set (reg/f:DI 129)
                (plus:DI (ashift:DI (reg:DI 131)
                        (const_int 3 [0x3]))
                    (reg/f:DI 129)))
            (set (reg/f:DI 119)
                (plus:DI (ashift:DI (reg:DI 131)
                        (const_int 3 [0x3]))
                    (reg/f:DI 119)))
            (set (mem/c:BLK (reg/f:DI 129) [1  A64])
                (mem/c:BLK (reg/f:DI 119) [1  A8]))
            (use (reg:DI 131))
        ]) "t.c":21:9 1416 {*rep_movdi_rex64}
     (expr_list:REG_UNUSED (reg:DI 131)
        (expr_list:REG_UNUSED (reg/f:DI 129)
            (expr_list:REG_UNUSED (reg/f:DI 119)
                (nil)))))

it appears to be a bit convoluted how we compute the source of the block
copy (reg/f:DI 119) and this is probably confusing "frame related" vs.
non-frame-related MEM disambiguation.

I'm unsure the above parallel is valid, isn't parallel executing stmts
in "parallel" (unspecified order)?

**scanning insn=64
  mem: (reg/f:DI 119)

   after canon_rtx address: (reg/f:DI 119)

   after cselib_expand address: (minus:DI (plus:DI (reg/f:DI 129)
        (reg/f:DI 133))
    (reg/f:DI 134))

   after canon_rtx address: (minus:DI (plus:DI (reg/f:DI 129)
        (reg/f:DI 133))
    (reg/f:DI 134))
  varying cselib base=17:1741806 offset = 0
 processing cselib load mem:(mem/c:BLK (reg/f:DI 119) [1  A8])
 processing cselib load against insn 55
removing from active insn=55 has store
 processing cselib load against insn 47
removing from active insn=47 has store
 processing cselib load against insn 41
mems_found = 0, cannot_delete = true

we're calling canon_true_dependence with

(gdb) p debug_rtx (store_info->mem)
(mem/c:SI (plus:DI (reg/f:DI 19 frame)
        (const_int -36 [0xffffffffffffffdc])) [2 e[1].y+0 S4 A32])
(gdb) p debug_rtx (mem)
(mem/c:BLK (reg/f:DI 119) [1  A8])
(gdb) p debug_rtx (mem_addr)
(minus:DI (plus:DI (value:DI 8:1741643 @0x49c8ba8/0x49b8e80)
        (value:DI 11:11 @0x49c8bf0/0x49b8f10))
    (value:DI 9:9 @0x49c8bc0/0x49b8eb0))

and find_base_term of the mem_addr is returning

(symbol_ref:DI ("g_e") [flags 0x2] <var_decl 0x7ffff7019c60 g_e>)

which is because of the weird way we compute the source address (involving
the address of the destination).  There's a very old bug about find_base_term
which tends to pick up a wrong base if multiple candidates are involved.

I can't decipher this from what expand generates but the problem lies
there (the rep_8bytes expansion).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
                   ` (3 preceding siblings ...)
  2024-01-09  7:52 ` rguenth at gcc dot gnu.org
@ 2024-01-09  7:58 ` jakub at gcc dot gnu.org
  2024-01-09  8:00 ` ubizjak at gmail dot com
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-01-09  7:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> I'm unsure the above parallel is valid, isn't parallel executing stmts
> in "parallel" (unspecified order)?

I don't see anything invalid on it.  In addition to the memory copying, it
describes the other effects at the end of the pattern (that the counter goes to
0 and the di/si pointers are incremented by the original counter times 8).  All
the uses of pseudos in the pattern are the values of those pseudos before the
instruction, then all the sets at the end set the updated pseudos.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
                   ` (4 preceding siblings ...)
  2024-01-09  7:58 ` jakub at gcc dot gnu.org
@ 2024-01-09  8:00 ` ubizjak at gmail dot com
  2024-01-09  9:27 ` jakub at gcc dot gnu.org
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: ubizjak at gmail dot com @ 2024-01-09  8:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #4)

> I can't decipher this from what expand generates but the problem lies
> there (the rep_8bytes expansion).

Let's ask Honza.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
                   ` (5 preceding siblings ...)
  2024-01-09  8:00 ` ubizjak at gmail dot com
@ 2024-01-09  9:27 ` jakub at gcc dot gnu.org
  2024-01-09 10:20 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-01-09  9:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Slightly cleaned up testcase:

struct S { unsigned a[10]; unsigned y; unsigned b[6]; } g[2];

__attribute__((noinline, noclone)) int
test (int x)
{
  struct S e[2] = { g[0], g[1] };
  int r = 0;
  if (x >= 0)
    {
      r++;
      e[1].y++;
    }
  g[1] = e[1];
  return r;
}

int
main ()
{
  test (1);
  if (g[1].y != 1)
    __builtin_abort ();
  return 0;
}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
                   ` (6 preceding siblings ...)
  2024-01-09  9:27 ` jakub at gcc dot gnu.org
@ 2024-01-09 10:20 ` rguenth at gcc dot gnu.org
  2024-01-09 12:09 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-09 10:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=49330,
                   |                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=53705

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
    3: NOTE_INSN_FUNCTION_BEG
    8: r134:DI=const(`g'+0x44)
    9: {r133:DI=frame:DI-0x4c;clobber flags:CC;}
      REG_UNUSED flags:CC
...
   56: r129:DI=const(`g'+0x4c)
   57: {r129:DI=r129:DI&0xfffffffffffffff8;clobber flags:CC;}
      REG_UNUSED flags:CC
      REG_EQUAL const(`g'+0x4c)&0xfffffffffffffff8
   58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;}
      REG_DEAD r134:DI
      REG_UNUSED flags:CC
      REG_EQUAL const(`g'+0x44)-r129:DI
   59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;}
      REG_DEAD r133:DI
      REG_UNUSED flags:CC

so the source is

  &e.. - ((&g+0x44) - (&g+0x44)&0xfff...8)

that is, the original source adjusted by the alignment prologue amount
which is aligning the destination.  That's a classical example for where
find_base_term is confused.  I'm not sure there's a way to "obfuscate"
things here.  I think find_base_term ignores paths with & (align ops)
but it goes down the subtract base path here.  So maybe instead of
subtracting using & (align-1) for that would work.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
                   ` (7 preceding siblings ...)
  2024-01-09 10:20 ` rguenth at gcc dot gnu.org
@ 2024-01-09 12:09 ` rguenth at gcc dot gnu.org
  2024-01-10  8:57 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-09 12:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
So this:

static void
expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves (rtx destmem, rtx
srcmem,
...
      /* See how many bytes we skipped.  */
      saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, saveddest,
                                       *destptr,
                                       saveddest, 1, OPTAB_DIRECT);
      /* Adjust srcptr and count.  */
      if (!issetmem)
        *srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr,
                                       saveddest, *srcptr, 1, OPTAB_DIRECT);

is the problematical op, but I'm not seeing an easy way to avoid the MINUS
here.  It's also difficult to match (minus (..) (and ... aligning-cst))
in find_base_term as the AND is exposed via CSELIB values only.

find_base_term is known broken but base_alias_check continues to be "useful"
for aliasing with spill slots mostly.  find_base_term tries to do ad-hoc
points-to analysis but is not conservative in any way - it doesn't even
have a way to say a final "I don't know" which means there's no way to
perform a conservative correction to it.  In fact I don't think we can
make it conservative.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
                   ` (8 preceding siblings ...)
  2024-01-09 12:09 ` rguenth at gcc dot gnu.org
@ 2024-01-10  8:57 ` rguenth at gcc dot gnu.org
  2024-01-12  8:48 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-10  8:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, trying to fix find_base_term isn't enough, init_alias_analysis
find_base_value needs to be adjusted as well.  One "obvious" mistake there
is a missing

diff --git a/gcc/alias.cc b/gcc/alias.cc
index b2ec4806d22..6aeb2167520 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -1492,6 +1492,13 @@ find_base_value (rtx src)
       {
        rtx temp, src_0 = XEXP (src, 0), src_1 = XEXP (src, 1);

+       /* If both operands of a MINUS are known pointers then the
+          base is not either of them.  */
+       if (GET_CODE (src) == MINUS
+           && REG_P (src_0) && REG_POINTER (src_0)
+           && REG_P (src_1) && REG_POINTER (src_1))
+         return 0;
+
        /* If either operand is a REG that is a known pointer, then it
           is the base.  */
        if (REG_P (src_0) && REG_POINTER (src_0))

but of course that's not conservative - not having REG_POINTER set doesn't
mean it's not a pointer.  But even when we assume REG_POINTER is
correct the minus operands might not be REG_P.

This is really all totally wrong for what it is (pointer analysis on RTL).
On RTL we also lost constraints that arithmetic stays within an object.
It should likely be scrapped completely and re-done, possibly having
SET_DEST_POINTS_TO to be able to put SSA points-to info to SETs
(REG_ATTRs are too coarse, but would be possible as well, losing some of
the flow sensitivity).  Incoming args & frame analysis would need to be
implemented of course.  As said, I'm not sure analyzing RTL will yield
anything good while being conservative - and optimistic points-to is what
leads us to these kind of bugs ...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
                   ` (9 preceding siblings ...)
  2024-01-10  8:57 ` rguenth at gcc dot gnu.org
@ 2024-01-12  8:48 ` rguenth at gcc dot gnu.org
  2024-01-23  7:08 ` cvs-commit at gcc dot gnu.org
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-12  8:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
diff --git a/gcc/alias.cc b/gcc/alias.cc
index b2ec4806d22..0150dd699db 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -2272,6 +2272,8 @@ static bool
 base_alias_check (rtx x, rtx x_base, rtx y, rtx y_base,
                  machine_mode x_mode, machine_mode y_mode)
 {
+  return 1;
+
   /* If the address itself has no known base see if a known equivalent
      value has one.  If either address still has no known base, nothing
      is known about aliasing.  */

(an experiment I did many years ago already) gives clean testresults
besides

+FAIL: gcc.dg/guality/pr41447-1.c   -O2  -DPREVENT_OPTIMIZATION  execution test
+FAIL: gcc.dg/guality/pr41447-1.c   -O3 -g  -DPREVENT_OPTIMIZATION  execution
test
+FAIL: gcc.dg/guality/pr41447-1.c   -Os  -DPREVENT_OPTIMIZATION  execution test
+FAIL: gcc.dg/guality/pr41447-1.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  -DPREVENT_OPTIMIZATION execution test

and only minor change in code generation:

   text    data     bss     dec     hex filename
48374841          64824 1939512 50379177        300b9a9 ../obj2/gcc/cc1plus
48375065          64824 1939512 50379401        300ba89 gcc/cc1plus

where the larger binary is the patched one.

Assembly-wise there are scheduling changes, missed scheduling over spills.

Recovering this and similar cases should be as easy as marking spill
MEMs with a flag (in MEM_EXPR) for example, distinguishing (classes of)
stack memory from the rest.  We should have this already (spill_slot_decl,
set_mem_attrs_for_spill), not sure why it doesn't look effective.

Improving test coverage for desired transforms would be nice as well.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
                   ` (10 preceding siblings ...)
  2024-01-12  8:48 ` rguenth at gcc dot gnu.org
@ 2024-01-23  7:08 ` cvs-commit at gcc dot gnu.org
  2024-01-23  7:10 ` [Bug target/113255] [11/12/13 " rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-01-23  7:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #12 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:a98d5130a6dcff2ed4db371e500550134777b8cf

commit r14-8346-ga98d5130a6dcff2ed4db371e500550134777b8cf
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Jan 15 12:55:20 2024 +0100

    rtl-optimization/113255 - base_alias_check vs. pointer difference

    When the x86 backend generates code for cpymem with the rep_8byte
    strathegy for the 8 byte aligned main rep movq it needs to compute
    an adjusted pointer to the source after doing a prologue aligning
    the destination.  It computes that via

      src_ptr + (dest_ptr - orig_dest_ptr)

    which is perfectly fine.  On RTL this is then

        8: r134:DI=const(`g'+0x44)
        9: {r133:DI=frame:DI-0x4c;clobber flags:CC;}
          REG_UNUSED flags:CC
       56: r129:DI=const(`g'+0x4c)
       57: {r129:DI=r129:DI&0xfffffffffffffff8;clobber flags:CC;}
          REG_UNUSED flags:CC
          REG_EQUAL const(`g'+0x4c)&0xfffffffffffffff8
       58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;}
          REG_DEAD r134:DI
          REG_UNUSED flags:CC
          REG_EQUAL const(`g'+0x44)-r129:DI
       59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;}
          REG_DEAD r133:DI
          REG_UNUSED flags:CC

    but as written find_base_term happily picks the first candidate
    it finds for the MINUS which means it picks const(`g') rather
    than the correct frame:DI.  This way find_base_term (but also
    the unfixed find_base_value used by init_alias_analysis to
    initialize REG_BASE_VALUE) performs pointer analysis isn't
    sound.  The following restricts the handling of multi-operand
    operations to the case we know only one can be a pointer.

    This for example causes gcc.dg/tree-ssa/pr94969.c to miss some
    RTL PRE (I've opened PR113395 for this).  A more drastic patch,
    removing base_alias_check results in only gcc.dg/guality/pr41447-1.c
    regressing (so testsuite coverage is bad).  I've looked at
    gcc.dg/tree-ssa tests and mostly scheduling changes are present,
    the cc1plus .text size is only 230 bytes worse.  With the this
    less drastic patch below most scheduling changes are gone.

    x86_64 might not the very best target to test for impact, but
    test coverage on other targets is unlikely to be very much better.

            PR rtl-optimization/113255
            * alias.cc (find_base_term): Remove PLUS/MINUS handling
            when both operands are not CONST_INT_P.

            * gcc.dg/torture/pr113255.c: New testcase.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
                   ` (11 preceding siblings ...)
  2024-01-23  7:08 ` cvs-commit at gcc dot gnu.org
@ 2024-01-23  7:10 ` rguenth at gcc dot gnu.org
  2024-01-23 14:38 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-23  7:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
      Known to work|                            |14.0
             Status|NEW                         |ASSIGNED
            Summary|[11/12/13/14 Regression]    |[11/12/13 Regression] wrong
                   |wrong code with -O2         |code with -O2 -mtune=k8
                   |-mtune=k8                   |

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed on trunk sofar.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
                   ` (12 preceding siblings ...)
  2024-01-23  7:10 ` [Bug target/113255] [11/12/13 " rguenth at gcc dot gnu.org
@ 2024-01-23 14:38 ` cvs-commit at gcc dot gnu.org
  2024-02-01 12:53 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-01-23 14:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #14 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:3936c8709c25c8bc72be0c1b2cc3ae7a25dc90ec

commit r14-8363-g3936c8709c25c8bc72be0c1b2cc3ae7a25dc90ec
Author: H.J. Lu <(no_default)>
Date:   Tue Jan 23 06:34:43 2024 -0800

    gcc.dg/torture/pr113255.c: Fix ia32 test failure

    Fix ia32 test failure:

    FAIL: gcc.dg/torture/pr113255.c   -O1  (test for excess errors)
    Excess errors:
    cc1: error: '-mstringop-strategy=rep_8byte' not supported for 32-bit code

            PR rtl-optimization/113255
            * gcc.dg/torture/pr113255.c (dg-additional-options): Add only
            if not ia32.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
                   ` (13 preceding siblings ...)
  2024-01-23 14:38 ` cvs-commit at gcc dot gnu.org
@ 2024-02-01 12:53 ` rguenth at gcc dot gnu.org
  2024-02-01 13:08 ` rguenth at gcc dot gnu.org
  2024-02-05  7:35 ` cvs-commit at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-01 12:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is also that via CSELIB we go from the good

(minus:DI (reg/f:DI 119)
    (reg:DI 115))

to

(minus:DI (value:DI 11:11 @0x41fca00/0x41ec410)
    (value:DI 10:15448 @0x41fc9e8/0x41ec3e0))

and later when DSE does cselib_expand_value_rtx on the value it produces

(minus:DI (reg/f:DI 119)
    (minus:DI (reg/f:DI 120)
        (reg/f:DI 114)))

which simplify_rtx then turns into

(minus:DI (plus:DI (reg/f:DI 114)
        (reg/f:DI 119))
    (reg/f:DI 120))

note how that associates things in a way that confuses us later.  In particular
the loc for (value:DI 10:15448) (aka the inner minus) isn't REG_POINTER
(after you fix i386 RTL expansion) but after the re-assloc there's only
the wrong REG_POINTER immediately visible.

DSE gets this all back-and-forth into/out-of CSELIB, it feels a bit of a mess.
It obviously relies on the expansion to discover base values.

First the x86 backend should avoid having a REG_POINTER as the pointer
difference:

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 0d817fc3f3b..26c48e8b0c8 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -8090,7 +8090,7 @@
expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves (rtx destmem, rtx
src
       /* See how many bytes we skipped.  */
       saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, saveddest,
                                       *destptr,
-                                      saveddest, 1, OPTAB_DIRECT);
+                                      NULL_RTX, 1, OPTAB_DIRECT);
       /* Adjust srcptr and count.  */
       if (!issetmem)
        *srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr,

We can avoid the issue by avoiding re-association of pointer MINUS:

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index ee75079917f..0108d0aa3bd 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -3195,11 +3195,15 @@ simplify_context::simplify_binary_operation_1 (rtx_code
code,
          canonicalize (minus A (plus B C)) to (minus (minus A B) C).
         Don't use the associative law for floating point.
         The inaccuracy makes it nonassociative,
-        and subtle programs can break if operations are associated.  */
+        and subtle programs can break if operations are associated.
+        Don't use the associative law when subtracting a MINUS from
+        a REG_POINTER as that can trick find_base_term into discovering
+        the wrong base.  */

       if (INTEGRAL_MODE_P (mode)
          && (plus_minus_operand_p (op0)
-             || plus_minus_operand_p (op1))
+             || ((!REG_P (op0) || !REG_POINTER (op0))
+                 && plus_minus_operand_p (op1)))
          && (tem = simplify_plus_minus (code, mode, op0, op1)) != 0)
        return tem;


or we can avoid it with a more dangerous (IMHO) "fix" like the following
which while it looks good on the front, isn't reliable and might instead
trick find_base_term to deflect to another invalid base.

diff --git a/gcc/alias.cc b/gcc/alias.cc
index 3672bf277b9..f589a1fa47a 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -2094,7 +2101,14 @@ find_base_term (rtx x, vec<std::pair<cselib_val *,
        if (base != NULL_RTX
            && ((REG_P (tmp1) && REG_POINTER (tmp1))
                 || known_base_value_p (base)))
-         return base;
+         {
+           /* If, for a MINUS, we find another base value in the
+              other operand, fail.  */
+           if (GET_CODE (x) == MINUS
+               && find_base_term (tmp2, visited_vals) != NULL)
+             return 0;
+           return base;
+         }
        base = find_base_term (tmp2, visited_vals);
        if (base != NULL_RTX
            && ((REG_P (tmp2) && REG_POINTER (tmp2))

This all shows alternatives that might be suitable for branches and possibly
trunk when we decide to revert the fix that's currently there.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
                   ` (14 preceding siblings ...)
  2024-02-01 12:53 ` rguenth at gcc dot gnu.org
@ 2024-02-01 13:08 ` rguenth at gcc dot gnu.org
  2024-02-05  7:35 ` cvs-commit at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-01 13:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
The "interesting" part is that the i386 + simplify_rtx parts fix the issue but
if you add the alias.cc part ontop it again fails at -O1 (the alias.cc part
alone also "fixes" it).  This all of course shows that RTL alias analysis
is fundamentally broken w/o r14-8346

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/113255] [11/12/13 Regression] wrong code with -O2 -mtune=k8
  2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
                   ` (15 preceding siblings ...)
  2024-02-01 13:08 ` rguenth at gcc dot gnu.org
@ 2024-02-05  7:35 ` cvs-commit at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-02-05  7:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #17 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:5b281946c4b51132caf5e5b64c730fef92dd6123

commit r14-8796-g5b281946c4b51132caf5e5b64c730fef92dd6123
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Feb 1 13:54:11 2024 +0100

    target/113255 - avoid REG_POINTER on a pointer difference

    The following avoids re-using a register holding a pointer (and
    thus might be REG_POINTER) for the result of a pointer difference
    computation.  That might confuse heuristics in (broken) RTL alias
    analysis which relies on REG_POINTER indicating that we're
    dealing with one.

    This alone doesn't fix anything.

            PR target/113255
            * config/i386/i386-expand.cc
            (expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves):
            Use a new pseudo for the skipped number of bytes.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-02-05  7:35 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-06 23:24 [Bug c/113255] New: wrong code with -O2 -mtune=k8 mednafen at sent dot com
2024-01-06 23:53 ` [Bug target/113255] [11/12/13/14 Regression] " pinskia at gcc dot gnu.org
2024-01-07 12:13 ` jakub at gcc dot gnu.org
2024-01-07 19:48 ` ubizjak at gmail dot com
2024-01-09  7:52 ` rguenth at gcc dot gnu.org
2024-01-09  7:58 ` jakub at gcc dot gnu.org
2024-01-09  8:00 ` ubizjak at gmail dot com
2024-01-09  9:27 ` jakub at gcc dot gnu.org
2024-01-09 10:20 ` rguenth at gcc dot gnu.org
2024-01-09 12:09 ` rguenth at gcc dot gnu.org
2024-01-10  8:57 ` rguenth at gcc dot gnu.org
2024-01-12  8:48 ` rguenth at gcc dot gnu.org
2024-01-23  7:08 ` cvs-commit at gcc dot gnu.org
2024-01-23  7:10 ` [Bug target/113255] [11/12/13 " rguenth at gcc dot gnu.org
2024-01-23 14:38 ` cvs-commit at gcc dot gnu.org
2024-02-01 12:53 ` rguenth at gcc dot gnu.org
2024-02-01 13:08 ` rguenth at gcc dot gnu.org
2024-02-05  7:35 ` cvs-commit at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).