public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu
@ 2021-10-30 17:45 zhendong.su at inf dot ethz.ch
  2021-10-30 22:12 ` [Bug tree-optimization/103006] [9/10/11/12 Regression] " pinskia at gcc dot gnu.org
                   ` (18 more replies)
  0 siblings, 19 replies; 20+ messages in thread
From: zhendong.su at inf dot ethz.ch @ 2021-10-30 17:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

            Bug ID: 103006
           Summary: wrong code at -O2 (only) on x86_64-linux-gnu
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zhendong.su at inf dot ethz.ch
  Target Milestone: ---

This is quite long-latent, affecting GCC 7.* and later.

[856] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap
--prefix=/local/suz-local/software/local/gcc-trunk --enable-languages=c,c++
--disable-werror --enable-multilib --with-system-zlib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20211030 (experimental) [master r12-4804-g75c9fa318e3] (GCC)
[857] %
[857] % gcctk -O1 small.c; ./a.out
0
[858] % gcctk -O2 small.c
[859] % ./a.out
0
Aborted
[860] %
[860] % cat small.c
int printf(const char *, ...);
int a, *b, c, e, f;
void g() {
  int *d[7];
  d[6] = b = (int *)d;
  printf("0\n");
}
int i() {
  for (c = 0; c < 2; c++) {
    long h[6][2];
    for (e = 0; e < 6; e++)
      for (f = 0; f < 2; f++)
        h[e][f] = 1;
    if (c) {
      g();
      return h[3][0];
    }
  }
  return 0;
}
int main() {
  if (i() != 1)
    __builtin_abort ();
  return 0;
}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug tree-optimization/103006] [9/10/11/12 Regression] wrong code at -O2 (only) on x86_64-linux-gnu
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
@ 2021-10-30 22:12 ` pinskia at gcc dot gnu.org
  2021-10-30 22:15 ` [Bug middle-end/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 " pinskia at gcc dot gnu.org
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-10-30 22:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-10-30
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed, here is one which is broken at -O1 and -O2 but ok at -O3:

__attribute__((noipa)) void ff(void){}
int a, *b, c, e, f;
__attribute__((always_inline))
static inline void g() {
  int *d[7];
  d[6] = b = (int *)d;
  ff();
}
__attribute__((noinline))
int i() {
  for (c = 0; c < 2; c++) {
    long h[6][2];
    for (e = 0; e < 6; e++)
      for (f = 0; f < 2; f++)
        h[e][f] = 1;
    if (c) {
      g();
      return h[3][0];
    }
  }
  return 0;
}
int main() {
  if (i() != 1)
    __builtin_abort ();
  return 0;
}


* [Bug middle-end/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
  2021-10-30 22:12 ` [Bug tree-optimization/103006] [9/10/11/12 Regression] " pinskia at gcc dot gnu.org
@ 2021-10-30 22:15 ` pinskia at gcc dot gnu.org
  2021-10-30 22:28 ` [Bug rtl-optimization/103006] " pinskia at gcc dot gnu.org
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-10-30 22:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[9/10/11/12 Regression]     |[9/10/11/12 Regression]
                   |wrong code at -O2 (only) on |wrong code at -O1 or -O2 on
                   |x86_64-linux-gnu            |x86_64-linux-gnu
          Component|tree-optimization           |middle-end
      Known to fail|                            |8.1.0
   Target Milestone|---                         |9.5
           Keywords|                            |wrong-code

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Since noipa is not supported by GCC 7, the following variant (which adds
noinline, noclone and an asm memory clobber to ff) fails all the way back to
GCC 7:

__attribute__((noipa,noinline,noclone)) void ff(void){asm("":::"memory");}
int a, *b, c, e, f;
__attribute__((always_inline))
static inline void g() {
  int *d[7];
  d[6] = b = (int *)d;
  ff();
}
__attribute__((noinline))
int i() {
  for (c = 0; c < 2; c++) {
    long h[6][2];
    for (e = 0; e < 6; e++)
      for (f = 0; f < 2; f++)
        h[e][f] = 1;
    if (c) {
      g();
      return h[3][0];
    }
  }
  return 0;
}
int main() {
  if (i() != 1)
    __builtin_abort ();
  return 0;
}


* [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
  2021-10-30 22:12 ` [Bug tree-optimization/103006] [9/10/11/12 Regression] " pinskia at gcc dot gnu.org
  2021-10-30 22:15 ` [Bug middle-end/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 " pinskia at gcc dot gnu.org
@ 2021-10-30 22:28 ` pinskia at gcc dot gnu.org
  2021-11-01 11:13 ` [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101 jakub at gcc dot gnu.org
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-10-30 22:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |rtl-optimization
           Keywords|                            |ra

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This looks like a register allocator issue.
In GCC 6 we have:
.L8:
        movq    %rsp, b(%rip)
        movq    %rsp, 48(%rsp)
        call    _Z2ffv
        movq    112(%rsp), %rax
        addq    $160, %rsp
        .cfi_def_cfa_offset 8
        ret

While in GCC 7+ we get:
        movl    $2, f(%rip)
        movl    $6, e(%rip)
        movq    %rsp, b(%rip)
        movq    %rsp, 48(%rsp)
        call    _Z2ffv
        movl    48(%rsp), %eax
        addq    $96, %rsp
        .cfi_def_cfa_offset 8
        ret
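
The offsets in the two listings can be checked with a little arithmetic. The following is a minimal sketch, assuming the LP64 layout of x86_64-linux-gnu (8-byte long and 8-byte pointers); the helper names are hypothetical and only serve the illustration. It computes where h[3][0] and d[6] fall relative to the start of their arrays and whether the two elements overlap for given frame offsets of h and d.

```c
#include <stddef.h>

/* Byte offset of h[3][0] inside `long h[6][2]`. */
size_t h30_offset(void) { return (3 * 2 + 0) * sizeof(long); }

/* Byte offset of d[6] inside `int *d[7]`. */
size_t d6_offset(void) { return 6 * sizeof(int *); }

/* Returns 1 if h[3][0] and d[6] share at least one byte when h starts at
   frame offset h_base and d starts at frame offset d_base, else 0.  */
int elements_overlap(size_t h_base, size_t d_base) {
  size_t h_lo = h_base + h30_offset(), h_hi = h_lo + sizeof(long);
  size_t d_lo = d_base + d6_offset(), d_hi = d_lo + sizeof(int *);
  return h_lo < d_hi && d_lo < h_hi;
}
```

Reading the bases off the listings: in GCC 6 the load is from 112(%rsp), so h starts at 112 - 48 = 64 while d starts at 0, and elements_overlap(64, 0) is 0. In GCC 7+ the load is from 48(%rsp), so h starts at 0 like d, and elements_overlap(0, 0) is 1: the store of &d to 48(%rsp) replaces the value the function is supposed to return.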


* [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
                   ` (2 preceding siblings ...)
  2021-10-30 22:28 ` [Bug rtl-optimization/103006] " pinskia at gcc dot gnu.org
@ 2021-11-01 11:13 ` jakub at gcc dot gnu.org
  2021-11-01 11:29 ` jakub at gcc dot gnu.org
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-11-01 11:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |matz at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Doesn't look to me like an RA issue, but rather incorrect coalescing of
temporary vars.
Optimized dump has:
  <bb 2> [local count: 1652516]:
  ivtmp.40_47 = (unsigned long) &h;
  _30 = ivtmp.40_47 + 96;

  <bb 3> [local count: 13370357]:
  # ivtmp.40_38 = PHI <ivtmp.40_47(2), ivtmp.40_44(3)>
  _21 = (void *) ivtmp.40_38;
  MEM[(long int *)_21] = 1;
  MEM[(long int *)_21 + 8B] = 1;
  ivtmp.40_44 = ivtmp.40_38 + 16;
  if (_30 != ivtmp.40_44)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 4> [local count: 1652516]:
  h ={v} {CLOBBER};

  <bb 5> [local count: 13370357]:
  # ivtmp.30_29 = PHI <ivtmp.30_48(5), ivtmp.40_47(4)>
  _31 = (void *) ivtmp.30_29;
  MEM[(long int *)_31] = 1;
  MEM[(long int *)_31 + 8B] = 1;
  ivtmp.30_48 = ivtmp.30_29 + 16;
  if (_30 != ivtmp.30_48)
    goto <bb 5>; [89.00%]
  else
    goto <bb 6>; [11.00%]

  <bb 6> [local count: 1652516]:
  f = 2;
  e = 6;
  c = 1;
  b = &d;
  d[6] = &d;
  ff ();
  d ={v} {CLOBBER};
  _5 = h[3][0];
  _18 = (int) _5;
  h ={v} {CLOBBER};
  return _18;

So, the code initializes the whole h array, then has h ={v} {CLOBBER};, then
initializes it again, but unfortunately without mentioning the var in the IL -
it reuses ivtmp.40_47 for that, then sets various vars, including d[6]
initialization with &d escaping, clobbers d, reads from h and finally clobbers
h again.  I guess the var partitioning code from the above thinks h isn't
really live in between the first h clobber and h[3][0] load and so decides:
Partition 0: size 96 align 16
        h       d
and places both h and d into the same stack slot.
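
The reasoning above can be replayed on a toy model. This is a simplified sketch, not the code in GCC's cfgexpand: it linearizes the optimized dump into a straight-line trace (ignoring the loops), treats the first mention of a variable as its birth and a clobber as its death, and reports a conflict when one variable becomes live while the other is live. The event names and the see_indirect switch are assumptions made for the illustration; INDIRECT stands for the stores through ivtmp.40_47 that do not mention h in the IL.

```c
#include <stdbool.h>
#include <stddef.h>

enum event_kind { MENTION, CLOBBER, INDIRECT };
enum { H = 0, D = 1, NVARS = 2 };

struct event {
  enum event_kind kind;
  int var; /* for INDIRECT: the variable that is really accessed */
};

/* Linearized view of the optimized dump. */
static const struct event pr103006_trace[] = {
  { MENTION,  H }, /* ivtmp.40_47 = (unsigned long) &h;        */
  { INDIRECT, H }, /* first fill loop, stores via the pointer  */
  { CLOBBER,  H }, /* h ={v} {CLOBBER};                        */
  { INDIRECT, H }, /* second fill loop, h is not mentioned     */
  { MENTION,  D }, /* b = &d;                                  */
  { MENTION,  D }, /* d[6] = &d;                               */
  { CLOBBER,  D }, /* d ={v} {CLOBBER};                        */
  { MENTION,  H }, /* _5 = h[3][0];                            */
  { CLOBBER,  H }, /* h ={v} {CLOBBER};                        */
};

/* Returns true if a conflict between h and d is recorded.  With
   see_indirect == false, INDIRECT events are invisible, which models an
   analysis that only looks at mentions of the variable.  */
bool h_and_d_conflict(bool see_indirect) {
  bool live[NVARS] = { false, false };
  size_t n = sizeof pr103006_trace / sizeof pr103006_trace[0];
  for (size_t i = 0; i < n; i++) {
    const struct event *ev = &pr103006_trace[i];
    if (ev->kind == CLOBBER) {
      live[ev->var] = false;
      continue;
    }
    if (ev->kind == INDIRECT && !see_indirect)
      continue;
    if (!live[ev->var]) {
      if (live[1 - ev->var])
        return true; /* born while the other variable is live */
      live[ev->var] = true;
    }
  }
  return false;
}
```

In this model h_and_d_conflict(false) finds no conflict, so h and d may share a slot, while h_and_d_conflict(true), which can see the second fill loop writing to h, does find one.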


* [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
                   ` (3 preceding siblings ...)
  2021-11-01 11:13 ` [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101 jakub at gcc dot gnu.org
@ 2021-11-01 11:29 ` jakub at gcc dot gnu.org
  2021-11-02  7:17 ` rguenth at gcc dot gnu.org
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-11-01 11:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
What we see is an effect of 3 different optimizations, one is loop unrolling
(cunroll in this case), another one is ivopts that moves the h array references
out of the loops but still has separate:
  ivtmp.40_47 = (unsigned long) &h;
  _17 = (unsigned long) &h;
  _30 = _17 + 96;
...
  h ={v} {CLOBBER};
  ivtmp.30_33 = (unsigned long) &h;
  _41 = (unsigned long) &h;
  _36 = _41 + 96;
and finally dom3's VN which replaces ivtmp.30_33 initializer with ivtmp.40_47
and _36 with _30.
If what the cfg expand var partition code does is as designed (I think other passes
do it too, e.g. compute_live_vars/live_vars_at_stmt relies on it too), then we
need to somehow avoid VN of &var across var ={v} {CLOBBER} stmt, but it isn't
really clear to me how.
Unless we change loop unrolling so that the different loop iterations if there
is a var clobber in the loop actually have different variables (the first
iteration the original var and other iterations that var's copies; perhaps only
for addressable vars?).  Then naturally VN couldn't merge those and the RTL
partitioning code could decide to put them into the same or different partition
and later RTL opts could CSE the addresses.
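
At the source level, giving each unrolled iteration its own copy of the variable would correspond roughly to the hand-unrolled variant of the test case below. This is only an illustration of the idea, not the GIMPLE transformation itself; the names i_unrolled, fill, h0 and h1 are made up for the sketch.

```c
int *b;
int c, e, f;

/* As in the test case: the address of the local d escapes through b. */
static void g(void) {
  int *d[7];
  d[6] = b = (int *)d;
}

/* The two nested fill loops from the test case. */
static void fill(long h[6][2]) {
  for (e = 0; e < 6; e++)
    for (f = 0; f < 2; f++)
      h[e][f] = 1;
}

/* i() with the two iterations written out and a separate array each. */
int i_unrolled(void) {
  {
    long h0[6][2]; /* copy used by iteration c == 0 */
    c = 0;
    fill(h0);
  }
  {
    long h1[6][2]; /* copy used by iteration c == 1 */
    c = 1;
    fill(h1);
    g();
    return (int)h1[3][0];
  }
}
```

Because h0 and h1 are distinct declarations, an address computed for one cannot be value-numbered to the address of the other, and every access to h1 happens between its own first mention and its own end of scope.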


* [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
                   ` (4 preceding siblings ...)
  2021-11-01 11:29 ` jakub at gcc dot gnu.org
@ 2021-11-02  7:17 ` rguenth at gcc dot gnu.org
  2021-11-02  7:57 ` jakub at gcc dot gnu.org
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-11-02  7:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Isn't this another case of the still unsolved PR90348?


* [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
                   ` (5 preceding siblings ...)
  2021-11-02  7:17 ` rguenth at gcc dot gnu.org
@ 2021-11-02  7:57 ` jakub at gcc dot gnu.org
  2021-11-02  8:10 ` rguenther at suse dot de
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-11-02  7:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Looks like that.
What do you think about unrolling making variable copies?  We'd need to be sure
that the scope of the variable is the loop we are unrolling though (or
something nested in it).


* [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
                   ` (6 preceding siblings ...)
  2021-11-02  7:57 ` jakub at gcc dot gnu.org
@ 2021-11-02  8:10 ` rguenther at suse dot de
  2021-11-02  8:20 ` jakub at gcc dot gnu.org
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenther at suse dot de @ 2021-11-02  8:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

--- Comment #8 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 2 Nov 2021, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006
> 
> --- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Looks like that.
> What do you think about unrolling making variable copies?  We'd need to be sure
> that the scope of the variable is the loop we are unrolling though (or
> something nested in it).

Being able to determine that would solve the very issue we're trying
to fix with making the copy.  The problem is that we put in CLOBBERs
based on the original BINDs but later optimizers do not respect the
birth boundary.  If we can figure that out we could ignore the
respective CLOBBERs for the CFG expansion live compute as well.

I think we may be able to compute the CFG SCC a CLOBBER resides in
and in case the CLOBBERed variable is live-in into that SCC we cannot
prune it with that CLOBBER.  Or so.
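
A rough sketch of that check, under heavy simplification: the CFG is a small adjacency matrix, "resides in an SCC" is approximated as "the block can reach itself", and whether the variable is live-in to that SCC is taken as an input rather than computed. All names here are hypothetical; none of this is GCC code.

```c
#include <stdbool.h>
#include <string.h>

#define MAXB 8

struct cfg {
  int n;                 /* number of basic blocks in use */
  bool edge[MAXB][MAXB]; /* edge[a][b]: a has successor b  */
};

/* Depth-first search: can `to` be reached from `from` over >= 1 edge? */
static bool reaches(const struct cfg *g, int from, int to, bool *seen) {
  for (int s = 0; s < g->n; s++) {
    if (!g->edge[from][s])
      continue;
    if (s == to)
      return true;
    if (!seen[s]) {
      seen[s] = true;
      if (reaches(g, s, to, seen))
        return true;
    }
  }
  return false;
}

/* True if block b lies on a cycle, i.e. in a non-trivial SCC. */
bool block_in_cycle(const struct cfg *g, int b) {
  bool seen[MAXB];
  memset(seen, 0, sizeof seen);
  return reaches(g, b, b, seen);
}

/* A clobber may end the variable's liveness unless it sits in a cycle
   that the variable is live-in to.  */
bool clobber_ends_liveness(const struct cfg *g, int clobber_block,
                           bool var_live_into_scc) {
  return !(block_in_cycle(g, clobber_block) && var_live_into_scc);
}
```

For a CFG 0 -> 1 -> 2 with a self-loop on block 1, a clobber in block 1 of a variable that is live on entry to the loop would not be allowed to end its liveness, while the same clobber in block 2 would.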


* [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
                   ` (7 preceding siblings ...)
  2021-11-02  8:10 ` rguenther at suse dot de
@ 2021-11-02  8:20 ` jakub at gcc dot gnu.org
  2021-11-02 13:55 ` rguenther at suse dot de
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-11-02  8:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|ra                          |

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
We don't have that many unrolling passes though, and perhaps we could use
compute_live_vars and decide based on that.  Though I guess the addresses of
the vars could be even then hoisted before such loops, though it is unclear why
it would be done, it can't be VN because the vars don't exist outside of the
loop.
Say LIM can do that though...


* [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
                   ` (8 preceding siblings ...)
  2021-11-02  8:20 ` jakub at gcc dot gnu.org
@ 2021-11-02 13:55 ` rguenther at suse dot de
  2021-11-05 13:39 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenther at suse dot de @ 2021-11-02 13:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

--- Comment #10 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 2 Nov 2021, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006
> 
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>            Keywords|ra                          |
> 
> --- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> We don't have that many unrolling passes though, and perhaps we could use
> compute_live_vars and decide based on that.  Though I guess the addresses of
> the vars could be even then hoisted before such loops, though it is unclear why
> it would be done, it can't be VN because the vars don't exist outside of the
> loop.
> Say LIM can do that though...

Yes, LIM can do it as can loop header copying or jump threading that
happens to peel an iteration.  Also there's no dataflow barrier that
prevents addresses from being moved so that "no pass does this" isn't
a good excuse.

That said, I think the fix is to the code computing stack slot reuse.


* [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
                   ` (9 preceding siblings ...)
  2021-11-02 13:55 ` rguenther at suse dot de
@ 2021-11-05 13:39 ` rguenth at gcc dot gnu.org
  2022-01-31 10:52 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-11-05 13:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2


* [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
                   ` (10 preceding siblings ...)
  2021-11-05 13:39 ` rguenth at gcc dot gnu.org
@ 2022-01-31 10:52 ` rguenth at gcc dot gnu.org
  2022-01-31 13:01 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-31 10:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #8)
> On Tue, 2 Nov 2021, jakub at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006
> > 
> > --- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> > Looks like that.
> > What do you think about unrolling making variable copies?  We'd need to be sure
> > that the scope of the variable is the loop we are unrolling though (or
> > something nested in it).
> 
> Being able to determine that would solve the very issue we're trying
> to fix with making the copy.  The problem is that we put in CLOBBERs
> based on the original BINDs but later optimizers do not respect the
> birth boundary.  If we can figure that out we could ignore the
> respective CLOBBERs for the CFG expansion live compute as well.
> 
> I think we may be able to compute the CFG SCC a CLOBBER resides in
> and in case the CLOBBERed variable is live-in into that SCC we cannot
> prune it with that CLOBBER.  Or so.

Tried that, but while it solves PR97821 it doesn't fix the case in this bug:
here the CLOBBER that breaks things is not inside an SCC.  The issue in this
case is that not all accesses to 'h' also mention h, and thus we fail to make
'h' live again after the CLOBBER.

I also fear the more variables we expose the easier it will be to run into
this issue.

It might be possible to (conservatively) track pointers from ADDR_EXPR
mentions, but that's going to give up in most of the interesting cases
(it has the same issue as ideas how to prevent "leakage" via pointers).

The other idea that we discussed in the past is to perform stack slot sharing
early when the CLOBBERs are still OK to use.  Possibly even w/o CLOBBERs
but with the GIMPLE binds we have even after gimplifying.  We'd make the
"stack slot" sharing explicit by replacing decls assigned to the same
stack slot with either the largest of the decls or anonymous stack memory
(a new decl).  Doing this before inlining is complete will obviously not
catch all important cases.  Doing it after inlining requires being careful
during early optimizations (jump threading is the one transform we do early
that can cause code duplication and followup CSE).


* [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
                   ` (11 preceding siblings ...)
  2022-01-31 10:52 ` rguenth at gcc dot gnu.org
@ 2022-01-31 13:01 ` rguenth at gcc dot gnu.org
  2022-01-31 13:08 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-31 13:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oh, and I think address-takens are not really the issue; it is the accesses
based on them that confuse the simplistic live analysis, which does not
recognize those as births.

So we _can_ introduce explicit birth operations.  The simplest thing we
probably can do is to add clobbers there and set a special 'birth' flag on
them just for liveness analysis, the rest of the compiler can treat them like
clobbers - besides cases where we remove clobbers.  We can't remove a birth
without also removing all clobbers of a variable (even in copies of birth-death
regions).  It might be tempting to somehow link birth and its clobbers (IIRC
with cleanups and so we can have multiple clobbers for one birth), like via
SSA def and uses, but when we copy a birth that breaks down.  So the
alternative is probably to mark a variable as not to be subject to stack slot
sharing when removing a birth clobber.

The initial birth clobber would be at a more conservative position than
the current way of treating the first mention as birth but we can sink
birth clobbers (even across address takens) and hoist clobbers to shrink
live ranges at some point.

Both birth and clobber act as optimization barrier for loads and stores
of the affected variable, that's good for the purpose but possibly bad
for optimization.  I checked and for example loop store motion does consider
clobbers inside a loop as reason to not optimize.

And with the current scheme we don't even optimize cases like

struct Foo { int i; int j; int a[24]; };

void bar(struct Foo f);

void baz()
{
  struct Foo f, g;
  f.i = 1;
  bar (f);
  g.j = 2;
  bar (g);
}

as nothing hoists the clobbers we only put at the end of the function and
thus f and g appear to conflict (we only use clobbers to compute live,
for not address taken vars we could rely on mentions only).

I don't think we can reasonably fix all of the issue on branches and I
have my doubts for GCC 12.
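
To illustrate the baz() example with explicit markers: the sketch below is a toy model with hypothetical names, not the patch under discussion. It walks a straight-line sequence of BIRTH/DEATH markers and reports whether two variables are ever live at the same time. One trace places both clobbers at the end of the function, as described above; the other assumes the clobber of f has been hoisted to just after bar (f).

```c
#include <stdbool.h>
#include <stddef.h>

enum marker { BIRTH, DEATH };
enum { F = 0, G = 1 };

struct step {
  enum marker m;
  int var;
};

/* True if variables a and b (a != b) are live at the same time somewhere
   in the straight-line marker sequence s[0..n-1].  */
bool live_ranges_overlap(const struct step *s, size_t n, int a, int b) {
  bool live_a = false, live_b = false;
  for (size_t i = 0; i < n; i++) {
    if (s[i].var == a)
      live_a = (s[i].m == BIRTH);
    else if (s[i].var == b)
      live_b = (s[i].m == BIRTH);
    if (live_a && live_b)
      return true;
  }
  return false;
}

/* baz() with clobbers only at the end of the function. */
static const struct step clobbers_at_end[] = {
  { BIRTH, F }, /* f.i = 1; bar (f); */
  { BIRTH, G }, /* g.j = 2; bar (g); */
  { DEATH, F },
  { DEATH, G },
};

/* baz() assuming the clobber of f was hoisted above the first use of g. */
static const struct step clobber_hoisted[] = {
  { BIRTH, F }, /* f.i = 1; bar (f); */
  { DEATH, F },
  { BIRTH, G }, /* g.j = 2; bar (g); */
  { DEATH, G },
};

bool baz_conflict_clobbers_at_end(void) {
  return live_ranges_overlap(clobbers_at_end, 4, F, G);
}

bool baz_conflict_clobber_hoisted(void) {
  return live_ranges_overlap(clobber_hoisted, 4, F, G);
}
```

With the clobbers at the end, f and g overlap and cannot share a slot; with the hoisted clobber the ranges are disjoint, which is the case that could be optimized.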


* [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
                   ` (12 preceding siblings ...)
  2022-01-31 13:01 ` rguenth at gcc dot gnu.org
@ 2022-01-31 13:08 ` rguenth at gcc dot gnu.org
  2022-02-02 11:44 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-31 13:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org


* [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
                   ` (13 preceding siblings ...)
  2022-01-31 13:08 ` rguenth at gcc dot gnu.org
@ 2022-02-02 11:44 ` rguenth at gcc dot gnu.org
  2022-02-04 13:12 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-02-02 11:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |msebor at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
So I have a patch that adds explicit birth markers (using clobbers specially
marked).  That works well so far but it conflicts with clobbers (not marked as
birth) that are added for clobbering at the start of variable lifetime like
C++ does at the beginning of CTORs.  I for example see

  inst ={v} {CLOBBER(birth)};
  inst ={v} {CLOBBER};  (**)
  inst.v = 42;
...
  inst ={v} {CLOBBER};

where (**) is inserted by the C++ frontend (with -flifetime-dse which is
the default).  Indeed my live analysis for the purpose of stack slot
sharing now only relies on the birth/death markers so it gets confused
by the extra clobber.

We now also have some use-after-free diagnostic that would likely trip
over this as it assumes that a CLOBBER ends lifetime of storage.

I guess disentangling both use-cases by also marking the
end-of-storage-lifetime clobbers specially would solve both issues.


* [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
                   ` (14 preceding siblings ...)
  2022-02-02 11:44 ` rguenth at gcc dot gnu.org
@ 2022-02-04 13:12 ` rguenth at gcc dot gnu.org
  2022-05-27  9:46 ` [Bug rtl-optimization/103006] [10/11/12/13 " rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-02-04 13:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
There's an interesting case,

  a = BIRTH
loop:
  b = DEATH
  a = DEATH
  b = BIRTH
  goto loop;

where we end up having both a and b in the live-in set at the loop label
but a is removed before we see the BIRTH of b which is where we add
conflicts based on the current set of active vars.

In the case I'm running into this I have tail-recursion do

  a = BIRTH
  b = BIRTH
...
  a = DEATH
  b = DEATH

into

loop:
  a = BIRTH
  b = BIRTH
  goto loop;
  a = DEATH
  b = DEATH

leading to a similar issue.  The issue above can for example arise from
loop rotation.

In all cases liveness coming in from backedges confuses the "optimization" of
only recording conflicts when we add a var to the live set (and it is not
already set).

The previous code had

              /* If this is the first real instruction in this BB we need
                 to add conflicts for everything live at this point now.
                 Unlike classical liveness for named objects we can't
                 rely on seeing a def/use of the names we're interested in.
                 There might merely be indirect loads/stores.  We'd not add any
                 conflicts for such partitions.  */

and the easiest is to do the same here (we don't see the backedge "use"),
but we could possibly improve by noting which vars are only live from
a backedge here.
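
The tail-recursion case can be replayed on a one-block toy model. This is a sketch under assumptions, not the patch: liveness is first iterated to a fixed point over a single loop block whose only predecessors are the function entry and its own backedge, and conflicts are recorded in one final walk. The seed_from_live_in flag stands for the behaviour described in the quoted comment, adding conflicts for everything live at the start of the block. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define NVARS 2

enum marker { BIRTH, DEATH };

struct step {
  enum marker m;
  unsigned var;
};

/* Record a conflict between every pair of variables set in `live`. */
static void add_pairs(unsigned live, bool (*conflict)[NVARS]) {
  for (unsigned u = 0; u < NVARS; u++)
    for (unsigned v = 0; v < NVARS; v++)
      if (u != v && (live & (1u << u)) && (live & (1u << v)))
        conflict[u][v] = true;
}

/* Walk one block starting from `live` and return the live-out set.  If
   `conflict` is non-null, record conflicts: only when a variable newly
   enters the live set, plus (optionally) for the whole live-in set.  */
static unsigned walk(unsigned live, const struct step *s, size_t n,
                     bool seed_from_live_in, bool (*conflict)[NVARS]) {
  if (conflict && seed_from_live_in)
    add_pairs(live, conflict);
  for (size_t i = 0; i < n; i++) {
    unsigned bit = 1u << s[i].var;
    if (s[i].m == DEATH) {
      live &= ~bit;
    } else if (!(live & bit)) {
      if (conflict)
        for (unsigned u = 0; u < NVARS; u++)
          if (live & (1u << u))
            conflict[u][s[i].var] = conflict[s[i].var][u] = true;
      live |= bit;
    }
  }
  return live;
}

/* loop: a = BIRTH; b = BIRTH; goto loop;  Returns whether a and b end up
   conflicting after the final, conflict-recording walk.  */
bool loop_vars_conflict(bool seed_from_live_in) {
  static const struct step body[] = { { BIRTH, 0 }, { BIRTH, 1 } };
  bool conflict[NVARS][NVARS] = { { false } };
  unsigned live_in = 0, prev;
  do { /* live-in = entry (empty) | live-out over the backedge */
    prev = live_in;
    live_in |= walk(live_in, body, 2, seed_from_live_in, NULL);
  } while (live_in != prev);
  walk(live_in, body, 2, seed_from_live_in, conflict);
  return conflict[0][1];
}
```

In this model loop_vars_conflict(false) misses the conflict because both variables are already in the live-in set when their BIRTH markers are seen, while loop_vars_conflict(true) records it.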


* [Bug rtl-optimization/103006] [10/11/12/13 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
                   ` (15 preceding siblings ...)
  2022-02-04 13:12 ` rguenth at gcc dot gnu.org
@ 2022-05-27  9:46 ` rguenth at gcc dot gnu.org
  2022-06-28 10:46 ` jakub at gcc dot gnu.org
  2023-07-07 10:41 ` [Bug middle-end/103006] [11/12/13/14 " rguenth at gcc dot gnu.org
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-05-27  9:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|9.5                         |10.4

--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9 branch is being closed


* [Bug rtl-optimization/103006] [10/11/12/13 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
                   ` (16 preceding siblings ...)
  2022-05-27  9:46 ` [Bug rtl-optimization/103006] [10/11/12/13 " rguenth at gcc dot gnu.org
@ 2022-06-28 10:46 ` jakub at gcc dot gnu.org
  2023-07-07 10:41 ` [Bug middle-end/103006] [11/12/13/14 " rguenth at gcc dot gnu.org
  18 siblings, 0 replies; 20+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-06-28 10:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.4                        |10.5

--- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 10.4 is being released, retargeting bugs to GCC 10.5.


* [Bug middle-end/103006] [11/12/13/14 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101
  2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
                   ` (17 preceding siblings ...)
  2022-06-28 10:46 ` jakub at gcc dot gnu.org
@ 2023-07-07 10:41 ` rguenth at gcc dot gnu.org
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-07 10:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103006

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.5                        |11.5

--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 10 branch is being closed.


end of thread, other threads:[~2023-07-07 10:41 UTC | newest]

Thread overview: 20+ messages
2021-10-30 17:45 [Bug tree-optimization/103006] New: wrong code at -O2 (only) on x86_64-linux-gnu zhendong.su at inf dot ethz.ch
2021-10-30 22:12 ` [Bug tree-optimization/103006] [9/10/11/12 Regression] " pinskia at gcc dot gnu.org
2021-10-30 22:15 ` [Bug middle-end/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 " pinskia at gcc dot gnu.org
2021-10-30 22:28 ` [Bug rtl-optimization/103006] " pinskia at gcc dot gnu.org
2021-11-01 11:13 ` [Bug rtl-optimization/103006] [9/10/11/12 Regression] wrong code at -O1 or -O2 on x86_64-linux-gnu by r7-7101 jakub at gcc dot gnu.org
2021-11-01 11:29 ` jakub at gcc dot gnu.org
2021-11-02  7:17 ` rguenth at gcc dot gnu.org
2021-11-02  7:57 ` jakub at gcc dot gnu.org
2021-11-02  8:10 ` rguenther at suse dot de
2021-11-02  8:20 ` jakub at gcc dot gnu.org
2021-11-02 13:55 ` rguenther at suse dot de
2021-11-05 13:39 ` rguenth at gcc dot gnu.org
2022-01-31 10:52 ` rguenth at gcc dot gnu.org
2022-01-31 13:01 ` rguenth at gcc dot gnu.org
2022-01-31 13:08 ` rguenth at gcc dot gnu.org
2022-02-02 11:44 ` rguenth at gcc dot gnu.org
2022-02-04 13:12 ` rguenth at gcc dot gnu.org
2022-05-27  9:46 ` [Bug rtl-optimization/103006] [10/11/12/13 " rguenth at gcc dot gnu.org
2022-06-28 10:46 ` jakub at gcc dot gnu.org
2023-07-07 10:41 ` [Bug middle-end/103006] [11/12/13/14 " rguenth at gcc dot gnu.org
