public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/102540] New: Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
@ 2021-09-30  9:27 theodoridisgr at gmail dot com
  2021-09-30 10:05 ` [Bug tree-optimization/102540] [12 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885 marxin at gcc dot gnu.org
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: theodoridisgr at gmail dot com @ 2021-09-30  9:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540

            Bug ID: 102540
           Summary: Dead Code Elimination Regression at -O3 (trunk vs
                    11.2.0)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: theodoridisgr at gmail dot com
  Target Milestone: ---

cat test.c
static long a;
static unsigned b;
void foo(void);
int main() {
    long c, e;
    c = b = a;
    e = c ? 2 / (c + 1) : 0;
    if (e && !b)
        foo();
    a = 0;
}

11.2.0 at -O3 can eliminate the call to foo but trunk at -O3 cannot:

gcc-11 -v
Target: x86_64-pc-linux-gnu
Configured with: ../configure --disable-multilib --disable-bootstrap
--enable-languages=c,c++
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.2.0 (GCC)

gcc-11 test.c -S -O3
cat test.s
        .file   "test.c"
        .text
        .section        .text.startup,"ax",@progbits
        .p2align 4
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        movq    $0, a(%rip)
        xorl    %eax, %eax
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .local  a
        .comm   a,8,8
        .ident  "GCC: (GNU) 11.2.0"
        .section        .note.GNU-stack,"",@progbits


gcc-trunk -v
Target: x86_64-pc-linux-gnu
Configured with: ../configure --disable-multilib --disable-bootstrap
--enable-languages=c,c++ 
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.0.0 20210930 (experimental) (GCC)

gcc-trunk test.c -S -O3
cat test.s
        .file   "test.c"
        .text
        .section        .text.startup,"ax",@progbits
        .p2align 4
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        movq    a(%rip), %rcx
        movq    %rcx, %rsi
        andl    $4294967295, %esi
        je      .L13
        movl    $2, %eax
        addq    $1, %rsi
        cqto
        idivq   %rsi
        testb   $1, %al
        je      .L13
        testl   %ecx, %ecx
        je      .L17
.L13:
        movq    $0, a(%rip)
        xorl    %eax, %eax
        ret
.L17:
        pushq   %rax
        .cfi_def_cfa_offset 16
        call    foo
        xorl    %edx, %edx
        xorl    %eax, %eax
        movq    %rdx, a(%rip)
        popq    %rcx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .local  a
        .comm   a,8,8
        .ident  "GCC: (GNU) 12.0.0 20210930 (experimental)"
        .section        .note.GNU-stack,"",@progbits

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug tree-optimization/102540] [12 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885
  2021-09-30  9:27 [Bug tree-optimization/102540] New: Dead Code Elimination Regression at -O3 (trunk vs 11.2.0) theodoridisgr at gmail dot com
@ 2021-09-30 10:05 ` marxin at gcc dot gnu.org
  2021-09-30 12:50 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-09-30 10:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
                 CC|                            |marxin at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
   Target Milestone|---                         |12.0
   Last reconfirmed|                            |2021-09-30
            Summary|Dead Code Elimination       |[12 Regression] Dead Code
                   |Regression at -O3 (trunk vs |Elimination Regression at
                   |11.2.0)                     |-O3 since
                   |                            |r12-476-gd846f225c25c5885
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
Started with r12-476-gd846f225c25c5885.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug tree-optimization/102540] [12 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885
  2021-09-30  9:27 [Bug tree-optimization/102540] New: Dead Code Elimination Regression at -O3 (trunk vs 11.2.0) theodoridisgr at gmail dot com
  2021-09-30 10:05 ` [Bug tree-optimization/102540] [12 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885 marxin at gcc dot gnu.org
@ 2021-09-30 12:50 ` rguenth at gcc dot gnu.org
  2021-09-30 17:39 ` aldyh at gcc dot gnu.org
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-30 12:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aldyh at gcc dot gnu.org,
                   |                            |amacleod at redhat dot com
             Blocks|                            |85316
            Version|unknown                     |12.0

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
FRE1 has the following difference, simplifying the (unsigned int) truncation.

   <bb 2> :
   a.0_1 = a;
   _2 = (unsigned int) a.0_1;
   b = _2;
-  c_10 = (long int) _2;
+  _6 = a.0_1 & 4294967295;
+  c_10 = _6;
   if (c_10 != 0)
     goto <bb 3>; [INV]
   else

where the EVRP which now uses ranger retains (diff from GCC 11 to trunk):

   <bb 2> :
   a.0_1 = a;
   _2 = (unsigned int) a.0_1;
   b = _2;
-  c_10 = (long int) _2;
+  _6 = a.0_1 & 4294967295;
+  c_10 = _6;
   if (c_10 != 0)
     goto <bb 3>; [INV]
   else
-    goto <bb 4>; [INV]
+    goto <bb 6>; [INV]

   <bb 3> :
   _4 = c_10 + 1;
   iftmp.2_12 = 2 / _4;
+  if (iftmp.2_12 != 0)
+    goto <bb 4>; [INV]
+  else
+    goto <bb 6>; [INV]

   <bb 4> :
+  if (_2 == 0)
+    goto <bb 5>; [INV]
+  else
+    goto <bb 6>; [INV]
+
+  <bb 5> :
+  foo ();
+
+  <bb 6> :
   a = 0;
   return 0;


after EVRP we have

+  # RANGE [0, 4294967295] NONZERO 4294967295
   c_10 = _6;
...
+  # RANGE [2, 4294967296] NONZERO 8589934591
   _4 = c_10 + 1;
+  # RANGE [0, 1] NONZERO 1
   iftmp.2_12 = 2 / _4;
   if (iftmp.2_12 != 0)

what we did in GCC 11 is simplified the following check

  <bb 4> :
  if (_2 == 0)
    goto <bb 5>; [INV]
  else
    goto <bb 6>; [INV]

based on iftmp.2_12 == [1, 1] via ranger and

evrp visiting BB4
Visiting controlling predicate if (iftmp.2_12 != 0)
Adding assert for iftmp.2_12 from iftmp.2_12 != 0
Intersecting
  long int ~[0, 0]  EQUIVALENCES: { iftmp.2_12 } (1 elements)
and
  long int [0, 1]
to
  long int [1, 1]  EQUIVALENCES: { iftmp.2_12 } (1 elements)
Intersecting
  long int [0, 1]
and
  long int [1, 1]
to
  long int [1, 1]
pushing new range for iftmp.2_12: long int [1, 1]  EQUIVALENCES: { iftmp.2_12 }
(1 elements)
evrp visiting stmt if (_2 == 0)
Folding statement: if (_2 == 0)

Visiting conditional with predicate: if (_2 == 0)

With known ranges
        _2: unsigned int [1, +INF]  EQUIVALENCES: { _2 } (1 elements)

Predicate evaluates to: 0
Folded into: if (0 != 0)

that's now missing, somehow due to the folded IL if the bisect is correct.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85316
[Bug 85316] [meta-bug] VRP range propagation missed cases

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug tree-optimization/102540] [12 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885
  2021-09-30  9:27 [Bug tree-optimization/102540] New: Dead Code Elimination Regression at -O3 (trunk vs 11.2.0) theodoridisgr at gmail dot com
  2021-09-30 10:05 ` [Bug tree-optimization/102540] [12 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885 marxin at gcc dot gnu.org
  2021-09-30 12:50 ` rguenth at gcc dot gnu.org
@ 2021-09-30 17:39 ` aldyh at gcc dot gnu.org
  2021-10-01 21:02 ` amacleod at redhat dot com
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: aldyh at gcc dot gnu.org @ 2021-09-30 17:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540

--- Comment #3 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
To elide the foo(), _2 must be non-zero on the 2->3 edge dominating the call.

Interestingly, a.0_1 is non-zero on the 2->3 edge, and we have:

_2 = (unsigned int) a.0_1

but somehow we have no knowledge of _2.

Andrew, wanna take a stab at this?

=========== BB 2 ============
Imports: a.0_1  
Exports: a.0_1  _6  c_10  
         _2 : a.0_1(I)  
         _6 : a.0_1(I)  
         c_10 : a.0_1(I)  _6  
Equivalence set : []
Equivalence set : [_6, c_10]
    <bb 2> :
    a.0_1 = a;
    _2 = (unsigned int) a.0_1;
    b = _2;
    _6 = a.0_1 & 4294967295;
    c_10 = _6;
    if (c_10 != 0)
      goto <bb 3>; [INV]
    else
      goto <bb 6>; [INV]

_6 : long int [0, 4294967295]
c_10 : long int [0, 4294967295]
2->3  (T) a.0_1 :       long int [-INF, -1][1, +INF]
2->3  (T) _6 :  long int [1, 4294967295]
2->3  (T) c_10 :        long int [1, 4294967295]
2->6  (F) a.0_1 :       long int [-INF, -4294967296][0, +INF]
2->6  (F) _6 :  long int [0, 0]
2->6  (F) c_10 :        long int [0, 0]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug tree-optimization/102540] [12 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885
  2021-09-30  9:27 [Bug tree-optimization/102540] New: Dead Code Elimination Regression at -O3 (trunk vs 11.2.0) theodoridisgr at gmail dot com
                   ` (2 preceding siblings ...)
  2021-09-30 17:39 ` aldyh at gcc dot gnu.org
@ 2021-10-01 21:02 ` amacleod at redhat dot com
  2021-10-04  6:36 ` rguenther at suse dot de
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: amacleod at redhat dot com @ 2021-10-01 21:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540

--- Comment #4 from Andrew Macleod <amacleod at redhat dot com> ---


(In reply to Richard Biener from comment #2)
> FRE1 has the following difference, simplifying the (unsigned int) truncation.
> 
>    <bb 2> :
>    a.0_1 = a;
>    _2 = (unsigned int) a.0_1;
>    b = _2;
> -  c_10 = (long int) _2;
> +  _6 = a.0_1 & 4294967295;
> +  c_10 = _6;
>    if (c_10 != 0)
>      goto <bb 3>; [INV]
>    else
> 

Why does FRE make this transformation/simplification?  It removes a
relationship between c_10 and _2. The reason ranger no longer can fold _2 == 0
is because the sequence is now:

    a.0_1 = a;
    _2 = (unsigned int) a.0_1;
    b = _2;
    _6 = a.0_1 & 4294967295;
    c_10 = _6;
    if (c_10 != 0)
      goto <bb 3>; [INV]

We do not find _2 is non-zero on the outgoing edge because _2 is not related to
the calculation in the condition.  (ie c_10 no longer has a dependency on _2)

We do recalculate _2 based on the outgoing range of a.0_1, but with it being a
64 bit value and _2 being 32 bits, we only know the outgoing range of a.0_1 is
non-zero.. we dont track any of the upper bits... 
 2->3  (T) a.0_1 :       long int [-INF, -1][1, +INF]
And when we recalculate _2 using that value, we still get varying because
0xFFFF0000 in not zero, but can still produce a zero in _2.

The problem is that the condition c_10 != 0 no longer related to the value of
_2 in the IL... so ranger never sees it. and we cant represent the 2^16
subranges that end in [1,0xFFFF].

Before that transformation, 
  _2 = (unsigned int) a.0_1;
   b = _2;
  c_10 = (long int) _2;
The relationship is obvious, and ranger would relate the c_10 != 0 to _2 no
problem.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug tree-optimization/102540] [12 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885
  2021-09-30  9:27 [Bug tree-optimization/102540] New: Dead Code Elimination Regression at -O3 (trunk vs 11.2.0) theodoridisgr at gmail dot com
                   ` (3 preceding siblings ...)
  2021-10-01 21:02 ` amacleod at redhat dot com
@ 2021-10-04  6:36 ` rguenther at suse dot de
  2021-10-04 17:15 ` amacleod at redhat dot com
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: rguenther at suse dot de @ 2021-10-04  6:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540

--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 1 Oct 2021, amacleod at redhat dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540
> 
> --- Comment #4 from Andrew Macleod <amacleod at redhat dot com> ---
> 
> 
> (In reply to Richard Biener from comment #2)
> > FRE1 has the following difference, simplifying the (unsigned int) truncation.
> > 
> >    <bb 2> :
> >    a.0_1 = a;
> >    _2 = (unsigned int) a.0_1;
> >    b = _2;
> > -  c_10 = (long int) _2;
> > +  _6 = a.0_1 & 4294967295;
> > +  c_10 = _6;
> >    if (c_10 != 0)
> >      goto <bb 3>; [INV]
> >    else
> > 
> 
> Why does FRE make this transformation/simplification?

It's a match.pd transform that transforms a zero-extend from a smaller
precision via two NOP_EXPRs to a single BIT_AND_EXPR which is better and
more canonical on GIMPLE.

>  It removes a
> relationship between c_10 and _2. The reason ranger no longer can fold _2 == 0
> is because the sequence is now:
> 
>     a.0_1 = a;
>     _2 = (unsigned int) a.0_1;
>     b = _2;
>     _6 = a.0_1 & 4294967295;
>     c_10 = _6;
>     if (c_10 != 0)
>       goto <bb 3>; [INV]
> 
> We do not find _2 is non-zero on the outgoing edge because _2 is not related to
> the calculation in the condition.  (ie c_10 no longer has a dependency on _2)
> 
> We do recalculate _2 based on the outgoing range of a.0_1, but with it being a
> 64 bit value and _2 being 32 bits, we only know the outgoing range of a.0_1 is
> non-zero.. we dont track any of the upper bits... 
>  2->3  (T) a.0_1 :       long int [-INF, -1][1, +INF]
> And when we recalculate _2 using that value, we still get varying because
> 0xFFFF0000 in not zero, but can still produce a zero in _2.
> 
> The problem is that the condition c_10 != 0 no longer related to the value of
> _2 in the IL... so ranger never sees it. and we cant represent the 2^16
> subranges that end in [1,0xFFFF].
> 
> Before that transformation, 
>   _2 = (unsigned int) a.0_1;
>    b = _2;
>   c_10 = (long int) _2;
> The relationship is obvious, and ranger would relate the c_10 != 0 to _2 no
> problem.

I see - too bad.  Note the transform made the dependence chain of _6
one instruction shorter without increasing the number of instructions
so it's a profitable transform.

Btw, the relation is still there but only indirectly via a.0_1.  The
old (E)VRP had this find_asserts(?) that produced assertions based
on the definitions - sth that now range-ops does(?), so it would
eventually have built assertions for a.0_1 for both conditions and
allow relations based on that?  I can't seem to find my way around
the VRP code now - pieces moved all over the place and so my mind
fails me on the searching task :/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug tree-optimization/102540] [12 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885
  2021-09-30  9:27 [Bug tree-optimization/102540] New: Dead Code Elimination Regression at -O3 (trunk vs 11.2.0) theodoridisgr at gmail dot com
                   ` (4 preceding siblings ...)
  2021-10-04  6:36 ` rguenther at suse dot de
@ 2021-10-04 17:15 ` amacleod at redhat dot com
  2021-10-05  6:19 ` rguenther at suse dot de
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: amacleod at redhat dot com @ 2021-10-04 17:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540

--- Comment #6 from Andrew Macleod <amacleod at redhat dot com> ---

> 
> >  It removes a
> > relationship between c_10 and _2. The reason ranger no longer can fold _2 == 0
> > is because the sequence is now:
> > 
> >     a.0_1 = a;
> >     _2 = (unsigned int) a.0_1;
> >     b = _2;
> >     _6 = a.0_1 & 4294967295;
> >     c_10 = _6;
> >     if (c_10 != 0)
> >       goto <bb 3>; [INV]
> > 
> > We do not find _2 is non-zero on the outgoing edge because _2 is not related to
> > the calculation in the condition.  (ie c_10 no longer has a dependency on _2)
> > 
> > We do recalculate _2 based on the outgoing range of a.0_1, but with it being a
> > 64 bit value and _2 being 32 bits, we only know the outgoing range of a.0_1 is
> > non-zero.. we dont track any of the upper bits... 
> >  2->3  (T) a.0_1 :       long int [-INF, -1][1, +INF]
> > And when we recalculate _2 using that value, we still get varying because
> > 0xFFFF0000 in not zero, but can still produce a zero in _2.
> > 
> > The problem is that the condition c_10 != 0 no longer related to the value of
> > _2 in the IL... so ranger never sees it. and we cant represent the 2^16
> > subranges that end in [1,0xFFFF].
> > 
> > Before that transformation, 
> >   _2 = (unsigned int) a.0_1;
> >    b = _2;
> >   c_10 = (long int) _2;
> > The relationship is obvious, and ranger would relate the c_10 != 0 to _2 no
> > problem.
> 
> I see - too bad.  Note the transform made the dependence chain of _6
> one instruction shorter without increasing the number of instructions
> so it's a profitable transform.
> 
> Btw, the relation is still there but only indirectly via a.0_1.  The
> old (E)VRP had this find_asserts(?) that produced assertions based
> on the definitions - sth that now range-ops does(?), so it would
> eventually have built assertions for a.0_1 for both conditions and
> allow relations based on that?  I can't seem to find my way around
> the VRP code now - pieces moved all over the place and so my mind
> fails me on the searching task :/

We do know that a.0_1 is non-zero on that edge:
2->3  (T) a.0_1 :       long int [-INF, -1][1, +INF]

the problem is that we can't currently represent that the bitmask operation
causes all patterns ending in 0x00000000 to not occur.. we just leave it at
~[0,0].  which isn't sufficient for this use case. 

we don't currently track any equivalences between values of different
precision.. (even though ranger once did).   Handling it as a general
equivalence was fraught with issues. 

We might be able to add a new equivalence class "slice" or something.. I had
considered it but hadn't seen a great need case.   This would make _6 a 32 bit
slice of a.0_1 with range [1, 0xffffffff].
Then when we are querying for the cast
  _2 = (unsigned int) a.0_1;
we could also query the 32 bit equivalence slices of a.0_1, find _6, and get
the outgoing range of [1,0xffffffff].. and apply that value.

It would probably resolve an entire class of things where we don't recognize an
equivalence between a cast and a bitmask of equivalent precision.

This would also mean the reverse would apply.. ie if we instead branched on _2
!= 0 we would also understand that _6 will be non-zero.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug tree-optimization/102540] [12 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885
  2021-09-30  9:27 [Bug tree-optimization/102540] New: Dead Code Elimination Regression at -O3 (trunk vs 11.2.0) theodoridisgr at gmail dot com
                   ` (5 preceding siblings ...)
  2021-10-04 17:15 ` amacleod at redhat dot com
@ 2021-10-05  6:19 ` rguenther at suse dot de
  2021-10-05 13:59 ` amacleod at redhat dot com
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: rguenther at suse dot de @ 2021-10-05  6:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540

--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 4 Oct 2021, amacleod at redhat dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540
> 
> --- Comment #6 from Andrew Macleod <amacleod at redhat dot com> ---
> 
> > 
> > >  It removes a
> > > relationship between c_10 and _2. The reason ranger no longer can fold _2 == 0
> > > is because the sequence is now:
> > > 
> > >     a.0_1 = a;
> > >     _2 = (unsigned int) a.0_1;
> > >     b = _2;
> > >     _6 = a.0_1 & 4294967295;
> > >     c_10 = _6;
> > >     if (c_10 != 0)
> > >       goto <bb 3>; [INV]
> > > 
> > > We do not find _2 is non-zero on the outgoing edge because _2 is not related to
> > > the calculation in the condition.  (ie c_10 no longer has a dependency on _2)
> > > 
> > > We do recalculate _2 based on the outgoing range of a.0_1, but with it being a
> > > 64 bit value and _2 being 32 bits, we only know the outgoing range of a.0_1 is
> > > non-zero.. we dont track any of the upper bits... 
> > >  2->3  (T) a.0_1 :       long int [-INF, -1][1, +INF]
> > > And when we recalculate _2 using that value, we still get varying because
> > > 0xFFFF0000 in not zero, but can still produce a zero in _2.
> > > 
> > > The problem is that the condition c_10 != 0 no longer related to the value of
> > > _2 in the IL... so ranger never sees it. and we cant represent the 2^16
> > > subranges that end in [1,0xFFFF].
> > > 
> > > Before that transformation, 
> > >   _2 = (unsigned int) a.0_1;
> > >    b = _2;
> > >   c_10 = (long int) _2;
> > > The relationship is obvious, and ranger would relate the c_10 != 0 to _2 no
> > > problem.
> > 
> > I see - too bad.  Note the transform made the dependence chain of _6
> > one instruction shorter without increasing the number of instructions
> > so it's a profitable transform.
> > 
> > Btw, the relation is still there but only indirectly via a.0_1.  The
> > old (E)VRP had this find_asserts(?) that produced assertions based
> > on the definitions - sth that now range-ops does(?), so it would
> > eventually have built assertions for a.0_1 for both conditions and
> > allow relations based on that?  I can't seem to find my way around
> > the VRP code now - pieces moved all over the place and so my mind
> > fails me on the searching task :/
> 
> We do know that a.0_1 is non-zero on that edge:
> 2->3  (T) a.0_1 :       long int [-INF, -1][1, +INF]
> 
> the problem is that we can't currently represent that the bitmask operation
> causes all patterns ending in 0x00000000 to not occur.. we just leave it at
> ~[0,0].  which isn't sufficient for this use case. 

Hmm, but we do have nonzero bits on SSA where we also store global
ranges, so there is a way to store the info and you could intersect
the sliced range produced from it with the range you got?

> we don't currently track any equivalences between values of different
> precision.. (even though ranger once did).   Handling it as a general
> equivalence was fraught with issues. 
> 
> We might be able to add a new equivalence class "slice" or something.. I had
> considered it but hadn't seen a great need case.   This would make _6 a 32 bit
> slice of a.0_1 with range [1, 0xffffffff].
> Then when we are querying for the cast
>   _2 = (unsigned int) a.0_1;
> we could also query the 32 bit equivalence slices of a.0_1, find _6, and get
> the outgoing range of [1,0xffffffff].. and apply that value.
> 
> It would probably resolve an entire class of things where we don't recognize an
> equivalence between a cast and a bitmask of equivalent precision.
> 
> This would also mean the reverse would apply.. ie if we instead branched on _2
> != 0 we would also understand that _6 will be non-zero.

I believe tracking known zero/one bits in addition to a range is more
useful - would that help in this case?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug tree-optimization/102540] [12 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885
  2021-09-30  9:27 [Bug tree-optimization/102540] New: Dead Code Elimination Regression at -O3 (trunk vs 11.2.0) theodoridisgr at gmail dot com
                   ` (6 preceding siblings ...)
  2021-10-05  6:19 ` rguenther at suse dot de
@ 2021-10-05 13:59 ` amacleod at redhat dot com
  2022-01-20  9:44 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: amacleod at redhat dot com @ 2021-10-05 13:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540

--- Comment #8 from Andrew Macleod <amacleod at redhat dot com> ---

> > 
> > It would probably resolve an entire class of things where we don't recognize an
> > equivalence between a cast and a bitmask of equivalent precision.
> > 
> > This would also mean the reverse would apply.. ie if we instead branched on _2
> > != 0 we would also understand that _6 will be non-zero.
> 
> I believe tracking known zero/one bits in addition to a range is more
> useful - would that help in this case?

Thats in plan and what I first thought would fix it.  Reflecting on this
problem however, it would only help on the zero side where all the bits are
known zero, but the non-zero property we are looking for can't be reflected by
known ones or zeros.. 

Unfortunately I don't see how we can solve this particular problem by tracking
known bit values.. there wont be any known 1s or 0s in a.0_1...  just a
particular bit pattern that cannot occur.

Another option would be if I can get a cheap enough low-opt pass of evrp (also
on my list) we could run it really early before things get rearranged and then
run the higher levels later.

Anyway, I'll keep thinking about this...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug tree-optimization/102540] [12 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885
  2021-09-30  9:27 [Bug tree-optimization/102540] New: Dead Code Elimination Regression at -O3 (trunk vs 11.2.0) theodoridisgr at gmail dot com
                   ` (7 preceding siblings ...)
  2021-10-05 13:59 ` amacleod at redhat dot com
@ 2022-01-20  9:44 ` rguenth at gcc dot gnu.org
  2022-01-20 14:42 ` amacleod at redhat dot com
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-20  9:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think the GCC 12 IL would require tracking equivalences on parts of
registers,
in this case that _2 is equal to the low part of a.0_1.  That is, one would
need to extend what CCP does with bit propagation to include equivalences.

There's also the if ((long)unsigned-var != 0) -> if (unsigned-var != 0)
transform that could save us here but we have that guarded with
single_use () and the converted value is used after the branching
but not the source so we'd have increased register pressure.

I do not see a good way to fix this issue, slightly altering the testcase to
do the undesired transform on the source level

static long a;
static unsigned b;
void foo(void);
int main() {
    long c, e;
    b = a;
    c = (unsigned)a;
    e = c ? 2 / (c + 1) : 0;
    if (e && !b)
        foo();
    a = 0;
}

causes the issue to appear also with GCC 11.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug tree-optimization/102540] [12 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885
  2021-09-30  9:27 [Bug tree-optimization/102540] New: Dead Code Elimination Regression at -O3 (trunk vs 11.2.0) theodoridisgr at gmail dot com
                   ` (8 preceding siblings ...)
  2022-01-20  9:44 ` rguenth at gcc dot gnu.org
@ 2022-01-20 14:42 ` amacleod at redhat dot com
  2022-05-06  8:31 ` [Bug tree-optimization/102540] [12/13 " jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: amacleod at redhat dot com @ 2022-01-20 14:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540

--- Comment #10 from Andrew Macleod <amacleod at redhat dot com> ---
(In reply to Richard Biener from comment #9)
> I think the GCC 12 IL would require tracking equivalences on parts of
> registers,
> in this case that _2 is equal to the low part of a.0_1.  That is, one would
> need to extend what CCP does with bit propagation to include equivalences.
> 

I have in my workqueue some thoughts for GCC13 where we can track partial
equivalences so we can see through different casts and masks. 

    a.0_1 = a;
    _2 = (unsigned int) a.0_1;
    b = _2;
    _6 = a.0_1 & 4294967295;
    c_10 = _6;
    if (c_10 != 0)
      goto <bb 3>; [INV]

_2 would be a 32 bit equivalence with a.0_1 generated by op_cast, and
_6 would also be a 32 bit equivalence with a.0_1 generated  op_bitwise_and.
They would all end up in the same partial equivalence set.

and when a range for anything in that set falls into the bitrange of the
equivalence size,  like the [0,0] , then it can be applied (with some
restrictions), to other members of the set.   Its an extension of some thought
I had for trying to relates different size casts to each other, just extended
to include masking.


There is a series of other PRs which would be fixed by this sort of ability. it
has become prevalent enough to warrant the work I think. (Also 79191 91881
102705 102872)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug tree-optimization/102540] [12/13 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885
  2021-09-30  9:27 [Bug tree-optimization/102540] New: Dead Code Elimination Regression at -O3 (trunk vs 11.2.0) theodoridisgr at gmail dot com
                   ` (9 preceding siblings ...)
  2022-01-20 14:42 ` amacleod at redhat dot com
@ 2022-05-06  8:31 ` jakub at gcc dot gnu.org
  2022-07-26 13:26 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-06  8:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.0                        |12.2

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 12.1 is being released, retargeting bugs to GCC 12.2.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug tree-optimization/102540] [12/13 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885
  2021-09-30  9:27 [Bug tree-optimization/102540] New: Dead Code Elimination Regression at -O3 (trunk vs 11.2.0) theodoridisgr at gmail dot com
                   ` (10 preceding siblings ...)
  2022-05-06  8:31 ` [Bug tree-optimization/102540] [12/13 " jakub at gcc dot gnu.org
@ 2022-07-26 13:26 ` rguenth at gcc dot gnu.org
  2022-10-13 15:29 ` cvs-commit at gcc dot gnu.org
  2022-10-13 17:03 ` amacleod at redhat dot com
  13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-07-26 13:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug tree-optimization/102540] [12/13 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885
  2021-09-30  9:27 [Bug tree-optimization/102540] New: Dead Code Elimination Regression at -O3 (trunk vs 11.2.0) theodoridisgr at gmail dot com
                   ` (11 preceding siblings ...)
  2022-07-26 13:26 ` rguenth at gcc dot gnu.org
@ 2022-10-13 15:29 ` cvs-commit at gcc dot gnu.org
  2022-10-13 17:03 ` amacleod at redhat dot com
  13 siblings, 0 replies; 15+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-10-13 15:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540

--- Comment #13 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>:

https://gcc.gnu.org/g:6cc3394507a2303a18891d34222c53f679256c37

commit r13-3281-g6cc3394507a2303a18891d34222c53f679256c37
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Wed Oct 5 10:42:07 2022 -0400

    propagate partial equivs in the cache.

    Adjust on-entry cache propagation to look for and propagate both full
    and partial equivalences.

            gcc/
            PR tree-optimization/102540
            PR tree-optimization/102872
            * gimple-range-cache.cc (ranger_cache::fill_block_cache):
            Handle partial equivs.
            (ranger_cache::range_from_dom): Cleanup dump output.

            gcc/testsuite/
            * gcc.dg/pr102540.c: New.
            * gcc.dg/pr102872.c: New.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug tree-optimization/102540] [12/13 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885
  2021-09-30  9:27 [Bug tree-optimization/102540] New: Dead Code Elimination Regression at -O3 (trunk vs 11.2.0) theodoridisgr at gmail dot com
                   ` (12 preceding siblings ...)
  2022-10-13 15:29 ` cvs-commit at gcc dot gnu.org
@ 2022-10-13 17:03 ` amacleod at redhat dot com
  13 siblings, 0 replies; 15+ messages in thread
From: amacleod at redhat dot com @ 2022-10-13 17:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102540

Andrew Macleod <amacleod at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #14 from Andrew Macleod <amacleod at redhat dot com> ---
fixed.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-10-13 17:03 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-30  9:27 [Bug tree-optimization/102540] New: Dead Code Elimination Regression at -O3 (trunk vs 11.2.0) theodoridisgr at gmail dot com
2021-09-30 10:05 ` [Bug tree-optimization/102540] [12 Regression] Dead Code Elimination Regression at -O3 since r12-476-gd846f225c25c5885 marxin at gcc dot gnu.org
2021-09-30 12:50 ` rguenth at gcc dot gnu.org
2021-09-30 17:39 ` aldyh at gcc dot gnu.org
2021-10-01 21:02 ` amacleod at redhat dot com
2021-10-04  6:36 ` rguenther at suse dot de
2021-10-04 17:15 ` amacleod at redhat dot com
2021-10-05  6:19 ` rguenther at suse dot de
2021-10-05 13:59 ` amacleod at redhat dot com
2022-01-20  9:44 ` rguenth at gcc dot gnu.org
2022-01-20 14:42 ` amacleod at redhat dot com
2022-05-06  8:31 ` [Bug tree-optimization/102540] [12/13 " jakub at gcc dot gnu.org
2022-07-26 13:26 ` rguenth at gcc dot gnu.org
2022-10-13 15:29 ` cvs-commit at gcc dot gnu.org
2022-10-13 17:03 ` amacleod at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).