public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
@ 2021-10-22 23:09 clyon at gcc dot gnu.org
  2021-10-25  6:44 ` [Bug middle-end/102906] " rguenth at gcc dot gnu.org
                   ` (22 more replies)
  0 siblings, 23 replies; 24+ messages in thread
From: clyon at gcc dot gnu.org @ 2021-10-22 23:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

            Bug ID: 102906
           Summary: [12 regression] gcc.target/arm/ivopts-4.c fails since
                    r12-4526
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: clyon at gcc dot gnu.org
  Target Milestone: ---

Since r12-4526 (g:d8edfadfc7a9795b65177a50ce44fd348858e844) I have noticed
regressions on some arm targets for gcc.target/arm/ivopts-4.c

For instance on arm-none-linux-gnueabihf --with-arch armv7-a+mp+sec+neon-fp16
FAIL: gcc.target/arm/ivopts-4.c object-size text <= 36

But this depends on the actual target & runtestflags, see:
https://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/r12-4526-gd8edfadfc7a9795b65177a50ce44fd348858e844/report-build-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug middle-end/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
@ 2021-10-25  6:44 ` rguenth at gcc dot gnu.org
  2021-10-30 17:51 ` aldyh at gcc dot gnu.org
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-10-25  6:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aldyh at gcc dot gnu.org
   Target Milestone|---                         |12.0

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug middle-end/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
  2021-10-25  6:44 ` [Bug middle-end/102906] " rguenth at gcc dot gnu.org
@ 2021-10-30 17:51 ` aldyh at gcc dot gnu.org
  2021-11-01 20:09 ` clyon at gcc dot gnu.org
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: aldyh at gcc dot gnu.org @ 2021-10-30 17:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

--- Comment #1 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
Is this still an issue with the new jump threader?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug middle-end/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
  2021-10-25  6:44 ` [Bug middle-end/102906] " rguenth at gcc dot gnu.org
  2021-10-30 17:51 ` aldyh at gcc dot gnu.org
@ 2021-11-01 20:09 ` clyon at gcc dot gnu.org
  2021-11-06 18:20 ` aldyh at redhat dot com
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: clyon at gcc dot gnu.org @ 2021-11-01 20:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

--- Comment #2 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Yes, I can still see failures with r12-4820

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug middle-end/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2021-11-01 20:09 ` clyon at gcc dot gnu.org
@ 2021-11-06 18:20 ` aldyh at redhat dot com
  2021-11-07 10:36 ` clyon at gcc dot gnu.org
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: aldyh at redhat dot com @ 2021-11-06 18:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

Aldy Hernandez <aldyh at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aldyh at redhat dot com

--- Comment #3 from Aldy Hernandez <aldyh at redhat dot com> ---
I'm going to need more specific instructions here.

Looking at the link provided, there's a target of arm-none-linux-gnueabihf
--with-arch=default --with-cpu=cortex-a9 --with-cpu=vfp -march=armv5t... that
has a a red "REGRESSED" column.

I tried to simulate that with a cross --target=arm-none-linux-gnueabihf:

diff --git a/gcc/testsuite/gcc.target/arm/ivopts-4.c
b/gcc/testsuite/gcc.target/arm/ivopts-4.c
index 2e866c01823..cb81d010bb5 100644
--- a/gcc/testsuite/gcc.target/arm/ivopts-4.c
+++ b/gcc/testsuite/gcc.target/arm/ivopts-4.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble } */
-/* { dg-options "-Os -fdump-tree-ivopts -save-temps" } */
+/* { dg-options "-Os -fdump-tree-ivopts -save-temps -S -mcpu=cortex-a9
-mfpu=vfp -march=armv5t" } */

 extern unsigned int foo (int*) __attribute__((pure));

$ make check-gcc RUNTESTFLAGS=arm.exp=ivopts-4.c
                === gcc Summary ===

# of expected passes            4

Can you provide specific flags to pass to cc1 on a cross to
arm-none-linux-gnueabihf in order to reproduce?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug middle-end/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2021-11-06 18:20 ` aldyh at redhat dot com
@ 2021-11-07 10:36 ` clyon at gcc dot gnu.org
  2021-11-08  7:12 ` aldyh at gcc dot gnu.org
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: clyon at gcc dot gnu.org @ 2021-11-07 10:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

--- Comment #4 from Christophe Lyon <clyon at gcc dot gnu.org> ---
If I'm not mistaken, if you click on "REGRESSED" for that specific line, you'll
see that only ssa-dom-thread-7.c actually regressed with the corresponding
flags.

For ivopts-4.c, if seems you need -mthumb and one of these values for -march:
armv7-a+mp+sec+neon-fp16
armv7ve+simd
armv7-a+mp+sec+vfpv3-d16-fp16
armv8-a+crc+simd+crypto
(possibly with -mfpu=auto)
or -march=armv8-a -mfpu=crypto-neon-fp-armv8

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug middle-end/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2021-11-07 10:36 ` clyon at gcc dot gnu.org
@ 2021-11-08  7:12 ` aldyh at gcc dot gnu.org
  2021-11-08  7:52 ` clyon at gcc dot gnu.org
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: aldyh at gcc dot gnu.org @ 2021-11-08  7:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

--- Comment #5 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
(In reply to Christophe Lyon from comment #4)
> If I'm not mistaken, if you click on "REGRESSED" for that specific line,
> you'll see that only ssa-dom-thread-7.c actually regressed with the
> corresponding flags.
> 
> For ivopts-4.c, if seems you need -mthumb and one of these values for -march:
> armv7-a+mp+sec+neon-fp16
> armv7ve+simd
> armv7-a+mp+sec+vfpv3-d16-fp16
> armv8-a+crc+simd+crypto
> (possibly with -mfpu=auto)
> or -march=armv8-a -mfpu=crypto-neon-fp-armv8

Works for:
-Os -fdump-tree-ivopts -save-temps -S  -march=armv8-a
-mfpu=crypto-neon-fp-armv8
-Os -fdump-tree-ivopts -save-temps -S -mthumb -march=armv8-a
-mfpu=crypto-neon-fp-armv8
-Os -fdump-tree-ivopts -save-temps -S -mthumb -march=armv7ve+simd -mfpu=auto

Similarly for ssa-dom-thread-7.c.

Perhaps someone else can reproduce.

And just to be clear, it would be easier if someone could reproduce with a cc1
/ gcc command line, and not by pointing us to some table in a web page.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug middle-end/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2021-11-08  7:12 ` aldyh at gcc dot gnu.org
@ 2021-11-08  7:52 ` clyon at gcc dot gnu.org
  2021-11-08  8:19 ` [Bug testsuite/102906] " aldyh at gcc dot gnu.org
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: clyon at gcc dot gnu.org @ 2021-11-08  7:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

--- Comment #6 from Christophe Lyon <clyon at gcc dot gnu.org> ---
(In reply to Aldy Hernandez from comment #5)
> (In reply to Christophe Lyon from comment #4)
> > If I'm not mistaken, if you click on "REGRESSED" for that specific line,
> > you'll see that only ssa-dom-thread-7.c actually regressed with the
> > corresponding flags.
> > 
> > For ivopts-4.c, if seems you need -mthumb and one of these values for -march:
> > armv7-a+mp+sec+neon-fp16
> > armv7ve+simd
> > armv7-a+mp+sec+vfpv3-d16-fp16
> > armv8-a+crc+simd+crypto
> > (possibly with -mfpu=auto)
> > or -march=armv8-a -mfpu=crypto-neon-fp-armv8
> 
> Works for:
> -Os -fdump-tree-ivopts -save-temps -S  -march=armv8-a
> -mfpu=crypto-neon-fp-armv8
> -Os -fdump-tree-ivopts -save-temps -S -mthumb -march=armv8-a
> -mfpu=crypto-neon-fp-armv8
> -Os -fdump-tree-ivopts -save-temps -S -mthumb -march=armv7ve+simd -mfpu=auto
> 
> Similarly for ssa-dom-thread-7.c.
> 
> Perhaps someone else can reproduce.

It fails for me with:
cc1 -fpreprocessed ivopts-4.i -quiet -dumpbase ivopts-4.c -dumpbase-ext .c
-mthumb -mfloat-abi=hard -mfpu=neon -mtls-dialect=gnu
-mlibarch=armv7-a+mp+sec+neon-fp16 -march=armv7-a+mp+sec+neon-fp16 -Os -version
-fdiagnostics-color=never -fno-diagnostics-show-caret
-fno-diagnostics-show-line-numbers -fdiagnostics-urls=never
-fdiagnostics-path-format=separate-events -fdump-tree-ivopts -o ivopts-4.s


the main compilation line is:
arm-none-linux-gnueabihf-gcc -mthumb -march=armv7-a+mp+sec+neon-fp16
-fdiagnostics-plain-output -Os -fdump-tree-ivopts -save-temps -c -o ivopts-4.o
/home/christophe.lyon/src/GCC/sources/gcc-fsf-git/fixes/gcc/testsuite/gcc.target/arm/ivopts-4.c
then size ivopts-4.o:
   text    data     bss     dec     hex filename
     38       0       0      38      26 ivopts-4.o
where the testcase expects text <= 36

My GCC is configured with --with-float=hard --with-cpu=cortex-a9
--with-fpu=neon --with-mode=arm

>
> And just to be clear, it would be easier if someone could reproduce with a
> cc1 / gcc command line, and not by pointing us to some table in a web page.

Sorry, these automated validations builds are deleted after the reports are
generated,
so it's not really easier for me to manually reproduce than for anyone else.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug testsuite/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2021-11-08  7:52 ` clyon at gcc dot gnu.org
@ 2021-11-08  8:19 ` aldyh at gcc dot gnu.org
  2021-11-08  8:41 ` aldyh at gcc dot gnu.org
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: aldyh at gcc dot gnu.org @ 2021-11-08  8:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

Aldy Hernandez <aldyh at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |testsuite

--- Comment #7 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
> then size ivopts-4.o:
>    text    data     bss     dec     hex filename
>      38       0       0      38      26 ivopts-4.o
> where the testcase expects text <= 36

Ohhhhh, this is an object size regression?  This test seems very fragile.  Jump
threading will alter code size, so any change in the threading rules will
likely have an effect on code size.  I suggest you add -fno-thread-jumps to the
test and adjust the object-size test accordingly.

This is a testsuite regression that must be fixed by the target maintainers,
not a middle-end or tree-optimization regression.  I've marked it as so.

> 
> My GCC is configured with --with-float=hard --with-cpu=cortex-a9
> --with-fpu=neon --with-mode=arm
> 
> >
> > And just to be clear, it would be easier if someone could reproduce with a
> > cc1 / gcc command line, and not by pointing us to some table in a web page.
> 
> Sorry, these automated validations builds are deleted after the reports are
> generated,
> so it's not really easier for me to manually reproduce than for anyone else.

That's why it's better to include actual command lines, and not web pages that
may change after the PR has been created.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug testsuite/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2021-11-08  8:19 ` [Bug testsuite/102906] " aldyh at gcc dot gnu.org
@ 2021-11-08  8:41 ` aldyh at gcc dot gnu.org
  2021-11-08 17:53 ` clyon at gcc dot gnu.org
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: aldyh at gcc dot gnu.org @ 2021-11-08  8:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

--- Comment #8 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
This may be related to the discussion in PR102997, particularly comment 16.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug testsuite/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2021-11-08  8:41 ` aldyh at gcc dot gnu.org
@ 2021-11-08 17:53 ` clyon at gcc dot gnu.org
  2021-11-08 18:20 ` aldyh at gcc dot gnu.org
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: clyon at gcc dot gnu.org @ 2021-11-08 17:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

--- Comment #9 from Christophe Lyon <clyon at gcc dot gnu.org> ---
(In reply to Aldy Hernandez from comment #7)
> > then size ivopts-4.o:
> >    text    data     bss     dec     hex filename
> >      38       0       0      38      26 ivopts-4.o
> > where the testcase expects text <= 36
> 
> Ohhhhh, this is an object size regression?  This test seems very fragile. 
> Jump threading will alter code size, so any change in the threading rules
> will likely have an effect on code size.  I suggest you add
> -fno-thread-jumps to the test and adjust the object-size test accordingly.

I tried that, it doesn't change the generated code.


> This is a testsuite regression that must be fixed by the target maintainers,
> not a middle-end or tree-optimization regression.  I've marked it as so.
> 

Before your changes, we generated (with -mthumb
-march=armv7-a+mp+sec+neon-fp16)
tr2:
        push    {r4, r5, r6, lr}
        cmp     r1, #0
        mov     r4, r0
        ble     .L4
        add     r5, r0, r1, lsl #2
        movs    r6, #0
.L3:
        mov     r0, r4
        adds    r4, r4, #4
        bl      foo
        cmp     r4, r5
        add     r6, r6, r0
        bne     .L3
.L1:
        mov     r0, r6
        pop     {r4, r5, r6, pc}
.L4:
        movs    r6, #0
        b       .L1

After the jump-threading change (r12-4526):

tr2:
        push    {r4, r5, r6, lr}
        cmp     r1, #0
        mov     r4, r0
        ble     .L5
        add     r5, r0, r1, lsl #2
        movs    r6, #0
.L3:
        cmp     r4, r5
        bne     .L4 
.L1:
        mov     r0, r6
        pop     {r4, r5, r6, pc}
.L4:
        mov     r0, r4
        adds    r4, r4, #4
        bl      foo
        add     r6, r6, r0
        b       .L3 
.L5:
        movs    r6, #0
        b       .L1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug testsuite/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2021-11-08 17:53 ` clyon at gcc dot gnu.org
@ 2021-11-08 18:20 ` aldyh at gcc dot gnu.org
  2021-11-09  8:15 ` [Bug tree-optimization/102906] " rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: aldyh at gcc dot gnu.org @ 2021-11-08 18:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

Aldy Hernandez <aldyh at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #10 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
(In reply to Christophe Lyon from comment #9)
> (In reply to Aldy Hernandez from comment #7)
> > > then size ivopts-4.o:
> > >    text    data     bss     dec     hex filename
> > >      38       0       0      38      26 ivopts-4.o
> > > where the testcase expects text <= 36
> > 
> > Ohhhhh, this is an object size regression?  This test seems very fragile. 
> > Jump threading will alter code size, so any change in the threading rules
> > will likely have an effect on code size.  I suggest you add
> > -fno-thread-jumps to the test and adjust the object-size test accordingly.
> 
> I tried that, it doesn't change the generated code.

The difference from before the commit til now is that there was a threading
path that was valid but is now disallowed.  So adding -fno-thread-jumps won't
have any effect since it won't cause the disallowed threading path to reappear.
 What I'm saying is that the test should be calibrated to the new normal.

Of course, it could be that another pass should pick up the slack here, or that
the restriction is too strict.  Richi, do you have some insight here?

At least for -Os  -mthumb -mfloat-abi=hard -mfpu=neon -mtls-dialect=gnu
-mlibarch=armv7-a+mp+sec+neon-fp16 -march=armv7-a+mp+sec+neon-fp16, the
difference in the threader is that we used to thread 6->4->3 in DOM2, but we no
longer do so because doing so would rotate the loop:

a.c.127t.dom2:Path rotates loop:   Cancelling jump thread: (6, 4) incoming
edge;  (4, 3) normal; 

I still think testing for some magic code size is fragile at best.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2021-11-08 18:20 ` aldyh at gcc dot gnu.org
@ 2021-11-09  8:15 ` rguenth at gcc dot gnu.org
  2021-11-10  9:15 ` aldyh at gcc dot gnu.org
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-11-09  8:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
   Last reconfirmed|                            |2021-11-09
             Status|UNCONFIRMED                 |NEW
          Component|testsuite                   |tree-optimization
             Target|                            |arm
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Aldy Hernandez from comment #10)
> (In reply to Christophe Lyon from comment #9)
> > (In reply to Aldy Hernandez from comment #7)
> > > > then size ivopts-4.o:
> > > >    text    data     bss     dec     hex filename
> > > >      38       0       0      38      26 ivopts-4.o
> > > > where the testcase expects text <= 36
> > > 
> > > Ohhhhh, this is an object size regression?  This test seems very fragile. 
> > > Jump threading will alter code size, so any change in the threading rules
> > > will likely have an effect on code size.  I suggest you add
> > > -fno-thread-jumps to the test and adjust the object-size test accordingly.
> > 
> > I tried that, it doesn't change the generated code.
> 
> The difference from before the commit til now is that there was a threading
> path that was valid but is now disallowed.  So adding -fno-thread-jumps
> won't have any effect since it won't cause the disallowed threading path to
> reappear.  What I'm saying is that the test should be calibrated to the new
> normal.
> 
> Of course, it could be that another pass should pick up the slack here, or
> that the restriction is too strict.  Richi, do you have some insight here?
> 
> At least for -Os  -mthumb -mfloat-abi=hard -mfpu=neon -mtls-dialect=gnu
> -mlibarch=armv7-a+mp+sec+neon-fp16 -march=armv7-a+mp+sec+neon-fp16, the
> difference in the threader is that we used to thread 6->4->3 in DOM2, but we
> no longer do so because doing so would rotate the loop:
> 
> a.c.127t.dom2:Path rotates loop:   Cancelling jump thread: (6, 4) incoming
> edge;  (4, 3) normal; 
> 
> I still think testing for some magic code size is fragile at best.

So it seems we intentionally allowed some loop header copying done by
the threader based on the idea that it knows when it can do so without
a code size penalty.  The loop header copying pass has

static bool
should_duplicate_loop_header_p (basic_block header, class loop *loop,
                                int *limit)
{
  gimple_stmt_iterator bsi;

  gcc_assert (!header->aux);

  /* Loop header copying usually increases size of the code.  This used not to
     be true, since quite often it is possible to verify that the condition is
     satisfied in the first iteration and therefore to eliminate it.  Jump
     threading handles these cases now.  */
  if (optimize_loop_for_size_p (loop)
      && !loop->force_vectorize)
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file,
                 "  Not duplicating bb %i: optimizing for size.\n",
                 header->index);
      return false;

and indeed for the testcase we have

Analyzing loop 1
Loop 1 is not do-while loop: latch is not empty.
  Not duplicating bb 4: optimizing for size.

and were probably expecting jump-threading to rotate the loop.  Now, the
forward threader is set up to eventually not destroy loop info but then
loop-header copying might be able to either use ranger to tell the
looping condition is always true [and the block to copy is empty] or
loop header copying could use the jump threading path code which might be
able to determine all this(?)

That said, for GCC 12 we probably need to look to allow those threadings,
but only once and for the forward threader?  The early threader doesn't
seem to discover this opportunity for some reason.

At the point we run jt_path_registry::cancel_invalid_paths do we have
any ideas about the size of the copying we do?

So the pattern we'd allow is a threading with entry being the single
entry edge into the loop and the exit being the other edge from the
single_exit source block of it.

I wonder why the late DOM threading doesn't catch this?  We'd allow
the threading there with

  if (cfun->curr_properties & PROP_loop_opts_done)
    return false;

the late pass sees

  <bb 2> [local count: 200189151]:
  if (n_8(D) > 0)
    goto <bb 3>; [59.00%]
  else
    goto <bb 6>; [41.00%]

  <bb 3> [local count: 118111600]:
  ivtmp.9_16 = (unsigned int) array_9(D);
  _17 = (unsigned int) n_8(D);
  _18 = _17 * 4;
  _20 = ivtmp.9_16 + _18;
  goto <bb 5>; [100.00%]

  <bb 5> [local count: 1073741824]:
  # sum_5 = PHI <0(3), sum_11(4)>
  # ivtmp.9_14 = PHI <ivtmp.9_16(3), ivtmp.9_15(4)>
  if (ivtmp.9_14 != _20)
    goto <bb 4>; [89.00%]
  else
    goto <bb 6>; [11.00%]

so appearantly with IVCANON introducing a canonical IV ranger cannot
relate ivtmp.9_16 and _20 using the n_8(D) > 0 condition (OK, that
looks mightly complex, but mostly because n_8(D) * 4 is involved which
we don't know it doesn't overflow to zero).

diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 8aac733ac25..3670ec5b8d0 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2815,6 +2815,20 @@ jt_path_registry::cancel_invalid_paths
(vec<jump_thread_edge *> &path)
       && flow_loop_nested_p (exit->dest->loop_father, exit->src->loop_father))
     return false;

+  /* Allow loop header copying when we know the entry condition is
+     optimized away.  The loop header copying pass relies on those
+     being done here when optimizing for size.  */
+  edge e;
+  if (entry->dest == exit->src
+      && entry->dest->loop_father->header == entry->dest
+      /* Alternatively check for a single non-backedge pred of entry->dest. 
*/
+      && loops_state_satisfies_p (cfun, LOOPS_HAVE_PREHEADERS)
+      && (e = single_exit (entry->dest->loop_father))
+      && exit != e
+      && exit->src == e->src
+      && optimize_loop_for_size_p (entry->dest->loop_father))
+    return false;
+
   if (cfun->curr_properties & PROP_loop_opts_done)
     return false;


fixes the missing jump threading.  Possibly the predicate should be put next
to should_duplicate_loop_header_p in tree-ssa-loop-ch.c.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2021-11-09  8:15 ` [Bug tree-optimization/102906] " rguenth at gcc dot gnu.org
@ 2021-11-10  9:15 ` aldyh at gcc dot gnu.org
  2021-11-10  9:18 ` rguenther at suse dot de
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: aldyh at gcc dot gnu.org @ 2021-11-10  9:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

Aldy Hernandez <aldyh at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amacleod at redhat dot com

--- Comment #12 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #11)

First of all, thanks for your detailed analysis.  It's very helpful.

> So it seems we intentionally allowed some loop header copying done by
> the threader based on the idea that it knows when it can do so without
> a code size penalty.  The loop header copying pass has
> 
> static bool
> should_duplicate_loop_header_p (basic_block header, class loop *loop,
>                                 int *limit)
> {
>   gimple_stmt_iterator bsi;
> 
>   gcc_assert (!header->aux);
> 
>   /* Loop header copying usually increases size of the code.  This used not
> to
>      be true, since quite often it is possible to verify that the condition
> is
>      satisfied in the first iteration and therefore to eliminate it.  Jump
>      threading handles these cases now.  */
>   if (optimize_loop_for_size_p (loop)
>       && !loop->force_vectorize)
>     {
>       if (dump_file && (dump_flags & TDF_DETAILS))
>         fprintf (dump_file,
>                  "  Not duplicating bb %i: optimizing for size.\n",
>                  header->index);
>       return false;
> 
> and indeed for the testcase we have
> 
> Analyzing loop 1
> Loop 1 is not do-while loop: latch is not empty.
>   Not duplicating bb 4: optimizing for size.
> 
> and were probably expecting jump-threading to rotate the loop.  Now, the
> forward threader is set up to eventually not destroy loop info but then
> loop-header copying might be able to either use ranger to tell the
> looping condition is always true [and the block to copy is empty] or
> loop header copying could use the jump threading path code which might be
> able to determine all this(?)

Hmmm, are you asking if a path through 6->4 resolves the x_7 < n_8 conditional?

  <bb 6> [local count: 118111600]:
  goto <bb 4>; [100.00%]

  <bb 4> [local count: 1073741824]:
  # sum_5 = PHI <0(6), sum_11(3)>
  # x_7 = PHI <0(6), x_12(3)>
  if (x_7 < n_8(D))
    goto <bb 3>; [89.00%]
  else
    goto <bb 5>; [11.00%]

If so, that's trivially answered by the path solver.  We could put BB6 and BB4
in the path, and ask it what the final conditional resolves to.

> 
> That said, for GCC 12 we probably need to look to allow those threadings,
> but only once and for the forward threader?  The early threader doesn't
> seem to discover this opportunity for some reason.

It sees it, but fails to thread it because it would peel the loop.  From the
.ethread dump:

  FAIL: path through PHI in bb4 (incoming bb:2) crosses loop
  path: 2->4->xx REJECTED

> 
> At the point we run jt_path_registry::cancel_invalid_paths do we have
> any ideas about the size of the copying we do?

Not really, but we could adapt something.

The forward threader threads blindly with regards to size, but it does have
smarts to prune paths at -Os if there will be statement duplication (look for
optimize_function_for_size_p in *threadupdate.c).  The forward threader can get
away with not looking too hard, because it can't thread deep enough paths for
it to matter.

On the other hand, things can get out of control with the backward threaders
because it can find infinitely long paths.  It tracks statement costs with
estimate_num_insns.

As usual, we have a mess of different implementations.  It would be nice to
ultimately merge these two approaches.

> 
> So the pattern we'd allow is a threading with entry being the single
> entry edge into the loop and the exit being the other edge from the
> single_exit source block of it.
> 
> I wonder why the late DOM threading doesn't catch this?  We'd allow
> the threading there with
> 
>   if (cfun->curr_properties & PROP_loop_opts_done)
>     return false;
> 
> the late pass sees
> 
>   <bb 2> [local count: 200189151]:
>   if (n_8(D) > 0)
>     goto <bb 3>; [59.00%]
>   else
>     goto <bb 6>; [41.00%]
> 
>   <bb 3> [local count: 118111600]:
>   ivtmp.9_16 = (unsigned int) array_9(D);
>   _17 = (unsigned int) n_8(D);
>   _18 = _17 * 4;
>   _20 = ivtmp.9_16 + _18;
>   goto <bb 5>; [100.00%]
> 
>   <bb 5> [local count: 1073741824]:
>   # sum_5 = PHI <0(3), sum_11(4)>
>   # ivtmp.9_14 = PHI <ivtmp.9_16(3), ivtmp.9_15(4)>
>   if (ivtmp.9_14 != _20)
>     goto <bb 4>; [89.00%]
>   else
>     goto <bb 6>; [11.00%]
> 
> so appearantly with IVCANON introducing a canonical IV ranger cannot
> relate ivtmp.9_16 and _20 using the n_8(D) > 0 condition (OK, that
> looks mightly complex, but mostly because n_8(D) * 4 is involved which
> we don't know it doesn't overflow to zero).

Ranger is not involved at all.  DOM is the only threader that does not use
neither ranger nor the path solver.  The only difference in this release for
the DOM threader is the restrictions we put in place for loop threading.

> 
> diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
> index 8aac733ac25..3670ec5b8d0 100644
> --- a/gcc/tree-ssa-threadupdate.c
> +++ b/gcc/tree-ssa-threadupdate.c
> @@ -2815,6 +2815,20 @@ jt_path_registry::cancel_invalid_paths
> (vec<jump_thread_edge *> &path)
>        && flow_loop_nested_p (exit->dest->loop_father,
> exit->src->loop_father))
>      return false;
>  
> +  /* Allow loop header copying when we know the entry condition is
> +     optimized away.  The loop header copying pass relies on those
> +     being done here when optimizing for size.  */
> +  edge e;
> +  if (entry->dest == exit->src
> +      && entry->dest->loop_father->header == entry->dest
> +      /* Alternatively check for a single non-backedge pred of entry->dest.
> */
> +      && loops_state_satisfies_p (cfun, LOOPS_HAVE_PREHEADERS)
> +      && (e = single_exit (entry->dest->loop_father))
> +      && exit != e
> +      && exit->src == e->src
> +      && optimize_loop_for_size_p (entry->dest->loop_father))
> +    return false;
> +
>    if (cfun->curr_properties & PROP_loop_opts_done)
>      return false;
>  
> 
> fixes the missing jump threading.  Possibly the predicate should be put next
> to should_duplicate_loop_header_p in tree-ssa-loop-ch.c.

So, we could fix this either by relaxing the restriction with your patch, or by
teaching should_duplicate_loop_header_p that an incoming edge can resolve the
conditional in the header?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2021-11-10  9:15 ` aldyh at gcc dot gnu.org
@ 2021-11-10  9:18 ` rguenther at suse dot de
  2021-11-10  9:53 ` aldyh at gcc dot gnu.org
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenther at suse dot de @ 2021-11-10  9:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

--- Comment #13 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 10 Nov 2021, aldyh at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906
> 
> Aldy Hernandez <aldyh at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |amacleod at redhat dot com
> 
> --- Comment #12 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #11)
> 
> First of all, thanks for your detailed analysis.  It's very helpful.
> 
> > So it seems we intentionally allowed some loop header copying done by
> > the threader based on the idea that it knows when it can do so without
> > a code size penalty.  The loop header copying pass has
> > 
> > static bool
> > should_duplicate_loop_header_p (basic_block header, class loop *loop,
> >                                 int *limit)
> > {
> >   gimple_stmt_iterator bsi;
> > 
> >   gcc_assert (!header->aux);
> > 
> >   /* Loop header copying usually increases size of the code.  This used not
> > to
> >      be true, since quite often it is possible to verify that the condition
> > is
> >      satisfied in the first iteration and therefore to eliminate it.  Jump
> >      threading handles these cases now.  */
> >   if (optimize_loop_for_size_p (loop)
> >       && !loop->force_vectorize)
> >     {
> >       if (dump_file && (dump_flags & TDF_DETAILS))
> >         fprintf (dump_file,
> >                  "  Not duplicating bb %i: optimizing for size.\n",
> >                  header->index);
> >       return false;
> > 
> > and indeed for the testcase we have
> > 
> > Analyzing loop 1
> > Loop 1 is not do-while loop: latch is not empty.
> >   Not duplicating bb 4: optimizing for size.
> > 
> > and were probably expecting jump-threading to rotate the loop.  Now, the
> > forward threader is set up to eventually not destroy loop info but then
> > loop-header copying might be able to either use ranger to tell the
> > looping condition is always true [and the block to copy is empty] or
> > loop header copying could use the jump threading path code which might be
> > able to determine all this(?)
> 
> Hmmm, are you asking if a path through 6->4 resolves the x_7 < n_8 conditional?
> 
>   <bb 6> [local count: 118111600]:
>   goto <bb 4>; [100.00%]
> 
>   <bb 4> [local count: 1073741824]:
>   # sum_5 = PHI <0(6), sum_11(3)>
>   # x_7 = PHI <0(6), x_12(3)>
>   if (x_7 < n_8(D))
>     goto <bb 3>; [89.00%]
>   else
>     goto <bb 5>; [11.00%]
> 
> If so, that's trivially answered by the path solver.  We could put BB6 and BB4
> in the path, and ask it what the final conditional resolves to.
> 
> > 
> > That said, for GCC 12 we probably need to look to allow those threadings,
> > but only once and for the forward threader?  The early threader doesn't
> > seem to discover this opportunity for some reason.
> 
> It sees it, but fails to thread it because it would peel the loop.  From the
> .ethread dump:
> 
>   FAIL: path through PHI in bb4 (incoming bb:2) crosses loop
>   path: 2->4->xx REJECTED
> 
> > 
> > At the point we run jt_path_registry::cancel_invalid_paths do we have
> > any ideas about the size of the copying we do?
> 
> Not really, but we could adapt something.
> 
> The forward threader threads blindly with regards to size, but it does have
> smarts to prune paths at -Os if there will be statement duplication (look for
> optimize_function_for_size_p in *threadupdate.c).  The forward threader can get
> away with not looking too hard, because it can't thread deep enough paths for
> it to matter.
> 
> On the other hand, things can get out of control with the backward threaders
> because it can find infinitely long paths.  It tracks statement costs with
> estimate_num_insns.
> 
> As usual, we have a mess of different implementations.  It would be nice to
> ultimately merge these two approaches.
> 
> > 
> > So the pattern we'd allow is a threading with entry being the single
> > entry edge into the loop and the exit being the other edge from the
> > single_exit source block of it.
> > 
> > I wonder why the late DOM threading doesn't catch this?  We'd allow
> > the threading there with
> > 
> >   if (cfun->curr_properties & PROP_loop_opts_done)
> >     return false;
> > 
> > the late pass sees
> > 
> >   <bb 2> [local count: 200189151]:
> >   if (n_8(D) > 0)
> >     goto <bb 3>; [59.00%]
> >   else
> >     goto <bb 6>; [41.00%]
> > 
> >   <bb 3> [local count: 118111600]:
> >   ivtmp.9_16 = (unsigned int) array_9(D);
> >   _17 = (unsigned int) n_8(D);
> >   _18 = _17 * 4;
> >   _20 = ivtmp.9_16 + _18;
> >   goto <bb 5>; [100.00%]
> > 
> >   <bb 5> [local count: 1073741824]:
> >   # sum_5 = PHI <0(3), sum_11(4)>
> >   # ivtmp.9_14 = PHI <ivtmp.9_16(3), ivtmp.9_15(4)>
> >   if (ivtmp.9_14 != _20)
> >     goto <bb 4>; [89.00%]
> >   else
> >     goto <bb 6>; [11.00%]
> > 
> > so appearantly with IVCANON introducing a canonical IV ranger cannot
> > relate ivtmp.9_16 and _20 using the n_8(D) > 0 condition (OK, that
> > looks mightly complex, but mostly because n_8(D) * 4 is involved which
> > we don't know it doesn't overflow to zero).
> 
> Ranger is not involved at all.  DOM is the only threader that does not use
> neither ranger nor the path solver.  The only difference in this release for
> the DOM threader is the restrictions we put in place for loop threading.
> 
> > 
> > diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
> > index 8aac733ac25..3670ec5b8d0 100644
> > --- a/gcc/tree-ssa-threadupdate.c
> > +++ b/gcc/tree-ssa-threadupdate.c
> > @@ -2815,6 +2815,20 @@ jt_path_registry::cancel_invalid_paths
> > (vec<jump_thread_edge *> &path)
> >        && flow_loop_nested_p (exit->dest->loop_father,
> > exit->src->loop_father))
> >      return false;
> >  
> > +  /* Allow loop header copying when we know the entry condition is
> > +     optimized away.  The loop header copying pass relies on those
> > +     being done here when optimizing for size.  */
> > +  edge e;
> > +  if (entry->dest == exit->src
> > +      && entry->dest->loop_father->header == entry->dest
> > +      /* Alternatively check for a single non-backedge pred of entry->dest.
> > */
> > +      && loops_state_satisfies_p (cfun, LOOPS_HAVE_PREHEADERS)
> > +      && (e = single_exit (entry->dest->loop_father))
> > +      && exit != e
> > +      && exit->src == e->src
> > +      && optimize_loop_for_size_p (entry->dest->loop_father))
> > +    return false;
> > +
> >    if (cfun->curr_properties & PROP_loop_opts_done)
> >      return false;
> >  
> > 
> > fixes the missing jump threading.  Possibly the predicate should be put next
> > to should_duplicate_loop_header_p in tree-ssa-loop-ch.c.
> 
> So, we could fix this either by relaxing the restriction with your patch, or by
> teaching should_duplicate_loop_header_p that an incoming edge can resolve the
> conditional in the header?

Yes.  I think the latter would be cleaner but the former has an (ugly)
patch already (see above).  Not sure how difficult it is to do the latter,
you are likely 10x quicker than me to figure that out ;)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2021-11-10  9:18 ` rguenther at suse dot de
@ 2021-11-10  9:53 ` aldyh at gcc dot gnu.org
  2021-11-10  9:57 ` rguenther at suse dot de
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: aldyh at gcc dot gnu.org @ 2021-11-10  9:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

Aldy Hernandez <aldyh at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |aldyh at gcc dot gnu.org

--- Comment #14 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #13)
> On Wed, 10 Nov 2021, aldyh at gcc dot gnu.org wrote:

> > So, we could fix this either by relaxing the restriction with your patch, or by
> > teaching should_duplicate_loop_header_p that an incoming edge can resolve the
> > conditional in the header?
> 
> Yes.  I think the latter would be cleaner but the former has an (ugly)
> patch already (see above).  Not sure how difficult it is to do the latter,

The idea is actually straightforward.  Assuming I have the right edge, this
would get us the result:

@@ -60,6 +63,24 @@ should_duplicate_loop_header_p (basic_block header, class
loop *loop,
   if (optimize_loop_for_size_p (loop)
       && !loop->force_vectorize)
     {
+      if (gcond *last = safe_dyn_cast <gcond *> (last_stmt (header)))
+       {
+         gimple_ranger ranger;
+         int_range<2> r;
+         path_range_query path (ranger, /*resolve=*/true);
+         auto_vec<basic_block> bbs (2);
+         edge e = loop_preheader_edge (loop);
+
+         gcc_checking_assert (e->dest == header);
+         bbs.quick_push (header);
+         bbs.quick_push (e->src);
+         bitmap imports = ranger.gori ().imports (header);
+         path.compute_ranges (bbs, imports);
+         path.range_of_stmt (r, last);
+         r.dump ();
+         fputc ('\n', stderr);
+       }
+
       if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file,

$ ./cc1 a.c -quiet -I/tmp -Os
_Bool [1, 1]

which means the final conditional in the path evaluates to true.

I'll come up with an actual patch.  The gori imports stuff can be hidden away
in the path solver and we should have a ranger available for the entire pass,
not instantiate a new one every time.

> you are likely 10x quicker than me to figure that out ;)

Well, only because you did all the hard work here :).

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2021-11-10  9:53 ` aldyh at gcc dot gnu.org
@ 2021-11-10  9:57 ` rguenther at suse dot de
  2021-11-10 10:07 ` aldyh at gcc dot gnu.org
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenther at suse dot de @ 2021-11-10  9:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

--- Comment #15 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 10 Nov 2021, aldyh at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906
> 
> Aldy Hernandez <aldyh at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>            Assignee|unassigned at gcc dot gnu.org      |aldyh at gcc dot gnu.org
> 
> --- Comment #14 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
> (In reply to rguenther@suse.de from comment #13)
> > On Wed, 10 Nov 2021, aldyh at gcc dot gnu.org wrote:
> 
> > > So, we could fix this either by relaxing the restriction with your patch, or by
> > > teaching should_duplicate_loop_header_p that an incoming edge can resolve the
> > > conditional in the header?
> > 
> > Yes.  I think the latter would be cleaner but the former has an (ugly)
> > patch already (see above).  Not sure how difficult it is to do the latter,
> 
> The idea is actually straightforward.  Assuming I have the right edge, this
> would get us the result:
> 
> @@ -60,6 +63,24 @@ should_duplicate_loop_header_p (basic_block header, class
> loop *loop,
>    if (optimize_loop_for_size_p (loop)
>        && !loop->force_vectorize)
>      {
> +      if (gcond *last = safe_dyn_cast <gcond *> (last_stmt (header)))
> +       {
> +         gimple_ranger ranger;
> +         int_range<2> r;
> +         path_range_query path (ranger, /*resolve=*/true);
> +         auto_vec<basic_block> bbs (2);
> +         edge e = loop_preheader_edge (loop);
> +
> +         gcc_checking_assert (e->dest == header);
> +         bbs.quick_push (header);
> +         bbs.quick_push (e->src);
> +         bitmap imports = ranger.gori ().imports (header);
> +         path.compute_ranges (bbs, imports);
> +         path.range_of_stmt (r, last);
> +         r.dump ();
> +         fputc ('\n', stderr);

Nice.  Does composing the path from the exact two BBs mean that
it won't pick up a case like

  if (n > 0)
    if (k > 0)
      for (; n > 0;)
        ...

where the n > 0 outer condition is on the predecessor from
e->src?  Or is the path merely built to denote the fact
that we're interested on the entry edge of the loop only
(on the backedge the condition wouldn't be known)?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2021-11-10  9:57 ` rguenther at suse dot de
@ 2021-11-10 10:07 ` aldyh at gcc dot gnu.org
  2021-11-10 10:13 ` rguenther at suse dot de
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: aldyh at gcc dot gnu.org @ 2021-11-10 10:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

--- Comment #16 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #15)
> On Wed, 10 Nov 2021, aldyh at gcc dot gnu.org wrote:

> > @@ -60,6 +63,24 @@ should_duplicate_loop_header_p (basic_block header, class
> > loop *loop,
> >    if (optimize_loop_for_size_p (loop)
> >        && !loop->force_vectorize)
> >      {
> > +      if (gcond *last = safe_dyn_cast <gcond *> (last_stmt (header)))
> > +       {
> > +         gimple_ranger ranger;
> > +         int_range<2> r;
> > +         path_range_query path (ranger, /*resolve=*/true);
> > +         auto_vec<basic_block> bbs (2);
> > +         edge e = loop_preheader_edge (loop);
> > +
> > +         gcc_checking_assert (e->dest == header);
> > +         bbs.quick_push (header);
> > +         bbs.quick_push (e->src);
> > +         bitmap imports = ranger.gori ().imports (header);
> > +         path.compute_ranges (bbs, imports);
> > +         path.range_of_stmt (r, last);
> > +         r.dump ();
> > +         fputc ('\n', stderr);
> 
> Nice.  Does composing the path from the exact two BBs mean that
> it won't pick up a case like
> 
>   if (n > 0)
>     if (k > 0)
>       for (; n > 0;)
>         ...
> 
> where the n > 0 outer condition is on the predecessor from
> e->src?  Or is the path merely built to denote the fact
> that we're interested on the entry edge of the loop only
> (on the backedge the condition wouldn't be known)?

If the predecessor for e->src dominates it, it will also pick that up.  The
path merely denotes the blocks we care about for intra-block ranges /
relationals, etc.  With resolve=true (above), any range or relation not known
within the path we will just pick up the range on entry to the path by asking
ranger.

Does that answer your question?

For the record, I also agree that we should pull out these loop rotations,
peels, etc from the threaders into the loop optimizers, as they have a better
model to make decisions about loops.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2021-11-10 10:07 ` aldyh at gcc dot gnu.org
@ 2021-11-10 10:13 ` rguenther at suse dot de
  2021-11-10 10:20 ` aldyh at redhat dot com
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenther at suse dot de @ 2021-11-10 10:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

--- Comment #17 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 10 Nov 2021, aldyh at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906
> 
> --- Comment #16 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
> (In reply to rguenther@suse.de from comment #15)
> > On Wed, 10 Nov 2021, aldyh at gcc dot gnu.org wrote:
> 
> > > @@ -60,6 +63,24 @@ should_duplicate_loop_header_p (basic_block header, class
> > > loop *loop,
> > >    if (optimize_loop_for_size_p (loop)
> > >        && !loop->force_vectorize)
> > >      {
> > > +      if (gcond *last = safe_dyn_cast <gcond *> (last_stmt (header)))
> > > +       {
> > > +         gimple_ranger ranger;
> > > +         int_range<2> r;
> > > +         path_range_query path (ranger, /*resolve=*/true);
> > > +         auto_vec<basic_block> bbs (2);
> > > +         edge e = loop_preheader_edge (loop);
> > > +
> > > +         gcc_checking_assert (e->dest == header);
> > > +         bbs.quick_push (header);
> > > +         bbs.quick_push (e->src);
> > > +         bitmap imports = ranger.gori ().imports (header);
> > > +         path.compute_ranges (bbs, imports);
> > > +         path.range_of_stmt (r, last);
> > > +         r.dump ();
> > > +         fputc ('\n', stderr);
> > 
> > Nice.  Does composing the path from the exact two BBs mean that
> > it won't pick up a case like
> > 
> >   if (n > 0)
> >     if (k > 0)
> >       for (; n > 0;)
> >         ...
> > 
> > where the n > 0 outer condition is on the predecessor from
> > e->src?  Or is the path merely built to denote the fact
> > that we're interested on the entry edge of the loop only
> > (on the backedge the condition wouldn't be known)?
> 
> If the predecessor for e->src dominates it, it will also pick that up.  The
> path merely denotes the blocks we care about for intra-block ranges /
> relationals, etc.  With resolve=true (above), any range or relation not known
> within the path we will just pick up the range on entry to the path by asking
> ranger.
> 
> Does that answer your question?

Yes.  I guess it would be nice to have a CTOR or so for the case
where the path is really a single edge like in this case.

> For the record, I also agree that we should pull out these loop rotations,
> peels, etc from the threaders into the loop optimizers, as they have a better
> model to make decisions about loops.

Indeed.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2021-11-10 10:13 ` rguenther at suse dot de
@ 2021-11-10 10:20 ` aldyh at redhat dot com
  2021-11-10 12:59 ` aldyh at gcc dot gnu.org
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: aldyh at redhat dot com @ 2021-11-10 10:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

--- Comment #18 from Defunct account. Do not use. <aldyh at redhat dot com> ---
> Yes.  I guess it would be nice to have a CTOR or so for the case
> where the path is really a single edge like in this case.

Good idea.  Will do.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2021-11-10 10:20 ` aldyh at redhat dot com
@ 2021-11-10 12:59 ` aldyh at gcc dot gnu.org
  2021-11-10 22:13 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: aldyh at gcc dot gnu.org @ 2021-11-10 12:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

--- Comment #19 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
Created attachment 51757
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51757&action=edit
proposed patch in testing

Patch depends on some shuffling in the path solver to make way for non-threader
clients.  Both patches are currently testing.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2021-11-10 12:59 ` aldyh at gcc dot gnu.org
@ 2021-11-10 22:13 ` cvs-commit at gcc dot gnu.org
  2021-11-11  6:19 ` aldyh at gcc dot gnu.org
  2021-11-14 10:39 ` aldyh at gcc dot gnu.org
  22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-11-10 22:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

--- Comment #20 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Aldy Hernandez <aldyh@gcc.gnu.org>:

https://gcc.gnu.org/g:e82c382971664d6fd138cc36020db4b1a91885c6

commit r12-5138-ge82c382971664d6fd138cc36020db4b1a91885c6
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Wed Nov 10 13:21:59 2021 +0100

    Allow loop header copying when first iteration condition is known.

    As discussed in the PR, the loop header copying pass avoids doing so
    when optimizing for size.  However, sometimes we can determine the
    loop entry conditional statically for the first iteration of the loop.

    This patch uses the path solver to determine the outgoing edge
    out of preheader->header->xx.  If so, it allows header copying.  Doing
    this in the loop optimizer saves us from doing gymnastics in the
    threader which doesn't have the context to determine if a loop
    transformation is profitable.

    I am only returning true in entry_loop_condition_is_static for
    a true conditional.  Technically a false conditional is also
    provably static, but allowing any boolean value causes a regression
    in gfortran.dg/vector_subscript_1.f90.

    I would have preferred not passing around the query object, but the
    layout of pass_ch and should_duplicate_loop_header_p make it a bit
    awkward to get it right without an outright refactor to the
    pass.

    Tested on x86-64 Linux.

    gcc/ChangeLog:

            PR tree-optimization/102906
            * tree-ssa-loop-ch.c (entry_loop_condition_is_static): New.
            (should_duplicate_loop_header_p): Call
entry_loop_condition_is_static.
            (class ch_base): Add m_ranger and m_query.
            (ch_base::copy_headers): Pass m_query to
            entry_loop_condition_is_static.
            (pass_ch::execute): Allocate and deallocate m_ranger and
            m_query.
            (pass_ch_vect::execute): Same.

    gcc/testsuite/ChangeLog:

            * gcc.dg/tree-ssa/pr102906.c: New test.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (20 preceding siblings ...)
  2021-11-10 22:13 ` cvs-commit at gcc dot gnu.org
@ 2021-11-11  6:19 ` aldyh at gcc dot gnu.org
  2021-11-14 10:39 ` aldyh at gcc dot gnu.org
  22 siblings, 0 replies; 24+ messages in thread
From: aldyh at gcc dot gnu.org @ 2021-11-11  6:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

--- Comment #21 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
Should be fixed. Can someone verify the object size on arm is as expected?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/102906] [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526
  2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
                   ` (21 preceding siblings ...)
  2021-11-11  6:19 ` aldyh at gcc dot gnu.org
@ 2021-11-14 10:39 ` aldyh at gcc dot gnu.org
  22 siblings, 0 replies; 24+ messages in thread
From: aldyh at gcc dot gnu.org @ 2021-11-14 10:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102906

Aldy Hernandez <aldyh at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #22 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
fixed

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-11-14 10:39 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-22 23:09 [Bug middle-end/102906] New: [12 regression] gcc.target/arm/ivopts-4.c fails since r12-4526 clyon at gcc dot gnu.org
2021-10-25  6:44 ` [Bug middle-end/102906] " rguenth at gcc dot gnu.org
2021-10-30 17:51 ` aldyh at gcc dot gnu.org
2021-11-01 20:09 ` clyon at gcc dot gnu.org
2021-11-06 18:20 ` aldyh at redhat dot com
2021-11-07 10:36 ` clyon at gcc dot gnu.org
2021-11-08  7:12 ` aldyh at gcc dot gnu.org
2021-11-08  7:52 ` clyon at gcc dot gnu.org
2021-11-08  8:19 ` [Bug testsuite/102906] " aldyh at gcc dot gnu.org
2021-11-08  8:41 ` aldyh at gcc dot gnu.org
2021-11-08 17:53 ` clyon at gcc dot gnu.org
2021-11-08 18:20 ` aldyh at gcc dot gnu.org
2021-11-09  8:15 ` [Bug tree-optimization/102906] " rguenth at gcc dot gnu.org
2021-11-10  9:15 ` aldyh at gcc dot gnu.org
2021-11-10  9:18 ` rguenther at suse dot de
2021-11-10  9:53 ` aldyh at gcc dot gnu.org
2021-11-10  9:57 ` rguenther at suse dot de
2021-11-10 10:07 ` aldyh at gcc dot gnu.org
2021-11-10 10:13 ` rguenther at suse dot de
2021-11-10 10:20 ` aldyh at redhat dot com
2021-11-10 12:59 ` aldyh at gcc dot gnu.org
2021-11-10 22:13 ` cvs-commit at gcc dot gnu.org
2021-11-11  6:19 ` aldyh at gcc dot gnu.org
2021-11-14 10:39 ` aldyh at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).