* [Bug target/61837] missed loop invariant expression optimization
       [not found] <bug-61837-4@http.gcc.gnu.org/bugzilla/>
@ 2020-04-07  7:30 ` luoxhu at gcc dot gnu.org
  2020-04-07 16:37 ` segher at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 11+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2020-04-07  7:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61837

luoxhu at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |luoxhu at gcc dot gnu.org

--- Comment #3 from luoxhu at gcc dot gnu.org ---
"addi 8,4,-1" and "subf 9,8,5" cannot be hoisted out because of a dependency
on "lbzu 9,1(8)", which updates r8: r8 needs to be reinitialized to p2-1 in
each iteration of the outer loop. Only the result of "subf 9,8,5" is loop
invariant: (p2+s-1)-(p2-1) = s.

But the code from the latest GCC could still be optimized, as A, B and C are
loop invariant.

foo:
.LFB0:
        .cfi_startproc
        cmpwi 7,5,0
        li 6,0
        rldicl 5,5,0,32
        li 7,0
        .p2align 4,,15
.L2:
        ble 7,.L7
        addi 8,5,-1       // A
        addi 10,4,-1
        rldicl 8,8,0,32   // B
        mr 9,3
        addi 8,8,1        // C
        mtctr 8
        .p2align 5
.L4:
        lbzu 8,1(10)
        cmpw 0,8,7
        bne 0,.L3
        stw 6,0(9)
.L3:
        addi 9,9,4
        bdnz .L4
.L7:
        addi 6,6,88
        addi 7,7,1
        cmpwi 0,6,8888
        extsw 7,7
        extsw 6,6
        bne 0,.L2
        blr
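
For reference, a C function consistent with the listing above can be sketched
as follows. It is reconstructed from the assembly (r3 = p1, r4 = p2, r5 = s),
not copied from the test case attached to this PR, so the names foo_ref,
foo_hoisted and check_foo_hoisted and the exact types are assumptions.
foo_hoisted computes A, B and C once, before the outer loop, which is the
hoisting being asked for.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical source reconstructed from the listing (r6 = v, r7 = n);
   the real test case may differ.  */
void foo_ref(int *p1, const unsigned char *p2, int s)
{
    int v = 0;
    for (int n = 0; n <= 100; n++) {     /* exits at v == 8888 == 101 * 88 */
        for (int i = 0; i < s; i++)
            if (p2[i] == n)              /* lbzu 8,1(10); cmpw 0,8,7 */
                p1[i] = v;               /* stw 6,0(9) */
        v += 88;
    }
}

/* Same loop nest with A, B and C computed once, outside the outer loop.  */
void foo_hoisted(int *p1, const unsigned char *p2, int s)
{
    uint64_t count = 0;
    if (s > 0) {
        uint32_t a = (uint32_t)s - 1u;   /* A: addi 8,5,-1 */
        uint64_t b = a;                  /* B: rldicl 8,8,0,32 (zero-extend) */
        count = b + 1u;                  /* C: addi 8,8,1 */
    }
    int v = 0;
    for (int n = 0; n <= 100; n++) {
        const unsigned char *q = p2;
        int *p = p1;
        for (uint64_t ctr = count; ctr != 0; ctr--) {   /* mtctr / bdnz */
            if (*q++ == n)
                *p = v;
            p++;
        }
        v += 88;
    }
}

/* Usage: both versions must store the same values.  */
void check_foo_hoisted(void)
{
    const unsigned char p2[5] = { 0, 3, 100, 200, 3 };
    int r1[5], r2[5];
    memset(r1, -1, sizeof r1);           /* every element becomes -1 */
    memset(r2, -1, sizeof r2);
    foo_ref(r1, p2, 5);
    foo_hoisted(r2, p2, 5);
    assert(memcmp(r1, r2, sizeof r1) == 0);
    assert(r1[1] == 3 * 88 && r1[3] == -1);
}
```

The count depends only on s, so nothing in the outer loop can change it; that
is what makes A, B and C hoistable.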

* [Bug target/61837] missed loop invariant expression optimization
From: segher at gcc dot gnu.org @ 2020-04-07 16:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61837

--- Comment #4 from Segher Boessenkool <segher at gcc dot gnu.org> ---
If the  ble 7,.L7  is taken once, it will be taken all of the time, since
cr7 isn't assigned to any more -- and then the whole loop does nothing.

* [Bug target/61837] missed loop invariant expression optimization
From: luoxhu at gcc dot gnu.org @ 2020-04-13 10:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61837

--- Comment #5 from luoxhu at gcc dot gnu.org ---
"-O2 -funswitch-loops" generates the expected code for s<=0. Since
-funswitch-loops is enabled by -O3, does this issue reduce to a duplicate of
PR67288?

foo:
.LFB0:
        .cfi_startproc
        cmpwi 0,5,0
        blelr 0
        rldicl 5,5,0,32
        addi 4,4,-1
        li 6,0
        li 7,0
        .p2align 4,,15
.L2:
        rldicl 8,5,0,32
        mr 10,4
        mtctr 8
        mr 9,3
        .p2align 5
.L5:
        lbzu 8,1(10)
        cmpw 0,8,7
        bne 0,.L4
        stw 6,0(9)
.L4:
        addi 9,9,4
        bdnz .L5
        addi 6,6,88
        addi 7,7,1
        cmpwi 0,6,8888
        extsw 7,7
        extsw 6,6
        bne 0,.L2
        blr
        .long 0
        .byte 0,0,0,0,0,0,0,0
        .cfi_endproc
.LFE0:

* [Bug target/61837] missed loop invariant expression optimization
From: segher at gcc dot gnu.org @ 2020-04-14  6:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61837

--- Comment #6 from Segher Boessenkool <segher at gcc dot gnu.org> ---
But -funswitch-loops is much stronger than we want here, and the wrong
thing to use at -O2 (it often generates *slower* code!)

* [Bug target/61837] missed loop invariant expression optimization
From: luoxhu at gcc dot gnu.org @ 2020-04-14  7:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61837

--- Comment #7 from luoxhu at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #6)
> But -funswitch-loops is much stronger than we want here, and the wrong
> thing to use at -O2 (it often generates *slower* code!)

I'm not sure what you mean here. -funswitch-loops is there to generate the
"blelr 0" that handles what you pointed out in
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61837#c4), not to optimize
"-1, zero_ext, +1", which is a matter of moving a loop invariant out. And if
"-1, zero_ext, +1" could be simplified to "zero_ext" for non-zero values,
this is actually a special case of PR67288.

* [Bug target/61837] missed loop invariant expression optimization
From: segher at gcc dot gnu.org @ 2020-04-14  7:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61837

--- Comment #8 from Segher Boessenkool <segher at gcc dot gnu.org> ---
-funswitch-loops changes things like

  for (...) {
    if (...)
      ...1;
    else
      ...2;
  }

into

  if (...) {
    for (...)
      ...1;
  } else {
    for (...)
      ...2;
  }

which often is not a good idea.  This is why this is not done at -O2:
-O2 is only for optimisations that almost never hurt performance.
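
A concrete instance of the transformation sketched above, written by hand in
C. The function names and the `negate` flag are made up for illustration;
this is not output from the pass. The point to notice is that the unswitched
version carries two copies of the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the loop-invariant test on `negate` runs on every iteration.  */
long sum_branch_in_loop(const int *a, size_t n, int negate)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (negate)
            sum -= a[i];
        else
            sum += a[i];
    }
    return sum;
}

/* After: the test runs once, at the price of duplicating the loop.  */
long sum_unswitched(const int *a, size_t n, int negate)
{
    long sum = 0;
    if (negate) {
        for (size_t i = 0; i < n; i++)
            sum -= a[i];
    } else {
        for (size_t i = 0; i < n; i++)
            sum += a[i];
    }
    return sum;
}

/* Usage: both forms must agree for either value of the flag.  */
void check_unswitched(void)
{
    const int a[4] = { 1, 2, 3, 4 };
    assert(sum_branch_in_loop(a, 4, 0) == 10);
    assert(sum_unswitched(a, 4, 0) == 10);
    assert(sum_branch_in_loop(a, 4, 1) == -10);
    assert(sum_unswitched(a, 4, 1) == -10);
}
```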

* [Bug target/61837] missed loop invariant expression optimization
From: luoxhu at gcc dot gnu.org @ 2020-04-14  7:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61837

--- Comment #9 from luoxhu at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #8)
> -funswitch-loops changes things like
> 
>   for (...) {
>     if (...)
>       ...1;
>     else
>       ...2;
>   }
> 
> into
> 
>   if (...) {
>     for (...)
>       ...1;
>   } else {
>     for (...)
>       ...2;
>   }
> 
> which often is not a good idea.  This is why this is not done at -O2:
> -O2 is only for optimisations that almost never hurt performance.

Yes, for this case it performs better with -funswitch-loops, and I see many
uses of -O2 with -funswitch-loops in the testsuite.  I thought you meant
doing this at -O2 without -funswitch-loops...

* [Bug target/61837] missed loop invariant expression optimization
From: rguenth at gcc dot gnu.org @ 2020-04-14 10:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61837

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note the unswitching pass has special code to hoist guards of inner loops
stemming from loop header copying.  That could possibly be enabled at -O2
since it doesn't come with a size penalty due to loop copying (the code size
issue is the reason we don't unswitch at -O2 - yes, creating non-perfect
nests might be another good reason).
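
A hand-written C sketch of what guard hoisting means for a loop nest shaped
like the one in this PR. The function names and the exact form of the
header-copied loop are assumptions made for illustration, not output from
the pass. Unlike full unswitching, the inner loop is not duplicated.

```c
#include <assert.h>
#include <string.h>

/* After loop header copying: the inner "for" has become a guard plus a
   do-while.  The guard (s > 0) does not depend on n.  */
void nest_guard_inside(int *p1, const unsigned char *p2, int s)
{
    int v = 0;
    for (int n = 0; n <= 100; n++) {
        if (s > 0) {                     /* guard from header copying */
            int i = 0;
            do {
                if (p2[i] == n)
                    p1[i] = v;
            } while (++i < s);
        }
        v += 88;
    }
}

/* Guard hoisted out of the outer loop.  With the guard false the outer
   loop stores nothing, so it can be skipped as a whole.  */
void nest_guard_hoisted(int *p1, const unsigned char *p2, int s)
{
    if (s <= 0)
        return;                          /* compare "blelr 0" in comment #5 */
    int v = 0;
    for (int n = 0; n <= 100; n++) {
        int i = 0;
        do {
            if (p2[i] == n)
                p1[i] = v;
        } while (++i < s);
        v += 88;
    }
}

/* Usage: same stores for s > 0, no stores at all for s <= 0.  */
void check_guard_hoisted(void)
{
    const unsigned char p2[3] = { 2, 0, 250 };
    int r1[3], r2[3];
    memset(r1, -1, sizeof r1);
    memset(r2, -1, sizeof r2);
    nest_guard_inside(r1, p2, 3);
    nest_guard_hoisted(r2, p2, 3);
    assert(memcmp(r1, r2, sizeof r1) == 0);
    assert(r1[0] == 176 && r1[1] == 0 && r1[2] == -1);
    nest_guard_hoisted(r2, p2, 0);       /* guard false: nothing changes */
    assert(memcmp(r1, r2, sizeof r1) == 0);
}
```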

* [Bug target/61837] missed loop invariant expression optimization
From: cvs-commit at gcc dot gnu.org @ 2020-05-15  3:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61837

--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Xiong Hu Luo <luoxhu@gcc.gnu.org>:

https://gcc.gnu.org/g:8a15faa730f99100f6f3ed12663563356ec5a2c0

commit r11-407-g8a15faa730f99100f6f3ed12663563356ec5a2c0
Author: Xionghu Luo <luoxhu@linux.ibm.com>
Date:   Thu May 14 21:03:24 2020 -0500

    Fold (add -1; zero_ext; add +1) operations to zero_ext when not overflow (PR37451, PR61837)

    This "subtract/extend/add" sequence has existed for a long time and is
    still annoying us (PR37451, part of PR61837) when converting from 32 bits
    to 64 bits, as the ctr register is used as 64 bits on powerpc64.  Andrew
    Pinski had a patch, but it caused some issues and was reverted by
    Joseph S. Myers (PR37451, PR37782).

    Andrew:
    http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
    http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
    Joseph:
    https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html

    We can still simplify "subtract/zero_ext/add" to "zero_ext" when the
    number of loop iterations is known to be less than MODE_MAX (that is,
    only simplify when counter+0x1 does NOT overflow).

    Bootstrapped and regression tested on Power8-LE.

    gcc/ChangeLog

            2020-05-15  Xiong Hu Luo  <luoxhu@linux.ibm.com>

            PR rtl-optimization/37451, part of PR target/61837
            * loop-doloop.c (doloop_simplify_count): New function.  Simplify
            (add -1; zero_ext; add +1) to zero_ext when not wrapping.
            (doloop_modify): Call doloop_simplify_count.

    gcc/testsuite/ChangeLog

            2020-05-15  Xiong Hu Luo  <luoxhu@linux.ibm.com>

            PR rtl-optimization/37451, part of PR target/61837
            * gcc.target/powerpc/doloop-2.c: New test.
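
The arithmetic behind the fold can be checked with a small C sketch. This
only models the three operations on 32-bit and 64-bit unsigned values; it is
not the code of doloop_simplify_count, and the function names are made up
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The unfolded sequence: add -1, zero_ext, add +1.  */
uint64_t count_unfolded(uint32_t n)
{
    uint32_t m = n - 1u;            /* add -1 in the 32-bit mode (may wrap) */
    uint64_t z = (uint64_t)m;       /* zero_ext to the 64-bit mode */
    return z + 1u;                  /* add +1 in the 64-bit mode */
}

/* The folded form: a plain zero_ext.  */
uint64_t count_folded(uint32_t n)
{
    return (uint64_t)n;
}

/* Usage: the two agree unless the 32-bit subtraction wraps.  */
void check_fold(void)
{
    assert(count_unfolded(1u) == count_folded(1u));
    assert(count_unfolded(UINT32_MAX) == count_folded(UINT32_MAX));
    /* n == 0 is the wrapping case: 0 - 1 wraps to 0xffffffff, so the
       unfolded sequence yields 2^32 while zero_ext yields 0.  */
    assert(count_unfolded(0u) == (UINT64_C(1) << 32));
    assert(count_folded(0u) == 0u);
}
```

This is why the fold has to be restricted to the non-wrapping case: for
n == 0 the two forms give different counts.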

* [Bug target/61837] missed loop invariant expression optimization
From: cvs-commit at gcc dot gnu.org @ 2021-07-29  0:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61837

--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jiu Fu Guo <guojiufu@gcc.gnu.org>:

https://gcc.gnu.org/g:aafa38b5bfed5e3eff258aa5354ed928f4986709

commit r12-2585-gaafa38b5bfed5e3eff258aa5354ed928f4986709
Author: Jiufu Guo <guojiufu@linux.ibm.com>
Date:   Thu Jul 15 17:21:00 2021 +0800

    Use preferred mode for doloop IV [PR61837]

    Currently, the doloop.xx variable uses the same type as niter, which may
    be shorter than the word size.  For some targets, it would be better to
    use a word-size type.  For example, on a 64-bit system, a subreg may be
    used to access a 32-bit value.  Using a 64-bit type may then be better
    for niter if it can be represented in both 32 bits and 64 bits.

    This patch adds a target hook to query the preferred mode for the doloop
    IV, and updates the mode accordingly.

    gcc/ChangeLog:

    2021-07-29  Jiufu Guo  <guojiufu@linux.ibm.com>

            PR target/61837
            * config/rs6000/rs6000.c (TARGET_PREFERRED_DOLOOP_MODE): New hook.
            (rs6000_preferred_doloop_mode): New hook.
            * doc/tm.texi: Regenerate.
            * doc/tm.texi.in: Add hook preferred_doloop_mode.
            * target.def (preferred_doloop_mode): New hook.
            * targhooks.c (default_preferred_doloop_mode): New hook.
            * targhooks.h (default_preferred_doloop_mode): New hook.
            * tree-ssa-loop-ivopts.c (compute_doloop_base_on_mode): New
            function.
            (add_iv_candidate_for_doloop): Call targetm.preferred_doloop_mode
            and compute_doloop_base_on_mode.

    gcc/testsuite/ChangeLog:

    2021-07-29  Jiufu Guo  <guojiufu@linux.ibm.com>

            PR target/61837
            * gcc.target/powerpc/pr61837.c: New test.
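
As a rough C-level picture of the idea (not of the hook itself), the two
functions below count down the same niter, once in its own 32-bit type and
once in a 64-bit type widened before the loop. The names are made up for
illustration, and a 64-bit word size is assumed. Every 32-bit niter is
representable in 64 bits, so the trip count is unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Counter kept in the 32-bit type of niter.  */
unsigned trips_narrow_iv(uint32_t niter)
{
    assert(niter != 0);             /* a doloop runs at least once */
    unsigned trips = 0;
    uint32_t iv = niter;
    do {
        trips++;
    } while (--iv != 0);
    return trips;
}

/* Counter widened once, before the loop, to a word-size type.  */
unsigned trips_wide_iv(uint32_t niter)
{
    assert(niter != 0);
    unsigned trips = 0;
    uint64_t iv = niter;            /* every uint32_t value fits */
    do {
        trips++;
    } while (--iv != 0);
    return trips;
}
```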

* [Bug target/61837] missed loop invariant expression optimization
From: guojiufu at gcc dot gnu.org @ 2021-08-12  2:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61837

Jiu Fu Guo <guojiufu at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
With trunk and the options -O2 -mcpu=power8 -S -fno-unroll-loops, the code
looks like this:

.L2:
        ble %cr7,.L7
        mtctr %r5
        addi %r10,%r4,-1
        mr %r9,%r3
        .p2align 5
.L4:
        lbzu %r8,1(%r10)
        cmpw %cr0,%r8,%r7
        bne %cr0,.L3
        stw %r6,0(%r9)
.L3:
        addi %r9,%r9,4
        bdnz .L4
.L7:
        addi %r6,%r6,88
        addi %r7,%r7,1
        cmpwi %cr0,%r6,8888
        bne %cr0,.L2
        blr

So I am marking this PR as resolved.
