[Bug c/102062] New: powerpc suboptimal unrolling simple array sum

public inbox for gcc-bugs@sourceware.org

* [Bug c/102062] New: powerpc suboptimal unrolling simple array sum
@ 2021-08-25 11:27 npiggin at gmail dot com
  2021-08-25 11:52 ` [Bug c/102062] " wschmidt at gcc dot gnu.org
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: npiggin at gmail dot com @ 2021-08-25 11:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

            Bug ID: 102062
           Summary: powerpc suboptimal unrolling simple array sum
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: npiggin at gmail dot com
  Target Milestone: ---
            Target: powerpc64le-linux-gnu

--- test.c ---
int test(int *arr, int sz)
{
        int ret = 0;
        int i;

        if (sz < 1)
                __builtin_unreachable();

        for (i = 0; i < sz*2; i++)
                ret += arr[i];

        return ret;
}
---

gcc-11 compiles this to:
test:
        rldic 4,4,1,32
        addi 10,3,-4
        rldicl 9,4,63,33
        li 3,0
        mtctr 9
.L2:
        addi 8,10,4
        lwz 9,4(10)
        addi 10,10,8
        lwz 8,4(8)
        add 9,9,3
        add 9,9,8
        extsw 3,9
        bdnz .L2
        blr

I may be unaware of a constraint of the C standard here, but maintaining the two
base addresses seems pointless, so is beginning the first at offset -4.

The bigger problem is keeping a single sum. Keeping two sums and adding them at
the end reduces the critical-path latency of the loop from 6 cycles to 2, which
brings throughput on large loops from 6 cycles per iteration down to about 2.2
on POWER9 without harming short loops:

test:
        rldic 4,4,1,32
        rldicl 9,4,63,33
        mtctr 9
        li 8,0
        li 9,0
.L2:
        lwz 6,0(3)
        lwz 7,4(3)
        addi 3,3,8
        add  8,8,6
        add  9,9,7
        bdnz .L2
        add 9,9,8
        extsw 3,9
        blr

Any reason this can't be done?
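
For reference, the two-sum loop above corresponds roughly to the C below. This
is only a source-level sketch of the idea, not output from any compiler. The
name test_two_sums is hypothetical. The accumulators are unsigned, an
assumption made here so that reassociating the additions cannot introduce
signed overflow; the final conversion back to int is implementation-defined
and wraps on GCC. Each accumulator carries one add per iteration, whereas the
single-sum gcc-11 loop chains add, add, extsw.

```c
#include <assert.h>

/* Hypothetical helper: sums arr[0 .. 2*sz-1] using two independent
   accumulators, mirroring the hand-written loop above.  */
int test_two_sums(const int *arr, int sz)
{
        unsigned int even = 0;  /* running sum of arr[0], arr[2], ... */
        unsigned int odd = 0;   /* running sum of arr[1], arr[3], ... */
        int i;

        assert(sz >= 1);        /* same precondition as the original test() */

        for (i = 0; i < sz; i++) {
                even += (unsigned int)arr[2 * i];
                odd += (unsigned int)arr[2 * i + 1];
        }

        /* Combine the partial sums once, after the loop.  */
        return (int)(even + odd);
}
```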


* [Bug c/102062] powerpc suboptimal unrolling simple array sum
  2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
@ 2021-08-25 11:52 ` wschmidt at gcc dot gnu.org
  2021-08-25 11:55 ` wschmidt at gcc dot gnu.org
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: wschmidt at gcc dot gnu.org @ 2021-08-25 11:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

Bill Schmidt <wschmidt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wschmidt at gcc dot gnu.org

--- Comment #1 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
Regarding the latter question, I'm surprised it's not being done.  This
behavior is controlled by -fvariable-expansion-in-unroller, which was enabled
by default for PowerPC targets a couple of releases back.  You reported this
against GCC 11.2, but I'm skeptical.  What options are you using?

Compiling with -O2 and current trunk, I see variable expansion kicking in, and
I also see the same base register in use in all references in the loop:

test:
.LFB0:
        .cfi_startproc
        .localentry     test,1
        slwi 4,4,1
        li 10,0
        li 7,0
        addi 9,3,-4
        extsw 4,4
        andi. 6,4,0x3
        addi 5,4,-1
        mr 8,4
        beq 0,.L9
        cmpdi 0,6,1
        beq 0,.L13
        cmpdi 0,6,2
        bne 0,.L22
.L14:
        lwzu 6,4(9)
        addi 4,4,-1
        add 10,10,6
.L13:
        lwzu 6,4(9)
        cmpdi 0,4,1
        add 10,10,6
        beq 0,.L19
.L9:
        srdi 8,8,2
        mtctr 8
.L2:
        lwz 4,4(9)
        lwz 5,12(9)
        lwz 6,8(9)
        lwzu 8,16(9)
        add 10,4,10
        add 10,10,5
        add 7,6,7
        add 7,7,8
        bdnz .L2
.L19:
        add 3,10,7
        extsw 3,3
        blr
        .p2align 4,,15
.L22:
        lwz 10,0(3)
        mr 9,3
        mr 4,5
        b .L14


* [Bug c/102062] powerpc suboptimal unrolling simple array sum
  2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
  2021-08-25 11:52 ` [Bug c/102062] " wschmidt at gcc dot gnu.org
@ 2021-08-25 11:55 ` wschmidt at gcc dot gnu.org
  2021-08-25 12:43 ` npiggin at gmail dot com
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: wschmidt at gcc dot gnu.org @ 2021-08-25 11:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

--- Comment #2 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
As expected, I get similar code when compiling either for P9 or P10.


* [Bug c/102062] powerpc suboptimal unrolling simple array sum
  2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
  2021-08-25 11:52 ` [Bug c/102062] " wschmidt at gcc dot gnu.org
  2021-08-25 11:55 ` wschmidt at gcc dot gnu.org
@ 2021-08-25 12:43 ` npiggin at gmail dot com
  2021-08-25 12:50 ` wschmidt at gcc dot gnu.org
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: npiggin at gmail dot com @ 2021-08-25 12:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

--- Comment #3 from Nicholas Piggin <npiggin at gmail dot com> ---
(In reply to Bill Schmidt from comment #2)
> As expected, I get similar code when compiling either for P9 or P10.

Oh, I should have specified: -O2 is the only option. Just to make sure, I also
tried adding -fvariable-expansion-in-unroller, and it has no effect.

It's gcc from Debian (gcc version 11.2.0 (Debian 11.2.0-3)). Maybe they've done
something to change this.


* [Bug c/102062] powerpc suboptimal unrolling simple array sum
  2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
                   ` (2 preceding siblings ...)
  2021-08-25 12:43 ` npiggin at gmail dot com
@ 2021-08-25 12:50 ` wschmidt at gcc dot gnu.org
  2021-08-25 13:01 ` npiggin at gmail dot com
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: wschmidt at gcc dot gnu.org @ 2021-08-25 12:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

Bill Schmidt <wschmidt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-08-25

--- Comment #4 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
Interesting.  I dug up an older GCC 11 compiler off of one of my systems and it
behaves as you report.  No idea why this should differ between 11 and 12, as
the use of variable expansion by default goes back at least to 11 (I'm
thinking as early as 9).

Confirmed.


* [Bug c/102062] powerpc suboptimal unrolling simple array sum
  2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
                   ` (3 preceding siblings ...)
  2021-08-25 12:50 ` wschmidt at gcc dot gnu.org
@ 2021-08-25 13:01 ` npiggin at gmail dot com
  2021-08-25 14:05 ` segher at gcc dot gnu.org
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: npiggin at gmail dot com @ 2021-08-25 13:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

--- Comment #5 from Nicholas Piggin <npiggin at gmail dot com> ---
(In reply to Bill Schmidt from comment #1)
> Regarding the latter question, I'm surprised it's not being done.  This
> behavior is controlled by -fvariable-expansion-in-unroller, which was
> enabled by default for PowerPC targets a couple of releases back.  You
> reported this against GCC 11.2, but I'm skeptical.  What options are you
> using?
> 
> Compiling with -O2 and current trunk, I see variable expansion kicking in,
> and I also see the same base register in use in all references in the loop:
> 
> test:
> .LFB0:
>         .cfi_startproc
>         .localentry     test,1
>         slwi 4,4,1
>         li 10,0
>         li 7,0
>         addi 9,3,-4
>         extsw 4,4
>         andi. 6,4,0x3
>         addi 5,4,-1
>         mr 8,4
>         beq 0,.L9
>         cmpdi 0,6,1
>         beq 0,.L13
>         cmpdi 0,6,2
>         bne 0,.L22
> .L14:
>         lwzu 6,4(9)
>         addi 4,4,-1
>         add 10,10,6
> .L13:
>         lwzu 6,4(9)
>         cmpdi 0,4,1
>         add 10,10,6
>         beq 0,.L19
> .L9:
>         srdi 8,8,2
>         mtctr 8
> .L2:
>         lwz 4,4(9)
>         lwz 5,12(9)
>         lwz 6,8(9)
>         lwzu 8,16(9)
>         add 10,4,10
>         add 10,10,5
>         add 7,6,7
>         add 7,7,8
>         bdnz .L2
> .L19:
>         add 3,10,7
>         extsw 3,3
>         blr
>         .p2align 4,,15
> .L22:
>         lwz 10,0(3)
>         mr 9,3
>         mr 4,5
>         b .L14

That asm does well on the test, better than my version (a little bit on P9, a
lot on P10). It does have 2x more unrolling which probably helps a bit.


* [Bug c/102062] powerpc suboptimal unrolling simple array sum
  2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
                   ` (4 preceding siblings ...)
  2021-08-25 13:01 ` npiggin at gmail dot com
@ 2021-08-25 14:05 ` segher at gcc dot gnu.org
  2021-08-25 14:10 ` segher at gcc dot gnu.org
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: segher at gcc dot gnu.org @ 2021-08-25 14:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

--- Comment #6 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Nicholas Piggin from comment #0)
> I may be unaware of a constraint of the C standard here, but maintaining the two
> base addresses seems pointless,

This is an ordering problem.  The unroller works late, in RTL, so it cannot
do a good job of induction variable opts.

> so is beginning the first at offset -4.

This is actually useful: it makes it possible to use update-form insns much
more often.  It isn't optimised away later when it turns out not to help --
see ordering again.

> The bigger problem is keeping a single sum.

That is the variable expansion thing.


* [Bug c/102062] powerpc suboptimal unrolling simple array sum
  2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
                   ` (5 preceding siblings ...)
  2021-08-25 14:05 ` segher at gcc dot gnu.org
@ 2021-08-25 14:10 ` segher at gcc dot gnu.org
  2021-08-25 15:31 ` linkw at gcc dot gnu.org
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: segher at gcc dot gnu.org @ 2021-08-25 14:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

--- Comment #7 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Btw, -ftree-loop-vectorize -fvect-cost-model=cheap makes this 8 vectors per
iteration (and very-cheap doesn't vectorise it).  Maybe overkill, esp. when
you look at the tail code, but that 8 vector core sure looks tight :-)


* [Bug c/102062] powerpc suboptimal unrolling simple array sum
  2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
                   ` (6 preceding siblings ...)
  2021-08-25 14:10 ` segher at gcc dot gnu.org
@ 2021-08-25 15:31 ` linkw at gcc dot gnu.org
  2021-08-25 17:07 ` segher at gcc dot gnu.org
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: linkw at gcc dot gnu.org @ 2021-08-25 15:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linkw at gcc dot gnu.org

--- Comment #8 from Kewen Lin <linkw at gcc dot gnu.org> ---
Haochen's patch r12-1202 helps to make the variable expansion work for this
case.

With GCC 11, we get RTL like:

   11: NOTE_INSN_BASIC_BLOCK 3
   12: r118:DI=r118:DI+0x4
   13: r128:SI=[r118:DI]
   14: r127:SI=r128:SI+r123:DI#0        // A
      REG_DEAD r128:SI
      REG_DEAD r123:DI
   15: r123:DI=sign_extend(r127:SI)     // B
      REG_DEAD r127:SI
   16: r129:SI=r119:DI#0-0x1
      REG_DEAD r119:DI
   17: r119:DI=zero_extend(r129:SI)
      REG_DEAD r129:SI
   19: r130:CC=cmp(r119:DI,0)
   20: pc={(r130:CC!=0)?L35:pc}

With trunk, we get RTL like:

   10: NOTE_INSN_BASIC_BLOCK 3
   11: r118:DI=r118:DI+0x4
   12: r127:SI=[r118:DI]
   13: r120:SI=r120:SI+r127:SI           // C
      REG_DEAD r127:SI
   14: r119:SI=r119:SI-0x1
   16: r128:CC=cmp(r119:SI,0)
   17: pc={(r128:CC!=0)?L33:pc}

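With GCC 11 the accumulation is split across two insns, A and B, whereas trunk
needs only C. In A the destination (r127) differs from both source operands, so
it cannot match. Insn C does match the check in the function
analyze_insn_to_expand_var, which can then go on to record a var_to_expand.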
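
In C terms, the two shapes can be pictured roughly as below. This is an
illustrative analogy only, with hypothetical function names; it is not
something GCC emits. The arithmetic goes through uint32_t to sidestep signed
overflow, and the conversions back to int32_t are implementation-defined (they
wrap on GCC). Both steps compute the same value; what differs is that the
GCC 11 shape writes a temporary first and only then writes the accumulator.

```c
#include <stdint.h>

/* Shape of GCC 11 insns A and B: the accumulator lives in a 64-bit
   register, the add writes a 32-bit temporary, and a separate sign
   extension writes the accumulator back.  */
int64_t step_gcc11(int64_t acc, int32_t x)
{
        int32_t tmp = (int32_t)((uint32_t)acc + (uint32_t)x);  /* A */
        return (int64_t)tmp;                                    /* B */
}

/* Shape of trunk insn C: the accumulator is both a source and the
   destination of a single 32-bit add.  */
int32_t step_trunk(int32_t acc, int32_t x)
{
        return (int32_t)((uint32_t)acc + (uint32_t)x);          /* C */
}
```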

The related code in analyze_insn_to_expand_var is:

  /* Find the accumulator use within the operation.  */
  if (code == FMA)
    {
      /* We only support accumulation via FMA in the ADD position.  */
      if (!rtx_equal_p (dest, XEXP (src, 2)))
        return NULL;
      accum_pos = 2;
    }
  else if (rtx_equal_p (dest, XEXP (src, 0)))
    accum_pos = 0;
  else if (rtx_equal_p (dest, XEXP (src, 1)))
    {
      /* The method of expansion that we are using; which includes the
         initialization of the expansions with zero and the summation of
         the expansions at the end of the computation will yield wrong
         results for (x = something - x) thus avoid using it in that case.  */
      if (code == MINUS)
        return NULL;
      accum_pos = 1;
    }
  else
    return NULL;

The key is whether dest matches XEXP (src, 0) or XEXP (src, 1).
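
To make the matching rule concrete, here is a toy model of the non-FMA part of
that check. It is a sketch under loud assumptions: struct toy_insn,
toy_accum_pos and the register numbers as plain ints are hypothetical
stand-ins, not GCC's real rtx data structures, and modes and subregs are
ignored entirely. It only shows that insn A (r127 = r128 + r123) has no
operand equal to its destination, while insn C (r120 = r120 + r127) does.

```c
/* Toy stand-ins for RTL; these are not GCC's real data structures.  */
enum toy_code { TOY_PLUS, TOY_MINUS };

struct toy_insn {
        enum toy_code code;     /* operation computing the source */
        int dest;               /* destination pseudo register number */
        int src[2];             /* the two source operand registers */
};

/* Returns the operand position (0 or 1) that holds the accumulator, or
   -1 if the destination is not one of the source operands.  */
int toy_accum_pos(const struct toy_insn *insn)
{
        if (insn->dest == insn->src[0])
                return 0;
        if (insn->dest == insn->src[1])
                /* x = something - x is rejected, as in the quoted code.  */
                return insn->code == TOY_MINUS ? -1 : 1;
        return -1;
}

/* GCC 11 insn A: r127 = r128 + r123 (destination is a fresh register).  */
const struct toy_insn insn_a = { TOY_PLUS, 127, { 128, 123 } };

/* Trunk insn C: r120 = r120 + r127 (destination is operand 0).  */
const struct toy_insn insn_c = { TOY_PLUS, 120, { 120, 127 } };
```

In this model toy_accum_pos(&insn_a) yields -1 and toy_accum_pos(&insn_c)
yields 0.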


* [Bug c/102062] powerpc suboptimal unrolling simple array sum
  2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
                   ` (7 preceding siblings ...)
  2021-08-25 15:31 ` linkw at gcc dot gnu.org
@ 2021-08-25 17:07 ` segher at gcc dot gnu.org
  2021-08-25 18:01 ` wschmidt at gcc dot gnu.org
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: segher at gcc dot gnu.org @ 2021-08-25 17:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

--- Comment #9 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Thanks for the detective work!

So the variable expansion code could be improved to handle sign extensions
better (or maybe zero extensions as well?)  In either case that won't help
rs6000 much anymore, because the usual case (32-bit -> 64-bit) doesn't affect
us anymore :-)

Is there anything left to do here, or can we close the PR?


* [Bug c/102062] powerpc suboptimal unrolling simple array sum
  2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
                   ` (8 preceding siblings ...)
  2021-08-25 17:07 ` segher at gcc dot gnu.org
@ 2021-08-25 18:01 ` wschmidt at gcc dot gnu.org
  2021-08-25 18:03 ` dje at gcc dot gnu.org
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: wschmidt at gcc dot gnu.org @ 2021-08-25 18:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

--- Comment #10 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
Well, the problem is that we still generate suboptimal code on GCC 11.  I don't
know whether we want to address that or not.

I suppose we aren't going to backport Haochen's lovely patch for sign
extensions, and it's probably not worth messing around with variable expansion
just for GCC 11...

Nick, can you use the -ftree-vectorize -fvect-cost-model=cheap to deal with
this (or even use -O3, which is recommended and probably does the same thing)? 
Or are you holding out for some fix in GCC 11.x?


* [Bug c/102062] powerpc suboptimal unrolling simple array sum
  2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
                   ` (9 preceding siblings ...)
  2021-08-25 18:01 ` wschmidt at gcc dot gnu.org
@ 2021-08-25 18:03 ` dje at gcc dot gnu.org
  2021-08-25 22:43 ` segher at gcc dot gnu.org
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: dje at gcc dot gnu.org @ 2021-08-25 18:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

--- Comment #11 from David Edelsohn <dje at gcc dot gnu.org> ---
We could backport Haochen's patch to AT.


* [Bug c/102062] powerpc suboptimal unrolling simple array sum
  2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
                   ` (10 preceding siblings ...)
  2021-08-25 18:03 ` dje at gcc dot gnu.org
@ 2021-08-25 22:43 ` segher at gcc dot gnu.org
  2021-08-25 23:29 ` [Bug rtl-optimization/102062] " dje at gcc dot gnu.org
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: segher at gcc dot gnu.org @ 2021-08-25 22:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

--- Comment #12 from Segher Boessenkool <segher at gcc dot gnu.org> ---
We can backport Hao Chen's patch; it has proven to cause no problems at all.
We don't normally backport patches that aren't bugfixes, but we could do it
for important enough things (we did it for most p11 enablement patches, for
example).

David, do you agree?


* [Bug rtl-optimization/102062] powerpc suboptimal unrolling simple array sum
  2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
                   ` (11 preceding siblings ...)
  2021-08-25 22:43 ` segher at gcc dot gnu.org
@ 2021-08-25 23:29 ` dje at gcc dot gnu.org
  2021-08-26  0:17 ` npiggin at gmail dot com
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: dje at gcc dot gnu.org @ 2021-08-25 23:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

--- Comment #13 from David Edelsohn <dje at gcc dot gnu.org> ---
I don't object to backporting Hao Chen's patch.  It has baked sufficiently on
trunk that it seems relatively stable.


* [Bug rtl-optimization/102062] powerpc suboptimal unrolling simple array sum
  2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
                   ` (12 preceding siblings ...)
  2021-08-25 23:29 ` [Bug rtl-optimization/102062] " dje at gcc dot gnu.org
@ 2021-08-26  0:17 ` npiggin at gmail dot com
  2021-08-30 17:34 ` segher at gcc dot gnu.org
  2021-09-22 13:53 ` npiggin at gmail dot com
  15 siblings, 0 replies; 17+ messages in thread
From: npiggin at gmail dot com @ 2021-08-26  0:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

--- Comment #14 from Nicholas Piggin <npiggin at gmail dot com> ---
(In reply to Bill Schmidt from comment #10)
> Well, the problem is that we still generate suboptimal code on GCC 11.  I
> don't know whether we want to address that or not.
> 
> I suppose we aren't going to backport Haochen's lovely patch for sign
> extensions, and it's probably not worth messing around with variable
> expansion just for GCC 11...
> 
> Nick, can you use the -ftree-vectorize -fvect-cost-model=cheap to deal with
> this (or even use -O3, which is recommended and probably does the same
> thing)?  Or are you holding out for some fix in GCC 11.x?

For my case, no, but it's only a very small issue in the scheme of things; I
only found it by looking at code generation. So you can close immediately if
it's fixed upstream, or do whatever else you think best. Thanks all.


* [Bug rtl-optimization/102062] powerpc suboptimal unrolling simple array sum
  2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
                   ` (13 preceding siblings ...)
  2021-08-26  0:17 ` npiggin at gmail dot com
@ 2021-08-30 17:34 ` segher at gcc dot gnu.org
  2021-09-22 13:53 ` npiggin at gmail dot com
  15 siblings, 0 replies; 17+ messages in thread
From: segher at gcc dot gnu.org @ 2021-08-30 17:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

--- Comment #15 from Segher Boessenkool <segher at gcc dot gnu.org> ---
This should be fixed now, please confirm.  Thanks!


* [Bug rtl-optimization/102062] powerpc suboptimal unrolling simple array sum
  2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
                   ` (14 preceding siblings ...)
  2021-08-30 17:34 ` segher at gcc dot gnu.org
@ 2021-09-22 13:53 ` npiggin at gmail dot com
  15 siblings, 0 replies; 17+ messages in thread
From: npiggin at gmail dot com @ 2021-09-22 13:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062

Nicholas Piggin <npiggin at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #16 from Nicholas Piggin <npiggin at gmail dot com> ---
gcc-11 -O2 -mcpu=power10 now generates code identical to what Bill listed for
trunk:

.L2:
        lwz 4,4(9)
        lwz 5,12(9)
        lwz 6,8(9)
        lwzu 8,16(9)
        add 10,4,10
        add 10,10,5
        add 7,6,7
        add 7,7,8
        bdnz .L2

Thanks all.


end of thread, other threads:[~2021-09-22 13:53 UTC | newest]

Thread overview: 17+ messages
2021-08-25 11:27 [Bug c/102062] New: powerpc suboptimal unrolling simple array sum npiggin at gmail dot com
2021-08-25 11:52 ` [Bug c/102062] " wschmidt at gcc dot gnu.org
2021-08-25 11:55 ` wschmidt at gcc dot gnu.org
2021-08-25 12:43 ` npiggin at gmail dot com
2021-08-25 12:50 ` wschmidt at gcc dot gnu.org
2021-08-25 13:01 ` npiggin at gmail dot com
2021-08-25 14:05 ` segher at gcc dot gnu.org
2021-08-25 14:10 ` segher at gcc dot gnu.org
2021-08-25 15:31 ` linkw at gcc dot gnu.org
2021-08-25 17:07 ` segher at gcc dot gnu.org
2021-08-25 18:01 ` wschmidt at gcc dot gnu.org
2021-08-25 18:03 ` dje at gcc dot gnu.org
2021-08-25 22:43 ` segher at gcc dot gnu.org
2021-08-25 23:29 ` [Bug rtl-optimization/102062] " dje at gcc dot gnu.org
2021-08-26  0:17 ` npiggin at gmail dot com
2021-08-30 17:34 ` segher at gcc dot gnu.org
2021-09-22 13:53 ` npiggin at gmail dot com
