public inbox for gcc-bugs@sourceware.org
* [Bug c/102062] New: powerpc suboptimal unrolling simple array sum
From: npiggin at gmail dot com @ 2021-08-25 11:27 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062
Bug ID: 102062
Summary: powerpc suboptimal unrolling simple array sum
Product: gcc
Version: 11.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: npiggin at gmail dot com
Target Milestone: ---
Target: powerpc64le-linux-gnu
--- test.c ---
int test(int *arr, int sz)
{
    int ret = 0;
    int i;
    if (sz < 1)
        __builtin_unreachable();
    for (i = 0; i < sz*2; i++)
        ret += arr[i];
    return ret;
}
---
gcc-11 -O2 compiles this to:
test:
	rldic 4,4,1,32
	addi 10,3,-4
	rldicl 9,4,63,33
	li 3,0
	mtctr 9
.L2:
	addi 8,10,4
	lwz 9,4(10)
	addi 10,10,8
	lwz 8,4(8)
	add 9,9,3
	add 9,9,8
	extsw 3,9
	bdnz .L2
	blr
I may be unaware of a constraint of the C standard here, but maintaining the two
base addresses seems pointless, as does beginning the first at offset -4.
The bigger problem is keeping a single sum. Keeping two sums and adding them at
the end reduces the critical-path latency of the loop from 6 to 2, which brings
throughput on large loops down from 6 cycles per iteration to about 2.2 on
POWER9 without harming short loops:
test:
	rldic 4,4,1,32
	rldicl 9,4,63,33
	mtctr 9
	li 8,0
	li 9,0
.L2:
	lwz 6,0(3)
	lwz 7,4(3)
	addi 3,3,8
	add 8,8,6
	add 9,9,7
	bdnz .L2
	add 9,9,8
	extsw 3,9
	blr
Any reason this can't be done?
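For readers following along in C, the two-sum loop proposed above corresponds roughly to the source-level rewrite below. It is a hand-written sketch, not compiler output; test_two_sums is a hypothetical name, and like test.c it assumes sz >= 1. It also assumes no partial sum overflows int, since signed overflow is undefined in C even though the hardware adds simply wrap.

```c
/* Hypothetical hand-written equivalent of the two-accumulator loop above.
   Like test.c, it assumes sz >= 1 and that no partial sum overflows int. */
int test_two_sums(const int *arr, int sz)
{
    int sum0 = 0;   /* even elements: r8 in the asm above */
    int sum1 = 0;   /* odd elements:  r9 in the asm above */
    int i;

    /* Two independent add chains, so consecutive adds need not wait
       on each other. */
    for (i = 0; i < sz * 2; i += 2) {
        sum0 += arr[i];
        sum1 += arr[i + 1];
    }

    /* Combine once after the loop (add 9,9,8). */
    return sum0 + sum1;
}
```

Because two's-complement addition is associative modulo 2^32, splitting an integer sum this way cannot change the final 32-bit result; the same rewrite applied to a floating-point sum would change rounding.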
* [Bug c/102062] powerpc suboptimal unrolling simple array sum
From: wschmidt at gcc dot gnu.org @ 2021-08-25 11:52 UTC
To: gcc-bugs
Bill Schmidt <wschmidt at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wschmidt at gcc dot gnu.org
--- Comment #1 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
Regarding the latter question, I'm surprised it's not being done. This
behavior is controlled by -fvariable-expansion-in-unroller, which was enabled
by default for PowerPC targets a couple of releases back. You reported this
against GCC 11.2, but I'm skeptical. What options are you using?
Compiling with -O2 and current trunk, I see variable expansion kicking in, and
I also see the same base register in use in all references in the loop:
test:
.LFB0:
	.cfi_startproc
	.localentry test,1
	slwi 4,4,1
	li 10,0
	li 7,0
	addi 9,3,-4
	extsw 4,4
	andi. 6,4,0x3
	addi 5,4,-1
	mr 8,4
	beq 0,.L9
	cmpdi 0,6,1
	beq 0,.L13
	cmpdi 0,6,2
	bne 0,.L22
.L14:
	lwzu 6,4(9)
	addi 4,4,-1
	add 10,10,6
.L13:
	lwzu 6,4(9)
	cmpdi 0,4,1
	add 10,10,6
	beq 0,.L19
.L9:
	srdi 8,8,2
	mtctr 8
.L2:
	lwz 4,4(9)
	lwz 5,12(9)
	lwz 6,8(9)
	lwzu 8,16(9)
	add 10,4,10
	add 10,10,5
	add 7,6,7
	add 7,7,8
	bdnz .L2
.L19:
	add 3,10,7
	extsw 3,3
	blr
	.p2align 4,,15
.L22:
	lwz 10,0(3)
	mr 9,3
	mr 4,5
	b .L14
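The trunk code above can be read back into C roughly as follows. This is a hypothetical, hand-written rendering (test_unrolled4 is an illustrative name) that mirrors the structure of the asm: the n % 4 leftover elements are peeled into one accumulator first, then the CTR loop consumes four elements per iteration into two accumulators (r10 and r7). Unsigned arithmetic is used so wrap-around is well defined; the final conversion back to int is implementation-defined for out-of-range values, and GCC reduces it modulo 2^32.

```c
/* Hypothetical C rendering of the trunk -O2 code above; assumes sz >= 1,
   as test.c does. */
int test_unrolled4(const int *arr, int sz)
{
    unsigned n = 2u * (unsigned)sz;   /* element count (slwi 4,4,1) */
    unsigned acc0 = 0, acc1 = 0;      /* r10 and r7 in the asm */
    unsigned i = 0;

    /* Prologue: peel n % 4 elements into acc0 (the .L22/.L14/.L13 paths).
       Since n is even here, this is either 0 or 2 elements. */
    for (; i < n % 4; i++)
        acc0 += (unsigned)arr[i];

    /* Main loop: n / 4 iterations (srdi + mtctr), two add chains. */
    for (; i < n; i += 4) {
        acc0 += (unsigned)arr[i];       /* lwz 4,4(9)   */
        acc0 += (unsigned)arr[i + 2];   /* lwz 5,12(9)  */
        acc1 += (unsigned)arr[i + 1];   /* lwz 6,8(9)   */
        acc1 += (unsigned)arr[i + 3];   /* lwzu 8,16(9) */
    }

    /* Epilogue: add 3,10,7 followed by extsw 3,3. */
    return (int)(acc0 + acc1);
}
```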
* [Bug c/102062] powerpc suboptimal unrolling simple array sum
From: wschmidt at gcc dot gnu.org @ 2021-08-25 11:55 UTC
To: gcc-bugs
--- Comment #2 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
As expected, I get similar code when compiling either for P9 or P10.
* [Bug c/102062] powerpc suboptimal unrolling simple array sum
From: npiggin at gmail dot com @ 2021-08-25 12:43 UTC
To: gcc-bugs
--- Comment #3 from Nicholas Piggin <npiggin at gmail dot com> ---
(In reply to Bill Schmidt from comment #2)
> As expected, I get similar code when compiling either for P9 or P10.
Oh, I should have specified: -O2 is the only option. Adding
-fvariable-expansion-in-unroller, just to make sure, has no effect.
It's gcc from Debian (gcc version 11.2.0 (Debian 11.2.0-3)). Maybe they've done
something to change this.
* [Bug c/102062] powerpc suboptimal unrolling simple array sum
From: wschmidt at gcc dot gnu.org @ 2021-08-25 12:50 UTC
To: gcc-bugs
Bill Schmidt <wschmidt at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-08-25
--- Comment #4 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
Interesting. I dug up an older GCC 11 compiler from one of my systems and it
behaves as you report. No idea why this should differ between 11 and 12, as
the use of variable expansion by default goes back at least to 11 (I'm
thinking as early as 9).
Confirmed.
* [Bug c/102062] powerpc suboptimal unrolling simple array sum
From: npiggin at gmail dot com @ 2021-08-25 13:01 UTC
To: gcc-bugs
--- Comment #5 from Nicholas Piggin <npiggin at gmail dot com> ---
(In reply to Bill Schmidt from comment #1)
> Compiling with -O2 and current trunk, I see variable expansion kicking in,
> and I also see the same base register in use in all references in the loop:
That asm does well on the test, better than my version (a little bit on P9, a
lot on P10). It does have 2x more unrolling which probably helps a bit.
* [Bug c/102062] powerpc suboptimal unrolling simple array sum
From: segher at gcc dot gnu.org @ 2021-08-25 14:05 UTC
To: gcc-bugs
--- Comment #6 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Nicholas Piggin from comment #0)
> I may be unaware of a constraint of the C standard here, but maintaining the two
> base addresses seems pointless,
This is an ordering problem. The unroller works (late) in RTL, so it cannot
do a good job of induction variable opts.
> as does beginning the first at offset -4.
This is actually useful: it makes it possible to use update-form insns much
more often. It isn't optimised away later when it turns out not to help --
see ordering again.
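To illustrate why a base one word before the array helps: an update-form load such as lwzu RT,D(RA) adds D to RA, loads from the new address and writes that address back into RA, so each load both fetches the next element and advances the base. The sketch below models that in C. lwzu_model and sum_with_update_loads are hypothetical names; the base is kept as a byte offset so the initial -4 never has to become an out-of-bounds pointer, and the zero-extension into a 64-bit register is not modelled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of lwzu RT,D(RA) ("load word and zero with update"):
   EA = RA + D; RT = 32-bit word at EA; RA = EA. */
static int32_t lwzu_model(const unsigned char *mem, long *ra, long d)
{
    int32_t rt;

    *ra += d;                           /* RA = EA = RA + D */
    memcpy(&rt, mem + *ra, sizeof rt);  /* RT = word at EA  */
    return rt;
}

/* Walk the array like the "lwzu 6,4(9)" loads in the trunk code of
   comment #1: the base starts one word before arr[0] (addi 9,3,-4) and
   every load advances it by one word. */
int32_t sum_with_update_loads(const int32_t *arr, int n)
{
    const unsigned char *mem = (const unsigned char *)arr;
    long ra = -4;
    int32_t sum = 0;

    for (int i = 0; i < n; i++)
        sum += lwzu_model(mem, &ra, 4);
    return sum;
}
```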
> The bigger problem is keeping a single sum.
That is the variable expansion thing.
* [Bug c/102062] powerpc suboptimal unrolling simple array sum
From: segher at gcc dot gnu.org @ 2021-08-25 14:10 UTC
To: gcc-bugs
--- Comment #7 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Btw, -ftree-loop-vectorize -fvect-cost-model=cheap makes this 8 vectors per
iteration (and very-cheap doesn't vectorise it). Maybe overkill, esp. when
you look at the tail code, but that 8 vector core sure looks tight :-)
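As a rough picture of what a vectorized reduction with several vector accumulators looks like, here is a hand-written sketch using the GCC/Clang vector_size extension. It is not the vectorizer's actual output: it uses two 4 x int accumulators rather than the eight mentioned above, sum_vec2 is a hypothetical name, and it assumes the total does not overflow int.

```c
#include <string.h>

/* Four 32-bit ints in one 16-byte vector (GCC/Clang vector extension). */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical sketch: vector sum with two independent accumulators. */
int sum_vec2(const int *arr, int n)
{
    v4si acc0 = {0, 0, 0, 0};
    v4si acc1 = {0, 0, 0, 0};
    int i = 0;

    /* Main loop: 8 ints (two vectors) per iteration, each vector feeding
       its own accumulator so the two add chains are independent. */
    for (; i + 8 <= n; i += 8) {
        v4si a, b;
        memcpy(&a, arr + i, sizeof a);      /* unaligned-safe loads */
        memcpy(&b, arr + i + 4, sizeof b);
        acc0 += a;
        acc1 += b;
    }

    /* Combine the accumulators, then reduce across the four lanes. */
    v4si acc = acc0 + acc1;
    int ret = acc[0] + acc[1] + acc[2] + acc[3];

    /* Scalar tail for the remaining n % 8 elements. */
    for (; i < n; i++)
        ret += arr[i];
    return ret;
}
```

The scalar tail at the end is the part that grows when more vectors are processed per iteration, which is the trade-off Segher points at.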
* [Bug c/102062] powerpc suboptimal unrolling simple array sum
From: linkw at gcc dot gnu.org @ 2021-08-25 15:31 UTC
To: gcc-bugs
Kewen Lin <linkw at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linkw at gcc dot gnu.org
--- Comment #8 from Kewen Lin <linkw at gcc dot gnu.org> ---
Haochen's patch r12-1202 helps to make the variable expansion work for this
case.
With GCC 11, we get RTL like:
   11: NOTE_INSN_BASIC_BLOCK 3
   12: r118:DI=r118:DI+0x4
   13: r128:SI=[r118:DI]
   14: r127:SI=r128:SI+r123:DI#0      // A
      REG_DEAD r128:SI
      REG_DEAD r123:DI
   15: r123:DI=sign_extend(r127:SI)   // B
      REG_DEAD r127:SI
   16: r129:SI=r119:DI#0-0x1
      REG_DEAD r119:DI
   17: r119:DI=zero_extend(r129:SI)
      REG_DEAD r129:SI
   19: r130:CC=cmp(r119:DI,0)
   20: pc={(r130:CC!=0)?L35:pc}
With trunk, we get RTL like:
   10: NOTE_INSN_BASIC_BLOCK 3
   11: r118:DI=r118:DI+0x4
   12: r127:SI=[r118:DI]
   13: r120:SI=r120:SI+r127:SI        // C
      REG_DEAD r127:SI
   14: r119:SI=r119:SI-0x1
   16: r128:CC=cmp(r119:SI,0)
   17: pc={(r128:CC!=0)?L33:pc}
With GCC 11 the accumulation takes two insns, A and B, while trunk needs only
C. Pattern C passes the check in function analyze_insn_to_expand_var, so a
var_to_expand entry can then be recorded for it.
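The difference between the two dumps can be mimicked in C. In the hypothetical sum_gcc11_shape below, each step adds in 32 bits into a fresh temporary (insn A) and sign-extends it back into the 64-bit accumulator (insn B), so no single statement has the form x = x + y; sum_trunk_shape has exactly that form (insn C). Both compute the same value; the sketch assumes no signed overflow occurs.

```c
#include <stdint.h>

/* GCC 11 shape (insns 14 and 15): a 64-bit accumulator, a 32-bit add into
   a temporary using its low part (r123:DI#0), then a sign extension. */
int64_t sum_gcc11_shape(const int32_t *arr, int n)
{
    int64_t acc64 = 0;                            /* r123:DI */

    for (int i = 0; i < n; i++) {
        int32_t tmp = arr[i] + (int32_t)acc64;    /* A: r127 = r128 + r123#0 */
        acc64 = (int64_t)tmp;                     /* B: r123 = sign_extend (r127) */
    }
    return acc64;
}

/* Trunk shape (insn 13): a single 32-bit "acc = acc + x" update. */
int32_t sum_trunk_shape(const int32_t *arr, int n)
{
    int32_t acc = 0;                              /* r120:SI */

    for (int i = 0; i < n; i++)
        acc = acc + arr[i];                       /* C: r120 = r120 + r127 */
    return acc;
}
```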
The related code in analyze_insn_to_expand_var is:
  /* Find the accumulator use within the operation. */
  if (code == FMA)
    {
      /* We only support accumulation via FMA in the ADD position. */
      if (!rtx_equal_p (dest, XEXP (src, 2)))
        return NULL;
      accum_pos = 2;
    }
  else if (rtx_equal_p (dest, XEXP (src, 0)))
    accum_pos = 0;
  else if (rtx_equal_p (dest, XEXP (src, 1)))
    {
      /* The method of expansion that we are using; which includes the
         initialization of the expansions with zero and the summation of
         the expansions at the end of the computation will yield wrong
         results for (x = something - x) thus avoid using it in that case. */
      if (code == MINUS)
        return NULL;
      accum_pos = 1;
    }
  else
    return NULL;
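The key is whether dest matches XEXP (src, 0) or XEXP (src, 1). In insn A the
destination is r127 while the operands are r128 and a subreg of r123, so
neither check succeeds and A is not recognised as an accumulation.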
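A heavily simplified model of that test may help. Below, an insn is reduced to dest = op0 CODE op1 on plain register numbers, and accum_pos mirrors the two rtx_equal_p checks and the MINUS exclusion; the FMA case is left out. The struct, enum and function names are hypothetical; GCC's real code works on RTL.

```c
/* Hypothetical toy model of the accumulator check quoted above. */
enum toy_code { TOY_PLUS, TOY_MINUS };

struct toy_insn {
    int dest;             /* destination register */
    enum toy_code code;   /* PLUS or MINUS */
    int op0, op1;         /* source operand registers */
};

/* Returns the accumulator position (0 or 1), or -1 when the insn is not an
   accumulation that variable expansion can handle. */
int accum_pos(const struct toy_insn *insn)
{
    if (insn->dest == insn->op0)          /* rtx_equal_p (dest, XEXP (src, 0)) */
        return 0;
    if (insn->dest == insn->op1) {        /* rtx_equal_p (dest, XEXP (src, 1)) */
        if (insn->code == TOY_MINUS)      /* x = something - x */
            return -1;
        return 1;
    }
    return -1;                            /* dest is not an operand */
}
```

With the subreg r123#0 encoded simply as register 123, insn A (r127 = r128 + r123) is rejected, while insn C (r120 = r120 + r127) is accepted at position 0. The MINUS case is excluded because x = something - x flips the sign of the running value on every iteration, so zero-initialised copies summed at the end would not reproduce it.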
* [Bug c/102062] powerpc suboptimal unrolling simple array sum
From: segher at gcc dot gnu.org @ 2021-08-25 17:07 UTC
To: gcc-bugs
--- Comment #9 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Thanks for the detective work!
So the variable expansion code could be improved to handle sign extensions
better (or maybe zero extensions as well?). In either case, that won't help
rs6000 much anymore, because the usual case (32-bit -> 64-bit) doesn't affect
us anymore :-)
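Why handling the sign extension is possible in principle: every add here is really a 32-bit add, so partial sums can be kept separately as long as the combined value is truncated to 32 bits and sign-extended once at the end. The sketch below checks that in C. add_extsw, sum_single and sum_expanded are hypothetical names; uint32_t arithmetic keeps wrap-around well defined, and the uint32_t to int32_t conversion relies on GCC's documented modulo-2^32 behaviour for out-of-range values.

```c
#include <stdint.h>

/* One step of the GCC 11 accumulation: a 32-bit add followed by extsw. */
static int64_t add_extsw(int64_t acc, int32_t x)
{
    uint32_t lo = (uint32_t)acc + (uint32_t)x;   /* add, low 32 bits */
    return (int64_t)(int32_t)lo;                 /* extsw */
}

/* Single accumulator, as GCC 11 generates it. */
int64_t sum_single(const int32_t *arr, int n)
{
    int64_t acc = 0;

    for (int i = 0; i < n; i++)
        acc = add_extsw(acc, arr[i]);
    return acc;
}

/* Hypothetical expanded form: two accumulators, each keeping its own
   extension, combined by one more add + extsw after the loop.
   Assumes n is even, as 2*sz is in test.c. */
int64_t sum_expanded(const int32_t *arr, int n)
{
    int64_t acc0 = 0, acc1 = 0;

    for (int i = 0; i < n; i += 2) {
        acc0 = add_extsw(acc0, arr[i]);
        acc1 = add_extsw(acc1, arr[i + 1]);
    }
    return add_extsw(acc0, (int32_t)acc1);
}
```

The two versions agree even when intermediate values wrap, because both only ever keep the low 32 bits of the running total.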
Is there anything left to do here, or can we close the PR?
* [Bug c/102062] powerpc suboptimal unrolling simple array sum
From: wschmidt at gcc dot gnu.org @ 2021-08-25 18:01 UTC
To: gcc-bugs
--- Comment #10 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
Well, the problem is that we still generate suboptimal code on GCC 11. I don't
know whether we want to address that or not.
I suppose we aren't going to backport Haochen's lovely patch for sign
extensions, and it's probably not worth messing around with variable expansion
just for GCC 11...
Nick, can you use -ftree-vectorize -fvect-cost-model=cheap to deal with
this (or even -O3, which is recommended and probably does the same thing)?
Or are you holding out for some fix in GCC 11.x?
* [Bug c/102062] powerpc suboptimal unrolling simple array sum
From: dje at gcc dot gnu.org @ 2021-08-25 18:03 UTC
To: gcc-bugs
--- Comment #11 from David Edelsohn <dje at gcc dot gnu.org> ---
We could backport Haochen's patch to the IBM Advance Toolchain (AT).
* [Bug c/102062] powerpc suboptimal unrolling simple array sum
From: segher at gcc dot gnu.org @ 2021-08-25 22:43 UTC
To: gcc-bugs
--- Comment #12 from Segher Boessenkool <segher at gcc dot gnu.org> ---
We can backport Hao Chen's patch; it has proven to cause no problems at all.
We don't normally backport patches that aren't bug fixes, but we can make an
exception for important enough things (we did so for most P10 enablement
patches, for example).
David, do you agree?
* [Bug rtl-optimization/102062] powerpc suboptimal unrolling simple array sum
From: dje at gcc dot gnu.org @ 2021-08-25 23:29 UTC
To: gcc-bugs
--- Comment #13 from David Edelsohn <dje at gcc dot gnu.org> ---
I don't object to backporting Hao Chen's patch. It has baked sufficiently on
trunk that it seems relatively stable.
* [Bug rtl-optimization/102062] powerpc suboptimal unrolling simple array sum
From: npiggin at gmail dot com @ 2021-08-26 0:17 UTC
To: gcc-bugs
--- Comment #14 from Nicholas Piggin <npiggin at gmail dot com> ---
(In reply to Bill Schmidt from comment #10)
> Well, the problem is that we still generate suboptimal code on GCC 11. I
> don't know whether we want to address that or not.
>
> I suppose we aren't going to backport Haochen's lovely patch for sign
> extensions, and it's probably not worth messing around with variable
> expansion just for GCC 11...
>
> Nick, can you use the -ftree-vectorize -fvect-cost-model=cheap to deal with
> this (or even use -O3, which is recommended and probably does the same
> thing)? Or are you holding out for some fix in GCC 11.x?
For my case, no; it's only a very small issue in the scheme of things, and I
only found it by looking at code generation. So feel free to close this if it's
fixed upstream, or do whatever else you think best. Thanks all.
* [Bug rtl-optimization/102062] powerpc suboptimal unrolling simple array sum
From: segher at gcc dot gnu.org @ 2021-08-30 17:34 UTC
To: gcc-bugs
--- Comment #15 from Segher Boessenkool <segher at gcc dot gnu.org> ---
This should be fixed now, please confirm. Thanks!
* [Bug rtl-optimization/102062] powerpc suboptimal unrolling simple array sum
From: npiggin at gmail dot com @ 2021-09-22 13:53 UTC
To: gcc-bugs
Nicholas Piggin <npiggin at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED
--- Comment #16 from Nicholas Piggin <npiggin at gmail dot com> ---
gcc-11 -O2 -mcpu=power10 now generates code identical to what Bill listed for
trunk:
.L2:
	lwz 4,4(9)
	lwz 5,12(9)
	lwz 6,8(9)
	lwzu 8,16(9)
	add 10,4,10
	add 10,10,5
	add 7,6,7
	add 7,7,8
	bdnz .L2
Thanks all.