public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/108826] New: Inefficient address generation on POWER and RISC-V
From: tkoenig at gcc dot gnu.org @ 2023-02-16 17:53 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108826
Bug ID: 108826
Summary: Inefficient address generation on POWER and RISC-V
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tkoenig at gcc dot gnu.org
Target Milestone: ---
For the following code (reduced from Embench):
struct {
    unsigned int table[4][100];
} * _nettle_aes_decrypt_T;
unsigned int _nettle_aes_decrypt_w1;
void _nettle_aes_decrypt() {
    _nettle_aes_decrypt_T->table[2][0] =
        _nettle_aes_decrypt_T->table[2][_nettle_aes_decrypt_w1 >> 6 & 5];
}
current trunk generates the following on POWER:
0:  addis 2,12,.TOC.-.LCF0@ha
    addi 2,2,.TOC.-.LCF0@l
    .localentry _nettle_aes_decrypt,.-_nettle_aes_decrypt
    addis 9,2,.LANCHOR0+8@toc@ha
    lwz 9,.LANCHOR0+8@toc@l(9)
    addis 10,2,.LANCHOR0@toc@ha
    ld 10,.LANCHOR0@toc@l(10)
    srwi 9,9,6
    andi. 9,9,0x5
    addi 9,9,200
    sldi 9,9,2
    lwzx 9,10,9
    stw 9,800(10)
    blr
After the TOC setup, this shifts the value right by 6, does the AND with 5,
adds 200 (table[2] starts at element 2 * 100) and then shifts the result left
by 2 to turn it into a byte offset. Two separate shifts are not necessary.
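To make the arithmetic concrete, here is a small C sketch (not from the report) that replays the shift / AND / add-200 / shift-by-2 sequence and compares it with the byte offset the compiler itself computes for &T->table[2][i]. The struct tag `aes_table` and the helper names are hypothetical, and it assumes a 4-byte unsigned int, as on the POWER and RISC-V targets discussed here.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the anonymous struct in the test case; the tag
   `aes_table` is a hypothetical name added so the type can be reused. */
struct aes_table {
    unsigned int table[4][100];
};

/* The emitted sequence scales by 4, so it relies on 4-byte elements. */
static_assert(sizeof(unsigned int) == 4, "assumes 4-byte unsigned int");

/* Byte offset from T, computed step by step like the POWER code above. */
static uint64_t offset_as_emitted(uint32_t w)
{
    uint64_t r = w >> 6;  /* srwi 9,9,6 */
    r &= 5;               /* andi. 9,9,0x5 */
    r += 200;             /* addi 9,9,200: table[2] starts at element 2 * 100 */
    return r << 2;        /* sldi 9,9,2: element index -> byte offset */
}

/* Reference: let the C compiler compute &T->table[2][(w >> 6) & 5] - T. */
static uint64_t offset_reference(const struct aes_table *t, uint32_t w)
{
    const unsigned int *elem = &t->table[2][(w >> 6) & 5];
    return (uint64_t)((const char *)elem - (const char *)t);
}

/* Returns 1 if both computations agree for every w in [0, limit). */
static int offsets_agree(uint32_t limit)
{
    static struct aes_table t; /* only its address is used */
    for (uint32_t w = 0; w < limit; w++)
        if (offset_as_emitted(w) != offset_reference(&t, w))
            return 0;
    return 1;
}
```

Both sides reduce to 800 + 4 * ((w >> 6) & 5), so offsets_agree returns 1 for any limit.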
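A better alternative would be something like the following (please excuse any
errors), which uses a single right shift by 4 followed by an AND with 20
(= 5 << 2) and puts the 800-byte offset of table[2] into the load's
displacement:
    srwi 9,9,4
    andi. 9,9,20
    add 9,9,10
    lwz 9,800(9)
    stw 9,800(10)
saving an instruction.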
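A small C sketch (not from the report; the helper names are hypothetical) can check that the suggested srwi 4 / andi. 20 / add / lwz 800 sequence loads from the same address as the current code. It relies on the identity ((w >> 6) & 5) << 2 == (w >> 4) & 20 and models only the address arithmetic, with `base` standing in for the pointer held in r10.

```c
#include <assert.h>
#include <stdint.h>

/* Load address as computed today: index, + 200 elements, << 2 to bytes. */
static uint64_t addr_current(uint64_t base, uint32_t w)
{
    uint64_t i = (w >> 6) & 5u;      /* srwi 9,9,6 ; andi. 9,9,0x5 */
    return base + ((i + 200) << 2);  /* addi ; sldi ; lwzx 9,10,9 */
}

/* Proposed: one right shift by 4 and a mask of 20 (= 5 << 2) yield the
   byte offset directly; the 800 goes into the load's displacement. */
static uint64_t addr_proposed(uint64_t base, uint32_t w)
{
    uint64_t off = (w >> 4) & 20u;   /* srwi 9,9,4 ; andi. 9,9,20 */
    return base + off + 800;         /* add 9,9,10 ; lwz 9,800(9) */
}

/* Returns 1 if both agree for every w in [0, limit) at the given base. */
static int proposal_agrees(uint64_t base, uint32_t limit)
{
    for (uint32_t w = 0; w < limit; w++)
        if (addr_current(base, w) != addr_proposed(base, w))
            return 0;
    return 1;
}
```

Only bits 6 and 8 of w survive either mask, so any range that covers those bits exercises every distinct case.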
RISC-V does something similar. According to godbolt:
    lui a5,%hi(_nettle_aes_decrypt_w1)
    lw a5,%lo(_nettle_aes_decrypt_w1)(a5)
    lui a4,%hi(_nettle_aes_decrypt_T)
    ld a4,%lo(_nettle_aes_decrypt_T)(a4)
    srliw a5,a5,6
    andi a5,a5,5
    addi a5,a5,200
    slli a5,a5,2
    add a5,a4,a5
    lw a5,0(a5)
    sw a5,800(a4)
    ret
(which is why I think this is a general RTL optimization issue).
x86-64 is much better:
    movl _nettle_aes_decrypt_w1(%rip), %eax
    movq _nettle_aes_decrypt_T(%rip), %rdx
    shrl $6, %eax
    andl $5, %eax
    movl 800(%rdx,%rax,4), %eax
    movl %eax, 800(%rdx)
    ret
but it can use x86's complex addressing modes (base + index * scale +
displacement, as in 800(%rdx,%rax,4)).
From: pinskia at gcc dot gnu.org @ 2023-02-16 17:56 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108826
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
AArch64 looks OK too, because it can use more complex addressing modes:
    ldr w0, [x0, #:lo12:.LANCHOR0]
    and w0, w2, w0, lsr 6
    add x0, x0, 200
    ldr w0, [x1, x0, lsl 2]
From: pinskia at gcc dot gnu.org @ 2023-02-16 18:00 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108826
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Actually, this is the AArch64 output:
    ldr x1, [x0, #:lo12:.LANCHOR0]
    ldr w0, [x3, 8]
    and w0, w2, w0, lsr 6
    add x0, x0, 200
    ldr w0, [x1, x0, lsl 2]
    str w0, [x1, 800]
Note I need to better understand why the C++ front-end thinks this would be
invalid ...
From: pinskia at gcc dot gnu.org @ 2023-02-16 18:00 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108826
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #2)
> Note I need to better understand why the C++ front-end thinks this would be
> invalid ...
Oh, because the struct is unnamed :).
From: pinskia at gcc dot gnu.org @ 2023-02-16 18:09 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108826
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Last reconfirmed| |2023-02-16
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Trying 13, 14, 15 -> 16:
   13: r84:DI=r83:DI+0xc8
      REG_DEAD r83:DI
   14: r85:DI=r84:DI<<0x2
      REG_DEAD r84:DI
   15: r86:DI=r72:DI+r85:DI
      REG_DEAD r85:DI
   16: r76:DI=sign_extend([r86:DI])
      REG_DEAD r86:DI
Failed to match this instruction:
(set (reg:DI 76 [ _5 ])
    (sign_extend:DI (mem:SI (plus:DI (plus:DI (mult:DI (reg:DI 83)
                        (const_int 4 [0x4]))
                    (reg/f:DI 72 [ _nettle_aes_decrypt_T.0_1 ]))
                (const_int 800 [0x320])) [2 _nettle_aes_decrypt_T.0_1->table[2][_4]+0 S4 A32])))
Failed to match this instruction:
(set (reg/f:DI 86)
    (plus:DI (ashift:DI (reg:DI 83)
            (const_int 2 [0x2]))
        (reg/f:DI 72 [ _nettle_aes_decrypt_T.0_1 ])))
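So combine does know how to combine all four instructions into a single load
from (r83 * 4 + r72) + 800, folding the constant into the address. That
instruction does not match, though, so combine then splits it up, and the
split fails as well: the shift-and-add part (the second failed match above)
does not match either. I can't remember whether combine can do 4->3 splitting
or just 4->2.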
From: palmer at gcc dot gnu.org @ 2023-02-16 18:21 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108826
--- Comment #5 from palmer at gcc dot gnu.org ---
We've run into a handful of things that look like this before, I'm not sure if
it's a backend issue or something more general. There's two patterns here that
are frequently bad on RISC-V: "unsigned int" array indices and unsigned int
shifting. I think they might both boil down to some problems we have tracking
the high parts of registers around ABI boundaries.
FWIW, the smallest bad code I can get is
unsigned int func(unsigned int ui) {
    return (ui >> 6 & 5) << 2;
}
func:
    srliw a0,a0,6
    slliw a0,a0,2
    andi a0,a0,20
    ret
which is particularly awkward, as enough is going right that the andi has been
moved after the shifts (with its mask becoming 20), but we still end up with
two shifts.
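Why the two shifts can collapse in this reduced case can be spelled out in C. The sketch below (not from the thread; function names are hypothetical) gives the source expression, the form the RISC-V code currently computes (srliw 6, slliw 2, andi 20), and a folded form with one shift and one andi. (ui >> 6) << 2 equals ui >> 4 with its two low bits cleared, and since 20 has no bits in those positions, the extra clearing is absorbed by the AND.

```c
#include <assert.h>
#include <stdint.h>

/* The source expression from the comment: (ui >> 6 & 5) << 2. */
static uint32_t func_source(uint32_t ui)
{
    return ((ui >> 6) & 5u) << 2;
}

/* What the emitted code computes: srliw 6, slliw 2, then andi 20
   (the AND has already been moved after the shifts, mask 5 << 2). */
static uint32_t func_emitted(uint32_t ui)
{
    return ((ui >> 6) << 2) & 20u;
}

/* Folded: (ui >> 6) << 2 == (ui >> 4) & ~3u, and (~3u & 20u) == 20u,
   so one shift by 4 followed by andi 20 gives the same result. */
static uint32_t func_folded(uint32_t ui)
{
    return (ui >> 4) & 20u;
}

/* Returns 1 if all three forms agree for every ui in [0, limit). */
static int forms_agree(uint32_t limit)
{
    for (uint32_t ui = 0; ui < limit; ui++) {
        uint32_t s = func_source(ui);
        if (func_emitted(ui) != s || func_folded(ui) != s)
            return 0;
    }
    return 1;
}
```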
From: pinskia at gcc dot gnu.org @ 2023-02-16 18:25 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108826
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to palmer from comment #5)
> We've run into a handful of things that look like this before, I'm not sure
> if it's a backend issue or something more general. There's two patterns
> here that are frequently bad on RISC-V: "unsigned int" array indices and
> unsigned int shifting. I think they might both boil down to some problems
> we have tracking the high parts of registers around ABI boundaries.
That seems unrelated to the issue here. In this case the shift is already in
DI (ptrmode) mode, so the shift itself is fine. See comment #4 for the RTL
(this was the RTL for RV64).
From: pinskia at gcc dot gnu.org @ 2023-09-28 19:55 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108826
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |lis8215 at gmail dot com
--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 111626 has been marked as a duplicate of this bug. ***