public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/34072] unoptimal byte extraction.
From: pinskia at gcc dot gnu.org @ 2021-08-02 21:08 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=34072
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
Known to work| |11.0
Component|target |rtl-optimization
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
GCC 11 RTL before combine:
(insn 2 4 3 2 (set (reg/v:DI 84 [ x ])
        (mem/c:DI (reg/f:SI 16 argp) [1 x+0 S8 A32])) "/app/example.cpp":3:45 74 {*movdi_internal}
     (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (parallel [
            (set (reg:DI 86)
                (lshiftrt:DI (reg/v:DI 84 [ x ])
                    (const_int 8 [0x8])))
            (clobber (reg:CC 17 flags))
        ]) "/app/example.cpp":3:58 688 {*lshrdi3_doubleword}
     (expr_list:REG_DEAD (reg/v:DI 84 [ x ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 7 6 12 2 (set (reg:QI 85)
        (subreg:QI (reg:DI 86) 0)) "/app/example.cpp":3:63 77 {*movqi_internal}
     (expr_list:REG_DEAD (reg:DI 86)
        (nil)))
(insn 12 7 13 2 (set (reg/i:QI 0 ax)
        (reg:QI 85)) "/app/example.cpp":3:66 77 {*movqi_internal}
     (expr_list:REG_DEAD (reg:QI 85)
        (nil)))
GCC 10 and earlier, before combine:
(insn 2 4 3 2 (set (reg/v:DI 84 [ x ])
        (mem/c:DI (reg/f:SI 16 argp) [1 x+0 S8 A32])) "/app/example.cpp":3:45 66 {*movdi_internal}
     (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 12 2 (parallel [
            (set (reg:DI 86)
                (lshiftrt:DI (reg/v:DI 84 [ x ])
                    (const_int 8 [0x8])))
            (clobber (reg:CC 17 flags))
        ]) "/app/example.cpp":3:58 624 {*lshrdi3_doubleword}
     (expr_list:REG_DEAD (reg/v:DI 84 [ x ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 12 6 13 2 (set (reg/i:QI 0 ax)
        (subreg:QI (reg:DI 86) 0)) "/app/example.cpp":3:66 69 {*movqi_internal}
     (expr_list:REG_DEAD (reg:DI 86)
        (nil)))
fwprop used to propagate insn 7 into insn 12; I think this changed with r11-6188.
With this change, combine can now do:
Trying 6 -> 7:
    6: {r86:DI=r84:DI 0>>0x8;clobber flags:CC;}
      REG_DEAD r84:DI
      REG_UNUSED flags:CC
    7: r85:QI=r86:DI#0
      REG_DEAD r86:DI
Successfully matched this instruction:
(set (subreg:SI (reg:QI 85) 0)
    (zero_extract:SI (subreg:SI (reg/v:DI 84 [ x ]) 0)
        (const_int 8 [0x8])
        (const_int 8 [0x8])))
Trying 2 -> 7:
    2: r84:DI=[argp:SI]
    7: r85:QI#0=zero_extract(r84:DI#0,0x8,0x8)
      REG_DEAD r84:DI
Successfully matched this instruction:
(set (subreg:SI (reg:QI 85) 0)
    (zero_extend:SI (mem/c:QI (plus:SI (reg/f:SI 16 argp)
                (const_int 1 [0x1])) [1 x+1 S1 A8])))
Whereas before, combine would produce:
Trying 2 -> 6:
    2: r84:DI=[argp:SI]
    6: {r86:DI=r84:DI 0>>0x8;clobber flags:CC;}
      REG_DEAD r84:DI
      REG_UNUSED flags:CC
Failed to match this instruction:
(parallel [
        (set (reg:DI 86)
            (lshiftrt:DI (mem/c:DI (reg/f:SI 16 argp) [1 x+0 S8 A32])
                (const_int 8 [0x8])))
        (clobber (reg:CC 17 flags))
    ])
And we would never combine into insn 12.
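The successful combination rests on a simple identity for a little-endian target such as x86: keeping the low byte of a 64-bit value shifted right by 8 (insns 6 and 7), extracting the 8-bit field at bit 8 (the zero_extract form), and loading the single byte at offset 1 of the value in memory (the mem:QI at argp+1) all yield the same result. A minimal C++ sketch of the three shapes, assuming a little-endian host; the function names are hypothetical and not part of GCC.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers (not from GCC) modelling the RTL shapes in the
// combine dump above. Assumes a little-endian target such as x86.

// insns 6 and 7: (subreg:QI (lshiftrt:DI x 8) 0), i.e. shift, then keep
// only the low byte.
std::uint8_t via_shift_and_truncate(std::uint64_t x)
{
    return static_cast<std::uint8_t>(x >> 8);
}

// (zero_extract x len pos) with bit positions counted from the least
// significant bit, as on x86. Requires len < 64.
std::uint64_t zero_extract(std::uint64_t x, unsigned len, unsigned pos)
{
    return (x >> pos) & ((std::uint64_t{1} << len) - 1);
}

// (zero_extend (mem:QI (plus argp 1))): read only byte 1 of the value's
// in-memory representation.
std::uint8_t via_byte_load(const std::uint64_t& x)
{
    std::uint8_t b;
    std::memcpy(&b, reinterpret_cast<const unsigned char*>(&x) + 1, sizeof b);
    return b;
}
```

On a little-endian host all three return 0x77 for 0x1122334455667788; on a big-endian host via_byte_load would return 0x22 instead, which is why the byte offset chosen by combine depends on the target's byte order.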
As for the C++ example, combine does:
Trying 14 -> 6:
   14: r85:SI=[argp:SI]
    6: r84:QI#0=zero_extract(r85:SI,0x8,0x8)
      REG_DEAD r85:SI
Successfully matched this instruction:
(set (subreg:SI (reg:QI 84) 0)
    (zero_extend:SI (mem/c:QI (plus:SI (reg/f:SI 16 argp)
                (const_int 1 [0x1])) [1 x+1 S1 A8])))
But it never tries to move the subreg from the LHS to the RHS of the set.
So we still have an issue for the C++ example with memcpy.
From: pinskia at gcc dot gnu.org @ 2023-08-05 21:39 UTC
To: gcc-bugs
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #4)
> So we still have an issue for the C++ example with memcpy.
But the code did improve from GCC 6 to GCC 7.
From:
.cfi_startproc
subl $12, %esp
.cfi_def_cfa_offset 16
movzwl 16(%esp), %eax
addl $12, %esp
.cfi_def_cfa_offset 4
shrw $8, %ax
ret
To:
movl 4(%esp), %eax
movzbl %ah, %eax
So it is already much better than when this was reported.
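Both sequences read only the low part of the 64-bit argument, which is valid because byte 1 of a little-endian 64-bit value lies in its low 16-bit (and therefore low 32-bit) word; GCC 7 takes that byte straight from %ah instead of shifting. The C++ below is a sketch that models each instruction sequence, assuming little-endian layout; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ models of the two 32-bit x86 sequences quoted above for
// memcpy_byte<1>; x stands for the 64-bit argument passed on the stack.

// GCC 6: movzwl 16(%esp), %eax ; shrw $8, %ax
// Load the low 16 bits of x, then shift the upper byte of that halfword down.
std::uint8_t gcc6_sequence(std::uint64_t x)
{
    std::uint16_t ax = static_cast<std::uint16_t>(x);
    ax = static_cast<std::uint16_t>(ax >> 8);
    return static_cast<std::uint8_t>(ax);
}

// GCC 7: movl 4(%esp), %eax ; movzbl %ah, %eax
// Load the low 32-bit word of x, then take bits 8..15 (the %ah register).
std::uint8_t gcc7_sequence(std::uint64_t x)
{
    std::uint32_t eax = static_cast<std::uint32_t>(x);
    return static_cast<std::uint8_t>(eax >> 8);
}

// Reference definition: byte 1 of x on a little-endian target.
std::uint8_t byte1_reference(std::uint64_t x)
{
    return static_cast<std::uint8_t>(x >> 8);
}
```

All three agree for every input; the difference between the GCC versions is only in how many instructions it takes to isolate bits 8..15.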
From: pluto at agmk dot net @ 2007-11-14 11:15 UTC
To: gcc-bugs
------- Comment #2 from pluto at agmk dot net 2007-11-14 11:14 -------
And the C++ testcase with __builtin_memcpy:
template < int N >
unsigned char memcpy_byte( unsigned long long x )
{
    unsigned char rv;
    __builtin_memcpy( &rv, N + reinterpret_cast< unsigned char* >( &x ),
                      sizeof( rv ) );
    return rv;
}
template unsigned char memcpy_byte< 0 >( unsigned long long );
template unsigned char memcpy_byte< 1 >( unsigned long long );
template unsigned char memcpy_byte< 6 >( unsigned long long );
template unsigned char memcpy_byte< 7 >( unsigned long long );
The generated code:
unsigned char memcpy_byte<0>(unsigned long long):
subl $28, %esp
movzbl 32(%esp), %eax # 13 *movqi_1/3 [length = 5]
addl $28, %esp
ret
unsigned char memcpy_byte<1>(unsigned long long):
subl $28, %esp
movzwl 32(%esp), %eax # 34 *movhi_1/3 [length = 5]
addl $28, %esp
shrw $8, %ax # 14 *lshrhi3_1/1 [length = 4]
ret
unsigned char memcpy_byte<6>(unsigned long long):
subl $28, %esp
movzbl 38(%esp), %eax # 14 *movqi_1/3 [length = 5]
addl $28, %esp
ret
unsigned char memcpy_byte<7>(unsigned long long):
subl $28, %esp
movzbl 39(%esp), %eax # 14 *movqi_1/3 [length = 5]
addl $28, %esp
ret
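Every instantiation is a pure one-byte read, so memcpy_byte<1> should in principle compile to a single movzbl just like <0>, <6> and <7>. The sketch below checks that memcpy_byte<N> matches a shift-based formulation on a little-endian target; shift_byte and forms_agree are hypothetical names, and __builtin_memcpy assumes GCC or Clang.

```cpp
#include <cassert>

// Testcase from this comment, unchanged apart from formatting.
template <int N>
unsigned char memcpy_byte(unsigned long long x)
{
    unsigned char rv;
    __builtin_memcpy(&rv, N + reinterpret_cast<unsigned char*>(&x), sizeof(rv));
    return rv;
}

// Hypothetical shift-based equivalent; it matches memcpy_byte<N> only on a
// little-endian target, where byte N of x holds bits 8*N .. 8*N+7.
template <int N>
unsigned char shift_byte(unsigned long long x)
{
    static_assert(N >= 0 && N < 8, "N must select one of the 8 bytes");
    return static_cast<unsigned char>(x >> (8 * N));
}

// Compares the instantiations used in the testcase for one input value.
bool forms_agree(unsigned long long x)
{
    return memcpy_byte<0>(x) == shift_byte<0>(x)
        && memcpy_byte<1>(x) == shift_byte<1>(x)
        && memcpy_byte<6>(x) == shift_byte<6>(x)
        && memcpy_byte<7>(x) == shift_byte<7>(x);
}
```

For example, with x = 0x0102030405060708 on x86, memcpy_byte<1>(x) and shift_byte<1>(x) are both 0x07.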
From: rask at gcc dot gnu dot org @ 2007-11-14 1:44 UTC
To: gcc-bugs
------- Comment #1 from rask at gcc dot gnu dot org 2007-11-14 01:44 -------
With -S -dp it is clear that only byte0 is optimized:
byte0:
movzbl 4(%esp), %eax # 11 *movqi_1/3
byte1:
movl 4(%esp), %eax # 24 *movsi_1/1
movl 8(%esp), %edx # 25 *movsi_1/1
shrdl $8, %edx, %eax # 30 x86_shrd_1/1
byte6:
movzwl 10(%esp), %eax # 24 *zero_extendhisi2_movzwl
byte7:
movzbl 11(%esp), %eax # 28 *zero_extendqisi2_movzbw
They should all be optimized to use movqi. The first part of the problem is
that any of cse, cse2, gcse and fwprop will combine these instructions
(insn 7 6 8 2 /tmp/pr34072.c:3 (set (reg:QI 60)
(subreg:QI (reg:SI 64) 0)) 62 {*movqi_1} (nil))
(insn 8 7 12 2 /tmp/pr34072.c:3 (set (reg:QI 58 [ <result> ])
(reg:QI 60)) 62 {*movqi_1} (nil))
(insn 12 8 18 2 /tmp/pr34072.c:3 (set (reg/i:QI 0 ax)
(reg:QI 58 [ <result> ])) 62 {*movqi_1} (nil))
into
(insn 12 8 18 2 /tmp/pr34072.c:3 (set (reg/i:QI 0 ax [ <result> ])
(subreg:QI (reg:SI 64) 0)) 62 {*movqi_1} (nil))
and then combine won't touch it because of the hard register (ax) and
SMALL_REGISTER_CLASSES and/or CLASS_LIKELY_SPILLED. The fix is to teach
these passes not to combine these insns, as demonstrated by compiling with
-fno-forward-propagate -fno-gcse -fno-rerun-cse-after-loop -fno-cse[1]:
byte6:
movzbl 10(%esp), %eax # 8 *movqi_1/3
byte7:
movzbl 11(%esp), %eax # 8 *movqi_1/3
byte1 is still not optimized because combine fails to simplify this
instruction:
(set (reg:QI 60)
(subreg:QI (lshiftrt:DI (mem/c/i:DI (reg/f:SI 16 argp) [2 x+0 S8 A32])
(const_int 8 [0x8])) 0))
It should be entirely possible to simplify it to this:
(set (reg:QI 60) (mem/c/i:QI (plus:SI (reg/f:SI 16 argp) (const_int 1))))
[1] An option I hacked in to debug this problem.
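The proposed simplification is one instance of a more general rule: the low byte of a memory value logically shifted right by a whole number of bytes k is a single byte load at a fixed offset, k on a little-endian target such as x86 and (size - 1) - k on a big-endian one. The sketch below models only that legality check and offset computation with a hypothetical helper byte_offset_for_shift (not a GCC function, C++17 for std::optional); a real implementation in combine or simplify-rtx may need further conditions that this ignores.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical model of the rewrite
//   (subreg:QI (lshiftrt:DI (mem:DI addr) (const_int S)) <lowpart>)
//     -> (mem:QI (plus addr OFFSET))
// Returns OFFSET, or std::nullopt when the rewrite does not apply.
std::optional<int> byte_offset_for_shift(int shift_bits, int mode_bytes,
                                         bool big_endian)
{
    // Only whole-byte shifts that stay inside the value qualify.
    if (shift_bits < 0 || shift_bits % 8 != 0 || shift_bits / 8 >= mode_bytes)
        return std::nullopt;
    int k = shift_bits / 8;  // index of the selected byte, counted from the LSB
    return big_endian ? (mode_bytes - 1 - k) : k;
}

// Cross-checks the model on the host: load the byte at the computed offset
// and compare it with the shifted-and-truncated value.
bool rewrite_matches_host(std::uint64_t x, int shift_bits)
{
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    const bool host_big_endian = (first == 0);

    auto off = byte_offset_for_shift(shift_bits, 8, host_big_endian);
    if (!off)
        return false;
    unsigned char loaded;
    std::memcpy(&loaded, reinterpret_cast<const unsigned char*>(&x) + *off, 1);
    return loaded == static_cast<unsigned char>(x >> shift_bits);
}
```

For the RTL above (DImode, shift by 8, x86) the model yields offset 1, matching the (plus argp 1) address; a shift of 4 or 64 is rejected.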
--
rask at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Component|target |rtl-optimization
Ever Confirmed|0 |1
Keywords| |missed-optimization
Known to fail| |4.3.0
Last reconfirmed date|0000-00-00 00:00:00 |2007-11-14 01:44:03
Thread overview: 4+ messages
2021-08-02 21:08 ` [Bug rtl-optimization/34072] unoptimal byte extraction pinskia at gcc dot gnu.org
2023-08-05 21:39 ` pinskia at gcc dot gnu.org
2007-11-12 14:50 [Bug target/34072] New: " pluto at agmk dot net
2007-11-14 1:44 ` [Bug rtl-optimization/34072] " rask at gcc dot gnu dot org
2007-11-14 11:15 ` pluto at agmk dot net