public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/34072] unoptimal byte extraction.
       [not found] <bug-34072-4@http.gcc.gnu.org/bugzilla/>
@ 2021-08-02 21:08 ` pinskia at gcc dot gnu.org
  2023-08-05 21:39 ` pinskia at gcc dot gnu.org
  1 sibling, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-02 21:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=34072

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
      Known to work|                            |11.0
          Component|target                      |rtl-optimization

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
GCC 11 RTL before combine:
(insn 2 4 3 2 (set (reg/v:DI 84 [ x ])
        (mem/c:DI (reg/f:SI 16 argp) [1 x+0 S8 A32])) "/app/example.cpp":3:45 74 {*movdi_internal}
     (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (parallel [
            (set (reg:DI 86)
                (lshiftrt:DI (reg/v:DI 84 [ x ])
                    (const_int 8 [0x8])))
            (clobber (reg:CC 17 flags))
        ]) "/app/example.cpp":3:58 688 {*lshrdi3_doubleword}
     (expr_list:REG_DEAD (reg/v:DI 84 [ x ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 7 6 12 2 (set (reg:QI 85)
        (subreg:QI (reg:DI 86) 0)) "/app/example.cpp":3:63 77 {*movqi_internal}
     (expr_list:REG_DEAD (reg:DI 86)
        (nil)))
(insn 12 7 13 2 (set (reg/i:QI 0 ax)
        (reg:QI 85)) "/app/example.cpp":3:66 77 {*movqi_internal}
     (expr_list:REG_DEAD (reg:QI 85)
        (nil)))


GCC 10- before combine:
(insn 2 4 3 2 (set (reg/v:DI 84 [ x ])
        (mem/c:DI (reg/f:SI 16 argp) [1 x+0 S8 A32])) "/app/example.cpp":3:45 66 {*movdi_internal}
     (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 12 2 (parallel [
            (set (reg:DI 86)
                (lshiftrt:DI (reg/v:DI 84 [ x ])
                    (const_int 8 [0x8])))
            (clobber (reg:CC 17 flags))
        ]) "/app/example.cpp":3:58 624 {*lshrdi3_doubleword}
     (expr_list:REG_DEAD (reg/v:DI 84 [ x ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 12 6 13 2 (set (reg/i:QI 0 ax)
        (subreg:QI (reg:DI 86) 0)) "/app/example.cpp":3:66 69 {*movqi_internal}
     (expr_list:REG_DEAD (reg:DI 86)
        (nil)))

fwprop used to propagate insn 7 into insn 12; I think this changed with r11-6188.

With this change combine can do:
Trying 6 -> 7:
    6: {r86:DI=r84:DI 0>>0x8;clobber flags:CC;}
      REG_DEAD r84:DI
      REG_UNUSED flags:CC
    7: r85:QI=r86:DI#0
      REG_DEAD r86:DI
Successfully matched this instruction:
(set (subreg:SI (reg:QI 85) 0)
    (zero_extract:SI (subreg:SI (reg/v:DI 84 [ x ]) 0)
        (const_int 8 [0x8])
        (const_int 8 [0x8])))
Trying 2 -> 7:
    2: r84:DI=[argp:SI]
    7: r85:QI#0=zero_extract(r84:DI#0,0x8,0x8)
      REG_DEAD r84:DI
Successfully matched this instruction:
(set (subreg:SI (reg:QI 85) 0)
    (zero_extend:SI (mem/c:QI (plus:SI (reg/f:SI 16 argp)
                (const_int 1 [0x1])) [1 x+1 S1 A8])))

Whereas previously combine would produce:
Trying 2 -> 6:
    2: r84:DI=[argp:SI]
    6: {r86:DI=r84:DI 0>>0x8;clobber flags:CC;}
      REG_DEAD r84:DI
      REG_UNUSED flags:CC
Failed to match this instruction:
(parallel [
        (set (reg:DI 86)
            (lshiftrt:DI (mem/c:DI (reg/f:SI 16 argp) [1 x+0 S8 A32])
                (const_int 8 [0x8])))
        (clobber (reg:CC 17 flags))
    ])

And we would never combine into insn 12.


As for the C++ example, combine does:
Trying 14 -> 6:
   14: r85:SI=[argp:SI]
    6: r84:QI#0=zero_extract(r85:SI,0x8,0x8)
      REG_DEAD r85:SI
Successfully matched this instruction:
(set (subreg:SI (reg:QI 84) 0)
    (zero_extend:SI (mem/c:QI (plus:SI (reg/f:SI 16 argp)
                (const_int 1 [0x1])) [1 x+1 S1 A8])))

But combine never tries to move the subreg from the lhs to the rhs of the set.
So we still have an issue for the C++ example with memcpy.
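
To make the first successful match above (6 -> 7) more concrete, here is a minimal C++ sketch of why truncating the 64-bit shift to QI only needs the low 32-bit half of x. The helper names (zero_extract_si, shift_then_truncate, extract_from_low_half) are hypothetical and only model the RTL semantics as I read them for x86, where bit positions count from the least significant bit; this is not GCC code.

```cpp
#include <cstdint>

// Hypothetical model of (zero_extract:SI value size pos): take `size` bits
// of `value` starting at bit `pos`, counting from the least significant bit.
// `size` is assumed to be in 1..31 so the mask shift is well defined.
constexpr std::uint32_t zero_extract_si(std::uint32_t value, unsigned size, unsigned pos)
{
    return (value >> pos) & ((std::uint32_t{1} << size) - 1u);
}

// Insns 6 and 7 as written: 64-bit logical shift right by 8, keep the low byte.
constexpr std::uint8_t shift_then_truncate(std::uint64_t x)
{
    return static_cast<std::uint8_t>(x >> 8);
}

// The matched form: bits 8..15 extracted from the low 32-bit half of x only.
constexpr std::uint8_t extract_from_low_half(std::uint64_t x)
{
    return static_cast<std::uint8_t>(
        zero_extract_si(static_cast<std::uint32_t>(x), 8, 8));
}

// Both forms select byte 1 of the value.
static_assert(shift_then_truncate(0x1122334455667788ULL) == 0x77, "byte 1");
static_assert(extract_from_low_half(0x1122334455667788ULL) == 0x77, "byte 1");
```

The two forms agree because bits 8..15 lie entirely inside the low half; the same argument would not hold for a shift count that reaches into the upper 32 bits.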


* [Bug rtl-optimization/34072] unoptimal byte extraction.
       [not found] <bug-34072-4@http.gcc.gnu.org/bugzilla/>
  2021-08-02 21:08 ` [Bug rtl-optimization/34072] unoptimal byte extraction pinskia at gcc dot gnu.org
@ 2023-08-05 21:39 ` pinskia at gcc dot gnu.org
  1 sibling, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-08-05 21:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=34072

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #4)
> So we still have an issue for the C++ example with memcpy.

The code did improve from GCC 6 to GCC 7, though.
From:
        .cfi_startproc
        subl    $12, %esp
        .cfi_def_cfa_offset 16
        movzwl  16(%esp), %eax
        addl    $12, %esp
        .cfi_def_cfa_offset 4
        shrw    $8, %ax
        ret

To:
        movl    4(%esp), %eax
        movzbl  %ah, %eax

So it is definitely already much better than when this was reported.
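
As a rough illustration that the two sequences compute the same byte, here is a small C++ model of each. It assumes the loaded stack slot holds the low part of x (x86 is little-endian) and ignores the stack adjustment and CFI directives; the function names gcc6_sequence and gcc7_sequence are hypothetical labels, not anything emitted by GCC.

```cpp
#include <cstdint>

// Model of the GCC 6 output: movzwl <x>, %eax ; shrw $8, %ax
inline std::uint32_t gcc6_sequence(std::uint64_t x)
{
    // movzwl: load the low 16 bits of x, zero-extended to 32 bits.
    std::uint32_t eax = static_cast<std::uint16_t>(x);
    // shrw $8, %ax: shift only the low 16 bits, leave the upper 16 untouched.
    eax = (eax & 0xFFFF0000u) | ((eax & 0x0000FFFFu) >> 8);
    return eax;
}

// Model of the GCC 7 output: movl <x>, %eax ; movzbl %ah, %eax
inline std::uint32_t gcc7_sequence(std::uint64_t x)
{
    // movl: load the low 32 bits of x.
    std::uint32_t eax = static_cast<std::uint32_t>(x);
    // movzbl %ah, %eax: zero-extend bits 8..15 of eax into the whole register.
    eax = (eax >> 8) & 0xFFu;
    return eax;
}
```

Both models return byte 1 of x in the full register, so the GCC 7 code is shorter without changing the result. A single movzbl from offset 1 would still be one instruction fewer.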


* [Bug rtl-optimization/34072] unoptimal byte extraction.
  2007-11-12 14:50 [Bug target/34072] New: " pluto at agmk dot net
  2007-11-14  1:44 ` [Bug rtl-optimization/34072] " rask at gcc dot gnu dot org
@ 2007-11-14 11:15 ` pluto at agmk dot net
  1 sibling, 0 replies; 4+ messages in thread
From: pluto at agmk dot net @ 2007-11-14 11:15 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from pluto at agmk dot net  2007-11-14 11:14 -------
And the C++ testcase with __builtin_memcpy:

template < int N >
unsigned char memcpy_byte( unsigned long long x )
{
    unsigned char rv;
    __builtin_memcpy( &rv, N + reinterpret_cast< unsigned char* >( &x ),
        sizeof( rv ) );
    return rv;
}

template unsigned char memcpy_byte< 0 >( unsigned long long );
template unsigned char memcpy_byte< 1 >( unsigned long long );
template unsigned char memcpy_byte< 6 >( unsigned long long );
template unsigned char memcpy_byte< 7 >( unsigned long long );

unsigned char memcpy_byte<0>(unsigned long long):
        subl    $28, %esp
        movzbl  32(%esp), %eax  # 13    *movqi_1/3      [length = 5]
        addl    $28, %esp
        ret

unsigned char memcpy_byte<1>(unsigned long long):
        subl    $28, %esp
        movzwl  32(%esp), %eax  # 34    *movhi_1/3      [length = 5]
        addl    $28, %esp
        shrw    $8, %ax         # 14    *lshrhi3_1/1    [length = 4]
        ret

unsigned char memcpy_byte<6>(unsigned long long):
        subl    $28, %esp
        movzbl  38(%esp), %eax  # 14    *movqi_1/3      [length = 5]
        addl    $28, %esp
        ret

unsigned char memcpy_byte<7>(unsigned long long):
        subl    $28, %esp
        movzbl  39(%esp), %eax  # 14    *movqi_1/3      [length = 5]
        addl    $28, %esp
        ret


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34072



* [Bug rtl-optimization/34072] unoptimal byte extraction.
  2007-11-12 14:50 [Bug target/34072] New: " pluto at agmk dot net
@ 2007-11-14  1:44 ` rask at gcc dot gnu dot org
  2007-11-14 11:15 ` pluto at agmk dot net
  1 sibling, 0 replies; 4+ messages in thread
From: rask at gcc dot gnu dot org @ 2007-11-14  1:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rask at gcc dot gnu dot org  2007-11-14 01:44 -------
With -S -dp it is clear that only byte0 is optimized:

byte0:
        movzbl  4(%esp), %eax   # 11    *movqi_1/3
byte1:
        movl    4(%esp), %eax   # 24    *movsi_1/1
        movl    8(%esp), %edx   # 25    *movsi_1/1
        shrdl   $8, %edx, %eax  # 30    x86_shrd_1/1
byte6:
        movzwl  10(%esp), %eax  # 24    *zero_extendhisi2_movzwl
byte7:
        movzbl  11(%esp), %eax  # 28    *zero_extendqisi2_movzbw

They should all be optimized to use movqi. The first part of the problem is
that any of cse, cse2, gcse and fwprop will combine these instructions

(insn 7 6 8 2 /tmp/pr34072.c:3 (set (reg:QI 60)
        (subreg:QI (reg:SI 64) 0)) 62 {*movqi_1} (nil))

(insn 8 7 12 2 /tmp/pr34072.c:3 (set (reg:QI 58 [ <result> ])
        (reg:QI 60)) 62 {*movqi_1} (nil))

(insn 12 8 18 2 /tmp/pr34072.c:3 (set (reg/i:QI 0 ax)
        (reg:QI 58 [ <result> ])) 62 {*movqi_1} (nil))

into

(insn 12 8 18 2 /tmp/pr34072.c:3 (set (reg/i:QI 0 ax [ <result> ])
        (subreg:QI (reg:SI 64) 0)) 62 {*movqi_1} (nil))

and then combine won't touch it because of the hard register (ax) and
SMALL_REGISTER_CLASSES and/or CLASS_LIKELY_SPILLED. The fix is to teach
these passes to not combine these insns, as demonstrated using
-fno-forward-propagate -fno-gcse -fno-rerun-cse-after-loop -fno-cse[1]:

byte6:
        movzbl  10(%esp), %eax  # 8     *movqi_1/3
byte7:
        movzbl  11(%esp), %eax  # 8     *movqi_1/3

Byte1 is still not optimized because we're failing to simplify this
instruction in combine:

(set (reg:QI 60)
    (subreg:QI (lshiftrt:DI (mem/c/i:DI (reg/f:SI 16 argp) [2 x+0 S8 A32])
            (const_int 8 [0x8])) 0))

It should be entirely possible to simplify it to this:

(set (reg:QI 60) (mem/c/i:QI (plus:SI (reg/f:SI 16 argp) (const_int 1))))

[1] An option I hacked in to debug this problem.
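
The proposed simplification for byte1 can be sketched at the source level as follows. This is only an illustration of the equivalence on a little-endian target, not the original testcase; the names is_little_endian, byte1_by_shift and byte1_by_load are made up for this example.

```cpp
#include <cstdint>
#include <cstring>

// True when the lowest-addressed byte of an integer is its least significant
// byte, which is the case on x86.
inline bool is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// The form combine fails to simplify: 64-bit value, logical shift right by 8,
// then keep the low byte (subreg:QI of lshiftrt:DI).
inline std::uint8_t byte1_by_shift(std::uint64_t x)
{
    return static_cast<std::uint8_t>(x >> 8);
}

// The form it could be simplified to: a single byte load from address + 1.
inline std::uint8_t byte1_by_load(std::uint64_t x)
{
    std::uint8_t b;
    std::memcpy(&b, reinterpret_cast<const unsigned char*>(&x) + 1, 1);
    return b;
}
```

On a little-endian target byte1_by_shift(x) and byte1_by_load(x) return the same value for every x, which is why the QImode load from argp + 1 is a valid replacement. On a big-endian target the offset would differ, so the check is guarded by is_little_endian().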


-- 

rask at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
          Component|target                      |rtl-optimization
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization
      Known to fail|                            |4.3.0
   Last reconfirmed|0000-00-00 00:00:00         |2007-11-14 01:44:03
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34072

