public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
Date: Wed, 12 Apr 2023 12:21:45 +0000	[thread overview]
Message-ID: <bug-109476-4-LRxsIIvFAz@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-109476-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |rtl-optimization
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-04-12
             Status|UNCONFIRMED                 |NEW
                 CC|                            |iant at google dot com,
                   |                            |law at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
In the good case (with + 1) combine succeeds:

Trying 9 -> 11:
    9: r55:HI=zero_extend(r54:QI)
      REG_DEAD r54:QI
   11: r52:HI=r55:HI*r56:HI
      REG_DEAD r56:HI
      REG_DEAD r55:HI
Successfully matched this instruction:
(set (reg:HI 52)
    (mult:HI (zero_extend:HI (reg:QI 54))
        (reg:HI 56 [ a ])))
allowing combination of insns 9 and 11
original costs 4 + 28 = 32
replacement cost 20
deferring deletion of insn with uid = 9.
modifying insn i3    11: r52:HI=zero_extend(r54:QI)*r56:HI
      REG_DEAD r54:QI
      REG_DEAD r56:HI
deferring rescan insn with uid = 11.

and then

Trying 10 -> 11:
   10: r56:HI=zero_extend(r61:QI)
      REG_DEAD r61:QI
   11: r52:HI=zero_extend(r54:QI)*r56:HI
      REG_DEAD r54:QI
      REG_DEAD r56:HI
Successfully matched this instruction:
(set (reg:HI 52)
    (mult:HI (zero_extend:HI (reg:QI 54))
        (zero_extend:HI (reg:QI 61))))
allowing combination of insns 10 and 11
original costs 4 + 20 = 24
replacement cost 12
deferring deletion of insn with uid = 10.
modifying insn i3    11: r52:HI=zero_extend(r54:QI)*zero_extend(r61:QI)
      REG_DEAD r61:QI
      REG_DEAD r54:QI
deferring rescan insn with uid = 11.


in the bad case instead

Trying 8 -> 9:
    8: r52:HI=zero_extend(r55:QI)
      REG_DEAD r55:QI
    9: r50:HI=r51:HI*r52:HI
      REG_DEAD r52:HI
      REG_DEAD r51:HI
Successfully matched this instruction:
(set (reg:HI 50)
    (mult:HI (zero_extend:HI (reg:QI 55))
        (reg:HI 51 [ b+1 ])))
allowing combination of insns 8 and 9
original costs 4 + 28 = 32
replacement cost 20
deferring deletion of insn with uid = 8.
modifying insn i3     9: r50:HI=zero_extend(r55:QI)*r51:HI
      REG_DEAD r55:QI
      REG_DEAD r51:HI
deferring rescan insn with uid = 9.

Trying 20 -> 9:
   20: r51:HI#1=0
    9: r50:HI=zero_extend(r55:QI)*r51:HI
      REG_DEAD r55:QI
      REG_DEAD r51:HI
Can't combine i2 into i3

that's because the RTL into combine in the bad case is

(insn 19 22 20 2 (set (subreg:QI (reg:HI 51 [ b+1 ]) 0)
        (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
     (expr_list:REG_DEAD (reg:QI 54 [ b+1 ])
        (nil)))
(insn 20 19 8 2 (set (subreg:QI (reg:HI 51 [ b+1 ]) 1)
        (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))
(insn 8 20 9 2 (set (reg:HI 52 [ a ])
        (zero_extend:HI (reg/v:QI 48 [ a ]))) "t.ii":4:49 635
{zero_extendqihi2}
     (expr_list:REG_DEAD (reg/v:QI 48 [ a ])
        (nil)))
(insn 9 8 14 2 (set (reg:HI 50)
        (mult:HI (reg:HI 51 [ b+1 ])
            (reg:HI 52 [ a ]))) "t.ii":4:47 328 {*mulhi3_enh_split}
     (expr_list:REG_DEAD (reg:HI 52 [ a ])
        (expr_list:REG_DEAD (reg:HI 51 [ b+1 ])
            (nil))))

so the 'b' operand of the multiplication is now not a zero_extend:HI
but instead a two instruction set.  The first subreg pass produces
this IL, turning

(insn 3 2 4 2 (set (reg/v:HI 49 [ b ])
        (reg:HI 22 r22 [ b ])) "t.ii":3:49 101 {*movhi_split}
     (nil))
(insn 7 4 8 2 (set (reg:HI 51)
        (lshiftrt:HI (reg/v:HI 49 [ b ])
            (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
     (nil))

into

(insn 17 2 18 2 (set (reg:QI 53 [ b ])
        (reg:QI 22 r22 [ b ])) "t.ii":3:49 86 {movqi_insn_split}
     (nil))
(insn 18 17 4 2 (set (reg:QI 54 [ b+1 ])
        (reg:QI 23 r23 [ b+1 ])) "t.ii":3:49 86 {movqi_insn_split}
     (nil))
(insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
        (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))
(insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
        (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))

with -fno-split-wide-types the rotate doesn't get a zero_extend so the
multiplication pattern doesn't match either.

I think that possibly the lower subreg pass should more optimally
handle the situation, creating

(insn ... (set (zero_extend:HI (reg:QI 54 [ b + 1])))

here.  I'm quite sure combine/forwprop cannot combine the seemingly
unrelated subreg sets.  resolve_shift_zext seems to be supposed to
handle this and it receives

(insn 7 4 8 2 (set (reg:HI 51)
        (lshiftrt:HI (concatn/v:HI [
                    (reg:QI 53 [ b ])
                    (reg:QI 54 [ b+1 ])
                ])
            (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
     (nil))

maybe for GET_CODE (op) != ASHIFTRT && offset1 == 0 && shift_count <=
BITS_PER_WORD this can be directly emitted as zero_extend (if supported
by the target).

  parent reply	other threads:[~2023-04-12 12:21 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-11 19:19 [Bug c++/109476] New: " klaus.doldinger64 at googlemail dot com
2023-04-11 19:27 ` [Bug target/109476] " klaus.doldinger64 at googlemail dot com
2023-04-12 11:14 ` rguenth at gcc dot gnu.org
2023-04-12 11:33 ` klaus.doldinger64 at googlemail dot com
2023-04-12 12:21 ` rguenth at gcc dot gnu.org [this message]
2023-04-12 15:43 ` [Bug rtl-optimization/109476] " segher at gcc dot gnu.org
2023-04-12 16:48 ` klaus.doldinger64 at googlemail dot com
2023-04-12 18:43 ` roger at nextmovesoftware dot com
2023-04-12 20:20 ` klaus.doldinger64 at googlemail dot com
2023-04-12 20:56 ` segher at gcc dot gnu.org
2023-04-13  7:01 ` rguenth at gcc dot gnu.org
2023-04-13 12:55 ` klaus.doldinger64 at googlemail dot com
2023-04-13 15:19 ` segher at gcc dot gnu.org
2023-04-13 17:57 ` klaus.doldinger64 at googlemail dot com
2023-04-13 21:49 ` roger at nextmovesoftware dot com
2023-04-14  7:42 ` klaus.doldinger64 at googlemail dot com
2023-04-14  7:46 ` klaus.doldinger64 at googlemail dot com
2023-04-23 20:22 ` roger at nextmovesoftware dot com
2023-04-28 13:22 ` cvs-commit at gcc dot gnu.org
2023-05-07  8:30 ` roger at nextmovesoftware dot com

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-109476-4-LRxsIIvFAz@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).