public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression
@ 2023-04-11 19:19 klaus.doldinger64 at googlemail dot com
  2023-04-11 19:27 ` [Bug target/109476] " klaus.doldinger64 at googlemail dot com
                   ` (18 more replies)
  0 siblings, 19 replies; 20+ messages in thread
From: klaus.doldinger64 at googlemail dot com @ 2023-04-11 19:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

            Bug ID: 109476
           Summary: Missing optimization for 8bit/8bit multiplication /
                    regression
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: klaus.doldinger64 at googlemail dot com
  Target Milestone: ---

For avr-gcc > 4.6.4 the follwing function

uint16_t mul(const uint8_t a, const uint16_t b) {
    return static_cast<uint8_t>((b >> 8) + 0) * a ;
}

produces suboptimal

mul(unsigned char, unsigned int):
        mov r18,r23
        ldi r19,0
        mov r20,r24
        mul r20,r18
        movw r24,r0
        mul r20,r19
        add r25,r0
        clr __zero_reg__
        ret

whereas avr-gcc 4.6.4 produces optimal

mul(unsigned char, unsigned int):
        mul r23,r24
        movw r24,r0
        clr r1
        ret

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
@ 2023-04-11 19:27 ` klaus.doldinger64 at googlemail dot com
  2023-04-12 11:14 ` rguenth at gcc dot gnu.org
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: klaus.doldinger64 at googlemail dot com @ 2023-04-11 19:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #1 from Wilhelm M <klaus.doldinger64 at googlemail dot com> ---
Inetristingly changing the function to 


uint16_t mul(const uint8_t a, const uint16_t b) {
    return static_cast<uint8_t>((b >> 8) + 1) * a ;
}

produces optimal

mul(unsigned char, unsigned int):
        subi r23,lo8(-(1))
        mul r23,r24
        movw r24,r0
        clr __zero_reg__
        ret

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
  2023-04-11 19:27 ` [Bug target/109476] " klaus.doldinger64 at googlemail dot com
@ 2023-04-12 11:14 ` rguenth at gcc dot gnu.org
  2023-04-12 11:33 ` klaus.doldinger64 at googlemail dot com
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-04-12 11:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I get

_Z3mulhj:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
        mov r22,r24
        mov r24,r23
        ldi r23,0
        ldi r25,0
        rcall __mulhi3

but probably because you didn't specify the actual command to compile and
I'm using the wrong subarchitecture.

What probably regressed is that we're promoting the multiplication to
'int'.  The same happens for the + 1 variant for me though.

Can you share the -march required to have 16bit multiplication?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
  2023-04-11 19:27 ` [Bug target/109476] " klaus.doldinger64 at googlemail dot com
  2023-04-12 11:14 ` rguenth at gcc dot gnu.org
@ 2023-04-12 11:33 ` klaus.doldinger64 at googlemail dot com
  2023-04-12 12:21 ` [Bug rtl-optimization/109476] " rguenth at gcc dot gnu.org
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: klaus.doldinger64 at googlemail dot com @ 2023-04-12 11:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #3 from Wilhelm M <klaus.doldinger64 at googlemail dot com> ---
Sorry, I forgot to mention the flags: -Os -mmcu=atmega328 

Maybe CompilerExplorer ist also usefull:

https://godbolt.org/z/zsq6PT1xb

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
                   ` (2 preceding siblings ...)
  2023-04-12 11:33 ` klaus.doldinger64 at googlemail dot com
@ 2023-04-12 12:21 ` rguenth at gcc dot gnu.org
  2023-04-12 15:43 ` segher at gcc dot gnu.org
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-04-12 12:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |rtl-optimization
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-04-12
             Status|UNCONFIRMED                 |NEW
                 CC|                            |iant at google dot com,
                   |                            |law at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
In the good case (with + 1) combine succeeds:

Trying 9 -> 11:
    9: r55:HI=zero_extend(r54:QI)
      REG_DEAD r54:QI
   11: r52:HI=r55:HI*r56:HI
      REG_DEAD r56:HI
      REG_DEAD r55:HI
Successfully matched this instruction:
(set (reg:HI 52)
    (mult:HI (zero_extend:HI (reg:QI 54))
        (reg:HI 56 [ a ])))
allowing combination of insns 9 and 11
original costs 4 + 28 = 32
replacement cost 20
deferring deletion of insn with uid = 9.
modifying insn i3    11: r52:HI=zero_extend(r54:QI)*r56:HI
      REG_DEAD r54:QI
      REG_DEAD r56:HI
deferring rescan insn with uid = 11.

and then

Trying 10 -> 11:
   10: r56:HI=zero_extend(r61:QI)
      REG_DEAD r61:QI
   11: r52:HI=zero_extend(r54:QI)*r56:HI
      REG_DEAD r54:QI
      REG_DEAD r56:HI
Successfully matched this instruction:
(set (reg:HI 52)
    (mult:HI (zero_extend:HI (reg:QI 54))
        (zero_extend:HI (reg:QI 61))))
allowing combination of insns 10 and 11
original costs 4 + 20 = 24
replacement cost 12
deferring deletion of insn with uid = 10.
modifying insn i3    11: r52:HI=zero_extend(r54:QI)*zero_extend(r61:QI)
      REG_DEAD r61:QI
      REG_DEAD r54:QI
deferring rescan insn with uid = 11.


in the bad case instead

Trying 8 -> 9:
    8: r52:HI=zero_extend(r55:QI)
      REG_DEAD r55:QI
    9: r50:HI=r51:HI*r52:HI
      REG_DEAD r52:HI
      REG_DEAD r51:HI
Successfully matched this instruction:
(set (reg:HI 50)
    (mult:HI (zero_extend:HI (reg:QI 55))
        (reg:HI 51 [ b+1 ])))
allowing combination of insns 8 and 9
original costs 4 + 28 = 32
replacement cost 20
deferring deletion of insn with uid = 8.
modifying insn i3     9: r50:HI=zero_extend(r55:QI)*r51:HI
      REG_DEAD r55:QI
      REG_DEAD r51:HI
deferring rescan insn with uid = 9.

Trying 20 -> 9:
   20: r51:HI#1=0
    9: r50:HI=zero_extend(r55:QI)*r51:HI
      REG_DEAD r55:QI
      REG_DEAD r51:HI
Can't combine i2 into i3

that's because the RTL into combine in the bad case is

(insn 19 22 20 2 (set (subreg:QI (reg:HI 51 [ b+1 ]) 0)
        (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
     (expr_list:REG_DEAD (reg:QI 54 [ b+1 ])
        (nil)))
(insn 20 19 8 2 (set (subreg:QI (reg:HI 51 [ b+1 ]) 1)
        (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))
(insn 8 20 9 2 (set (reg:HI 52 [ a ])
        (zero_extend:HI (reg/v:QI 48 [ a ]))) "t.ii":4:49 635
{zero_extendqihi2}
     (expr_list:REG_DEAD (reg/v:QI 48 [ a ])
        (nil)))
(insn 9 8 14 2 (set (reg:HI 50)
        (mult:HI (reg:HI 51 [ b+1 ])
            (reg:HI 52 [ a ]))) "t.ii":4:47 328 {*mulhi3_enh_split}
     (expr_list:REG_DEAD (reg:HI 52 [ a ])
        (expr_list:REG_DEAD (reg:HI 51 [ b+1 ])
            (nil))))

so the 'b' operand of the multiplication is now not a zero_extend:HI
but instead a two instruction set.  The first subreg pass produces
this IL, turning

(insn 3 2 4 2 (set (reg/v:HI 49 [ b ])
        (reg:HI 22 r22 [ b ])) "t.ii":3:49 101 {*movhi_split}
     (nil))
(insn 7 4 8 2 (set (reg:HI 51)
        (lshiftrt:HI (reg/v:HI 49 [ b ])
            (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
     (nil))

into

(insn 17 2 18 2 (set (reg:QI 53 [ b ])
        (reg:QI 22 r22 [ b ])) "t.ii":3:49 86 {movqi_insn_split}
     (nil))
(insn 18 17 4 2 (set (reg:QI 54 [ b+1 ])
        (reg:QI 23 r23 [ b+1 ])) "t.ii":3:49 86 {movqi_insn_split}
     (nil))
(insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
        (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))
(insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
        (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))

with -fno-split-wide-types the rotate doesn't get a zero_extend so the
multiplication pattern doesn't match either.

I think that possibly the lower subreg pass should more optimally
handle the situation, creating

(insn ... (set (zero_extend:HI (reg:QI 54 [ b + 1])))

here.  I'm quite sure combine/forwprop cannot combine the seemingly
unrelated subreg sets.  resolve_shift_zext seems to be supposed to
handle this and it receives

(insn 7 4 8 2 (set (reg:HI 51)
        (lshiftrt:HI (concatn/v:HI [
                    (reg:QI 53 [ b ])
                    (reg:QI 54 [ b+1 ])
                ])
            (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
     (nil))

maybe for GET_CODE (op) != ASHIFTRT && offset1 == 0 && shift_count <=
BITS_PER_WORD this can be directly emitted as zero_extend (if supported
by the target).

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
                   ` (3 preceding siblings ...)
  2023-04-12 12:21 ` [Bug rtl-optimization/109476] " rguenth at gcc dot gnu.org
@ 2023-04-12 15:43 ` segher at gcc dot gnu.org
  2023-04-12 16:48 ` klaus.doldinger64 at googlemail dot com
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: segher at gcc dot gnu.org @ 2023-04-12 15:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #5 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Correct, this certainly can not be done by combine, it see two independent
pseudos here.  For hard registers it *can* do many tricks, but not for
pseudos like this.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
                   ` (4 preceding siblings ...)
  2023-04-12 15:43 ` segher at gcc dot gnu.org
@ 2023-04-12 16:48 ` klaus.doldinger64 at googlemail dot com
  2023-04-12 18:43 ` roger at nextmovesoftware dot com
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: klaus.doldinger64 at googlemail dot com @ 2023-04-12 16:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #6 from Wilhelm M <klaus.doldinger64 at googlemail dot com> ---
The following "solution"

constexpr uint16_t mul(const uint8_t a, const uint16_t b) {
    const auto aa = std::bit_cast<std::array<uint8_t, 2>>(b);
    return aa[1] * a;
}

or 

constexpr uint16_t mul(const uint8_t a, const uint16_t b) {
    uint8_t aa[2];
    std::memcpy(aa, &b, 2);
    return aa[1] * a;
}

gives optimal result ;-)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
                   ` (5 preceding siblings ...)
  2023-04-12 16:48 ` klaus.doldinger64 at googlemail dot com
@ 2023-04-12 18:43 ` roger at nextmovesoftware dot com
  2023-04-12 20:20 ` klaus.doldinger64 at googlemail dot com
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-04-12 18:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #7 from Roger Sayle <roger at nextmovesoftware dot com> ---
Created attachment 54843
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54843&action=edit
proposed patch

Proposed patch.  Does this look reasonable?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
                   ` (6 preceding siblings ...)
  2023-04-12 18:43 ` roger at nextmovesoftware dot com
@ 2023-04-12 20:20 ` klaus.doldinger64 at googlemail dot com
  2023-04-12 20:56 ` segher at gcc dot gnu.org
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: klaus.doldinger64 at googlemail dot com @ 2023-04-12 20:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #8 from Wilhelm M <klaus.doldinger64 at googlemail dot com> ---
Yes. Looks like the patch does its job.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
                   ` (7 preceding siblings ...)
  2023-04-12 20:20 ` klaus.doldinger64 at googlemail dot com
@ 2023-04-12 20:56 ` segher at gcc dot gnu.org
  2023-04-13  7:01 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: segher at gcc dot gnu.org @ 2023-04-12 20:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #9 from Segher Boessenkool <segher at gcc dot gnu.org> ---
That patch looks fine :-)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
                   ` (8 preceding siblings ...)
  2023-04-12 20:56 ` segher at gcc dot gnu.org
@ 2023-04-13  7:01 ` rguenth at gcc dot gnu.org
  2023-04-13 12:55 ` klaus.doldinger64 at googlemail dot com
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-04-13  7:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Roger Sayle from comment #7)
> Created attachment 54843 [details]
> proposed patch
> 
> Proposed patch.  Does this look reasonable?

Yes, but doesn't this work for all GET_CODE (op) != ASHIFTRT?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
                   ` (9 preceding siblings ...)
  2023-04-13  7:01 ` rguenth at gcc dot gnu.org
@ 2023-04-13 12:55 ` klaus.doldinger64 at googlemail dot com
  2023-04-13 15:19 ` segher at gcc dot gnu.org
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: klaus.doldinger64 at googlemail dot com @ 2023-04-13 12:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #11 from Wilhelm M <klaus.doldinger64 at googlemail dot com> ---
After testing some more code, I got this ICE:

/home/lmeier/Projekte/wmucpp/boards/rcfoc/gimbal_sbus_01.cc: In static member
function 'static void GlobalFsm<BusDevs>::ratePeriodic() [with BusDevs
=BusDevs<External::Bus::NoBus<Devices<0> > >]':
/home/lmeier/Projekte/wmucpp/boards/rcfoc/gimbal_sbus_01.cc:426:5: error:
unrecognizable insn:
426 |     }
|     ^
(insn 1584 1583 25 3 (set (concatn:HI [
(reg:QI 800)
(reg:QI 801 [+1 ])
])
(reg:HI 826)) "../../include0/external/sbus/sbus.h":226:69 -1
(nil))
during RTL pass: subreg1
/home/lmeier/Projekte/wmucpp/boards/rcfoc/gimbal_sbus_01.cc:426:5: internal
compiler error: in extract_insn, at recog.cc:2791
0x79f2cb _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
../../gcc/rtl-error.cc:108
0x79f2e7 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../../gcc/rtl-error.cc:116
0x79dc77 extract_insn(rtx_insn*)
../../gcc/recog.cc:2791
0x1a43d91 decompose_multiword_subregs
../../gcc/lower-subreg.cc:1651
0x1a447ca execute
../../gcc/lower-subreg.cc:1790
Please submit a full bug report, with preprocessed source (by using
-freport-bug).

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
                   ` (10 preceding siblings ...)
  2023-04-13 12:55 ` klaus.doldinger64 at googlemail dot com
@ 2023-04-13 15:19 ` segher at gcc dot gnu.org
  2023-04-13 17:57 ` klaus.doldinger64 at googlemail dot com
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: segher at gcc dot gnu.org @ 2023-04-13 15:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #12 from Segher Boessenkool <segher at gcc dot gnu.org> ---
With the modified compiler?  Does it ICE with an unmodified compiler as well?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
                   ` (11 preceding siblings ...)
  2023-04-13 15:19 ` segher at gcc dot gnu.org
@ 2023-04-13 17:57 ` klaus.doldinger64 at googlemail dot com
  2023-04-13 21:49 ` roger at nextmovesoftware dot com
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: klaus.doldinger64 at googlemail dot com @ 2023-04-13 17:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #13 from Wilhelm M <klaus.doldinger64 at googlemail dot com> ---
Yes, the ICE is with the patch. I'm pretty sure that this does not happen
without the patch, but I will that check again.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
                   ` (12 preceding siblings ...)
  2023-04-13 17:57 ` klaus.doldinger64 at googlemail dot com
@ 2023-04-13 21:49 ` roger at nextmovesoftware dot com
  2023-04-14  7:42 ` klaus.doldinger64 at googlemail dot com
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-04-13 21:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #14 from Roger Sayle <roger at nextmovesoftware dot com> ---
My apologies for the delay/issues.  My bootstrap and regression testing of this
patch (on x86_64-pc-linux-gnu) revealed an issue or two (including the reported
ICE).  My plan was to fix/resolve all these before posting a concrete
submission to gcc-patches.  The general approach is solid (thanks to everyone
that agreed this was the correct place to fix things) but it's the corner cases
(such as RTL sharing) that all need to be addressed prior to a reviewable
submission.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
                   ` (13 preceding siblings ...)
  2023-04-13 21:49 ` roger at nextmovesoftware dot com
@ 2023-04-14  7:42 ` klaus.doldinger64 at googlemail dot com
  2023-04-14  7:46 ` klaus.doldinger64 at googlemail dot com
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: klaus.doldinger64 at googlemail dot com @ 2023-04-14  7:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #15 from Wilhelm M <klaus.doldinger64 at googlemail dot com> ---
Just checked actual gcc 13.0.1 without the patch: then no ICE accurs.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
                   ` (14 preceding siblings ...)
  2023-04-14  7:42 ` klaus.doldinger64 at googlemail dot com
@ 2023-04-14  7:46 ` klaus.doldinger64 at googlemail dot com
  2023-04-23 20:22 ` roger at nextmovesoftware dot com
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: klaus.doldinger64 at googlemail dot com @ 2023-04-14  7:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #16 from Wilhelm M <klaus.doldinger64 at googlemail dot com> ---
(In reply to Roger Sayle from comment #14)
> My apologies for the delay/issues.  My bootstrap and regression testing of
> this patch (on x86_64-pc-linux-gnu) revealed an issue or two (including the
> reported ICE).  My plan was to fix/resolve all these before posting a
> concrete submission to gcc-patches.  

We all appreciate your great effort in this case! Please don't hesitate to send
here some patches to test with. I'll be happy to test your patches!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
                   ` (15 preceding siblings ...)
  2023-04-14  7:46 ` klaus.doldinger64 at googlemail dot com
@ 2023-04-23 20:22 ` roger at nextmovesoftware dot com
  2023-04-28 13:22 ` cvs-commit at gcc dot gnu.org
  2023-05-07  8:30 ` roger at nextmovesoftware dot com
  18 siblings, 0 replies; 20+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-04-23 20:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |roger at nextmovesoftware dot com

--- Comment #17 from Roger Sayle <roger at nextmovesoftware dot com> ---
I've submitted an improved version of my patch for review:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616527.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
                   ` (16 preceding siblings ...)
  2023-04-23 20:22 ` roger at nextmovesoftware dot com
@ 2023-04-28 13:22 ` cvs-commit at gcc dot gnu.org
  2023-05-07  8:30 ` roger at nextmovesoftware dot com
  18 siblings, 0 replies; 20+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-04-28 13:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #18 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:650c36ec461a722d9c65e82512b4c3aeec2ffee1

commit r14-335-g650c36ec461a722d9c65e82512b4c3aeec2ffee1
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Fri Apr 28 14:21:53 2023 +0100

    PR rtl-optimization/109476: Use ZERO_EXTEND instead of zeroing a SUBREG.

    This patch fixes PR rtl-optimization/109476, which is a code quality
    regression affecting AVR.  The cause is that the lower-subreg pass is
    sometimes overly aggressive, lowering the LSHIFTRT below:

    (insn 7 4 8 2 (set (reg:HI 51)
            (lshiftrt:HI (reg/v:HI 49 [ b ])
                (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
         (nil))

    into a pair of QImode SUBREG assignments:

    (insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
            (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
         (nil))
    (insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
            (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
         (nil))

    but this idiom, SETs of SUBREGs, interferes with combine's ability
    to associate/fuse instructions.  The solution, on targets that
    have a suitable ZERO_EXTEND (i.e. where the lower-subreg pass
    wouldn't itself split a ZERO_EXTEND, so "splitting_zext" is false),
    is to split/lower LSHIFTRT to a ZERO_EXTEND.

    To answer Richard's question in comment #10 of the bugzilla PR,
    the function resolve_shift_zext is called with one of four RTX
    codes, ASHIFTRT, LSHIFTRT, ZERO_EXTEND and ASHIFT, but only with
    LSHIFTRT can the setting of low_part and high_part SUBREGs be
    replaced by a ZERO_EXTEND.  For ASHIFTRT, we require a sign
    extension, so don't set the high_part to zero; if we're splitting
    a ZERO_EXTEND then it doesn't make sense to replace it with a
    ZERO_EXTEND, and for ASHIFT we've played games to swap the
    high_part and low_part SUBREGs, so that we assign the low_part
    to zero (for double word shifts by greater than word size bits).

    2023-04-28  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            PR rtl-optimization/109476
            * lower-subreg.cc: Include explow.h for force_reg.
            (find_decomposable_shift_zext): Pass an additional SPEED_P
argument.
            If decomposing a suitable LSHIFTRT and we're not splitting
            ZERO_EXTEND (based on the current SPEED_P), then use a ZERO_EXTEND
            instead of setting a high part SUBREG to zero, which helps combine.
            (decompose_multiword_subregs): Update call to resolve_shift_zext.

    gcc/testsuite/ChangeLog
            PR rtl-optimization/109476
            * gcc.target/avr/mmcu/pr109476.c: New test case.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression
  2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
                   ` (17 preceding siblings ...)
  2023-04-28 13:22 ` cvs-commit at gcc dot gnu.org
@ 2023-05-07  8:30 ` roger at nextmovesoftware dot com
  18 siblings, 0 replies; 20+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-05-07  8:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
   Target Milestone|---                         |14.0
         Resolution|---                         |FIXED

--- Comment #19 from Roger Sayle <roger at nextmovesoftware dot com> ---
This should now be fixed on mainline.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-05-07  8:31 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-11 19:19 [Bug c++/109476] New: Missing optimization for 8bit/8bit multiplication / regression klaus.doldinger64 at googlemail dot com
2023-04-11 19:27 ` [Bug target/109476] " klaus.doldinger64 at googlemail dot com
2023-04-12 11:14 ` rguenth at gcc dot gnu.org
2023-04-12 11:33 ` klaus.doldinger64 at googlemail dot com
2023-04-12 12:21 ` [Bug rtl-optimization/109476] " rguenth at gcc dot gnu.org
2023-04-12 15:43 ` segher at gcc dot gnu.org
2023-04-12 16:48 ` klaus.doldinger64 at googlemail dot com
2023-04-12 18:43 ` roger at nextmovesoftware dot com
2023-04-12 20:20 ` klaus.doldinger64 at googlemail dot com
2023-04-12 20:56 ` segher at gcc dot gnu.org
2023-04-13  7:01 ` rguenth at gcc dot gnu.org
2023-04-13 12:55 ` klaus.doldinger64 at googlemail dot com
2023-04-13 15:19 ` segher at gcc dot gnu.org
2023-04-13 17:57 ` klaus.doldinger64 at googlemail dot com
2023-04-13 21:49 ` roger at nextmovesoftware dot com
2023-04-14  7:42 ` klaus.doldinger64 at googlemail dot com
2023-04-14  7:46 ` klaus.doldinger64 at googlemail dot com
2023-04-23 20:22 ` roger at nextmovesoftware dot com
2023-04-28 13:22 ` cvs-commit at gcc dot gnu.org
2023-05-07  8:30 ` roger at nextmovesoftware dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).