public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [committed] Convert xstormy16 to LRA
@ 2023-05-01 23:37 Roger Sayle
  2023-05-02 12:40 ` Paul Koning
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Roger Sayle @ 2023-05-01 23:37 UTC (permalink / raw)
  To: 'Jeff Law'; +Cc: 'GCC Patches', 'Segher Boessenkool'


Jeff Law wrote:
> This patch converts the xstormy16 patch to LRA.  It introduces a code 
> quality regression in the shiftsi testcase, but it also fixes numerous 
> aborts/errors.  IMHO it's a good tradeoff.

I've investigated the shiftsi regression on xstormy16 and the underlying
cause
appears to be an interaction between lower-subreg's "subreg3" pass and the
new LRA.  Previously, reload was not phased by the "clobbers" that are 
introduced by the decompose_multiword_subregs function, but they appear
to interfere with LRA's register assignments.

combine's make_extra_copies introduces a new pseudo-to-pseudo move,
but when subreg3 inserts a naked clobber between the original and the
new move, LRA is recombine theses pseudos back to the same allocno.

The shiftsi.cc regression on xstormy16 is fixed by adding
-fno-split-wide-types.
In fact, if all the regression tests pass, I'd suggest that
flag_split_wide-types = false
should be the default on xstormy16 now that we've moved to LRA.  And if this
works for xstormy16, it might be useful to other targets for the LRA
transition;
it's a difference in behaviour between reload and LRA that could potentially
affect multiple targets.

For reference, xstormy16 has a post-reload define_insn_and_split for movsi
(i.e. a multi-word move).  If this insn was split during split1 (i.e. before
subreg3)
there wouldn't be a problem (no clobber), but alas the target's
xstormy16_split_move
function has several asserts insisting this only get called when
reload_completed.

I hope this is useful.
Cheers,
Roger
--



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [committed] Convert xstormy16 to LRA
  2023-05-01 23:37 [committed] Convert xstormy16 to LRA Roger Sayle
@ 2023-05-02 12:40 ` Paul Koning
  2023-05-02 13:18   ` Roger Sayle
  2023-05-02 15:57 ` Jeff Law
  2023-05-11 15:05 ` Hans-Peter Nilsson
  2 siblings, 1 reply; 19+ messages in thread
From: Paul Koning @ 2023-05-02 12:40 UTC (permalink / raw)
  To: Roger Sayle; +Cc: Jeff Law, GCC Patches, Segher Boessenkool



> On May 1, 2023, at 7:37 PM, Roger Sayle <roger@nextmovesoftware.com> wrote:
> 
> ...
> The shiftsi.cc regression on xstormy16 is fixed by adding
> -fno-split-wide-types.
> In fact, if all the regression tests pass, I'd suggest that
> flag_split_wide-types = false
> should be the default on xstormy16 now that we've moved to LRA.  And if this
> works for xstormy16, it might be useful to other targets for the LRA
> transition;
> it's a difference in behaviour between reload and LRA that could potentially
> affect multiple targets.

Is there documentation for that flag?  

	paul


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [committed] Convert xstormy16 to LRA
  2023-05-02 12:40 ` Paul Koning
@ 2023-05-02 13:18   ` Roger Sayle
  2023-05-02 13:49     ` Segher Boessenkool
  2023-05-02 14:11     ` Paul Koning
  0 siblings, 2 replies; 19+ messages in thread
From: Roger Sayle @ 2023-05-02 13:18 UTC (permalink / raw)
  To: 'Paul Koning'
  Cc: 'Jeff Law', 'GCC Patches', 'Segher Boessenkool'


On 02 May 2023 13:40, Paul Koning wrote:
> > On May 1, 2023, at 7:37 PM, Roger Sayle <roger@nextmovesoftware.com>
> wrote:
> >
> > ...
> > The shiftsi.cc regression on xstormy16 is fixed by adding
> > -fno-split-wide-types.
> > In fact, if all the regression tests pass, I'd suggest that
> > flag_split_wide-types = false should be the default on xstormy16 now
> > that we've moved to LRA.  And if this works for xstormy16, it might be
> > useful to other targets for the LRA transition; it's a difference in
> > behaviour between reload and LRA that could potentially affect
> > multiple targets.
> 
> Is there documentation for that flag?

Yes, see the section -fsplit-wide-types in
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

Interestingly, there's a recent-ish blog describing how
-fno-split-wide-types
reduces executable size on AVR:
https://ufj.ddns.net/blog/marlin/2019/01/07/reducing-marlin-binary-size.html
and its interaction with (AVR) register allocation is seen in PR
middle-end/35860.

Cheers,
Roger
--



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [committed] Convert xstormy16 to LRA
  2023-05-02 13:18   ` Roger Sayle
@ 2023-05-02 13:49     ` Segher Boessenkool
  2023-05-02 16:20       ` Roger Sayle
  2023-05-02 14:11     ` Paul Koning
  1 sibling, 1 reply; 19+ messages in thread
From: Segher Boessenkool @ 2023-05-02 13:49 UTC (permalink / raw)
  To: Roger Sayle
  Cc: 'Paul Koning', 'Jeff Law', 'GCC Patches'

Hi!

On Tue, May 02, 2023 at 02:18:43PM +0100, Roger Sayle wrote:
> On 02 May 2023 13:40, Paul Koning wrote:
> > > On May 1, 2023, at 7:37 PM, Roger Sayle <roger@nextmovesoftware.com>
> > wrote:
> > > The shiftsi.cc regression on xstormy16 is fixed by adding
> > > -fno-split-wide-types.
> > > In fact, if all the regression tests pass, I'd suggest that
> > > flag_split_wide-types = false should be the default on xstormy16 now
> > > that we've moved to LRA.  And if this works for xstormy16, it might be
> > > useful to other targets for the LRA transition; it's a difference in
> > > behaviour between reload and LRA that could potentially affect
> > > multiple targets.
> > 
> > Is there documentation for that flag?
> 
> Yes, see the section -fsplit-wide-types in
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
> 
> Interestingly, there's a recent-ish blog describing how
> -fno-split-wide-types
> reduces executable size on AVR:
> https://ufj.ddns.net/blog/marlin/2019/01/07/reducing-marlin-binary-size.html
> and its interaction with (AVR) register allocation is seen in PR
> middle-end/35860.

But what causes the problem?  Is something missing somewhere, or do we
get too high register pressure?

There also is -fsplit-wide-types-early, which might help.

In principle it always is good to describe a machine model as close as
possible to what the machine actually does.  What gets in the way here?


Segher

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [committed] Convert xstormy16 to LRA
  2023-05-02 13:18   ` Roger Sayle
  2023-05-02 13:49     ` Segher Boessenkool
@ 2023-05-02 14:11     ` Paul Koning
  2023-05-02 15:47       ` Segher Boessenkool
  1 sibling, 1 reply; 19+ messages in thread
From: Paul Koning @ 2023-05-02 14:11 UTC (permalink / raw)
  To: Roger Sayle; +Cc: Jeff Law, GCC Patches, Segher Boessenkool



> On May 2, 2023, at 9:18 AM, Roger Sayle <roger@nextmovesoftware.com> wrote:
> 
> 
> On 02 May 2023 13:40, Paul Koning wrote:
>>> On May 1, 2023, at 7:37 PM, Roger Sayle <roger@nextmovesoftware.com>
>> wrote:
>>> 
>>> ...
>>> The shiftsi.cc regression on xstormy16 is fixed by adding
>>> -fno-split-wide-types.
>>> In fact, if all the regression tests pass, I'd suggest that
>>> flag_split_wide-types = false should be the default on xstormy16 now
>>> that we've moved to LRA.  And if this works for xstormy16, it might be
>>> useful to other targets for the LRA transition; it's a difference in
>>> behaviour between reload and LRA that could potentially affect
>>> multiple targets.
>> 
>> Is there documentation for that flag?
> 
> Yes, see the section -fsplit-wide-types in
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

Thanks.  So I'm wondering why that would be a problem.

The obvious question is whether it interacts badly with MD file entries that describe wide operations, perhaps with constraints that require things like odd/even register pairs.  But I would assume all that gets handled.

Along the same lines, why would a target, or a user, not do early wide splitting all the time?  The documentation for that option gives no clue why it would ever be bad.

	paul



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [committed] Convert xstormy16 to LRA
  2023-05-02 14:11     ` Paul Koning
@ 2023-05-02 15:47       ` Segher Boessenkool
  0 siblings, 0 replies; 19+ messages in thread
From: Segher Boessenkool @ 2023-05-02 15:47 UTC (permalink / raw)
  To: Paul Koning; +Cc: Roger Sayle, Jeff Law, GCC Patches

On Tue, May 02, 2023 at 10:11:27AM -0400, Paul Koning wrote:
> > On May 2, 2023, at 9:18 AM, Roger Sayle <roger@nextmovesoftware.com> wrote:
> > Yes, see the section -fsplit-wide-types in
> > https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
> 
> Thanks.  So I'm wondering why that would be a problem.
> 
> The obvious question is whether it interacts badly with MD file entries that describe wide operations, perhaps with constraints that require things like odd/even register pairs.  But I would assume all that gets handled.
> 
> Along the same lines, why would a target, or a user, not do early wide splitting all the time?  The documentation for that option gives no clue why it would ever be bad.

In <https://inbox.sourceware.org/gcc-patches/0c1cd0e5706087d51c0d981a313786990ddcad89.1562518763.git.segher@kernel.crashing.org/>
(the patch that created -fsplit-wide-types-early) I say "At least for
targets that do not have RTL patterns for operations on multi-register
modes it is a lot better to split patterns earlier, before combine and
all related passes."  If your target does in fact have patterns for
multi-reg modes it presumably wants those to be used preferably.

The patch did not change the default because that always is a lot of
pain.  The cost-benefit analysis did not work out here (for me!)


Segher

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [committed] Convert xstormy16 to LRA
  2023-05-01 23:37 [committed] Convert xstormy16 to LRA Roger Sayle
  2023-05-02 12:40 ` Paul Koning
@ 2023-05-02 15:57 ` Jeff Law
  2023-05-11 15:05 ` Hans-Peter Nilsson
  2 siblings, 0 replies; 19+ messages in thread
From: Jeff Law @ 2023-05-02 15:57 UTC (permalink / raw)
  To: Roger Sayle; +Cc: 'GCC Patches', 'Segher Boessenkool'



On 5/1/23 17:37, Roger Sayle wrote:
> 
> Jeff Law wrote:
>> This patch converts the xstormy16 patch to LRA.  It introduces a code
>> quality regression in the shiftsi testcase, but it also fixes numerous
>> aborts/errors.  IMHO it's a good tradeoff.
> 
> I've investigated the shiftsi regression on xstormy16 and the underlying
> cause
> appears to be an interaction between lower-subreg's "subreg3" pass and the
> new LRA.  Previously, reload was not phased by the "clobbers" that are
> introduced by the decompose_multiword_subregs function, but they appear
> to interfere with LRA's register assignments.
> 
> combine's make_extra_copies introduces a new pseudo-to-pseudo move,
> but when subreg3 inserts a naked clobber between the original and the
> new move, LRA is recombine theses pseudos back to the same allocno.
> 
> The shiftsi.cc regression on xstormy16 is fixed by adding
> -fno-split-wide-types.
> In fact, if all the regression tests pass, I'd suggest that
> flag_split_wide-types = false
> should be the default on xstormy16 now that we've moved to LRA.  And if this
> works for xstormy16, it might be useful to other targets for the LRA
> transition;
> it's a difference in behaviour between reload and LRA that could potentially
> affect multiple targets.
> 
> For reference, xstormy16 has a post-reload define_insn_and_split for movsi
> (i.e. a multi-word move).  If this insn was split during split1 (i.e. before
> subreg3)
> there wouldn't be a problem (no clobber), but alas the target's
> xstormy16_split_move
> function has several asserts insisting this only get called when
> reload_completed.
> 
> I hope this is useful.
It is.

FWIW, turning that on by default for xstormy results in two new fails, 
but also in two new passes:

> Tests that now fail, but worked before (2 tests):
> 
> xstormy16-sim: gcc.dg/ipa/iinline-attr.c scan-ipa-dump inline "hooray[^\\n]*inline copy in test"
> xstormy16-sim: gcc.dg/setjmp-1.c spurious clobbered warning (test for bogus messages, line 17)
> 
> Tests that now work, but didn't before (2 tests):
> 
> xstormy16-sim: gcc.target/xstormy16/shiftsi.c --save-temps -fno-inline-functions -L/home/jlaw/jenkins/workspace/xstormy16-elf/gcc/gcc/testsuite/gcc.target/xstormy16  scan-assembler-not mov 
> xstormy16-sim: gcc.target/xstormy16/zextendhisi2.c --save-temps -fno-inline-functions -L/home/jlaw/jenkins/workspace/xstormy16-elf/gcc/gcc/testsuite/gcc.target/xstormy16  scan-assembler mov r3,#0

What I find interesting is the two failures are in generic code. 
Definitely unexpected.

jeff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [committed] Convert xstormy16 to LRA
  2023-05-02 13:49     ` Segher Boessenkool
@ 2023-05-02 16:20       ` Roger Sayle
  2023-05-02 21:55         ` Segher Boessenkool
  0 siblings, 1 reply; 19+ messages in thread
From: Roger Sayle @ 2023-05-02 16:20 UTC (permalink / raw)
  To: 'Segher Boessenkool'
  Cc: 'Paul Koning', 'Jeff Law', 'GCC Patches'


On 02 May 2023 14:49, Segher Boessenkool wrote:
> On Tue, May 02, 2023 at 02:18:43PM +0100, Roger Sayle wrote:
> > On 02 May 2023 13:40, Paul Koning wrote:
> > > > On May 1, 2023, at 7:37 PM, Roger Sayle
> > > > <roger@nextmovesoftware.com>
> > > wrote:
> > > > The shiftsi.cc regression on xstormy16 is fixed by adding
> > > > -fno-split-wide-types.
> > > > In fact, if all the regression tests pass, I'd suggest that
> > > > flag_split_wide-types = false should be the default on xstormy16
> > > > now that we've moved to LRA.  And if this works for xstormy16, it
> > > > might be useful to other targets for the LRA transition; it's a
> > > > difference in behaviour between reload and LRA that could
> > > > potentially affect multiple targets.
> > >
> > > Is there documentation for that flag?
> >
> > Yes, see the section -fsplit-wide-types in
> > https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
> >
> > Interestingly, there's a recent-ish blog describing how
> > -fno-split-wide-types reduces executable size on AVR:
> > https://ufj.ddns.net/blog/marlin/2019/01/07/reducing-marlin-binary-siz
> > e.html and its interaction with (AVR) register allocation is seen in
> > PR middle-end/35860.
> 
> But what causes the problem?  Is something missing somewhere, or do we get
too
> high register pressure?
> 
> There also is -fsplit-wide-types-early, which might help.
> 
> In principle it always is good to describe a machine model as close as
possible to
> what the machine actually does.  What gets in the way here?

I can describe what GCC is doing, but I don't claim to understand how the
register allocators make the choices they do (nor where the fix should be).

The problematic test case is:
unsigned long foo(unsigned long x) { return x << 1; }
which with (the old) reload (and -fno-split-wide-types) is generated as:
foo:    add r2,r2
        adc r3,r3
        ret
but with (the new) lra is generated as:
        mov r6,r2
        mov r7,r3
        mov r2,r6
        mov r3,r7
        add r2,r6
        adc r3,r7
        ret

where the register allocator has failed to notice that (take advantage of)
the SImode addition can be performed "in-place", from the insn's
constraints:

;; Addition
(define_insn_and_split "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "%0")
                 (match_operand:SI 2 "nonmemory_operand" "ri")))
   (clobber (reg:BI CARRY_REG))]
  ""
  "#"
  "reload_completed"
  [(pc)]
  { xstormy16_expand_arith (SImode, PLUS, operands[0], operands[1],
                            operands[2]);
    DONE;
  }
  [(set_attr "length" "4")])


Before combine, we have simplicity itself (a two instruction function):
(insn 2 4 3 2 (set (reg/v:SI 26 [ x ])
        (reg:SI 2 r2 [ x ])) "../../shiftsi.c":4:41 7 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 2 r2 [ x ])
        (nil)))
(insn 6 3 11 2 (parallel [
            (set (reg:SI 28)
                (plus:SI (reg/v:SI 26 [ x ])
                    (reg/v:SI 26 [ x ])))
            (clobber (reg:BI 16 carry))
        ]) "../../shiftsi.c":4:52 35 {addsi3}
     (expr_list:REG_DEAD (reg/v:SI 26 [ x ])
        (expr_list:REG_UNUSED (reg:BI 16 carry)
            (nil))))

Then combine inserts an additional copy:
(insn 14 4 2 2 (set (reg:SI 29)
        (reg:SI 2 r2 [ x ])) "../../shiftsi.c":4:41 -1
     (expr_list:REG_DEAD (reg:SI 2 r2 [ x ])
        (nil)))
(insn 2 14 3 2 (set (reg/v:SI 26 [ x ])
        (reg:SI 29)) "../../shiftsi.c":4:41 7 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 29)
        (nil)))
(insn 6 3 11 2 (parallel [
            (set (reg:SI 28)
                (plus:SI (reg/v:SI 26 [ x ])
                    (reg/v:SI 26 [ x ])))
            (clobber (reg:BI 16 carry))
        ]) "../../shiftsi.c":4:52 35 {addsi3}
     (expr_list:REG_DEAD (reg/v:SI 26 [ x ])
        (expr_list:REG_UNUSED (reg:BI 16 carry)
            (nil))))

The just before reload, subreg3 "splits the wide types", to produce:
(insn 15 4 16 2 (set (reg:HI 30)
        (reg:HI 2 r2 [ x ])) "../../shiftsi.c":4:41 6 {movhi_internal}
     (nil))
(insn 16 15 17 2 (set (reg:HI 31 [+2 ])
        (reg:HI 3 r3 [ x+2 ])) "../../shiftsi.c":4:41 6 {movhi_internal}
     (nil))
(insn 17 16 18 2 (clobber (reg/v:SI 26 [ x ])) "../../shiftsi.c":4:41 -1
     (nil))
(insn 18 17 19 2 (set (subreg:HI (reg/v:SI 26 [ x ]) 0)
        (reg:HI 30)) "../../shiftsi.c":4:41 6 {movhi_internal}
     (nil))
(insn 19 18 3 2 (set (subreg:HI (reg/v:SI 26 [ x ]) 2)
        (reg:HI 31 [+2 ])) "../../shiftsi.c":4:41 6 {movhi_internal}
     (nil))
(insn 6 3 11 2 (parallel [
            (set (reg:SI 28)
                (plus:SI (reg/v:SI 26 [ x ])
                    (reg/v:SI 26 [ x ])))
            (clobber (reg:BI 16 carry))
        ]) "../../shiftsi.c":4:52 35 {addsi3}
     (expr_list:REG_DEAD (reg/v:SI 26 [ x ])
        (expr_list:REG_UNUSED (reg:BI 16 carry)
            (nil))))

Given this input, with the movhi_internals and the clobber, reload
was able to do something sensible, but lra fails to place pseudos 26,
28 in the same place, in pseudos 30 and 31, ideally hard regs 2 and 3.

With the -fno-split-wide-types workaround, lra/reload have the easier
task of assigning registers to the three insn case above.

I don't think it's a problem/fault with the machine description, but
how the clobber above is being interpreted by the different register
allocators.


Back in August 2022, I submitted a x86_64 backend patch that used a
peephole2 to eliminate this type of (subreg-lower) generated clobber:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599419.html
but Uros was hesitant to accept this change, without an RTL expert
explaining why the clobbers were being generated in the first place.

Like my x86_64 patch above, this issue could also probably be cleaned
up by a peephole2 for xstormy16 that straddled/removed the clobber.

My simplistic summary is that when a backend lowers a multi-word
instruction with a splitter, it (typically) does so without introducing
a clobber.  If the clobbers generated by the middle-end's automatic
lowering confuse the register allocator, then this is more efficient.
Ideally, such clobbers should be avoided (if possible) and/or LRA
improved to understand that they don't actually cause an allocno
conflict.  [but I'm in no way a register allocation expert].

Thoughts?



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [committed] Convert xstormy16 to LRA
  2023-05-02 16:20       ` Roger Sayle
@ 2023-05-02 21:55         ` Segher Boessenkool
  0 siblings, 0 replies; 19+ messages in thread
From: Segher Boessenkool @ 2023-05-02 21:55 UTC (permalink / raw)
  To: Roger Sayle
  Cc: 'Paul Koning', 'Jeff Law', 'GCC Patches'

Hi!

On Tue, May 02, 2023 at 05:20:49PM +0100, Roger Sayle wrote:
> On 02 May 2023 14:49, Segher Boessenkool wrote:
> Then combine inserts an additional copy:

Combine makes sure a pseudo-to-pseudo move remains.  Without that,
combine will seize part of RA's job, and butcher it.  It has always done
that, but the 2-2 combine patches made it clearer than before.

> (insn 17 16 18 2 (clobber (reg/v:SI 26 [ x ])) "../../shiftsi.c":4:41 -1
>      (nil))
> (insn 18 17 19 2 (set (subreg:HI (reg/v:SI 26 [ x ]) 0)
>         (reg:HI 30)) "../../shiftsi.c":4:41 6 {movhi_internal}
>      (nil))
> (insn 19 18 3 2 (set (subreg:HI (reg/v:SI 26 [ x ]) 2)
>         (reg:HI 31 [+2 ])) "../../shiftsi.c":4:41 6 {movhi_internal}
>      (nil))

> I don't think it's a problem/fault with the machine description, but
> how the clobber above is being interpreted by the different register
> allocators.

Insn 17 is trivially dead code and should have been removed.  It
arguably should not have been created in the first place.  Why do we do
that, what is the purpose?

> Back in August 2022, I submitted a x86_64 backend patch that used a
> peephole2 to eliminate this type of (subreg-lower) generated clobber:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599419.html
> but Uros was hesitant to accept this change, without an RTL expert
> explaining why the clobbers were being generated in the first place.

It makes sure any previous value in it is regarded as dead, but since
insns 18 and 19 clearly write to the whole register anyway, this is just
extra work to later undo.  Or not undo, which is the problem here :-/

> Like my x86_64 patch above, this issue could also probably be cleaned
> up by a peephole2 for xstormy16 that straddled/removed the clobber.

Or simply not create such useless clobbers in the first place?

> My simplistic summary is that when a backend lowers a multi-word
> instruction with a splitter, it (typically) does so without introducing
> a clobber.  If the clobbers generated by the middle-end's automatic
> lowering confuse the register allocator, then this is more efficient.
> Ideally, such clobbers should be avoided (if possible) and/or LRA
> improved to understand that they don't actually cause an allocno
> conflict.  [but I'm in no way a register allocation expert].

Yup, I agree with all that.  We should not create dead code, and not be
confused by dead code either :-)


Segher

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [committed] Convert xstormy16 to LRA
  2023-05-01 23:37 [committed] Convert xstormy16 to LRA Roger Sayle
  2023-05-02 12:40 ` Paul Koning
  2023-05-02 15:57 ` Jeff Law
@ 2023-05-11 15:05 ` Hans-Peter Nilsson
  2023-05-11 16:55   ` Paul Koning
  2023-05-12 13:53   ` Hans-Peter Nilsson
  2 siblings, 2 replies; 19+ messages in thread
From: Hans-Peter Nilsson @ 2023-05-11 15:05 UTC (permalink / raw)
  To: Roger Sayle; +Cc: jeffreyalaw, gcc-patches, segher

> From: "Roger Sayle" <roger@nextmovesoftware.com>
> Date: Tue, 2 May 2023 00:37:14 +0100

> Jeff Law wrote:
> > This patch converts the xstormy16 patch to LRA.  It introduces a code 
> > quality regression in the shiftsi testcase, but it also fixes numerous 
> > aborts/errors.  IMHO it's a good tradeoff.
> 
> I've investigated the shiftsi regression on xstormy16 and the underlying
> cause
> appears to be an interaction between lower-subreg's "subreg3" pass and the
> new LRA.  Previously, reload was not phased by the "clobbers" that are 
> introduced by the decompose_multiword_subregs function, but they appear
> to interfere with LRA's register assignments.
> 
> combine's make_extra_copies introduces a new pseudo-to-pseudo move,
> but when subreg3 inserts a naked clobber between the original and the
> new move, LRA is recombine theses pseudos back to the same allocno.
> 
> The shiftsi.cc regression on xstormy16 is fixed by adding
> -fno-split-wide-types.
> In fact, if all the regression tests pass, I'd suggest that
> flag_split_wide-types = false
> should be the default on xstormy16 now that we've moved to LRA.  And if this
> works for xstormy16, it might be useful to other targets for the LRA
> transition;
> it's a difference in behaviour between reload and LRA that could potentially
> affect multiple targets.
> 
> For reference, xstormy16 has a post-reload define_insn_and_split for movsi
> (i.e. a multi-word move).  If this insn was split during split1 (i.e. before
> subreg3)
> there wouldn't be a problem (no clobber), but alas the target's
> xstormy16_split_move
> function has several asserts insisting this only get called when
> reload_completed.
> 
> I hope this is useful.
> Cheers,
> Roger

Yes, very interesting.  Thank you for sharing this.  I've
seen regressions with LRA for CRIS too, for
"double-register-sized" types, which for CRIS, a 32-bit
target, translates to 64-bit types (DFmode and DImode), and
where LRA does a much worse job than reload; spills a lot
more often to stack, even after trying every
register-allocation-related hook I found (and also an LRA
patch which helped only by a fraction, but regressed results
on x86_64-linux, so let's quickly forget it again).

No fix or nicely stated bug entry yet, but at least a
different observation:

Coremark for cris-elf built with -O2 -march=v10, when going
from reload to LRA is slightly faster but a bit bigger (for
example before/after Jeffs r14-383-gfaf8bea79b6256, 5090593
to 5090567 cycles and 48887 to 48901 bytes), a relative
observation which has not changed much since February when I
started working on an LRA transition for CRIS.

But, the case for code with heavy use of "double-register-
sized" types is much worse; up to several percent slower.
My favorite sharable example is
gcc/testsuite/gcc.c-torture/execute/arith-rand-ll.c
(with a few unimportant local tweaks not suitable for
upstreaming but which I'm happy to share with anyone asking)
which around that commit goes from 1295021 to 1317531 cycles
(101.74%) and one percent larger; 4008 to 4048 bytes.

Your suggestion to default to -fno-split-wide-types seemed
too good to be true, and though worth a try, unfortunately
it was.  I'm seeing *horrible* regressions for
double-register codes with the patch below on top of LRA.
Coremark numbers suffer too (different baseline here than
above; closer to today's sources) from 5078989 to 5081968
cycles and from 48537 to 50145 bytes.

But, arith-rand-ll suffers much more: from 1317530 to
2182080 cycles (yes, 165.62%) and from 4044 to 4174 bytes.
(With reload, it's bad too, but "only" regressing 143.67% by
speed.)

Next, I'll turn around completely, and try defaulting to
-fsplit-wide-types-early, which sounds more promising. :)
I don't like throwing defaults around randomly, but trying
out a promising idea this way is easy.

So because of the numbers above, this will never be
committed, just passed for reference.  I believe this is the
correct way to default to -fno-split-wide-types:

-- >8 --
[PATCH] CRIS: Default to -fno-split-wide-types

	* common/config/cris/cris-common.cc (cris_option_optimization_table):
	New.  Default to -fno-split-wide-types.
	(TARGET_OPTION_OPTIMIZATION_TABLE): Define.
---
 gcc/common/config/cris/cris-common.cc | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/gcc/common/config/cris/cris-common.cc b/gcc/common/config/cris/cris-common.cc
index b08d6014102d..cf00c1414651 100644
--- a/gcc/common/config/cris/cris-common.cc
+++ b/gcc/common/config/cris/cris-common.cc
@@ -26,6 +26,14 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "flags.h"
 
+/* Implement TARGET_OPTION_OPTIMIZATION_TABLE.  */
+
+static const struct default_options cris_option_optimization_table[] =
+  {
+    { OPT_LEVELS_1_PLUS, OPT_fsplit_wide_types, NULL, 0 },
+    { OPT_LEVELS_NONE, 0, NULL, 0 }
+  };
+
 /* TARGET_HANDLE_OPTION worker.  We just store the values into local
    variables here.  Checks for correct semantics are in
    cris_option_override.  */
@@ -90,5 +98,7 @@ cris_handle_option (struct gcc_options *opts,
 #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT | CRIS_SUBTARGET_DEFAULT)
 #undef TARGET_HANDLE_OPTION
 #define TARGET_HANDLE_OPTION cris_handle_option
+#undef TARGET_OPTION_OPTIMIZATION_TABLE
+#define TARGET_OPTION_OPTIMIZATION_TABLE cris_option_optimization_table
 
 struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
-- 
2.30.2


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [committed] Convert xstormy16 to LRA
  2023-05-11 15:05 ` Hans-Peter Nilsson
@ 2023-05-11 16:55   ` Paul Koning
  2023-05-11 18:15     ` Jeff Law
  2023-05-12 13:53   ` Hans-Peter Nilsson
  1 sibling, 1 reply; 19+ messages in thread
From: Paul Koning @ 2023-05-11 16:55 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: Roger Sayle, jeffreyalaw, gcc-patches, segher



> On May 11, 2023, at 11:05 AM, Hans-Peter Nilsson via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> ...
> Yes, very interesting.  Thank you for sharing this.  I've
> seen regressions with LRA for CRIS too, for
> "double-register-sized" types, which for CRIS, a 32-bit
> target, translates to 64-bit types (DFmode and DImode), and
> where LRA does a much worse job than reload; spills a lot
> more often to stack, even after trying every
> register-allocation-related hook I found (and also an LRA
> patch which helped only by a fraction, but regressed results
> on x86_64-linux, so let's quickly forget it again).

That observation makes me a bit worried.  While CRIS may not be a priority platform, that description makes it sound like a case that would be significant in any 32 bit platform, which would include priority ones like i386 and ARM.

If that's true, I wonder about dropping Reload.  While I understand it's been years since LRA was first introduced, wouldn't we even so want to go by the rule that a newer replacement mechanism doesn't replace an older one  until the replacement demonstrates comparable or better output compared with the older one?

	paul



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [committed] Convert xstormy16 to LRA
  2023-05-11 16:55   ` Paul Koning
@ 2023-05-11 18:15     ` Jeff Law
  2023-05-11 18:32       ` Hans-Peter Nilsson
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Law @ 2023-05-11 18:15 UTC (permalink / raw)
  To: Paul Koning, Hans-Peter Nilsson; +Cc: Roger Sayle, gcc-patches, segher



On 5/11/23 10:55, Paul Koning wrote:
> 
> 
>> On May 11, 2023, at 11:05 AM, Hans-Peter Nilsson via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>
>> ...
>> Yes, very interesting.  Thank you for sharing this.  I've
>> seen regressions with LRA for CRIS too, for
>> "double-register-sized" types, which for CRIS, a 32-bit
>> target, translates to 64-bit types (DFmode and DImode), and
>> where LRA does a much worse job than reload; spills a lot
>> more often to stack, even after trying every
>> register-allocation-related hook I found (and also an LRA
>> patch which helped only by a fraction, but regressed results
>> on x86_64-linux, so let's quickly forget it again).
> 
> That observation makes me a bit worried.  While CRIS may not be a priority platform, that description makes it sound like a case that would be significant in any 32 bit platform, which would include priority ones like i386 and ARM.
If I understood things correctly, it seems to impact more when the 
target exposes double-word patterns but doesn't actually have 
instructions for those operations.  That's an implementation pattern 
we've largely been moving away from over the last decade or so.

Jeff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [committed] Convert xstormy16 to LRA
  2023-05-11 18:15     ` Jeff Law
@ 2023-05-11 18:32       ` Hans-Peter Nilsson
  0 siblings, 0 replies; 19+ messages in thread
From: Hans-Peter Nilsson @ 2023-05-11 18:32 UTC (permalink / raw)
  To: Jeff Law; +Cc: paulkoning, roger, gcc-patches, segher

> Date: Thu, 11 May 2023 12:15:20 -0600
> From: Jeff Law <jeffreyalaw@gmail.com>

> On 5/11/23 10:55, Paul Koning wrote:
> > 
> > 
> >> On May 11, 2023, at 11:05 AM, Hans-Peter Nilsson via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> ...
> >> Yes, very interesting.  Thank you for sharing this.  I've
> >> seen regressions with LRA for CRIS too, for
> >> "double-register-sized" types, which for CRIS, a 32-bit
> >> target, translates to 64-bit types (DFmode and DImode), and
> >> where LRA does a much worse job than reload; spills a lot
> >> more often to stack, even after trying every
> >> register-allocation-related hook I found (and also an LRA
> >> patch which helped only by a fraction, but regressed results
> >> on x86_64-linux, so let's quickly forget it again).
> > 
> > That observation makes me a bit worried.  While CRIS may not be a priority platform, that description makes it sound like a case that would be significant in any 32 bit platform, which would include priority ones like i386 and ARM.
> If I understood things correctly, it seems to impact more when the 
> target exposes double-word patterns but doesn't actually have 
> instructions for those operations.  That's an implementation pattern 
> we've largely been moving away from over the last decade or so.

That description doesn't really match CRIS though.  The "ax"
prefix used in DImode patterns links the next instruction to
include the carry.  Thus better than an "open-coded"
version.  In comparison, CRIS doesn't define separable
patterns (anddi3, iordi3 etc.)

But, there's a movdi expander and splitter - with a long
reload-related comment.  Perhaps I can do away with that
even though having some arithmetic and compares in DImode.
Thanks for the hint.

brgds, H-P

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [committed] Convert xstormy16 to LRA
  2023-05-11 15:05 ` Hans-Peter Nilsson
  2023-05-11 16:55   ` Paul Koning
@ 2023-05-12 13:53   ` Hans-Peter Nilsson
  2023-05-12 14:01     ` Hans-Peter Nilsson
  2023-05-12 14:04     ` Roger Sayle
  1 sibling, 2 replies; 19+ messages in thread
From: Hans-Peter Nilsson @ 2023-05-12 13:53 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: roger, jeffreyalaw, gcc-patches, segher

> From: Hans-Peter Nilsson <hp@axis.com>
> Date: Thu, 11 May 2023 17:05:40 +0200

> Next, I'll turn around completely, and try defaulting to
> -fsplit-wide-types-early, which sounds more promising. :)
> I don't like throwing defaults around randomly, but trying
> out a promising idea this way is easy.

Absolutely nothing changed (not counting now running
"subreg2" and generating a dump-file), compared to the
default.  Besides coremark and local micro-benchmarks I
inspected running arith-rand-ll.c with -O2 and briefly
stepped through the passes with gdb: the costs guiding the
splits are fine, properly enabling the splits, but not all
DImode registers are naturally "splittable"; looks like the
ones used in non-decomposable operations remain.  It seems
all splittable opportunities are dealt with by the first
pass ("subreg1").  I guess this pass has the most impact for
targets that have few or no DImode operations at all.

But why is the option called -fsplit-wide-types-early when
what it does is enabling a "subreg2" pass, there being
"subreg1" and "subreg3" enabled with -fsplit-wide-types?  It
should rather be called -fsplit-wide-types-second! :)

Looking at its placement in passes.def makes me wonder what
magic properties targets have that benefit from it.

Anyway, Roger mentioned that the clobbers emitted by the
lower-subreg passes were apparently damaging, so I'll try
this out "for fun", on the assumption that they're actually
unnecessary.  I don't think actually removing them has been
attempted?

The patch below seems to substantially lower register
pressure for arith-rand-ll for CRIS, but I've only inspected
the assembly source (not even compared the result to the
reload version).  Quoting it for reference only, and if it
"works" (passes regtest for cris-elf and x86-64-linux) I
think I'll resubmit as a proper patch:

--- lower-subreg.cc.orig	2023-04-29 02:53:39.000000000 +0200
+++ lower-subreg.cc	2023-05-12 15:35:25.574668930 +0200
@@ -1086,9 +1086,6 @@ resolve_simple_move (rtx set, rtx_insn *
     {
       unsigned int i;
 
-      if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
-	emit_clobber (dest);
-
       for (i = 0; i < words; ++i)
 	{
 	  rtx t = simplify_gen_subreg_concatn (word_mode, dest,

brgds, H-P

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [committed] Convert xstormy16 to LRA
  2023-05-12 13:53   ` Hans-Peter Nilsson
@ 2023-05-12 14:01     ` Hans-Peter Nilsson
  2023-05-12 14:04     ` Roger Sayle
  1 sibling, 0 replies; 19+ messages in thread
From: Hans-Peter Nilsson @ 2023-05-12 14:01 UTC (permalink / raw)
  To: roger; +Cc: jeffreyalaw, gcc-patches, segher

> From: Hans-Peter Nilsson <hp@axis.com>
> Date: Fri, 12 May 2023 15:53:49 +0200

> Anyway, Roger mentioned that the clobbers emitted by the
> lower-subreg passes were apparently damaging, so I'll try
> this out "for fun", on the assumption that they're actually
> unnecessary.  I don't think actually removing them has been
> attempted?

> --- lower-subreg.cc.orig	2023-04-29 02:53:39.000000000 +0200
> +++ lower-subreg.cc	2023-05-12 15:35:25.574668930 +0200

Bah, then I noticed r14-554-gd8a6945c6ea22e, committed
several days ago.  I should have checked up-to-date sources...
Thanks Roger!

Now off to measure the impact.  Maybe up to par with reload
now? :)

brgds, H-P

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [committed] Convert xstormy16 to LRA
  2023-05-12 13:53   ` Hans-Peter Nilsson
  2023-05-12 14:01     ` Hans-Peter Nilsson
@ 2023-05-12 14:04     ` Roger Sayle
  2023-05-13  0:56       ` Hans-Peter Nilsson
  1 sibling, 1 reply; 19+ messages in thread
From: Roger Sayle @ 2023-05-12 14:04 UTC (permalink / raw)
  To: 'Hans-Peter Nilsson'; +Cc: jeffreyalaw, gcc-patches, segher


Hi H-P,
This patch should now already be on trunk:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d8a6945c6ea22efa4d5e42fe1922d2
b27953c8cd
Many thanks to Jeff for the review/approval.
There have been no reported adverse effects so far.
Please let me/us know if this has helped CRIS.

Cheers,
Roger
--

-----Original Message-----
From: Hans-Peter Nilsson <hp@axis.com> 
Sent: 12 May 2023 14:54
To: Hans-Peter Nilsson <hp@axis.com>
Cc: roger@nextmovesoftware.com; jeffreyalaw@gmail.com;
gcc-patches@gcc.gnu.org; segher@kernel.crashing.org
Subject: Re: [committed] Convert xstormy16 to LRA

> From: Hans-Peter Nilsson <hp@axis.com>
> Date: Thu, 11 May 2023 17:05:40 +0200

> Next, I'll turn around completely, and try defaulting to 
> -fsplit-wide-types-early, which sounds more promising. :) I don't like 
> throwing defaults around randomly, but trying out a promising idea 
> this way is easy.

Absolutely nothing changed (not counting now running "subreg2" and
generating a dump-file), compared to the default.  Besides coremark and
local micro-benchmarks I inspected running arith-rand-ll.c with -O2 and
briefly stepped through the passes with gdb: the costs guiding the splits
are fine, properly enabling the splits, but not all DImode registers are
naturally "splittable"; looks like the ones used in non-decomposable
operations remain.  It seems all splittable opportunities are dealt with by
the first pass ("subreg1").  I guess this pass has the most impact for
targets that have few or no DImode operations at all.

But why is the option called -fsplit-wide-types-early when what it does is
enabling a "subreg2" pass, there being "subreg1" and "subreg3" enabled with
-fsplit-wide-types?  It should rather be called -fsplit-wide-types-second!
:)

Looking at its placement in passes.def makes me wonder what magic properties
targets have that benefit from it.

Anyway, Roger mentioned that the clobbers emitted by the lower-subreg passes
were apparently damaging, so I'll try this out "for fun", on the assumption
that they're actually unnecessary.  I don't think actually removing them has
been attempted?

The patch below seems to substantially lower register pressure for
arith-rand-ll for CRIS, but I've only inspected the assembly source (not
even compared the result to the reload version).  Quoting it for reference
only, and if it "works" (passes regtest for cris-elf and x86-64-linux) I
think I'll resubmit as a proper patch:

--- lower-subreg.cc.orig	2023-04-29 02:53:39.000000000 +0200
+++ lower-subreg.cc	2023-05-12 15:35:25.574668930 +0200
@@ -1086,9 +1086,6 @@ resolve_simple_move (rtx set, rtx_insn *
     {
       unsigned int i;
 
-      if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
-	emit_clobber (dest);
-
       for (i = 0; i < words; ++i)
 	{
 	  rtx t = simplify_gen_subreg_concatn (word_mode, dest,

brgds, H-P


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [committed] Convert xstormy16 to LRA
  2023-05-12 14:04     ` Roger Sayle
@ 2023-05-13  0:56       ` Hans-Peter Nilsson
  2023-05-13  1:11         ` Hans-Peter Nilsson
  0 siblings, 1 reply; 19+ messages in thread
From: Hans-Peter Nilsson @ 2023-05-13  0:56 UTC (permalink / raw)
  To: Roger Sayle; +Cc: jeffreyalaw, gcc-patches, segher, vmakarov

> From: "Roger Sayle" <roger@nextmovesoftware.com>
> Date: Fri, 12 May 2023 15:04:03 +0100

> Hi H-P,
> This patch should now already be on trunk:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d8a6945c6ea22efa4d5e42fe1922d2
> b27953c8cd
> Many thanks to Jeff for the review/approval.
> There have been no reported adverse effects so far.
> Please let me/us know if this has helped CRIS.

TL;DR:
It sure helped for the big offender; arith-rand-ll with LRA
is now 1.3% faster!  Not everything is rosy though.

It's not a complete win for CRIS with LRA, as other of my
mini-regression tests (like, a trivial loop with 0<=i<=10
fprintf (f, "%s: %d\n", "Hello, world", i); with newlib),
where it got 0.04% larger.  On average though, 0.5% smaller.
Actually, it's more accurate to state it in terms of the
code that changed: newlib's _vfprintf_r improved slightly
performance-wise on some paths (1 cycle per call), but
regressed on others (7 cycles per call).  It got 0.2%
bigger.  Libgcc's __moddi3 regressed slightly (1 cycle per
11 calls) but stayed the same size.  The main function in
arith-rand-ll (where I guess all small functions got
inlined) improved 3% by size, performance as above.

Those clobbers must have helped reload, because with reload
there was a 0.15% performance regression for arith-rand-ll
(and 0.2% by size because main got 1.6% larger) with this
patch.  But size was smaller or equal for all other tests
using reload with this patch, if only by 0.04% on average.
 Or in other numbers: _vfprintf_r got 4 cycles faster per
call on some paths and 2 on others.  It got 0.05% bigger.  A
similar function, __vfiprintf_r, shrank by 0.5%.  Go figure.
For all but arith-rand-ll, this patch was a wash for reload.
 (Yes: the case with reload is now artificial; for that, I
was using master before Jeff's commit and added my own
commits since then, to keep track of my fight against LRA
regressions; the current score is that code is still 0.34%
slower and 0.17% larger with this applied just for LRA.)

Can we ask Vladimir for a statement; perhaps LRA could just
have handled those clobbers better (referring to the commit)?

brgds, H-P

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [committed] Convert xstormy16 to LRA
  2023-05-13  0:56       ` Hans-Peter Nilsson
@ 2023-05-13  1:11         ` Hans-Peter Nilsson
  0 siblings, 0 replies; 19+ messages in thread
From: Hans-Peter Nilsson @ 2023-05-13  1:11 UTC (permalink / raw)
  To: roger; +Cc: jeffreyalaw, gcc-patches, segher, vmakarov

> From: Hans-Peter Nilsson <hp@axis.com>
> Date: Sat, 13 May 2023 02:56:39 +0200
> 
> > From: "Roger Sayle" <roger@nextmovesoftware.com>
> > Date: Fri, 12 May 2023 15:04:03 +0100
> 
> > Hi H-P,
> > This patch should now already be on trunk:
> > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d8a6945c6ea22efa4d5e42fe1922d2
> > b27953c8cd
> > Many thanks to Jeff for the review/approval.
> > There have been no reported adverse effects so far.
> > Please let me/us know if this has helped CRIS.
> 
> TL;DR:
> It sure helped for the big offender; arith-rand-ll with LRA
> is now 1.3% faster!  Not everything is rosy though.

Coremark regressed slightly for LRA, cris-elf, -O2
-march=v10; with/without the patch: 100.0014% by speed,
100.037% by size.

brgds, H-P


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [committed] Convert xstormy16 to LRA
@ 2023-05-01 13:42 Jeff Law
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff Law @ 2023-05-01 13:42 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 216 bytes --]

This patch converts the xstormy16 patch to LRA.  It introduces a code 
quality regression in the shiftsi testcase, but it also fixes numerous 
aborts/errors.  IMHO it's a good tradeoff.

Committed to the trunk,
Jeff

[-- Attachment #2: P --]
[-- Type: text/plain, Size: 1045 bytes --]

commit 977a3be3ccbc7f177316b2b349523023cac37bcd
Author: Jeff Law <jlaw@ventanamicro>
Date:   Mon May 1 07:40:38 2023 -0600

    Convert xstormy16 to LRA
    
    This patch converts the xstormy16 patch to LRA.  It introduces a code
    quality regression in the shiftsi testcase, but it also fixes numerous
    aborts/errors.  IMHO it's a good tradeoff.
    
    gcc/
    
            * config/stormy16/stormy16.cc (TARGET_LRA_P): Remove defintion.

diff --git a/gcc/config/stormy16/stormy16.cc b/gcc/config/stormy16/stormy16.cc
index 98f87fa8251..81b32dbc958 100644
--- a/gcc/config/stormy16/stormy16.cc
+++ b/gcc/config/stormy16/stormy16.cc
@@ -2894,9 +2894,6 @@ xstormy16_push_rounding (poly_int64 bytes)
 #undef  TARGET_PREFERRED_OUTPUT_RELOAD_CLASS
 #define TARGET_PREFERRED_OUTPUT_RELOAD_CLASS xstormy16_preferred_reload_class
 
-#undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
-
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P	xstormy16_legitimate_address_p
 #undef TARGET_MODE_DEPENDENT_ADDRESS_P

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-05-13  1:11 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-01 23:37 [committed] Convert xstormy16 to LRA Roger Sayle
2023-05-02 12:40 ` Paul Koning
2023-05-02 13:18   ` Roger Sayle
2023-05-02 13:49     ` Segher Boessenkool
2023-05-02 16:20       ` Roger Sayle
2023-05-02 21:55         ` Segher Boessenkool
2023-05-02 14:11     ` Paul Koning
2023-05-02 15:47       ` Segher Boessenkool
2023-05-02 15:57 ` Jeff Law
2023-05-11 15:05 ` Hans-Peter Nilsson
2023-05-11 16:55   ` Paul Koning
2023-05-11 18:15     ` Jeff Law
2023-05-11 18:32       ` Hans-Peter Nilsson
2023-05-12 13:53   ` Hans-Peter Nilsson
2023-05-12 14:01     ` Hans-Peter Nilsson
2023-05-12 14:04     ` Roger Sayle
2023-05-13  0:56       ` Hans-Peter Nilsson
2023-05-13  1:11         ` Hans-Peter Nilsson
  -- strict thread matches above, loose matches on Subject: below --
2023-05-01 13:42 Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).