public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'Segher Boessenkool'" <segher@kernel.crashing.org>
Cc: "'Paul Koning'" <paulkoning@comcast.net>,
	"'Jeff Law'" <jeffreyalaw@gmail.com>,
	"'GCC Patches'" <gcc-patches@gcc.gnu.org>
Subject: RE: [committed] Convert xstormy16 to LRA
Date: Tue, 2 May 2023 17:20:49 +0100	[thread overview]
Message-ID: <008101d97d12$12068200$36138600$@nextmovesoftware.com> (raw)
In-Reply-To: <20230502134928.GA19790@gate.crashing.org>


On 02 May 2023 14:49, Segher Boessenkool wrote:
> On Tue, May 02, 2023 at 02:18:43PM +0100, Roger Sayle wrote:
> > On 02 May 2023 13:40, Paul Koning wrote:
> > > > On May 1, 2023, at 7:37 PM, Roger Sayle
> > > > <roger@nextmovesoftware.com>
> > > wrote:
> > > > The shiftsi.cc regression on xstormy16 is fixed by adding
> > > > -fno-split-wide-types.
> > > > In fact, if all the regression tests pass, I'd suggest that
> > > > flag_split_wide-types = false should be the default on xstormy16
> > > > now that we've moved to LRA.  And if this works for xstormy16, it
> > > > might be useful to other targets for the LRA transition; it's a
> > > > difference in behaviour between reload and LRA that could
> > > > potentially affect multiple targets.
> > >
> > > Is there documentation for that flag?
> >
> > Yes, see the section -fsplit-wide-types in
> > https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
> >
> > Interestingly, there's a recent-ish blog describing how
> > -fno-split-wide-types reduces executable size on AVR:
> > https://ufj.ddns.net/blog/marlin/2019/01/07/reducing-marlin-binary-siz
> > e.html and its interaction with (AVR) register allocation is seen in
> > PR middle-end/35860.
> 
> But what causes the problem?  Is something missing somewhere, or do we get
too
> high register pressure?
> 
> There also is -fsplit-wide-types-early, which might help.
> 
> In principle it always is good to describe a machine model as close as
possible to
> what the machine actually does.  What gets in the way here?

I can describe what GCC is doing, but I don't claim to understand how the
register allocators make the choices they do (nor where the fix should be).

The problematic test case is:
unsigned long foo(unsigned long x) { return x << 1; }
which with (the old) reload (and -fno-split-wide-types) is generated as:
foo:    add r2,r2
        adc r3,r3
        ret
but with (the new) lra is generated as:
        mov r6,r2
        mov r7,r3
        mov r2,r6
        mov r3,r7
        add r2,r6
        adc r3,r7
        ret

where the register allocator has failed to notice that (take advantage of)
the SImode addition can be performed "in-place", from the insn's
constraints:

;; Addition
(define_insn_and_split "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "%0")
                 (match_operand:SI 2 "nonmemory_operand" "ri")))
   (clobber (reg:BI CARRY_REG))]
  ""
  "#"
  "reload_completed"
  [(pc)]
  { xstormy16_expand_arith (SImode, PLUS, operands[0], operands[1],
                            operands[2]);
    DONE;
  }
  [(set_attr "length" "4")])


Before combine, we have simplicity itself (a two instruction function):
(insn 2 4 3 2 (set (reg/v:SI 26 [ x ])
        (reg:SI 2 r2 [ x ])) "../../shiftsi.c":4:41 7 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 2 r2 [ x ])
        (nil)))
(insn 6 3 11 2 (parallel [
            (set (reg:SI 28)
                (plus:SI (reg/v:SI 26 [ x ])
                    (reg/v:SI 26 [ x ])))
            (clobber (reg:BI 16 carry))
        ]) "../../shiftsi.c":4:52 35 {addsi3}
     (expr_list:REG_DEAD (reg/v:SI 26 [ x ])
        (expr_list:REG_UNUSED (reg:BI 16 carry)
            (nil))))

Then combine inserts an additional copy:
(insn 14 4 2 2 (set (reg:SI 29)
        (reg:SI 2 r2 [ x ])) "../../shiftsi.c":4:41 -1
     (expr_list:REG_DEAD (reg:SI 2 r2 [ x ])
        (nil)))
(insn 2 14 3 2 (set (reg/v:SI 26 [ x ])
        (reg:SI 29)) "../../shiftsi.c":4:41 7 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 29)
        (nil)))
(insn 6 3 11 2 (parallel [
            (set (reg:SI 28)
                (plus:SI (reg/v:SI 26 [ x ])
                    (reg/v:SI 26 [ x ])))
            (clobber (reg:BI 16 carry))
        ]) "../../shiftsi.c":4:52 35 {addsi3}
     (expr_list:REG_DEAD (reg/v:SI 26 [ x ])
        (expr_list:REG_UNUSED (reg:BI 16 carry)
            (nil))))

The just before reload, subreg3 "splits the wide types", to produce:
(insn 15 4 16 2 (set (reg:HI 30)
        (reg:HI 2 r2 [ x ])) "../../shiftsi.c":4:41 6 {movhi_internal}
     (nil))
(insn 16 15 17 2 (set (reg:HI 31 [+2 ])
        (reg:HI 3 r3 [ x+2 ])) "../../shiftsi.c":4:41 6 {movhi_internal}
     (nil))
(insn 17 16 18 2 (clobber (reg/v:SI 26 [ x ])) "../../shiftsi.c":4:41 -1
     (nil))
(insn 18 17 19 2 (set (subreg:HI (reg/v:SI 26 [ x ]) 0)
        (reg:HI 30)) "../../shiftsi.c":4:41 6 {movhi_internal}
     (nil))
(insn 19 18 3 2 (set (subreg:HI (reg/v:SI 26 [ x ]) 2)
        (reg:HI 31 [+2 ])) "../../shiftsi.c":4:41 6 {movhi_internal}
     (nil))
(insn 6 3 11 2 (parallel [
            (set (reg:SI 28)
                (plus:SI (reg/v:SI 26 [ x ])
                    (reg/v:SI 26 [ x ])))
            (clobber (reg:BI 16 carry))
        ]) "../../shiftsi.c":4:52 35 {addsi3}
     (expr_list:REG_DEAD (reg/v:SI 26 [ x ])
        (expr_list:REG_UNUSED (reg:BI 16 carry)
            (nil))))

Given this input, with the movhi_internals and the clobber, reload
was able to do something sensible, but lra fails to place pseudos 26,
28 in the same place, in pseudos 30 and 31, ideally hard regs 2 and 3.

With the -fno-split-wide-types workaround, lra/reload have the easier
task of assigning registers to the three insn case above.

I don't think it's a problem/fault with the machine description, but
how the clobber above is being interpreted by the different register
allocators.


Back in August 2022, I submitted a x86_64 backend patch that used a
peephole2 to eliminate this type of (subreg-lower) generated clobber:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599419.html
but Uros was hesitant to accept this change, without an RTL expert
explaining why the clobbers were being generated in the first place.

Like my x86_64 patch above, this issue could also probably be cleaned
up by a peephole2 for xstormy16 that straddled/removed the clobber.

My simplistic summary is that when a backend lowers a multi-word
instruction with a splitter, it (typically) does so without introducing
a clobber.  If the clobbers generated by the middle-end's automatic
lowering confuse the register allocator, then this is more efficient.
Ideally, such clobbers should be avoided (if possible) and/or LRA
improved to understand that they don't actually cause an allocno
conflict.  [but I'm in no way a register allocation expert].

Thoughts?



  reply	other threads:[~2023-05-02 16:20 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-01 23:37 Roger Sayle
2023-05-02 12:40 ` Paul Koning
2023-05-02 13:18   ` Roger Sayle
2023-05-02 13:49     ` Segher Boessenkool
2023-05-02 16:20       ` Roger Sayle [this message]
2023-05-02 21:55         ` Segher Boessenkool
2023-05-02 14:11     ` Paul Koning
2023-05-02 15:47       ` Segher Boessenkool
2023-05-02 15:57 ` Jeff Law
2023-05-11 15:05 ` Hans-Peter Nilsson
2023-05-11 16:55   ` Paul Koning
2023-05-11 18:15     ` Jeff Law
2023-05-11 18:32       ` Hans-Peter Nilsson
2023-05-12 13:53   ` Hans-Peter Nilsson
2023-05-12 14:01     ` Hans-Peter Nilsson
2023-05-12 14:04     ` Roger Sayle
2023-05-13  0:56       ` Hans-Peter Nilsson
2023-05-13  1:11         ` Hans-Peter Nilsson
  -- strict thread matches above, loose matches on Subject: below --
2023-05-01 13:42 Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='008101d97d12$12068200$36138600$@nextmovesoftware.com' \
    --to=roger@nextmovesoftware.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=paulkoning@comcast.net \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).