From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=5cLz=AX=nextmovesoftware.com=roger@sourceware.org>
Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69])
	by sourceware.org (Postfix) with ESMTPS id A62CB3858D1E
	for <gcc-patches@gcc.gnu.org>; Tue,  2 May 2023 16:20:58 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A62CB3858D1E
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=nextmovesoftware.com; s=default; h=Content-Transfer-Encoding:Content-Type:
	MIME-Version:Message-ID:Date:Subject:In-Reply-To:References:Cc:To:From:Sender
	:Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From:
	Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:
	List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive;
	bh=0jmHM7G5jc3STYJblkimVHd7RIopOc0/f7znmvodFCA=; b=GwwvNWbY2tj21B5AwULZWhv4UP
	KZXZfMmYh4LHzbgJ2KY1h2BLRw1ylMD5dK79j4UamnxJ3USxe0ypyvydUjRet9B3h9mMRyFOJQLU3
	F4DAGf401Q0Kd8QbhAUYgYaEDM1BpFnatNDAqtMWHsi+Em7eoCJ/ZeQBoSp8Ryfn++mZJQApT0hYa
	0SIobnd6VJUd9n/i9zMcie+yZxhg6Nk/g9vDjKvcvp3AuYSDPgoc4177T+dEcLbAw0nBcQNYWThCo
	fSTqiJs6kVmiaq3SslxC+vVy7BKfjiFwLkFqhdHW0p0qBjoHJhysGWUPpl4wJ90lxNz7Jc//LGPXD
	OYwwv70A==;
Received: from [185.62.158.67] (port=56139 helo=Dell)
	by server.nextmovesoftware.com with esmtpsa  (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.96)
	(envelope-from <roger@nextmovesoftware.com>)
	id 1ptskI-0001Eb-1M;
	Tue, 02 May 2023 12:20:54 -0400
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'Segher Boessenkool'" <segher@kernel.crashing.org>
Cc: "'Paul Koning'" <paulkoning@comcast.net>,
	"'Jeff Law'" <jeffreyalaw@gmail.com>,
	"'GCC Patches'" <gcc-patches@gcc.gnu.org>
References: <009601d97c85$de708170$9b518450$@nextmovesoftware.com> <0FD01909-1F42-4E50-8D0D-2EDB80C34C0A@comcast.net> <021801d97cf8$9fc91770$df5b4650$@nextmovesoftware.com> <20230502134928.GA19790@gate.crashing.org>
In-Reply-To: <20230502134928.GA19790@gate.crashing.org>
Subject: RE: [committed] Convert xstormy16 to LRA
Date: Tue, 2 May 2023 17:20:49 +0100
Message-ID: <008101d97d12$12068200$36138600$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Outlook 16.0
Thread-Index: AQG28Urg0PW1413oa1Hy1htPI2COKwI6TCJsATMouzMCUr310a9eMTeg
Content-Language: en-gb
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com
X-AntiAbuse: Original Domain - gcc.gnu.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - nextmovesoftware.com
X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com
X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
X-Spam-Status: No, score=-6.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,KAM_SHORT,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>


On 02 May 2023 14:49, Segher Boessenkool wrote:
> On Tue, May 02, 2023 at 02:18:43PM +0100, Roger Sayle wrote:
> > On 02 May 2023 13:40, Paul Koning wrote:
> > > > On May 1, 2023, at 7:37 PM, Roger Sayle
> > > > <roger@nextmovesoftware.com>
> > > wrote:
> > > > The shiftsi.cc regression on xstormy16 is fixed by adding
> > > > -fno-split-wide-types.
> > > > In fact, if all the regression tests pass, I'd suggest that
> > > > flag_split_wide-types = false should be the default on xstormy16
> > > > now that we've moved to LRA.  And if this works for xstormy16, it
> > > > might be useful to other targets for the LRA transition; it's a
> > > > difference in behaviour between reload and LRA that could
> > > > potentially affect multiple targets.
> > >
> > > Is there documentation for that flag?
> >
> > Yes, see the section -fsplit-wide-types in
> > https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
> >
> > Interestingly, there's a recent-ish blog describing how
> > -fno-split-wide-types reduces executable size on AVR:
> > https://ufj.ddns.net/blog/marlin/2019/01/07/reducing-marlin-binary-siz
> > e.html and its interaction with (AVR) register allocation is seen in
> > PR middle-end/35860.
> 
> But what causes the problem?  Is something missing somewhere, or do we get
too
> high register pressure?
> 
> There also is -fsplit-wide-types-early, which might help.
> 
> In principle it always is good to describe a machine model as close as
possible to
> what the machine actually does.  What gets in the way here?

I can describe what GCC is doing, but I don't claim to understand how the
register allocators make the choices they do (nor where the fix should be).

The problematic test case is:
unsigned long foo(unsigned long x) { return x << 1; }
which with (the old) reload (and -fno-split-wide-types) is generated as:
foo:    add r2,r2
        adc r3,r3
        ret
but with (the new) lra is generated as:
        mov r6,r2
        mov r7,r3
        mov r2,r6
        mov r3,r7
        add r2,r6
        adc r3,r7
        ret

where the register allocator has failed to notice that (take advantage of)
the SImode addition can be performed "in-place", from the insn's
constraints:

;; Addition
(define_insn_and_split "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "%0")
                 (match_operand:SI 2 "nonmemory_operand" "ri")))
   (clobber (reg:BI CARRY_REG))]
  ""
  "#"
  "reload_completed"
  [(pc)]
  { xstormy16_expand_arith (SImode, PLUS, operands[0], operands[1],
                            operands[2]);
    DONE;
  }
  [(set_attr "length" "4")])


Before combine, we have simplicity itself (a two instruction function):
(insn 2 4 3 2 (set (reg/v:SI 26 [ x ])
        (reg:SI 2 r2 [ x ])) "../../shiftsi.c":4:41 7 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 2 r2 [ x ])
        (nil)))
(insn 6 3 11 2 (parallel [
            (set (reg:SI 28)
                (plus:SI (reg/v:SI 26 [ x ])
                    (reg/v:SI 26 [ x ])))
            (clobber (reg:BI 16 carry))
        ]) "../../shiftsi.c":4:52 35 {addsi3}
     (expr_list:REG_DEAD (reg/v:SI 26 [ x ])
        (expr_list:REG_UNUSED (reg:BI 16 carry)
            (nil))))

Then combine inserts an additional copy:
(insn 14 4 2 2 (set (reg:SI 29)
        (reg:SI 2 r2 [ x ])) "../../shiftsi.c":4:41 -1
     (expr_list:REG_DEAD (reg:SI 2 r2 [ x ])
        (nil)))
(insn 2 14 3 2 (set (reg/v:SI 26 [ x ])
        (reg:SI 29)) "../../shiftsi.c":4:41 7 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 29)
        (nil)))
(insn 6 3 11 2 (parallel [
            (set (reg:SI 28)
                (plus:SI (reg/v:SI 26 [ x ])
                    (reg/v:SI 26 [ x ])))
            (clobber (reg:BI 16 carry))
        ]) "../../shiftsi.c":4:52 35 {addsi3}
     (expr_list:REG_DEAD (reg/v:SI 26 [ x ])
        (expr_list:REG_UNUSED (reg:BI 16 carry)
            (nil))))

The just before reload, subreg3 "splits the wide types", to produce:
(insn 15 4 16 2 (set (reg:HI 30)
        (reg:HI 2 r2 [ x ])) "../../shiftsi.c":4:41 6 {movhi_internal}
     (nil))
(insn 16 15 17 2 (set (reg:HI 31 [+2 ])
        (reg:HI 3 r3 [ x+2 ])) "../../shiftsi.c":4:41 6 {movhi_internal}
     (nil))
(insn 17 16 18 2 (clobber (reg/v:SI 26 [ x ])) "../../shiftsi.c":4:41 -1
     (nil))
(insn 18 17 19 2 (set (subreg:HI (reg/v:SI 26 [ x ]) 0)
        (reg:HI 30)) "../../shiftsi.c":4:41 6 {movhi_internal}
     (nil))
(insn 19 18 3 2 (set (subreg:HI (reg/v:SI 26 [ x ]) 2)
        (reg:HI 31 [+2 ])) "../../shiftsi.c":4:41 6 {movhi_internal}
     (nil))
(insn 6 3 11 2 (parallel [
            (set (reg:SI 28)
                (plus:SI (reg/v:SI 26 [ x ])
                    (reg/v:SI 26 [ x ])))
            (clobber (reg:BI 16 carry))
        ]) "../../shiftsi.c":4:52 35 {addsi3}
     (expr_list:REG_DEAD (reg/v:SI 26 [ x ])
        (expr_list:REG_UNUSED (reg:BI 16 carry)
            (nil))))

Given this input, with the movhi_internals and the clobber, reload
was able to do something sensible, but lra fails to place pseudos 26,
28 in the same place, in pseudos 30 and 31, ideally hard regs 2 and 3.

With the -fno-split-wide-types workaround, lra/reload have the easier
task of assigning registers to the three insn case above.

I don't think it's a problem/fault with the machine description, but
how the clobber above is being interpreted by the different register
allocators.


Back in August 2022, I submitted a x86_64 backend patch that used a
peephole2 to eliminate this type of (subreg-lower) generated clobber:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599419.html
but Uros was hesitant to accept this change, without an RTL expert
explaining why the clobbers were being generated in the first place.

Like my x86_64 patch above, this issue could also probably be cleaned
up by a peephole2 for xstormy16 that straddled/removed the clobber.

My simplistic summary is that when a backend lowers a multi-word
instruction with a splitter, it (typically) does so without introducing
a clobber.  If the clobbers generated by the middle-end's automatic
lowering confuse the register allocator, then this is more efficient.
Ideally, such clobbers should be avoided (if possible) and/or LRA
improved to understand that they don't actually cause an allocno
conflict.  [but I'm in no way a register allocation expert].

Thoughts?