From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=RMy4=AP=arm.com=richard.sandiford@sourceware.org>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id 761C33858D32
	for <gcc-patches@gcc.gnu.org>; Mon, 24 Apr 2023 09:37:17 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 761C33858D32
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id BA008D75;
	Mon, 24 Apr 2023 02:38:00 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 250FE3F5A1;
	Mon, 24 Apr 2023 02:37:16 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina <Tamar.Christina@arm.com>
Mail-Followup-To: Tamar Christina <Tamar.Christina@arm.com>,"gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,  nd <nd@arm.com>,  Richard Earnshaw <Richard.Earnshaw@arm.com>, richard.sandiford@arm.com
Cc: "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,  nd <nd@arm.com>,  Richard Earnshaw <Richard.Earnshaw@arm.com>
Subject: Re: [PATCH] RFC: New compact syntax for insn and insn_split in Machine Descriptions
References: <patch-17151-tamar@arm.com> <mptttx96tn7.fsf@arm.com>
	<VI1PR08MB5325D6C8720DA268A76B1792FF679@VI1PR08MB5325.eurprd08.prod.outlook.com>
Date: Mon, 24 Apr 2023 10:37:14 +0100
In-Reply-To: <VI1PR08MB5325D6C8720DA268A76B1792FF679@VI1PR08MB5325.eurprd08.prod.outlook.com>
	(Tamar Christina's message of "Mon, 24 Apr 2023 09:05:42 +0000")
Message-ID: <mpt8reh62ph.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-24.5 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Friday, April 21, 2023 6:19 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
>> <Richard.Earnshaw@arm.com>
>> Subject: Re: [PATCH] RFC: New compact syntax for insn and insn_split in
>> Machine Descriptions
>> 
>> Tamar Christina <tamar.christina@arm.com> writes:
>> > Hi All,
>> >
>> > This patch adds support for a compact syntax for specifying
>> > constraints in instruction patterns. Credit for the idea goes to Richard
>> Earnshaw.
>> >
>> > I am sending up this RFC to get feedback for it's inclusion in GCC 14.
>> > With this new syntax we want a clean break from the current
>> > limitations to make something that is hopefully easier to use and maintain.
>> >
>> > The idea behind this compact syntax is that often times it's quite
>> > hard to correlate the entries in the constrains list, attributes and instruction
>> lists.
>> >
>> > One has to count and this often is tedious.  Additionally when
>> > changing a single line in the insn multiple lines in a diff change,
>> > making it harder to see what's going on.
>> >
>> > This new syntax takes into account many of the common things that are
>> done in MD
>> > files.   It's also worth saying that this version is intended to deal with the
>> > common case of a string based alternatives.   For C chunks we have some
>> ideas
>> > but those are not intended to be addressed here.
>> >
>> > It's easiest to explain with an example:
>> >
>> > normal syntax:
>> >
>> > (define_insn_and_split "*movsi_aarch64"
>> >   [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m,
>> r,  r,  r, w,r,w, w")
>> > 	(match_operand:SI 1 "aarch64_mov_operand"  "
>> r,r,k,M,n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Ds"))]
>> >   "(register_operand (operands[0], SImode)
>> >     || aarch64_reg_or_zero (operands[1], SImode))"
>> >   "@
>> >    mov\\t%w0, %w1
>> >    mov\\t%w0, %w1
>> >    mov\\t%w0, %w1
>> >    mov\\t%w0, %1
>> >    #
>> >    * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\",
>> operands[1]);
>> >    ldr\\t%w0, %1
>> >    ldr\\t%s0, %1
>> >    str\\t%w1, %0
>> >    str\\t%s1, %0
>> >    adrp\\t%x0, %A1\;ldr\\t%w0, [%x0, %L1]
>> >    adr\\t%x0, %c1
>> >    adrp\\t%x0, %A1
>> >    fmov\\t%s0, %w1
>> >    fmov\\t%w0, %s1
>> >    fmov\\t%s0, %s1
>> >    * return aarch64_output_scalar_simd_mov_immediate (operands[1],
>> SImode);"
>> >   "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL
>> (operands[1]), SImode)
>> >     && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
>> >    [(const_int 0)]
>> >    "{
>> >        aarch64_expand_mov_immediate (operands[0], operands[1]);
>> >        DONE;
>> >     }"
>> >   ;; The "mov_imm" type for CNT is just a placeholder.
>> >   [(set_attr "type"
>> "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_4,
>> >
>> load_4,store_4,store_4,load_4,adr,adr,f_mcr,f_mrc,fmov,neon_move")
>> >    (set_attr "arch"   "*,*,*,*,*,sve,*,fp,*,fp,*,*,*,fp,fp,fp,simd")
>> >    (set_attr "length" "4,4,4,4,*,  4,4, 4,4, 4,8,4,4, 4, 4, 4,   4")
>> > ]
>> > )
>> >
>> > New syntax:
>> >
>> > (define_insn_and_split "*movsi_aarch64"
>> >   [(set (match_operand:SI 0 "nonimmediate_operand")
>> > 	(match_operand:SI 1 "aarch64_mov_operand"))]
>> >   "(register_operand (operands[0], SImode)
>> >     || aarch64_reg_or_zero (operands[1], SImode))"
>> >   "@@ (cons: 0 1; attrs: type arch length)
>> >    [=r, r  ; mov_reg  , *   , 4] mov\t%w0, %w1
>> >    [k , r  ; mov_reg  , *   , 4] ^
>> >    [r , k  ; mov_reg  , *   , 4] ^
>> >    [r , M  ; mov_imm  , *   , 4] mov\t%w0, %1
>> >    [r , n  ; mov_imm  , *   , *] #
>> >    [r , Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ('cnt',
>> '%x0', operands[1]);
>> >    [r , m  ; load_4   , *   , 4] ldr\t%w0, %1
>> >    [w , m  ; load_4   , fp  , 4] ldr\t%s0, %1
>> >    [m , rZ ; store_4  , *   , 4] str\t%w1, %0
>> >    [m , w  ; store_4  , fp  , 4] str\t%s1, %0
>> >    [r , Usw; load_4   , *   , 8] adrp\t%x0, %A1;ldr\t%w0, [%x0, %L1]
>> >    [r , Usa; adr      , *   , 4] adr\t%x0, %c1
>> >    [r , Ush; adr      , *   , 4] adrp\t%x0, %A1
>> >    [w , rZ ; f_mcr    , fp  , 4] fmov\t%s0, %w1
>> >    [r , w  ; f_mrc    , fp  , 4] fmov\t%w0, %s1
>> >    [w , w  ; fmov     , fp  , 4] fmov\t%s0, %s1
>> >    [w , Ds ; neon_move, simd, 4] <<
>> aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);"
>> >   "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL
>> (operands[1]), SImode)
>> >     && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
>> >   [(const_int 0)]
>> >   {
>> >     aarch64_expand_mov_immediate (operands[0], operands[1]);
>> >     DONE;
>> >   }
>> >   ;; The "mov_imm" type for CNT is just a placeholder.
>> > )
>> >
>> > The patch contains some more rewritten examples for both Arm and
>> > AArch64.  I have included them for examples in this RFC but the final
>> > version posted in GCC 14 will have these split out.
>> >
>> > The main syntax rules are as follows (See docs for full rules):
>> >   - Template must start with "@@" to use the new syntax.
>> >   - "@@" is followed by a layout in parentheses which is "cons:" followed by
>> >     a list of match_operand/match_scratch IDs, then a semicolon, then the
>> >     same for attributes ("attrs:"). Both sections are optional (so you can
>> >     use only cons, or only attrs, or both), and cons must come before attrs
>> >     if present.
>> >   - Each alternative begins with any amount of whitespace.
>> >   - Following the whitespace is a comma-separated list of constraints and/or
>> >     attributes within brackets [], with sections separated by a semicolon.
>> >   - Following the closing ']' is any amount of whitespace, and then the actual
>> >     asm output.
>> >   - Spaces are allowed in the list (they will simply be removed).
>> >   - All alternatives should be specified: a blank list should be
>> >     "[,,]", "[,,;,]" etc., not "[]" or "" (however genattr may segfault if
>> >     you leave certain attributes empty, I have found).
>> >   - The actual constraint string in the match_operand or match_scratch, and
>> >     the attribute string in the set_attr, must be blank or an empty string
>> >     (you can't combine the old and new syntaxes).
>> >   - The common idion * return can be shortened by using <<.
>> >   - Any unexpanded iterators left during processing will result in an error at
>> >     compile time.   If for some reason <> is needed in the output then these
>> >     must be escaped using \.
>> >   - Inside a @@ block '' is treated as "" when there are multiple characters
>> >     inside the single quotes.  This version does not handle multi byte literals
>> >     like specifying characters as their numerical encoding, like \003 nor does
>> >     it handle unicode, especially multibyte encodings.  This feature may be
>> more
>> >     trouble than it's worth so have no finished it off, however this means one
>> >     can use 'foo' instead of \"foo\" to denote a multicharacter string.
>> >   - Inside an @@ block any unexpanded iterators will result in a compile time
>> >     fault instead of incorrect assembly being generated at runtime.  If the
>> >     literal <> is needed in the output this needs to be escaped with \<\>.
>> >   - This check is not performed inside C blocks (lines starting with *).
>> >   - Instead of copying the previous instruction again in the next pattern, one
>> >     can use ^ to refer to the previous asm string.
>> 
>> Thanks for doing this.  The new syntax seems like a clear improvement for
>> complex patterns like movs.
>> 
>> Some comments/suggestions:
>> 
>> - From a style perspective, out-of-order constraints should IMO be strongly
>>   discouraged.  The asm string uses %0, %1, %2 etc. to refer to operands,
>>   and having that directly after a list that puts the constraints in
>>   a different order (such as [%2, %0, %1]) would IMO be very confusing.
>> 
>>   I agree there might be cases where dropping constraints makes sense.
>>   But I think in general we should encourage all constraints to be
>>   specified, and be specified in order.  And that's likely to be the
>>   natural choice in an overwhelming majority of cases anyway.
>> 
>>   So how about having a simpler syntax for the first line when all
>>   constraints are specified in order?  Maybe just "cons" (without the
>>   colon or numbers).
>> 
>> - I'm not too keen on the '' thing.  It sounded from internal
>>   discussion like backslashes and quoting were a problem generally.
>> 
>>   Would it work to quote the new form in {@ ... } instead?  There should
>>   be no compatibility problem with that, since @ isn't a standard C++
>>   lexing token.
>
> Fair enough, did you mean {@<string>} or @'string' ? 

I meant quote that whole asm block in {@...} rather than "@@...".  I.e.:

   {@ [cons: =0, 1; attrs: type, arch, length]
    [r,  r  ; mov_reg  , *   , 4] mov\t%w0, %w1
    [k , r  ; mov_reg  , *   , 4] ^
    [r , k  ; mov_reg  , *   , 4] ^
    [r , M  ; mov_imm  , *   , 4] mov\t%w0, %1
    [r , n  ; mov_imm  , *   , *] #
    [r , Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]);
    [r , m  ; load_4   , *   , 4] ldr\t%w0, %1
    [w , m  ; load_4   , fp  , 4] ldr\t%s0, %1
    [m , rZ ; store_4  , *   , 4] str\t%w1, %0
    [m , w  ; store_4  , fp  , 4] str\t%s1, %0
    [r , Usw; load_4   , *   , 8] adrp\t%x0, %A1;ldr\t%w0, [%x0, %L1]
    [r , Usa; adr      , *   , 4] adr\t%x0, %c1
    [r , Ush; adr      , *   , 4] adrp\t%x0, %A1
    [w , rZ ; f_mcr    , fp  , 4] fmov\t%s0, %w1
    [r , w  ; f_mrc    , fp  , 4] fmov\t%w0, %s1
    [w , w  ; fmov     , fp  , 4] fmov\t%s0, %s1
    [w , Ds ; neon_move, simd, 4] << arch64_output_scalar_simd_mov_immediate (operands[1], SImode);
   }

That will also help if we do want to support C++ code blocks in future.

Thanks,
Richard