From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 4BDA03858D32 for ; Mon, 24 Apr 2023 08:33:37 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4BDA03858D32 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C9867D75; Mon, 24 Apr 2023 01:34:20 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 35DF23F5A1; Mon, 24 Apr 2023 01:33:36 -0700 (PDT) From: Richard Sandiford To: Tamar Christina Mail-Followup-To: Tamar Christina ,gcc-patches@gcc.gnu.org, nd@arm.com, richard.earnshaw@arm.com, richard.sandiford@arm.com Cc: gcc-patches@gcc.gnu.org, nd@arm.com, richard.earnshaw@arm.com Subject: Re: [PATCH] RFC: New compact syntax for insn and insn_split in Machine Descriptions References: Date: Mon, 24 Apr 2023 09:33:34 +0100 In-Reply-To: (Richard Sandiford's message of "Fri, 21 Apr 2023 18:18:36 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-24.6 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Richard Sandiford writes: > Tamar Christina writes: >> Hi All, >> >> This patch adds support for a compact syntax for specifying constraints in >> instruction patterns. Credit for the idea goes to Richard Earnshaw. >> >> I am sending up this RFC to get feedback for it's inclusion in GCC 14. >> With this new syntax we want a clean break from the current limitations to make >> something that is hopefully easier to use and maintain. >> >> The idea behind this compact syntax is that often times it's quite hard to >> correlate the entries in the constrains list, attributes and instruction lists. >> >> One has to count and this often is tedious. Additionally when changing a single >> line in the insn multiple lines in a diff change, making it harder to see what's >> going on. >> >> This new syntax takes into account many of the common things that are done in MD >> files. It's also worth saying that this version is intended to deal with the >> common case of a string based alternatives. For C chunks we have some ideas >> but those are not intended to be addressed here. >> >> It's easiest to explain with an example: >> >> normal syntax: >> >> (define_insn_and_split "*movsi_aarch64" >> [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m, r, r, r, w,r,w, w") >> (match_operand:SI 1 "aarch64_mov_operand" " r,r,k,M,n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Ds"))] >> "(register_operand (operands[0], SImode) >> || aarch64_reg_or_zero (operands[1], SImode))" >> "@ >> mov\\t%w0, %w1 >> mov\\t%w0, %w1 >> mov\\t%w0, %w1 >> mov\\t%w0, %1 >> # >> * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]); >> ldr\\t%w0, %1 >> ldr\\t%s0, %1 >> str\\t%w1, %0 >> str\\t%s1, %0 >> adrp\\t%x0, %A1\;ldr\\t%w0, [%x0, %L1] >> adr\\t%x0, %c1 >> adrp\\t%x0, %A1 >> fmov\\t%s0, %w1 >> fmov\\t%w0, %s1 >> fmov\\t%s0, %s1 >> * return aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);" >> "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode) >> && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))" >> [(const_int 0)] >> "{ >> aarch64_expand_mov_immediate (operands[0], operands[1]); >> DONE; >> }" >> ;; The "mov_imm" type for CNT is just a placeholder. >> [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_4, >> load_4,store_4,store_4,load_4,adr,adr,f_mcr,f_mrc,fmov,neon_move") >> (set_attr "arch" "*,*,*,*,*,sve,*,fp,*,fp,*,*,*,fp,fp,fp,simd") >> (set_attr "length" "4,4,4,4,*, 4,4, 4,4, 4,8,4,4, 4, 4, 4, 4") >> ] >> ) >> >> New syntax: >> >> (define_insn_and_split "*movsi_aarch64" >> [(set (match_operand:SI 0 "nonimmediate_operand") >> (match_operand:SI 1 "aarch64_mov_operand"))] >> "(register_operand (operands[0], SImode) >> || aarch64_reg_or_zero (operands[1], SImode))" >> "@@ (cons: 0 1; attrs: type arch length) >> [=r, r ; mov_reg , * , 4] mov\t%w0, %w1 >> [k , r ; mov_reg , * , 4] ^ >> [r , k ; mov_reg , * , 4] ^ >> [r , M ; mov_imm , * , 4] mov\t%w0, %1 >> [r , n ; mov_imm , * , *] # >> [r , Usv; mov_imm , sve , 4] << aarch64_output_sve_cnt_immediate ('cnt', '%x0', operands[1]); >> [r , m ; load_4 , * , 4] ldr\t%w0, %1 >> [w , m ; load_4 , fp , 4] ldr\t%s0, %1 >> [m , rZ ; store_4 , * , 4] str\t%w1, %0 >> [m , w ; store_4 , fp , 4] str\t%s1, %0 >> [r , Usw; load_4 , * , 8] adrp\t%x0, %A1;ldr\t%w0, [%x0, %L1] >> [r , Usa; adr , * , 4] adr\t%x0, %c1 >> [r , Ush; adr , * , 4] adrp\t%x0, %A1 >> [w , rZ ; f_mcr , fp , 4] fmov\t%s0, %w1 >> [r , w ; f_mrc , fp , 4] fmov\t%w0, %s1 >> [w , w ; fmov , fp , 4] fmov\t%s0, %s1 >> [w , Ds ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);" >> "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode) >> && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))" >> [(const_int 0)] >> { >> aarch64_expand_mov_immediate (operands[0], operands[1]); >> DONE; >> } >> ;; The "mov_imm" type for CNT is just a placeholder. >> ) >> >> The patch contains some more rewritten examples for both Arm and AArch64. I >> have included them for examples in this RFC but the final version posted in >> GCC 14 will have these split out. >> >> The main syntax rules are as follows (See docs for full rules): >> - Template must start with "@@" to use the new syntax. >> - "@@" is followed by a layout in parentheses which is "cons:" followed by >> a list of match_operand/match_scratch IDs, then a semicolon, then the >> same for attributes ("attrs:"). Both sections are optional (so you can >> use only cons, or only attrs, or both), and cons must come before attrs >> if present. >> - Each alternative begins with any amount of whitespace. >> - Following the whitespace is a comma-separated list of constraints and/or >> attributes within brackets [], with sections separated by a semicolon. >> - Following the closing ']' is any amount of whitespace, and then the actual >> asm output. >> - Spaces are allowed in the list (they will simply be removed). >> - All alternatives should be specified: a blank list should be >> "[,,]", "[,,;,]" etc., not "[]" or "" (however genattr may segfault if >> you leave certain attributes empty, I have found). >> - The actual constraint string in the match_operand or match_scratch, and >> the attribute string in the set_attr, must be blank or an empty string >> (you can't combine the old and new syntaxes). >> - The common idion * return can be shortened by using <<. >> - Any unexpanded iterators left during processing will result in an error at >> compile time. If for some reason <> is needed in the output then these >> must be escaped using \. >> - Inside a @@ block '' is treated as "" when there are multiple characters >> inside the single quotes. This version does not handle multi byte literals >> like specifying characters as their numerical encoding, like \003 nor does >> it handle unicode, especially multibyte encodings. This feature may be more >> trouble than it's worth so have no finished it off, however this means one >> can use 'foo' instead of \"foo\" to denote a multicharacter string. >> - Inside an @@ block any unexpanded iterators will result in a compile time >> fault instead of incorrect assembly being generated at runtime. If the >> literal <> is needed in the output this needs to be escaped with \<\>. >> - This check is not performed inside C blocks (lines starting with *). >> - Instead of copying the previous instruction again in the next pattern, one >> can use ^ to refer to the previous asm string. > > Thanks for doing this. The new syntax seems like a clear improvement > for complex patterns like movs. > > Some comments/suggestions: > > - From a style perspective, out-of-order constraints should IMO be strongly > discouraged. The asm string uses %0, %1, %2 etc. to refer to operands, > and having that directly after a list that puts the constraints in > a different order (such as [%2, %0, %1]) would IMO be very confusing. > > I agree there might be cases where dropping constraints makes sense. > But I think in general we should encourage all constraints to be > specified, and be specified in order. And that's likely to be the > natural choice in an overwhelming majority of cases anyway. > > So how about having a simpler syntax for the first line when all > constraints are specified in order? Maybe just "cons" (without the > colon or numbers). Alternatively: leading "=" and "+" characters describe the operand as a whole, rather than individual alternatives. So maybe "=" and "+" should be in the "header" (the first line) rather than the first "row"/alternative. That would then be a justification for keeping the operand numbers even for the simple case. E.g.: cons: =0 1 Also, it would be good if the header and rows were consistent about whether they use comma separators or whitespace separators. At the moment, the headers are whitespace-separated while the rows are comma-separated. On that: it might be possible to use whitespace-separated columns even for constraints, since an empty constraint should be equivalent to X. That would look like: [r r ; mov_reg * 4] mov\t%w0, %w1 [k r ; mov_reg * 4] ^ [r k ; mov_reg * 4] ^ [r M ; mov_imm * 4] mov\t%w0, %1 [r n ; mov_imm * *] # [r Usv; mov_imm sve 4] << aarch64_output_sve_cnt_immediate ('cnt', '%x0', operands[1]); [r m ; load_4 * 4] ldr\t%w0, %1 [w m ; load_4 fp 4] ldr\t%s0, %1 [m rZ ; store_4 * 4] str\t%w1, %0 [m w ; store_4 fp 4] str\t%s1, %0 [r Usw; load_4 * 8] adrp\t%x0, %A1;ldr\t%w0, [%x0, %L1] [r Usa; adr * 4] adr\t%x0, %c1 [r Ush; adr * 4] adrp\t%x0, %A1 [w rZ ; f_mcr fp 4] fmov\t%s0, %w1 [r w ; f_mrc fp 4] fmov\t%w0, %s1 [w w ; fmov fp 4] fmov\t%s0, %s1 [w Ds ; neon_move simd 4] << aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);" I'm not sure whether that's better or worse though. Keeping commas is fine by me FWIW, as long as we do it in the header too. Thanks, Richard