From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id 5E49E385843E for ; Fri, 8 Jul 2022 07:15:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5E49E385843E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=UqjvU2ORSu33PHWpGmoY1gSt9a0PqS0cZGwWNQOn1ZA=; b=EkUAjWXDLAragHRgVVkaOCI3Qi Mofu+Y7Xu0GTglk+k69LiD4FkBx65eBXyxD5bvUdgHLgxoA1kEXyxZMZx9LmkXpfkGKAYdmqQnXRs ADe/uHnJXhZLVrfN9eFsudxeI/Ilob0btWXDttlMVqgMcOMQB6T27nCQc8oIUV91MoQXBWftLlMQo X7YaksDy1OzVt7HERdHi4dfPSkHoUAHH6/hWnVlcgztolWvU37xBR+cKrFMFEI2QMPXm1hewHa40w zQu7GiPr00inj9+trFWgO4JdTBh+kTL4VifLlAjm1mSJ8WcIvLdEfnbtcn9IZSKwyi+8pR2trigbJ hD93RrRg==; Received: from host86-130-134-60.range86-130.btcentralplus.com ([86.130.134.60]:57517 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o9iCd-0003sC-F3; Fri, 08 Jul 2022 03:15:03 -0400 From: "Roger Sayle" To: Cc: "'Segher Boessenkool'" , "'Uros Bizjak'" Subject: [x86 PATCH] Fun with flags: Adding stc/clc instructions to i386.md. Date: Fri, 8 Jul 2022 08:14:58 +0100 Message-ID: <003b01d8929a$70be2cc0$523a8640$@nextmovesoftware.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_003C_01D892A2.D28505C0" X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdiSmeKSuVR3bWRVR4CF9SMLDo6e3A== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-10.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, MEDICAL_SUBJECT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Jul 2022 07:15:06 -0000 This is a multipart message in MIME format. ------=_NextPart_000_003C_01D892A2.D28505C0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit This patch adds support for x86's single-byte encoded stc (set carry flag) and clc (clear carry flag) instructions to i386.md. The motivating example is the simple code snippet: unsigned int foo (unsigned int a, unsigned int b, unsigned int *c) { return __builtin_ia32_addcarryx_u32 (1, a, b, c); } which uses the target built-in to generate an adc instruction, adding together A and B with the incoming carry flag already set. Currently for this mainline GCC generates (with -O2): movl $1, %eax addb $-1, %al adcl %esi, %edi setc %al movl %edi, (%rdx) movzbl %al, %eax ret where the first two instructions (to load 1 into a byte register and then add 255 to it) are the idiom used to set the carry flag. This is a little inefficient as x86 has a "stc" instruction for precisely this purpose. With the attached patch we now generate: stc adcl %esi, %edi setc %al movl %edi, (%rdx) movzbl %al, %eax ret The central part of the patch is the addition of x86_stc and x86_clc define_insns, represented as "(set (reg:CCC FLAGS_REG) (const_int 1))" and "(set (reg:CCC FLAGS_REG) (const_int 0))" respectively, then using x86_stc appropriately in the ix86_expand_builtin. Alas this change exposes two latent bugs/issues in the compiler. The first is that there are several peephole2s in i386.md that propagate the flags register, but take its mode from the SET_SRC rather than preserve the mode of the original SET_DEST. The other, which is being discussed with Segher, is that the middle-end's simplify-rtx inappropriately tries to interpret/optimize MODE_CC comparisons, converting the above adc into an add, as it mistakenly believes (ltu:SI (const_int 1) (const_int 0))" is always const0_rtx even when the mode of the comparison is MODE_CCC. I believe Segher will review (and hopefully approve) the middle-end chunk of this patch independently, but hopefully this backend patch provides the necessary context to explain why that change is needed. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2022-07-08 Roger Sayle gcc/ChangeLog * config/i386/i386-expand.cc (ix86_expand_builtin) : Use new x86_stc or negqi_ccc_1 instructions to set the carry flag. * config/i386/i386.md (x86_clc): New define_insn. (x86_stc): Likewise, new define_insn to set the carry flag. (*setcc_qi_negqi_ccc_1_): New define_insn_and_split to recognize (and eliminate) the carry flag being copied to itself. (neg_ccc_1): Renamed from *neg_ccc_1 for gen function. (define_peephole2): Use match_operand of flags_reg_operand to capture and preserve the mode of FLAGS_REG. (define_peephole2): Likewise. * simplify-rtx.cc (simplify_const_relational_operation): Handle case where both operands of a MODE_CC comparison have been simplified to constant integers. gcc/testsuite/ChangeLog * gcc.target/i386/stc-1.c: New test case. Thanks in advance (both Uros and Segher), Roger -- > -----Original Message----- > From: Segher Boessenkool > Sent: 07 July 2022 23:39 > To: Roger Sayle > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Be careful with MODE_CC in > simplify_const_relational_operation. > > Hi! > > On Thu, Jul 07, 2022 at 10:08:04PM +0100, Roger Sayle wrote: > > I think it's fair to describe RTL's representation of condition flags > > using MODE_CC as a little counter-intuitive. > > "A little challenging", and you should see that as a good thing, as a puzzle to > crack :-) > > > For example, the i386 > > backend represents the carry flag (in adc instructions) using RTL of > > the form "(ltu:SI (reg:CCC) (const_int 0))", where great care needs to > > be taken not to treat this like a normal RTX expression, after all LTU > > (less-than-unsigned) against const0_rtx would normally always be > > false. > > A comparison of a MODE_CC thing against 0 means the result of a > *previous* comparison (or other cc setter) is looked at. Usually it simply looks > at some condition bits in a flags register. It does not do any actual comparison: > that has been done before (if at all even). > > > Hence, MODE_CC comparisons need to be treated with caution, and > > simplify_const_relational_operation returns early (to avoid > > problems) when GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC. > > Not just to avoid problems: there simply isn't enough information to do a > correct job. > > > However, consider the (currently) hypothetical situation, where the > > RTL optimizers determine that a previous instruction unconditionally > > sets or clears the carry flag, and this gets propagated by combine > > into the above expression, we'd end up with something that looks like > > (ltu:SI (const_int 1) (const_int 0)), which doesn't mean what it says. > > Fortunately, simplify_const_relational_operation is passed the > > original mode of the comparison (cmp_mode, the original mode of op0) > > which can be checked for MODE_CC, even when op0 is now VOIDmode > > (const_int) after the substitution. Defending against this is clearly > > the right thing to do. > > > > More controversially, rather than just abort > > simplification/optimization in this case, we can use the comparison > > operator to infer/select the semantics of the CC_MODE flag. > > Hopefully, whenever a backend uses LTU, it represents the (set) carry > > flag (and behaves like i386.md), in which case the result of the simplified > expression is the first operand. > > [If there's no standardization of semantics across backends, then we > > should always just return 0; but then miss potential optimizations]. > > On PowerPC, ltu means the result of an unsigned comparison (we have > instructions for that, cmpl[wd][i] mainly) was "smaller than". It does not mean > anything is unsigned smaller than zero. It also has nothing to do with carries, > which are done via a different register (the XER). > > > + /* Handle MODE_CC comparisons that have been simplified to > > + constants. */ > > + if (GET_MODE_CLASS (mode) == MODE_CC > > + && op1 == const0_rtx > > + && CONST_INT_P (op0)) > > + { > > + /* LTU represents the carry flag. */ > > + if (code == LTU) > > + return op0 == const0_rtx ? const0_rtx : const_true_rtx; > > + return 0; > > + } > > + > > /* We can't simplify MODE_CC values since we don't know what the > > actual comparison is. */ > > ^^^ > This comment is 100% true. We cannot simplify any MODE_CC comparison > without having more context. The combiner does have that context when it > tries to combine the CC setter with the CC consumer, for example. > > Do you have some piece of motivating example code? > > > Segher ------=_NextPart_000_003C_01D892A2.D28505C0 Content-Type: text/plain; name="patchcy4.txt" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="patchcy4.txt" diff --git a/gcc/config/i386/i386-expand.cc = b/gcc/config/i386/i386-expand.cc=0A= index 8bc5430..2b6d24d 100644=0A= --- a/gcc/config/i386/i386-expand.cc=0A= +++ b/gcc/config/i386/i386-expand.cc=0A= @@ -13531,8 +13531,6 @@ rdseed_step:=0A= arg3 =3D CALL_EXPR_ARG (exp, 3); /* unsigned int *sum_out. */=0A= =0A= op1 =3D expand_normal (arg0);=0A= - if (!integer_zerop (arg0))=0A= - op1 =3D copy_to_mode_reg (QImode, convert_to_mode (QImode, op1, 1));=0A= =0A= op2 =3D expand_normal (arg1);=0A= if (!register_operand (op2, mode0))=0A= @@ -13550,7 +13548,7 @@ rdseed_step:=0A= }=0A= =0A= op0 =3D gen_reg_rtx (mode0);=0A= - if (integer_zerop (arg0))=0A= + if (op1 =3D=3D const0_rtx)=0A= {=0A= /* If arg0 is 0, optimize right away into add or sub=0A= instruction that sets CCCmode flags. */=0A= @@ -13560,7 +13558,14 @@ rdseed_step:=0A= else=0A= {=0A= /* Generate CF from input operand. */=0A= - emit_insn (gen_addqi3_cconly_overflow (op1, constm1_rtx));=0A= + if (!CONST_INT_P (op1))=0A= + {=0A= + op1 =3D convert_to_mode (QImode, op1, 1);=0A= + op1 =3D copy_to_mode_reg (QImode, op1);=0A= + emit_insn (gen_negqi_ccc_1 (op1, op1));=0A= + }=0A= + else=0A= + emit_insn (gen_x86_stc ());=0A= =0A= /* Generate instruction that consumes CF. */=0A= op1 =3D gen_rtx_REG (CCCmode, FLAGS_REG);=0A= diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md=0A= index 5b53841..1cc3989 100644=0A= --- a/gcc/config/i386/i386.md=0A= +++ b/gcc/config/i386/i386.md=0A= @@ -1765,6 +1765,22 @@=0A= (set_attr "bdver1_decode" "direct")=0A= (set_attr "mode" "SI")])=0A= =0A= +(define_insn "x86_clc"=0A= + [(set (reg:CCC FLAGS_REG) (const_int 0))]=0A= + ""=0A= + "stc"=0A= + [(set_attr "length" "1")=0A= + (set_attr "length_immediate" "0")=0A= + (set_attr "modrm" "0")])=0A= +=0A= +(define_insn "x86_stc"=0A= + [(set (reg:CCC FLAGS_REG) (const_int 1))]=0A= + ""=0A= + "stc"=0A= + [(set_attr "length" "1")=0A= + (set_attr "length_immediate" "0")=0A= + (set_attr "modrm" "0")])=0A= +=0A= ;; Pentium Pro can do both steps in one go.=0A= ;; (these instructions set flags directly)=0A= =0A= @@ -7735,6 +7751,14 @@=0A= "#"=0A= "&& 1"=0A= [(const_int 0)])=0A= +=0A= +(define_insn_and_split "*setcc_qi_negqi_ccc_1_"=0A= + [(set (reg:CCC FLAGS_REG)=0A= + (ltu:CCC (reg:CC_CCC FLAGS_REG) (const_int 0)))]=0A= + "ix86_pre_reload_split ()"=0A= + "#"=0A= + "&& 1"=0A= + [(const_int 0)])=0A= =0C=0A= ;; Overflow setting add instructions=0A= =0A= @@ -11218,7 +11242,7 @@=0A= [(set_attr "type" "negnot")=0A= (set_attr "mode" "SI")])=0A= =0A= -(define_insn "*neg_ccc_1"=0A= +(define_insn "neg_ccc_1"=0A= [(set (reg:CCC FLAGS_REG)=0A= (ne:CCC=0A= (match_operand:SWI 1 "nonimmediate_operand" "0")=0A= @@ -15072,7 +15096,7 @@=0A= ;; Convert setcc + movzbl to xor + setcc if operands don't overlap.=0A= =0A= (define_peephole2=0A= - [(set (reg FLAGS_REG) (match_operand 0))=0A= + [(set (match_operand 4 "flags_reg_operand") (match_operand 0))=0A= (set (match_operand:QI 1 "register_operand")=0A= (match_operator:QI 2 "ix86_comparison_operator"=0A= [(reg FLAGS_REG) (const_int 0)]))=0A= @@ -15086,13 +15110,12 @@=0A= (set (strict_low_part (match_dup 5))=0A= (match_dup 2))]=0A= {=0A= - operands[4] =3D gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);=0A= operands[5] =3D gen_lowpart (QImode, operands[3]);=0A= ix86_expand_clear (operands[3]);=0A= })=0A= =0A= (define_peephole2=0A= - [(parallel [(set (reg FLAGS_REG) (match_operand 0))=0A= + [(parallel [(set (match_operand 5 "flags_reg_operand") (match_operand = 0))=0A= (match_operand 4)])=0A= (set (match_operand:QI 1 "register_operand")=0A= (match_operator:QI 2 "ix86_comparison_operator"=0A= @@ -15110,14 +15133,13 @@=0A= (set (strict_low_part (match_dup 6))=0A= (match_dup 2))]=0A= {=0A= - operands[5] =3D gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);=0A= operands[6] =3D gen_lowpart (QImode, operands[3]);=0A= ix86_expand_clear (operands[3]);=0A= })=0A= =0A= (define_peephole2=0A= - [(set (reg FLAGS_REG) (match_operand 0))=0A= - (parallel [(set (reg FLAGS_REG) (match_operand 1))=0A= + [(set (match_operand 6 "flags_reg_operand") (match_operand 0))=0A= + (parallel [(set (match_operand 7 "flags_reg_operand") (match_operand = 1))=0A= (match_operand 5)])=0A= (set (match_operand:QI 2 "register_operand")=0A= (match_operator:QI 3 "ix86_comparison_operator"=0A= @@ -15138,8 +15160,6 @@=0A= (set (strict_low_part (match_dup 8))=0A= (match_dup 3))]=0A= {=0A= - operands[6] =3D gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);=0A= - operands[7] =3D gen_rtx_REG (GET_MODE (operands[1]), FLAGS_REG);=0A= operands[8] =3D gen_lowpart (QImode, operands[4]);=0A= ix86_expand_clear (operands[4]);=0A= })=0A= @@ -15147,7 +15167,7 @@=0A= ;; Similar, but match zero extend with andsi3.=0A= =0A= (define_peephole2=0A= - [(set (reg FLAGS_REG) (match_operand 0))=0A= + [(set (match_operand 4 "flags_reg_operand") (match_operand 0))=0A= (set (match_operand:QI 1 "register_operand")=0A= (match_operator:QI 2 "ix86_comparison_operator"=0A= [(reg FLAGS_REG) (const_int 0)]))=0A= @@ -15161,13 +15181,12 @@=0A= (set (strict_low_part (match_dup 5))=0A= (match_dup 2))]=0A= {=0A= - operands[4] =3D gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);=0A= operands[5] =3D gen_lowpart (QImode, operands[3]);=0A= ix86_expand_clear (operands[3]);=0A= })=0A= =0A= (define_peephole2=0A= - [(parallel [(set (reg FLAGS_REG) (match_operand 0))=0A= + [(parallel [(set (match_operand 5 "flags_reg_operand") (match_operand = 0))=0A= (match_operand 4)])=0A= (set (match_operand:QI 1 "register_operand")=0A= (match_operator:QI 2 "ix86_comparison_operator"=0A= @@ -15186,14 +15205,13 @@=0A= (set (strict_low_part (match_dup 6))=0A= (match_dup 2))]=0A= {=0A= - operands[5] =3D gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);=0A= operands[6] =3D gen_lowpart (QImode, operands[3]);=0A= ix86_expand_clear (operands[3]);=0A= })=0A= =0A= (define_peephole2=0A= - [(set (reg FLAGS_REG) (match_operand 0))=0A= - (parallel [(set (reg FLAGS_REG) (match_operand 1))=0A= + [(set (match_operand 6 "flags_reg_operand") (match_operand 0))=0A= + (parallel [(set (match_operand 7 "flags_reg_operand") (match_operand = 1))=0A= (match_operand 5)])=0A= (set (match_operand:QI 2 "register_operand")=0A= (match_operator:QI 3 "ix86_comparison_operator"=0A= @@ -15215,8 +15233,6 @@=0A= (set (strict_low_part (match_dup 8))=0A= (match_dup 3))]=0A= {=0A= - operands[6] =3D gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);=0A= - operands[7] =3D gen_rtx_REG (GET_MODE (operands[1]), FLAGS_REG);=0A= operands[8] =3D gen_lowpart (QImode, operands[4]);=0A= ix86_expand_clear (operands[4]);=0A= })=0A= diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc=0A= index fa20665..73ec5c7 100644=0A= --- a/gcc/simplify-rtx.cc=0A= +++ b/gcc/simplify-rtx.cc=0A= @@ -6026,6 +6026,18 @@ simplify_const_relational_operation (enum = rtx_code code,=0A= return 0;=0A= }=0A= =0A= + /* Handle MODE_CC comparisons that have been simplified to=0A= + constants. */=0A= + if (GET_MODE_CLASS (mode) =3D=3D MODE_CC=0A= + && op1 =3D=3D const0_rtx=0A= + && CONST_INT_P (op0))=0A= + {=0A= + /* LTU represents the carry flag. */=0A= + if (code =3D=3D LTU)=0A= + return op0 =3D=3D const0_rtx ? const0_rtx : const_true_rtx;=0A= + return 0;=0A= + }=0A= +=0A= /* We can't simplify MODE_CC values since we don't know what the=0A= actual comparison is. */=0A= if (GET_MODE_CLASS (GET_MODE (op0)) =3D=3D MODE_CC)=0A= diff --git a/gcc/testsuite/gcc.target/i386/stc-1.c = b/gcc/testsuite/gcc.target/i386/stc-1.c=0A= new file mode 100644=0A= index 0000000..857c939=0A= --- /dev/null=0A= +++ b/gcc/testsuite/gcc.target/i386/stc-1.c=0A= @@ -0,0 +1,21 @@=0A= +/* { dg-do compile } */=0A= +/* { dg-options "-O2" } */=0A= +=0A= +typedef unsigned int u32;=0A= +=0A= +unsigned int foo (unsigned int a, unsigned int b, unsigned int *c)=0A= +{=0A= + return __builtin_ia32_addcarryx_u32 (1, a, b, c);=0A= +}=0A= +=0A= +unsigned int bar (unsigned int b, unsigned int *c)=0A= +{=0A= + return __builtin_ia32_addcarryx_u32 (1, 2, b, c);=0A= +}=0A= +=0A= +unsigned int baz (unsigned int a, unsigned int *c)=0A= +{=0A= + return __builtin_ia32_addcarryx_u32 (1, a, 3, c);=0A= +}=0A= +=0A= +/* { dg-final { scan-assembler "stc" } } */=0A= ------=_NextPart_000_003C_01D892A2.D28505C0--